Sunday, June 28, 2009

Converting a delimited string of values into columns

I have seen a few questions asking how to transform a delimited values into columns, so I thought I would talk about it here.  In most cases it is recommended to use a string parser function to split the string; however, today I want to talk about another method.  This method takes advantage of the XML data type that was introduced in SQL Server 2005.  What is nice about the XML data type is it preserves the document order.  The document order is critical because it guarantees the string is kept in the same order when it is converted to XML.  Let’s start by creating a sample table with a few rows of data.

DECLARE @t TABLE(
ProductId INT,
ProductName VARCHAR(25),
SupplierId INT,
Descr VARCHAR(50)
)
 
INSERT INTO @t VALUES (1,'Product1',1,'A1,10in,30in,2lbs');
INSERT INTO @t VALUES (2,'Product2',2,'T6,15in,30in,');
INSERT INTO @t VALUES (3,'Product3',1,'A2,1in,,0.5lbs');

Okay now we have our sample data, let’s talk about our data.  The column “Descr” contains 4 attributes of the product.  The attributes that we are storing are Type,Length,Height, and Weight.  Anyone of these attributes can be null in the string but MUST be represented in the string.  For example, “A1,10in,30in,” has a comma at the end because the weight is unknown and a string like “A1,10in,,1lbs” has a empty space for height.  If we do not have this space, how could we ever determine which attributes are associated with a particular column?  The answer is there would be no real way of knowing with this method or any other method.

Now that we have laid all the ground work, it is time to start building our query.  We will start off by creating a CTE (common table expression) making sure to convert our delimited string into valid XML.  Below is the first part of our CTE (this is not the complete code).

;WITH cte (ProductId, ProductName,SupplierId,Prod_Attributes)
AS
(
SELECT 
    [ProductId],
    [ProductName],
    [SupplierId],
    CONVERT(XML,'<Product><Attribute>' 
        + REPLACE([Descr],',', '</Attribute><Attribute>') 
        + '</Attribute></Product>') AS Prod_Attributes
FROM @t
)

You may be asking yourself, what does this code do?  This CTE code selects all the data from the table @t, but it also converts the delimited string into valid XML.  How does it do that?  Well let’s break down the code to figure it out.  

This is the conversion code of interest:

CONVERT(XML,'<Product><Attribute>' 
    + REPLACE([Descr],',', '</Attribute><Attribute>') 
    + '</Attribute></Product>') AS Prod_Attributes

This code takes the input string and uses the replace function to insert the XML tags, so that it has the look and feel of valid xml.  For example, the string “A1,10in,30in,5lbs”  will be transformed into

<Product>
    <Attribute>A1</Attribute>
    <Attribute>10in</Attribute>
    <Attribute>30in</Attribute>
    <Attribute>5lbs</Attribute>
</Product>

Note: It should be noted that we are using a comma delimitation.  If your delimitation is different, you will need to change the delimiter in the replace function.  The delimiter is between the column and the closing XML tag. In the code following, REPLACE([Descr],',', '</Attribute> . the delimiter is denoted in Red.

Now that the product description values are in a valid XML format, we can easily get the values by using a hardcoded singleton value, as shown below. 

;WITH cte (ProductId, ProductName,SupplierId,Prod_Attributes)
AS
(
SELECT 
    [ProductId],
    [ProductName],
    [SupplierId],
    CONVERT(XML,'<Product><Attribute>' 
        + REPLACE([Descr],',', '</Attribute><Attribute>') 
        + '</Attribute></Product>') AS Prod_Attributes
FROM @t
)
SELECT 
    [ProductID],
    [SupplierId],
    Prod_Attributes.value('/Product[1]/Attribute[1]','varchar(25)') AS [Type],
    Prod_Attributes.value('/Product[1]/Attribute[2]','varchar(25)') AS [Length],
    Prod_Attributes.value('/Product[1]/Attribute[3]','varchar(25)') AS [Height],
    Prod_Attributes.value('/Product[1]/Attribute[4]','varchar(25)') AS [Weight]
FROM cte

The singleton of each attribute element is denoted by a hardcoded value 1-4.  If you have more columns you will need to specify more singletons. Here are the results:

image

So there you have it.  An easy to implement solution to “pivot” or transform delimited values into columns. 

**************** Update ********************

I have reworked my code to work with XML special characters.  XML special characters are quite problematic but Brad Schulz, http://bradsruminations.blogspot.com/, has come up with a great method to work around the encoding issues.  Here is the modified code:

SELECT 
    [ProductID],
    [SupplierId],
    NewXML.value('/Product[1]/Attribute[1]','varchar(25)') AS [Type],
    NewXML.value('/Product[1]/Attribute[2]','varchar(25)') AS [Length],
    NewXML.value('/Product[1]/Attribute[3]','varchar(25)') AS [Height],
    NewXML.value('/Product[1]/Attribute[4]','varchar(25)') AS [Weight]
FROM @t t1
CROSS APPLY (SELECT XMLEncoded=(SELECT Descr AS [*] FROM @t t2 WHERE t1.ProductId = t2.[ProductId] FOR XML PATH(''))) EncodeXML
CROSS APPLY (SELECT NewXML=CAST('<Product><Attribute>'+REPLACE(XMLEncoded,',','</Attribute><Attribute>')+'</Attribute></Product>' AS XML)) CastXML

11 comments:

John Huang said...

good idea, I wrote a similar stuff before :)
http://sqlx.blogspot.com/2009/06/convert-comma-separated-text-to-table.html

Amar said...

Really awesome. its helped me alot.

Anonymous said...

Thanks man, that was precisely what I needed.

Arvind Ravish said...
This comment has been removed by the author.
Raymond Amegadjin said...

hi, great it did help me out on a table which contained many column with delimited data

selinitur said...

I believe you may also know about the way of fix a pdf file

aarti dogra said...

Thanks, this is what I exactly wanted to do.

Md. Mainul Hassan said...

Thanks for the post. But what happens if the comma delimited string is of variable length? for example in your query you assumed that only 4 commas will be there. For my requirement number of comma sperated values can be variable length-may be 5,9,10 etc.
What to do then? can you help?

Adam Haines said...

Hi MD. Mainul Hassan,

In the case of a variable length string, you will need to go with a more dynamic approach as outlined int his article, http://jahaines.blogspot.com/2009/11/splitting-delimited-string-part-1.html

Amit said...

How to use the same code for column with Text Datatype.

I am getting following error:
Msg 8116, Level 16, State 1, Line 1
Argument data type text is invalid for argument 1 of replace function.

Adam Haines said...

Hi Amit,

If you are using SQL Server 2005+, you can cast the text column to nvarchar(max). You can then perform string operations against it.

Thanks,
Adam