Sunday, June 28, 2009

Converting a delimited string of values into columns

I have seen a few questions asking how to transform a delimited values into columns, so I thought I would talk about it here.  In most cases it is recommended to use a string parser function to split the string; however, today I want to talk about another method.  This method takes advantage of the XML data type that was introduced in SQL Server 2005.  What is nice about the XML data type is it preserves the document order.  The document order is critical because it guarantees the string is kept in the same order when it is converted to XML.  Let’s start by creating a sample table with a few rows of data.

DECLARE @t TABLE(
ProductId INT,
ProductName VARCHAR(25),
SupplierId INT,
Descr VARCHAR(50)
)
 
INSERT INTO @t VALUES (1,'Product1',1,'A1,10in,30in,2lbs');
INSERT INTO @t VALUES (2,'Product2',2,'T6,15in,30in,');
INSERT INTO @t VALUES (3,'Product3',1,'A2,1in,,0.5lbs');

Okay now we have our sample data, let’s talk about our data.  The column “Descr” contains 4 attributes of the product.  The attributes that we are storing are Type,Length,Height, and Weight.  Anyone of these attributes can be null in the string but MUST be represented in the string.  For example, “A1,10in,30in,” has a comma at the end because the weight is unknown and a string like “A1,10in,,1lbs” has a empty space for height.  If we do not have this space, how could we ever determine which attributes are associated with a particular column?  The answer is there would be no real way of knowing with this method or any other method.

Now that we have laid all the ground work, it is time to start building our query.  We will start off by creating a CTE (common table expression) making sure to convert our delimited string into valid XML.  Below is the first part of our CTE (this is not the complete code).

;WITH cte (ProductId, ProductName,SupplierId,Prod_Attributes)
AS
(
SELECT 
    [ProductId],
    [ProductName],
    [SupplierId],
    CONVERT(XML,'<Product><Attribute>' 
        + REPLACE([Descr],',', '</Attribute><Attribute>') 
        + '</Attribute></Product>') AS Prod_Attributes
FROM @t
)

You may be asking yourself, what does this code do?  This CTE code selects all the data from the table @t, but it also converts the delimited string into valid XML.  How does it do that?  Well let’s break down the code to figure it out.  

This is the conversion code of interest:

CONVERT(XML,'<Product><Attribute>' 
    + REPLACE([Descr],',', '</Attribute><Attribute>') 
    + '</Attribute></Product>') AS Prod_Attributes

This code takes the input string and uses the replace function to insert the XML tags, so that it has the look and feel of valid xml.  For example, the string “A1,10in,30in,5lbs”  will be transformed into

<Product>
    <Attribute>A1</Attribute>
    <Attribute>10in</Attribute>
    <Attribute>30in</Attribute>
    <Attribute>5lbs</Attribute>
</Product>

Note: It should be noted that we are using a comma delimitation.  If your delimitation is different, you will need to change the delimiter in the replace function.  The delimiter is between the column and the closing XML tag. In the code following, REPLACE([Descr],',', '</Attribute> . the delimiter is denoted in Red.

Now that the product description values are in a valid XML format, we can easily get the values by using a hardcoded singleton value, as shown below. 

;WITH cte (ProductId, ProductName,SupplierId,Prod_Attributes)
AS
(
SELECT 
    [ProductId],
    [ProductName],
    [SupplierId],
    CONVERT(XML,'<Product><Attribute>' 
        + REPLACE([Descr],',', '</Attribute><Attribute>') 
        + '</Attribute></Product>') AS Prod_Attributes
FROM @t
)
SELECT 
    [ProductID],
    [SupplierId],
    Prod_Attributes.value('/Product[1]/Attribute[1]','varchar(25)') AS [Type],
    Prod_Attributes.value('/Product[1]/Attribute[2]','varchar(25)') AS [Length],
    Prod_Attributes.value('/Product[1]/Attribute[3]','varchar(25)') AS [Height],
    Prod_Attributes.value('/Product[1]/Attribute[4]','varchar(25)') AS [Weight]
FROM cte

The singleton of each attribute element is denoted by a hardcoded value 1-4.  If you have more columns you will need to specify more singletons. Here are the results:

image

So there you have it.  An easy to implement solution to “pivot” or transform delimited values into columns. 

**************** Update ********************

I have reworked my code to work with XML special characters.  XML special characters are quite problematic but Brad Schulz, http://bradsruminations.blogspot.com/, has come up with a great method to work around the encoding issues.  Here is the modified code:

SELECT 
    [ProductID],
    [SupplierId],
    NewXML.value('/Product[1]/Attribute[1]','varchar(25)') AS [Type],
    NewXML.value('/Product[1]/Attribute[2]','varchar(25)') AS [Length],
    NewXML.value('/Product[1]/Attribute[3]','varchar(25)') AS [Height],
    NewXML.value('/Product[1]/Attribute[4]','varchar(25)') AS [Weight]
FROM @t t1
CROSS APPLY (SELECT XMLEncoded=(SELECT Descr AS [*] FROM @t t2 WHERE t1.ProductId = t2.[ProductId] FOR XML PATH(''))) EncodeXML
CROSS APPLY (SELECT NewXML=CAST('<Product><Attribute>'+REPLACE(XMLEncoded,',','</Attribute><Attribute>')+'</Attribute></Product>' AS XML)) CastXML

12 comments:

John Huang said...

good idea, I wrote a similar stuff before :)
http://sqlx.blogspot.com/2009/06/convert-comma-separated-text-to-table.html

Amar said...

Really awesome. its helped me alot.

Anonymous said...

Thanks man, that was precisely what I needed.

Arvind Ravish said...
This comment has been removed by the author.
Raymond Amegadjin said...

hi, great it did help me out on a table which contained many column with delimited data

selinitur said...

I believe you may also know about the way of fix a pdf file

aarti dogra said...

Thanks, this is what I exactly wanted to do.

Md. Mainul Hassan said...

Thanks for the post. But what happens if the comma delimited string is of variable length? for example in your query you assumed that only 4 commas will be there. For my requirement number of comma sperated values can be variable length-may be 5,9,10 etc.
What to do then? can you help?

Adam Haines said...

Hi MD. Mainul Hassan,

In the case of a variable length string, you will need to go with a more dynamic approach as outlined int his article, http://jahaines.blogspot.com/2009/11/splitting-delimited-string-part-1.html

Amit said...

How to use the same code for column with Text Datatype.

I am getting following error:
Msg 8116, Level 16, State 1, Line 1
Argument data type text is invalid for argument 1 of replace function.

Adam Haines said...

Hi Amit,

If you are using SQL Server 2005+, you can cast the text column to nvarchar(max). You can then perform string operations against it.

Thanks,
Adam

Sherry Samples said...

I have a similar situation. I start out with a table that has data input into a column from another source. This data is comma delimited coming in. I need to manipulate the data to remove a section at the end of each. So I split the data and remove the end with the code below. (I added the ID column later to be able to sort. I also added WITH SCHEMABINDING later to add an XML index but nothing works. I can remove this ... and the ID column, but I do not see any difference one way or the other):

ALTER VIEW [dbo].[vw_Routing]
WITH SCHEMABINDING
AS
SELECT TOP 99.9999 PERCENT

ROW_NUMBER() OVER (ORDER BY CableID)-1 as ID,
CableID AS [CableID],
SUBSTRING(m.n.value('.[1]','varchar(8000)'),1,13) AS Routing
FROM
(
SELECT CableID,CAST('' + REPLACE([RouteNodeList],',','') + '' AS XML) AS x
FROM [dbo].[Cables]
)t
CROSS APPLY x.nodes('/XMLRoot/RowData')m(n)

ORDER BY ID

Now I need to Concatenate data from the "Routing" column's rows into one row grouped by another column into a column again. I have the code working except that it is reordering my data; I must have the data in the order it is input into the table as it is Cable Routing information. I must also remove duplicates. I use the following code. The SELECT DISTINCT removes the duplicates, but reorders the data. The SELECT (without DISTINCT) keeps the correct data order, but does NOT remove the duplicates:

Substring(
(
SELECT DISTINCT ','+ x3.Routing AS [text()] --This DISTINCT reorders the routes once concatenated.
--SELECT ','+ x3.Routing AS [text()] --This without the DISTINCT does not remove duplicates.
From vw_Routing x3
Where x3.CableID = c.CableId
For XML PATH ('')
), 2, 1000) [Routing],

I tried the code you gave above and it provided the same results with the DISTINCT reordering the data but without DISTINCT not removing the duplicates.

PLEASE HELP!!!

Thanks,
Sherry S.