Monday 28 November 2011

PIVOT and UNPIVOT T-SQL Operator : Part 2 (UNPIVOT)


In my previous post I explained about Pivoting using PIVOT operator or Standard SQL. In this post I will explain Unpivoting in SQL. I suggest you to read my previous post on Pivoting (http://sqlenthusiast.blogspot.com/2011/11/pivot-and-unpivot-t-sql-operator-part-1.html). This will help you to understand unpivoting better.


Unpivoting Data :
Unpivoting is the technique to rotate from state of column to state of rows. It involves querying the pivoted state of data, producing out of each input to multiple output rows. It is very difficult to understand Unpivoting by definition, here I will explain unpivoting with example.

For demonstration puppose please run the following code in Listing 1.1. Hear I am creating projects table and filling it with sample data. Then I create 'MgrClientProject' table. Which is filled with rows of resultset that we got by Pivoting data of Projects table for manager and clients. 
 

Listing1.1
USE tempdb;

IF OBJECT_ID('dbo.Projects', 'U') IS NOT NULL DROP TABLE dbo.Projects;

CREATE TABLE dbo.Projects
(
projectid INT NOT NULL,
admissiondate DATE NOT NULL,
mgrid INT NOT NULL,
clientid VARCHAR(5) NOT NULL,
daysCount INT NOT NULL,
CONSTRAINT PK_Projects PRIMARY KEY(projectid)
);

INSERT INTO dbo.Projects(projectid, admissiondate, mgrid, clientid, daysCount)
VALUES
(30001, '20090802', 3, 'A', 10),
(10001, '20091224', 2, 'A', 12),
(10005, '20091224', 1, 'B', 20),
(40001, '20100109', 2, 'A', 40),
(10006, '20100118', 1, 'C', 14),
(20001, '20100212', 2, 'B', 12),
(40005, '20110212', 3, 'A', 10),
(20002, '20110216', 1, 'C', 20),
(30003, '20110418', 2, 'B', 15),
(30004, '20090418', 3, 'C', 22),
(30007, '20110907', 3, 'D', 30);

Select mgrid, [A], [B], [C], [D]
INTO MgrClientProject
from (Select mgrid, clientid, daysCount from dbo.Projects) as D
PIVOT (SUM(daysCount) FOR clientid in (A,B,C,D)) as P;

Select * from MgrClientProject


Here is the output of the query to select rows of MgrClientProject table.
mgrid A B C D
1 NULL 20 34 NULL
2 52 27 NULL NULL
3 20 NULL 22 30
Fig1.1


The table has rows for each manager and column for each client and in the intersection of mgrid and clientid we have daysCount values for the projects. For each irrelevent combination of manager and client we have null in the intersection. We are requested to give unpivoted data from this table. We need to generate output having rows for each mgrid and client along with their dayscount value of their project. The desired output should be this.

mgrid clientid daysCount
1 B 20
1 C 34
2 A 52
2 B 27
3 A 20
3 C 22
3 D 30
Fig1.2


Unpivoting involves implementing three logical processing phases : producing copies, extracting elements and eliminating irrelevent intersections.

We can do Unpivoting in two ways.
a) By standard SQL
b) By T-SQL UNPIVOT operator


a) By standard SQL
Unpivoting with Standard Sql implements all three logical phases in very explicit manner.

The first step involves producing multiple copies of input rows. We need to create copies of input rows for each column in the source table. In our our example we have four colums : A, B, C, D. Each colum represents clientid values. We need to do cross join between MgrClientProject table with the table having rows for each client.

To get table having rows for each client we will use table value constructor in the form of a VALUES clause to create a virtual table with a row for each client. Table value constructor is new feature in SQL Server 2008. Following is the query to do cross Join of MgrClientProject table with the table values constructor having each client value.

Select * from dbo.MgrClientProject
CROSS JOIN (Values('A'),('B'),('C'),('D')) AS P(clientid);

The output of this query is.
mgrid A B C D clientid
1 NULL 20 34 NULL A
1 NULL 20 34 NULL B
1 NULL 20 34 NULL C
1 NULL 20 34 NULL D
2 52 27 NULL NULL A
2 52 27 NULL NULL B
2 52 27 NULL NULL C
2 52 27 NULL NULL D
3 20 NULL 22 30 A
3 20 NULL 22 30 B
3 20 NULL 22 30 C
3 20 NULL 22 30 D
Fig1.3


As you can see that four copies of each input rows is created for each Client : A, B, C, D

Next Step is to create new column, let say daysCount, for each mgrid and and custid (newly created colum) combination. The daysCount value will be based on custid value. For example when custid values is 'A' then daysCount value will be the value of colum A, when custid values is 'B' then daysCount value will be the value of colum B. We will use case expression for each custid value to get daysCount value. Here is the query.

Select mgrid, clientid,
CASE WHEN clientid='A' then A
WHEN clientid='B' then B
WHEN clientid='C' then C
WHEN clientid='D' then D END AS daysCount
from dbo.MgrClientProject
CROSS JOIN (Values('A'),('B'),('C'),('D')) AS P(clientid)

The output of this query is.
mgrid clientid daysCount
1 A NULL
1 B 20
1 C 34
1 D NULL
2 A 52
2 B 27
2 C NULL
2 D NULL
3 A 20
3 B NULL
3 C 22
3 D 30
Fig1.4


The third step is to eliminate irrelevent rows where we have NULL value in daysCount column.
Following is final query is to produce Unpivoted result that I have shown in Fig 1.2.

Select * from (Select mgrid, clientid,
CASE WHEN clientid='A' then A
WHEN clientid='B' then B
WHEN clientid='C' then C
WHEN clientid='D' then D END AS daysCount
from dbo.MgrClientProject
CROSS JOIN (Values('A'),('B'),('C'),('D')) AS P(clientid)) AS D
WHERE daysCount is not NULL


b) By T-SQL UNPIVOT operator
Unpivoting data means producing two colums from all colums of input table other then group by colum. In our example input table is MgrClientProject and two output colums created will be custid having values(A, B, C, D) and daysCount.

The standard format of T-SQL UNPIVOT operator is.
SELECT ...
FROM <source_table_or_table_expression>
UNPIVOT(<target_col_to_hold_source_col_values>
FOR <target_col_to_hold_source_col_names> IN(<list_of_source_columns>)) AS
<result_table_alias>

Like PIVOT operator UNPIVOT operator also works as a table operator. In the UNPIVOT operator we need to specify result column name (daysCount in our case) that will have input table spreading column values(A, B, C, D column values in our case). We will specify result coulmn name that will have input table colum names(A, B, C, D column name in our case) and in parenthesis we need to specify input colum names(A, B, C, D column name in our case). We will close the parenthesis and give alias name to the table resulting from UNPIVOT table operator. Following is the Final query to Unpivot MgrClientProject table using T-SQL UNPIVOT operator. This query will result Output exactly like Fig1.2.

Select mgrid, clientid, daysCount
from dbo.MgrClientProject
UNPIVOT(daysCount FOR clientid in (A,B,C,D)) as P;

Note that the UNPIVOT operator implements the same logical processing phases described earlier -- generating copies, extracting elements, and eliminating NULL intersections. The last phase is not an optional phase as in the solution based on standard SQL.

Unpivoting does not necessarilly bring back original table before pivoting. Because Pivoting can cause data loss which can't be bring back by pivoting the data.


Post comment and queries on this topic and share your thoughts with me.
Hope you will like this post.

No comments:

Post a Comment