Monday, 28 November 2011

PIVOT and UNPIVOT T-SQL Operator : Part 1 (PIVOT)

Pivoting Data
 
Pivoting data means rotating data from the state of rows to state of column and possibly, aggregate data along the way.


To explain the definition of pivoting data i will use an example. For demonstration purpose please run the below code in new query window of SSMS (SQL Server Management Studio). In this example I use a Sample Projects table that you create in tempdb database and populate it with sample data by running the code below.


Listing 1.1
USE tempdb;

IF OBJECT_ID('dbo.Projects', 'U') IS NOT NULL DROP TABLE dbo.Projects;

CREATE TABLE dbo.Projects
(
projectid INT NOT NULL,
admissiondate DATE NOT NULL,
mgrid INT NOT NULL,
clientid VARCHAR(5) NOT NULL,
daysCount INT NOT NULL,
CONSTRAINT PK_Projects PRIMARY KEY(projectid)
);

INSERT INTO dbo.Projects(projectid, admissiondate, mgrid, clientid, daysCount)
VALUES
(30001, '20090802', 3, 'A', 10),
(10001, '20091224', 2, 'A', 12),
(10005, '20091224', 1, 'B', 20),
(40001, '20100109', 2, 'A', 40),
(10006, '20100118', 1, 'C', 14),
(20001, '20100212', 2, 'B', 12),
(40005, '20110212', 3, 'A', 10),
(20002, '20110216', 1, 'C', 20),
(30003, '20110418', 2, 'B', 15),
(30004, '20090418', 3, 'C', 22),
(30007, '20110907', 3, 'D', 30);

SELECT * FROM dbo.Projects;


Before explaing pivoting, consider the query to get total daysCount for each mgrid and clientid combination. The solution query for this request is simple...

Select mgrid, clientid, SUM(daysCount) as total_daysCount
from dbo.Projects group by mgrid, clientid

the output is of this query is :
mgrid clientid total_daysCount
2 A 52
3 A 20
1 B 20
2 B 27
1 C 34
3 C 22
3 D 30
Fig 1.1


Now Suppose that the requirement is to generate output in this format :
mgrid A B C D
1 NULL 20 34 NULL
2 52 27 NULL NULL
3 20 NULL 22 30
Fig : 1.2


Here we have rows for each mgrid value and column for each clientd value and in the intersaction of each combination we have toal_daysCount value. This view of data (Fig1.2) is called pivoted view of data of Projects table and the technique to generate output in this format is called Pivoting.


Every pivoting request involves three logical processing phases, each with associated elements: a grouping phase with an associated grouping or on rows element, a spreading phase with an associated spreading or on cols element, and an aggregation phase with an associated aggregation element and aggregate function.
In our example, you need to produce a single row in the result for each unique manager ID. This means that the rows from the Projects table need to be grouped by the mgrid attribute, and therefore the grouping element in our case is the mgrid attribute.
The Projects table has a single column holding all client ID values, and a single column holding their daysCount values. The pivoting process is supposed to produce a different result column for each unique client ID, and each column contains the aggregated quantities for that client. You can think of this process as "spreading" quantities by client IDs. The spreading element in our case is the clientid attribute.
Finally, because pivoting involves grouping, you will need to aggregate data to produce the result values in the "intersection" of the grouping and spreading elements. You will need to identify the aggregate function (SUM in our case), and the aggregation element (daysCount attribute in our case).


We can pivot data in two ways.
a) By using Standard Sql Commands.
b) By using T-SQL PIVOT operator.

Here I will explain both solution.

a) By using Standard Sql Commands

Pivoting using Standard Sql follows all three pivoting phases.

We use mgrid as our grouping element.

The spreading phase is achieved in the select clause by spreading all spreading colum value. Here spreading colum value is clientid. For pivoting you must know all possible spreading column values before writing pivoting query because you need to specify those values for each column in output query. In our example spreading cloumn(clientid) has for values which are A, B, C and D. We need to specify seperate CASE expression in Sqlect clause for each spreading column value. For example, here's the CASE expression for client A:

CASE WHEN clientid='A' THEN daysCount

This expression returns the daysCount from the current row only when the current row represents an project for client A. Otherwise the expression returns null.

Finally, the aggregation phase is achieved by applying the relevant aggregate function (SUM in our case) to the result of each CASE expression. For example, here's the expression that produces the result column for client A:

SUM(CASE WHEN clientid = 'A' THEN daysCount END) AS A

Here's the complete solution query pivoting project data, returning the total days on projects for each manager (on rows) and client (on cols):

SELECT mgrid,
SUM(CASE WHEN clientid = 'A' THEN daysCount END) AS A,
SUM(CASE WHEN clientid = 'B' THEN daysCount END) AS B,
SUM(CASE WHEN clientid = 'C' THEN daysCount END) AS C,
SUM(CASE WHEN clientid = 'D' THEN daysCount END) AS D
FROM dbo.Projects
GROUP BY mgrid;


This query retun pivoted days show in the Fig 1.2.


b) By using T-SQL PIVOT operator

SQL Server 2005 introduced a T-SQL–specific table operator called PIVOT. The PIVOT operator operates in the context of the FROM clause of a query. It operates on some source table or table expression, pivots the data, and returns a result table. The PIVOT operator involves the same logical processing phases as described earlier (grouping, spreading, and aggregating) with the same pivoting elements, but uses different, native syntax.

The general form of a query with the PIVOT operator is:

SELECT ...
FROM <source_table_or_table_expression>
PIVOT(<agg_func>(<aggregation_element>)
FOR <spreading_element>
IN (<list_of_target_columns>)) AS <result_table_alias>;

In the parentheses of the PIVOT operator you specify the aggregate function (SUM in our example), aggregation element (daysCount), spreading element (clientid), and the list of target column names (A, B, C, D). Following the parentheses of the PIVOT operator you specify an alias for the result table.

It is important to note that with the PIVOT operator, you do not explicitly specify the grouping elements, removing the need for a GROUP BY in the query. The PIVOT operator figures out the grouping elements implicitly as all attributes from the source table (or table expression) that were not specified as either the spreading element or the aggregation element. You need to ensure that the source table for the PIVOT operator has no attributes besides the grouping, spreading, and aggregation elements, so that after specifying the spreading and aggregation elements, the only attributes left are those you intend as grouping elements. You achieve this is by not applying the PIVOT operator to the original table directly (Projects in our case), but instead to a table expression that includes only the attributes representing the pivoting elements and no others. For example, here's the solution query to our original pivoting request, using the native PIVOT operator:

SELECT mgrid, A, B, C, D
FROM (SELECT mgrid, clientid, daysCount
FROM dbo.Projects) AS D
PIVOT(SUM(daysCount) FOR clientid IN(A, B, C, D)) AS P;

As another example for a pivoting request, suppose that instead of returning managers on rows and clients on columns, you want it the other way around: the grouping element is clientid, the spreading element is mgrid, and the aggregation element and aggregate function remain SUM(daysCount). After you learn the "template" for a pivoting solution (standard or native), it's just a matter of fitting those elements in the right places. The following solution query uses the native PIVOT operator:

SELECT clientid, [1], [2], [3]
FROM (SELECT mgrid, clientid, daysCount
FROM dbo.Projects) AS D
PIVOT(SUM(daysCount) FOR mgrid IN([1], [2], [3])) AS P;


The client IDs 1, 2, and 3 are values in the clientid column in the source table, but in terms of the result, these values become target column names. Therefore, in the PIVOT IN clause you must refer to them as identifiers. When identifiers are irregular (for example, when they start with a digit), you need to delimit them—hence the use of square brackets.

This query returns the following output:
clientid 1 2 3
A NULL 52 20
B 20 27 NULL
C 34 NULL 22
D NULL NULL 30
Fig. 1.3




Hope u like this explaination of PIVOT operator. If u have any confusion or want to share ur thoughts on this topic please write or ask me in the comment section of this post.

In my next post I will explain about UNPIVOT operator with example.

No comments:

Post a Comment