Pivoting data means rotating data from a state of rows to a state of columns, possibly aggregating values along the way.
To illustrate pivoting, I will use an example. To follow along, run the code below in a new query window in SSMS (SQL Server Management Studio). It creates a sample Projects table in the tempdb database and populates it with sample data.
USE tempdb;
-- Drop the sample table if it is left over from a previous run
IF OBJECT_ID('dbo.Projects', 'U') IS NOT NULL DROP TABLE dbo.Projects;
CREATE TABLE dbo.Projects
(
  projectid     INT        NOT NULL,
  admissiondate DATE       NOT NULL,
  mgrid         INT        NOT NULL,
  clientid      VARCHAR(5) NOT NULL,
  daysCount     INT        NOT NULL,
  CONSTRAINT PK_Projects PRIMARY KEY(projectid)
);
INSERT INTO dbo.Projects(projectid, admissiondate, mgrid, clientid, daysCount)
VALUES
(30001, '20090802', 3, 'A', 10),
(10001, '20091224', 2, 'A', 12),
(10005, '20091224', 1, 'B', 20),
(40001, '20100109', 2, 'A', 40),
(10006, '20100118', 1, 'C', 14),
(20001, '20100212', 2, 'B', 12),
(40005, '20110212', 3, 'A', 10),
(20002, '20110216', 1, 'C', 20),
(30003, '20110418', 2, 'B', 15),
(30004, '20090418', 3, 'C', 22),
(30007, '20110907', 3, 'D', 30);
SELECT * FROM dbo.Projects;
Before explaining pivoting, consider a query that returns the total daysCount for each combination of mgrid and clientid. The solution query for this request is simple:
SELECT mgrid, clientid, SUM(daysCount) AS total_daysCount
FROM dbo.Projects
GROUP BY mgrid, clientid;
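The output of this query is shown below (the row order may vary because the query has no ORDER BY clause):
mgrid  clientid  total_daysCount
-----  --------  ---------------
1      B         20
1      C         34
2      A         52
2      B         27
3      A         20
3      C         22
3      D         30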
Now suppose that the requirement is to generate the output in this format:
mgrid  A     B     C     D
-----  ----  ----  ----  ----
1      NULL  20    34    NULL
2      52    27    NULL  NULL
3      20    NULL  22    30
Fig 1.2
Here we have a row for each mgrid value and a column for each clientid value, and at the intersection of each combination we have the total daysCount value. This view of the data (Fig 1.2) is called a pivoted view of the Projects table, and the technique used to generate output in this format is called pivoting.
We can pivot data in two ways:
a) By using standard SQL commands.
b) By using the T-SQL PIVOT operator.
Here I will explain both solutions.
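a) By using standard SQL commands
Pivoting with standard SQL handles all three pivoting phases (grouping, spreading, and aggregating) explicitly.
The grouping phase is achieved with a GROUP BY clause. We use mgrid as our grouping element, so the query groups by mgrid.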
The spreading phase is achieved in the SELECT clause, with one CASE expression for each value of the spreading column. Here the spreading column is clientid. You must know all possible spreading column values before writing the pivoting query, because you need to specify each of those values as a separate column in the output. In our example the spreading column (clientid) has four values: A, B, C, and D. We need a separate CASE expression in the SELECT clause for each of them. For example, here's the CASE expression for client A:
CASE WHEN clientid = 'A' THEN daysCount END
This expression returns the daysCount from the current row only when the current row represents a project for client A. Otherwise the expression returns NULL, because a CASE expression without an ELSE clause defaults to ELSE NULL.
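To see the spreading step on its own, before any grouping or aggregation, the following sketch runs the CASE expression against every row. It assumes the dbo.Projects sample table created earlier in this post; the ORDER BY is only there to make the result easier to read.

```sql
-- Spreading phase only: no GROUP BY and no aggregate yet.
-- Rows for client A carry their daysCount in column A; all other rows show NULL.
SELECT mgrid, clientid, daysCount,
       CASE WHEN clientid = 'A' THEN daysCount END AS A
FROM dbo.Projects
ORDER BY mgrid, clientid;
```

Wrapping this expression in SUM and grouping by mgrid is what collapses these rows into a single total per manager.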
Finally, the aggregation phase is achieved by applying the relevant aggregate function (SUM in our case) to the result of each CASE expression. For example, here's the expression that produces the result column for client A:
SUM(CASE WHEN clientid = 'A' THEN daysCount END) AS A
Here's the complete solution query pivoting project data, returning the total days on projects for each manager (on rows) and client (on cols):
SELECT mgrid,
  SUM(CASE WHEN clientid = 'A' THEN daysCount END) AS A,
  SUM(CASE WHEN clientid = 'B' THEN daysCount END) AS B,
  SUM(CASE WHEN clientid = 'C' THEN daysCount END) AS C,
  SUM(CASE WHEN clientid = 'D' THEN daysCount END) AS D
FROM dbo.Projects
GROUP BY mgrid;
This query returns the pivoted data shown in Fig 1.2.
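Because the spreading values have to be known before the pivot query is written, it can help to list them first. A minimal sketch, again assuming the dbo.Projects sample table from this post:

```sql
-- List the distinct spreading values; each one needs its own CASE expression.
-- With the sample data this returns A, B, C, and D.
SELECT DISTINCT clientid
FROM dbo.Projects
ORDER BY clientid;
```

If a new client appears in the table later, the pivot query does not pick it up automatically; a matching CASE expression has to be added by hand.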
b) By using T-SQL PIVOT operator
SQL Server 2005 introduced a T-SQL–specific table operator called PIVOT. The PIVOT operator operates in the context of the FROM clause of a query. It operates on some source table or table expression, pivots the data, and returns a result table. The PIVOT operator involves the same logical processing phases as described earlier (grouping, spreading, and aggregating) with the same pivoting elements, but uses different, native syntax.
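The general form of a query with the PIVOT operator is:
SELECT ...
FROM <source_table_or_table_expression>
  PIVOT(<agg_func>(<aggregation_element>)
          FOR <spreading_element>
          IN (<list_of_target_columns>)) AS <result_table_alias>;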
In the parentheses of the PIVOT operator you specify the aggregate function (SUM in our example), aggregation element (daysCount), spreading element (clientid), and the list of target column names (A, B, C, D). Following the parentheses of the PIVOT operator you specify an alias for the result table.
It is important to note that with the PIVOT operator, you do not explicitly specify the grouping elements, removing the need for a GROUP BY in the query. The PIVOT operator figures out the grouping elements implicitly as all attributes from the source table (or table expression) that were not specified as either the spreading element or the aggregation element. You need to ensure that the source table for the PIVOT operator has no attributes besides the grouping, spreading, and aggregation elements, so that after specifying the spreading and aggregation elements, the only attributes left are those you intend as grouping elements. You achieve this by not applying the PIVOT operator to the original table directly (Projects in our case), but instead to a table expression that includes only the attributes representing the pivoting elements and no others. For example, here's the solution query to our original pivoting request, using the native PIVOT operator:
SELECT mgrid, A, B, C, D
FROM (SELECT mgrid, clientid, daysCount
FROM dbo.Projects) AS D
PIVOT(SUM(daysCount) FOR clientid IN(A, B, C, D)) AS P;
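To see why the table expression matters, the sketch below applies PIVOT directly to dbo.Projects instead. It assumes the sample table from this post and is meant as a counter-example, not as a recommended form.

```sql
-- Counter-example: PIVOT applied to the base table.
-- projectid and admissiondate are neither the spreading nor the aggregation
-- element, so they are implicitly treated as grouping elements along with mgrid.
SELECT mgrid, A, B, C, D
FROM dbo.Projects
  PIVOT(SUM(daysCount) FOR clientid IN(A, B, C, D)) AS P;
```

Because projectid is unique, this version returns one row per project (11 rows with the sample data) rather than one row per manager, even though projectid does not appear in the SELECT list.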
As another example for a pivoting request, suppose that instead of returning managers on rows and clients on columns, you want it the other way around: the grouping element is clientid, the spreading element is mgrid, and the aggregation element and aggregate function remain SUM(daysCount). After you learn the "template" for a pivoting solution (standard or native), it's just a matter of fitting those elements in the right places. The following solution query uses the native PIVOT operator:
SELECT clientid, [1], [2], [3]
FROM (SELECT mgrid, clientid, daysCount
      FROM dbo.Projects) AS D
  PIVOT(SUM(daysCount) FOR mgrid IN([1], [2], [3])) AS P;
The manager IDs 1, 2, and 3 are values in the mgrid column in the source table, but in terms of the result, these values become target column names. Therefore, in the PIVOT IN clause you must refer to them as identifiers. When identifiers are irregular (for example, when they start with a digit), you need to delimit them, hence the use of square brackets.
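This query returns the following output (the row order may vary):
clientid  1     2     3
--------  ----  ----  ----
A         NULL  52    20
B         20    27    NULL
C         34    NULL  22
D         NULL  NULL  30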
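For comparison, the same request can be written with the standard SQL template from section a). This is a sketch assuming the dbo.Projects sample table; only the grouping and spreading elements change places.

```sql
-- Standard SQL version: clients on rows, managers on columns.
-- The column aliases start with a digit, so they are delimited with brackets.
SELECT clientid,
  SUM(CASE WHEN mgrid = 1 THEN daysCount END) AS [1],
  SUM(CASE WHEN mgrid = 2 THEN daysCount END) AS [2],
  SUM(CASE WHEN mgrid = 3 THEN daysCount END) AS [3]
FROM dbo.Projects
GROUP BY clientid;
```

Note that mgrid is an INT column, so the CASE expressions compare against numeric literals rather than quoted strings.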
I hope you liked this explanation of the PIVOT operator. If anything is unclear, or if you want to share your thoughts on this topic, please write to me or ask in the comment section of this post.
In my next post I will explain the UNPIVOT operator with an example.