Sunday, 22 January 2012

XML and XQUERY Language: Part 2 (FOR XML EXPLICIT and FOR XML PATH)


In my last post in covered two options (FOR XML RAW and FOR XML AUTO) to convert relational data into XML representation. Click on this link to read that post (http://sqlenthusiast.blogspot.com/2012/01/xml-and-xquery-language-part-1for-xml.html). In today's post I will explain other two vary important options for the same task.

Here are The Options that I will cover today for converting relational data into xml.
  1. FOR XML EXPLICIT
  2. FOR XML PATH
With EXPLICIT and PATH options you can define Format of XML returned manually. With these two options you have total control on the XML returned. The EXPLICIT mode is included for backward compatibility only; it uses proprietary T-SQL syntax for formatting XML. The PATH mode uses standard XML Xpath expressions to define the elements and attributes of the XML you are creating.

In case you have not run the script given in my previous post, to create SAMPLE database and create Doctors and Patients tables in it, then I am give that script here again, so that you can run it to get the database required for demonstration purpose. Here is the script:

Listing 1.1
USE master;
IF DB_ID('SAMPLE') IS NOT NULL
DROP DATABASE SAMPLE;

CREATE DATABASE SAMPLE;
USE SAMPLE;
CREATE TABLE Doctors
(
doctorid int IDENTITY(1,1) NOT NULL,
doctorname VARCHAR(50) NULL
CONSTRAINT PK_Customers PRIMARY KEY(doctorid)
)
GO
CREATE TABLE Patients
(
patientid int IDENTITY(1,1) NOT NULL,
doctorid int NOT NULL,
admissionDate Datetime NULL
CONSTRAINT PK_Patients PRIMARY KEY(patientid)
CONSTRAINT FK_Patients_Customers FOREIGN KEY(doctorid) REFERENCES dbo.Doctors(doctorid)
)
GO
Insert Into Doctors(doctorname)
Values('Ashish Jain'),('Gaurav Pandit')
GO
Insert Into Patients(doctorid, admissionDate)
Values(1, '20120102'),(1, '20120108'),(2, '20120104'),(2, '20120110'),(1, '20120115')
GO

Select * from dbo.Doctors
Select * from dbo.Patients


  1. FOR XML EXPLICIT
    SQL Server needs specific format of the rowset returned, to process the XML returned in EXPLICIT mode. The first two colum must me Tag an Parent integers, which sets the hierarchy. Tag defines the level of nesting while Parent defines the parent level. You create multiple levels with multiple SELECTs using the UNION ALL operator. The first select defines first level of nesting; second defines second level of nesting, and so on. Every SELECT must include all columns, of course; however, the columns that are not applicable for a specific level must have NULLs instead of a value. You define element and attribute names with column aliases of the SELECT clause in the ElementName!TagNumber!AttributeName! Directive format. If the Directive part is omitted, you are creating attribute-centric XML. You can use the element directive to produce element-centric XML. The following query is an example of producing XML with the EXPLICIT option:
SELECT 1 AS Tag,
NULL AS Parent,
Doctor.doctorid AS [Doctor!1!doctorid],
Doctor.doctorname AS [Doctor!1!doctorname],
NULL AS [Patient!2!Patientid],
NULL AS [Patient!2!admissionDate]
FROM dbo.Doctors AS Doctor
UNION ALL
SELECT 2 AS Tag,
1 AS Parent,
Doctor.doctorid AS [Doctor!1!doctorid],
NULL AS [Doctor!1!doctorname],
Patient.Patientid AS [Patient!2!Patientid],
Patient.admissionDate AS [Patient!2!admissionDate]
FROM dbo.Doctors AS Doctor
JOIN dbo.Patients AS Patient
ON Doctor.doctorid = Patient.doctorid
Order BY [Doctor!1!doctorid],
[Patient!2!Patientid]
FOR XML EXPLICIT

The result is:

<Doctor doctorid="1" doctorname="Ashish Jain">
<Patient Patientid="1" admissionDate="2012-01-02T00:00:00" />
<Patient Patientid="2" admissionDate="2012-01-08T00:00:00" />
<Patient Patientid="5" admissionDate="2012-01-15T00:00:00" />
</Doctor>
<Doctor doctorid="2" doctorname="Gaurav Pandit">
<Patient Patientid="3" admissionDate="2012-01-04T00:00:00" />
<Patient Patientid="4" admissionDate="2012-01-10T00:00:00" />
</Doctor>

  1. FOR XML PATH
In PATH mode, column names and aliases serve as XPath expressions. XPath expressions define the path to the element in the XML generated. Path is expressed in a hierarchical way; levels are delimited with the slash (/) character. By default, every column becomes an element; if you want to generate attribute-centric XML, prefix the alias name with the at (@) character. Here is an example of a simple XPATH query:

SELECT Doctor.doctorid AS [@doctorid],
Doctor.doctorname AS [Doctor/doctorname]
FROM dbo.Doctors AS Doctor
ORDER BY Doctor.doctorid
FOR XML PATH;

The query produces the following xml:

<row doctorid="1">
<Doctor>
<doctorname>Ashish Jain</doctorname>
</Doctor>
</row>
<row doctorid="2">
<Doctor>
<doctorname>Gaurav Pandit</doctorname>
</Doctor>
</row>

As you can see, the PATH mode produced an element for each row of the resulting rowset; in other words, for each doctor, the name of this element is row. If you want to have a different name, use the syntax FOR XML PATH(‘name’).

In EXPLICIT option, If you want to create XML with nested elements for child tables, you have to use subqueries in the SELECT part of the query in the PATH mode. Subqueries have to return a scalar value in SELECT clause. However, you know that a parent row can have multiple child rows; a doctor can have multiple patients. You return a scalar value by returning XML from the subquery. Then the result is returned as a single scalar XML value. You format nested XML from the subquery with the FOR XML clause, as you format XML in outer query. Additionally, you have to use the TYPE directive of the FOR XML clause to produce a value of XML data type, and not XML as text, which cannot be consumed by the outer query. The following query shows this technique for creating the same XML that I created in the FOR XML EXPLICIT part of this section. Note the subquery with the FOR XML PATH clause and the TYPE directive.

SELECT Doctor.doctorid AS [@doctorid],
Doctor.doctorname AS [@doctorname],
(SELECT Patient.patientid AS [@patientid],
Patient.admissionDate AS [@admissionDate]
FROM dbo.Patients AS Patient
WHERE Doctor.doctorid = Patient.doctorid
ORDER BY Patient.patientid
FOR XML PATH('Patient'), TYPE)
FROM dbo.Doctors AS Doctor
ORDER BY Doctor.doctorid
FOR XML PATH('Doctor');

The result of this query is, the same as the result of the query that used EXPLICIT mode to format the XML returned:

<Doctor doctorid="1" doctorname="Ashish Jain">
<Patient patientid="1" admissionDate="2012-01-02T00:00:00" />
<Patient patientid="2" admissionDate="2012-01-08T00:00:00" />
<Patient patientid="5" admissionDate="2012-01-15T00:00:00" />
</Doctor>
<Doctor doctorid="2" doctorname="Gaurav Pandit">
<Patient patientid="3" admissionDate="2012-01-04T00:00:00" />
<Patient patientid="4" admissionDate="2012-01-10T00:00:00" />
</Doctor>

Please share your knowledge and thoughts on this topic and write, if you have any questions or suggessions, in the comment section. In my next post I will cover XML Shredding(converting from XML to relational representation).


Saturday, 21 January 2012

XML and XQUERY Language: Part 1(FOR XML RAW and FOR XML AUTO)

Hello Folks!

Today I will write about XML support in SQL Server. I will also conclude in the last about How important XML support in SQL and when should we, when shouldn't we use XML support.

SQL Server 2008 extensively supports XML. This support includes converting resultset of query into XML and converting XML into resultset (Shredding). SQL Server 2008 also has native XML data type. You can store XML data into XML type variable or column and also you can modify or query XML data in variable, using XML data type methods. These XML data type methods take XQUERY expression as one of their parameter, to query and modify XML data. XQUERY is a standard language used to query and modify XML data.

Create Database Environment for demonstration purpose – Run the below script to create a database and tables which I will use to give example on this topic. This script will create database SAMPLE and two tables Doctors and Patients in this database and also fill the data in these tables.

Listing 1.1
USE master;        
IF DB_ID('SAMPLE') IS NOT NULL
DROP DATABASE SAMPLE;

CREATE DATABASE SAMPLE;
               
USE SAMPLE;
CREATE TABLE Doctors
(
                doctorid int IDENTITY(1,1) NOT NULL,
                doctorname VARCHAR(50) NULL
                CONSTRAINT PK_Customers PRIMARY KEY(doctorid)
)              
GO
CREATE TABLE Patients
(
                patientid int IDENTITY(1,1) NOT NULL,
                doctorid int NOT NULL,
                admissionDate Datetime NULL
                CONSTRAINT PK_Patients PRIMARY KEY(patientid)
                CONSTRAINT FK_Patients_Customers FOREIGN KEY(doctorid) REFERENCES dbo.Doctors(doctorid)
)
GO
Insert Into Doctors(doctorname)
Values('Ashish Jain'),('Gaurav Pandit')
GO
Insert Into Patients(doctorid, admissionDate)
Values(1, '20120102'),(1, '20120108'),(2, '20120104'),(2, '20120110'),(1, '20120115')
GO

Select * from dbo.Doctors
Select * from dbo.Patients

The last select query command will return these rows from Doctors and Patients tables.











Fig 1.1

  1. Converting relational data (resultset) into XML
I created all the XML and XML schemas for this section using T-SQL. Here I will explain you how to convert relational data into XML using FOR XML clause of T-SQL query. I will use most useful options of this clause. We have four different category of FOR XML clause, each having their own strengths. Here are all FOR XML Clauses.

1)      FOR XML RAW
2)      FOR XML AUTO
3)      FOR XML EXPLICIT
4)      FOR XML PATH

As you can see that this topic is too big to cover in one post. Hence I will cover FOR XML RAW and FOR XML AUTO clause in this post and FOR XML EXPLICIT and FOR XML PATH in the next.     

I.       FOR XML RAW

FOR XML RAW is the simplest option, it creates <row> node for each row of the resultset. All column will be represented as attributes of row nodes. Run the below code for demonstration of FOR XML RAW.

USE SAMPLE;
SELECT Doctor.doctorid, Doctor.doctorname,
Patient.patientid, Patient.admissionDate
FROM dbo.Doctors AS Doctor
JOIN dbo.Patients AS Patient
ON Doctor.doctorid = Patient.doctorid
ORDER BY Doctor.doctorid, Patient.patientid
FOR XML RAW;                  

                 The result of this query is..
      <row doctorid="1" doctorname="Ashish Jain" patientid="1" admissionDate="2012-01-02T00:00:00" />
     <row doctorid="1" doctorname="Ashish Jain" patientid="2" admissionDate="2012-01-08T00:00:00" />
     <row doctorid="1" doctorname="Ashish Jain" patientid="5" admissionDate="2012-01-15T00:00:00" />
     <row doctorid="2" doctorname="Gaurav Pandit" patientid="3" admissionDate="2012-01-04T00:00:00" />
     <row doctorid="2" doctorname="Gaurav Pandit" patientid="4" admissionDate="2012-01-10T00:00:00" />

You can enhance the ROW effect by renaming the row element, adding ROOT element, and making the XML returned from attribute-centric to element-centric. The below script will rename row node name with Doctor and adds root element having name DoctorsPatients, and make the output XML element centric.

USE SAMPLE;
SELECT Doctor.doctorid, Doctor.doctorname,
Patient.patientid, Patient.admissionDate
FROM dbo.Doctors AS Doctor
JOIN dbo.Patients AS Patient
ON Doctor.doctorid = Patient.doctorid
ORDER BY Doctor.doctorid, Patient.patientid
FOR XML RAW('Doctor'), ELEMENTS, ROOT('DoctorsPatients');

The result of this query is..

<DoctorsPatients>
  <Doctor>
    <doctorid>1</doctorid>
    <doctorname>Ashish Jain</doctorname>
    <patientid>1</patientid>
    <admissionDate>2012-01-02T00:00:00</admissionDate>
  </Doctor>
  <Doctor>
    <doctorid>1</doctorid>
    <doctorname>Ashish Jain</doctorname>
    <patientid>2</patientid>
    <admissionDate>2012-01-08T00:00:00</admissionDate>
  </Doctor>
  <Doctor>
    <doctorid>1</doctorid>
    <doctorname>Ashish Jain</doctorname>
    <patientid>5</patientid>
    <admissionDate>2012-01-15T00:00:00</admissionDate>
  </Doctor>
  <Doctor>
    <doctorid>2</doctorid>
    <doctorname>Gaurav Pandit</doctorname>
    <patientid>3</patientid>
    <admissionDate>2012-01-04T00:00:00</admissionDate>
  </Doctor>
  <Doctor>
    <doctorid>2</doctorid>
    <doctorname>Gaurav Pandit</doctorname>
    <patientid>4</patientid>
    <admissionDate>2012-01-10T00:00:00</admissionDate>
  </Doctor>
</DoctorsPatients>

As you can see that this is proper XML that we got from previous query. However it does not include additional level of nesting. In this this XML we can see that doctorid value 1 is repeated three times. It would be better if we have one Doctor node for doctorid 1 and three patient nodes nested into that doctor node.


II.    FOR XML AUTO

FOR XML AUTO clause gives you nice XML document with nested elements having minimum redundancy, and it is not difficult to use. To declare namespaces in XML we need to use WITH NAMESPACES clause, preceding the select part of the query, which is used to define namespaces and aliases in the returned XML.

When we use AUTO option with FOR XML clause, we will get nodes for each Participating table in the returned XML. When we declare namespace in query, we have to alias all colum names and table names that participates in the query, with the name in this format (alias:columname) and (alias:tablename) respectively, we need to do this because we want nodes and elements (or attributes) in the returned XML, qualified with the namespace. Run the following query to see the effect of FOR XML Clause.

WITH XMLNAMESPACES('SAMPLE-DoctorsPatients' AS dp)
SELECT [dp:Doctor].doctorid as 'dp:doctorid',
[dp:Doctor].doctorname as 'dp:doctorname',
[dp:Patient].patientid as 'dp:patientid',
[dp:Patient].admissionDate as 'dp:admissionDate'
FROM dbo.Doctors as [dp:Doctor]
JOIN dbo.Patients as [dp:Patient]
ON [dp:Doctor].doctorid = [dp:Patient].doctorid
ORDER BY [dp:Doctor].doctorid, [dp:Patient].patientid
FOR XML AUTO, ELEMENTS, ROOT('DoctorsPatients');

In this query I declared namespace ‘SAMPLE-DoctorsPatients’ with alias name ‘dp’. I also use ‘dp:’ prefix with table and colum alias name in my query so that the returned XML will have nodes and attributes (or elements) prefixed with namespace alias (dp) and colon. A colon is used in XML to separate the namespace from the element name.

The output of this query is..

<DoctorsPatients xmlns:dp="SAMPLE-DoctorsPatients">
  <dp:Doctor>
    <dp:doctorid>1</dp:doctorid>
    <dp:doctorname>Ashish Jain</dp:doctorname>
    <dp:Patient>
      <dp:patientid>1</dp:patientid>
      <dp:admissionDate>2012-01-02T00:00:00</dp:admissionDate>
    </dp:Patient>
    <dp:Patient>
      <dp:patientid>2</dp:patientid>
      <dp:admissionDate>2012-01-08T00:00:00</dp:admissionDate>
    </dp:Patient>
    <dp:Patient>
      <dp:patientid>5</dp:patientid>
      <dp:admissionDate>2012-01-15T00:00:00</dp:admissionDate>
    </dp:Patient>
  </dp:Doctor>
  <dp:Doctor>
    <dp:doctorid>2</dp:doctorid>
    <dp:doctorname>Gaurav Pandit</dp:doctorname>
    <dp:Patient>
      <dp:patientid>3</dp:patientid>
      <dp:admissionDate>2012-01-04T00:00:00</dp:admissionDate>
    </dp:Patient>
    <dp:Patient>
      <dp:patientid>4</dp:patientid>
      <dp:admissionDate>2012-01-10T00:00:00</dp:admissionDate>
    </dp:Patient>
  </dp:Doctor>
</DoctorsPatients>

Note that a proper ORDER BY clause is very important. With T-SQL SELECT, you are actually formatting the returned XML. Without the ORDER BY clause, the order of rows returned is unpredictable. The order of columns in the SELECT clause also influences the XML returned. SQL Server uses column order to determine the nesting of elements.

You have to be so careful with the columns you select; in a relation, the order of columns and rows is not important. Nevertheless, you have to realize that the result of your query is not a relation; it is text in XML format, and parts of your query are used for formatting the text.


In RAW and AUTO mode, you can also return the XSD schema of the document you are
creating. This schema is included inside XML returned, before the actual XML data; therefore, it is called an inline schema. You return XSD with the XMLSCHEMA directive. This directive accepts a parameter that defines a target namespace. If you need schema only, without data, simply include a WHERE condition in your query with a predicate that no row can satisfy. Here is the example that I am using to return XSD Schema of the returned XML. Note that I purposely used, a where condition that will filter no data so that I get only XSD schema in the returned XML.

SELECT Doctor.doctorid as 'doctorid',
Doctor.doctorname as 'doctorname',
Patient.patientid as 'patientid',
Patient.admissionDate as 'admissionDate'
FROM dbo.Doctors as Doctor
JOIN dbo.Patients as Patient
ON Doctor.doctorid = Patient.doctorid
WHERE 1 = 2
FOR XML AUTO, ELEMENTS, ROOT('DoctorsPatients'),
XMLSCHEMA('SAMPLE-DoctorsPatients');


The result of this query is XSD Schema..

<DoctorsPatients>
  <xsd:schema targetNamespace="SAMPLE-DoctorsPatients" xmlns:schema="SAMPLE-DoctorsPatients" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
    <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
    <xsd:element name="Doctor">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="doctorid" type="sqltypes:int" />
          <xsd:element name="doctorname" minOccurs="0">
            <xsd:simpleType>
              <xsd:restriction base="sqltypes:varchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
                <xsd:maxLength value="50" />
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element ref="schema:Patient" minOccurs="0" maxOccurs="unbounded" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="Patient">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="patientid" type="sqltypes:int" />
          <xsd:element name="admissionDate" type="sqltypes:datetime" minOccurs="0" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>
</DoctorsPatients>

I hope you will like this post. I will cover other topics related to XML and XQUERY in my subsequent posts. Please do comment if you have any questions or what to share your own thoughts on this topic.