Saturday 21 January 2012

XML and XQUERY Language: Part 1(FOR XML RAW and FOR XML AUTO)

Hello Folks!

Today I will write about XML support in SQL Server. I will also conclude in the last about How important XML support in SQL and when should we, when shouldn't we use XML support.

SQL Server 2008 extensively supports XML. This support includes converting resultset of query into XML and converting XML into resultset (Shredding). SQL Server 2008 also has native XML data type. You can store XML data into XML type variable or column and also you can modify or query XML data in variable, using XML data type methods. These XML data type methods take XQUERY expression as one of their parameter, to query and modify XML data. XQUERY is a standard language used to query and modify XML data.

Create Database Environment for demonstration purpose – Run the below script to create a database and tables which I will use to give example on this topic. This script will create database SAMPLE and two tables Doctors and Patients in this database and also fill the data in these tables.

Listing 1.1
USE master;        
IF DB_ID('SAMPLE') IS NOT NULL
DROP DATABASE SAMPLE;

CREATE DATABASE SAMPLE;
               
USE SAMPLE;
CREATE TABLE Doctors
(
                doctorid int IDENTITY(1,1) NOT NULL,
                doctorname VARCHAR(50) NULL
                CONSTRAINT PK_Customers PRIMARY KEY(doctorid)
)              
GO
CREATE TABLE Patients
(
                patientid int IDENTITY(1,1) NOT NULL,
                doctorid int NOT NULL,
                admissionDate Datetime NULL
                CONSTRAINT PK_Patients PRIMARY KEY(patientid)
                CONSTRAINT FK_Patients_Customers FOREIGN KEY(doctorid) REFERENCES dbo.Doctors(doctorid)
)
GO
Insert Into Doctors(doctorname)
Values('Ashish Jain'),('Gaurav Pandit')
GO
Insert Into Patients(doctorid, admissionDate)
Values(1, '20120102'),(1, '20120108'),(2, '20120104'),(2, '20120110'),(1, '20120115')
GO

Select * from dbo.Doctors
Select * from dbo.Patients

The last select query command will return these rows from Doctors and Patients tables.











Fig 1.1

  1. Converting relational data (resultset) into XML
I created all the XML and XML schemas for this section using T-SQL. Here I will explain you how to convert relational data into XML using FOR XML clause of T-SQL query. I will use most useful options of this clause. We have four different category of FOR XML clause, each having their own strengths. Here are all FOR XML Clauses.

1)      FOR XML RAW
2)      FOR XML AUTO
3)      FOR XML EXPLICIT
4)      FOR XML PATH

As you can see that this topic is too big to cover in one post. Hence I will cover FOR XML RAW and FOR XML AUTO clause in this post and FOR XML EXPLICIT and FOR XML PATH in the next.     

I.       FOR XML RAW

FOR XML RAW is the simplest option, it creates <row> node for each row of the resultset. All column will be represented as attributes of row nodes. Run the below code for demonstration of FOR XML RAW.

USE SAMPLE;
SELECT Doctor.doctorid, Doctor.doctorname,
Patient.patientid, Patient.admissionDate
FROM dbo.Doctors AS Doctor
JOIN dbo.Patients AS Patient
ON Doctor.doctorid = Patient.doctorid
ORDER BY Doctor.doctorid, Patient.patientid
FOR XML RAW;                  

                 The result of this query is..
      <row doctorid="1" doctorname="Ashish Jain" patientid="1" admissionDate="2012-01-02T00:00:00" />
     <row doctorid="1" doctorname="Ashish Jain" patientid="2" admissionDate="2012-01-08T00:00:00" />
     <row doctorid="1" doctorname="Ashish Jain" patientid="5" admissionDate="2012-01-15T00:00:00" />
     <row doctorid="2" doctorname="Gaurav Pandit" patientid="3" admissionDate="2012-01-04T00:00:00" />
     <row doctorid="2" doctorname="Gaurav Pandit" patientid="4" admissionDate="2012-01-10T00:00:00" />

You can enhance the ROW effect by renaming the row element, adding ROOT element, and making the XML returned from attribute-centric to element-centric. The below script will rename row node name with Doctor and adds root element having name DoctorsPatients, and make the output XML element centric.

USE SAMPLE;
SELECT Doctor.doctorid, Doctor.doctorname,
Patient.patientid, Patient.admissionDate
FROM dbo.Doctors AS Doctor
JOIN dbo.Patients AS Patient
ON Doctor.doctorid = Patient.doctorid
ORDER BY Doctor.doctorid, Patient.patientid
FOR XML RAW('Doctor'), ELEMENTS, ROOT('DoctorsPatients');

The result of this query is..

<DoctorsPatients>
  <Doctor>
    <doctorid>1</doctorid>
    <doctorname>Ashish Jain</doctorname>
    <patientid>1</patientid>
    <admissionDate>2012-01-02T00:00:00</admissionDate>
  </Doctor>
  <Doctor>
    <doctorid>1</doctorid>
    <doctorname>Ashish Jain</doctorname>
    <patientid>2</patientid>
    <admissionDate>2012-01-08T00:00:00</admissionDate>
  </Doctor>
  <Doctor>
    <doctorid>1</doctorid>
    <doctorname>Ashish Jain</doctorname>
    <patientid>5</patientid>
    <admissionDate>2012-01-15T00:00:00</admissionDate>
  </Doctor>
  <Doctor>
    <doctorid>2</doctorid>
    <doctorname>Gaurav Pandit</doctorname>
    <patientid>3</patientid>
    <admissionDate>2012-01-04T00:00:00</admissionDate>
  </Doctor>
  <Doctor>
    <doctorid>2</doctorid>
    <doctorname>Gaurav Pandit</doctorname>
    <patientid>4</patientid>
    <admissionDate>2012-01-10T00:00:00</admissionDate>
  </Doctor>
</DoctorsPatients>

As you can see that this is proper XML that we got from previous query. However it does not include additional level of nesting. In this this XML we can see that doctorid value 1 is repeated three times. It would be better if we have one Doctor node for doctorid 1 and three patient nodes nested into that doctor node.


II.    FOR XML AUTO

FOR XML AUTO clause gives you nice XML document with nested elements having minimum redundancy, and it is not difficult to use. To declare namespaces in XML we need to use WITH NAMESPACES clause, preceding the select part of the query, which is used to define namespaces and aliases in the returned XML.

When we use AUTO option with FOR XML clause, we will get nodes for each Participating table in the returned XML. When we declare namespace in query, we have to alias all colum names and table names that participates in the query, with the name in this format (alias:columname) and (alias:tablename) respectively, we need to do this because we want nodes and elements (or attributes) in the returned XML, qualified with the namespace. Run the following query to see the effect of FOR XML Clause.

WITH XMLNAMESPACES('SAMPLE-DoctorsPatients' AS dp)
SELECT [dp:Doctor].doctorid as 'dp:doctorid',
[dp:Doctor].doctorname as 'dp:doctorname',
[dp:Patient].patientid as 'dp:patientid',
[dp:Patient].admissionDate as 'dp:admissionDate'
FROM dbo.Doctors as [dp:Doctor]
JOIN dbo.Patients as [dp:Patient]
ON [dp:Doctor].doctorid = [dp:Patient].doctorid
ORDER BY [dp:Doctor].doctorid, [dp:Patient].patientid
FOR XML AUTO, ELEMENTS, ROOT('DoctorsPatients');

In this query I declared namespace ‘SAMPLE-DoctorsPatients’ with alias name ‘dp’. I also use ‘dp:’ prefix with table and colum alias name in my query so that the returned XML will have nodes and attributes (or elements) prefixed with namespace alias (dp) and colon. A colon is used in XML to separate the namespace from the element name.

The output of this query is..

<DoctorsPatients xmlns:dp="SAMPLE-DoctorsPatients">
  <dp:Doctor>
    <dp:doctorid>1</dp:doctorid>
    <dp:doctorname>Ashish Jain</dp:doctorname>
    <dp:Patient>
      <dp:patientid>1</dp:patientid>
      <dp:admissionDate>2012-01-02T00:00:00</dp:admissionDate>
    </dp:Patient>
    <dp:Patient>
      <dp:patientid>2</dp:patientid>
      <dp:admissionDate>2012-01-08T00:00:00</dp:admissionDate>
    </dp:Patient>
    <dp:Patient>
      <dp:patientid>5</dp:patientid>
      <dp:admissionDate>2012-01-15T00:00:00</dp:admissionDate>
    </dp:Patient>
  </dp:Doctor>
  <dp:Doctor>
    <dp:doctorid>2</dp:doctorid>
    <dp:doctorname>Gaurav Pandit</dp:doctorname>
    <dp:Patient>
      <dp:patientid>3</dp:patientid>
      <dp:admissionDate>2012-01-04T00:00:00</dp:admissionDate>
    </dp:Patient>
    <dp:Patient>
      <dp:patientid>4</dp:patientid>
      <dp:admissionDate>2012-01-10T00:00:00</dp:admissionDate>
    </dp:Patient>
  </dp:Doctor>
</DoctorsPatients>

Note that a proper ORDER BY clause is very important. With T-SQL SELECT, you are actually formatting the returned XML. Without the ORDER BY clause, the order of rows returned is unpredictable. The order of columns in the SELECT clause also influences the XML returned. SQL Server uses column order to determine the nesting of elements.

You have to be so careful with the columns you select; in a relation, the order of columns and rows is not important. Nevertheless, you have to realize that the result of your query is not a relation; it is text in XML format, and parts of your query are used for formatting the text.


In RAW and AUTO mode, you can also return the XSD schema of the document you are
creating. This schema is included inside XML returned, before the actual XML data; therefore, it is called an inline schema. You return XSD with the XMLSCHEMA directive. This directive accepts a parameter that defines a target namespace. If you need schema only, without data, simply include a WHERE condition in your query with a predicate that no row can satisfy. Here is the example that I am using to return XSD Schema of the returned XML. Note that I purposely used, a where condition that will filter no data so that I get only XSD schema in the returned XML.

SELECT Doctor.doctorid as 'doctorid',
Doctor.doctorname as 'doctorname',
Patient.patientid as 'patientid',
Patient.admissionDate as 'admissionDate'
FROM dbo.Doctors as Doctor
JOIN dbo.Patients as Patient
ON Doctor.doctorid = Patient.doctorid
WHERE 1 = 2
FOR XML AUTO, ELEMENTS, ROOT('DoctorsPatients'),
XMLSCHEMA('SAMPLE-DoctorsPatients');


The result of this query is XSD Schema..

<DoctorsPatients>
  <xsd:schema targetNamespace="SAMPLE-DoctorsPatients" xmlns:schema="SAMPLE-DoctorsPatients" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
    <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
    <xsd:element name="Doctor">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="doctorid" type="sqltypes:int" />
          <xsd:element name="doctorname" minOccurs="0">
            <xsd:simpleType>
              <xsd:restriction base="sqltypes:varchar" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
                <xsd:maxLength value="50" />
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element ref="schema:Patient" minOccurs="0" maxOccurs="unbounded" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="Patient">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="patientid" type="sqltypes:int" />
          <xsd:element name="admissionDate" type="sqltypes:datetime" minOccurs="0" />
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>
</DoctorsPatients>

I hope you will like this post. I will cover other topics related to XML and XQUERY in my subsequent posts. Please do comment if you have any questions or what to share your own thoughts on this topic.

No comments:

Post a Comment