XML Tutorial
Volume 4: XML Schema Basics

Seiichi Kinugasa

In this and the next two volumes, we will discuss "XML Schema," a common schema definition language. In this volume, we will mainly address the differences between the DTD definitions discussed in earlier volumes and XML Schema definitions, as well as element and attribute declarations in XML Schema. This knowledge is required to answer questions in Section 4 "XML Schema" of the XML Master Basic V2 Exam, so be sure to establish a solid grasp of this area.

Index

What is XML Schema?

Differences between XML Schema and DTD Definitions

XML Schema Structure

Review Questions

What is XML Schema?

In previous volumes, we discussed well-formed XML documents, valid XML documents using DTDs, and XML parsers. DTD is characterized by a simple syntax and simple functions for defining content. However, these functions and definitions have limitations when XML is used for a variety of complex purposes.

Traditionally, DTD has been the standard for XML schema definition. However, XML usage has expanded dramatically into core application systems and is now tailored to a wide range of purposes that DTD cannot fully support. Given this development, the W3C recommended "XML Schema" as a schema definition language to replace DTD, and this recommendation has spurred its adoption as a standard schema definition language.

Differences between XML Schema and DTD Definitions

What differences are there between XML Schema and DTD definitions? We will explain these differences using an XML document related to employee information as an example.

When defining an XML Schema, first summarize the content you wish to put into the XML document, and then organize that content into a tree structure.

Content to put into the XML document:

  1. The root element is "Employee_Info"
  2. As the content of "Employee_Info," "Employee" occurs 0 or more times
  3. As the content of "Employee," the "Name," "Department," "Telephone," and "Email" elements each occur once, in that order
  4. The content of "Name," "Department," "Telephone," and "Email" is a text string
  5. "Employee" has an attribute called "Employee_Number"
  6. The content of "Employee_Number" must be of int type

Figure: Tree Structure of the Employee Information XML Document (root element: Employee_Info)

This gives us an understanding of the hierarchical structure of the XML document. Now we can write the schema definition in an actual schema definition language.

LIST1 is a schema definition of the content above written as a DTD, while LIST2 is a schema definition of the same content written in XML Schema (employee.xs).

LIST1: Employee Information DTD

<!ELEMENT Employee_Info (Employee)*>
<!ELEMENT Employee (Name, Department, Telephone, Email)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Department (#PCDATA)>
<!ELEMENT Telephone (#PCDATA)>
<!ELEMENT Email (#PCDATA)>
<!ATTLIST Employee Employee_Number CDATA #REQUIRED>

LIST2: Employee Information XML Schema (employee.xs)

01 <?xml version="1.0"?>
02 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" >
03
04 <xs:element name="Employee_Info" type="EmployeeInfoType" />
05 <xs:complexType name="EmployeeInfoType">
06  <xs:sequence>
07   <xs:element ref="Employee" minOccurs="0" maxOccurs="unbounded" />
08  </xs:sequence>
09 </xs:complexType>
10
11 <xs:element name="Employee" type="EmployeeType" />
12 <xs:complexType name="EmployeeType">
13  <xs:sequence >
14   <xs:element ref="Name" />
15   <xs:element ref="Department" />
16   <xs:element ref="Telephone" />
17   <xs:element ref="Email" />
18  </xs:sequence>
19  <xs:attribute name="Employee_Number" type="xs:int" use="required"/>
20 </xs:complexType>
21
22 <xs:element name="Name" type="xs:string" />
23 <xs:element name="Department" type="xs:string" />
24 <xs:element name="Telephone" type="xs:string" />
25 <xs:element name="Email" type="xs:string" />
26
27 </xs:schema>

(Line numbers have been added for reference, and are not necessary in the actual code.)

As you can see, the syntax of the two is completely different. A DTD is written in its own unique syntax, whereas an XML Schema is written in XML format conforming to XML 1.0 syntax. LIST3 is an example of a valid XML document for the LIST2 XML Schema (employee.xml).

LIST3: Valid XML Document for XML Schema (employee.xml)

<?xml version="1.0"?>
<Employee_Info
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="employee.xs">
  <Employee  Employee_Number="105">
    <Name>Masashi Okamura</Name>
    <Department>Design Department</Department>
    <Telephone>03-1452-4567</Telephone>
    <Email>okamura@xmltr.co.jp</Email>
  </Employee>
  <Employee  Employee_Number="109">
    <Name>Aiko Tanaka</Name>
    <Department>Sales Department</Department>
    <Telephone>03-6459-98764</Telephone>
    <Email>tanaka@xmltr.co.jp</Email>
  </Employee>
</Employee_Info>
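Because Line 19 of LIST2 declares Employee_Number with type="xs:int", the schema can reject values that the DTD in LIST1, where the attribute is CDATA, would accept as arbitrary text. As a minimal sketch, the following Employee element uses the hypothetical value "E-105"; placed in LIST3, it should be reported as invalid by a validator using LIST2, while LIST1 would not object to this attribute value.

```xml
<!-- Hypothetical value "E-105" is not a valid xs:int, so a validator
     using LIST2 should reject it. The DTD in LIST1 declares
     Employee_Number as CDATA and accepts any string here. -->
<Employee Employee_Number="E-105">
  <Name>Masashi Okamura</Name>
  <Department>Design Department</Department>
  <Telephone>03-1452-4567</Telephone>
  <Email>okamura@xmltr.co.jp</Email>
</Employee>
```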

With a DTD, the XML document is associated with the DTD through a DOCTYPE declaration. The XML Schema specification, by contrast, does not prescribe how an XML document is associated with a schema, so the method depends on the validation tool actually used. However, the XML Schema specification does define a way to write a hint for this association: the following attributes are inserted into the root element of the XML document.

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="employee.xs"
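For comparison, here is a minimal sketch of the DTD-style association for the same data, assuming the DTD in LIST1 is saved under the hypothetical file name employee.dtd. The DTD is linked through a DOCTYPE declaration placed before the root element, and no xsi attributes are needed.

```xml
<?xml version="1.0"?>
<!-- "employee.dtd" is a hypothetical file name for the DTD in LIST1 -->
<!DOCTYPE Employee_Info SYSTEM "employee.dtd">
<Employee_Info>
  <Employee Employee_Number="105">
    <Name>Masashi Okamura</Name>
    <Department>Design Department</Department>
    <Telephone>03-1452-4567</Telephone>
    <Email>okamura@xmltr.co.jp</Email>
  </Employee>
</Employee_Info>
```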

XML Schema Structure

From here, using the LIST2 employee.xs file as an example, we will explain how to write an XML Schema.

XML Schema Root Element

The schema element is used as the root element, and the XML Schema "Namespace" is declared on it. A Namespace is a mechanism for avoiding conflicts between element and attribute names defined in XML, and is normally identified using a URI in URL format. In LIST2, the xmlns:xs="http://www.w3.org/2001/XMLSchema" section at Line 2 is the Namespace declaration. The "xs" designation is called the "Namespace Prefix," and can be used on the element where it is declared and on all of its descendant elements. Any prefix may be chosen, but "xs" is the one used most often.
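To illustrate that the prefix is only a label for the namespace URI, the following minimal sketch binds the same namespace to the prefix "xsd" instead of "xs" and declares a single Name element. Note that type names such as xsd:string are qualified names, so they must use whichever prefix the schema actually declares.

```xml
<?xml version="1.0"?>
<!-- Same XML Schema namespace as LIST2, bound to "xsd" instead of "xs".
     The type attribute value must use the same prefix. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Name" type="xsd:string" />
</xsd:schema>
```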

Element Declaration

When declaring an element, the ELEMENT keyword is used under DTD, whereas the element element is used under XML Schema. The declaration method differs depending on whether the element being declared has child elements. When it has no child elements, the element name is designated with the name attribute, and the data type is designated with the type attribute.

The syntax is as follows:

<xs:element name="element name" type="data type" />

Under DTD, little more than arbitrary text (#PCDATA) could be specified as element content; under XML Schema, however, a variety of data types can be defined. Data types can be designated using predefined embedded simple types (see Note), such as the string, int, and date types shown in the table, as well as the ID and NMTOKEN types, which are compatible with DTD. These types can also be combined, extended, or restricted to create new, unique data types.
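As a sketch of restriction, the following fragment defines a new data type from the embedded simple type xs:int. The type name EmployeeNumberType and the range 1 to 9999 are hypothetical assumptions, not part of LIST2.

```xml
<!-- Hypothetical type (not part of LIST2): restricts xs:int so that
     employee numbers must be between 1 and 9999.
     Written as a child of xs:schema. -->
<xs:simpleType name="EmployeeNumberType">
  <xs:restriction base="xs:int">
    <xs:minInclusive value="1" />
    <xs:maxInclusive value="9999" />
  </xs:restriction>
</xs:simpleType>

<!-- Inside the EmployeeType complexType, the attribute at Line 19
     could then designate the new type instead of xs:int -->
<xs:attribute name="Employee_Number" type="EmployeeNumberType" use="required" />
```

The values 105 and 109 used in LIST3 fall inside this assumed range, so LIST3 would remain valid under the restricted type.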

Note: The XML Schema specification consists of "Part 1: Structures" and "Part 2: Datatypes." The embedded simple types are the data types already stipulated in "Part 2: Datatypes." In addition to those shown in Table 2, there are also the "xs:hexBinary" data type, which represents hexadecimal-encoded binary data, and the "xs:base64Binary" data type, which represents Base64-encoded binary data.

Table 2: Main XML Schema Data Types

●General Data Types

Name Explanation
xs:integer Integers (arbitrary precision)
xs:positiveInteger Positive integers (arbitrary precision)
xs:negativeInteger Negative integers (arbitrary precision)
xs:nonPositiveInteger Negative integers and 0 (arbitrary precision)
xs:nonNegativeInteger Positive integers and 0 (arbitrary precision)
xs:byte 8-bit signed integer
xs:unsignedByte 8-bit unsigned integer
xs:short 16-bit signed integer
xs:unsignedShort 16-bit unsigned integer
xs:int 32-bit signed integer
xs:unsignedInt 32-bit unsigned integer
xs:long 64-bit signed integer
xs:unsignedLong 64-bit unsigned integer
xs:decimal Decimal number (arbitrary precision)
xs:float Single-precision floating-point number (32-bit)
xs:double Double-precision floating-point number (64-bit)
xs:boolean Boolean value
xs:string Arbitrary text string

●Types Representing Dates and Times

Name Explanation
xs:time Time of day
xs:dateTime Date and time of day
xs:date Date
xs:gYear Year
xs:gYearMonth Year and month
xs:gMonth Month
xs:gMonthDay Month and day
xs:gDay Day

●DTD-Compatible Types

Name Explanation
xs:ID XML 1.0 Specification ID type
xs:IDREF XML 1.0 Specification IDREF type
xs:IDREFS XML 1.0 Specification IDREFS type
xs:ENTITY XML 1.0 Specification ENTITY type
xs:ENTITIES XML 1.0 Specification ENTITIES type
xs:NOTATION XML 1.0 Specification NOTATION type
xs:NMTOKEN XML 1.0 Specification NMTOKEN type
xs:NMTOKENS XML 1.0 Specification NMTOKENS type
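To show what values of the date and time types look like, here is a minimal sketch of instance content in the lexical format each type expects. The element names, the wrapping Dates element, and the values are hypothetical, and each element is assumed to be declared with the type named in its comment.

```xml
<!-- Hypothetical elements; each is assumed to be declared with the
     embedded simple type named in the comment on its line -->
<Dates>
  <Hire_Date>2007-06-01</Hire_Date>            <!-- xs:date -->
  <Last_Login>2007-06-01T09:30:00</Last_Login> <!-- xs:dateTime -->
  <Start_Time>09:30:00</Start_Time>            <!-- xs:time -->
  <Hire_Year>2007</Hire_Year>                  <!-- xs:gYear -->
  <Hire_Month>2007-06</Hire_Month>             <!-- xs:gYearMonth -->
  <Review_Month>--06</Review_Month>            <!-- xs:gMonth -->
  <Birthday>--06-01</Birthday>                 <!-- xs:gMonthDay -->
  <Payday>---25</Payday>                       <!-- xs:gDay -->
</Dates>
```

The leading hyphens in the gMonth, gMonthDay, and gDay values stand in for the omitted year (and month) parts.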

If the element has child elements, on the other hand, a new data type is designated for the element (Line 11):

<xs:element name="Employee" type="EmployeeType" />

The "EmployeeType" type designated by the type attribute is a Complex Type; Lines 11 through 20 declare the Employee element and this Complex Type. The type name EmployeeType is designated with the name attribute of the complexType element, and the Model Group (which sets the order in which the child elements occur) is written as its child element. In the Model Group, use the sequence element when the elements must occur in the order written (equivalent to "," in DTD), and use the choice element when any one of the listed elements may occur (equivalent to "|" in DTD).

Meaning of the Model Group                                    XML Schema         DTD
Elements occur in the order written, as often as designated   sequence element   ","
Any one of the elements occurs, as often as designated        choice element     "|"
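The two Model Groups can be nested. As a minimal sketch, the following hypothetical variant of LIST2's EmployeeType (not part of the original schema) requires Name and Department in order, followed by either Telephone or Email.

```xml
<!-- Hypothetical variant of EmployeeType.
     DTD equivalent: <!ELEMENT Employee (Name, Department, (Telephone | Email))> -->
<xs:complexType name="EmployeeType">
  <xs:sequence>
    <xs:element ref="Name" />
    <xs:element ref="Department" />
    <!-- Exactly one of Telephone or Email must occur here -->
    <xs:choice>
      <xs:element ref="Telephone" />
      <xs:element ref="Email" />
    </xs:choice>
  </xs:sequence>
  <xs:attribute name="Employee_Number" type="xs:int" use="required"/>
</xs:complexType>
```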

Within a Model Group, the most common way to declare an element is to designate the ref attribute of the element element, which references an element declared in a separate location (LIST4).

LIST4: Element Declaration Reference for a Model Group Element


The element reference syntax is as follows, with the name of the referenced element designated in the ref attribute:

<xs:element ref="element name" />
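LIST2 itself follows this approach: for example, the Name element is declared at Line 22 and referenced from EmployeeType at Line 14. For contrast, the following sketch shows a hypothetical alternative in which the child elements are declared locally inside the Model Group with the name attribute instead of ref. Locally declared elements cannot be referenced from other types and cannot serve as the root element of a document, which is one reason the reference style is common.

```xml
<!-- Hypothetical alternative to Lines 12-20 and 22-25 of LIST2:
     child elements declared locally instead of referenced with ref -->
<xs:complexType name="EmployeeType">
  <xs:sequence>
    <xs:element name="Name" type="xs:string" />
    <xs:element name="Department" type="xs:string" />
    <xs:element name="Telephone" type="xs:string" />
    <xs:element name="Email" type="xs:string" />
  </xs:sequence>
  <xs:attribute name="Employee_Number" type="xs:int" use="required"/>
</xs:complexType>
```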

Attribute Declarations

When declaring an attribute, the ATTLIST keyword is used under DTD, while the attribute element is used under XML Schema. The syntax is shown below. Within a Complex Type declaration, attribute declarations must be written after the content definition (after the Model Group), as at Line 19 of LIST2.


<xs:attribute name="Employee_Number" type="xs:int" use="required"/>

The attribute name is designated with the name attribute, and the data type with the type attribute. The use, default, and fixed attributes can optionally be designated. The use attribute relates to occurrence and can be set to "required" (equivalent to #REQUIRED in DTD), "optional" (equivalent to #IMPLIED in DTD), or "prohibited"; when it is omitted, the setting is "optional." The default attribute designates a default value that is used when the attribute is omitted, while the fixed attribute designates a fixed value (equivalent to #FIXED in DTD).
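As a minimal sketch of the default and fixed attributes, the following hypothetical declarations (the attribute names and values are assumptions, not taken from LIST2) would be written inside a complexType after the Model Group.

```xml
<!-- Optional attribute; when it is omitted, the default value
     "full-time" is used -->
<xs:attribute name="Employment_Type" type="xs:string" default="full-time" />

<!-- When the attribute is present, its value must be exactly "xmltr"
     (comparable to #FIXED in DTD) -->
<xs:attribute name="Company" type="xs:string" fixed="xmltr" />
```

Note that a default value may only be combined with use="optional" (or with use omitted); designating both default and use="required" is an error in XML Schema.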

Table: Differences between XML Schema and DTD in Attribute Occurrence Designations

Designation related to occurrences    XML Schema   DTD
Attribute may be omitted              optional     #IMPLIED
Attribute is required                 required     #REQUIRED
Attribute is prohibited               prohibited   (none)

Designating Repeat Count

Under DTD, repeat counts could only be designated with the occurrence indicators "?" (0 or 1 time), "*" (0 or more times), and "+" (1 or more times). Under XML Schema, however, the minOccurs and maxOccurs attributes can be used to designate detailed repeat counts, such as "from one to three times" or "three or more times." For an unlimited upper limit, set the maxOccurs attribute to "unbounded."
Be sure to remember that if the minOccurs and maxOccurs attributes are omitted, both default to a value of 1. Repeat counts can be designated on element references, local element declarations, and Model Groups such as sequence and choice; they cannot be designated on attribute declarations, whose occurrence is controlled by the use attribute instead.
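As a minimal sketch, the following hypothetical variant of the Model Group in LIST2's EmployeeType uses minOccurs and maxOccurs; the repeat counts chosen are assumptions for illustration only.

```xml
<xs:sequence>
  <!-- minOccurs and maxOccurs omitted: exactly once (both default to 1) -->
  <xs:element ref="Name" />
  <xs:element ref="Department" />
  <!-- From one to three times; DTD has no single indicator for this and
       would need (Telephone, Telephone?, Telephone?) -->
  <xs:element ref="Telephone" minOccurs="1" maxOccurs="3" />
  <!-- Zero or more times: equivalent to Email* in DTD -->
  <xs:element ref="Email" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
```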

Review Questions

Question 1

Which two of the following are Embedded Simple Types representing dates and times in XML Schema?

  A. xs:date
  B. xs:gMonthYear
  C. xs:gMonthDay
  D. xs:timeDate
  E. xs:gDayMonth

Comments

The correct answers are A (xs:date) and C (xs:gMonthDay); these two are embedded simple types representing dates and times in XML Schema. The embedded simple types used most often correspond to data types found in common programming languages, while those that are difficult to map to the types of current programming languages and databases tend not to be used very often.

Question 2

Which of the following XML Schema descriptions match the conditions below? Select all that apply. Assume the XML Schema Namespace prefix is "xs."

Conditions: The "Address" attribute is defined as a string type that may be omitted.

  A. <xs:attribute name="Address" type="xs:string" use="optional"/>
  B. <xs:attribute name="Address" type="xs:string" optional="true"/>
  C. <xs:attribute name="Address" type="xs:string" required="optional"/>
  D. <xs:attribute name="Address" type="xs:string" use="required"/>

Comments

Attribute occurrences are designated using the use attribute of the attribute element. Designating the value as "optional" allows the attribute description to be omitted. Since this meets the required conditions of the question, the correct answer is A.

The attribute element has no optional or required attribute, so the syntax of these descriptions is itself invalid. Accordingly, answers B and C are incorrect.

Designating "required" as the value of the use attribute of the attribute element means that the attribute must be described. The syntax itself is correct, but it does not meet the condition of allowing the Address attribute to be omitted. Accordingly, answer D is incorrect.

Question 3

Select which of the following is a valid XML document with respect to the following XML Schema Document.

●XML Schema Document
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Employee" type="EmployeeType" />
  <xs:complexType name="EmployeeType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element ref="Name" />
      <xs:element ref="Department" />
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Name" type="xs:string" />
  <xs:element name="Department" type="xs:string" />
</xs:schema>

  A. <Employee></Employee>
  B. <Employee>
       <Name>Masashi Tanaka</Name>
       <Name>Makiko Okamura</Name>
     </Employee>
  C. <Employee>
       <Name>Masashi Tanaka</Name>
       <Name>Makiko Okamura</Name>
       <Department>Sales Department</Department>
       <Department>Accounting Department</Department>
     </Employee>
  D. Neither A, B, nor C follows the definition in the XML Schema Document

Comments

The content defined in this XML Schema Document places the "Name" and "Department" elements as child elements of the "Employee" element, occurring in that order. Since the minOccurs attribute of the sequence element is omitted (and therefore defaults to 1), the "Name" and "Department" combination must occur at least once; since the maxOccurs attribute is designated as "unbounded," the combination may occur an unlimited number of times. Rewritten as a DTD, this XML Schema Document would be <!ELEMENT Employee (Name, Department)+>.

In answer A, the "Employee" element is empty and therefore does not satisfy the number of occurrences defined in the XML Schema Document. Accordingly, answer A is incorrect.

In answer B, only the "Name" element occurs (repeatedly), and the "Department" element does not occur at all. Accordingly, answer B is also incorrect.

In answer C, both "Name" elements occur first and are then followed by both "Department" elements, so "Name" and "Department" do not occur as repeated pairs. Accordingly, answer C is incorrect.

Answer D is correct, because none of the XML documents in A, B, or C follows the definitions of the XML Schema Document.
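For reference, reordering the elements of answer C so that each "Name" is immediately followed by a "Department" produces a document that should be valid against the Question 3 schema, since the Name/Department sequence then simply repeats twice.

```xml
<!-- Name/Department pairs: the sequence occurs twice, which
     maxOccurs="unbounded" permits -->
<Employee>
  <Name>Masashi Tanaka</Name>
  <Department>Sales Department</Department>
  <Name>Makiko Okamura</Name>
  <Department>Accounting Department</Department>
</Employee>
```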

In this volume, we have discussed the differences between XML Schema and DTD definitions, as well as the methods for declaring elements and attributes in XML Schema. XML Schema supports Namespaces, while DTD does not; due to space constraints, we have not offered a full explanation of this topic. Note also that the default and fixed attributes can be designated in element declarations as well as in attribute declarations, a point that every XML professional should grasp firmly.
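As a minimal sketch of value constraints in element declarations, the following hypothetical declarations (names and values are assumptions) use default and fixed. Unlike attribute defaults, an element default is applied when the element appears with empty content, not when the element is absent.

```xml
<!-- If Department appears with empty content, the value
     "General Affairs Department" is used -->
<xs:element name="Department" type="xs:string" default="General Affairs Department" />

<!-- If Country appears, its content must be "Japan";
     if it appears empty, "Japan" is supplied -->
<xs:element name="Country" type="xs:string" fixed="Japan" />
```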


Seiichi Kinugasa

Hewlett-Packard Japan HP Training Services.
I currently oversee XML training courses as an Infoteria-certified trainer, providing technical and training support for IT professional development programs, including large-scale Web development support courses. Not having been asked to write a magazine article in quite some time, I am truly feeling the pressure, but I will continue to do my best on the next two articles in this series.


The content presented here is an HTML version of an article that originally appeared in the June 2007 issue of DB Magazine published by Shoeisya.
