Volume 4: XML Schema Basics
In this and the next two volumes, we will discuss "XML Schema," a common schema definition language. This volume mainly addresses the differences between XML Schema definitions and the DTD definitions discussed in earlier volumes, as well as element and attribute declarations using XML Schema. This knowledge is required to answer questions in Section 4 "XML Schema" of the XML Master Basic V2 Exam, so let’s be sure to establish a solid grasp of this area.
What is XML Schema?
In previous volumes, we discussed well-formed XML documents, valid XML documents using DTDs, and XML parsers. DTD is characterized by a simple syntax and a limited set of functions for defining content. These functions and definitions show their limitations, however, when XML is used for a variety of complex purposes.
Traditionally, DTD has been the standard for XML schema definition; however, XML usage has expanded dramatically in core application systems and has been tailored to a wide range of purposes that DTD cannot fully support. Given this development, the W3C recommended "XML Schema" as a schema definition language to replace DTD. The recommendation of XML Schema has spurred its adoption as a standard schema definition language.
Differences between XML Schema and DTD Definitions
What differences are there between XML Schema and DTD definitions? We will explain these differences using an XML document related to employee information as an example.
When defining XML Schema, the content you wish to put into an XML document must first be summarized. The next step is to create a tree structure.
Content to put into the XML document:
- The root element is "Employee_Info"
- As the content for "Employee_Info," "Employee" occurs 0 or more times
- As content of "Employee," the "Name," "Department," "Telephone," and "Email" elements each occur once, in that order
- "Name," "Department," "Telephone," and "Email" content are text strings
- "Employee" has an attribute called "Employee_Number"
- "Employee_Number" content must be int type
This provides us with an understanding of the hierarchical structure of the XML document. Now, we can provide a schema definition using actual schema definition language.
LIST1 is an example using DTD and providing a schema definition for the content above, while LIST2 is an example using XML Schema to provide a schema definition (employee.xs).
LIST1: Employee Information DTD
<!ELEMENT Employee_Info (Employee)*>
<!ELEMENT Employee (Name, Department, Telephone, Email)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Department (#PCDATA)>
<!ELEMENT Telephone (#PCDATA)>
<!ELEMENT Email (#PCDATA)>
<!ATTLIST Employee Employee_Number CDATA #REQUIRED>
LIST2: Employee Information XML Schema (employee.xs)
01 <?xml version="1.0"?>
02 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
03
04   <xs:element name="Employee_Info" type="EmployeeInfoType" />
05   <xs:complexType name="EmployeeInfoType">
06     <xs:sequence>
07       <xs:element ref="Employee" minOccurs="0" maxOccurs="unbounded" />
08     </xs:sequence>
09   </xs:complexType>
10
11   <xs:element name="Employee" type="EmployeeType" />
12   <xs:complexType name="EmployeeType">
13     <xs:sequence>
14       <xs:element ref="Name" />
15       <xs:element ref="Department" />
16       <xs:element ref="Telephone" />
17       <xs:element ref="Email" />
18     </xs:sequence>
19     <xs:attribute name="Employee_Number" type="xs:int" use="required"/>
20   </xs:complexType>
21
22   <xs:element name="Name" type="xs:string" />
23   <xs:element name="Department" type="xs:string" />
24   <xs:element name="Telephone" type="xs:string" />
25   <xs:element name="Email" type="xs:string" />
26
27 </xs:schema>
(Line numbers have been added for reference, and are not necessary in the actual code.)
As you can see, the syntax is completely different between the two. A DTD is written in its own unique syntax, whereas an XML Schema is itself an XML document conforming to XML 1.0 syntax. LIST3 is an example of a valid XML document for the LIST2 XML Schema (employee.xml).
LIST3: Valid XML Document for XML Schema (employee.xml)
<?xml version="1.0"?>
<Employee_Info xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="employee.xs">
  <Employee Employee_Number="105">
    <Name>Masashi Okamura</Name>
    <Department>Design Department</Department>
    <Telephone>03-1452-4567</Telephone>
    <Email>firstname.lastname@example.org</Email>
  </Employee>
  <Employee Employee_Number="109">
    <Name>Aiko Tanaka</Name>
    <Department>Sales Department</Department>
    <Telephone>03-6459-98764</Telephone>
    <Email>email@example.com</Email>
  </Employee>
</Employee_Info>
For DTD, a DOCTYPE declaration is used to associate the DTD with the XML document. In the case of XML Schema, however, the specification does not mandate any particular way of associating a schema with an XML document, so the association follows whatever method the validation tool actually in use implements. The XML Schema specification does define a way of writing a hint for this association. The hint is written on the root element of the XML document, as in LIST3: the XML Schema instance Namespace is declared (xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"), and the schema file is named with the xsi:noNamespaceSchemaLocation attribute (here, "employee.xs").
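As a minimal sketch, the hint can be seen in isolation by repeating only the root element of LIST3. Note that xsi:noNamespaceSchemaLocation applies to a schema without a target Namespace, such as employee.xs; a validation tool is free to ignore the hint.

```xml
<?xml version="1.0"?>
<!-- xmlns:xsi declares the XML Schema instance Namespace;
     xsi:noNamespaceSchemaLocation names the schema file as a hint -->
<Employee_Info xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="employee.xs">
  <!-- Employee elements are written here -->
</Employee_Info>
```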
XML Schema Structure
From here, using the LIST2 employee.xs file as an example, we will explain the method for writing XML schema.
XML Schema Root Element
The schema element is used as the root element, and the XML Schema "Namespace" is declared on it. Namespaces are a specification for avoiding collisions between element and attribute names defined in different XML vocabularies, and a Namespace is normally identified by a URI, most often written in URL format. In LIST2, the xmlns:xs="http://www.w3.org/2001/XMLSchema" section at Line 2 is a Namespace declaration. The "xs" designation is called the "Namespace Prefix," and can be used on the element where it is declared and on any of its descendant elements. The prefix name itself may be chosen freely, but "xs" is the one used most often.
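To illustrate that the prefix is only a local name bound to the Namespace, the sketch below binds the same XML Schema Namespace to the prefix "xsd" instead of "xs". The choice of "xsd" here is just for illustration; what matters is that the same prefix is used consistently on the schema element and everything inside it.

```xml
<?xml version="1.0"?>
<!-- Same Namespace as in LIST2, bound to the prefix "xsd" instead of "xs" -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- The prefix is used on descendant elements and on built-in type names -->
  <xsd:element name="Name" type="xsd:string" />
</xsd:schema>
```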
When declaring an element, the ELEMENT keyword is used under DTD; under XML Schema, however, the element element is used. The declaration method differs depending on whether or not the element being declared has child elements. When it has no child elements, the element name is designated with the name attribute, and the data type is designated using the type attribute.
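For instance, Line 22 of LIST2 declares an element without child elements in exactly this way. The sketch below repeats that declaration inside a minimal schema; the second element, "Hire_Date," is a hypothetical addition that is not part of LIST2 and only shows a type other than xs:string.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- name gives the element name; type gives the data type of its content -->
  <xs:element name="Name" type="xs:string" />
  <!-- Hypothetical element (not in LIST2) whose content must be a date -->
  <xs:element name="Hire_Date" type="xs:date" />
</xs:schema>
```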
Under DTD, little more was possible than declaring element content as #PCDATA, an arbitrary text string; under XML Schema, however, a variety of data types can be designated. Data types can be designated using the pre-defined embedded (built-in) simple types (Note), including the string type, int type, and date and time types shown in the table below, as well as the ID type and NMTOKEN type that are compatible with DTD. These types can be restricted, extended, or combined to create new, unique data types.
Note: The XML Schema specification consists of "Part 1: Structures" and "Part 2: Datatypes." The embedded simple types are data types already defined in "Part 2: Datatypes." In addition to the types shown in the table below, there are also the "xs:hexBinary" data type, which represents hex-encoded binary data, and the "xs:base64Binary" data type, which represents Base64-encoded binary data.
Table: Main XML Schema Data Types
●General Data Types
|xs:integer||Integers (infinite precision)|
|xs:positiveInteger||Positive integers (infinite precision)|
|xs:negativeInteger||Negative integers (infinite precision)|
|xs:nonPositiveInteger||Negative integers including 0 (infinite precision)|
|xs:nonNegativeInteger||Positive integers including 0 (infinite precision)|
|xs:byte||Signed integer represented by 8 bits|
|xs:unsignedByte||Unsigned integer represented by 8 bits|
|xs:short||Signed integer represented by 16 bits|
|xs:unsignedShort||Unsigned integer represented by 16 bits|
|xs:int||Signed integer represented by 32 bits|
|xs:unsignedInt||Unsigned integer represented by 32 bits|
|xs:long||Signed integer represented by 64 bits|
|xs:unsignedLong||Unsigned integer represented by 64 bits|
|xs:decimal||Decimal number (infinite precision)|
|xs:float||Single-precision floating-point number (32-bit)|
|xs:double||Double-precision floating-point number (64-bit)|
|xs:string||Arbitrary text string|
●Types Representing Dates and Times
|xs:time||Time of day|
|xs:dateTime||Date and time of day|
|xs:gYearMonth||Year and month|
|xs:gMonthDay||Month and day|
●DTD-Compatible Types
|xs:ID||XML 1.0 Specification ID type|
|xs:IDREF||XML 1.0 Specification IDREF type|
|xs:IDREFS||XML 1.0 Specification IDREFS type|
|xs:ENTITY||XML 1.0 Specification ENTITY type|
|xs:ENTITIES||XML 1.0 Specification ENTITIES type|
|xs:NOTATION||XML 1.0 Specification NOTATION type|
|xs:NMTOKEN||XML 1.0 Specification NMTOKEN type|
|xs:NMTOKENS||XML 1.0 Specification NMTOKENS type|
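As a sketch of how one of these types can be restricted to create a new data type, the schema below derives a type from xs:int that only accepts values from 1 to 9999. The type name "EmployeeNumberType" and the range are assumptions made for illustration; they do not appear in LIST2.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: xs:int restricted to the range 1 to 9999 -->
  <xs:simpleType name="EmployeeNumberType">
    <xs:restriction base="xs:int">
      <xs:minInclusive value="1" />
      <xs:maxInclusive value="9999" />
    </xs:restriction>
  </xs:simpleType>

  <!-- The new type is then designated just like an embedded simple type -->
  <xs:attribute name="Employee_Number" type="EmployeeNumberType" />
</xs:schema>
```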
Meanwhile, if the element has a child element, a new data type must first be designated for the element (Line 11):
<xs:element name="Employee" type="EmployeeType" />
The "EmployeeType" designated by the type attribute is a Complex Type. Line 11 declares the element, and Lines 12 through 20 are the Complex Type declaration. In the Complex Type declaration, the type name "EmployeeType" is designated with the name attribute of the complexType element, and the Model Group (which sets how the child elements occur and in what order) is written as its child element. In the Model Group, use the sequence element to make the child elements occur in the order written (equivalent to the "," in DTD), and use the choice element to make any one of the listed elements occur (equivalent to the "|" in DTD).
|Meaning of the Model Group||XML Schema||DTD|
|The child elements occur in the order written, each the designated number of times||sequence element||comma (,)|
|Any one of the listed elements occurs, the designated number of times||choice element||vertical bar|
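LIST2 only uses the sequence element, so here is a minimal sketch of the choice element. The "Contact" element and "ContactType" type are hypothetical and are not part of the employee example; the DTD equivalent is given in a comment.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element: a Contact holds either a Telephone or an Email.
       DTD equivalent: <!ELEMENT Contact (Telephone | Email)> -->
  <xs:element name="Contact" type="ContactType" />
  <xs:complexType name="ContactType">
    <xs:choice>
      <xs:element ref="Telephone" />
      <xs:element ref="Email" />
    </xs:choice>
  </xs:complexType>

  <xs:element name="Telephone" type="xs:string" />
  <xs:element name="Email" type="xs:string" />
</xs:schema>
```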
For Model Group element declarations, the most common method is to designate the ref attribute of the element element, referencing the element declared in a separate location (LIST4).
LIST4: Element Declaration Reference for a Model Group Element
The element reference syntax is as follows:
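A minimal sketch of an element reference, cut down from Lines 11 through 23 of LIST2, is shown here. The ref attribute names an element that is declared elsewhere as a direct child of the schema element.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Employee" type="EmployeeType" />
  <xs:complexType name="EmployeeType">
    <xs:sequence>
      <!-- ref points to an element declared in a separate location -->
      <xs:element ref="Name" />
      <xs:element ref="Department" />
    </xs:sequence>
  </xs:complexType>

  <!-- The referenced declarations, written directly under xs:schema -->
  <xs:element name="Name" type="xs:string" />
  <xs:element name="Department" type="xs:string" />
</xs:schema>
```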
When declaring an attribute, the ATTLIST keyword is used under DTD, while the attribute element is used under XML Schema. The syntax is as shown below. Within a Complex Type declaration, an attribute must be written after the Complex Type's content definition (that is, after the Model Group), as on Line 19.
<xs:attribute name="Employee_Number" type="xs:int" use="required"/>
The attribute name is designated using the name attribute and the data type is designated using the type attribute. The use, default, and fixed attributes can be designated as options. The use attribute is a designation related to occurrences, and can be used to designate "required" (equivalent to #REQUIRED in DTD) or "optional" (equivalent to #IMPLIED in DTD). When nothing is written, the setting is "optional." The default attribute is used to designate the value applied when the attribute is omitted, while the fixed attribute is used to designate a fixed value (equivalent to #FIXED in DTD). The default and fixed attributes cannot both be designated on the same declaration.
Table: XML Schema and DTD Differences related to Attribute Declarations
|Designations related to Occurrences||XML Schema||DTD|
|Attribute description may be omitted||optional||#IMPLIED|
|Attribute description is required||required||#REQUIRED|
|Attribute description is prohibited||prohibited||None|
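The options for attribute declarations can be sketched as follows. Only "Employee_Number" comes from LIST2; the "Status" and "Country" attributes and their values are hypothetical and are included only to show the default and fixed attributes.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Employee" type="EmployeeType" />
  <xs:complexType name="EmployeeType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string" />
    </xs:sequence>
    <!-- Must be written on every Employee element -->
    <xs:attribute name="Employee_Number" type="xs:int" use="required" />
    <!-- Hypothetical: may be omitted, in which case the value is "active" -->
    <xs:attribute name="Status" type="xs:string" default="active" />
    <!-- Hypothetical: may be omitted, but if written must be "JP" -->
    <xs:attribute name="Country" type="xs:string" fixed="JP" />
  </xs:complexType>
</xs:schema>
```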
Designating Repeat Count
Under DTD, a repeat count could only be designated as (?) for zero or one time, (*) for zero or more times, or (+) for one or more times. Under XML Schema, however, the minOccurs and maxOccurs attributes can be used to designate detailed repeat counts, such as "from one to three" or "between three and unlimited." For an unlimited upper limit, set the maxOccurs attribute to "unbounded."
Be sure to remember that if the minOccurs and maxOccurs attributes are omitted, both default to a value of 1. Repeat count designations can be written on element references and element declarations inside a Model Group, and on the Model Group itself (the sequence and choice elements). They cannot be written on attribute declarations, whose occurrence is controlled by the use attribute.
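The repeat counts mentioned above can be sketched as follows. The element names are borrowed from LIST2, but the particular counts are assumptions chosen to match the "from one to three" and "between three and unlimited" examples.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Employee" type="EmployeeType" />
  <xs:complexType name="EmployeeType">
    <xs:sequence>
      <!-- minOccurs and maxOccurs omitted: exactly once -->
      <xs:element ref="Name" />
      <!-- From one to three times -->
      <xs:element ref="Telephone" minOccurs="1" maxOccurs="3" />
      <!-- Between three and unlimited times -->
      <xs:element ref="Email" minOccurs="3" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Name" type="xs:string" />
  <xs:element name="Telephone" type="xs:string" />
  <xs:element name="Email" type="xs:string" />
</xs:schema>
```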
Which two of the following are Embedded Simple Types representing date and time in XML Schema?
The correct answers are A (xs:date) and C (xs:gMonthDay). These two are embedded simple types representing date and time under XML Schema. The embedded simple types used most often are those that correspond to data types in common programming languages. Embedded simple types that are difficult to map to types used in current programming languages and databases tend not to be used very often.
Select which of the following is a correct XML Schema description matching the conditions below. Select all that apply. Assume the XML Schema Namespace prefix is "xs."
Conditions: The "Address" attribute is defined as a string type that may be omitted.
- <xs:attribute name="Address" type="xs:string" use="optional"/>
- <xs:attribute name="Address" type="xs:string" optional="true"/>
- <xs:attribute name="Address" type="xs:string" required="optional"/>
- <xs:attribute name="Address" type="xs:string" use="required"/>
Attribute occurrences are designated using the use attribute of the attribute element. Designating the value as "optional" allows the attribute description to be omitted. Since this meets the required conditions of the question, the correct answer is A.
The attribute element has no attribute named "optional" or "required," so the syntax of these descriptions is itself in error. Accordingly, answers B and C are incorrect.
Designating "required" as the value of the use attribute for the attribute element means that the attribute description is required. The syntax itself is correct, but does not meet the required condition of providing a definition allowing the Address attribute to be omitted. Accordingly, answer D is incorrect.
Select which of the following is a valid XML document with respect to the following XML Schema Document.
●XML Schema Document
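<xs:element name="Employee" type="EmployeeType" />
<xs:complexType name="EmployeeType">
  <xs:sequence maxOccurs="unbounded">
    <xs:element ref="Name" />
    <xs:element ref="Department" />
  </xs:sequence>
</xs:complexType>
<xs:element name="Name" type="xs:string" />
<xs:element name="Department" type="xs:string" />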
- Neither A, B, nor C follows the definition in XML Schema Document
The content defined in this XML Schema Document places the "Name" element and the "Department" element as child elements of the "Employee" element, with the "Name" element and "Department" element occurring in that order. Since the minOccurs attribute of the sequence element is omitted, it defaults to 1, so the combination of "Name" and "Department" elements must occur at least once; the maxOccurs attribute is designated as "unbounded," allowing the combination to repeat an unlimited number of times. Rewriting this XML Schema Document as a DTD would result in <!ELEMENT Employee (Name, Department)+>.
Under answer A, the "Employee" element is an empty element, and as such does not fulfill the number of occurrences defined in the XML Schema Document. Accordingly, answer A is incorrect.
With answer B, only the "Name" element occurs repeatedly, while there is no description provided for the "Department" element. Accordingly, answer B is also incorrect.
Under answer C, first the "Name" element occurs, after which the "Department" element occurs; the "Name" and "Department" elements do not occur in combination. Accordingly, answer C is incorrect.
Answer D is correct, because the XML documents of neither A, B, nor C follow the definitions of the XML Schema Document.
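For contrast, a document that would be valid might look like the sketch below. It assumes the schema is as described in the explanation above (a sequence of "Name" and "Department" with maxOccurs set to "unbounded"); the element values are borrowed from LIST3 and are only for illustration.

```xml
<?xml version="1.0"?>
<!-- The Name and Department pair occurs twice, always in that order -->
<Employee>
  <Name>Masashi Okamura</Name>
  <Department>Design Department</Department>
  <Name>Aiko Tanaka</Name>
  <Department>Sales Department</Department>
</Employee>
```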
In this volume, we have discussed the differences between XML Schema and DTD definitions, as well as the methods for declaring XML Schema elements and attributes. One further difference is that DTD does not support Namespaces, while XML Schema does; due to space constraints, we did not offer a full explanation of this point. Also, the default attribute and fixed attribute may be designated in element declarations as well as in attribute declarations, a fact that we hope all XML professionals will firmly grasp.
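As a brief sketch of that last point, the default and fixed attributes can be written on element declarations as shown here. The default value "Unassigned," the "Company" element, and its fixed value are hypothetical and do not appear in LIST2.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- If Department is written as an empty element, its value is taken to be "Unassigned" -->
  <xs:element name="Department" type="xs:string" default="Unassigned" />
  <!-- Hypothetical element: if Company has content, it must be exactly "Example Corp." -->
  <xs:element name="Company" type="xs:string" fixed="Example Corp." />
</xs:schema>
```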
Hewlett-Packard Japan HP Training Services.
I currently oversee XML training courses as an Infoteria-certified trainer, providing technical and training support for IT professional development programs, including large-scale Web development support courses. Not having been asked to write magazine articles for quite some time, I am truly feeling the pressure, but I will continue to give my best for the next two articles I am writing for this series.
The content presented here is an HTML version of an article that originally appeared in the June 2007 issue of DB Magazine published by Shoeisya.