XML Tutorial - XML Master Professional Application Developer Edition
Volume 5 : Checking XML Validity using XML Schema

Tatsuya Kimura

Section 4 Major Points for Study

To prepare for Section 4, you need a firm grasp of basic XML Schema grammar, along with a wide range of related concepts, including XML Schema design that incorporates XML namespaces and the use of include / import to reuse other XML schemas. In this volume, I will work through examples of the most typical questions that appear on the exam, with commentary.

Checking XML Validity using Java

Before going into the major points related to XML Schema questions on the exam, let's make sure we understand how to use Java to perform a validity check on an XML document. As with the XSLT transformation explained in the previous volume, we can easily perform this check using the standard API JAXP (Java API for XML Processing), which is included in J2SE 5.0. List1 shows an example of a JAXP program that performs a validity check.

List1: Program using Java (JAXP) to perform a validity check (XSDValidate.java)

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class XSDValidate {
  public static void main(String[] args) {
    if (args.length == 0) {
      System.out.println(
        "usage: XSDValidate XML-file [XSD-file]");
      return;
    }

    String xml = args[0];
    String xsd = null;

    if (args.length > 1) {
      xsd = args[1];
    }

    // Set to false by the error handler when a validity error is reported
    final boolean[] valid = { true };

    try {
      DocumentBuilderFactory dbf
        = DocumentBuilderFactory.newInstance();
      dbf.setNamespaceAware(true);  // XML Schema validation needs namespaces
      dbf.setValidating(true);      // (2)

      // Validate against XML Schema rather than a DTD
      dbf.setAttribute(
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
        "http://www.w3.org/2001/XMLSchema");

      // Use the XSD given on the command line; otherwise the document's
      // xsi:schemaLocation / xsi:noNamespaceSchemaLocation hints apply
      if (xsd != null) {
        dbf.setAttribute(
          "http://java.sun.com/xml/jaxp/properties/schemaSource",
          xsd);
      }

      DocumentBuilder db = dbf.newDocumentBuilder();
      db.setErrorHandler(new ErrorHandler() {
        // Validity errors: report them and continue parsing
        public void error(SAXParseException exception) {
          valid[0] = false;
          System.out.println(
            "Error: URI=" + exception.getSystemId()
            + ", Line=" + exception.getLineNumber()
            + ", Column=" + exception.getColumnNumber());
          System.out.println(exception.getMessage());
          System.out.println();
        }
        // Well-formedness errors: stop parsing (printed by the catch block)
        public void fatalError(SAXParseException exception)
            throws SAXParseException {
          throw exception;
        }
        // Warnings are ignored
        public void warning(SAXParseException exception) {}
      });
      db.parse(xml);                // (1)
      if (valid[0]) {
        System.out.println("checked xml validity");
      }
    } catch (Exception e) {
      System.out.println(e);
    }
  }
}

This program checks whether the XML document given as the first argument is valid against the XML schema given as the second argument, and outputs the result. If the XML document itself references its XML schema (through the xsi:noNamespaceSchemaLocation or xsi:schemaLocation attribute), you can omit the second argument. If the validity check finds no error, the message "checked xml validity" is output. If there is an error, an error message is output for each error: the URI of the file in which the error was detected, the line and column where it occurred, and a message describing the error.

As List1 shows, a validity check with JAXP is performed by creating an instance of the class javax.xml.parsers.DocumentBuilder and calling its parse method (List1-(1)). A DocumentBuilder is obtained from an instance of the factory class javax.xml.parsers.DocumentBuilderFactory, so that factory must be created first. Before creating the DocumentBuilder, call setNamespaceAware, setValidating, and setAttribute on the factory to configure the validity check (List1-(2) and the surrounding lines): the schemaLanguage attribute selects XML Schema (rather than DTD) validation, and the optional schemaSource attribute names the XML schema to validate against.
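J2SE 5.0 also includes the javax.xml.validation package (JAXP 1.3), which separates compiling a schema from validating a document. The sketch below is an alternative to List1 built on SchemaFactory and Validator instead of DocumentBuilderFactory attributes; the class name SchemaValidate and its validate helper are hypothetical names chosen for this example. As in List1, omitting the XSD argument falls back to the document's own xsi:schemaLocation / xsi:noNamespaceSchemaLocation hints, here through SchemaFactory.newSchema() called with no arguments. It assumes Java 5 or later.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class SchemaValidate {
  /**
   * Validates document against schema and returns the validity errors found
   * (an empty list means the document is valid). When schema is null, the
   * document's own xsi:schemaLocation / xsi:noNamespaceSchemaLocation hints
   * are used instead.
   */
  static List<String> validate(Source document, Source schema)
      throws SAXException, IOException {
    SchemaFactory factory =
        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    // newSchema() with no arguments defers to the hints in the instance
    Schema compiled =
        (schema == null) ? factory.newSchema() : factory.newSchema(schema);
    Validator validator = compiled.newValidator();

    final List<String> errors = new ArrayList<String>();
    validator.setErrorHandler(new ErrorHandler() {
      // Validity errors: record them and keep validating
      public void error(SAXParseException e) {
        errors.add("Line=" + e.getLineNumber()
            + ", Column=" + e.getColumnNumber() + ": " + e.getMessage());
      }
      // Not well-formed: stop immediately
      public void fatalError(SAXParseException e) throws SAXParseException {
        throw e;
      }
      // Warnings are ignored, as in List1
      public void warning(SAXParseException e) {}
    });
    validator.validate(document);
    return errors;
  }

  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      System.out.println("usage: SchemaValidate XML-file [XSD-file]");
      return;
    }
    Source xsd = (args.length > 1) ? new StreamSource(new File(args[1])) : null;
    List<String> errors = validate(new StreamSource(new File(args[0])), xsd);
    if (errors.isEmpty()) {
      System.out.println("checked xml validity");
    } else {
      for (String error : errors) {
        System.out.println("Error: " + error);
      }
    }
  }
}
```

Because validate accepts any javax.xml.transform.Source, the same helper can also check documents and schemas held in memory (for example a StreamSource wrapping a StringReader), which is convenient when experimenting with small schemas.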

To get a better understanding of XML Schema, you should go beyond reading existing schemas; I strongly recommend writing a variety of XML schemas yourself and actually running validity checks on XML documents against them.

With that, let's take a look at some practice questions/answers related to XML Schema, as well as some comments related to notable points.

Example of an XML Schema Question Appearing on the Exam - (1)

Select the answer that correctly describes the results of performing a validity check on the following [XML Document]. Assume that the XML parser can correctly process the XML Schema attributes noNamespaceSchemaLocation and schemaLocation.

[XML Document]

<Series
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="series.xsd">
 <Title>XML Tutorial - Professional Application Developer - </Title>
 <Content xmlns="urn:xmlmaster:sample">
  <SubTitle>DOM/SAX</SubTitle>
  <SubTitle>DOM/SAX Programming</SubTitle>
 </Content>
</Series>

[series.xsd]

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:import namespace="urn:xmlmaster:sample"
  schemaLocation="content.xsd" />
 
 <xs:element name="Series" type="seriesType" />
 <xs:complexType name="seriesType">
  <xs:sequence>
   <xs:element name="Title" type="xs:string" />
   <xs:element ref="sam:Content" xmlns:sam="urn:xmlmaster:sample" />
  </xs:sequence>
 </xs:complexType>
</xs:schema>

[content.xsd]

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 targetNamespace="urn:xmlmaster:sample"
 xmlns:sam="urn:xmlmaster:sample">
 
 <xs:element name="Content" type="sam:contentType" />
 <xs:complexType name="contentType">
  <xs:sequence>
   <xs:element name="SubTitle" type="xs:string" maxOccurs="unbounded" />
  </xs:sequence>
 </xs:complexType>
</xs:schema>

Options

  A. The document is valid
  B. An error occurs when processing "xsi:noNamespaceSchemaLocation="series.xsd"" in [XML Document], because the code in the XML document is not proper
  C. An error occurs when processing xs:import element in [series.xsd], because the code in the XML schema is not proper
  D. A processing error does not occur; however, this is not a valid XML document

Answer

D

Commentary

This question tests your understanding of XML schema design that incorporates namespaces and imports other XML schemas. Let's examine the parts that the answers claim are "not proper", beginning with [XML Document], then [series.xsd], and then [content.xsd].

First, answer B states that an error occurs when processing "xsi:noNamespaceSchemaLocation="series.xsd"". The xsi:noNamespaceSchemaLocation attribute links an XML document to an XML schema that has no target namespace; here, it links [XML Document] to [series.xsd]. There is nothing wrong with this syntax, so answer B is incorrect.

Answer C states that there is a problem with the xs:import element. Writing an xs:import element in an XML schema lets you import another XML schema; in [series.xsd], it imports [content.xsd], whose target namespace is urn:xmlmaster:sample. There is nothing wrong with this syntax either, so answer C is also incorrect.

That leaves answers A and D, so the correct answer hinges on whether [XML Document] is valid against the XML schema. Let's take a closer look at this.

In XML Schema, when one XML schema imports another, it can use the global elements defined in the imported schema. [series.xsd] uses this mechanism to refer (via ref="sam:Content") to the Content element defined in the imported [content.xsd]. In [series.xsd], the Series element is defined with the child elements Title and Content, in that order.

Meanwhile, [content.xsd] defines the Content element used by [series.xsd]. Because [content.xsd] specifies targetNamespace="urn:xmlmaster:sample", we know that the globally declared Content element belongs to namespace urn:xmlmaster:sample. SubTitle is defined as a child element of the Content element.

Here, note the namespace to which the SubTitle element belongs. The SubTitle element is declared locally, inside the xs:complexType element. Because [content.xsd] does not specify the elementFormDefault attribute, its default value "unqualified" applies, so locally declared elements belong to no namespace; SubTitle therefore does not belong to urn:xmlmaster:sample, the namespace of the Content element. In [XML Document], however, the default namespace declaration xmlns="urn:xmlmaster:sample" on the Content element also applies to its child SubTitle elements. In other words, both the Content element and the SubTitle elements in [XML Document] are written as belonging to namespace urn:xmlmaster:sample. Since the namespace of the SubTitle element in the XML document differs from the one defined in the XML schema, the result of the validity check is "not valid," and the correct answer is D.
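The sketch below reproduces this question with the javax.xml.validation API. It writes the two schemas and a shortened copy of [XML Document] to a temporary directory, so that the xs:import in series.xsd can resolve content.xsd relative to series.xsd, and validates the document twice: once with the schemas as given, and once with elementFormDefault="qualified" added to content.xsd, which puts locally declared elements such as SubTitle into the target namespace. The class name ImportQuestion and the file name doc.xml are hypothetical, the xsi:noNamespaceSchemaLocation hint is dropped because the schema is passed to the validator directly, and the code assumes Java 8 or later (java.nio.file, UncheckedIOException).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ImportQuestion {
  // [series.xsd] from the question
  static final String SERIES_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:import namespace='urn:xmlmaster:sample' schemaLocation='content.xsd'/>"
      + "<xs:element name='Series' type='seriesType'/>"
      + "<xs:complexType name='seriesType'><xs:sequence>"
      + "<xs:element name='Title' type='xs:string'/>"
      + "<xs:element ref='sam:Content' xmlns:sam='urn:xmlmaster:sample'/>"
      + "</xs:sequence></xs:complexType></xs:schema>";

  // [content.xsd]; %s is an optional extra attribute on xs:schema
  static final String CONTENT_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:xmlmaster:sample'"
      + " xmlns:sam='urn:xmlmaster:sample'%s>"
      + "<xs:element name='Content' type='sam:contentType'/>"
      + "<xs:complexType name='contentType'><xs:sequence>"
      + "<xs:element name='SubTitle' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:schema>";

  // [XML Document] without the schema hint; SubTitle inherits the default namespace
  static final String DOCUMENT =
      "<Series><Title>XML Tutorial</Title>"
      + "<Content xmlns='urn:xmlmaster:sample'>"
      + "<SubTitle>DOM/SAX</SubTitle>"
      + "<SubTitle>DOM/SAX Programming</SubTitle>"
      + "</Content></Series>";

  /** Validates DOCUMENT; qualified=true adds elementFormDefault="qualified" to content.xsd. */
  static boolean isValid(boolean qualified) {
    try {
      Path dir = Files.createTempDirectory("xsd");
      dir.toFile().deleteOnExit();
      Path series = write(dir, "series.xsd", SERIES_XSD);
      write(dir, "content.xsd", String.format(CONTENT_XSD,
          qualified ? " elementFormDefault='qualified'" : ""));
      Path doc = write(dir, "doc.xml", DOCUMENT);

      // Loading series.xsd from a file lets xs:import resolve content.xsd next to it
      SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(series.toFile())
          .newValidator()
          .validate(new StreamSource(doc.toFile()));
      return true;
    } catch (SAXException e) {
      System.out.println(e.getMessage()); // the validator's description of the error
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Writes one file and schedules it for deletion (files go before their directory)
  private static Path write(Path dir, String name, String text) throws IOException {
    Path p = Files.write(dir.resolve(name), text.getBytes(StandardCharsets.UTF_8));
    p.toFile().deleteOnExit();
    return p;
  }

  public static void main(String[] args) {
    System.out.println("as given:  " + (isValid(false) ? "valid" : "not valid"));
    System.out.println("qualified: " + (isValid(true) ? "valid" : "not valid"));
  }
}
```

Running main should report the document as not valid with the schemas as given, and as valid once content.xsd sets elementFormDefault="qualified".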

The xs:import element, as used in this practice question, imports an XML schema whose target namespace differs from that of the importing schema. The xs:include element can also be used to incorporate other XML schemas, but the included schema must have the same target namespace as the including schema (or no target namespace at all, in which case its components take on the including schema's namespace). Be sure to understand the difference between the two approaches.
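The include rule can be checked the same way. The sketch below compiles a schema with target namespace urn:xmlmaster:sample that includes a second schema, varying only the included schema's targetNamespace. The file names main.xsd and part.xsd and the class name IncludeRule are hypothetical, and the code assumes Java 8 or later. A same-namespace include and a no-namespace include should compile, while including a schema from a different namespace should make SchemaFactory.newSchema throw a SAXException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class IncludeRule {
  // Including schema, in namespace urn:xmlmaster:sample
  static final String MAIN_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:xmlmaster:sample'>"
      + "<xs:include schemaLocation='part.xsd'/>"
      + "</xs:schema>";

  // Hypothetical included schema; %s is its optional targetNamespace attribute
  static final String PART_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'%s>"
      + "<xs:element name='SubTitle' type='xs:string'/>"
      + "</xs:schema>";

  /** True if MAIN_XSD compiles when part.xsd has the given targetNamespace (null = none). */
  static boolean compiles(String partNamespace) {
    try {
      Path dir = Files.createTempDirectory("xsd");
      dir.toFile().deleteOnExit();
      String attr = (partNamespace == null)
          ? "" : " targetNamespace='" + partNamespace + "'";
      write(dir, "part.xsd", String.format(PART_XSD, attr));
      Path main = write(dir, "main.xsd", MAIN_XSD);
      // xs:include is resolved relative to main.xsd
      SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(main.toFile());
      return true;
    } catch (SAXException e) {
      System.out.println(e.getMessage());
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Writes one file and schedules it for deletion
  private static Path write(Path dir, String name, String text) throws IOException {
    Path p = Files.write(dir.resolve(name), text.getBytes(StandardCharsets.UTF_8));
    p.toFile().deleteOnExit();
    return p;
  }

  public static void main(String[] args) {
    System.out.println("same namespace: " + compiles("urn:xmlmaster:sample"));
    System.out.println("no namespace:   " + compiles(null));
    System.out.println("other:          " + compiles("urn:other"));
  }
}
```

To bring in a schema from another namespace, xs:import (as in [series.xsd]) is the mechanism to use instead.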

The following areas frequently appear on the exam and are also very important in real-world work. I advise you to take the time to study them in detail.

  • Defining any given element or attribute using xs:any element or xs:anyAttribute element
  • Redefining schema definitions using xs:redefine element
  • Defining substitute elements using substitutionGroup attribute of xs:element element
  • Creating element groups or attribute groups using xs:group element or xs:attributeGroup element
  • Defining unique values using xs:unique element or xs:key element
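As a small illustration of the last item, the sketch below adds a hypothetical xs:unique constraint to a no-namespace variant of the Content element, requiring every SubTitle value to be different. The schema and the class name UniqueSubTitles are assumptions made for this example; the xs:selector and xs:field values are XPath expressions evaluated relative to each Content element. It assumes Java 8 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class UniqueSubTitles {
  // Hypothetical schema: SubTitle values must be unique within a Content element
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Content'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='SubTitle' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:unique name='uniqueSubTitle'>"
      + "<xs:selector xpath='SubTitle'/>"  // which elements are checked
      + "<xs:field xpath='.'/>"            // their own text is the value
      + "</xs:unique>"
      + "</xs:element></xs:schema>";

  static boolean isValid(String xml) {
    try {
      SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(XSD)))
          .newValidator()
          .validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      System.out.println(e.getMessage());
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid("<Content><SubTitle>DOM/SAX</SubTitle>"
        + "<SubTitle>DOM/SAX Programming</SubTitle></Content>"));
    System.out.println(isValid("<Content><SubTitle>DOM/SAX</SubTitle>"
        + "<SubTitle>DOM/SAX</SubTitle></Content>"));
  }
}
```

Replacing xs:unique with xs:key would additionally require every selected SubTitle to have a value for the field.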

Example of an XML Schema Question Appearing on the Exam - (2)

Select the answer that correctly describes the results of performing a validity check on the following [XML Document]. Assume that the XML parser can correctly process the XML Schema attributes noNamespaceSchemaLocation and schemaLocation.

[XML Document]

<Series
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="series.xsd"
 xsi:schemaLocation="urn:xmlmaster:sample content.xsd">
 <Title>XML Tutorial - Professional Application Developer - </Title>
 <Content xmlns="urn:xmlmaster:sample">
  <SubTitle>DOM/SAX</SubTitle>
  <SubTitle>DOM/SAX Programming</SubTitle>
 </Content>
</Series>

[series.xsd]

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="Series" type="seriesType" />
 <xs:complexType name="seriesType">
  <xs:sequence>
   <xs:element name="Title" type="xs:string" />
   <xs:any />
  </xs:sequence>
 </xs:complexType>
</xs:schema>

[content.xsd]

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 targetNamespace="urn:xmlmaster:sample"
 xmlns:sam="urn:xmlmaster:sample">
 <xs:element name="Content" type="sam:contentType" />
 <xs:complexType name="contentType">
  <xs:sequence>
   <xs:element name="SubTitle" type="xs:string" maxOccurs="unbounded" />
  </xs:sequence>
 </xs:complexType>
</xs:schema>

Options

  A. The document is valid
  B. An error occurs when processing "xsi:schemaLocation="urn:xmlmaster:sample content.xsd" in [XML Document], because the code in the XML document is not proper
  C. An error occurs when processing xs:any element in [series.xsd], because the code in the XML schema is not proper
  D. A processing error does not occur; however, this is not a valid XML document

Answer

D

Commentary

Here, too, we will examine the parts that the answers claim are "not proper", beginning with [XML Document], then [series.xsd], and then [content.xsd].

First, answer B states that an error occurs when processing "xsi:schemaLocation="urn:xmlmaster:sample content.xsd"". The value of the xsi:schemaLocation attribute is a list of pairs, each a namespace name followed by the location of an XML schema for that namespace; it links the elements of the XML document that belong to that namespace to the XML schema. Here, [XML Document] links the elements belonging to namespace urn:xmlmaster:sample to [content.xsd]. There is nothing wrong with this syntax, so answer B is incorrect.

Answer C states that an error occurs when processing the xs:any element. The xs:any element is a wildcard: it allows any element to appear at the position where it is written.

[series.xsd] defines the Series element, which belongs to no namespace in [XML Document], with a Title element followed by xs:any (in other words, any element) as its child elements. This syntax is correct, so answer C is incorrect.

As with the previous practice question, it appears that the correct answer hinges on whether [XML Document] is valid. Let's take a look at the validity of [XML Document].

The namespace attribute can be specified on the xs:any element to constrain the namespaces of the elements it accepts. For example, namespace="##any" accepts an element belonging to any namespace, while namespace="urn:xmlmaster:sample" accepts only elements belonging to namespace urn:xmlmaster:sample. If the namespace attribute is omitted, as in this practice question, the default value "##any" applies.
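The effect of the namespace attribute can be seen in isolation with the hypothetical schema below, in which xs:any accepts only elements in namespace urn:xmlmaster:sample. processContents="skip" is set so that only the namespace constraint is tested (this schema declares no Content element); the class name AnyNamespace is an assumption, and the code assumes Java 8 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class AnyNamespace {
  // Hypothetical schema: the wildcard only matches urn:xmlmaster:sample elements
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Series'><xs:complexType><xs:sequence>"
      + "<xs:element name='Title' type='xs:string'/>"
      + "<xs:any namespace='urn:xmlmaster:sample' processContents='skip'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

  static boolean isValid(String xml) {
    try {
      SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(new StreamSource(new StringReader(XSD)))
          .newValidator()
          .validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      System.out.println(e.getMessage());
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Allowed namespace, another namespace, and no namespace
    System.out.println(isValid(
        "<Series><Title>t</Title><Content xmlns='urn:xmlmaster:sample'/></Series>"));
    System.out.println(isValid(
        "<Series><Title>t</Title><Content xmlns='urn:other'/></Series>"));
    System.out.println(isValid("<Series><Title>t</Title><Content/></Series>"));
  }
}
```

Only the first document should be accepted: the wildcard rejects both an element in another namespace and an element in no namespace.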

The processContents attribute can also be specified on the xs:any element. It controls whether the elements matched by the wildcard are validated, and takes one of the following three values:

  • strict: The element is validated; an error occurs if no declaration for it can be found
  • skip: The element is not validated
  • lax: The element is validated if a declaration for it can be found; otherwise, it is not validated

When the processContents attribute is omitted, as in this practice question, the default value "strict" applies.
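The three values can be compared on this question's own schemas. The sketch below compiles [series.xsd] with each processContents value, either alone or together with [content.xsd] (passed to SchemaFactory.newSchema as a Source array rather than through the xsi:schemaLocation hint), and validates a shortened copy of [XML Document] against the result. The class name ProcessContentsDemo is hypothetical, and the code assumes Java 8 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ProcessContentsDemo {
  // [series.xsd] from question (2); %s is the processContents value
  static final String SERIES_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Series' type='seriesType'/>"
      + "<xs:complexType name='seriesType'><xs:sequence>"
      + "<xs:element name='Title' type='xs:string'/>"
      + "<xs:any processContents='%s'/>"
      + "</xs:sequence></xs:complexType></xs:schema>";

  // [content.xsd] from question (2)
  static final String CONTENT_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:xmlmaster:sample' xmlns:sam='urn:xmlmaster:sample'>"
      + "<xs:element name='Content' type='sam:contentType'/>"
      + "<xs:complexType name='contentType'><xs:sequence>"
      + "<xs:element name='SubTitle' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:schema>";

  // [XML Document] from question (2), without the xsi hints
  static final String DOCUMENT =
      "<Series><Title>XML Tutorial</Title>"
      + "<Content xmlns='urn:xmlmaster:sample'>"
      + "<SubTitle>DOM/SAX</SubTitle><SubTitle>DOM/SAX Programming</SubTitle>"
      + "</Content></Series>";

  /** Validates DOCUMENT with the given processContents, optionally loading content.xsd too. */
  static boolean isValid(String processContents, boolean loadContentXsd) {
    List<Source> schemas = new ArrayList<>();
    schemas.add(new StreamSource(new StringReader(String.format(SERIES_XSD, processContents))));
    if (loadContentXsd) {
      schemas.add(new StreamSource(new StringReader(CONTENT_XSD)));
    }
    try {
      SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(schemas.toArray(new Source[0]))
          .newValidator()
          .validate(new StreamSource(new StringReader(DOCUMENT)));
      return true;
    } catch (SAXException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    for (String pc : new String[] {"strict", "lax", "skip"}) {
      System.out.println(pc + " (series.xsd + content.xsd): "
          + (isValid(pc, true) ? "valid" : "not valid"));
    }
  }
}
```

With both schemas loaded, strict and lax should both report the document as not valid (a declaration for Content is found, so its SubTitle children are checked), while skip accepts it. With only series.xsd loaded, strict fails because no declaration for Content can be found, and skip again accepts the document.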

Given the preceding, the xs:any wildcard (namespace "##any") itself accepts the Content element written in [XML Document]. Because processContents defaults to "strict", however, the Content element and its descendants are then validated against their declarations, which the parser finds in [content.xsd] through the xsi:schemaLocation attribute. The child SubTitle elements of the Content element in [XML Document] belong to namespace urn:xmlmaster:sample, inherited from the default namespace declaration. In [content.xsd], however, SubTitle is declared locally and elementFormDefault is not specified, so SubTitle belongs to no namespace. Since the namespace of the SubTitle element in the XML document differs from the namespace defined in the XML schema, the result is "not valid." Accordingly, the correct answer is D.
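One way to make [XML Document] valid without touching the schemas is to bind urn:xmlmaster:sample to a prefix used only on Content, so that the unprefixed SubTitle elements stay in no namespace, as [content.xsd] expects. The sketch below validates both versions of the document with processContents left at its default ("strict"); the class name Question2Fix and the prefixed instance are illustrations, not part of the exam question, and the code assumes Java 8 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class Question2Fix {
  // [series.xsd] from question (2); processContents defaults to "strict"
  static final String SERIES_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Series' type='seriesType'/>"
      + "<xs:complexType name='seriesType'><xs:sequence>"
      + "<xs:element name='Title' type='xs:string'/><xs:any/>"
      + "</xs:sequence></xs:complexType></xs:schema>";

  // [content.xsd] from question (2)
  static final String CONTENT_XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:xmlmaster:sample' xmlns:sam='urn:xmlmaster:sample'>"
      + "<xs:element name='Content' type='sam:contentType'/>"
      + "<xs:complexType name='contentType'><xs:sequence>"
      + "<xs:element name='SubTitle' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:schema>";

  // As given (without the xsi hints): SubTitle inherits the default namespace
  static final String AS_GIVEN =
      "<Series><Title>XML Tutorial</Title>"
      + "<Content xmlns='urn:xmlmaster:sample'>"
      + "<SubTitle>DOM/SAX</SubTitle><SubTitle>DOM/SAX Programming</SubTitle>"
      + "</Content></Series>";

  // Only Content is prefixed, so SubTitle stays in no namespace
  static final String FIXED =
      "<Series><Title>XML Tutorial</Title>"
      + "<sam:Content xmlns:sam='urn:xmlmaster:sample'>"
      + "<SubTitle>DOM/SAX</SubTitle><SubTitle>DOM/SAX Programming</SubTitle>"
      + "</sam:Content></Series>";

  static boolean isValid(String document) {
    Source[] schemas = {
      new StreamSource(new StringReader(SERIES_XSD)),
      new StreamSource(new StringReader(CONTENT_XSD))
    };
    try {
      SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
          .newSchema(schemas)
          .newValidator()
          .validate(new StreamSource(new StringReader(document)));
      return true;
    } catch (SAXException e) {
      System.out.println(e.getMessage());
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("as given: " + (isValid(AS_GIVEN) ? "valid" : "not valid"));
    System.out.println("fixed:    " + (isValid(FIXED) ? "valid" : "not valid"));
  }
}
```

The other fix is on the schema side: adding elementFormDefault="qualified" to [content.xsd] makes SubTitle belong to urn:xmlmaster:sample, matching the original [XML Document].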

