Wednesday, January 1, 2003

Scalable Integration of XML Schemas - Chapter 3


Modeling XML Schema
3.1  DTDs and DTD Trees
DTDs, with their limited complexity, are easily represented as trees. Because recursion is permitted, they are strictly speaking graphs, but for simplicity we refer to them and draw them as trees. The following example shows how a tree represents a DTD.
Figure 3.1-1: Example DTD modeled as a tree
<!ELEMENT Article (Title, Author+, Sections+)>
<!ELEMENT Sections (Title?, (Para | (Title?, Para+)+)*)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Para (#PCDATA)>
<!ELEMENT Author (Name, Affiliation)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Affiliation (#PCDATA)>
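The parent-child structure of the DTD above can be recovered mechanically. The following Python sketch is only an illustration and not part of the original work: the helper names `parse_dtd` and `render` are hypothetical, the regular expressions assume one `<!ELEMENT ...>` declaration per line, and occurrence indicators (`+`, `?`, `*`) and grouping are deliberately discarded, which is exactly the kind of information a plain tree loses.

```python
import re

DTD = """
<!ELEMENT Article (Title, Author+, Sections+)>
<!ELEMENT Sections (Title?, (Para | (Title?, Para+)+)*)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Para (#PCDATA)>
<!ELEMENT Author (Name, Affiliation)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Affiliation (#PCDATA)>
"""

def parse_dtd(text):
    """Map each declared element to its distinct child element names."""
    tree = {}
    for name, model in re.findall(r"<!ELEMENT\s+(\w+)\s+(.*?)>", text):
        children = []
        for token in re.findall(r"#?\w+", model):
            # Skip #PCDATA and repeated mentions of the same child.
            if not token.startswith("#") and token not in children:
                children.append(token)
        tree[name] = children
    return tree

def render(tree, name, depth=0, path=()):
    """Return indented lines for the subtree rooted at `name`."""
    if name in path:
        # A recursive content model would loop forever; mark it and stop.
        return ["  " * depth + name + " (recursive)"]
    lines = ["  " * depth + name]
    for child in tree.get(name, []):
        lines.extend(render(tree, child, depth + 1, path + (name,)))
    return lines

tree = parse_dtd(DTD)
print("\n".join(render(tree, "Article")))
```

The `path` argument is what keeps a recursive DTD, which is really a graph, drawable as a finite tree.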
However, the above tree is too simplistic and is not applicable to XML Schema. We now discuss the model that we propose for XML Schema.
3.2  XML Schema
As noted in the XML Schema requirements in Section 1.2, XML Schema adds a whole new dimension to data definition and is powerful enough to be converted directly into a Java class. In addition to the traditional features provided by DTDs, XML Schema has powerful features of its own that we exploit in our algorithm. These include:
  • Data types
  • Namespaces
  • A wide variety of grouping methods
  • Derived elements and inheritance
  • Referenced objects
  • Global and local definitions
The following example demonstrates a wide range of XML Schema abilities.
Figure 3.2-1: Example XML Schema Definition (XSD)
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
       Sample Stock Quote Schema, adapted from web and W3C XSD Primer Part 0
    </xsd:documentation>
  </xsd:annotation>
  <!-- global element -->
  <xsd:element name="stockfile">
    <xsd:complexType>
       <xsd:sequence>
         <xsd:element name="mystock" type="mystockType" maxOccurs="unbounded"/>
       </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="comment" type="xsd:string"/>
  <!-- Main complex type -->
  <xsd:complexType name="mystockType">
    <xsd:sequence>
       <xsd:element name="symbol" type="xsd:string"/>
       <xsd:element name="company" type="companyAddrType"/>
       <xsd:element name="currentQuote" type="quoteType"/>
       <xsd:element name="lastCloseQuote" type="quoteType"/>
       <xsd:element name="change" type="xsd:double"/>
       <xsd:element name="volume" type="xsd:long"/>
       <xsd:element ref="comment" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute name="dateBought" type="xsd:date"/>
  </xsd:complexType>
  <!-- Following two complex types nested inside Main -->
  <xsd:complexType name="quoteType">
    <xsd:all>
       <xsd:element name="date" type="xsd:date"/>
       <xsd:element name="time" type="xsd:time"/>
       <!-- date time may be in any order -->
    </xsd:all>
    <xsd:attributeGroup ref="priceAttrib"/>
  </xsd:complexType>
  <xsd:complexType name="companyAddrType">
    <xsd:sequence>
       <xsd:element name="name" type="xsd:string"/>
       <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="country" type="xsd:string" use="required"/>
  </xsd:complexType>
  <!-- Attrib group -->
  <xsd:attributeGroup name="priceAttrib">
    <xsd:attribute name="currentPrice" type="xsd:float" use="required"/>
    <xsd:attribute name="currency" type="xsd:string" default="SGD"/>
  </xsd:attributeGroup>
</xsd:schema>
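Two of the features listed earlier, global versus local definitions and referenced objects, can be read straight off the schema. The Python sketch below is an illustration under stated assumptions: it uses the standard-library `xml.etree.ElementTree` parser on a shortened excerpt of the schema above, and the helper names `local_name`, `global_definitions` and `references` are hypothetical, not part of the proposed algorithm.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the stock quote schema (several elements omitted).
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="stockfile">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="mystock" type="mystockType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="comment" type="xsd:string"/>
  <xsd:complexType name="mystockType">
    <xsd:sequence>
      <xsd:element name="symbol" type="xsd:string"/>
      <xsd:element name="currentQuote" type="quoteType"/>
      <xsd:element ref="comment" minOccurs="0"/>
    </xsd:sequence>
    <xsd:attribute name="dateBought" type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name="quoteType">
    <xsd:all>
      <xsd:element name="date" type="xsd:date"/>
      <xsd:element name="time" type="xsd:time"/>
    </xsd:all>
    <xsd:attributeGroup ref="priceAttrib"/>
  </xsd:complexType>
  <xsd:attributeGroup name="priceAttrib">
    <xsd:attribute name="currentPrice" type="xsd:float" use="required"/>
    <xsd:attribute name="currency" type="xsd:string" default="SGD"/>
  </xsd:attributeGroup>
</xsd:schema>"""

def local_name(node):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return node.tag.split("}", 1)[1]

def global_definitions(root):
    """Group the names of top-level (global) definitions by construct."""
    defs = {}
    for child in root:
        if "name" in child.attrib:
            defs.setdefault(local_name(child), []).append(child.get("name"))
    return defs

def references(root):
    """List (construct, target) pairs for every ref= found in the schema."""
    return [(local_name(n), n.get("ref")) for n in root.iter() if "ref" in n.attrib]

root = ET.fromstring(SCHEMA)
print(global_definitions(root))
print(references(root))
```

Elements such as `mystock` and `symbol` do not appear among the global definitions because they are declared locally inside a type.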
The above Schema presents many features that can be easily mapped into a graphical representation in our proposed XST model, as follows.
Figure 3.2-2: XSD representation in proposed XST model
3.3  Proposed XST Model
As highlighted in the previous subsection, XML Schema [W3CXS1], [EV00] contains many structures, and our aim is to model them explicitly and to express the relationships among them. Existing models such as OEM and DOM cannot efficiently model the rich data presented in an XML Schema. While [ORASS] makes a good effort to model semi-structured data, it fails to capture all the information represented in an XML Schema, such as data types, cardinality and grouping methods, as we see in Figure 3.3-1. We therefore develop the XML Schema Tree (XST), which concisely represents semantically rich XML Schemas. All XML Schemas can be represented as graphs using this model, and it shows the precise semantic relationship between elements, attributes and types.
Figure 3.3-1: Modeling as [ORASS] Schema Diagram
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
   <xs:element name="contact">
     <xs:complexType>
        <xs:sequence>
          <xs:element name="name">
             <xs:complexType>
               <xs:sequence>
                  <xs:element name="first" type="xs:string"/>
                  <xs:element name="last" type="xs:string"/>
               </xs:sequence>
               <xs:attribute name="title">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                       <xs:enumeration value="mr"/>
                       <xs:enumeration value="mrs"/>
                       <xs:enumeration value="dr"/>
                       <xs:enumeration value="prof"/>
                    </xs:restriction>
                  </xs:simpleType>
               </xs:attribute>
             </xs:complexType>
          </xs:element>
          <xs:element name="address" type="xs:string"/>
          <xs:element name="phone" type="xs:string"/>
        </xs:sequence>
     </xs:complexType>
   </xs:element>
</xs:schema>
The fundamental constructs in the proposed XST model are:
  • Objects (Elements/Attributes)
  • Types
  • Links/Edges
The representation of the XML Schema recommendation [W3CXS1] in the XST model is based on simple guidelines. <import>, <include> and <redefine> must be resolved before attempting to create a tree; externally referenced sources cannot be accommodated. Recursive types are expressed by a loop back.
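The first guideline can be enforced with a simple pre-check before any tree is built. The Python sketch below is a minimal illustration and an assumption on our part, not the procedure used in this work: `unresolved_externals` is a hypothetical helper, and `common.xsd` is a made-up file name used only for the sample.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
EXTERNAL = ("import", "include", "redefine")

def unresolved_externals(schema_text):
    """Return the external-composition constructs still present at top level."""
    root = ET.fromstring(schema_text)
    found = []
    for child in root:
        for local in EXTERNAL:
            if child.tag == f"{{{XSD_NS}}}{local}":
                found.append(local)
    return found

# Hypothetical schema that still pulls in another document.
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="common.xsd"/>
  <xs:element name="contact" type="xs:string"/>
</xs:schema>"""

print(unresolved_externals(SAMPLE))
```

A non-empty result signals that the schema must be flattened first; how the referenced documents are merged is outside the scope of this sketch.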
We have defined the following types of nodes in the XST model. Table 3.3-1 gives a summary of the notation used.
  1. Element
  2. Named Complex Type
  3. Anonymous Complex Type
  4. Named Simple Type
  5. Built-in Simple Type
  6. Anonymous Simple Type
  7. Attribute
  8. Named Attribute Group
  9. Special
  10. Named Grouping Structure
  11. Grouping Structure
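As a rough illustration of how schema constructs fall into these node types, the following Python sketch classifies the nodes of a shortened version of the contact schema in Figure 3.3-1. It is a simplification under stated assumptions: `xst_kind` is a hypothetical helper, the set of constructs treated as Special is our own choice, and Built-in Simple Types are not handled because they appear only as `type` attribute values rather than as schema nodes.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"
GROUPING = {"sequence", "choice", "all"}
# Assumed set of constructs that the text calls "Special".
SPECIAL = {"annotation", "documentation", "restriction", "enumeration", "pattern"}

def xst_kind(node):
    """Return the XST node type for an XSD node, or None if it has none."""
    local = node.tag.replace(XSD_NS, "")
    named = "name" in node.attrib
    if local == "element":
        return "Element"
    if local == "attribute":
        return "Attribute"
    if local == "complexType":
        return "Named Complex Type" if named else "Anonymous Complex Type"
    if local == "simpleType":
        return "Named Simple Type" if named else "Anonymous Simple Type"
    if local == "attributeGroup" and named:
        return "Named Attribute Group"
    if local == "group" and named:
        return "Named Grouping Structure"
    if local in GROUPING:
        return "Grouping Structure"
    if local in SPECIAL:
        return "Special"
    return None

SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="first" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="title">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="mr"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = ET.fromstring(SAMPLE)
kinds = [k for k in map(xst_kind, root.iter()) if k is not None]
print(kinds)
```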
Table 3.3-1: Summary of XST Notation
Node
Remark
Element
  • Annotated with name of element
  • Cardinality constraint/other constraints written inside element (1)
Named Complex Type
  • Annotated with name of complex type
  • Multiple elements can point to it (Triple Line)
Anonymous Complex Type
  • No annotation
  • Only one element can have this type (Triple Line)
Named Simple Type
  • Annotated with name of simple type
  • Multiple nodes can point to it (Double Line)
Anonymous Simple Type
  • No annotation
  • Only one element can have this type (Double Line)
Built-in Simple Type
  • Annotated with its type name (string, int etc)
Attribute
  • Annotated with name
  • Can occur only as child of an attribGrp
  • Cardinality is implicit (2)
  • Other constraints may be annotated
Special
  • Exceptional and uncommon occurrences, and structures that do not affect semantics, for example Annotation, Restrictions (regular expressions, enumeration), etc.
Named Grouping Structure
  • Annotated with name
  • TYPE can be any one of all, choice, sequence and group
  • attributeGroup is also permitted even though it is not a grouping structure
Grouping Structure
  • Same as above but not named
  • Only one structure may have it as its child
Notes:
(1)   Element Cardinality constraints:
    • * minOccurs=0, maxOccurs=unbounded
    • + minOccurs=1, maxOccurs=unbounded
    • ? minOccurs=0, maxOccurs=1
    • Write other conditions specifically (example: use= “required” minOccurs=2, maxOccurs=4)
    • minOccurs=1, maxOccurs=1 by default
(2)   Attribute Cardinality constraints:
    • minOccurs=0, maxOccurs=1; these are implicit and cannot be changed
    • Write other conditions specifically (example: use = “required”, default = “2”)
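The element cardinality convention in note (1) amounts to a small lookup. The Python sketch below is one possible encoding, assuming the occurrence values arrive as the strings found in the schema; the function name `cardinality_symbol` is hypothetical.

```python
def cardinality_symbol(min_occurs="1", max_occurs="1"):
    """Return the annotation written inside an Element node."""
    key = (str(min_occurs), str(max_occurs))
    symbols = {
        ("0", "unbounded"): "*",
        ("1", "unbounded"): "+",
        ("0", "1"): "?",
        ("1", "1"): "",  # the default needs no annotation
    }
    if key in symbols:
        return symbols[key]
    # Any other combination is written out explicitly.
    return f"minOccurs={key[0]}, maxOccurs={key[1]}"

print(cardinality_symbol("0", "unbounded"))
print(cardinality_symbol("2", "4"))
```

Attributes need no such function, since their cardinality is implicit.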
Links/Edges: These provide an instant clue to the Data Type for a particular object.
Single – Link between elements
Double – Element and Simple Type
Triple – Element and Complex Type
Table 3.3-2: XST Link/Edges
 
No.  Remark
(1)  General
(2)  Between an Element or Attribute and a Built-in Simple Type or Anonymous Simple Type
(3)  Between an Element and an Anonymous Complex Type
(4)  Points to a referred type
(5)  Between an Element or Attribute and a Named Simple Type (Ref’ed)
(6)  Between an Element and a Named Complex Type (Ref’ed)
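The single, double and triple line convention can be expressed as a rule on the types of the two nodes being joined. The Python sketch below is a simplified reading of the table, not a complete definition: `link_style` is a hypothetical helper, the node type names are taken from the list of XST node types, and the distinction between referred and non-referred targets (rows 4 to 6) is not modelled.

```python
COMPLEX = {"Named Complex Type", "Anonymous Complex Type"}
SIMPLE = {"Named Simple Type", "Anonymous Simple Type", "Built-in Simple Type"}

def link_style(source, target):
    """Return 'single', 'double' or 'triple' for an edge from source to target."""
    if target in COMPLEX:
        if source != "Element":
            raise ValueError("only an Element can point to a complex type")
        return "triple"
    if target in SIMPLE:
        if source not in ("Element", "Attribute"):
            raise ValueError("only an Element or Attribute can point to a simple type")
        return "double"
    # Everything else is a general link.
    return "single"

print(link_style("Element", "Named Complex Type"))
print(link_style("Attribute", "Built-in Simple Type"))
print(link_style("Grouping Structure", "Element"))
```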
As an example, the following XST represents the same schema as in Figure 3.3-1.
Figure 3.3-2: Modeling as XST
Also refer to Figure 3.2-2, which shows the corresponding XST model for the XML Schema in Figure 3.2-1.
