Wednesday, January 1, 2003

Scalable Integration of XML Schemas - Chapter 6


Schema Integration, Supply Chain Management and eBusiness
6.1 Schema Integration
As discussed in our motivations, companies that deal with other companies need integrated information, and processing such information separately for every partner is not practical. For example, hundreds of airlines are in operation, and ticketing agents need an integrated view of their ever-changing information to process real-time transactions. This may be only part of a bigger picture that also involves, for instance, credit verification and a further vendor providing a delivery service.
Figure 6.1-1 Why Schema Integration
Another example is meteorological information and satellite imagery services, and the use of this data by aviation and media organisations. Such large-scale transactional processes are rapidly moving towards global integration using XML. Using XML for communication has many implications: because XML provides open-standard, extensible definitions, a diverse and changing group of companies can easily plug into existing systems. XML Schemas provide a way to enforce data definitions that conform to standards, and by using XML Schema similarity computation we can cluster similar Schemas, making integration within clusters easier. The process is shown schematically in the following diagram.
Figure 6.1-2 Integration Process
Ideally we may obtain one integrated schema when working on a specific domain. This integrated schema can then be used in place of the individual source schemas, helping an organisation provide better services to its partners.
Integration process for XML Schemas
A cluster contains semantically and structurally similar Schemas. This is useful because the conflicts within a cluster will be few and easier to resolve. The integration process involves the following steps:
  1. Tabulation of matching pairs of objects using the computed similarity values
  2. Resolving name conflicts: Different sources may use different names to express the same object in the real world
  3. Resolving Cardinality conflicts: Different sources may have different cardinalities for the same object
  4. Resolving source importance: Some sources are more important than others. Users want the final global schema to be more similar to the more important sources. User queries are likely to relate mostly to those sources, so this also reduces the query rewriting cost.
  5. Resolving element/attribute conflicts: Different authors may express the same information as an attribute or as an element, and its final representation in the integrated schema needs thought
  6. Resolving datatype conflicts: Conflicts between simple primitive types are easier to resolve, while conflicts between complex types involve structural changes.
  7. Resolving structural conflicts: This is probably the most important and difficult issue. Different sources may use totally different structures to express the same relationship.
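Steps 1 to 4 above can be sketched roughly as follows. This is only an illustration: the similarity values, the 0.8 threshold, the importance weights and every name in the code are hypothetical and are not taken from XClustXS. Name conflicts are resolved in favour of the more important source, and cardinality conflicts by taking the loosest bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    source: str      # schema the element comes from (hypothetical labels)
    name: str
    min_occurs: int
    max_occurs: int


def resolve_pair(a, b, importance):
    """Merge two matched elements into one integrated element."""
    # Name conflict: keep the name used by the more important source.
    preferred = a if importance[a.source] >= importance[b.source] else b
    # Cardinality conflict: take the loosest bounds so that instances
    # from both sources stay valid against the integrated element.
    return Element(
        source="integrated",
        name=preferred.name,
        min_occurs=min(a.min_occurs, b.min_occurs),
        max_occurs=max(a.max_occurs, b.max_occurs),
    )


def integrate(matches, importance, threshold=0.8):
    """matches: list of (element_a, element_b, similarity) tuples (step 1)."""
    return [
        resolve_pair(a, b, importance)
        for a, b, similarity in matches
        if similarity >= threshold
    ]


# Made-up importance weights and similarity values, for illustration only.
importance = {"airline_a": 0.9, "airline_b": 0.4}
matches = [
    (Element("airline_a", "departing", 1, 1),
     Element("airline_b", "depart_dtm", 0, 1), 0.92),
    (Element("airline_a", "adults", 1, 1),
     Element("airline_b", "stop", 0, 1), 0.10),
]
merged = integrate(matches, importance)
print(merged)
```

Taking the loosest cardinality bounds is one possible policy; a stricter policy could be chosen where the more important source should dictate the constraints.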
Schema integration transformations must be Information Preserving. [STG02]
Definition: A transformation f meets the round-trip criterion if there exists an inverse transformation f⁻¹ such that f⁻¹(f(x)) = x for all documents x in the domain of f, where = represents equivalence of XML information sets.
Definition: A transformation is an information-preserving transformation iff it meets the round-trip criterion.
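The round-trip criterion can be tried out on sample documents, as in the minimal sketch below. It assumes a toy transformation that turns the attributes of a root element into child elements, and it approximates information-set equivalence by comparing canonical XML produced with Python's `xml.etree.ElementTree.canonicalize`. All function names are hypothetical, and passing on a finite sample does not prove the criterion for the whole domain.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(root):
    """f: move every attribute of the root into a child element."""
    out = ET.Element(root.tag)
    for key in sorted(root.attrib):
        ET.SubElement(out, key).text = root.attrib[key]
    return out


def elements_to_attributes(root):
    """Candidate inverse of f: turn each child element back into an attribute."""
    out = ET.Element(root.tag)
    for child in root:
        out.set(child.tag, child.text or "")
    return out


def equivalent(a, b):
    """Approximate information-set equivalence by comparing canonical XML."""
    canonical_a = ET.canonicalize(ET.tostring(a, encoding="unicode"))
    canonical_b = ET.canonicalize(ET.tostring(b, encoding="unicode"))
    return canonical_a == canonical_b


def meets_round_trip(f, f_inverse, documents):
    """Check f_inverse(f(x)) = x on a sample of documents from the domain of f."""
    return all(equivalent(f_inverse(f(doc)), doc) for doc in documents)


# Domain of f here: a root element with attributes only and no children.
samples = [
    ET.fromstring('<flight from="SIN" to="LHR"/>'),
    ET.fromstring('<flight to="SYD" from="SIN" stop="none"/>'),
]


def drop_attributes(root):
    """A lossy transformation: the attribute values cannot be recovered."""
    return ET.Element(root.tag)


preserving = meets_round_trip(attributes_to_elements, elements_to_attributes, samples)
lossy = meets_round_trip(drop_attributes, lambda root: root, samples)
```

Here `preserving` is true because the attribute values survive the round trip, while `lossy` is false because dropping attributes discards information that no inverse can restore.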
Let us first look at supply chains and then apply XClustXS to the problem.
6.2 Supply Chain
A supply chain is a network of facilities and distribution options that performs the functions of procurement of materials, transformation of these materials into intermediate and finished products, and the distribution of these finished products to customers [GH]. Figure 6.2-1 shows an example of a supply chain. Materials flow downstream, from raw material sources through a manufacturing level that transforms the raw materials into intermediate products, which are assembled at the next level to form products. The products are shipped to distribution centers and from there on to retailers and customers.
Traditionally, the marketing, distribution, planning, manufacturing and purchasing organizations along the supply chain operated independently. These organizations had their own objectives, which often conflicted. As a result there was no single, integrated plan for the organization; there were as many plans as businesses.
Supply chain management is a strategy through which such integration can be achieved. Supply chain management is typically viewed as lying between fully vertically integrated firms, where the entire material flow is owned by a single firm, and those where each channel member operates independently.
Figure 6.2-1 Supply Chain Example
Coordination between the various players in the chain is the key to its effective management. The classic objective of logistics is to have the right products in the right quantities (at the right place) at the right moment at minimal cost. This gives rise to Just In Time (JIT) SCM. Maintaining a supply chain system requires decision making, which is classified into three levels: strategic, tactical and operational [RT97].
Figure 6.2-2 Supply Chain Decisions
Strategic decisions are typically made over a longer time horizon, are closely linked to corporate strategy, and guide supply chain policies from a design perspective. Operational decisions, on the other hand, are short term and focus on day-to-day activities.
6.3 eBusiness and XClustXS
All effective supply chain management decisions are data driven, which brings in computers, and the growing volume of global data has given rise to the term Electronic Business (eBusiness). Fully vertically integrated firms are rare, which makes B2B (business-to-business) communication necessary and creates supply chains. Traditional purchase orders, invoicing and inventory management are being replaced by large database systems (for example SAP systems), which even include employee management, adding an “e” to every system. More recently, mobile users and their needs have also become increasingly important. These systems are specific to each organization and often proprietary. XML is a key facilitator of B2B communication. B2B communication involves PAIN:
  • Privacy: Ensuring that unauthorized parties cannot read the information that is being transmitted.
  • Authentication: Verifying the identity of a user or the source of a transaction to prevent fraudulent use.
  • Integrity: Ensuring that the message content cannot be changed (intentionally or accidentally) or, if it is changed, that the change can be detected.
  • Non-repudiation: Ensuring that the sender cannot deny having sent a transaction, and the recipient cannot deny having received it.
There are numerous proposed XML standards, such as:
  • BizTalk Framework – Microsoft
  • Commerce XML (cXML) – Ariba, HP, Microsoft, webMethods and Sterling Commerce
  • Electronic Business XML (ebXML) – UN/CEFACT and OASIS
  • XML Common Business Library (xCBL) – Commerce One, OASIS, Microsoft, and UN/CEFACT
  • RosettaNet Partner Interface Processes (PIP) – Cisco, Intel, IBM, Dell, FedEx, Ericsson and Dun & Bradstreet
  • Financial Information Exchange Markup Language (FIXML) – Goldman Sachs, Salomon Smith Barney and State Street Global Advisors
  • News Industry Text Format (NITF) – International Press Telecommunications Council
Each has its own proprietary methods, tools and specific areas of use, and describing them individually is beyond our scope. The underlying principle in setting up standards is that businesses can communicate effectively and increase productivity, which is the aim of EDI (Electronic Data Interchange). However, a standard is never adopted by all industry players, since new technologies are constantly emerging; integration methods for diverse proprietary structures are therefore always needed.
Application to Application Integration (A2Ai)
Application to Application Integration (A2Ai) can be described as the integration of heterogeneous systems within a corporation's firewall. The transformation and communication middleware converts data from the form suited to one application into the form suited to another. The generic architecture is shown in the diagram below:
Figure 6.3-1 A2Ai Architecture
In this context XML allows easy transformation of data (with XClustXS providing the transformation mappings), leading to EAI (Enterprise Application Integration) followed by B2Bi (Business to Business Integration) across corporations and geographies. XML-native applications, which use XML for all communication, represent a newer trend. Communication between two such systems can be enabled by a single transformation process. Older systems use proprietary communication protocols. These non-native XML applications require additional transformations to convert to and from their proprietary formats. Both cases are shown in the diagrams below:
Figure 6.3-2 XML Native Applications
 
Figure 6.3-3 XML Non-Native Applications
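The difference between the two cases can be sketched as a chain of transformation steps. In the sketch below the pipe-delimited and key=value message formats, the element names and all function names are hypothetical stand-ins for whatever proprietary formats the legacy systems actually use. An XML-native pair of systems needs only the schema-to-schema step, while a non-native pair wraps that step in two adapters.

```python
import xml.etree.ElementTree as ET


def proprietary_to_xml(record):
    """Adapter for a legacy sender: 'SIN|LHR|2003-01-01' -> XML (made-up format)."""
    origin, destination, date = record.split("|")
    root = ET.Element("search")
    ET.SubElement(root, "from").text = origin
    ET.SubElement(root, "to").text = destination
    ET.SubElement(root, "departing").text = date
    return root


def sender_xml_to_receiver_xml(root):
    """The single schema-to-schema transformation that XML-native systems need."""
    rename = {"departing": "departure"}  # hypothetical name mapping
    out = ET.Element("flight_search")
    for child in root:
        ET.SubElement(out, rename.get(child.tag, child.tag)).text = child.text
    return out


def xml_to_proprietary(root):
    """Adapter for a legacy receiver: XML -> 'key=value;...' (made-up format)."""
    return ";".join(f"{child.tag}={child.text}" for child in root)


def pipeline(*steps):
    """Compose transformation steps, applied from left to right."""
    def run(message):
        for step in steps:
            message = step(message)
        return message
    return run


# XML-native applications: one transformation.
native = pipeline(sender_xml_to_receiver_xml)
# Non-native applications: additional transformations on both sides.
non_native = pipeline(proprietary_to_xml, sender_xml_to_receiver_xml, xml_to_proprietary)

result = non_native("SIN|LHR|2003-01-01")
print(result)
```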
Business to Business Integration (B2Bi)
B2Bi can be thought of as similar to A2Ai, except that it operates outside a corporation's firewall and integrates applications in different corporations. It raises a number of security and data integrity issues, which are addressed by XML Signature and XML Encryption.
Figure 6.3-4 B2Bi example
XClustXS
XClustXS can serve as a transformation tool that maps one XML source to another as part of the data translation process. Let us continue with the airline example from Section 6.1. The business process model can be seen as a high-volume information transaction system with Information Subscribers (like Expedia) and Information Distributors (like Singapore Airlines).
Figure 6.3-5 Information Subscriber/Distributor Business Process Model [RNET]
If there were only one airline, things would be very simple: the subscriber and the provider could easily align their schemas. However, there are hundreds of information providers in this case, which brings data translation into the picture. Suppose myOrg represents an organisation like Expedia, with Org1….Org n being information providers such as the airlines.
Figure 6.3-6 Data Translation
myOrg would need to connect seamlessly to the providers and give accurate results to its customers. XClustXS can be used to map and cluster a diverse set of schemas and ultimately integrate them, for instance to enable faster queries. It can also be used to monitor changes as they occur in an information provider's schema; for example, if an airline changes its Schemas, the subscriber needs to update them as well. Such clusters may be resolved into a single Schema, or each cluster may be handled individually. For example, consider the following Schema fragments:
Figure 6.3-7 Example Schema Samples and Integration
   <xs:element name="flights_search">
      <xs:complexType>
         <xs:sequence>
           <xs:element name="from" type="xs:string"/>
           <xs:element name="to" type="xs:string"/>
           <xs:element name="departing" type="xs:date"/>
           <xs:element name="returning" type="xs:date"/>
           <xs:element name="adults" type="xs:int"/>
           <xs:element name="children" type="xs:int"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
   <xs:element name="flights_search">
      <xs:complexType>
         <xs:sequence>
           <xs:element name="from" type="xs:string"/>
           <xs:element name="to" type="xs:string"/>
           <xs:element name="depart_dtm" type="xs:date"/>
           <xs:element name="return_dtm" type="xs:date"/>
           <xs:element name="who">
              <xs:complexType>
                 <xs:sequence>
                    <xs:element name="adults" type="xs:int"/>
                    <xs:element name="children" type="xs:int"/>
                    <xs:element name="infants" type="xs:int"/>
                 </xs:sequence>
              </xs:complexType>
           </xs:element>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
   <xs:element name="flight_search">
      <xs:complexType>
         <xs:sequence>
           <xs:element name="per_person" type="xs:decimal"/>
           <xs:element name="airline" type="xs:string"/>
           <xs:element name="when_where">
              <xs:complexType>
                 <xs:sequence>
                    <xs:element name="from" type="xs:string"/>
                    <xs:element name="to" type="xs:string"/>
                    <xs:element name="departureDate" type="xs:date"/>
                    <xs:element name="returnDate" type="xs:date"/>
                 </xs:sequence>
              </xs:complexType>
           </xs:element>
           <xs:element name="stop" type="xs:string"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
 
Integrated Schema
   <xs:element name="flight_search">
      <xs:complexType>
         <xs:sequence>
           <xs:element name="per_person" type="xs:decimal"/>
           <xs:element name="airline" type="xs:string"/>
           <xs:element name="when_where">
              <xs:complexType>
                 <xs:sequence>
                    <xs:element name="from" type="xs:string"/>
                    <xs:element name="to" type="xs:string"/>
                    <xs:element name="departure" type="xs:date"/>
                    <xs:element name="return" type="xs:date"/>
                 </xs:sequence>
              </xs:complexType>
           </xs:element>
           <xs:element name="who">
              <xs:complexType>
                 <xs:sequence>
                    <xs:element name="adults" type="xs:int"/>
                    <xs:element name="children" type="xs:int"/>
                    <xs:element name="infants" type="xs:int"/>
                 </xs:sequence>
              </xs:complexType>
           </xs:element>
           <xs:element name="stop" type="xs:string"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
These examples (from our flight domain test cases) can easily be reconciled into a larger Schema. XClustXS thus has numerous applications, creating value for semi-structured data storage, communication, query and retrieval systems.
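Once an integrated Schema exists, instance documents from a source can be translated into its structure with a path mapping. The sketch below does this for the second source fragment. The mapping table is written by hand for illustration and is an assumption, not output produced by XClustXS, and the sample values are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: path in the second source fragment -> path in the
# integrated Schema (both relative to the root element).
PATH_MAP = {
    "from": "when_where/from",
    "to": "when_where/to",
    "depart_dtm": "when_where/departure",
    "return_dtm": "when_where/return",
    "who/adults": "who/adults",
    "who/children": "who/children",
    "who/infants": "who/infants",
}


def ensure_path(root, path):
    """Return the element at `path` below root, creating missing elements."""
    node = root
    for tag in path.split("/"):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node


def translate(source_root):
    """Copy mapped values from a source instance into the integrated layout."""
    target = ET.Element("flight_search")
    for source_path, target_path in PATH_MAP.items():
        source_node = source_root.find(source_path)
        if source_node is not None:
            ensure_path(target, target_path).text = source_node.text
    return target


source = ET.fromstring(
    "<flights_search><from>SIN</from><to>LHR</to>"
    "<depart_dtm>2003-01-01</depart_dtm><return_dtm>2003-01-15</return_dtm>"
    "<who><adults>2</adults><children>1</children><infants>0</infants></who>"
    "</flights_search>"
)
integrated = translate(source)
print(ET.tostring(integrated, encoding="unicode"))
```

The result is not validated against the integrated Schema. Elements that this source does not supply (per_person, airline and stop) are simply left out, so a real translation step would need a policy for such missing values.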
