Monday, March 8, 2010

Mapping XML data to database data

Mapping XML data to database data 


Introduction


When working with XML it is often the need to store key values into a database.  The reasons may vary but include running reports or building interfaces over the data.  Imagine a purchase order system, where an Order represented in XML comes through your system, a response comes back from an external party and later an invoice is returned.  There are many other related documents that could be included in one such business transaction.  There could be many asynchronous XML messages flowing over time that in the end all relate to form a single trading transaction; ie fulfillment of goods or services.  Some sort of reporting system will need to be built around that.  Building such a system on top of raw XML is not an option it would be too difficult, slow and cumbersome.  In such a case it is common to extract key pieces of the XML into database tables.  The may be many other use cases for extraction data from XML but the focus of this will be on the plain old order process (POOP!)

How your design turns out will be dependent on your needs and the source and target data.  Some differences will apply such as:
  1. XML data may hold a lot more data than is required for the database
  2. XML data can have a very deep hierarchy
  3. Table data will be most likely flattened down
  4. Table data may be more generic, ie XML may have Order, ShipmentNotification and Invoice as different definitions but they may be stored in the same table(s)
  5. Storing into the tables gives the opportunity to relate easily all the XML documents

Using Java libraries we can make this job pain free (disclaimer: we are playing with technology here so expect a little ouch!) .  Note: I expect .Net can do similar or even easier using LINQ but I have not yet explored that far yet

By example I use the following frameworks:
  1. JAXB to turn XML data into the object domain
  2. Cayenne an Object relational framework to go from the object domain to database tables
  3. Dozer an object mapping framework to map from JAXB to Cayenne objects.  The use of this is optional but can speed up development, offers great flexibility and improve maintenance tasks.

By using these 3 we don't have to write any XML and database manipulation code.  This could cut down development time, improve reliability and reduce maintenance tasks by many factors.

Let get into to it by using real word examples.

JAXB - Map XML to Objects

Version used 2.2.

JAXB as of JDK1.6 is included with the JDK making it a convenient choice.  Of course there are choices to this in the Java world such as XMLBeans

In my experience (and carrying on from the intro Order example) one of the largest XML schemas I have worked with is XCBL.  This is a large and comprehensive free set of procurement XML schemas.

Whatever the schema's you are using (or not) you will need to have JAXB generate Java source for you based on the schema.  If you only have XML and dont have a schema you can generate them for you based on XML using a tool such as OxygenXML.

So elts generate a bunch of Java source code based  on the schema.  Assuming you are using xcbl and have the schemas extracted into a folder run (Java installed is a great option too for xjc - XML Java compiler):

xjc -d java -p com.mycompany.jaxb.xcbl xcbl30/XCBL30.xsd -verbose

Xcbl is a complex set of schemas.  The top-level schema XCBL30.xsd includes all other top level schemas and in turn the other include other schemas.  After a moment of running you will see output like:

parsing a schema...
compiling a schema...
[INFO] generating code
unknown location

com\mycompany\jaxb\xcbl30\ASNBaseItemDetail.java
com\mycompany\jaxb\xcbl30\ASNDates.java
com\mycompany\jaxb\xcbl30\ASNDetail.java
com\mycompany\jaxb\xcbl30\ASNHeader.java
com\mycompany\jaxb\xcbl30\ASNItemDetail.java
...etc


The Java generated for Order will be:

//
// This file was generated by the JavaTM Architecture for XML Binding(JAXB) Reference Implementation, vJAXB 2.1.10 in JDK 6 
// See <a href="http://java.sun.com/xml/jaxb">http://java.sun.com/xml/jaxb</a> 
// Any modifications to this file will be lost upon recompilation of the source schema. 
// Generated on: 2010.03.05 at 02:59:26 PM EST 
//


package com.mycompany.jaxb.xcbl30;

import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlType;


/**
 * <p>Java class for Order complex type.
 * 
 * <p>The following schema fragment specifies the expected content contained within this class.
 * 
 * <pre>
 * &lt;complexType name="Order">
 *   &lt;complexContent>
 *     &lt;restriction base="{http://www.w3.org/2001/XMLSchema}anyType">
 *       &lt;sequence>
 *         &lt;element ref="{rrn:org.xcbl:schemas/xcbl/v3_0/xcbl30.xsd}OrderHeader"/>
 *         &lt;element ref="{rrn:org.xcbl:schemas/xcbl/v3_0/xcbl30.xsd}OrderDetail" minOccurs="0"/>
 *         &lt;element ref="{rrn:org.xcbl:schemas/xcbl/v3_0/xcbl30.xsd}OrderSummary" minOccurs="0"/>
 *       &lt;/sequence>
 *     &lt;/restriction>
 *   &lt;/complexContent>
 * &lt;/complexType>
 * </pre>
 * 
 * 
 */
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "Order", propOrder = {
    "orderHeader",
    "orderDetail",
    "orderSummary"
})
public class Order {

    @XmlElement(name = "OrderHeader", required = true)
    protected OrderHeader orderHeader;
    @XmlElement(name = "OrderDetail")
    protected OrderDetail orderDetail;
    @XmlElement(name = "OrderSummary")
    protected OrderSummary orderSummary;

    /**
     * Gets the value of the orderHeader property.
     * 
     * @return
     *     possible object is
     *     {@link OrderHeader }
     *     
     */
    public OrderHeader getOrderHeader() {
        return orderHeader;
    }

    /**
     * Sets the value of the orderHeader property.
     * 
     * @param value
     *     allowed object is
     *     {@link OrderHeader }
     *     
     */
    public void setOrderHeader(OrderHeader value) {
        this.orderHeader = value;
    }

    /**
     * Gets the value of the orderDetail property.
     * 
     * @return
     *     possible object is
     *     {@link OrderDetail }
     *     
     */
    public OrderDetail getOrderDetail() {
        return orderDetail;
    }

    /**
     * Sets the value of the orderDetail property.
     * 
     * @param value
     *     allowed object is
     *     {@link OrderDetail }
     *     
     */
    public void setOrderDetail(OrderDetail value) {
        this.orderDetail = value;
    }

    /**
     * Gets the value of the orderSummary property.
     * 
     * @return
     *     possible object is
     *     {@link OrderSummary }
     *     
     */
    public OrderSummary getOrderSummary() {
        return orderSummary;
    }

    /**
     * Sets the value of the orderSummary property.
     * 
     * @param value
     *     allowed object is
     *     {@link OrderSummary }
     *     
     */
    public void setOrderSummary(OrderSummary value) {
        this.orderSummary = value;
    }

}




Simple enough?

So now we have a set of Java sources ready to be populated with data.  JAXB and other similar technologies call this pushing of XML data into Objects "unmarshal".  Similarly going from Objects back to XML (as a String say) is  a "marshal" operation.

For this exercise we are only going from XML to Objects (and then to database) but not back.

So to unmarshal from an XML String:


        StringReader stringReader = new StringReader(someXMLString);
     StreamSource source = new StreamSource(stringReader);
   
     Order order = JAXB.unmarshal(source, Order.class);



That is it.  Make sure your input XML does match the schema including namespaces (do it all by the book) or it will not work...such as:

<?xml version="1.0" encoding="utf-8"?>
<Order xmlns="rrn:org.xcbl:schemas/xcbl/v3_0/xcbl30.xsd"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="rrn:org.xcbl:schemas/xcbl/v3_0/xcbl30.xsd XCBL30.xsd">
    <OrderHeader>
        <OrderNumber>
            <BuyerOrderNumber>
etc etc...


So now we have objects containing data that represent the input XML.  In code we can now query the values that have been unmarshal'd such as:

order.getOrderHeader().getOrderNumber().getBuyerOrderNumber()
      
List<ItemDetail> itemDetails = order.getOrderDetail().getListOfItemDetail().getItemDetail();

All very nice, something big, simplified.

Lets turn to the database layer.


Cayenne - Objects to database

Version used 3.0RC2

Cayenne and other Object relational tools provide a great abstraction from mundane database and JDBC operations.  I like Cayenne and have used it on a number of projects with great success and very little issues to overcome along the way.  I must say I have never used it to its fullest.  It has a rich feature set, great support community and I believe a bright future.

Features at a glance:
  1. Reverse engineer a database to Cayenne objects
  2. Database neutral
  3. Has a great modelling tool
  4. insert data - create new objects, call set methods and commit
  5. update data - select existing objects, call set methods and commit
  6. delete data - select existing objects, call delete
  7. select data - methods to choose from depending on complexity of  select, simple object graph selection to EJBQL
  8. Remote Object Persistence
  9. Caching strategies
  10. XML serialization

and much more.

There are 2 options for getting your Cayenne project off the ground, which I have both used succesfully.
  1. Building your database schema from scratch
  2. Reverse engineering an existing schema

I have another set of tutorials/examples to work from for more detail on this as does the Cayenne community but in a nutshell, lets assume we will create a Cayenne dataset from scratch.

Assuming you have downloaded Cayenne, fired up the modeler, have connected to a database and had a quick look around (ie you know a trick or 2) lets create a a very basic couple of entities to save our XML values into.

The basic structure is a top level entity 'Document' which has:
  1. 1 and only 1 'Heade'r entity
  2. 0 to many 'Item' entities
  3. 1 and only 1 'Summary' entity 

I don't want to go into much Cayenne specifics here so here are a couple of screenshots of the entities:

Document entity:



The generated Document class has no attributes only relationships:

Item class attributes:


Basic Cayenne use
Obtain a "DataContext", this can be done a number of ways depending on your application, the important thing is that it can find your cayenne.xml file in the classpath.  There are a few ways to do this.

Main method/standalone app:
DataContext dataContext = DataContext.createDataContext();

Web app (using Apache Click):
 <filter>
        <filter-name>DataContextFilter</filter-name>
        <filter-class>org.apache.click.extras.cayenne.DataContextFilter</filter-class>
        <!--<filter-class>com.quadrem.common.click.cayenne.ObjectContextFilter</filter-class>-->
        <init-param>
            <param-name>session-scope</param-name>
            <param-value>true</param-value>
        </init-param>
        <init-param>
            <param-name>shared-cache</param-name>
            <param-value>true</param-value>
        </init-param>


Spring :
    <bean id="cayenneConfiguration" class="org.apache.cayenne.conf.DefaultConfiguration"
          init-method="initializeSharedConfiguration" >        
    </bean>


    <bean id="context" class="org.apache.cayenne.access.DataContext"
          factory-method="createDataContext" depends-on="cayenneConfiguration">
        <constructor-arg value="Domain"/>
    </bean>




Once we have a handle on a DataContext we can create (insert will be generated for you):
Header header = new Header();
header.setDocNumber("1234");
header.setDocument(document);
dataContext.registerNewObject(header);
dataContext.commitChanges();
                                                                                

Basic select:

final Expression where = ExpressionFactory.matchExp(Header.DOC_NUMBER_PROPERTY, "1234");

SelectQuery query = new SelectQuery(Header.class, where);
List values = dataContext.performQuery(query);


Update:

// select an object as per above ...
Header header = (Header) values.get(0);
header.setDocNumber("2222");
dataContext.commitChanges();

Delete:

// select an object as per above ...
Header header = (Header) values.get(0);
dataContext.deleteObject(header);
dataContext.commitChanges();



A lot of power without even writing any SQL and that has just scratched the surface.  And the really nice thing is you can switch to whatever database you need and only have to change a config value or 2, the DB specific syntax Cayenne does for you.

So now it would be possible to manually map the JAXB classes to Cayenne all in Java, but that could become tedious.

Dozer - Object to Object mapping

Version used: 5.2.0

Dozer is a framework that provides a simple configurable way of mapping objects.  If the objects methods or attributes being mapped have the same name then mapping is automatic.  If not then you can declare what is mapped to where.  I admit in mapping a deep XML tree to the database classes above you loose a lot of the automatic mapping power which does diminish the value of Dozer a little in this case.  However you are still making the mapping more of a config exercise rather than hardcoding.  At the time of writing a Dozer GUI tool was no longer available but I believe some work was being done to restore it, such a task lends itself well to a GUI tool.

3 steps are required to get going with Dozer:
  1. Obtain a Dozer mapper class
  2. Write your mapping
  3. Run the mapping with your instantiated classes

Obtain Dozer mapper:
In code:
Mapper mapper = new DozerBeanMapper();


Using Spring:

    <bean id="dataExtractMapper" class="org.dozer.DozerBeanMapper">
        <property name="mappingFiles">
            <list>
                <value>dozer-global-configuration.xml</value>
                <value>dozer-order-mappings.xml</value>
            </list>
        </property>
    </bean>


Write a mapping:



    <mapping wildcard="false" type="one-way">
        <class-a>com.myco.jaxb.xcbl30.OrderHeader</class-a>
        <class-b>com.myco.model.Header</class-b>
        <field>
            <a>listOfStructuredNote.structuredNote[0].generalNote</a>
            <b>description</b>
        </field>
        <field>
            <a>orderNumber.buyerOrderNumber</a>
            <b>docNumber</b>
        </field>
        <field>
            <a>orderDates.promiseDate</a>
            <b>dueDate</b>
        </field>
        <field>
            <a>orderIssueDate</a>
            <b>orderDate</b>
        </field>
        <field>
            <a>orderType.orderTypeCoded</a>
            <b>xcblOrderType</b>
        </field>
    </mapping>


Run the mapping:



Header header = new Header();

StringReader stringReader = new StringReader(someXMLString);
StreamSource source = new StreamSource(stringReader);
Order order = JAXB.unmarshal(source, Order.class);

getBeanMapper().map(order.getOrderHeader(), header);
dataContext.registerNewObject(header);


List<ItemDetail> itemDetails = order.getOrderDetail().getListOfItemDetail().getItemDetail();
for (ItemDetail itemDetail : itemDetails) {
    Item item = new Item();
    dataContext.registerNewObject(item);
    item.setDocument(document);
    getBeanMapper().map(itemDetail, item);
}

dataContext.commitChanges();



Studying the above mapping process notice that it is driven from the data classes.  ie the target data drives the process of mapping.

No comments: