Tuesday, June 11, 2013

xsd:sequence Example with Talend Open Studio


To process an xsd:sequence of XML elements in Talend Open Studio, set the Loop XPath Query to the sequence element and define relative XPaths for the remaining elements.

An xsd:sequence is an ordered list of XML elements. The bounds of the list can be specified in an XSD. The list can be unbounded (maxOccurs="unbounded"), in which case XML processors must be built to handle any number of elements. If the sequence is bounded, you can alternatively simplify and flatten the data structure. For example, elements street_address_1 and street_address_2 may be easier to handle than multiple address/street elements.

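In this XML document, each 'person' contains a sequence of one or more 'tel' elements.

   <person>
    <name>Alan</name>
    <tel>02087654321</tel>
    <tel>07654321098</tel>
   </person>

   <person>
    <name>Bill</name>
    <tel>02078901234</tel>
    <tel>07890123456</tel>
   </person>

   <person>
    <name>Chas</name>
    <tel>02066666666</tel>
   </person>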


Array Syntax
 
A possible mapping for the person elements is to define a loop on 'person', map a 'name' XPath, and map each possible 'tel' element using an array index-like syntax: tel[1] or tel[position()=1]. This implies that your schema sets a bound that may or may not adhere to the specification. For example, if I define only 'tel1' and 'tel2' and a system sends home, office, and mobile numbers, one will be omitted.
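The index-style mapping can be tried outside Talend. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree, not the Talend component itself. The 'people' root element and the function name map_by_index are assumptions made for illustration; only 'person', 'name', and 'tel' come from the document above.

```python
import xml.etree.ElementTree as ET

# Sample data from the post; the <people> root element is an assumption.
XML = """
<people>
  <person><name>Alan</name><tel>02087654321</tel><tel>07654321098</tel></person>
  <person><name>Bill</name><tel>02078901234</tel><tel>07890123456</tel></person>
  <person><name>Chas</name><tel>02066666666</tel></person>
</people>
"""


def map_by_index(xml_text, max_tels=2):
    """Loop on 'person' and read tel[1]..tel[max_tels] as fixed columns."""
    root = ET.fromstring(xml_text)
    rows = []
    for person in root.findall("person"):
        row = {"name": person.findtext("name")}
        for i in range(1, max_tels + 1):
            # tel[i] selects the i-th 'tel' child; a missing one yields None.
            row[f"tel{i}"] = person.findtext(f"tel[{i}]")
        rows.append(row)
    return rows


rows = map_by_index(XML)
for row in rows:
    print(row)
```

With max_tels fixed at 2, a third 'tel' element on any person would be silently dropped, which is the limitation described above. A person with fewer numbers, like Chas, gets an empty column instead.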

The following Talend Open Studio job reads the XML source and writes to a log target.

[Figure: XML Source to Log Target Job]

The following tFileInputXML configuration maps person/name and the first two person/tel elements to the target.

[Figure: Individually Mapping Elements]

General Solution

A more versatile approach moves the loop definition onto 'tel' and maps the other XML elements ('name') with relative XPaths such as ../name. This seems a bit counter-intuitive, as I wrote in "XPath Loops in Talend Open Studio". After all, 'tel' is just one field in the more important 'person' data structure.
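The same idea can be sketched in Python with xml.etree.ElementTree, again as an illustration and not as the Talend implementation. The 'people' root element and the function name map_by_tel_loop are assumptions. ElementTree elements have no parent pointer, so the sketch builds a child-to-parent map to stand in for a relative path like ../name.

```python
import xml.etree.ElementTree as ET

# Sample data from the post; the <people> root element is an assumption.
XML = """
<people>
  <person><name>Alan</name><tel>02087654321</tel><tel>07654321098</tel></person>
  <person><name>Bill</name><tel>02078901234</tel><tel>07890123456</tel></person>
  <person><name>Chas</name><tel>02066666666</tel></person>
</people>
"""


def map_by_tel_loop(xml_text):
    """Loop on every 'tel' and look up 'name' relative to it."""
    root = ET.fromstring(xml_text)
    # Map each child element to its parent to emulate the ".." step.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    rows = []
    for tel in root.iter("tel"):
        person = parent_of[tel]
        # Equivalent in spirit to the relative XPath ../name
        rows.append((person.findtext("name"), tel.text))
    return rows


rows = map_by_tel_loop(XML)
for name, tel in rows:
    print(name, tel)
```

Each 'tel' becomes its own row with the person's name repeated, so the result holds however many numbers a person has.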

[Figure: Building a Loop Around a Sequence]

Running the job produces the following output.

[Figure: Running Xpath-based Job with Loop on Sequence]

Setting the loop on the right element is the key to creating an XML processing job in Talend Open Studio.  Sometimes the "right element" is a field contained in a more significant data structure.  It's counter-intuitive, but after a few jobs, you'll be able to create the correct tFileInputXML configuration.
