Thursday, June 13, 2013

Using XPaths for XML Input in Talend Open Studio


If you have an XML document based on a schema that requires transformation, consider using XPaths in Talend Open Studio to flatten the hierarchical file for loading.
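Consider the following XML fragment.

    <hrediattributes>
      <hrediattribute name="patientId" value="111111"/>
      <hrediattribute name="firstName" value="Carl"/>
    </hrediattributes>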


The fragment has no "patientId" or "firstName" elements.  Those names appear only as values of the "name" attribute.

XPaths can be used to flatten this by mapping the "value" attribute of each "hrediattribute" element to a different column based on the element's "name" attribute.

For patientId, the XPath would be

hrediattributes/hrediattribute[@name='patientId']/@value

And for firstName, the XPath would be

hrediattributes/hrediattribute[@name='firstName']/@value
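Outside of Talend, the same lookup can be tried with a few lines of Python.  This is only a sketch: the sample fragment is assumed from the XPaths above, and attribute_value is a helper name made up for illustration.  Python's ElementTree understands the [@name='...'] predicate but not a trailing /@value step, so the attribute is read with .get() once the element is found.

```python
import xml.etree.ElementTree as ET

# Assumed sample data: element and attribute names follow the XPaths above.
FRAGMENT = """
<hrediattributes>
  <hrediattribute name="patientId" value="111111"/>
  <hrediattribute name="firstName" value="Carl"/>
</hrediattributes>
"""


def attribute_value(parent, name):
    """Return the "value" attribute of the hrediattribute child with this name.

    Hypothetical helper: it stands in for the XPath
    hrediattribute[@name='<name>']/@value evaluated relative to parent.
    """
    node = parent.find(f"hrediattribute[@name='{name}']")
    return None if node is None else node.get("value")


# The root element here is hrediattributes, so the paths start one level down.
root = ET.fromstring(FRAGMENT)
patient_id = attribute_value(root, "patientId")
first_name = attribute_value(root, "firstName")
print(patient_id, first_name)
```

A name that matches no hrediattribute element returns None, which is a quick way to catch a misspelled or unquoted name in the predicate.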

If the hrediattribute elements have their values in the element body (for example, 11111) rather than in a "value" attribute, drop the trailing /@value from the XPath.

File XML in Talend Open Studio

The following screenshot is from Talend Open Studio's File XML Wizard.  The File XML is used as an Input XML and is available here.

To test this, I dragged the File XML onto the canvas as an input and hooked up a tLogRow.  Here are the results from the run.

Starting job HREDIXMLFile at 08:47 30/05/2011.
[statistics] connecting to socket on port 3547
[statistics] connected
111111|Carl
222222|Joe
[statistics] disconnected
Job HREDIXMLFile ended at 08:47 30/05/2011. [exit code=0]
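For readers without TOS at hand, here is a rough Python approximation of what the job does: loop over a repeating element, evaluate one path per column, and print pipe-delimited rows the way tLogRow does.  The patients wrapper element and the layout of the second record are assumptions, since the input file is not reproduced here.  COLUMNS and flatten are names made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the <patients> wrapper and record layout are assumptions.
DOCUMENT = """
<patients>
  <hrediattributes>
    <hrediattribute name="patientId" value="111111"/>
    <hrediattribute name="firstName" value="Carl"/>
  </hrediattributes>
  <hrediattributes>
    <hrediattribute name="patientId" value="222222"/>
    <hrediattribute name="firstName" value="Joe"/>
  </hrediattributes>
</patients>
"""

# One output column per "name" attribute value, in output order.
COLUMNS = ["patientId", "firstName"]


def flatten(xml_text):
    """Return one pipe-delimited row per hrediattributes element."""
    rows = []
    root = ET.fromstring(xml_text)
    # hrediattributes plays the role of the loop element.
    for record in root.iter("hrediattributes"):
        row = []
        for column in COLUMNS:
            node = record.find(f"hrediattribute[@name='{column}']")
            # A missing attribute becomes an empty column instead of failing.
            row.append("" if node is None else node.get("value", ""))
        rows.append("|".join(row))
    return rows


rows = flatten(DOCUMENT)
for line in rows:
    print(line)
```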


If you need to transform your XML document, consider using XPaths in Talend Open Studio rather than a separate XSL stylesheet.  Although you can call an XSL transformation from TOS, doing so won't take advantage of TOS's browsing and dependency checking.

Specifying an XSD

Although the File XML wizard is labeled "File Settings / XML" (TOS 4.2.1), an XSD can be entered.  The XSD must be a local file.  However, make sure that any references within the XSD are web resources and not local files.  If the XSD imports another XSD namespace, the schemaLocation should point to something accessible on the web and not to another local file.

A Second Example

If some of the enclosing parent elements have data that needs to be mapped, additional XPaths are required.  Because these XPaths reference elements outside of the loop, the relative XPaths in the first example won't work without backing up (../..) or using absolute paths.

This example also needs transId mapped.  An absolute path selecting all transIds is used (//@transId).




111111
Carl


The full XML file is here.


To select specific instances of hredielement, additional attribute selectors are used:

hredielement[@name='patient']/hrediattributes/hrediattribute[@name='patientId']
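The same selection can be sketched in Python.  The transaction root element and the transId value "T-1" are placeholders invented for this sketch, because the full XML file is not reproduced here.  Only the hredielement, hrediattributes and hrediattribute names come from the XPath above.  ElementTree cannot select attributes in a path, so a scan over all elements stands in for //@transId.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <transaction> and the transId value are made up.
DOCUMENT = """
<transaction transId="T-1">
  <hredielement name="patient">
    <hrediattributes>
      <hrediattribute name="patientId">111111</hrediattribute>
      <hrediattribute name="firstName">Carl</hrediattribute>
    </hrediattributes>
  </hredielement>
</transaction>
"""

root = ET.fromstring(DOCUMENT)

# Stand-in for //@transId: collect the attribute from every element that has it.
trans_ids = [el.get("transId") for el in root.iter() if el.get("transId") is not None]

# The values live in the element body, so findtext() is used and there is
# no "value" attribute to read.
base = "hredielement[@name='patient']/hrediattributes/hrediattribute"
patient_id = root.findtext(f"{base}[@name='patientId']")
first_name = root.findtext(f"{base}[@name='firstName']")

row = "|".join([trans_ids[0], patient_id, first_name])
print(row)
```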

This is a screenshot of the mappings entered into File XML metadata.

Additional XPaths Example
Loop Element

The position of the loop element will determine the repeating rows.  Using a loop element on "contact" on the following XML







 

 
produces

HUXLEY INDUSTRIES, INC.|David|King
HUXLEY INDUSTRIES, INC.|Sybil|Bedford


with the companyName repeating.  If the loop element is "company" instead (and companyName, firstName, and lastName are mapped), then only the first row (with "David King") would be displayed.
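The effect of the loop element can be imitated in Python.  The XML layout below (a companies wrapper, with companyName, firstName and lastName as child elements) is an assumption, and both function names are made up for this sketch.  Looping on company is modeled with findtext(), which returns only the first match; that mirrors the behavior described above but is not a statement about how TOS is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: only the element names mentioned in the text are known.
DOCUMENT = """
<companies>
  <company>
    <companyName>HUXLEY INDUSTRIES, INC.</companyName>
    <contact><firstName>David</firstName><lastName>King</lastName></contact>
    <contact><firstName>Sybil</firstName><lastName>Bedford</lastName></contact>
  </company>
</companies>
"""


def rows_looping_on_contact(root):
    """One row per contact; the parent's companyName repeats on each row."""
    rows = []
    for company in root.iter("company"):
        name = company.findtext("companyName")
        for contact in company.iter("contact"):
            rows.append("|".join([
                name,
                contact.findtext("firstName"),
                contact.findtext("lastName"),
            ]))
    return rows


def rows_looping_on_company(root):
    """One row per company; findtext() keeps only the first contact."""
    rows = []
    for company in root.iter("company"):
        rows.append("|".join([
            company.findtext("companyName"),
            company.findtext("contact/firstName"),
            company.findtext("contact/lastName"),
        ]))
    return rows


root = ET.fromstring(DOCUMENT)
contact_rows = rows_looping_on_contact(root)
company_rows = rows_looping_on_company(root)
print(contact_rows)
print(company_rows)
```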
