Tuesday, June 11, 2013

XPath Functions in Talend Open Studio

XPath has standard functions like fn:name() that will retrieve the name of an XML element.  You can use these in Talend Open Studio when certain parts of your XML aren't known in advance.


In this example, an XML document contains nodes representing filenames that aren't known in advance: APIUSER.INI, CISConsis.ini, CLAIM.INI.

Define the Loop

The first step in processing this file with Talend Open Studio is to define the loop.  In this case, the innermost element, 'ROW', establishes the granularity.  I expect each of the four ROW elements to generate a record in the main flow of the job.

The attributes of the ROW element, FieldName and Value, are each mapped using a single attribute selector (@FieldName and @Value).
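The loop and the two attribute mappings can be tried outside Talend with the JDK's built-in XPath engine. This is only a sketch under assumptions: the root element CONFIG, the section names, and all attribute values are invented for illustration, and the loop path /CONFIG/*/Section/ROW is a guess at the document's shape, with * standing in for the filename elements that aren't known in advance.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RowLoopSketch {
    // Hypothetical sample: the root name, section names, and values are invented.
    static final String SAMPLE_XML =
          "<CONFIG>"
        + "<APIUSER.INI><Section Name='Login'>"
        + "<ROW FieldName='User' Value='admin'/>"
        + "<ROW FieldName='Timeout' Value='30'/>"
        + "</Section></APIUSER.INI>"
        + "<CISConsis.ini><Section Name='Paths'>"
        + "<ROW FieldName='Root' Value='/data'/>"
        + "</Section></CISConsis.ini>"
        + "<CLAIM.INI><Section Name='Defaults'>"
        + "<ROW FieldName='Currency' Value='USD'/>"
        + "</Section></CLAIM.INI>"
        + "</CONFIG>";

    static List<String[]> readRows(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // The loop: one node per ROW, regardless of the filename element above it.
        NodeList loop = (NodeList) xpath.evaluate(
            "/CONFIG/*/Section/ROW", doc, XPathConstants.NODESET);

        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < loop.getLength(); i++) {
            Node row = loop.item(i);
            // Field mappings are evaluated relative to the current loop node.
            rows.add(new String[] {
                xpath.evaluate("@FieldName", row),
                xpath.evaluate("@Value", row)
            });
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (String[] r : readRows(SAMPLE_XML)) {
            System.out.println(String.join("|", r));
        }
    }
}
```

With the invented sample, the loop yields four records, one per ROW, which is the granularity described above.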

Job Processing File with Unknown XML Elements

Add in Context

The parent elements -- Section, APIUSER.INI, etc. -- give the ROW element context and help differentiate one part's ROW from another's.  Section Name is mapped using an attribute selector, but with a parent reference (the relative ../).  Section Name will appear in each of the ROW records.  Since there are only three sections for four ROW elements, one of the Section Names will be repeated.

Mapping the Unknown XML

APIUSER.INI, CISConsis.ini, and CLAIM.INI are not known in advance, but they can be mapped using the same relative technique used for Section.  However, they require the XPath function name() to provide actual data, since there is no identifying Name attribute the way there is with Section.
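Both context mappings can be checked with the JDK's XPath engine. The sketch below assumes each ROW sits inside a Section, which in turn sits directly inside the filename element, so ../@Name reaches the Section Name and name(../..) returns the grandparent's element name. The root element CONFIG, the section names, and the values are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContextMappingSketch {
    // Hypothetical sample: the root name, section names, and values are invented.
    static final String SAMPLE_XML =
          "<CONFIG>"
        + "<APIUSER.INI><Section Name='Login'>"
        + "<ROW FieldName='User' Value='admin'/>"
        + "<ROW FieldName='Timeout' Value='30'/>"
        + "</Section></APIUSER.INI>"
        + "<CISConsis.ini><Section Name='Paths'>"
        + "<ROW FieldName='Root' Value='/data'/>"
        + "</Section></CISConsis.ini>"
        + "<CLAIM.INI><Section Name='Defaults'>"
        + "<ROW FieldName='Currency' Value='USD'/>"
        + "</Section></CLAIM.INI>"
        + "</CONFIG>";

    static List<String[]> readRecords(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList loop = (NodeList) xpath.evaluate(
            "/CONFIG/*/Section/ROW", doc, XPathConstants.NODESET);

        List<String[]> records = new ArrayList<>();
        for (int i = 0; i < loop.getLength(); i++) {
            Node row = loop.item(i);
            records.add(new String[] {
                xpath.evaluate("name(../..)", row), // element name of the grandparent
                xpath.evaluate("../@Name", row),    // Name attribute of the parent Section
                xpath.evaluate("@FieldName", row),
                xpath.evaluate("@Value", row)
            });
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        for (String[] r : readRecords(SAMPLE_XML)) {
            System.out.println(String.join("|", r));
        }
    }
}
```

Because two of the sample ROW elements share a Section and a filename element, those two context values repeat across the first two records while FieldName and Value differ.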

Results

Running the job produces the following four-record result set.

Output with Filename and Section Repeating Groups

XPath functions are a powerful way to process an XML document with loosely defined or variable elements.  There are a number of string, numeric, and date functions available from XPath (count(), dateTime(), matches(), true()).  However, avoid making the logic of XPaths overly complex.  There are cleaner ways of handling business rules with the data-oriented components.
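As a small illustration of another function, count() can be evaluated the same way with the JDK's XPath engine. This is a sketch on an invented document (the root element CONFIG and all names and values are hypothetical). Note that the JDK's default engine implements XPath 1.0, which includes count() and true(); matches() and dateTime() belong to XPath 2.0 and would need a different engine.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CountSketch {
    // Hypothetical sample: the root name, section names, and values are invented.
    static final String SAMPLE_XML =
          "<CONFIG>"
        + "<APIUSER.INI><Section Name='Login'>"
        + "<ROW FieldName='User' Value='admin'/>"
        + "<ROW FieldName='Timeout' Value='30'/>"
        + "</Section></APIUSER.INI>"
        + "<CISConsis.ini><Section Name='Paths'>"
        + "<ROW FieldName='Root' Value='/data'/>"
        + "</Section></CISConsis.ini>"
        + "<CLAIM.INI><Section Name='Defaults'>"
        + "<ROW FieldName='Currency' Value='USD'/>"
        + "</Section></CLAIM.INI>"
        + "</CONFIG>";

    // Returns {number of Section elements, number of ROW elements}.
    static double[] counts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // XPath numbers are doubles, so NUMBER results come back as Double.
        Double sections = (Double) xpath.evaluate(
            "count(/CONFIG/*/Section)", doc, XPathConstants.NUMBER);
        Double rows = (Double) xpath.evaluate(
            "count(/CONFIG/*/Section/ROW)", doc, XPathConstants.NUMBER);
        return new double[] { sections, rows };
    }

    public static void main(String[] args) throws Exception {
        double[] c = counts(SAMPLE_XML);
        System.out.println("sections=" + c[0] + " rows=" + c[1]);
    }
}
```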
