Tuesday, June 11, 2013

Creating Internal XML Documents in Talend Open Studio

The Talend Open Studio component 'tWriteXMLField' converts a set of input fields into XML, with each row producing its own XML document.  If you'd like to combine several rows into one XML document, create a Document object beforehand and append each row's XML to it with a DOM call in a tJavaRow.

The tWriteXMLField will produce an XML document for each input row.  For example, an input row (firstName=Carl, lastName=Walker) could produce a document like <contact><firstName>Carl</firstName><lastName>Walker</lastName></contact>.  However, you may want to combine several of these "contacts" under one document.

See this post for a basic usage of tWriteXMLField: tWriteXMLField Component in Talend Open Studio.

The Job

This job starts with a tJava that creates a Document, which is then used in the row-oriented processing of the second subjob, which contains a tWriteXMLField.  The second subjob is driven by a tFixedFlowInput, but this could be any input source available to Talend.  Finally, the document as a whole is written to System.out.


Job Appending tWriteXMLField Results to a Global Doc
The tJava_1 component contains Java code that creates an org.dom4j.Document object and appends a rootElement 'players'.

Create a Document Object and add a Root Element
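
The setup step can be sketched outside of Talend as follows.  This is only an illustration under stated assumptions: it uses the JDK's built-in org.w3c.dom API as a stand-in so that it runs without the dom4j jar or the Talend runtime, a plain HashMap stands in for Talend's globalMap, and the key "doc" and the class name are made up for the example.  In the job itself, the tJava would presumably do the same two steps with dom4j, roughly DocumentHelper.createDocument() followed by addElement("players").

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class CreateDocSketch {
    // Plain map standing in for Talend's globalMap (assumption for this sketch).
    static final Map<String, Object> globalMap = new HashMap<>();

    // Creates an empty document with a <players> root element and stores it
    // under the hypothetical key "doc" so that later steps can look it up.
    static Document createPlayersDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        doc.appendChild(doc.createElement("players"));
        globalMap.put("doc", doc);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        createPlayersDocument();
        Document stored = (Document) globalMap.get("doc");
        // Print the root element's name to confirm the document is in place.
        System.out.println(stored.getDocumentElement().getTagName());
    }
}
```

Storing the document once, before the row-oriented subjob starts, is what lets every row append to the same object.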
The tFixedFlowInput_1 is shown here for completeness and isn't required for the XML production.  Any input (RDBMS, file, etc.) will work.

Input for Job
The tWriteXMLField configuration begins with a single 'doc' field.   Here is the Component View of the tWriteXMLField.  In the Advanced Settings, the "Remove the xml declaration" checkbox is set for debugging, but isn't required for this job.

tWriteXMLField
The schema is a single element.
tWriteXMLField Schema
In the XML tree configuration, I map the input to subelements.  The component requires a Loop Element to be set, but it doesn't seem to have any meaning here.  (I think this is because the same XML dialog is reused from other places in Open Studio where the loop is meaningful.)

XML Mappings
Finally, a tJavaRow is used to issue a Java command that will append each input row -- as well-formed XML -- to the global document.  This one-liner pulls the Document from the globalMap, gets its root element ('players'), and appends the root element of the row's XML.

Appending Row XML Subtree to Global Document
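
For readers who want to try the append step outside of Talend, here is a minimal standalone sketch.  It is not the tJavaRow code from the job: it uses the JDK's org.w3c.dom API instead of dom4j so that it runs with no extra jars, and the element name 'player', the second sample row, and the class and method names are all hypothetical.  With dom4j, the corresponding calls would be along the lines of DocumentHelper.parseText(...) to parse the row's XML, getRootElement() on both documents, and add(...) to attach the row's element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AppendRowSketch {
    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Parses one row's XML string and appends its root element under the
    // root element of the shared document.
    static void appendRow(Document global, String rowXml) throws Exception {
        Document rowDoc = newBuilder().parse(new InputSource(new StringReader(rowXml)));
        // W3C DOM nodes belong to their owner document, so the row's subtree
        // is imported (deep copy) before it is attached.
        Node imported = global.importNode(rowDoc.getDocumentElement(), true);
        global.getDocumentElement().appendChild(imported);
    }

    // Serializes the whole document to a string without the XML declaration.
    static String toXml(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Shared document with a <players> root, as created at the start of the job.
        Document global = newBuilder().newDocument();
        global.appendChild(global.createElement("players"));

        // Each string plays the role of one row's tWriteXMLField output (made-up data).
        appendRow(global, "<player><firstName>Carl</firstName><lastName>Walker</lastName></player>");
        appendRow(global, "<player><firstName>Ann</firstName><lastName>Lee</lastName></player>");

        System.out.println(toXml(global));
    }
}
```

The point of the pattern is that the per-row XML stays well-formed on its own, and only its root element is moved under the shared root.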
There is a component at the end of this job to print out the results, verifying the document.

Output the XML
These are the results.

Results of Running the Job
Talend Open Studio has a versatile set of components for producing XML fields.  tAdvancedFileOutputXML is the most useful.  However, if you need to work with the XML as an internal document, then this technique -- blending tWriteXMLField with a Java wrapper -- will work.

1 comment:

owarnier said...

The problem is the different data type: the org.dom4j Document isn't the same as routines.system.Document.
So I can't cast from the dom4j to the Talend Document type; sadly, it's impossible to use this for a tRESTRequest :(