Thursday, June 13, 2013

A Talend Open Studio Walkthrough: Producing XML


XML is well suited to data transfer because it carries metadata and structure along with the data. Talend Open Studio is well suited to forming XML from sources like relational databases.

If data is modeled using an XML Schema (XSD), Talend Open Studio can read the schema and produce structures for mapping a data source like an RDBMS.  Much of this process is automated in TOS, but some manual steps may be needed to "flatten" a hierarchical XML tree.

Dimensional Model Source

The following dimensional model is a source of budget data.  The diagram was created in Sparx Systems Enterprise Architect 8.

Dimensional Model Source for XML Output
For convenience, I built a view on this model that joins all of the dimensions to the fact table. I did this because I'm also building reports with Jaspersoft iReport Designer, and I don't want to code the same join repeatedly. The view is called "BUDGET_ITEM_VW".
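The view idea can be sketched in a few lines. This is a minimal sketch that uses SQLite (through Python's `sqlite3` module) as a stand-in for the Access database read by the job. Apart from `BUDGET_ITEM_VW`, every table and column name below (`department_dim`, `account_dim`, `period_dim`, `budget_item_fact` and their columns) is hypothetical and does not come from the actual model.

```python
import sqlite3


def build_budget_view(conn: sqlite3.Connection) -> None:
    """Create a hypothetical star schema plus a view joining every dimension to the fact table."""
    conn.executescript("""
        -- Hypothetical dimensions (the real model's tables are not listed here)
        CREATE TABLE department_dim (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
        CREATE TABLE account_dim    (account_id INTEGER PRIMARY KEY, account_name TEXT);
        CREATE TABLE period_dim     (period_id INTEGER PRIMARY KEY, fiscal_year INTEGER);
        -- Hypothetical fact table keyed by the three dimensions
        CREATE TABLE budget_item_fact (
            dept_id    INTEGER REFERENCES department_dim(dept_id),
            account_id INTEGER REFERENCES account_dim(account_id),
            period_id  INTEGER REFERENCES period_dim(period_id),
            amount     REAL
        );
        -- The join lives in one place; reports and the XML job both read the view
        CREATE VIEW BUDGET_ITEM_VW AS
        SELECT d.dept_name, a.account_name, p.fiscal_year, f.amount
        FROM budget_item_fact f
        JOIN department_dim d ON d.dept_id = f.dept_id
        JOIN account_dim a    ON a.account_id = f.account_id
        JOIN period_dim p     ON p.period_id = f.period_id;
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_budget_view(conn)
    # Made-up sample rows, only to show the view returning flat, denormalized rows
    conn.executescript("""
        INSERT INTO department_dim VALUES (1, 'Engineering');
        INSERT INTO account_dim VALUES (10, 'Travel');
        INSERT INTO period_dim VALUES (2013, 2013);
        INSERT INTO budget_item_fact VALUES (1, 10, 2013, 5000.0);
    """)
    for row in conn.execute("SELECT * FROM BUDGET_ITEM_VW"):
        print(row)
```

Each consumer (a report or an input component) then issues a plain `SELECT` against the view instead of repeating the four-way join.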

Target XSD

The view is the source for the RDBMS input component (tAccessInput) of the Talend Open Studio job that creates an XML file. The output XML is based on the following XML model. This diagram was also created in Sparx Systems Enterprise Architect 8.
XML Model Transferring Budget Items

Enterprise Architect also generated a schema (XSD) from this diagram. The schema can be found here.
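The core of the job is turning the flat rows of BUDGET_ITEM_VW into the nested structure of the XML model: a parent element repeats once per group, and a looping child element repeats once per row. The sketch below illustrates that grouping with Python's `xml.etree.ElementTree`; it is not what Talend generates internally. The element and attribute names (`budgetItems`, `department`, `budgetItem`, `account`, `fiscalYear`) and the row layout are hypothetical, not taken from the actual XSD.

```python
import xml.etree.ElementTree as ET
from itertools import groupby
from operator import itemgetter


def rows_to_xml(rows):
    """Nest flat (dept_name, account_name, fiscal_year, amount) rows into a hypothetical XML tree."""
    root = ET.Element("budgetItems")
    # groupby only merges adjacent rows, so sort on the grouping key first
    for dept, items in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        # One parent element per distinct department
        dept_el = ET.SubElement(root, "department", name=dept)
        # The "loop" element: one child per flat row in this group
        for _, account, year, amount in items:
            item = ET.SubElement(dept_el, "budgetItem", account=account, fiscalYear=str(year))
            item.text = f"{amount:.2f}"
    return root


if __name__ == "__main__":
    rows = [
        ("Engineering", "Travel", 2013, 5000.0),
        ("Finance", "Travel", 2013, 1500.0),
        ("Engineering", "Hardware", 2013, 12000.0),
    ]
    print(ET.tostring(rows_to_xml(rows), encoding="unicode"))
```

Sorting before grouping matters: the source rows are not guaranteed to arrive ordered by the grouping column, and an unsorted `groupby` would emit the same department element more than once.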

Walkthrough

The following video walks through a Talend job that will produce this XML file from an RDBMS source.

http://www.youtube.com/watch?feature=player_embedded&v=3JTNvx3CdN0#t=0s

Manual Steps for Schema

Talend lets you load an XSD for use in mappings. The XSD's elements appear as part of the schema, and they are used to set up loops within the document. However, not all of the data elements, particularly attributes, will be present. In the video, there is a spot highlighting several fields that were added manually through the XML File Wizard. Every data element produced by the RDBMS should be present in the XML file's schema.
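A quick way to find the fields that need manual attention is to list every element and attribute path the XSD declares and compare it against the columns coming from the view. This is a minimal sketch using `xml.etree.ElementTree`; it only follows inline (anonymous) type definitions, not `type="..."` or `ref="..."` references. The sample XSD and the column-to-path mapping are hypothetical; in the mapping, `fiscal_year` is deliberately left undeclared to show what a gap looks like.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def xsd_paths(node, path=""):
    """Yield XPath-like paths for xs:element and xs:attribute declared inline under node."""
    for child in node:
        if child.tag == XS + "element":
            child_path = f"{path}/{child.get('name')}"
            yield child_path
            yield from xsd_paths(child, child_path)
        elif child.tag == XS + "attribute":
            yield f"{path}/@{child.get('name')}"
        else:
            # complexType, sequence, extension, etc. add structure but no path step
            yield from xsd_paths(child, path)


def missing_columns(xsd_text, column_paths):
    """Return view columns whose target path is not declared in the XSD."""
    declared = set(xsd_paths(ET.fromstring(xsd_text)))
    return [col for col, p in column_paths.items() if p not in declared]


# Hypothetical schema: budgetItem declares "account" but not "fiscalYear"
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="budgetItems"><xs:complexType><xs:sequence>
  <xs:element name="department" maxOccurs="unbounded"><xs:complexType>
   <xs:sequence>
    <xs:element name="budgetItem" maxOccurs="unbounded"><xs:complexType>
     <xs:simpleContent><xs:extension base="xs:decimal">
      <xs:attribute name="account" type="xs:string"/>
     </xs:extension></xs:simpleContent>
    </xs:complexType></xs:element>
   </xs:sequence>
   <xs:attribute name="name" type="xs:string"/>
  </xs:complexType></xs:element>
 </xs:sequence></xs:complexType></xs:element>
</xs:schema>"""

# Hypothetical mapping from view columns to target paths
COLUMN_PATHS = {
    "dept_name": "/budgetItems/department/@name",
    "account_name": "/budgetItems/department/budgetItem/@account",
    "fiscal_year": "/budgetItems/department/budgetItem/@fiscalYear",
    "amount": "/budgetItems/department/budgetItem",
}

if __name__ == "__main__":
    print(missing_columns(SAMPLE_XSD, COLUMN_PATHS))
```

Any column this reports is a candidate for adding by hand in the XML File Wizard so that every data element from the RDBMS has a place in the output schema.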
