Tuesday, June 11, 2013

Updating XML in a Database with Talend Open Studio


If an XML document is serialized in a database column, you can use tExtractXMLField to make the data within the document available for general Talend Open Studio processing (tMap).  If the database column needs to be modified with an updated XML document, use a tXMLMap linked to a tMap.

tExtractXMLField is a Talend Open Studio (TOS) component that parses a field containing XML and breaks the structure out into individual fields.  Once the data is out of XML, the resulting schema can be processed using other TOS components like tMap.

If an update is required, use a tXMLMap to rebuild the structure and direct the flow to an output component like tMySqlOutput.  A tMap will need to be added in order to convert the Document output of tXMLMap to an output schema; implicit cast-style conversions or tConvertType won't work.

XML Document


Suppose this XML document was created by front-end Javascript and saved to the database.  The document records user preferences which include a theme (modern) and a color palette (reds).




reds


RDBMS Schema


The XML document can be saved in a Text column of a MySQL database.  The Text column is like an Oracle Clob which removes the maxlength limit of the typical VARCHAR data type.  This database record stores an id (preferences_id) and a unique business key (username) along with the XML for easy retrieval by a RESTful web service.

DB Schema Showing a Text Column with XML

Use Case

The use case described in this post is to migrate the data stored in the XML document to a new value.  Specifically, the theme named "modern" is to be renamed "modern1".  One way to handle this is to use a string replacement, but a more flexible way to handle this is to retain the data's definition by dealing with the perferences_xml field as XML rather than a simple string.  To apply the new theme name, the XML document will be read, converted internally into something usable by TOS, modified with the results of a lookup value, and written back out the the database.


Job Parsing and Re-forming XML
"preferences" is a MySQL table used to select the XML document (preferences_xml).  It is also a target, the tMySqlOutput component.  "preferences_update" is a two-column lookup table that maps an old theme name to a new theme name.

tExtractXMLField

The tExtractXMLField parses and queries the XML document with XPaths.  From the XML document, it creates preference_theme and preference_color columns.  Although preference_color is not modified, it needs to be extracted so that it can be re-formed later.

tExtractXMLField Configuration
The tExtractXMLField flows into a tMap which joins the preferences_update lookup with the newly-created tExtractXMLField schema (the broken out fields).  This tMap should be familiar if you have read the documentation or studied basic lookup examples.
Join Tables to Update Preferences
tXMLMap

The tXMLMap takes the individual fields -- now converted by the preferences_update lookup -- and re-forms the XML document in a field "preferences_xml".  To use tXMLMap, make sure that the target type is Document.  This will make the basic subelement/attribute right-click menu functions available.  Build the structure and map the fields as you would with any XML component.

tXMLMap Configuration
Document to String Conversion

A second tMap is used to convert the Document coming from the tXMLMap into the String used by  the target tMySqlOutput.  It seems like this should happen automatically, but you'll get an error even if you add an interceding tConvertType.  Rather, call the toString() method of the Document as in this configuration.

Document to String Conversion
Update Configuration

Finally, an update is executed using a tMySqlOutput component.  This update uses the Repository schema for preferences.  But it uses the Advanced settings' Field options as specified below to prevent username from being overridden with a NULL value.  Update is the action used.

Update Only preferences_xml
tXMLMap forms an XML document from input fields.  Unlike the output XML components, though, processing can continue with TOS.  The structures defined in the tXMLMap can be put into a field to be written out immediately, or combined with additional transformations in other components.

A Note about tWriteXMLField:


tWriteXMLField has similar capabilities to tXMLMap.  However, tWriteXMLField will only output a single field of XML.  That's good for inserting, however it may not work for updating where a unique key or id needs to be carried through.  See TDI-8392 on Talend's bug tracker for more info.

No comments: