If an XML document is serialized in a database column, you can use tExtractXMLField to make the data within the document available for general Talend Open Studio processing (tMap). If the database column needs to be modified with an updated XML document, use a tXMLMap linked to a tMap.
tExtractXMLField is a Talend Open Studio (TOS) component that parses a field containing XML and breaks the structure out into individual fields. Once the data is out of XML, the resulting schema can be processed using other TOS components like tMap.
If an update is required, use a tXMLMap to rebuild the structure and direct the flow to an output component like tMySqlOutput. A tMap will need to be added in order to convert the Document output of tXMLMap to an output schema; implicit cast-style conversions or tConvertType won't work.
XML Document
Suppose this XML document was created by front-end Javascript and saved to the database. The document records user preferences which include a theme (modern) and a color palette (reds).
reds
RDBMS Schema
The XML document can be saved in a Text column of a MySQL database. The Text column is like an Oracle Clob which removes the maxlength limit of the typical VARCHAR data type. This database record stores an id (preferences_id) and a unique business key (username) along with the XML for easy retrieval by a RESTful web service.
Use Case
The use case described in this post is to migrate the data stored in the XML document to a new value. Specifically, the theme named "modern" is to be renamed "modern1". One way to handle this is to use a string replacement, but a more flexible way to handle this is to retain the data's definition by dealing with the perferences_xml field as XML rather than a simple string. To apply the new theme name, the XML document will be read, converted internally into something usable by TOS, modified with the results of a lookup value, and written back out the the database.
"preferences" is a MySQL table used to select the XML document (preferences_xml). It is also a target, the tMySqlOutput component. "preferences_update" is a two-column lookup table that maps an old theme name to a new theme name.
tExtractXMLField
The tExtractXMLField parses and queries the XML document with XPaths. From the XML document, it creates preference_theme and preference_color columns. Although preference_color is not modified, it needs to be extracted so that it can be re-formed later.
The tExtractXMLField flows into a tMap which joins the preferences_update lookup with the newly-created tExtractXMLField schema (the broken out fields). This tMap should be familiar if you have read the documentation or studied basic lookup examples.
tXMLMap
The tXMLMap takes the individual fields -- now converted by the preferences_update lookup -- and re-forms the XML document in a field "preferences_xml". To use tXMLMap, make sure that the target type is Document. This will make the basic subelement/attribute right-click menu functions available. Build the structure and map the fields as you would with any XML component.
Document to String Conversion
A second tMap is used to convert the Document coming from the tXMLMap into the String used by the target tMySqlOutput. It seems like this should happen automatically, but you'll get an error even if you add an interceding tConvertType. Rather, call the toString() method of the Document as in this configuration.
Update Configuration
Finally, an update is executed using a tMySqlOutput component. This update uses the Repository schema for preferences. But it uses the Advanced settings' Field options as specified below to prevent username from being overridden with a NULL value. Update is the action used.
tXMLMap forms an XML document from input fields. Unlike the output XML components, though, processing can continue with TOS. The structures defined in the tXMLMap can be put into a field to be written out immediately, or combined with additional transformations in other components.
A Note about tWriteXMLField:
tWriteXMLField has similar capabilities to tXMLMap. However, tWriteXMLField will only output a single field of XML. That's good for inserting, however it may not work for updating where a unique key or id needs to be carried through. See TDI-8392 on Talend's bug tracker for more info.
tExtractXMLField is a Talend Open Studio (TOS) component that parses a field containing XML and breaks the structure out into individual fields. Once the data is out of XML, the resulting schema can be processed using other TOS components like tMap.
If an update is required, use a tXMLMap to rebuild the structure and direct the flow to an output component like tMySqlOutput. A tMap will need to be added in order to convert the Document output of tXMLMap to an output schema; implicit cast-style conversions or tConvertType won't work.
XML Document
Suppose this XML document was created by front-end Javascript and saved to the database. The document records user preferences which include a theme (modern) and a color palette (reds).
RDBMS Schema
The XML document can be saved in a Text column of a MySQL database. The Text column is like an Oracle Clob which removes the maxlength limit of the typical VARCHAR data type. This database record stores an id (preferences_id) and a unique business key (username) along with the XML for easy retrieval by a RESTful web service.
DB Schema Showing a Text Column with XML |
Use Case
The use case described in this post is to migrate the data stored in the XML document to a new value. Specifically, the theme named "modern" is to be renamed "modern1". One way to handle this is to use a string replacement, but a more flexible way to handle this is to retain the data's definition by dealing with the perferences_xml field as XML rather than a simple string. To apply the new theme name, the XML document will be read, converted internally into something usable by TOS, modified with the results of a lookup value, and written back out the the database.
Job Parsing and Re-forming XML |
tExtractXMLField
The tExtractXMLField parses and queries the XML document with XPaths. From the XML document, it creates preference_theme and preference_color columns. Although preference_color is not modified, it needs to be extracted so that it can be re-formed later.
tExtractXMLField Configuration |
Join Tables to Update Preferences |
The tXMLMap takes the individual fields -- now converted by the preferences_update lookup -- and re-forms the XML document in a field "preferences_xml". To use tXMLMap, make sure that the target type is Document. This will make the basic subelement/attribute right-click menu functions available. Build the structure and map the fields as you would with any XML component.
tXMLMap Configuration |
A second tMap is used to convert the Document coming from the tXMLMap into the String used by the target tMySqlOutput. It seems like this should happen automatically, but you'll get an error even if you add an interceding tConvertType. Rather, call the toString() method of the Document as in this configuration.
Document to String Conversion |
Finally, an update is executed using a tMySqlOutput component. This update uses the Repository schema for preferences. But it uses the Advanced settings' Field options as specified below to prevent username from being overridden with a NULL value. Update is the action used.
Update Only preferences_xml |
A Note about tWriteXMLField:
tWriteXMLField has similar capabilities to tXMLMap. However, tWriteXMLField will only output a single field of XML. That's good for inserting, however it may not work for updating where a unique key or id needs to be carried through. See TDI-8392 on Talend's bug tracker for more info.
No comments:
Post a Comment