Tuesday, June 11, 2013

Exporting Oracle XML to a File with Talend Open Studio


Oracle stores XML in a database column using a proprietary data type called XMLType.  XMLType is a CLOB that can be exported to a file as a block of text using Talend Open Studio.

Talend Open Studio can retrieve XML from an XMLType column in an Oracle table.  The default handling for an XMLType column is to treat the data is an Object.  A special setting on a tOracleInput will treat the XMLType as text, making it available for export to a text file.

Start with a job that connects a tOracleInput to a tFilterColumns.
Job Exporting an XMLType to a File
The schema used by the tOracleInput retrieves two identifiers from the query: prefs_id, cust_id.  A field "prefs_xml" is based on an XMLType column.
Schema Referencing XMLType
The Advanced settings tab has an option for treating the XMLType as text.  If this isn't check, the XMLType is treated as an Object throughout the document and will print out a memory location (Java hashCode) if directed to an output component.  If you intend to parse the XML, routing portions of the document to other components, keep the default value.

Interpret the XMLType as a String
Lastly, a tFilterColumns component is used to strip out the identifiers prefs_id and cust_id.  These could also have been omitted from the query's SELECT.  Note the difference in type of prefs_xml.
tFilterColumns Schema
If you're storing XML in an Oracle database, you can output it to files for analysis, distribution, or conversion.  Once outside of the database, the XML files can be processed using XSL (tXSLT) or transferred to another datastore without directly connecting.  If you plan on parsing the XML within Talend Open Studio, leave the setting as it is and add components like tExtractXMLField

No comments: