Rendering XML from a Multi-Schema Text File with Talend Open Studio
When you're working with a multi-schema text file in Talend Open Studio, use a set of tAdvancedFileOutputXML components to render an XML document. Make sure you use the proper connections: OnSubjobOk rather than Main.
A multi-schema text file has a varying structure in which each line may represent a different record type. This is an example from the Talend documentation.
01;SOFT MUSIC DANCE ALBUM;RICHARDSON;15/12/2005
02;We Danced
02;She's Everytying
02;Once in a Lifetime Love
03;National Library
01;COUNTRY MUSIC ALBUM;WHITE;02/01/2006
02;Fall Into Me
02;Another Try
02;Something About Her
Songs ("We Danced", etc.) are grouped in Compact Discs ("SOFT MUSIC DANCE ALBUM"). There is also a third record type, "Library". This structure can be expressed as an XML document in which each disc contains its songs and libraries as subelements.
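To make the target structure concrete, here is a minimal plain-Java sketch (not Talend's generated code) that dispatches on the record-type field and nests songs and libraries under the most recent disc. The class name `MultiSchemaToXml` and the element names `Discs`, `Disc`, `Title`, `Author`, `Date`, `Songs`, `Song`, `Libraries` and `Library` are assumptions made for illustration; the original XML listing is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiSchemaToXml {

    // One disc with its child records; field names are assumed for illustration.
    static final class Disc {
        final String title, author, date;
        final List<String> songs = new ArrayList<>();
        final List<String> libraries = new ArrayList<>();
        Disc(String title, String author, String date) {
            this.title = title; this.author = author; this.date = date;
        }
    }

    // Dispatch on the first field: 01 = disc, 02 = song, 03 = library.
    static List<Disc> parse(List<String> lines) {
        List<Disc> discs = new ArrayList<>();
        Disc current = null;
        for (String line : lines) {
            String[] f = line.split(";", -1);
            if (f[0].equals("01")) {
                current = new Disc(f[1], f[2], f[3]);
                discs.add(current);
            } else if (current == null) {
                throw new IllegalArgumentException("child record before any disc: " + line);
            } else if (f[0].equals("02")) {
                current.songs.add(f[1]);
            } else if (f[0].equals("03")) {
                current.libraries.add(f[1]);
            } else {
                throw new IllegalArgumentException("unknown record type: " + line);
            }
        }
        return discs;
    }

    // Render the discs as XML; the group elements are omitted when empty.
    static String toXml(List<Disc> discs) {
        StringBuilder sb = new StringBuilder("<Discs>\n");
        for (Disc d : discs) {
            sb.append("  <Disc>\n");
            sb.append("    <Title>").append(escape(d.title)).append("</Title>\n");
            sb.append("    <Author>").append(escape(d.author)).append("</Author>\n");
            sb.append("    <Date>").append(escape(d.date)).append("</Date>\n");
            appendGroup(sb, "Songs", "Song", d.songs);
            appendGroup(sb, "Libraries", "Library", d.libraries);
            sb.append("  </Disc>\n");
        }
        return sb.append("</Discs>\n").toString();
    }

    static void appendGroup(StringBuilder sb, String group, String item, List<String> values) {
        if (values.isEmpty()) return;
        sb.append("    <").append(group).append(">\n");
        for (String v : values) {
            sb.append("      <").append(item).append(">").append(escape(v))
              .append("</").append(item).append(">\n");
        }
        sb.append("    </").append(group).append(">\n");
    }

    // Minimal escaping for element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // A subset of the sample records shown above.
        List<String> lines = List.of(
            "01;SOFT MUSIC DANCE ALBUM;RICHARDSON;15/12/2005",
            "02;We Danced",
            "02;Once in a Lifetime Love",
            "03;National Library",
            "01;COUNTRY MUSIC ALBUM;WHITE;02/01/2006",
            "02;Fall Into Me");
        System.out.print(toXml(parse(lines)));
    }
}
```

The sketch attaches each song and library to the disc record that precedes it in the file, which is an assumption about how the sample data is meant to be grouped.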
Using Main Connectors
It might seem possible to connect each schema in a multi-schema component like tFileInputMSDelimited to a set of tAdvancedFileOutputXMLs set in Append mode. This job looks like it would work. The record counts all check out.
However, the XML output doesn't render correctly. The subelements "Songs" and "Libraries" are missing.
Using OnSubjobOk
Instead of connecting each schema out with a Main, give each tAdvancedFileOutputXML its own subjob and chain the subjobs together using OnSubjobOk triggers.
This produces the correct document.
Component Configuration
The configuration of the tFileInputMSDelimited is found in the Talend help files. In the OnSubjobOk version of the job, the tFileInputMSDelimited is duplicated, once for each output schema.
The tAdvancedFileOutputXML components are in Append mode (except for the first one) and all write to the same XML file.
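As a rough plain-Java analogy for this arrangement (it is not how Talend implements Append mode, and every name in it is hypothetical), the sketch below makes three sequential passes over the same input. The first pass creates the document and writes only the disc records; each later pass re-reads the input, picks out one record type, and appends it to the existing document. Element names are assumed for illustration.

```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SequentialPasses {

    static Element addText(Document doc, Element parent, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        parent.appendChild(e);
        return e;
    }

    // Pass 1: create the document and write only the "01" (disc) records.
    static Document writeDiscs(List<String> lines) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("Discs");
        doc.appendChild(root);
        for (String line : lines) {
            String[] f = line.split(";", -1);
            if (f[0].equals("01")) {
                Element disc = doc.createElement("Disc");
                root.appendChild(disc);
                addText(doc, disc, "Title", f[1]);
                addText(doc, disc, "Author", f[2]);
                addText(doc, disc, "Date", f[3]);
            }
        }
        return doc;
    }

    // Later passes: re-read the input and append one record type to the existing discs.
    static void appendChildren(Document doc, List<String> lines,
                               String type, String group, String item) {
        NodeList discs = doc.getElementsByTagName("Disc");
        int discIndex = -1;
        Element container = null;
        for (String line : lines) {
            String[] f = line.split(";", -1);
            if (f[0].equals("01")) {
                discIndex++;        // move on to the next disc already in the document
                container = null;   // each disc gets its own group element
            } else if (f[0].equals(type)) {
                if (discIndex < 0) {
                    throw new IllegalArgumentException("child record before any disc: " + line);
                }
                if (container == null) {
                    container = doc.createElement(group);
                    discs.item(discIndex).appendChild(container);
                }
                addText(doc, container, item, f[1]);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = List.of(
            "01;SOFT MUSIC DANCE ALBUM;RICHARDSON;15/12/2005",
            "02;We Danced",
            "03;National Library",
            "01;COUNTRY MUSIC ALBUM;WHITE;02/01/2006",
            "02;Fall Into Me");
        Document doc = writeDiscs(lines);                            // first output: creates the document
        appendChildren(doc, lines, "02", "Songs", "Song");           // second output: appends songs
        appendChildren(doc, lines, "03", "Libraries", "Library");    // third output: appends libraries
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.transform(new DOMSource(doc), new StreamResult(System.out));
    }
}
```

Each pass finishes before the next one starts, so every append sees a complete document, which is the point of the analogy.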
The third mapping, for the Libraries element, repeats the toplevel container elements so that the new element follows them.
Figure: Mapping that Produces Toplevel Disc Container
The elements of the toplevel container (Author, Date) should appear in the mappings for all schemas. This ensures the correct ordering of the XML elements, which might be validated against an xs:sequence element in an XSD.
Figure: Mapping that Produces Song Element
Figure: Mapping that Produces the Libraries Element
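To illustrate why element order matters under xs:sequence, the following sketch validates two small documents against an inline schema using the standard `javax.xml.validation` API. The schema, the class name `SequenceOrderCheck` and the element names are hypothetical, written only for this example; they are not the XSD used by the job.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SequenceOrderCheck {

    // Hypothetical schema: a Disc must contain Author, Date, then an optional Songs group, in that order.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Disc'><xs:complexType><xs:sequence>"
        + "<xs:element name='Author' type='xs:string'/>"
        + "<xs:element name='Date' type='xs:string'/>"
        + "<xs:element name='Songs' minOccurs='0'><xs:complexType><xs:sequence>"
        + "<xs:element name='Song' type='xs:string' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true when the XML string validates against the schema above.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation errors surface as SAXException
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String inOrder = "<Disc><Author>RICHARDSON</Author><Date>15/12/2005</Date>"
                       + "<Songs><Song>We Danced</Song></Songs></Disc>";
        String outOfOrder = "<Disc><Songs><Song>We Danced</Song></Songs>"
                          + "<Author>RICHARDSON</Author><Date>15/12/2005</Date></Disc>";
        System.out.println("in order:     " + isValid(inOrder));
        System.out.println("out of order: " + isValid(outOfOrder));
    }
}
```

The first document should validate, while the second, with Songs placed ahead of Author and Date, should be rejected. That is the kind of failure the repeated container elements in each mapping are meant to avoid.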
Connecting a bunch of tAdvancedFileOutputXMLs didn't work initially, but by restructuring the job, you can produce an XML document from a text file.