If you work with XML, you may see the xs:group feature in an XSD. To work with XML based on this XSD in Talend Open Studio, consider using an XML binding tool and a custom Java class that implements the Builder Pattern.
Background
xs:group allows XSD authors to reuse a list of elements (sequence, choice, all). xs:group, called a Model Group in the specification, is different from xs:complexType. An xs:complexType has to be attached to an element definition before it appears in a document. An xs:group contributes the elements in its own definition directly, inserting them into a complexType via a group reference.
Complex Type
For example, consider a complexType 'PhoneNumberType' with elements 'number' and 'countryCode'.
Another type, 'ContactType', uses PhoneNumberType by defining an element 'contactPhone' of that type.
A further type, say 'BusinessType', might define an element 'businessPhone' in the same way.
XPaths to get the numbers' country codes would be contact/contactPhone/countryCode and business/businessPhone/countryCode.
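The Complex Type case can be sketched with the JDK's built-in validation and XPath classes. The schema below is illustrative only: the type and element names come from the text, while the xs:string types, the root elements 'contact' and 'business', and the sample values are assumptions.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ComplexTypeDemo {

    // Illustrative schema; only the type and element names come from the post.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:complexType name='PhoneNumberType'>"
      + "    <xs:sequence>"
      + "      <xs:element name='number' type='xs:string'/>"
      + "      <xs:element name='countryCode' type='xs:string'/>"
      + "    </xs:sequence>"
      + "  </xs:complexType>"
      + "  <xs:complexType name='ContactType'>"
      + "    <xs:sequence>"
      + "      <xs:element name='contactPhone' type='PhoneNumberType'/>"
      + "    </xs:sequence>"
      + "  </xs:complexType>"
      + "  <xs:complexType name='BusinessType'>"
      + "    <xs:sequence>"
      + "      <xs:element name='businessPhone' type='PhoneNumberType'/>"
      + "    </xs:sequence>"
      + "  </xs:complexType>"
      + "  <xs:element name='contact' type='ContactType'/>"
      + "  <xs:element name='business' type='BusinessType'/>"
      + "</xs:schema>";

    /** Validates xml against XSD, then returns the string value of the XPath. */
    static String validateAndSelect(String xml, String xpath) {
        try {
            Validator validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            // Throws if the document does not match the schema.
            validator.validate(new StreamSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("validation or XPath failed", e);
        }
    }

    public static void main(String[] args) {
        String contact = "<contact><contactPhone><number>5551234</number>"
            + "<countryCode>1</countryCode></contactPhone></contact>";
        String business = "<business><businessPhone><number>2079460000</number>"
            + "<countryCode>44</countryCode></businessPhone></business>";
        // The wrapper element name differs per referencing type.
        System.out.println(validateAndSelect(contact, "contact/contactPhone/countryCode"));
        System.out.println(validateAndSelect(business, "business/businessPhone/countryCode"));
    }
}
```

Each referencing type picks its own wrapper element name, which is why the two XPaths differ in their middle step.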
Model Group
A Model Group is defined in a similar fashion to Complex Type.
The syntax for a referencing type is different. A 'group ref' is added to the sequence rather than a new element.
This will generate XML accessed by XPaths like contact/countryCode or business/countryCode. No intervening wrapper element is required, as there is with PhoneNumberType. However, an author may define a Group whose Sequence contains a single Element and give that Element a Complex Type.
An author might do this if he or she wanted to use an element's name consistently. A Group built from a single-Element Sequence makes that element name available wherever the Group is referenced. For example, a Group with a Sequence containing only the Element 'phone' would give every referencing type a 'phone' element, rather than 'businessPhone', 'contactPhone', or 'phn' as could happen with a Complex Type.
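A comparable sketch for the Model Group case follows. The group name 'PhoneNumberGroup', the extra 'name' element, the xs:string types, and the sample values are all assumptions made for illustration. The point is that the group's elements land directly inside 'contact', and a document that wraps them in a 'contactPhone' element no longer validates.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ModelGroupDemo {

    // Illustrative schema; 'PhoneNumberGroup' and 'name' are hypothetical names.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:group name='PhoneNumberGroup'>"
      + "    <xs:sequence>"
      + "      <xs:element name='number' type='xs:string'/>"
      + "      <xs:element name='countryCode' type='xs:string'/>"
      + "    </xs:sequence>"
      + "  </xs:group>"
      + "  <xs:complexType name='ContactType'>"
      + "    <xs:sequence>"
      + "      <xs:element name='name' type='xs:string'/>"
      + "      <xs:group ref='PhoneNumberGroup'/>"  // a group ref, not a new element
      + "    </xs:sequence>"
      + "  </xs:complexType>"
      + "  <xs:element name='contact' type='ContactType'/>"
      + "</xs:schema>";

    /** Returns true if xml validates against XSD. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false; // schema violation (or unparsable input)
        }
    }

    /** Returns the string value of the XPath evaluated against xml. */
    static String select(String xml, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(xpath, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String flat = "<contact><name>Ann</name><number>5551234</number>"
            + "<countryCode>1</countryCode></contact>";
        String wrapped = "<contact><name>Ann</name><contactPhone><number>5551234</number>"
            + "<countryCode>1</countryCode></contactPhone></contact>";
        System.out.println(isValid(flat));                        // group elements sit directly under contact
        System.out.println(select(flat, "contact/countryCode"));  // no wrapper step in the path
        System.out.println(isValid(wrapped));                     // the wrapper is not part of this schema
    }
}
```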
Talend Open Studio and Complex XML
When possible, use Talend's off-the-shelf XML components. If you're outputting XML, try to create an XML file output based on the XSD. Ideally, your job would look like the screenshot following this paragraph. However, the Recordare MusicXML XSD I was working with caused numerous problems for Talend: empty XML elements, invalid XML, and no data mapped. If your XSD requires more than one loop element or can't readily be processed by Talend, consider incorporating a third-party library and a custom Java class that implements the Builder pattern.
Figure: The Job I'd Like to Write
For a detailed explanation on XML binding and the Builder pattern, read this blog post. The following is a second example of a custom Java class written for Talend Open Studio that does handle the XSD from the preceding screenshot.
Builder Class
The Builder Pattern constructs a complex object like an XML document in stages. In this example, a stage is a loop in the XSD. While the Recordare MusicXML XSD contains hundreds of elements that can express pagination and formatting, my simple data is based on Measures and Notes. So, the Builder Class, ScorePartwiseBuilder, is correspondingly simple. Yet supporting ScorePartwiseBuilder is a complex set of over 400 generated classes.
To get a feel for this pattern, look at this main() method run outside of Talend Open Studio.
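ScorePartwiseBuilder bld = new ScorePartwiseBuilder();
// Header information for the document
bld.init("A1", "Music");
// Values correspond to Measure, Fifths, Beats, BeatType, Sign, Line
bld.addMeasure("1", "0", "4", "4", "C", "2");
// Each note (Step, Octave, Duration, note type) is added to the current measure
bld.addNote("C", "1", "1", "quarter");
bld.addNote("D", "1", "1", "quarter");
bld.addNote("E", "1", "1", "quarter");
bld.addNote("D", "1", "1", "quarter");
// Render the finished document
System.out.println(bld.toXmlString());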
After creation (new), an init() method sets some fields which will be represented as a block of XML. A measure is added with addMeasure() and notes are added with addNote(). For each measure and sequence of notes, this pattern is repeated. The Builder Class keeps track of the internal state -- the current measure -- and eventually will render the document to a String with toXmlString().
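To make the staging concrete, here is a minimal, hand-rolled sketch of a builder with the same four methods. It is not the post's ScorePartwiseBuilder, which delegates to generated XML binding classes. The class name MiniScoreBuilder is hypothetical, the element names only approximate MusicXML, the output is not validated against the XSD, and values are not XML-escaped.

```java
import java.util.ArrayList;
import java.util.List;

class MiniScoreBuilder {
    private String partId = "";
    private String partName = "";
    private final List<StringBuilder> measures = new ArrayList<>();
    private StringBuilder current; // internal state: the measure that addNote() appends to

    /** Stage 1: header fields that become a block of XML. */
    void init(String partId, String partName) {
        this.partId = partId;
        this.partName = partName;
    }

    /** Stage 2: open a new measure and make it the current one. */
    void addMeasure(String number, String fifths, String beats,
                    String beatType, String sign, String line) {
        current = new StringBuilder()
            .append("<measure number=\"").append(number).append("\">")
            .append("<attributes>")
            .append("<key><fifths>").append(fifths).append("</fifths></key>")
            .append("<time><beats>").append(beats).append("</beats>")
            .append("<beat-type>").append(beatType).append("</beat-type></time>")
            .append("<clef><sign>").append(sign).append("</sign>")
            .append("<line>").append(line).append("</line></clef>")
            .append("</attributes>");
        measures.add(current);
    }

    /** Stage 3: add a note to the current measure. */
    void addNote(String step, String octave, String duration, String type) {
        if (current == null) {
            throw new IllegalStateException("call addMeasure() before addNote()");
        }
        current.append("<note>")
            .append("<pitch><step>").append(step).append("</step>")
            .append("<octave>").append(octave).append("</octave></pitch>")
            .append("<duration>").append(duration).append("</duration>")
            .append("<type>").append(type).append("</type>")
            .append("</note>");
    }

    /** Final stage: render everything collected so far. */
    String toXmlString() {
        StringBuilder xml = new StringBuilder("<score-partwise>")
            .append("<part-list><score-part id=\"").append(partId).append("\">")
            .append("<part-name>").append(partName).append("</part-name>")
            .append("</score-part></part-list>")
            .append("<part id=\"").append(partId).append("\">");
        for (StringBuilder measure : measures) {
            xml.append(measure).append("</measure>"); // measures are closed at render time
        }
        return xml.append("</part></score-partwise>").toString();
    }

    public static void main(String[] args) {
        MiniScoreBuilder bld = new MiniScoreBuilder();
        bld.init("A1", "Music");
        bld.addMeasure("1", "0", "4", "4", "C", "2");
        bld.addNote("C", "1", "1", "quarter");
        bld.addNote("D", "1", "1", "quarter");
        System.out.println(bld.toXmlString());
    }
}
```

Because the builder remembers the current measure, callers never pass a measure reference to addNote(); they only have to call the methods in the right order.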
Figure: Builder Class Referencing XML Binding Tool Classes
The full source of ScorePartwiseBuilder is available here. This post uses Liquid XML Data Binder, though open source tools like XMLBeans work too.
Job Design
The sequence of steps in a Talend Open Studio job starts with loading libraries and creating the builder class. Next, text file input is sent to the builder. Finally, an output string is formed.
Figure: Job Using ScorePartwiseBuilder
The tJava component, tJava_1, instantiates the builder class, initializes the object with some header information, and stores the builder in the globalMap for use in later components. tJava_1 also initializes a current measure variable (curr_measure) so that tJavaRow_1 always has a value to read.
Input
The input for this example is the following text file. Note that the input is an extremely condensed data set compared to what's possible with the schema.
Measure,Fifths,Beats,BeatType,Sign,Line,Step,Octave,Duration,Line
1,0,4,4,C,2,C,1,1,quarter
1,0,4,4,C,2,D,1,1,quarter
1,0,4,4,C,2,E,1,1,quarter
1,0,4,4,C,2,D,1,1,quarter
The code behind tJavaRow_1 will create the Measures and Notes. Notes make up Measures, so a Measure doesn't need to be created for each Note. This is coded using a flag 'curr_measure' which is adjusted for each new Measure.
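The flag logic can be sketched outside Talend as follows. This is an approximation, not the job's actual component code: a HashMap stands in for Talend's globalMap, the key name "builder" is an assumption, a String array stands in for the fields of the incoming row, and RecordingBuilder is a hypothetical stand-in that only records which builder calls were made.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MeasureFlagDemo {

    /** Hypothetical stand-in for the real builder; it only records calls. */
    static class RecordingBuilder {
        final List<String> calls = new ArrayList<>();

        void addMeasure(String number, String fifths, String beats,
                        String beatType, String sign, String line) {
            calls.add("measure " + number);
        }

        void addNote(String step, String octave, String duration, String type) {
            calls.add("note " + step);
        }
    }

    /** Roughly what the tJavaRow code might do for a single input row. */
    static void processRow(String[] f, Map<String, Object> globalMap) {
        RecordingBuilder bld = (RecordingBuilder) globalMap.get("builder");
        String currMeasure = (String) globalMap.get("curr_measure");
        if (!f[0].equals(currMeasure)) {
            // First row of a new measure: create it and update the flag.
            bld.addMeasure(f[0], f[1], f[2], f[3], f[4], f[5]);
            globalMap.put("curr_measure", f[0]);
        }
        // Every row contributes exactly one note to the current measure.
        bld.addNote(f[6], f[7], f[8], f[9]);
    }

    /** Feeds CSV text (with a header line) through processRow. */
    static List<String> run(String csv) {
        Map<String, Object> globalMap = new HashMap<>();
        RecordingBuilder bld = new RecordingBuilder();
        // What the tJava step is described as doing: store the builder
        // and give curr_measure an initial value.
        globalMap.put("builder", bld);
        globalMap.put("curr_measure", "");
        String[] lines = csv.split("\n");
        for (int i = 1; i < lines.length; i++) { // skip the header line
            processRow(lines[i].split(","), globalMap);
        }
        return bld.calls;
    }

    public static void main(String[] args) {
        String csv = "Measure,Fifths,Beats,BeatType,Sign,Line,Step,Octave,Duration,Line\n"
            + "1,0,4,4,C,2,C,1,1,quarter\n"
            + "1,0,4,4,C,2,D,1,1,quarter\n"
            + "1,0,4,4,C,2,E,1,1,quarter\n"
            + "1,0,4,4,C,2,D,1,1,quarter\n";
        System.out.println(run(csv));
    }
}
```

With the sample input, addMeasure() runs once and addNote() runs four times, since all four rows share measure 1.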
Figure: Invoking the Builder Commands
Figure: XML Output Enforced by XSD Group