Tuesday, June 11, 2013

Validate XML Data with Talend Open Studio

With an XSD, you can validate the structure of an XML document.  To validate the contents of the document in Talend Open Studio, use a component like tFilterRow.

tXSDValidator is a Talend Open Studio component that verifies the structure of an XML document.  However, you may want to validate the contents of an XML document.  To do this, input the document using a tFileInputXML and apply a tFilterRow with a set of rules.
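
To make the difference between structure and content concrete, here is a minimal sketch in plain Java (outside of Talend) that checks a document against an XSD with the standard javax.xml.validation API.  The schema, the root element name Record and the class name StructureCheck are assumptions made for illustration; they do not come from the job described here.  The schema deliberately constrains only the structure, so a field with a nonsensical value still passes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class StructureCheck {
    // Hypothetical schema: a Record root holding one or more CRField elements.
    // It says nothing about what the text of a CRField must look like.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Record'><xs:complexType><xs:sequence>"
      + "<xs:element name='CRField' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true when the document matches the schema, false otherwise.
    static boolean isStructurallyValid(String xml) {
        Schema schema;
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            // The XSD itself is broken; that is a programming error here.
            throw new IllegalStateException(e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document violates the schema
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this schema, a document whose CRField holds "DOCNUM=not-a-number" is accepted, while a document with an unexpected element is rejected.  Content rules such as lengths and patterns are the part this article hands to tFilterRow.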

The Source Data

Make sure that your XML is in a format that Talend Open Studio can process directly.  This means that the values you want to map are addressable as elements and attributes rather than encoded inside element text.  If the XML is not readily processable, convert it using a stylesheet.

For example,

    
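    <CRField>TABNAM=EDI_DC40</CRField>
    <CRField>MANDT=100</CRField>
    <CRField>DOCNUM=0000000001234567</CRField>
    <CRField>DOCREL=123</CRField>
    <CRField>STATUS=30</CRField>
    <CRField>VVV=15</CRField>
    <CRField>SNDPRN=EXP1100</CRField>
    <CRField>RCVPOR=A000000001</CRField>
    <CRField>RCVPRT=LS</CRField>
    <CRField>RCVPRN=NRPP041V3</CRField>
    <CRField>CREDAT=20080401</CRField>
    <CRField>CRETIM=094655</CRField>
    <CRField>SERIAL=20080401094655</CRField>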


encodes a name=value pair in the text of each CRField element.  The following stylesheet moves the name part into an attribute "name", leaving the value as the element's text.


    <xsl:stylesheet ...
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:fn="http://www.w3.org/2005/xpath-functions"
        exclude-result-prefixes="#all">
        ...
    </xsl:stylesheet>


The result is this variation of the XML.

    
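    <CRField name="TABNAM">EDI_DC40</CRField>
    <CRField name="MANDT">100</CRField>
    <CRField name="DOCNUM">0000000001234567</CRField>
    <CRField name="DOCREL">123</CRField>
    <CRField name="STATUS">30</CRField>
    <CRField name="VVV">15</CRField>
    <CRField name="SNDPRN">EXP1100</CRField>
    <CRField name="RCVPOR">A000000001</CRField>
    <CRField name="RCVPRT">LS</CRField>
    <CRField name="RCVPRN">NRPP041V3</CRField>
    <CRField name="CREDAT">20080401</CRField>
    <CRField name="CRETIM">094655</CRField>
    <CRField name="SERIAL">20080401094655</CRField>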


Job

The job starts by converting the XML with a stylesheet, using the tXSLT component.

Job Validating tFileInputXML

The configuration of the tXSLT component will apply a stylesheet to an XML file (Data.xml) and output a transformed XML file (DataWithAttributes-TALEND.xml).

Configuring a tXSLT Component
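
For readers who want to try the conversion outside of Talend, the step can be sketched with the XSLT processor that ships with the JDK.  This is not the stylesheet used in the job: it is a hypothetical XSLT 1.0 stylesheet written for illustration, it relies only on substring-before and substring-after, and the root element Record and the class name NameValueToAttribute are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NameValueToAttribute {
    // Hypothetical stylesheet: copy everything as-is, except CRField, whose
    // "NAME=VALUE" text is split into a name attribute and a text value.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='CRField'>"
      + "<CRField name='{substring-before(., \"=\")}'>"
      + "<xsl:value-of select='substring-after(., \"=\")'/>"
      + "</CRField>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the result.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

To work on files, as the tXSLT component does with Data.xml and DataWithAttributes-TALEND.xml, the StreamSource and StreamResult can be constructed from java.io.File objects instead of strings.  Note that a CRField without an "=" would end up with an empty name and an empty value in this sketch.
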
The XML input, now transformed, maps each CRField to a different field.  The mapping is based on the new attribute "name".
Mapping the Transformed Document to a Schema
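
The idea behind the mapping can be sketched in plain Java with the standard XPath API: each output column is filled from the CRField whose "name" attribute matches the column name.  The XPath expression, the root element Record and the class name FieldMapper are assumptions for illustration; the expressions actually entered in the component are not reproduced here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FieldMapper {
    // Builds one row (column name -> value) from the transformed document.
    // A column with no matching CRField gets an empty string.
    static Map<String, String> toRow(String xml, String... columns) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> row = new LinkedHashMap<>();
            for (String column : columns) {
                // One XPath per column, keyed on the name attribute.
                String query = "//CRField[@name='" + column + "']";
                row.put(column, xpath.evaluate(query, doc));
            }
            return row;
        } catch (ParserConfigurationException | SAXException
                 | IOException | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling FieldMapper.toRow(xml, "DOCNUM", "CREDAT", "VVV") yields one row with those three columns, which is the shape of data the validation step works on.
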

Validation

Now that the job parses the XML, a component like tFilterRow can be used to define a set of rules that validate the data.  Other components, like tSchemaCompliance or even tJavaRow, could also be used.  This example does a length check (DOCNUM) and two regular expression matches (CREDAT, VVV).

Some tFilterRow Rules
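
The same kind of rules can be written out in plain Java, for example as a starting point for a tJavaRow.  The specific thresholds and patterns below (a 16-character DOCNUM, an 8-digit CREDAT, an all-digit VVV) are assumptions inferred from the sample data, not the exact rules configured in the job, and the class name RowFilter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class RowFilter {
    // Assumed patterns, based on the sample values 20080401 and 15.
    private static final Pattern EIGHT_DIGITS = Pattern.compile("\\d{8}");
    private static final Pattern DIGITS_ONLY = Pattern.compile("\\d+");

    // Returns the list of rule violations for one row; empty means the row passes.
    static List<String> violations(Map<String, String> row) {
        List<String> errors = new ArrayList<>();

        // Length check: the sample DOCNUM has 16 characters.
        if (row.getOrDefault("DOCNUM", "").length() != 16) {
            errors.add("DOCNUM must be 16 characters long");
        }
        // Regular expression match: CREDAT as eight digits (YYYYMMDD-like).
        if (!EIGHT_DIGITS.matcher(row.getOrDefault("CREDAT", "")).matches()) {
            errors.add("CREDAT must be 8 digits");
        }
        // Regular expression match: VVV as one or more digits.
        if (!DIGITS_ONLY.matcher(row.getOrDefault("VVV", "")).matches()) {
            errors.add("VVV must be numeric");
        }
        return errors;
    }
}
```

Returning a list of messages rather than a single boolean mirrors the way a filter can send rejected rows, together with a reason, down a separate flow.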

To take advantage of Talend Open Studio's XML components, make sure that your XML data exposes its values as elements and attributes.  Define the schema from the XML document only after any values encoded in element text have been moved into the XML structure.  Several components can then be used with the schema to do the validation; this example shows tFilterRow.
