Validate XML Data with Talend Open Studio
With an XSD, you can validate the structure of an XML document. To validate the contents of the document in Talend Open Studio, use a component like tFilterRow.
tXSDValidator is a Talend Open Studio component that verifies the structure of an XML document. However, you may want to validate the contents of an XML document. To do this, input the document using a tFileInputXML and apply a tFilterRow with a set of rules.
The Source Data
Make sure that your XML is in a format that Talend Open Studio supports. Talend's XML processing works on elements and attributes, not on values encoded inside an element's text. If the XML is not readily processable, convert it using a stylesheet.
For example, in the following data, each line is the text of one CRField element, with a field name and its value encoded together as NAME=VALUE:
TABNAM=EDI_DC40
MANDT=100
DOCNUM=0000000001234567
DOCREL=123
STATUS=30
VVV=15
SNDPRN=EXP1100
RCVPOR=A000000001
RCVPRT=LS
RCVPRN=NRPP041V3
CREDAT=20080401
CRETIM=094655
SERIAL=20080401094655
The following stylesheet moves each field name out of the text and into an attribute "name", leaving only the value as the element's text.
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
exclude-result-prefixes="#all">
This produces the following variation of the XML, in which each CRField holds only its value:
EDI_DC40
100
0000000001234567
123
30
15
EXP1100
A000000001
LS
NRPP041V3
20080401
094655
20080401094655
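The transformation described above can be sketched in plain Java. This is not the original stylesheet: it is a stand-in written in XSLT 1.0 so that it runs on the JDK's built-in processor, and it assumes that each CRField holds NAME=VALUE as its text. The Record wrapper element and the class name CRFieldTransform are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CRFieldTransform {
    // Assumed stand-in stylesheet: copy everything as-is, except CRField,
    // whose NAME=VALUE text is split into a "name" attribute and a value.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"CRField\">"
      + "<CRField name=\"{substring-before(., '=')}\">"
      + "<xsl:value-of select=\"substring-after(., '=')\"/>"
      + "</CRField>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the result.
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: the Record wrapper is not from the source data.
        String xml = "<Record>"
                   + "<CRField>TABNAM=EDI_DC40</CRField>"
                   + "<CRField>DOCNUM=0000000001234567</CRField>"
                   + "</Record>";
        System.out.println(transform(xml));
    }
}
```

In the Talend job itself, this step is handled by the tXSLT component; the sketch only illustrates what the stylesheet does to each CRField.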
Job
The job starts by converting the XML with a stylesheet, using the tXSLT component.
The tXSLT component is configured to apply the stylesheet to an XML file (Data.xml) and write the transformed XML to a new file (DataWithAttributes-TALEND.xml).
The XML input component then reads the transformed file and maps each CRField to its own field in the schema. The mapping is keyed on the new attribute "name".
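To make the mapping concrete, here is a minimal Java sketch of selecting CRField elements by their "name" attribute and collecting them into one row. It illustrates the idea only and is not what Talend generates; the class name CRFieldMapping, the method toRow, and the Record wrapper element are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CRFieldMapping {
    // Reads every CRField that has a "name" attribute and returns a row
    // keyed by that attribute, in document order.
    static Map<String, String> toRow(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList fields = (NodeList) xpath.evaluate(
            "//CRField[@name]",
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);

        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            row.put(field.getAttribute("name"), field.getTextContent());
        }
        return row;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical transformed input; the Record wrapper is an assumption.
        String xml = "<Record>"
                   + "<CRField name=\"DOCNUM\">0000000001234567</CRField>"
                   + "<CRField name=\"CREDAT\">20080401</CRField>"
                   + "</Record>";
        System.out.println(toRow(xml));
    }
}
```

In the job, the equivalent of each map key is a schema column whose XPath query selects the CRField with the matching "name" value.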
Validation
Now that the job parses the XML, a component like tFilterRow can be used to define a set of rules that validate the data. Other components, like tSchemaCompliance or even tJavaRow, could also be used. This example does a length check (DOCNUM) and two regular expression matches (CREDAT, VVV).
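The three kinds of rule mentioned above (a length check on DOCNUM and regular expression matches on CREDAT, the creation date field in the sample data, and on VVV) could look roughly like this in Java, for instance as the basis of a tJavaRow check. The actual rule values are not given here, so the expected length of 16 and both patterns are assumptions inferred from the sample data, and the class name CRFieldRules is hypothetical.

```java
import java.util.Map;
import java.util.regex.Pattern;

final class CRFieldRules {
    // Assumed rule values, inferred from the sample data only.
    static final int DOCNUM_LENGTH = 16;                               // e.g. 0000000001234567
    static final Pattern CREDAT_PATTERN = Pattern.compile("\\d{8}");   // e.g. 20080401
    static final Pattern VVV_PATTERN = Pattern.compile("\\d+");        // e.g. 15

    // A row passes only if all three rules hold; a missing field fails.
    static boolean isValid(Map<String, String> row) {
        String docnum = row.get("DOCNUM");
        String credat = row.get("CREDAT");
        String vvv = row.get("VVV");
        return docnum != null && docnum.length() == DOCNUM_LENGTH
            && credat != null && CREDAT_PATTERN.matcher(credat).matches()
            && vvv != null && VVV_PATTERN.matcher(vvv).matches();
    }

    public static void main(String[] args) {
        Map<String, String> good = Map.of(
            "DOCNUM", "0000000001234567", "CREDAT", "20080401", "VVV", "15");
        Map<String, String> bad = Map.of(
            "DOCNUM", "1234", "CREDAT", "20080401", "VVV", "15");
        System.out.println(isValid(good)); // prints true
        System.out.println(isValid(bad));  // prints false (DOCNUM too short)
    }
}
```

Rows that fail such a predicate correspond to what tFilterRow sends to its reject output, which keeps the invalid records available for reporting.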
To take advantage of Talend Open Studio's XML components, make sure that your XML data is in a form they can process. Map the XML document to a schema, first making sure that any values encoded in text are exposed as elements or attributes. Several components can then be used with the schema to do the validation; this example shows tFilterRow.