To input JSON into a flow using Talend Open Studio, use the tFileInputJSON component. If the JSON input may be empty, use a guard condition that examines the structure beforehand.
The tFileInputJSON component takes a JSON structure as input and builds a schema based on JSON paths. For example,
{ "attribs": [
{
"name": "req",
"value": "1"
} ]
}
Defines an array "attribs" with records containing fields "name" and "value". This is mapped in a tFileInputJSON using the following schema and JSON paths (from the tFileInputJSON Component View)
Column JSONPath query
---------------------------
name "$.attribs[*].name"
value "$.attribs[*].value"
"$" refers to the root of the JSON object. ".", immediately following the $, is the current object. "attribs" is an array element. The wildcard index "*" means select all attribs. "name" and "value" are fields.
This structure relies on the convention that there is a balanced number of names and values. Although the JSON paths are similar, they aren't correlated. Pull out a "value" in the middle of the list and the remaining values will shuffle up to different names.
Empty Input
Another case one might encounter is that of empty input. For example,
{
"attribs": [ ]
}
In Talend Open Studio, the current tFileInputJSON component will throw an error as it attempts to map the non-existent name and value fields. In this case, use some programming logic to filter the empty input from the tFileInputJSON component. This can be done with a second tFileInputJSON component.
Guard Condition
This job performs a beforehand check on the input prior to mapping the name and the value fields. See "UPDATE: Reduced Number of Components" at end of post for an alternate implementation.
Both tFileInputJSON components operate on the same file. However, the first JSON component maps the JSON array "attribs" rather than the individual fields "name" and "value". From tFileInputJSON_1's Component View
Column JSONPath query
------------------------------
attribs "$.attribs"
Determining Empty Object
A tJavaRow applies the logic that determines whether or not the input is empty. This code uses a regular expression to look for an empty (bracketed whitespce) array.
if( input_row.attribs == null ||
input_row.attribs.matches("^\\[\\s*\\]$") ) {
globalMap.put("EMPTY_FILE_FLAG", new Boolean(true));
}
else {
globalMap.put("EMPTY_FILE_FLAG", new Boolean(false));
}
Applying the Filter
The tFixedFlowInput, tFilterRow, and tFlowToIterate components will invoke the tFileInputJSON_3 component that actually maps the name and value fields and would continue with additional processing. The tFileInputJSON will only be invoked if the EMPTY_FILE_FLAG is set to false. The construct is verbose because of the need for iterate and flow adapter components.
A tFilterFlow converts the global variable set in the guard condition subjob to a flow. This enables the tFilterRow component to be used.
The tFilterRow component applies a simple boolean check on the input field.
If the check passes, processing continues with a second pass taken on the input JSON file.
In today's version, the tFileInputJSON requires a well-rounded data structure. Additional components are needed if input doesn't conform. This post scanned an input file and used a regular expression to determine whether or not the input was empty. An improvement to this is to wrap the logic and filtering into a routine or custom component.
UPDATE: Reduced Number of Components
You can skip the tFilterRow and tFlowToIterate components by using the Run If trigger from tFileInputJSON_1. The Run If trigger supports an expression that will continue processing if true. The tFixedFlowInput component is still needed as an adapter between the pair of tFileInputJSON components.
Here is a screenshot of the reworked job. The Component View is showing the expression of the filter condition.
The tFileInputJSON component takes a JSON structure as input and builds a schema based on JSON paths. For example,
{ "attribs": [
{
"name": "req",
"value": "1"
} ]
}
Defines an array "attribs" with records containing fields "name" and "value". This is mapped in a tFileInputJSON using the following schema and JSON paths (from the tFileInputJSON Component View)
Column JSONPath query
---------------------------
name "$.attribs[*].name"
value "$.attribs[*].value"
"$" refers to the root of the JSON object. ".", immediately following the $, is the current object. "attribs" is an array element. The wildcard index "*" means select all attribs. "name" and "value" are fields.
This structure relies on the convention that there is a balanced number of names and values. Although the JSON paths are similar, they aren't correlated. Pull out a "value" in the middle of the list and the remaining values will shuffle up to different names.
Empty Input
Another case one might encounter is that of empty input. For example,
{
"attribs": [ ]
}
In Talend Open Studio, the current tFileInputJSON component will throw an error as it attempts to map the non-existent name and value fields. In this case, use some programming logic to filter the empty input from the tFileInputJSON component. This can be done with a second tFileInputJSON component.
Guard Condition
This job performs a beforehand check on the input prior to mapping the name and the value fields. See "UPDATE: Reduced Number of Components" at end of post for an alternate implementation.
Job Checking JSON Input Prior to Processing |
Both tFileInputJSON components operate on the same file. However, the first JSON component maps the JSON array "attribs" rather than the individual fields "name" and "value". From tFileInputJSON_1's Component View
Column JSONPath query
------------------------------
attribs "$.attribs"
Determining Empty Object
A tJavaRow applies the logic that determines whether or not the input is empty. This code uses a regular expression to look for an empty (bracketed whitespce) array.
if( input_row.attribs == null ||
input_row.attribs.matches("^\\[\\s*\\]$") ) {
globalMap.put("EMPTY_FILE_FLAG", new Boolean(true));
}
else {
globalMap.put("EMPTY_FILE_FLAG", new Boolean(false));
}
Applying the Filter
The tFixedFlowInput, tFilterRow, and tFlowToIterate components will invoke the tFileInputJSON_3 component that actually maps the name and value fields and would continue with additional processing. The tFileInputJSON will only be invoked if the EMPTY_FILE_FLAG is set to false. The construct is verbose because of the need for iterate and flow adapter components.
A tFilterFlow converts the global variable set in the guard condition subjob to a flow. This enables the tFilterRow component to be used.
Retrieving a Flag in tFilterFlow |
tFilterRow Checking a Flag |
In today's version, the tFileInputJSON requires a well-rounded data structure. Additional components are needed if input doesn't conform. This post scanned an input file and used a regular expression to determine whether or not the input was empty. An improvement to this is to wrap the logic and filtering into a routine or custom component.
UPDATE: Reduced Number of Components
You can skip the tFilterRow and tFlowToIterate components by using the Run If trigger from tFileInputJSON_1. The Run If trigger supports an expression that will continue processing if true. The tFixedFlowInput component is still needed as an adapter between the pair of tFileInputJSON components.
Here is a screenshot of the reworked job. The Component View is showing the expression of the filter condition.
JSON Job Re-worked to Use Run If |
No comments:
Post a Comment