Thursday, June 13, 2013

Handling an Empty JSON Object in Talend Open Studio


To input JSON into a flow using Talend Open Studio, use the tFileInputJSON component.  If the JSON input may be empty, use a guard condition that examines the structure beforehand.

The tFileInputJSON component takes a JSON structure as input and builds a schema based on JSON paths.  For example,

{
  "attribs": [
    {
      "name": "req",
      "value": "1"
    }
  ]
}

This defines an array "attribs" whose records contain the fields "name" and "value".  It is mapped in a tFileInputJSON using the following schema and JSON paths (from the tFileInputJSON Component View):

Column    JSONPath query
------------------------------
name      "$.attribs[*].name"
value     "$.attribs[*].value"

"$" refers to the root of the JSON object.  ".", immediately following the $, is the child operator that steps into a member of that object.  "attribs" is the member holding the array.  The wildcard index "*" means select all elements of attribs.  "name" and "value" are fields of each element.

This mapping relies on the convention that every record has both a name and a value.  Although the two JSON paths look similar, they are evaluated separately and their results aren't correlated.  Pull out a "value" in the middle of the list and the remaining values will shuffle up to different names.
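The shuffle is easier to see with plain lists.  The sketch below is not Talend code; it assumes the two wildcard queries each return their own list and that the lists are then lined up by position.  The class name, the sample records, and the choice to fill a missing slot with null are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PairingDemo {

    // Pairs names and values purely by position, with no knowledge of
    // which record each value originally came from.
    static List<String> pairByPosition(List<String> names, List<String> values) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            // Illustrative choice: a missing slot is shown as null.
            String value = i < values.size() ? values.get(i) : null;
            rows.add(names.get(i) + "=" + value);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical query results for three records where the
        // second record ("opt") has no "value" field.
        List<String> names = Arrays.asList("req", "opt", "mode");
        List<String> values = Arrays.asList("1", "fast");

        // "fast" belongs to "mode" but ends up next to "opt".
        System.out.println(pairByPosition(names, values));
    }
}
```

Here the value that belongs to "mode" is reported against "opt", which is the kind of silent mismatch to watch for when records can be incomplete.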

Empty Input

Another case one might encounter is that of empty input.  For example,

{
"attribs": [ ]
}

In Talend Open Studio, the current tFileInputJSON component will throw an error as it attempts to map the non-existent name and value fields.  In this case, add logic that keeps the empty input away from the tFileInputJSON component doing the mapping.  This can be done with a second tFileInputJSON component.

Guard Condition

This job checks the input before mapping the name and value fields. See "UPDATE: Reduced Number of Components" at the end of the post for an alternate implementation.

Job Checking JSON Input Prior to Processing

Both tFileInputJSON components operate on the same file.  However, the first JSON component maps the JSON array "attribs" rather than the individual fields "name" and "value".  From tFileInputJSON_1's Component View:

Column    JSONPath query
------------------------------
attribs   "$.attribs"

Determining Empty Object

A tJavaRow applies the logic that determines whether or not the input is empty.  This code uses a regular expression to look for an empty array, meaning a pair of brackets with nothing but optional whitespace between them.

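// Treat the input as empty when "attribs" is missing or is an array
// holding only whitespace, e.g. "[]" or "[  ]".
boolean isEmpty = input_row.attribs == null
    || input_row.attribs.matches("^\\[\\s*\\]$");
// Boolean.valueOf avoids the deprecated "new Boolean(...)" constructor.
globalMap.put("EMPTY_FILE_FLAG", Boolean.valueOf(isEmpty));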

Applying the Filter

The tFixedFlowInput, tFilterRow, and tFlowToIterate components invoke the tFileInputJSON_3 component, which actually maps the name and value fields and continues with any additional processing.  tFileInputJSON_3 is only invoked if EMPTY_FILE_FLAG is set to false.  The construct is verbose because of the need for iterate and flow adapter components.

A tFixedFlowInput converts the global variable set in the guard condition subjob to a flow.  This enables the tFilterRow component to be used.

Retrieving a Flag in tFixedFlowInput

The tFilterRow component applies a simple boolean check on the input field.

tFilterRow Checking a Flag

If the check passes, processing continues with a second pass taken on the input JSON file.

As of this writing, the tFileInputJSON component requires a fully populated data structure.  Additional components are needed if the input doesn't conform.  This post scanned an input file and used a regular expression to determine whether or not the input was empty.  One improvement would be to wrap the logic and filtering into a routine or custom component.
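As a starting point for the routine idea, here is a minimal sketch of the empty-array check pulled into a static helper.  The class name JsonGuard and the method name isEmptyArray are hypothetical, and the pattern is the same one used in the tJavaRow above, so it shares the same limits: it only recognizes a bare pair of brackets with optional whitespace inside.

```java
import java.util.regex.Pattern;

class JsonGuard {

    // "[" and "]" with nothing but optional whitespace between them.
    // Compiled once so repeated calls don't recompile the expression.
    private static final Pattern EMPTY_ARRAY = Pattern.compile("\\[\\s*\\]");

    // Returns true when the array text is missing or is an empty array.
    static boolean isEmptyArray(String arrayText) {
        return arrayText == null || EMPTY_ARRAY.matcher(arrayText).matches();
    }

    public static void main(String[] args) {
        // An empty array and a populated one, as they might arrive
        // from the "$.attribs" column.
        System.out.println(isEmptyArray("[ ]"));
        System.out.println(isEmptyArray("[{\"name\":\"req\",\"value\":\"1\"}]"));
    }
}
```

With a helper like this available to the job, the tJavaRow body could shrink to a single line such as globalMap.put("EMPTY_FILE_FLAG", JsonGuard.isEmptyArray(input_row.attribs)); and the same check could be reused by other jobs.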

UPDATE: Reduced Number of Components

You can skip the tFilterRow and tFlowToIterate components by using the Run If trigger from tFileInputJSON_1.  The Run If trigger supports an expression that will continue processing if true.  The tFixedFlowInput component is still needed as an adapter between the pair of tFileInputJSON components.

Here is a screenshot of the reworked job.  The Component View shows the expression used as the filter condition.

JSON Job Re-worked to Use Run If
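The expression itself is only visible in the screenshot, so the following is a rough sketch of what a Run If condition of this kind might look like, not a copy of it.  It assumes globalMap behaves like a Map<String, Object> and that the flag was stored under "EMPTY_FILE_FLAG" as in the tJavaRow.  The class and method names are made up so the expression can be run outside of Talend.

```java
import java.util.HashMap;
import java.util.Map;

class RunIfDemo {

    // Stand-in for the expression typed into the Run If trigger.
    // Boolean.FALSE.equals(...) is null-safe: if the flag was never
    // set, the condition is false and the next subjob does not run.
    static boolean shouldRun(Map<String, Object> globalMap) {
        return Boolean.FALSE.equals(globalMap.get("EMPTY_FILE_FLAG"));
    }

    public static void main(String[] args) {
        // Plain HashMap standing in for Talend's globalMap.
        Map<String, Object> globalMap = new HashMap<>();

        // Flag not set yet: the guard subjob has not run.
        System.out.println(shouldRun(globalMap));

        // Input had records: continue to the mapping component.
        globalMap.put("EMPTY_FILE_FLAG", Boolean.FALSE);
        System.out.println(shouldRun(globalMap));

        // Input was empty: skip the mapping component.
        globalMap.put("EMPTY_FILE_FLAG", Boolean.TRUE);
        System.out.println(shouldRun(globalMap));
    }
}
```

In the trigger, only the body of shouldRun would be entered as the condition.  A plain cast such as !((Boolean)globalMap.get("EMPTY_FILE_FLAG")) is shorter but fails with a NullPointerException if the flag is missing.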
