Tuesday, June 18, 2013

Yes, Oui, and Si: Boolean Handling with Talend Open Studio


Databases usually handle boolean values with a small integer where 0 is false and 1 is true.  However, the presentation layer usually prefers something human-readable like "Yes" or "True".  Commons Lang BooleanUtils provides a way of converting boolean strings without writing the same repetitive logic.

If you're working with boolean data in Talend Open Studio, you can define schemas using the "boolean | Boolean" data type.  This can then be used to write out a boolean value to the database.  Sometimes, usually for presentation purposes, boolean values are encoded as string values like "Yes" or "N" or "False".  The BooleanUtils class in the Commons Lang library provides some handy functions to keep Talend expressions uncluttered.

The following job uses tRowGenerator to create a record with 3 string values and a boolean: "Yes", "N", "true", false.

BooleanUtils Test Job
 tLibraryLoad loads the commons-lang-2.5.jar file and imports a class called "org.apache.commons.lang.BooleanUtils".  Select commons-lang-2.5.jar from the Library select on the Component / Basic settings tab.  Enter the following statement in the Import text box on the Advanced settings tab.

import org.apache.commons.lang.BooleanUtils;

 The tRowGenerator is configured to write a single record out using a String/String/String/Boolean schema.  The single record contains the following values: "Yes", "N", "true", false.

tRowGenerator Schema
The tJavaRow component maps the input fields to the output fields and applies a BooleanUtils call to the fields containing Strings.  BooleanUtils.toBoolean() will convert -- case insensitive -- "Yes", "No", "On", "Off", "True", and "False" to a boolean value.

Other String Used as Booleans

"Y" and "N" are often used, but they're not included in toBoolean(), so the method call is expanded to use toBooleanObject().  toBooleanObject() takes the strings used as true, false, and null in the method signature.  So, this could work for languages other than English ("Oui", "Non")

Null Value Handling

BooleanUtils.toBooleanObject() will also handle a null value.  For example, if you're data can be "Y", "N", or "N/A", make a call like this to return true, false, or null.

BooleanUtils.toBooleanObject(ynColumn, "Y", "N", "N/A")

The input schema for the tJavaRow is the same as the tRowGenerator schema (3 string fields and a boolean field).  The output schema is all boolean.

tJavaRow Schema Converting Strings to Booleans
The following shows the code for the tJavaRow component.  I use "Generate code" to provide a straight-forward input/output mapping, then modify the right hand expressions to include the BooleanUtils call.

BooleanUtils Code in tJavaRow Component
Running the job produces the following output.

Starting job BooleanTest at 17:23 16/03/2011.
[statistics] connecting to socket on port 3526
[statistics] connected
true|false|true|false
[statistics] disconnected
Job BooleanTest ended at 17:23 16/03/2011. [exit code=0]

Even Less Code

You can use a shorthand syntax for the toBoolean() and toBooleanObject() methods that leaves off the class-qualifying "BooleanUtils".  This is done using Java's static import and will let you use the following in your Talend expressions (the BooleanUtils is left off).

output_row.col1 = toBoolean(yesNoColumn);
output_row.col2 = toBooleanObject(ynColumn, "Y", "N", "N/A")


The static import is applied in the tLibraryLoad component.  Rather than import the class BooleanUtils, the Advanced settings tab of the component will have

import static org.apache.commons.lang.BooleanUtils.toBoolean;
import static org.apache.commons.lang.BooleanUtils.toBooleanObject;
  
A single "import static org.apache.commons.lang.BooleanUtils.*" could be used instead, but I usually import each function individually.  The static import should be used sparingly because it can cause conflicts with the global namespace.  That is if you import similarly named functions from different classes or packages using the wildcard notation, you could have a conflict.

Consistent Handling Throughout Job

While you can certainly form your own expressions using the tertiary operator of if statements, BooleanUtils can consistently apply the same logic and null handling throughout the job.

No comments: