Tuesday, June 11, 2013

tScriptRules v 2.0 for Talend Open Studio

tScriptRules is a Talend Open Studio component that lets you apply rules to an input flow.  Version 2.0 has been released, and this blog post demonstrates how to use it.

Out of the box, Talend Open Studio provides the tMap and tFilterRow components, which can apply a business rule or quality check to an incoming data flow.  I wrote tScriptRules last year to be a little more flexible.  tScriptRules is based on JEXL, an expression language with a JavaScript-like syntax, which lets you apply more complex conditions than the standard tFilterRow.  Also, unlike tFilterRow or tMap, tScriptRules stores additional documentation with each expression.

Basic Usage

This example shows an Excel source sending input records to tScriptRules.  The output of tScriptRules goes to one of two tLogRows.  One tLogRow handles the filter flow: any row for which the script expressions resolve to true (such as "!empty(customerId)") is routed to the filter tLogRow.  A row that fails a script expression is sent to the reject tLogRow.

tScriptRules with 5 Rows that Met the Conditions
The tScriptRules in this example is performing a basic quality check, verifying that certain key fields are not empty (null or otherwise).
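To make the filter/reject routing concrete, here is a minimal plain-Java sketch of the idea.  It is not the component's code: the class name, the route and row helpers, and the simplified isEmpty stand-in for JEXL's empty() are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class FilterRejectSketch {

    // Simplified stand-in for JEXL's empty(): only null and
    // zero-length strings are treated as empty here.
    static boolean isEmpty(Object value) {
        return value == null || value.toString().isEmpty();
    }

    // Hypothetical helper that builds a one-column row.
    static Map<String, Object> row(String customerId) {
        Map<String, Object> r = new HashMap<>();
        r.put("customerId", customerId);
        return r;
    }

    // Rows for which the rule is true go to "filter"; the rest go to "reject".
    static Map<String, List<Map<String, Object>>> route(
            List<Map<String, Object>> rows,
            Predicate<Map<String, Object>> rule) {
        Map<String, List<Map<String, Object>>> flows = new HashMap<>();
        flows.put("filter", new ArrayList<>());
        flows.put("reject", new ArrayList<>());
        for (Map<String, Object> r : rows) {
            flows.get(rule.test(r) ? "filter" : "reject").add(r);
        }
        return flows;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("C-100"));
        rows.add(row(""));
        rows.add(row(null));

        // Java equivalent of the rule !empty(customerId)
        Predicate<Map<String, Object>> rule = r -> !isEmpty(r.get("customerId"));

        Map<String, List<Map<String, Object>>> flows = route(rows, rule);
        System.out.println("filter: " + flows.get("filter").size());
        System.out.println("reject: " + flows.get("reject").size());
    }
}
```

With the three sample rows, one lands in the filter flow and two (the empty string and the null) land in the reject flow.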

Some Rules Checking for Missing Data
The rules are written using fields taken from the input flow, in this case coming from a tFileInputExcel.  "input_row" is a handy alias that avoids hard-coding the rules to "row1" in case that name changes later.  Note that "row1" also works, so !empty(input_row.Business_Name) is equivalent to !empty(row1.Business_Name).

Expanded Check

tScriptRules supports a Run All mode which runs every rule against each input row.  The normal operation (Run All = false) stops processing a row on its first failure.  In both cases, a row failure does not kill the job; processing carries on with the next row.
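The difference between the two modes can be sketched in plain Java as follows.  This is only an illustration of the behavior described above, not the component's implementation; the Rule class, the failures method, and the two sample rules are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class RunAllSketch {

    // A rule paired with a description, loosely mirroring the
    // documentation stored alongside each expression.
    static final class Rule {
        final String description;
        final Predicate<Map<String, String>> check;

        Rule(String description, Predicate<Map<String, String>> check) {
            this.description = description;
            this.check = check;
        }
    }

    static boolean present(String value) {
        return value != null && !value.isEmpty();
    }

    // Returns the descriptions of the rules the row failed.
    // With runAll == false, evaluation stops at the first failure.
    static List<String> failures(Map<String, String> row, List<Rule> rules, boolean runAll) {
        List<String> failed = new ArrayList<>();
        for (Rule rule : rules) {
            if (!rule.check.test(row)) {
                failed.add(rule.description);
                if (!runAll) {
                    break;
                }
            }
        }
        return failed;
    }

    // Two hypothetical "not empty" rules.
    static List<Rule> sampleRules() {
        return Arrays.asList(
            new Rule("customerId must not be empty", r -> present(r.get("customerId"))),
            new Rule("Business_Name must not be empty", r -> present(r.get("Business_Name"))));
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>(); // both fields are missing
        System.out.println(failures(row, sampleRules(), false));
        System.out.println(failures(row, sampleRules(), true));
    }
}
```

For a row missing both fields, the normal mode reports only the first failed rule, while Run All reports both.  Either way the caller simply moves on to the next row.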

Run All Option
Run All mode is a Dynamic Setting, so it can be controlled on the command line with a --context_param option.

More Complicated Rule

The JEXL script syntax resembles JavaScript and supports regular expressions.  This example uses a rule that checks a state value ("VA") and an email domain ("edu") to infer something about the contact.
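JEXL has an =~ operator for regular-expression matches.  The plain-Java sketch below mirrors that kind of combined check using java.util.regex.  The class name, method name, parameter names, and the exact pattern are assumptions for illustration, not the rule from the screenshot.

```java
import java.util.regex.Pattern;

public class StateEmailRuleSketch {

    // Hypothetical pattern: something@something.edu, case-insensitive.
    private static final Pattern EDU_EMAIL =
        Pattern.compile("^[^@\\s]+@[^@\\s]+\\.edu$", Pattern.CASE_INSENSITIVE);

    // True when the contact is in Virginia and has an .edu email address.
    static boolean isVirginiaEduContact(String state, String email) {
        return "VA".equals(state)
            && email != null
            && EDU_EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isVirginiaEduContact("VA", "jane@vt.edu"));
        System.out.println(isVirginiaEduContact("VA", "jane@example.com"));
        System.out.println(isVirginiaEduContact("MD", "jane@umd.edu"));
    }
}
```

Only the first call returns true; the other two fail on the email domain and the state respectively.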


A More Complicated Example
tScriptRules 2.0 is a complete rewrite designed to work with the new tScriptRulesLoad component.  Functionally, this pairing is the same as using the Component View table demonstrated in this post.  However, tScriptRulesLoad makes the rules transferable by reading them from an external source such as a file.  The rules can live in a readable text file, which is well suited to a configuration management system like Git.
