tScriptRules v 2.0 for Talend Open Studio
tScriptRules is a Talend Open Studio component that allows you to apply rules to an input flow. A new version, 2.0, was released. This blog post demonstrates its usage.
Out of the box, Talend Open Studio provides the tMap and tFilterRow components, which can apply a business rule or quality check to an incoming data flow. I wrote tScriptRules last year to be a little more flexible. tScriptRules is based on JEXL, an expression language with JavaScript-like syntax, which lets you apply more complex conditions than the standard tFilterRow. Also, unlike tFilterRow or tMap, tScriptRules stores additional documentation with each expression.
Basic Usage
This example shows an Excel source sending input records to tScriptRules. Each row leaving tScriptRules goes to one of two tLogRows: one handles the filter flow and the other handles the reject flow. A row for which the script expressions resolve to true (such as "!empty(customerId)") is routed to the filter tLogRow. A row for which a script expression fails is sent to the reject tLogRow.
The tScriptRules in this example performs a basic quality check, verifying that certain key fields are not empty (null or otherwise).
The rules are written using fields taken from the input flow, in this case coming from a tFileInputExcel. "input_row" is a handy alias that avoids hard-coding the rules to "row1" in case the connection name changes later. "row1" still works, so !empty(input_row.Business_Name) is equivalent to !empty(row1.Business_Name).
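The routing described above can be sketched in plain Java. This is only a stand-alone simulation under stated assumptions, not the component's actual implementation: the class name `NotEmptyRouting`, the `route` helper, and the representation of a row as a `Map` are all hypothetical, and the emptiness check covers only string fields that are null or zero-length.

```java
import java.util.List;
import java.util.Map;

/** Stand-alone simulation of filter/reject routing; not the tScriptRules code. */
final class NotEmptyRouting {

    /** Approximates a "not empty" rule for string fields: null or zero-length counts as empty. */
    static boolean isEmpty(String value) {
        return value == null || value.isEmpty();
    }

    /**
     * Returns "filter" when every required field is populated,
     * otherwise "reject". A missing key is treated like a null value.
     */
    static String route(Map<String, String> row, List<String> requiredFields) {
        for (String field : requiredFields) {
            if (isEmpty(row.get(field))) {
                return "reject";
            }
        }
        return "filter";
    }
}
```

Calling `NotEmptyRouting.route(row, List.of("customerId", "Business_Name"))` on a row with both fields populated returns "filter"; a row with a blank Business_Name returns "reject".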
Expanded Check
tScriptRules supports a Run All mode, which runs every rule against each input row. In normal operation (Run All = false), processing of a row stops at the first rule that fails. In both cases, a row failure does not kill the job; the job carries on with the next row.
Run All mode is a Dynamic Setting, so it can be controlled on the command line with a --context_param option.
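The difference between the two modes can be sketched as follows. This is an illustrative simulation only: the class `RunAllSketch`, the `failedRules` helper, and the use of `Predicate` objects in place of JEXL expressions are assumptions made for the example, not part of the component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Simulates Run All versus stop-on-first-failure for a single row. */
final class RunAllSketch {

    /**
     * Evaluates the named rules in iteration order and returns the names
     * of the rules that failed. With runAll == false, evaluation stops at
     * the first failure, so at most one name is returned.
     */
    static List<String> failedRules(Map<String, String> row,
                                    Map<String, Predicate<Map<String, String>>> rules,
                                    boolean runAll) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Predicate<Map<String, String>>> rule : rules.entrySet()) {
            if (!rule.getValue().test(row)) {
                failures.add(rule.getKey());
                if (!runAll) {
                    break; // normal mode: give up on this row at the first failure
                }
            }
        }
        return failures;
    }
}
```

Passing the rules in a `LinkedHashMap` keeps them in a predictable order. With Run All off, a row that breaks two rules reports only the first; with Run All on, it reports both, which is more useful when the goal is a full data quality report rather than a quick pass or fail.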
More Complicated Rule
The JEXL script syntax resembles JavaScript and supports regular expressions. This example uses a rule that checks a state value ("VA") and an email domain ("edu") to infer something about the contact.
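To make the logic of such a rule concrete, here is a rough Java equivalent. It is a sketch under assumptions: the field names `State` and `Email`, the class `ContactRuleSketch`, and the exact pattern are hypothetical and not taken from the job shown in this post. In JEXL the same idea would typically be expressed with an equality test combined with the `=~` match operator.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Illustrates a combined state and email-domain check on one row. */
final class ContactRuleSketch {

    // One "@", no whitespace, and a domain ending in ".edu" (case-insensitive).
    private static final Pattern EDU_EMAIL =
            Pattern.compile("^[^@\\s]+@[^@\\s]+\\.edu$", Pattern.CASE_INSENSITIVE);

    /** True when the row's State is "VA" and its Email is on an .edu domain. */
    static boolean isVirginiaEduContact(Map<String, String> row) {
        String state = row.get("State");   // hypothetical column name
        String email = row.get("Email");   // hypothetical column name
        return "VA".equals(state)
                && email != null
                && EDU_EMAIL.matcher(email).matches();
    }
}
```

A row with State "VA" and Email "jane@example.edu" satisfies the rule; changing either the state or the domain makes it fail.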
tScriptRules 2.0 is a complete rewrite, designed to work with the new tScriptRulesLoad component. Functionally, this pairing is the same as using the Component View table demonstrated in this post. However, tScriptRulesLoad makes the rules transferable by reading them from an external source such as a file. The rules can then live in a readable text file that is well suited to a configuration management system like Git.
Figure: tScriptRules with 5 Rows that Met the Conditions
Figure: Some Rules Checking for Missing Data
Figure: Run All Option
Figure: A More Complicated Example