Thursday, June 13, 2013

Running Count in Talend Open Studio


Most Talend components keep a count of the records processed using variables like NB_LINE or NB_LINE_OK.  But these are only available after all processing is completed.  Define your own counter variable to keep a running count for use in a tMap.

Variables like tFilterRow.NB_LINE or tAccessOutput.NB_LINE_INSERTED can be used to report the number of affected lines after a subjob's processing. However, it can be useful to have the current line index available inside a tMap. The counters that Talend uses to build NB_LINE aren't available during processing; they're only written out to the globalMap at the end of processing.
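To see why the value isn't there mid-stream, here is a simplified Java model of the pattern. It assumes, as Talend's generated code typically does, that a component keeps its count in a local variable and only puts it into the globalMap under a key like tFilterRow_1_NB_LINE when its loop finishes. The class and method names are hypothetical; this is an illustration, not Talend's actual generated code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of how a component publishes NB_LINE.
// Not Talend's real generated code; names are for illustration only.
class NbLineDemo {
    // Stand-in for the job-wide globalMap that Talend components share.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Processes rows and records what a downstream step would read for
    // NB_LINE while each row is still in flight.
    static List<Object> process(List<String> rows) {
        List<Object> seenDuringProcessing = new ArrayList<>();
        int nbLine = 0; // local counter, invisible outside this loop
        for (String row : rows) {
            nbLine++;
            seenDuringProcessing.add(globalMap.get("tFilterRow_1_NB_LINE"));
        }
        // The count is written to globalMap only once the loop is done.
        globalMap.put("tFilterRow_1_NB_LINE", nbLine);
        return seenDuringProcessing;
    }
}
```

Every read made while rows are flowing returns null; only after process() returns does the key hold the total.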

In this example, staging records are loaded from Excel into Access. The order in which the Excel records are read is preserved in a database column called DISPLAY_SEQ_NB. Note that the Access table also has an auto-increment column used for the record ID. This could be used to infer a loading order, but this job uses a separate column so that the ID stays a meaningless surrogate key, which helps with maintenance later. (I can swap in a record at the same DISPLAY_SEQ_NB without having to work against the auto-incrementing mechanism.)

Talend Staging Job Using a Counter
Step 1: Define the Counter Variable

To define the counter variable, use a tSetGlobalVar. Add a global variable with a key and an initial value. In this case, the job uses an unquoted 0 so that the value is stored as an Integer, which can be incremented later.
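Here is a rough Java equivalent of what the tSetGlobalVar does, assuming the key budgetFileCounter. The Value field is a Java expression, so an unquoted 0 is stored as an Integer while a quoted "0" would be stored as a String. The class and helper names are made up for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of tSetGlobalVar with Key "budgetFileCounter".
class SetGlobalVarDemo {
    // Value entered as 0 (unquoted): autoboxed to an Integer.
    static Map<String, Object> initCounter() {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("budgetFileCounter", 0);
        return globalMap;
    }

    // Value entered as "0" (quoted): stored as a String.
    static Map<String, Object> initCounterQuoted() {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("budgetFileCounter", "0");
        return globalMap;
    }

    // True if the stored value can be cast to Integer and incremented.
    static boolean isIncrementable(Map<String, Object> globalMap) {
        return globalMap.get("budgetFileCounter") instanceof Integer;
    }

    public static void main(String[] args) {
        System.out.println(isIncrementable(initCounter()));       // true
        System.out.println(isIncrementable(initCounterQuoted())); // false: an (Integer) cast would throw ClassCastException
    }
}
```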

tSetGlobalVar

Step 2: Use the Variable

Use the variable in a tMap. Retrieve the value from the globalMap and cast it to Integer, since globalMap.get() returns a plain Object.
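A minimal sketch of the tMap expression, assuming the counter feeds the DISPLAY_SEQ_NB output column. In the tMap itself only the expression (Integer)globalMap.get("budgetFileCounter") is typed; the surrounding class and method are hypothetical scaffolding so the snippet runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scaffolding around the tMap output expression.
class TMapExpressionDemo {
    // Stand-in for the job-wide globalMap, seeded as tSetGlobalVar would.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Expression for the DISPLAY_SEQ_NB output column.
    // globalMap.get() returns Object, so the Integer cast is required.
    static Integer displaySeqNb() {
        return (Integer) globalMap.get("budgetFileCounter");
    }
}
```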

tMap Using budgetFileCounter Variable

Step 3: Increment the Counter

Use a tJavaRow to increment the counter. First, use the "Generate Code" feature to pass the input fields directly to the output. Next, add a line of Java code that unpacks the Integer stored in the globalMap into a primitive int, adds one, and puts the result back into the globalMap.
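The three steps can be sketched end to end in plain Java. This assumes the tMap runs before the tJavaRow in the same row flow, so each row reads the counter and then bumps it. The Row class and its budgetItem column are hypothetical, since the job's actual schema isn't shown; input_row and output_row follow the names Talend uses inside a tJavaRow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical end-to-end model of the running-count job.
class RunningCountDemo {
    // Hypothetical row schema; the real job's columns are not shown.
    static class Row {
        String budgetItem;
        Integer displaySeqNb;
    }

    static final Map<String, Object> globalMap = new HashMap<>();

    // Body of the tJavaRow: generated pass-through lines plus the increment.
    static void tJavaRow(Row input_row, Row output_row) {
        output_row.budgetItem = input_row.budgetItem;
        output_row.displaySeqNb = input_row.displaySeqNb;
        // Unbox the stored Integer to an int, add one, and store it back.
        int counter = (Integer) globalMap.get("budgetFileCounter");
        globalMap.put("budgetFileCounter", counter + 1);
    }

    // Pushes rows through the tMap expression and the tJavaRow one at a time.
    static List<Integer> load(List<String> items) {
        globalMap.put("budgetFileCounter", 0); // Step 1: tSetGlobalVar
        List<Integer> seq = new ArrayList<>();
        for (String item : items) {
            Row mapped = new Row();
            mapped.budgetItem = item;
            mapped.displaySeqNb = (Integer) globalMap.get("budgetFileCounter"); // Step 2: tMap
            Row out = new Row();
            tJavaRow(mapped, out); // Step 3: tJavaRow
            seq.add(out.displaySeqNb);
        }
        return seq;
    }
}
```

With an initial value of 0 the first row gets DISPLAY_SEQ_NB 0; seeding the tSetGlobalVar with 1 instead would start the numbering at 1.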

Incrementing budgetFileCounter Variable
