Tuesday, June 11, 2013

Manipulating a tHashOutput in Talend Open Studio

The Talend Open Studio tHashOutput and tHashInput components let you hold a data flow in RAM, which can improve performance. The basic usage defines a single tHashOutput, which gathers the input, and a tHashInput, which feeds the stored rows into a data flow. This post describes two expanded configurations.

tHashOutput and tHashInput work with input stored in internal memory and do so in a way consistent with other Talend components. The Hash components allow you to define flows that retrieve data stored in memory by some other part of the job. In a simple scenario, this is done with a single input/output pair.

Multiple Sources

This screenshot shows a job that merges two data sources (a tRowGenerator and a tFileInputDelimited) into a single Hash data structure using two tHashOutputs. The first tHashOutput is the one that each subsequent tHashOutput references in its "Link with a tHashOutput" control.


Configuration of Linked tHashOutput
This tHashOutput refers to the first component.
tHashOutput Referring to Prior Component
The combined data sets are available through the tHashInput. It doesn't matter which of the two components is selected in the Component List select, since they are linked.
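
The linking behavior can be pictured with a small Java sketch. This is not Talend's generated code; `HashBuffer`, `HashOutput`, and `read` are hypothetical names invented for illustration. It only assumes that a linked tHashOutput writes into the same in-memory structure as the component it links to, which is why reading through either one returns the combined rows.

```java
import java.util.ArrayList;
import java.util.List;

public class LinkedHashOutputSketch {

    /** Hypothetical stand-in for the in-memory structure behind a tHashOutput. */
    static final class HashBuffer {
        final List<String> rows = new ArrayList<>();
    }

    /** Hypothetical stand-in for a tHashOutput: it owns a buffer or shares a linked one. */
    static final class HashOutput {
        final HashBuffer buffer;

        /** A standalone output creates its own buffer. */
        HashOutput() {
            this.buffer = new HashBuffer();
        }

        /** A linked output reuses the buffer of the output it refers to. */
        HashOutput(HashOutput linkedTo) {
            this.buffer = linkedTo.buffer;
        }

        void write(String row) {
            buffer.rows.add(row);
        }
    }

    /** Hypothetical stand-in for a tHashInput reading from the selected output. */
    static List<String> read(HashOutput selected) {
        return new ArrayList<>(selected.buffer.rows);
    }

    public static void main(String[] args) {
        HashOutput first = new HashOutput();        // fed by the tRowGenerator
        HashOutput second = new HashOutput(first);  // fed by the file, linked to the first

        first.write("generated-1");
        second.write("file-1");

        // Both selections see the same combined rows because the buffer is shared.
        System.out.println(read(first));
        System.out.println(read(second));
    }
}
```

In this model, choosing either output in the tHashInput's Component List amounts to picking one of two references to the same list.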

tHashInput Configuration
None of the Data Write Model, Keys Management, or Append settings has any effect in this job. Data Write Model has only one value in its select. I think Keys Management has a bug in version 5 (see TDI-21180). Append only takes effect in an iteration.

Clearing When Iterating

This job iterates over a data set, clearing the backing RAM structure defined in the tHashOutput with each iteration. This is done by unchecking Append. If Append were left checked, each iteration would produce more and more output, because the tHashOutput would still hold the values gathered in the preceding iterations.

Clearing After Each Iteration
The results of clearing the tHashOutput follow.

Results with Append Unchecked

If Append is checked, the output is repeated as it accrues through the iterations.

Iterating in Append Mode
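
The difference between the two Append settings can be modeled in a few lines of Java. Again, this is a sketch rather than Talend's implementation: `run`, `buffer`, and the `row-N` values are hypothetical. It assumes each iteration writes one row to the tHashOutput and then reads everything back through the tHashInput.

```java
import java.util.ArrayList;
import java.util.List;

public class AppendModeSketch {

    /**
     * Simulates an iterating job. Each pass writes one row to the buffer,
     * then reads the whole buffer and adds what it read to the job output.
     * With append == false the buffer is cleared at the start of each pass.
     */
    static List<String> run(int iterations, boolean append) {
        List<String> buffer = new ArrayList<>();   // stand-in for the tHashOutput storage
        List<String> output = new ArrayList<>();   // everything the tHashInput emitted

        for (int i = 1; i <= iterations; i++) {
            if (!append) {
                buffer.clear();                    // Append unchecked: start fresh
            }
            buffer.add("row-" + i);                // this iteration's data
            output.addAll(buffer);                 // read back the full buffer
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(3, false)); // [row-1, row-2, row-3]
        System.out.println(run(3, true));  // [row-1, row-1, row-2, row-1, row-2, row-3]
    }
}
```

With Append unchecked, each iteration emits only its own row. With Append checked, earlier rows are emitted again on every later pass, which matches the repeated output described above.
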
The tHashOutput and tHashInput components can give your Talend Open Studio job a performance gain by saving input in RAM. tHashOutput can gather input from different sources using the "Link with a tHashOutput" setting. Append takes effect only in iterations, where it controls when a tHashOutput is cleared.
