Tuesday, June 11, 2013

Applying Contexts to Talend Open Studio Jobs


If you're using Talend Open Studio with stable Dev/Test/Prod configurations, consider creating a Context for each configuration.

In Talend Open Studio, a Context is used to group a set of variables that parameterize a job.  This is an important configuration management feature that gives the caller the ability to deploy the same code in different runtime environments.

Using Only ContextGroups


For example, if I export a job 'PropsJob' with scripts generated, I can call the following from the Windows command line to configure the job to run in different environments.

C:\Jobs\PropsJob_0.1\PropsJob> PropsJob_run.bat --context=Dev
C:\Jobs\PropsJob_0.1\PropsJob> PropsJob_run.bat --context=Test
C:\Jobs\PropsJob_0.1\PropsJob> PropsJob_run.bat --context=Prod
C:\Jobs\PropsJob_0.1\PropsJob> PropsJob_run.bat

The last command will use the Default context.
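To make the selection rule concrete, here is a minimal Java sketch that mimics the behavior described above: the value after `--context=` picks a named Context, and without that argument the job falls back to `Default`. The class name, method name, and the `DATA_DIR` values are hypothetical illustrations; this is not the code Talend generates for an exported job.

```java
import java.util.Map;

// Hypothetical illustration of picking a Context by name from the command line.
// Not Talend's generated code; all names and values here are made up.
public class ContextSelector {

    // Returns the name given by --context=NAME, or "Default" if the argument is absent.
    // If the argument appears more than once, this sketch keeps the last one.
    static String resolveContextName(String[] args) {
        String name = "Default";
        for (String arg : args) {
            if (arg.startsWith("--context=")) {
                name = arg.substring("--context=".length());
            }
        }
        return name;
    }

    public static void main(String[] args) {
        // Hypothetical per-environment values for one context variable, DATA_DIR.
        Map<String, String> dataDirByContext = Map.of(
                "Default", "C:/data/dev",
                "Dev", "C:/data/dev",
                "Test", "D:/data/test",
                "Prod", "E:/data/prod");

        String context = resolveContextName(args);
        String dataDir = dataDirByContext.get(context);
        if (dataDir == null) {
            throw new IllegalArgumentException("Unknown context: " + context);
        }
        System.out.println("context=" + context + ", DATA_DIR=" + dataDir);
    }
}
```

The same compiled code serves every environment; only the name passed on the command line changes which set of values is used.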

Within Talend Open Studio, the runtime Context is selected on the Run tab, using the drop-down list in the upper right-hand corner.

Select a Context on the Run Tab

Contexts and Context Variables can be created at the Project level, which makes them available to all jobs in a Project.  These are added to the Repository.  Using the Repository is recommended because it helps enforce coding standards: names like 'DATA_DIR' are defined once, centrally, instead of being renamed or redefined in individual jobs.

Alternatively, Contexts and Context Variables can be created for individual jobs.  See the following screenshot showing the Contexts tab of a job 'PropsJob'.

Contexts Tab
Overriding Parameters

If you're running an exported job, you can pass the argument --context_param followed by a name/value pair to override any of the parameters in the context.  For example:

C:\Jobs\PropsJob_0.1\PropsJob>PropsJob_run --context_param DBSCHEMA=demo

This overrides the value of the context parameter 'DBSCHEMA' defined in the Context the job runs with.
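As a rough model of what an override does, the following Java sketch applies `--context_param NAME=value` pairs on top of the values that came from the selected Context. It assumes the flag and the pair arrive as two separate arguments, as in the command above, and splits the pair on the first '='. The class and method names are hypothetical; this is not Talend's own argument parser.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: layer "--context_param NAME=value" overrides on top of
// the values loaded for the selected Context. Not Talend's generated code.
public class ContextParamOverrides {

    static Map<String, String> applyOverrides(Map<String, String> context, String[] args) {
        // Copy so the Context's own values are left untouched.
        Map<String, String> result = new HashMap<>(context);
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--context_param") && i + 1 < args.length) {
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    // Split on the first '=' so the value itself may contain '='.
                    // This sketch adds the name if the Context did not define it.
                    result.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for the selected Context.
        Map<String, String> fromContext = Map.of("DBSCHEMA", "prod_schema", "DATA_DIR", "C:/data");
        String[] cli = {"--context_param", "DBSCHEMA=demo"};
        Map<String, String> effective = applyOverrides(fromContext, cli);
        System.out.println("DBSCHEMA=" + effective.get("DBSCHEMA"));
        System.out.println("DATA_DIR=" + effective.get("DATA_DIR"));
    }
}
```

Parameters that are not overridden keep the values from the Context, which is what makes a one-off override safe to use.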

Preferred Approach: Separate Config from Code

The most robust option for deploying Talend Open Studio jobs is to use property files that are kept separate from the exported job code.  This lets you change configuration data such as data sources, directories, and passwords without touching the code.  To accomplish this, use the tContextLoad component in a job with a source such as tFileInputDelimited.  Other input components, such as database inputs, can also be used.

Configure the input component schema with key and value fields.  If your input schema doesn't have key and value fields, use a tMap to rename the fields.  In the following screenshot, a tFileInputDelimited is created with two fields: key and value.  The file name used by the tFileInputDelimited is itself parameterized with a Context Variable set for the Default context.  Note that the variable 'context.PROPSFILE' is not expected to be in the properties file (and might be removed to flag an error condition of the job).

The tFileInputDelimited uses the equals sign ('=') as its field separator.

TOS Job Configured with Custom .properties File

To run the exported job using the default configuration:

C:\Jobs\PropsJob_0.1\PropsJob>PropsJob_run

To use a different properties file:

C:\Jobs\PropsJob_0.1\PropsJob>PropsJob_run --context_param PROPSFILE=C:/jobs/propsjob2.properties

If you would like the job to fail immediately if it can't find a config file, check the "Die on error" option on the tFileInputDelimited component.
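For readers who want to see the whole pattern in one place, here is a minimal Java sketch of what the tFileInputDelimited and tContextLoad pair accomplishes: read key=value rows from an external file and overlay them on the job's context, either failing fast or keeping the existing values when the file is missing. Skipping blank lines and '#' comment lines is an assumption of this sketch, and the class and method names are hypothetical; it is not the code Talend generates.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of loading key=value rows from a properties file into
// a context map. Not Talend's generated code.
public class PropsFileContextLoader {

    static Map<String, String> load(Map<String, String> context, Path propsFile, boolean dieOnError)
            throws IOException {
        Map<String, String> result = new HashMap<>(context);
        List<String> lines;
        try {
            lines = Files.readAllLines(propsFile, StandardCharsets.UTF_8);
        } catch (NoSuchFileException e) {
            if (dieOnError) {
                throw e; // fail fast when the config file is missing
            }
            return result; // otherwise keep the existing context values
        }
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // assumption: ignore blank lines and '#' comments
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // no key before '=', so ignore the row
            }
            // Split on the first '=' only, so values may contain '='.
            result.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("propsjob", ".properties");
        Files.write(file, List.of("DATA_DIR=C:/data/test", "DBURL=jdbc:mysql://host/db?useSSL=false"),
                StandardCharsets.UTF_8);
        // PROPSFILE lives in the context, not in the file it points to.
        Map<String, String> ctx = load(Map.of("PROPSFILE", file.toString()), file, true);
        System.out.println("DATA_DIR=" + ctx.get("DATA_DIR"));
        System.out.println("DBURL=" + ctx.get("DBURL"));
        Files.delete(file);
    }
}
```

One design choice worth noting: the sketch splits each line on the first '=' only, so a value such as a JDBC URL with query parameters survives intact. A delimited reader that treats every '=' as a field separator may cut such a value short, so it is worth checking values like that in your own properties file.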

Printing Context

In a development or even a production environment, it can be useful to print out the configuration at the start of a job.  This job uses the tContextDump component to send each name/value pair defined in the context and used in the job to a tLogRow.  Other outputs can be used, but writing to tLogRow keeps the dump together with the job's other messages on standard out.
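As an illustration of the kind of output such a dump produces, here is a small Java sketch that prints one pipe-separated name/value row per context variable, sorted by name so the order is stable between runs. Masking variables whose names contain "PASSWORD" is an extra precaution added in this sketch, not a claim about how tContextDump behaves; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a startup context dump to standard out.
// Not Talend's generated code.
public class ContextDumpSketch {

    // Assumption: hide the value of any variable whose name contains "PASSWORD".
    static String formatRow(String key, String value) {
        String shown = key.toUpperCase().contains("PASSWORD") ? "********" : value;
        return key + "|" + shown;
    }

    static void dump(Map<String, String> context) {
        // TreeMap sorts entries by key, giving the same order on every run.
        for (Map.Entry<String, String> e : new TreeMap<>(context).entrySet()) {
            System.out.println(formatRow(e.getKey(), e.getValue()));
        }
    }

    public static void main(String[] args) {
        dump(Map.of("DATA_DIR", "C:/data", "DBSCHEMA", "demo", "DB_PASSWORD", "secret"));
    }
}
```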

Using tContextDump


Alternative: Hack the Exported Job

You can also hack the exported job by editing its exported properties files.  In the unzipped export, look for a directory 'contexts' under the Project/Job directories (for example, 'propsjobproject/propsjob_0_1/contexts').  There is a property file for each context that can be edited.  It's best to avoid this practice, because a later deployment that overwrites these files will undo the edits.  However, it can be a quick way to fix an operational problem.


For stable environments that share resources like directories and database accounts, create a Project or Job-wide set of Contexts to parameterize your jobs.  Almost everything can be parameterized using Contexts, but variables like passwords may require special handling.  Keeping passwords out of TOS code avoids exposing them in the job (a security risk) and avoids forcing a new job deployment when a password expires.  In these cases, use the more robust deployment option of a separate properties file.
