Top Ten Timesaving Talend Tips of Two Thousand Eleven: Part 2
The second part of my 10 Talend Open Studio for Data Integration time-saving recommendations from 2011.
#5 Re-use connections
Each RDBMS has a connection component, such as tOracleConnection or tMSSqlConnection. Add one to your job, then reference it from other DB components like tMSSqlOutput or tOracleInput using the "Use an existing connection" option. This centralizes the connection configuration, including the username/password, auto-commit settings, and JDBC properties.
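Conceptually, the connection component defines the settings once and every consumer refers back to it by name. The sketch below models that idea in plain Java. The `DbSettings` and `ConnectionRegistry` classes are hypothetical illustrations, not Talend's generated code, and they hold settings rather than a live `java.sql.Connection` so the example runs without a JDBC driver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for what a connection component (e.g. tMSSqlConnection) configures once.
final class DbSettings {
    final String jdbcUrl;
    final String user;
    final String password;
    final boolean autoCommit;
    final Properties jdbcProperties;

    DbSettings(String jdbcUrl, String user, String password,
               boolean autoCommit, Properties jdbcProperties) {
        this.jdbcUrl = jdbcUrl;
        this.user = user;
        this.password = password;
        this.autoCommit = autoCommit;
        this.jdbcProperties = jdbcProperties;
    }
}

// Hypothetical registry: the connection component registers its settings under its own name,
// and each "Use an existing connection" consumer looks them up by that name.
final class ConnectionRegistry {
    private final Map<String, DbSettings> byComponent = new HashMap<>();

    void register(String componentName, DbSettings settings) {
        byComponent.put(componentName, settings);
    }

    DbSettings lookup(String componentName) {
        DbSettings settings = byComponent.get(componentName);
        if (settings == null) {
            throw new IllegalStateException("No connection component named " + componentName);
        }
        return settings;
    }
}
```

Because every lookup returns the same object, changing the username or the auto-commit flag in the one registered entry changes it for every consumer, which is the point of the tip.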
A Job with a tMSSqlConnection Component
#4 Properties files
When you're managing different environments, particularly a production environment, a text-based properties file is a convenient way to configure your jobs. The properties file can be versioned, is easy to read, and can be compared across environments with Linux commands like "diff".
This is a video on using properties files in Talend Open Studio.
http://www.youtube.com/watch?feature=player_embedded&v=4J4FuN43IcA#t=0s
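As a rough illustration of why plain-text properties files work well across environments, the sketch below parses two environment files with `java.util.Properties` and reports the keys whose values differ, a key-level version of what `diff` shows line by line. The key names (`db_host`, `batch_size`, and so on) and the `EnvProperties` helper are hypothetical; in a real job the text would be read from files on disk.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

final class EnvProperties {
    // Parse properties text; a real job would read from a file instead of a String.
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // Keys whose values differ between the two files, or that appear in only one of them.
    static Set<String> differingKeys(Properties a, Properties b) {
        Set<String> keys = new TreeSet<>(a.stringPropertyNames());
        keys.addAll(b.stringPropertyNames());
        Set<String> result = new TreeSet<>();
        for (String key : keys) {
            String va = a.getProperty(key);
            String vb = b.getProperty(key);
            if (va == null ? vb != null : !va.equals(vb)) {
                result.add(key);
            }
        }
        return result;
    }
}
```

For dev and prod files that share `db_user` but differ in `db_host` and `batch_size`, `differingKeys` returns exactly those two keys plus any key present in only one file.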
#3 Context groups
The standard way to parameterize a set of Talend Open Studio jobs is through Context Groups. These are sets of global variables grouped by environment (dev, test, prod), and you can switch between them through an export or in the Run View.
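The idea behind a context group can be sketched as a map from environment name to a set of variables, with one environment selected at run time. The `ContextGroups` class below is a hypothetical model, not Talend's API, and the per-run override map is an assumption added to show how a single value might be changed without editing the whole group.

```java
import java.util.HashMap;
import java.util.Map;

final class ContextGroups {
    private final Map<String, Map<String, String>> groups = new HashMap<>();

    // Define (or replace) one environment's variables, e.g. "dev", "test" or "prod".
    void define(String environment, Map<String, String> variables) {
        groups.put(environment, new HashMap<>(variables));
    }

    // Resolve the variables for the chosen environment; entries in overrides win.
    Map<String, String> resolve(String environment, Map<String, String> overrides) {
        Map<String, String> base = groups.get(environment);
        if (base == null) {
            throw new IllegalArgumentException("Unknown context: " + environment);
        }
        Map<String, String> resolved = new HashMap<>(base);
        resolved.putAll(overrides);
        return resolved;
    }
}
```

Every job reads the same variable names (`db_host`, `batch_size` in this hypothetical example), so switching environments changes only which group is selected.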
Run View Referencing Several Contexts
#2 Use queries
While Talend Open Studio will generate queries from a table for input components like tOracleInput, you can also save your own queries and reference them throughout your jobs. This has two advantages. First, it allows for queries that span multiple tables or that exceed the query-generation capability of Talend Open Studio (think Oracle set-based operations). Second, it produces a more robust job by leaving out irrelevant columns that may be removed later.
For example, if a lookup involves only a name and an id field, there's no need to add other fields that may be dropped before the job goes to production. If a column is dropped and it's not relevant to the query, it shouldn't break the job.
A Talend Open Studio Query
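The robustness argument can be made concrete with a small check: a query keeps working after a schema change as long as every column it names still exists. The `QueryCheck` helper and the `customer` table with `fax` and `notes` columns are hypothetical; the sketch only models column membership and does not run SQL.

```java
import java.util.List;
import java.util.Set;

final class QueryCheck {
    // Build the lookup query from only the columns the job actually uses.
    static String selectColumns(String table, List<String> columns) {
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    // The query stays valid as long as the table still has every column it references.
    static boolean stillValid(Set<String> tableColumns, List<String> queryColumns) {
        return tableColumns.containsAll(queryColumns);
    }
}
```

If `fax` is dropped from `customer`, a lookup built from `id` and `name` still passes the check, while one that also selected `fax` does not.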
#1 Use Repository schemas
Schemas should be based on the Repository rather than Built-in wherever possible. In some cases, components like tMSSqlOutput can be set to ignore columns for a write operation using the Advanced tab. That way, a complete set of columns can still be referenced in a Repository schema without conflicts over auto-generated fields such as identity columns.
This tip also works with #2 to support more robust jobs. If a subset of fields is used repeatedly -- say an id/name pair -- define it as a Generic Schema or other Schema and store it in the repository. That way, the field list never becomes out of sync with the database (as long as the lookup fields are still valid).
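To illustrate keeping the full column list in one shared schema while skipping auto-generated fields on write, the sketch below builds an INSERT statement from a schema definition minus a set of non-insertable columns. The `RepositorySchema` class and the `customer` columns are hypothetical and only approximate the effect of the Advanced tab settings; they are not Talend's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class RepositorySchema {
    final String table;
    final List<String> columns;

    // The complete column list is defined once and reused by every job that needs it.
    RepositorySchema(String table, List<String> columns) {
        this.table = table;
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }

    // Leave excluded columns (e.g. an identity key) out of the INSERT, keeping schema order.
    String insertStatement(Set<String> notInsertable) {
        List<String> cols = new ArrayList<>();
        for (String c : columns) {
            if (!notInsertable.contains(c)) {
                cols.add(c);
            }
        }
        String placeholders = String.join(", ", Collections.nCopies(cols.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", cols) + ") VALUES (" + placeholders + ")";
    }
}
```

Reading jobs can still use the full schema, including `id`, while the write path skips the column the database generates.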
Best of luck to all the Talend coders in 2012.