Tuesday, June 18, 2013

How to Improve Step by Step ETL/ELT Open Sources Tools (Talend) On Teradata

Part 1 : Application setup

Prerequisites: Setup Talend TOS environnement for Teradata

  • Download TOS All available for all environment (Window, Linux, MacOs)
  • Install TOS

Step 1 : Download Talend

The download of Talend Open Stiudio Zip file is available on: http://www.talend.com/download.php#

Step 2 : Unzip on your local directory

Step 3 : launch TOS Application

To launch TOS Application on the Windows environment, execute “TalendOpenStudio-win32-wpf.exe” and accept the License agreement:

Step 4 : Create a new connection

After license agreement aceptance, click on  Then complete the fields on the next screen : "Référentiel" : Repository=Local, "Email" = , "Mot de passe":Password=, "Créer un nouveau projet":Create a new project.

Step 5 : Create a new Project

Select on the list box “Create a New Project” and click on "OK" button, the screen allowing the creation of a new project appears:
"Nom du Projet": Project Name = Step By Step,
"Nom technique": Technical Name = STEP_BY_STEP,
"Description du projet": Project Description = ,
"Langage de génération": Generation Language select  Java



Step 6 : Select a Project

Select now new Project that you have created, "Projet Sélectioner": Select Project = Step By Step.

Step 7 : Start TOS Application

Click on “Start now” to open TOA application
TOS application is now ready to use.

Part 2 : Step by Step Development

Scenario 1: Classical load ETL with the tTeradataFastLoad using JDBC

Generate a dataset sample and load the result into a Teradata table. (the CREATE TABLE script for cust_teradata is at the end of the document)
Components need:
  • a tRowGenerator (category Misc in the component Palette)
  • a tTeradataFastLoad (category Databases/Teradata in the component Palette)

Step 1 : database metadata discovery wizard

  • click right on the “DB connections” item in the Repository and choose a name of the metadata connection in your repository, for example “myTeradata”.
  • then in the following step, specify the right value to connect to your Teradata database. Click finish to save it in your repository.

Step 2 : retrieve metadata schema from the Database

  • click right on “myTeradata” item in the Repository and choose “Retrieve Schema”, then choose your expected filter Table, View, Synonym or write a specific filter of Object return into the wizard.
  • check to select Table structure that you would like retrieve in the metadata repository.
  • then in the last step you can verify the structure returned via the catalog.

Step 3: Create a new JobDesign

  • click right on “Job Designs” in the Repository and specify a name
  • then choose a tRowGenerator component into the component palette (Misc category)
  • drag-n-drop the cust_teradata item in your repository into the JobDesign interface and choose the tTeradataFastLoad component into the dialog box.
  • then create a link ROW between tRowGenerator and tTeradataFastLoad components. Click-right on tRowGenerator, choose “Row” and “Main” and link it to tTeradataFastLoad.

Step 4: tRowGenerator configuration

  • Double-click on tRowGenerator to open configuration GUI.

Step 5: tTeradataFastLoad configuration

  • Double-click on tTeradataFastLoad to open basic settings properties tab.

Step 6: tab RUN to execute the JobDesign

Scenario 2: load ETL with the tTeradataMultiLoad using mload teradata utility

The dataset already exist into a delimited file (comma or semi-colon separated). You can produce this dataset with the previous JobDesign. Just modify the output component and replace by tFileOutputDelimited.

Step 1 : tTeradataMultiload

  • Pick and drop the tTeradataMultiLoad into a new JobDesign
Below, you can see :
  • the mload script code generated by Talend
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
/**
* [tTeradataMultiLoad_1]
*/
currentComponent="tTeradataMultiLoad_1";
 
String tableFullName_tTeradataMultiLoad_1 = "talend" + "." + "cust_teradata";
java.io.FileWriter fw_tTeradataMultiLoad_1 = new java.io.FileWriter("D:/scriptfolder/"+"cust_teradata"+".script");
 
StringBuilder script_tTeradataMultiLoad_1 = new StringBuilder();
fw_tTeradataMultiLoad_1.write(".LOGTABLE "+"talend"+"."+"cust_teradata"+"_lt;\r\n");
fw_tTeradataMultiLoad_1.write(".LOGON "+"talend"+","+"talend"+";\r\n");
 
fw_tTeradataMultiLoad_1.write(".BEGIN IMPORT MLOAD TABLES "+tableFullName_tTeradataMultiLoad_1+" SESSIONS 8;\r\n");
fw_tTeradataMultiLoad_1.write(".LAYOUT customer_layout;\r\n");
fw_tTeradataMultiLoad_1.write(".FIELD "+"ID"+" * VARCHAR(255);\r\n");
fw_tTeradataMultiLoad_1.write(".FIELD "+"NAME"+" * VARCHAR(255);\r\n");
fw_tTeradataMultiLoad_1.write(".FIELD "+"DATE_BIRTH"+" * VARCHAR(255);\r\n");
fw_tTeradataMultiLoad_1.write(".DML LABEL DML_LABEL;\r\n");
 
fw_tTeradataMultiLoad_1.write("INSERT INTO "+tableFullName_tTeradataMultiLoad_1+"("
+"ID,NAME,DATE_BIRTH"+") VAlUES("+":ID,:NAME,:DATE_BIRTH"+");\r\n");
fw_tTeradataMultiLoad_1.write(".IMPORT INFILE "+"D:\\scriptfolder\\bulk_file.txt");
fw_tTeradataMultiLoad_1.write(" FORMAT VARText '"+";"+"' LAYOUT customer_layout APPLY DML_LABEL;\r\n");
fw_tTeradataMultiLoad_1.write(".END MLOAD;\r\n");
fw_tTeradataMultiLoad_1.write(".LOGOFF;\r\n");
 
fw_tTeradataMultiLoad_1.close();
 
String sb_tTeradataMultiLoad_1= new String("cmd /c mload < "+"D:/scriptfolder/"+"cust_teradata"+".script > "+"D:/scriptfolder/bulk_file_log.txt"+" 2>&1");
/**
* [tTeradataMultiLoad_1]
*/

  • Create table script for cust_teradata
1
2
3
4
5
6
7
8
9
CREATE MULTISET TABLE talend.cust_teradata ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ID INTEGER,
NAME VARCHAR(255) CHARACTER SET LATIN NOT CASESPECIFIC,
DATE_BIRTH DATE FORMAT 'YYYY-MM-DD')
PRIMARY INDEX ( ID );

No comments: