Tuesday, June 18, 2013

A Talend Conversion Project


Splitting a table is a common conversion project.  As the data model grows, cardinality changes.  That is, a contact may have had a single email address ten years ago, but now there may be several.

If you have a table with a column, and you need to store more of something in that column, consider creating another table to hold the value.  For example, if a Contact table has an URL column, and Contacts can now have more than one URL because of a change in technology, create a second ContactURL table to hold the url.  A join will link the many ContactURL records to the Contact.

See this physical model for a visual guide to the structures mentioned in the following paragraphs.

In Talend, splitting a table in this fashion requires two data flows.  One is the main flow representing the part of the structure that stays the same.  The second flow represents the value to be broken out.  The flows are separate because the main flow must run to provide linkage information (ids).

In the linked physical model, the LEGACY_CONTACT -> CONTACT load represents the main flow.  This is a straightforward mapping.  The second flow is a LEGACY_CONTACT + newly loaded CONTACT -> CONTACT_URL.  This is created in Talend as follows.

Talend Job Splitting a Table
This job contains the following components

Data sources and targets

A MySQL LEGACY_CONTACT table, CONTACT table, and CONTACT_URL table.  LEGACY_CONTACT is used as an input twice.  CONTACT is used both as a target and as an input.

Maps

A tMap is used to load the CONTACT table from the LEGACY_CONTACT table.  This is a straight-forward, field-by-field mapping.  The tMap for CONTACT_URL contains an expression that filters out records with a NULL in the URL_PATH.  Press the green arrow in the target map's schema to activate the expression.


Talend Job Splitting a Table


 Splitting a file is a common scenario in the life of an application.  Talend provides a convenient way to covert the structure, especially if compared with hand-coding.  This conversion projects -- with supporting jobs for the InitialLoad, Reset, and RunAll -- took about 20 minutes.  Most of the productivity is from not having to manually process the database schema (cut-and-paste, etc.) by building the tMaps off of the tables.

1 comment:

21cssIndia said...

Talend Online Training
Introduction
Introduction to Talend
Why Talend?
Talend Editions and Features
Talend Data Integration Overview
Talend Environment
Talend Environment - Overview
Repository and Pallate
Talend Design and Views
Talend Jobs Designing
Types of Components
Basic Components - Overview
Component Properties
Sample Job designing
Job Execution. For More...
Call Us +919000444287
http://www.21cssindia.com/courses/talend-online-training-160.html