Tuesday, June 18, 2013

A Talend Conversion Project


Splitting a table is a common conversion project.  As the data model grows, cardinality changes.  That is, a contact may have had a single email address ten years ago, but now there may be several.

If you have a table with a column, and you need to store more of something in that column, consider creating another table to hold the value.  For example, if a Contact table has an URL column, and Contacts can now have more than one URL because of a change in technology, create a second ContactURL table to hold the url.  A join will link the many ContactURL records to the Contact.

See this physical model for a visual guide to the structures mentioned in the following paragraphs.

In Talend, splitting a table in this fashion requires two data flows.  One is the main flow representing the part of the structure that stays the same.  The second flow represents the value to be broken out.  The flows are separate because the main flow must run to provide linkage information (ids).

In the linked physical model, the LEGACY_CONTACT -> CONTACT load represents the main flow.  This is a straightforward mapping.  The second flow is a LEGACY_CONTACT + newly loaded CONTACT -> CONTACT_URL.  This is created in Talend as follows.

Talend Job Splitting a Table
This job contains the following components

Data sources and targets

A MySQL LEGACY_CONTACT table, CONTACT table, and CONTACT_URL table.  LEGACY_CONTACT is used as an input twice.  CONTACT is used both as a target and as an input.

Maps

A tMap is used to load the CONTACT table from the LEGACY_CONTACT table.  This is a straight-forward, field-by-field mapping.  The tMap for CONTACT_URL contains an expression that filters out records with a NULL in the URL_PATH.  Press the green arrow in the target map's schema to activate the expression.


Talend Job Splitting a Table


 Splitting a file is a common scenario in the life of an application.  Talend provides a convenient way to covert the structure, especially if compared with hand-coding.  This conversion projects -- with supporting jobs for the InitialLoad, Reset, and RunAll -- took about 20 minutes.  Most of the productivity is from not having to manually process the database schema (cut-and-paste, etc.) by building the tMaps off of the tables.

1 comment:

  1. Talend Online Training
    Introduction
    Introduction to Talend
    Why Talend?
    Talend Editions and Features
    Talend Data Integration Overview
    Talend Environment
    Talend Environment - Overview
    Repository and Pallate
    Talend Design and Views
    Talend Jobs Designing
    Types of Components
    Basic Components - Overview
    Component Properties
    Sample Job designing
    Job Execution. For More...
    Call Us +919000444287
    http://www.21cssindia.com/courses/talend-online-training-160.html

    ReplyDelete