Tuesday, June 18, 2013

Five Date Formats in Talend Open Studio


It's best to standardize on one date format throughout your application.  The date format for the RDBMS is a good candidate, like "dd-MM-yy" for Oracle.  But dates can come from other sources, so you'll occasionally need to format a date.

Schemas in Talend Open Studio can automatically turn text strings into date objects.  Once Talend recognizes a field as a date, that date can be made available to other components without having to worry about format.

The following screenshot is a text file with fields of various date formats.

Input File with Different Date Formats
Load the file using a tFileInputDelimited.  When creating the schema, define each field as a date (rather than a string).  Specify a pattern to handle the different formats.  Talend Open Studio provides a handy code complete that will help select the correct format.  The following date pattern is for "December 2, 2010" which isn't in the code complete list.

"MMMM dd, yyyy"

MMMM uses the full month, like "December".  MMM is for an abbreviated month, "Dec".  MM is used for numeric values ("12").

This is a simple tFileInputDelimited to tLogRow Job.
Job Showing Date Conversions

The following schema identifies each field as a date and provides a date pattern.  It's for the tFileInputDelimited.

Schema with Different Date Format
This schema converts the input strings to a date format.  Once the fields are in date format, they can be reformatted.  The tLogRow schema applies a uniform date format to all of the date fields.
Schema Applying a Common Date Format
Running the job produces the following.

Running a Job with a Common Date Format
When dealing with dates, it's best to standardize on a format.  However, dates can come from a lot of other sources -- a web application, legacy data, external interface -- so, become familiar with handling dates in a schema or a dedicated component like tConvertType.

No comments: