Tuesday, June 18, 2013

ETL and the Logical Model


For some applications, the logical model represents a significant investment.  From Hibernate and JPA, to Microsoft's ADO.NET Entity Framework, to PHP ORM libraries, development is moving farther away from the database and into higher levels of abstraction.  This presents a challange for ETL developers who don't use these frameworks and have to operate their tools on the physical model.  What's lost are the business relationships that represent the heart of the application.

The logical model is an arrangement of domain objects.  Often, these domain objects have a strong presence in the database.  That is, a domain object is easily recognizable as a relational table.  But domain objects also include relationships and that's where the divergence with and RDBMS begins.

For example, take a simple domain object called 'Contact'.  The Contact can be any of three ContactTypes: Customer, 'C'; Vendor, 'V'; or Employee, 'E'.  Also, a Contact may at least one email address.  In a logical model, this is shown as follows in a diagram created in Enterprise Architect.
CONTACT Logical Model
 Despite the similarity of ContactType's and ContactEmail's relationships with Contact, they have a very different physical implementation.  Since there may be some unused ContactTypes -- say there aren't any Employees initially -- the relationship is a Many-To-Many and requires a join table.  An Email is required and won't be orphaned in the database; this is a Many-To-One.
CONTACT Physical Model
 Today's ETL tools emphasize the physical model.  They read schemas from relational databases like Oracle or SQL Server and establish source / target mappings to other databases, files, and XML.  But because they lack access to the logical model, the tools aren't as effective as they could be.

There are some techniques in Java that can bridge this gap.  Look for future posts identifying easy-to-use or graphical solutions to integration.

No comments: