In today's integration projects, it's useful to sketch out a design before coding. The Data Flow Diagram, or DFD, dates from the late 1970s and still helps get projects off to a good start.
Even though the current generation of ETL tools is graphical, their diagrams become difficult to read once the details of processing are added: error handling, reject files, and so on. A DFD leaves those details out, which makes it well suited to design and planning. The DFD also establishes the partitions in the project: functional areas, work assignments, and phases.
A DFD consists of processes (circles), external systems and data stores (boxes), and data flows (arcs). The simple notation makes it accessible to everyone on the project.
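The three building blocks above can be sketched as a small data model. This is only an illustration, not a standard DFD library: the class names are invented here, the node names are borrowed from the LoadContactJob example discussed in this post, and the flow names `ContactRecord` and `LookupResult` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    PROCESS = "process"        # drawn as a circle
    EXTERNAL = "external"      # drawn as a box
    DATA_STORE = "data_store"  # drawn as a box


@dataclass(frozen=True)
class Node:
    name: str
    kind: NodeKind


@dataclass(frozen=True)
class Flow:
    """An arc: named data moving from one node to another."""
    name: str
    source: str
    target: str


@dataclass
class Diagram:
    nodes: dict[str, Node] = field(default_factory=dict)
    flows: list[Flow] = field(default_factory=list)

    def add_node(self, name: str, kind: NodeKind) -> None:
        self.nodes[name] = Node(name, kind)

    def add_flow(self, name: str, source: str, target: str) -> None:
        # Reject arcs whose endpoints are not part of the diagram.
        for endpoint in (source, target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.flows.append(Flow(name, source, target))


dfd = Diagram()
dfd.add_node("LoadContactJob", NodeKind.PROCESS)
dfd.add_node("MySQL", NodeKind.DATA_STORE)
dfd.add_flow("ContactRecord", "LoadContactJob", "MySQL")  # hypothetical flow name
dfd.add_flow("LookupResult", "MySQL", "LoadContactJob")   # hypothetical flow name
```

Keeping the arcs as named objects matters because the names are what tie the diagram to the rest of the project documentation.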
The Data Dictionary is a listing of the schemas and data definitions used in the project. It is tied to the DFD through the arcs: every arc should have a Data Dictionary entry. The format of the Data Dictionary can vary with the project's requirements. It can contain detailed entries or only the minimum information that development needs.
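The rule that every arc needs a Data Dictionary entry is easy to check mechanically. The sketch below assumes the arcs and the dictionary are kept as plain Python structures; the flow names and the field lists are hypothetical and only serve to show the check.

```python
# Arcs as (flow name, source, target). Flow names are hypothetical.
flows = [
    ("ContactRecord", "LoadContactJob", "MySQL"),
    ("LookupResult", "MySQL", "LoadContactJob"),
]

# Data Dictionary keyed by flow name. The fields are illustrative only.
data_dictionary = {
    "ContactRecord": {
        "description": "One contact row to be loaded",
        "fields": {"first_name": "string", "last_name": "string", "email": "string"},
    },
}


def missing_entries(flows, dictionary):
    """Return flow names with no Data Dictionary entry, in diagram order."""
    seen = set()
    missing = []
    for name, _source, _target in flows:
        if name not in dictionary and name not in seen:
            missing.append(name)
        seen.add(name)
    return missing


print(missing_entries(flows, data_dictionary))  # ['LookupResult']
```

A check like this could run whenever the diagram changes, so that a new arc without a definition is noticed before coding starts.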
The following DFD shows a single process, LoadContactJob, that loads a MySQL data store. While it is loading, it also queries the same MySQL data store through a lookup function. This is a Level 0 DFD, which represents the most abstract, system-oriented view of the project. Additional levels that break LoadContactJob apart can be added if the project requires them, but this particular example is ready for coding. Another level down would be nearly identical to the graphical code itself.
In the image, the Data Dictionary is added as an Enterprise Architect note. A better option is to store the Data Dictionary in an Enterprise Architect document. Then the documentation and diagram can be generated in HTML and published.
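If Enterprise Architect is not available, one alternative (an assumption here, not something the original diagram used) is to keep the Level 0 DFD as text and render it with Graphviz. The sketch below emits DOT for the LoadContactJob example. The flow names are hypothetical, and the input source is left out because the post does not name it.

```python
def to_dot(nodes, flows):
    """Render a DFD as Graphviz DOT text."""
    # Processes are circles; external systems and data stores are boxes.
    shapes = {"process": "circle", "external": "box", "data_store": "box"}
    lines = ["digraph dfd {", "  rankdir=LR;"]
    for name, kind in nodes.items():
        lines.append(f'  "{name}" [shape={shapes[kind]}];')
    for label, source, target in flows:
        lines.append(f'  "{source}" -> "{target}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


nodes = {"LoadContactJob": "process", "MySQL": "data_store"}
flows = [
    ("ContactRecord", "LoadContactJob", "MySQL"),  # hypothetical flow name
    ("LookupResult", "MySQL", "LoadContactJob"),   # hypothetical flow name
]

dot_text = to_dot(nodes, flows)
print(dot_text)
```

Saving the output to a file such as `dfd.dot` and running `dot -Tsvg dfd.dot -o dfd.svg` produces an image, assuming Graphviz is installed.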
In today's Agile environments, this type of documentation sometimes gets skipped as developers bear down on the task at hand. But every Extreme or Agile methodology allows for timeboxed activities, and the planning and coordination provided by the DFD and its Data Dictionary will make future sprints, increments, and releases go more smoothly.
Figure: Example DFD