DWH Modeling Rule #2: Build a generic, data-driven core

Almost any Data Warehouse architecture can be divided into five sections:

  1. Raw data from external data sources
  2. Staging area
  3. Consolidated and enriched raw data
  4. Multidimensional data (dimension tables, fact tables, etc.)
  5. Data marts

When we take a look at the data flow in the warehouse process we can find the most client-specific or application-specific requirements at the transitions 1->2, 2->3, and 4->5. In most cases, the transition 2->3 is particularly specific, complex, and elaborate.

On the other hand, the transition 3->4 is a good candidate for a generic approach. The complete process can be defined by a set of meta data and a set of procedures, which, based on the meta data, can dynamically build and execute the required SQL statements. The transition 3->4 can still be very complex and elaborate, but it is by far not as application-specific as the other transitions.

Sections 3 and 4 form what I call the Warehouse Core. This is the place where you can normally find some sort of star- or snowflake-schemas. The warehouse process of the core is primarily made up of the following steps:

  • Aggregations
  • Management of historical changes
  • Management of structural changes

One might say, that these operations can be very application-specific. This is absolutely right, but, compared to the other sections, they can easily be customized and configured by enhancing the star-schema and the meta data a little bit.

This post shall just set the stage for the upcoming detailed description of the enhanced star-schema and the associated meta data , which are both part of the Data Warehouse Framework I have developed over the past years. With the help of this framework I have succesfully implemented a number of Data Warehouse solutions, many of them containing irregular and ragged hierarchies and non-additive measures.