DWH Managing Rule #1: The single most important prerequisite for success is a complete set of meta data

In my opinion, one of the very first things a DWH project manager should strive for is the definition of a complete and consistent set of meta data.

If this is done, requirements engineering, specification, documentation, and project management are little more than collecting meta data and assessing the completeness of the meta data set. By adding priorities and processing sequences to the meta data, it is possible to define a complete procedural model for the DWH project.
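To make this a bit more tangible, here is a minimal Python sketch of both ideas: checking the completeness of the meta data set and deriving a processing sequence from dependencies and priorities. The entity names, the attributes (owner, source, priority, depends_on), and the two helper functions are purely hypothetical illustrations, not part of a fixed model.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical rule: these attributes must be filled for an entry to count as complete.
REQUIRED_ATTRIBUTES = ("owner", "source", "priority")

# Hypothetical meta data entries; "depends_on" lists what must be processed first.
entries = {
    "dim_product": {"owner": "Sales", "source": "ERP", "priority": 1, "depends_on": []},
    "dim_customer": {"owner": "Sales", "source": "CRM", "priority": 2, "depends_on": []},
    "fact_sales": {
        "owner": "Controlling",
        "source": None,  # not yet specified
        "priority": 1,
        "depends_on": ["dim_product", "dim_customer"],
    },
}


def missing_attributes(entries):
    """Return {entry name: [missing attributes]} for all incomplete entries."""
    gaps = {}
    for name, entry in entries.items():
        missing = [a for a in REQUIRED_ATTRIBUTES if entry.get(a) in (None, "")]
        if missing:
            gaps[name] = missing
    return gaps


def processing_sequence(entries):
    """Order entries so dependencies come first; ties are broken by priority."""
    sorter = TopologicalSorter({n: e["depends_on"] for n, e in entries.items()})
    sorter.prepare()
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda n: entries[n]["priority"])
        order.extend(ready)
        sorter.done(*ready)
    return order


print(missing_attributes(entries))
print(processing_sequence(entries))
```

The first function answers the question "what is still missing?", and the second one turns the collected meta data into a work plan. A real project would of course need many more attributes per entry.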

When I speak of meta data, I do not only mean the more or less technical data that describes dimensions and facts, but also data that describes the warehouse process (ETL) and, most importantly, “political” data such as target groups, stakeholders, team members, and other important people.

To get the most out of the meta data and to make collecting and administering it easier, I frequently use a relational database. That allows me to generate a GUI for entering data and a number of different reports. In addition, the database can serve as a central repository for every member of the project team. It can be a great help to the project manager if it contains typical project information such as target date, status, estimated effort, remaining effort, and responsibilities for the relevant entities.
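As a rough illustration of such a repository, the following Python sketch uses SQLite from the standard library. The table name md_entity, its columns, the sample rows, and the report function are assumptions made up for this example; a real meta data model would be considerably richer.

```python
import sqlite3

# In-memory database for the sketch; a shared project repository would use a file or a server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE md_entity (
        entity_id             INTEGER PRIMARY KEY,
        name                  TEXT NOT NULL UNIQUE,
        entity_type           TEXT NOT NULL,  -- e.g. 'dimension', 'fact', 'etl_job'
        responsible           TEXT,           -- team member in charge, NULL if unassigned
        target_date           TEXT,           -- ISO date
        status                TEXT NOT NULL DEFAULT 'open',
        estimated_effort_days REAL,
        remaining_effort_days REAL
    );
    """
)

# Hypothetical sample rows.
rows = [
    ("dim_product", "dimension", "Anna", "2024-03-01", "in progress", 5.0, 2.0),
    ("fact_sales", "fact", "Ben", "2024-04-15", "open", 10.0, 10.0),
    ("load_sales", "etl_job", None, "2024-04-30", "open", 8.0, 8.0),
    ("dim_date", "dimension", "Anna", "2024-02-01", "done", 2.0, 0.0),
]
conn.executemany(
    "INSERT INTO md_entity (name, entity_type, responsible, target_date, status,"
    " estimated_effort_days, remaining_effort_days) VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)


def remaining_effort_by_responsible(conn):
    """Simple project report: open effort per responsible person."""
    return conn.execute(
        """
        SELECT COALESCE(responsible, '(unassigned)') AS who,
               SUM(remaining_effort_days)            AS remaining
        FROM md_entity
        WHERE status <> 'done'
        GROUP BY who
        ORDER BY who
        """
    ).fetchall()


for who, remaining in remaining_effort_by_responsible(conn):
    print(f"{who}: {remaining} days remaining")
```

Because the project information sits next to the technical meta data, a report like this one immediately shows unassigned work, which is exactly the kind of gap a project manager wants to see early.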

A big advantage of a complete meta data set is that certain pitfalls and showstoppers can be identified at a very early stage of the project.

Here is an example from one of my projects: I’m always especially paranoid about historical variability such as slowly moving dimensions (SMDs, more commonly known as slowly changing dimensions), which often turn out to be rapidly changing dimensions. Hence there are a number of attributes in my meta data model that describe SMDs. In the (meta data based) process of specification and requirements engineering, I asked the client about the historical variability of the product hierarchy. The people I asked were quite surprised; apparently, nobody in the company had ever thought about it. The question was: What happens to historical data when the product hierarchy changes? Does the change have to be applied to the historical data (especially aggregated data)? Through the procedural model implied by the meta data, we were able to address the implications of the historical variability at a very early stage of the project, and we could push the client’s management to make a reliable decision. Very often, these kinds of issues only surface when the BI system is already in production, jeopardizing the success of the entire project.
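The following small Python sketch shows why the question matters. The products, groups, dates, and amounts are invented for illustration, and the validity-interval approach is just one common way to keep history, not necessarily what the client chose. The same sales give different group totals depending on whether the hierarchy is applied as it was at the time of the sale or as it is today.

```python
from collections import defaultdict

# Hypothetical history of the product hierarchy:
# (product, product group, valid_from, valid_to); valid_to is exclusive.
# ISO date strings compare correctly as plain text.
hierarchy_history = [
    ("P1", "Snacks", "2023-01-01", "2024-01-01"),
    ("P1", "Health Food", "2024-01-01", "9999-12-31"),  # P1 was regrouped in 2024
    ("P2", "Snacks", "2023-01-01", "9999-12-31"),
]

# Hypothetical sales facts: (product, sale date, amount)
sales = [
    ("P1", "2023-06-15", 100),
    ("P1", "2024-02-10", 50),
    ("P2", "2023-07-01", 70),
]


def group_at(product, date):
    """Return the product group that was valid for the product on the given date."""
    for p, group, valid_from, valid_to in hierarchy_history:
        if p == product and valid_from <= date < valid_to:
            return group
    raise LookupError(f"no hierarchy entry for {product} on {date}")


def totals(as_of=None):
    """Aggregate sales per group.

    as_of=None -> use the hierarchy valid at the time of each sale (history kept).
    as_of=date -> apply the hierarchy of that date to all sales (history rewritten).
    """
    result = defaultdict(int)
    for product, sale_date, amount in sales:
        result[group_at(product, as_of or sale_date)] += amount
    return dict(result)


print(totals())                    # {'Snacks': 170, 'Health Food': 50}
print(totals(as_of="2024-06-01"))  # {'Health Food': 150, 'Snacks': 70}
```

Both results are "correct", but they answer different business questions, and pre-aggregated data can only answer one of them. That is why the decision belongs to the client’s management and has to be made before the data model is fixed.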

In one of my next posts, I’m going to describe the meta data model in more detail by identifying the different sections of the model and describing the attributes that make up the different meta data entities.