November 2006

The crucial question: Which BI-Tool is able to aggregate this correctly? I say: “NONE!”

I really want to know! I assert that none of the BI-Tools on the market today is capable of calculating the correct aggregations for the following real-world task.

Again, we’re looking at a dimension with diamond shapes:

[Figure: Sales force structure]

On the right you can see a part of the sales force of a financial services company.
C1-C4 are a subset of the clients, R1 and R2 are two sales reps, and M1 is a regional manager.

For each client, the measure “a” is given: the yearly amount of money the client has available for investment. To calculate the investment potential for each node in the hierarchy, “a” has to be summed up.

It is typical for financial services companies that each rep is expected to fully exploit a client’s potential. Using weighting factors to share a client’s potential between reps is therefore not an option. As you can see on the right, C2 is assigned to two reps, so C2’s full investment potential has to be assigned to both R1 and R2.

For M1, each client’s potential must be counted only once.

To have something to work with, I give you the investment potentials for each client:

  • C1: 2,000
  • C2: 7,000
  • C3: 8,000
  • C4: 4,000

Here are the expected results of the aggregation:

  • R1: 9,000
  • R2: 19,000
  • M1: 21,000
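
The expected results can be reproduced with a small sketch. The client-to-rep assignments are only shown in the figure, so the mapping below is an assumption inferred from the expected results (R1 = C1 + C2, R2 = C2 + C3 + C4), and names such as `clients_below` and `potential_for` are purely illustrative. The idea is to collect the distinct set of clients below each node and sum “a” over that set, instead of adding up the totals of the child nodes.

```python
# Investment potential "a" per client, as given in the post.
potential = {"C1": 2000, "C2": 7000, "C3": 8000, "C4": 4000}

# Parent -> children edges. The rep assignments are an assumption inferred
# from the expected results; C2 is shared by R1 and R2 (the "diamond").
children = {
    "M1": ["R1", "R2"],
    "R1": ["C1", "C2"],
    "R2": ["C2", "C3", "C4"],
}


def clients_below(node):
    """Return the set of distinct clients reachable from the given node."""
    if node not in children:  # leaf: the node itself is a client
        return {node}
    result = set()
    for child in children[node]:
        result |= clients_below(child)
    return result


def potential_for(node):
    """Sum "a" exactly once per distinct client below the node."""
    return sum(potential[c] for c in clients_below(node))


def naive_rollup(node):
    """Add up the children's totals level by level (counts C2 twice at M1)."""
    if node not in children:
        return potential[node]
    return sum(naive_rollup(child) for child in children[node])


results = {node: potential_for(node) for node in ("R1", "R2", "M1")}
print(results)
print(naive_rollup("M1"))
```

With these assumed assignments, the level-by-level rollup gives 28,000 for M1 because C2 is counted once per rep, whereas the set-based version counts it once.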

Please do not hesitate to leave your comments here. I’m still convinced that there is no BI-Tool available that is able to solve the above-mentioned problem. These kinds of aggregations and hierarchies are everywhere, and it would be a quantum leap to finally have an out-of-the-box solution that could cope with them.


BI-Tools


DWH Managing Rule #1: The single most important prerequisite for success is a complete set of meta data

In my opinion, one of the very first things a DWH project manager should strive for is the definition of a complete and consistent set of meta data.

If this is done, requirements engineering, specification, documentation, and project management are nothing more than collecting meta data and assessing the completeness of the meta data set. By adding priorities and processing sequences, it is possible to completely define a procedural model for the DWH project.

When I speak of meta data, I do not only mean the more or less technical data that describes dimensions and facts, but also data that describes the warehouse process (ETL) and, most importantly, “political” data like target groups, stakeholders, team members, and other important people.

To get the most out of the meta data and to ease the collection and administration of the meta data set, I frequently use a relational database. That allows me to generate a GUI for entering data and a number of different reports. Plus, this database can be used as a central repository by every member of the project team. For the project manager it can be of great help if it contains typical project information like target date, status, estimated effort, remaining effort, responsibilities, etc. for the relevant entities.
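
To make this a bit more concrete, here is a minimal sketch of such a repository using Python’s built-in `sqlite3` module. The table name `meta_entity`, its columns, and the sample rows are all made up for illustration; they are not the actual meta data model, just one way the project information mentioned above could sit next to the entities it describes.

```python
import sqlite3

# In-memory database for the sketch; a shared project repository would
# use a file or a database server instead.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per meta data entity plus project tracking fields.
conn.execute(
    """
    CREATE TABLE meta_entity (
        name                  TEXT PRIMARY KEY,
        kind                  TEXT NOT NULL,   -- e.g. dimension, fact, etl_process
        status                TEXT NOT NULL,   -- e.g. open, done
        target_date           TEXT,            -- ISO date string
        estimated_effort_days REAL,
        remaining_effort_days REAL,
        responsible           TEXT
    )
    """
)

# Made-up sample rows, purely for illustration.
rows = [
    ("product_hierarchy", "dimension", "open", "2006-12-15", 5.0, 3.0, "alice"),
    ("sales_fact", "fact", "done", "2006-11-30", 8.0, 0.0, "bob"),
    ("load_sales", "etl_process", "open", "2006-12-01", 4.0, 4.0, "bob"),
]
conn.executemany("INSERT INTO meta_entity VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# A simple status report: open entities ordered by target date.
open_items = conn.execute(
    "SELECT name, responsible, remaining_effort_days "
    "FROM meta_entity WHERE status <> 'done' ORDER BY target_date"
).fetchall()

# Total remaining effort across all open entities.
total_remaining = conn.execute(
    "SELECT SUM(remaining_effort_days) FROM meta_entity WHERE status <> 'done'"
).fetchone()[0]

for name, responsible, remaining in open_items:
    print(name, responsible, remaining)
print("remaining effort (days):", total_remaining)
```

Reports like the one at the end are where completeness checks would naturally go, for example a query listing all entities that still lack a responsible person or a target date.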

A big advantage of a complete meta data set is that certain pitfalls and showstoppers can be identified at a very early stage of the project.

Here is an example from one of my projects: I’m always especially paranoid about historical variability, such as slowly changing dimensions (which often turn out to be rapidly changing dimensions). Hence there are a number of attributes in my meta data model that describe SCDs. In the (meta data based) process of specification and requirements engineering, I asked the client about the historical variability of the product hierarchy. The people I asked were quite surprised; apparently, nobody in the company had ever thought about it. The question was: What happens to historical data when the product hierarchy changes? Does the change have to be applied to the historical data (especially aggregated data)? Through the procedural model implied by the meta data, we were able to address the implications of the historical variability at a very early stage in the project, and we could force the client’s management to make a reliable decision. Very often, these kinds of issues only surface when the BI system is already in production, jeopardizing the success of the entire project.

In one of my next posts, I’m going to describe the meta data model in more detail by identifying the different sections of the model and describing the attributes that make up the different meta data entities.

Data Warehouse
Management
Meta Data


DWH Modeling Rule #1: Most aggregations have to be done in the Data Warehouse directly

If you have read my first post about “my real world experience” with out-of-the-box BI systems like Cognos, you might have gotten the impression that I was bashing Cognos. This is definitely not the case: Cognos and other BI systems are great software products that offer a wide range of functionality. The point I was trying to make is that even the leading product in the BI market was, and still is, not able to cope with certain data structures. These data structures are not especially weird or uncommon; they have occurred in every data warehouse project I have been involved in so far.

[Figure: Hierarchy with diamond shape]

The picture on the left depicts a typical hierarchy, which can often be found as the structure of a sales force.

C1-C4 are clients, who are assigned to the sales reps R1 and R2. The sales reps are both managed by regional manager M1.

An important measure for sales reps, managers, sales units, and, of course, the company as a whole is the number of associated clients.

How would a typical BI tool be set up to calculate the number of clients based on the hierarchy on the left?

  • The client level with members C1-C4 is defined as the raw data level. Each member has a client-id as its primary key.
  • The measures for the upper levels (sales reps and managers) are aggregated by the system. These aggregations are either pre-calculated or take place on the fly.
    The aggregation rule is “count(distinct client-id)”.
  • First, the measures for the sales reps are calculated with the following results: R1: 2, R2: 3
  • Based on the results for the reps, the measures for the managers are calculated. The result for M1 would be 2+3=5, which is obviously wrong: there are only 4 distinct clients, and the client shared by both reps has been counted twice.
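
The difference between the two calculations can be sketched in a few lines of Python. The rep-to-client assignments below are an assumption consistent with the counts above (R1: 2, R2: 3, one client shared by both reps), and the function names are illustrative only.

```python
# Assumed assignments: C2 is the client shared by both reps.
rep_clients = {
    "R1": {"C1", "C2"},
    "R2": {"C2", "C3", "C4"},
}
manager_reps = {"M1": ["R1", "R2"]}

# Step 1: count(distinct client-id) per rep.
rep_counts = {rep: len(clients) for rep, clients in rep_clients.items()}


def count_by_rollup(manager):
    """Add up the reps' results, as a level-by-level aggregation would."""
    return sum(rep_counts[rep] for rep in manager_reps[manager])


def count_distinct(manager):
    """Count distinct client-ids over all clients below the manager."""
    clients = set()
    for rep in manager_reps[manager]:
        clients |= rep_clients[rep]
    return len(clients)


print(rep_counts)             # per-rep counts
print(count_by_rollup("M1"))  # sums the rep results, shared client counted twice
print(count_distinct("M1"))   # goes back to the client level
```

The point of the sketch is that a distinct count cannot be derived from the rep-level results alone; the manager-level value has to be computed from the client level.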


Data Warehouse
Data Modeling


Welcome to my real world experience!

Welcome to my Data Warehouse Blog! To illustrate my motivation for starting this blog, I’d like to tell you a little story first.

Back around 1995 I was working for a software company that was the market leader for Sales Force Automation (SFA) and Electronic Territory Management Systems (ETMS) software for the pharmaceutical industry in Germany. As the leader of the server development team, I was in charge of the programs and the underlying Oracle database, which guaranteed a multi-directional data flow between the different sales reps and the head office. Reps were reporting sales calls and additional data about their activities as well as certain characteristics of the doctors, pharmacies, and hospitals they visited. Additionally, pharmaceutical companies were buying turnover and sales volume data from external providers based on geographical segments, time, and different levels of a self-defined “product hierarchy”.

All this valuable data was residing in the database on the server, which obviously triggered the desire for analytical applications.

Long before I became responsible for the server, an analytical application had been developed by another team. This application no longer satisfied the growing demands of the very heterogeneous user base, and thus we decided to abandon it and to develop a new suite of analytical applications from scratch, based on a real data warehouse.

Then, someone had the very smart idea to check the market for out-of-the-box solutions! Plus, some of our customers were already using BI systems like Cognos. After contacting Cognos, we agreed to hold a three-day in-house workshop as a proof of concept, and soon three Cognos staffers showed up at our office in Heidelberg: the unavoidable sales guy, a senior pre-sales consultant, and a quite attractive and very nice young woman, who seemed to be new to the company and was apparently doing some training on the job. I can’t recall her name today, so let’s call her Lucy.

About this blog
BI-Tools
