25. January 2017 · Comments Off on Pitfalls in analyzing pharma sales force effectiveness – Part 1 · Categories: Data model, Data Warehouse, Specials · Tags: , , , , , ,

Pitfalls in analyzing pharma sales force effectiveness – Part 1

Sales Force Effectiveness - The ultimate goal

Sales Force Effectiveness – The ultimate goal

One of the main goals of this blog is to clarify special real-life demands on business intelligence systems. Academic sources and textbooks often do not fully address these special requirements. This also applies to many of the common BI solutions on the market.
I have encountered, and fulfilled, plenty of these respective demands in designing and building systems for analyzing pharma sales force effectiveness.

Unlike many other sectors, pharma is up against special data structures, data constellations, and hierarchies. I can’t say it enough that way too often decision-makers ultimately drop important business requirements, just because the BI system cannot fulfill them.

I will describe some of these special requirements in a special series parallel to the regular path of this blog. Even if your focus is not on pharma, you will eventually encounter some of the following aspects in other sectors too. I had to deal with them in projects for insurance companies, financial service providers, banks, and companies from consumer sectors.

These are the subjects I will address in the special series:

  • Hierarchies in sales force and regional structures
  • Product hierarchies
  • Combining and aggregating data from different sources with different granularity
  • Raw data on different hierarchy levels
  • Temporal variability of
    • locations and relationships of doctors and institutions
    • doctor’s specialities and characteristics
    • target groups and segmentation
    • sales force constellation
    • sales rep’s objectives and implications on sales force compensation

 Hierarchies in sales force and regional structures

Regularly, the sales force is divided into several sales lines. Different lines are dedicated to different products, product groups or indications, different types of clientele, or different business models (like “traditional” sales force or key account management).  All these lines can have different internal structures. They can be entirely independent or they can mix somewhere in the overall hierarchy. Regardless of that, in most cases a thorough analysis requires summaries, measures, and KPIs on corporate level over all of the sales lines.

The hierarchies of traditional sales force models are derived from geographic entities, where the companies strive for an optimal alignment of territories. The smallest geographic entities are often determined by legal requirements and by the granularity of data delivered by external sources. The hierarchies are made up of typically a handful of levels (e.g. brick or segment, subterritory, territory, region, area, country, etc.).

The following images depict simplified versions of a sales force structure. A typical structure frequently has more than four hierarchy levels. However, for the sake of clarity, I will show and discuss only four levels in the upcoming examples.

The entities on the different hierarchy levels are:

  • Brick or Segment
    A brick is usually the smallest geographic unit, defined according to legal restrictions for anonymized prescription data. Bricks represent the highest regional granularity in data from external sources like IMS Health Xponent etc.. In the best case, bricks correspond to ZIP codes, whereas national laws often dictate larger areas. Bricks don’t have persons from the sales force assigned to them. On the other hand, target persons like doctors and institutions like physician’s offices, hospitals, or pharmacies are relevant.
  • Territory
    The territory is directly assigned to a sales rep. It is the area where the rep operates and where the relevant target persons and institutions reside.
  • Region
    A region combines several territories under the responsibility of a regional manager.
  • Country or Organization
    The upper level or all-encompassing node of the hierarchy is often a country, a busniess unit, or an entire corporation.

The “nice and easy” hierarchy

In the best case, the hierarchy looks similar to this:

Simplified diagram of a "nice and easy" sales force structure

Simplified diagram of a “nice and easy” sales force structure – Click to enlarge

Whenever you encounter a structure like the above, calculating and aggregating measures should be a piece of cake. All of the current BI solutions on the market are able to deliver the correct results just out of the box.

The main characteristics of a “nice and easy” sales force structure are:

  • Just one sales line
  • Each node of the hierarchy has a well-defined level
  • Each node has exactly one parent on the next upper level
  • For each node, there is only one distinct path to the top node

The team-oriented sales force

The team-oriented sales force structure

The team-oriented sales force structure – Click to enlarge

Now it’s starting to get a bit nasty. In almost every data warehouse or BI project in the pharmaceutical sector I’ve seen team-oriented sales forces. The operational area of a rep is not restricted to a distinct part of a region. Instead, all the reps form a team, which is sharing and spreading activities over the entire region.

The main characteristics of a team-oriented sales force structure are:

  • Nodes (especially bricks) have multiple parents on the next upper level
  • For some nodes, there is more than one distinct path to the top node

The above mentioned aspects form constellations that are commonly called “diamond shapes”.

The consequences of a team-oriented structure are clearly identifyable in the following table:

Aggregating sales volume in a team-oriented sales force structure

Aggregating sales volume in a team-oriented sales force structure – Click to enlarge

This is a snippet from a Power BI report based on sales data for a fictional german pharmaceutical company. Like in a real-world scenario, raw prescription data are present at the bricks level. The data (in this case sales volume and sales volume market share for one specific product) are aggregated for each node of the hierarchy.

Whereas “Region 111700” belongs to a sales line with a conventional structure, the other regions are part of a sales line with a team-oriented structure. It can easily be seen that each territory (Gebiet) of “Region 111700” has different values, whereas all the territories of the other regions have the same values. Plus, the values for each territory in the team-oriented line are identical to the values of the respective region.
This is correct, because in the conventional structure each brick belongs to only one distinct territory, while in a team-oriented sales force a brick belongs to all of the territories inside a region. Nevertheless, the sales volume of a brick must not be included more than once in the total of the region.

BI systems are ususally not able to deliver correct aggregations for team-oriented sales force structures without special interventions.

In the next post of this special series I will discuss some more “peculiar” pharma sales force structures and I will also give some examples from other sectors.

18. January 2017 · Comments Off on The ETL process – Part 2 – Guidelines & Standards – continued · Categories: Data Warehouse, Development, ETL, Management · Tags: , , ,

The ETL process – Part 2 – Guidelines & Standards – cont.

Guidelines lead the way

Guidelines lead the way

In one of the previous posts I have introduced the first two topics of the top four guidelines and standards for the ETL process.

Especially point one regarding big-scale ETL systems may have provoked some disagreement, disconcernment, or confusion. Everything I said is only relevant when there is the possibility and willingness of having a (at least temporary) team of dedicated developers. This might not be feasible for smaller projects. However, if there are any issues a commercial ETL tool cannot cope with, it would often be better to develop a new and independent system than tweaking the tool with weird workarounds. I have also heared of clients that have abandoned requirements, just because the ETL system could not fulfill them.

Here are topics three and four:

  • Don’t throw away anything that ever entered the data warehouse.

    Psychologists know the syndrome of compulsive hoarding. While this is a serious disease, it might come in handy in data warehousing. A general goal should be to be able to restore any state of the warehouse that ever existed. Quite frequently, the client establishes new structures in dimensions or hierarchies. There might even be major paradigm shifts in business models. In this context, clients often decide to forget about the structures and data of the past, just to find out later that some comparison or A/B analysis of present and past could be quite enlightening. It could also be interesting to aggreate and analyze historic data according to actual structures.
    I have also undergone scenarios where the client recognized that the whole restructuring or paradigm shift didn’t lead to the desired results. So they eventually decided to do a rollback to the old structures.
    Another scenario is the worst case of a technical failure that leads to a defunct database. To make things worse, an up-to-date backup is also no longer available.
    In all those cases, a well designed and fostered archive of all the interface data can be a lifesaver.
    There will be a detailed discussion about archiving in one of the upcoming posts.

  • Exclusively use human-readable files as input for interfaces.

    First of all: No direct-reads from data sources. This is not only a direct consequence of the previous topic, but there are also some more things to consider about directly reading from the data source:

    • The data source could be in an inconsistent state.
    • Querying the data source could seriously slow down other attached online systems.
    • The state of the data in the data source might be volatile.
    • The dependency on the structure of the data source requires a higher grade of coordination and management.
    • Auditing of the data transferred is more difficult.
    • Special connectors to specific data sources are necessary.

If it is only possible, try to convince the people concerned of delivering all interface data in human-readable flat files. Possible formats are CSV, TXT, or more nosql-oriented formats like JSON and XML. External data providers often have their own proprietary formats, but most of them are at least human-readable.
Why human-readable? Because this makes life much easier when auditing or lookups are necessary. We can open an interface file in an editor and look directly at or for the interesting data. Plus, in case of a data quality problem that needs immediate mending, we could quickly patch the data in the interface file. As mentioned in point two of guidelines & standards, please apply this only as an absolute exception and only to avoid showstoppers, since the responsibility for data quality should be in the realm of the respective data source.
Last but not least, it’s quite easy to compress, archive, backup and restore flat files. Conversely, archiving and backing up direct-read data requires extra steps to export it from the warehouse.

This concludes the top four list of guidelines & standards for the ETL process. The next posts are all about metadata for the ETL process.


14. January 2017 · Comments Off on Use TOAD · Categories: Development, Misc · Tags: , ,


The TOAD logo

The TOAD logo

After the first part of guidelines and standards I’d like to do a little interlude and introduce TOAD.

Whenever I start a new data warehouse or database project I warmly recommend to the client to purchase a handful of TOAD licences for the development team. Sometimes even the free version will do the job.

For me, TOAD is an indispensable tool for developing and debugging stored procedures, SQL statements, and database objects. Compared to TOAD, the built-in tools of the database vendors like SQL Server Management Studio (SSMS) or Oracle SQL Developer appear somewhat ridiculous.

There are versions for many DB systems like Oracle, SQL Server, DB2, MySql, SAP, Hadoop and there is also an agile and vivid community.

Just my two cents for the weekend. And, BTW, I’m in no way associated with Quest. I’m just a happy and pleased user for many years.

11. January 2017 · Comments Off on The ETL process – Part 2 – Guidelines & Standards · Categories: Data Warehouse, Development, ETL, Management · Tags: , , , , , , , , ,

The ETL process – Part 2 – Guidelines & Standards

Guidelines lead the way

Guidelines lead the way

After achieving some results from the analysis of the ETL process (Part 1 – The Analysis), it quickly becomes evident that it is not sensible to finally come up with an “egg-laying woolly-milk-sow” (as we say in German: “eierlegende Wollmilchsau” :-)).
However, if the analysis has been complete and painstaking, the requirements for the ETL process should be clear after that. Despite the fact that there should be room for extensions and new functionalities (especially in an agile environment), clear-cut red lines should be drawn.

Here are the first two of my top four list of guidelines:

  • Beware of too much proprietary or closed-source software.

    My best friends, the big-scale ETL tools and frameworks, fall into that category.
    Some people, especially staff-members of big consulting companies, would strictly contradict. Usually, their chief argument is that the use of tools or frameworks decreases the degree of dependence on software developers. That is nothing but the truth!
    But what’s finally also true is that the use of tools or frameworks increases the degree of dependence on staff-members of big consulting companies or software vendors, and on special developers, who are proficient in those tools and frameworks.
    Without a doubt you won’t find people like these around every corner. To top it all off they are usually significantly more expensive than developers, who are not that highly specialized in the development or customizing for a very specific product. And finally there is the cost for the products themselves.
    On the other hand, chances of finding some really brilliant developers with excellent skills in the fields mentioned below are much better. Even after most of them have left the team after the system has gone into production, it should not be too difficult to find new developers when it’s necessary. If there hasn’t been a knowledge drain and the system is well documented, the integration of new team members should be quite smooth.

    Skills needed or appropriate for development of the entire ETL process (strongly IMHO):

    • SQL, T-SQL, PL/SQL for Stored Procedures
    • C# for SQL Server CLR Stored Procedures
    • Java for Oracle Java Stored Procedures
    • bash or power shell for shell script programming
    • Shell tools like grep, awk, sed, etc.
    • ftp script programming
    • cron job or scheduler programming

  • Data sources are responsible for their own data quality.

    In literally all of my data warehouse projects and without exception we have discovered data quality issues in the source systems. Fortunately, this usually happens at a quite early stage of the project. The start of a data warehouse project can eventually even take credit for the discovery of serious flaws in source and/or legacy systems.
    The data warehouse process must not be the sweeper that eliminates the slip-ups from earlier stages of the data flow. We all know that nothing lives longer than a quick workaround. There is no doubt about the necessity to provide these workarounds to avoid showstoppers; especially at an early stage of the production phase. But, by all means try to get rid of them as quickly as possible. They turn out to be a heavy burden in the long run.
    Sometimes the need to promote data quality on an enterprise-wide level might occur. It could be necessary to escalate those issues through the corporate or department hierarchy; sometimes even up to the CIO and/or the CTO.

The subject of next post will be the remaining points of the top four guidelines list.


The ETL process – Part 1- The Analysis

Diagram of a part of an ETL process in a typical graphical ETL tool

According to the principles I have shortly described in the first post of this blog (http://dwhblog.org/what-to-expect-from-this-data-warehouse-blog/), the very first step is the analysis.

I have always been quite annoyed by graphical ETL tools. This is surely a matter of personal taste, but I think that most of those systems are quite redundant and not very efficient in modeling the ETL process. Plus, many of them are quite expensive and I hardly ever saw the real countervalue of their effect on the project budget.

In one of my biggest DWH projects we have evaluated a handful of the leading ETL tools on the market. None of them was really able to cope with all of our requirements. So we finally came to the conclusion to develop an ETL system by ourselves. The overall cost turned out to be significantly lower than the yearly price of even the least expensive commercial tool.

After thorough analysis and development we had an ETL system that consisted mainly of shell scripts, human-readable files for configuration and metadata, and stored procedures. Of course we also established a consistent system for the documentation of the process and its parameters.

All the findings from this analysis (and more) have been the basis for a data warehouse framework I have developed. This framework is no longer using shell scripts, but is completely hosted in the database system.

Main results of the analysis

The following paragraph describes the main findings of the analysis from a bird’s eye view. However, not all of the aspects are mentioned here. There will be special posts dedicated to describing everything in greater detail.

Data delivery structure:

  • Data sources deliver data in data delivery groups.
  • Data delivery groups encapsulate one or more interfaces.
  • Interfaces consist of data fields.
  • Data fields are characterized by data type, length, domain, etc..

Staging area:

  • Data from interfaces flow into the staging area.
  • The staging area is divided into different staging levels.
  • The data flows through the staging levels according to a well-defined staging priority.
  • Because staging can be very time-consuming, the ETL process should be able to dynamically minimize staging depending on the actual data input.
  • Master data is part of the staging area.

Multidimensional data:

  • Part of the data of the staging area flow into the multidimensional data area.
  • Multidimensional data is made up of dimensions and facts.
  • Combinations of dimensions and facts form data cubes.
  • Different cubes can share dimensions, but use different hierarchies or consolidation paths.
  • Multidimensional data consists of different levels of consolidation.Because consolidation can be very time-consuming, the ETL process should be able to dynamically minimize consolidations depending on the actual data input.

Process data:

  • Process data consists of log data, status data, protocol data, etc..

User access and security data:

  • User access and security data define the association of users with certain slices of the data cubes.
  • They are also important for data-driven subscriptions for automatic deployment of reports.

Map and geospatial data:

  • Even though map and geospatial data are part of the interfaces and staging area, they often need some special processing.

Data marts:

  • Subject-oriented data marts are data sources for associated BI systems.
  • Data marts are built from subsets of multidimensional data, master data, process data, user access and security data, and map & geospatial data.
  • The structures of the data cubes in a data mart don’t necessarily match exactly the structures of the cubes in the DWH.
  • Because data mart maintenance can be very time-consuming, the ETL process should be able to dynamically minimize data mart maintenance depending on the actual data input.
  • The downtime resulting from data mart maintenance must be as short as possible.


After this overview of the main results of the analysis, the subject of the next post will be “guidelines and standards”.