GoodData | Developer Network

Too Many Date Dimensions? Try Events!

by ZD (@zsvoboda)

Time can have many different meanings in analytical projects. For example an opportunity can be associated with the time when it was created and the time when a sales guy closed it. I saw quite a few models with way more times associated with a single entity (e.g. helpdesk models). Here is a typical example of a simple sales model that uses created and closed date dimensions.

Sales Data Model

However end users have sometimes difficult time to figure out why there are two date dimensions and which one they should use in their reports. “There is only one time” they think. There is one more problem except the end user usability.

You can easily create a report Revenue by Time that is sliced by the Closed Date.

Revenue by Time

You can also create a report Pipeline Generation by Time that is sliced by the Created Date.

Pipeline by Time

But as the first report uses the closed and the second report the created time dimensions, you can’t put both metrics to a single chart:

Pipeline  Revenue by Time

You need to model your analytical project differently to achieve the report above. We are getting to something what we call event data models. Here you go:

Sales Data Model (Event)

We need to create a special Event dataset that contains all events related to a specific opportunity. In our case there are two events per each opportunity: created & closed . The Event dataset then has twice as much records as the original Opportunity dataset. We obviously need to adjust the metrics to not return all amounts multiplied by two. Here are few metrics definitions:

  • Pipeline [Sum]: SELECT SUM(Event Amount) WHERE Event Type = created
  • Revenue [Sum]: SELECT SUM(Event Amount) WHERE Event Type = closed AND Status IN (Won)
  • Lost Revenue [Sum]: SELECT SUM(Event Amount) WHERE Event Type = closed AND Status IN (Lost)

These metrics are relatively simple and self describing. The metrics that compute that last opportunity value are a bit more complicated. We need to always find the ‘latest’ available event for each opportunity in the Event dataset. This is why we have the Event ID FACT in the model. This fact is a unique ID of each event. The later an event occurred, the higher the Event ID is. So the maximum Event ID for a specific opportunity identifies the latest event for that opportunity. So we need to first compute this MAX ID.

  • Last Opportunity Event ID: SELECT MAX(Event ID) BY Opportunity ID ALL IN ALL OTHER DIMENSIONS

then the current (latest) opportunity amount, revenue, and pipeline are:

  • Current Amount [Sum]: SELECT SUM(Event Amount) WHERE Event ID = Last Opportunity Event ID
  • Current Revenue [Sum]: SELECT Current Amount [Sum] WHERE Status = Won
  • Current Pipeline [Sum]: SELECT Current Amount [Sum] WHERE Status = Open
  • Current Lost Revenue [Sum]: SELECT Current Amount [Sum] WHERE Status = Lost

Think about this data model design when you are creating the 6th data dimension in your project. ;-)

blog comments powered by Disqus