TFS Warehouse Adapter – How to link Dimensions to new Fact Entries

The Background

I’ve been working on a project to create a Timesheet add-in for TFS.  Other members of my team have done the easy work and created all of the database, UI and Web Service stuff.  I was tasked with creating the Warehouse Adapter to move the timesheet entries from the operational store to the TFS Warehouse.

I decided that I would create a new Fact table to hold the Timesheet Entries.  This Fact would link to the Current Work Item Fact so that we can tell which time entries go with a given Work Item.  We also have data on the Team Project, the Date that the time entry is for and the Person that did the work.  These entries live in Dimension tables, specifically the “Team Project”, “Date” and “Person” Dimensions.

When I created my Warehouse Adapter I had to write some code in the MakeSchemaChanges() method to create the “Timesheet Entries” Fact table if it didn’t already exist.  During this process I specified that the Timesheet Entries Fact table uses the Person, Date and Team Project Dimensions.  When the Timesheet Entries Fact table was created it now had 2 additional Int fields for Person and Team Project and one additional DateTime field for Date.  These are the foreign keys to the Dimension tables shown in yellow hilight on the graphic below.

TimesheetFact

Ok, so this all makes structural sense to me.

When the Warehouse Service runs, it call my Adapter 3 times.  The first is a call to the adapter’s Initialize() method to give it an opportunity to get a reference to the Warehouse’s DataStore object.  The second is the call to the adapter’s MakeSchemaChanges() method which we noted above gives us an opportunity to update the Warehouse schema with our new Facts, FactLinks and Dimensions (if applicable).  The third is a call to the MakeDataChanges() method which is where we actually grab data from the transactional store and push it into the warehouse store.

The Problem

In the MakeDataChanges() method I grab my timesheet data and load up my shiny, new Timesheet Entries Fact table using this code:

 1: private static FactEntry CreateTimesheetFactEntry(IDataStore dataStore, int teamProjectId, int id, double hours, string entryUserName, DateTime entryDate)
 2:  {
 3:  FactEntry fe = dataStore.CreateFactEntry(FactName);
 4:  fe.TrackingId = id.ToString(CultureInfo.InvariantCulture);
 5:  fe["Logical Tracking Id"] = id;
 6:  fe["Hours"] = hours;
 7:  fe["Date"] = entryDate;
 8:  fe["Team Project"] = teamProjectId;
 9:  fe["Person"] = entryUserName;
 10:  
 11:  return fe;
 12:  }

I need to pass the Id for the Team Project on Line 8 and the Id for the Person on Line 9 into my FactEntry to create the correct links to the Person and Team Project Dimension tables.  So how do I get these Ids? 

The Team Project Id is easy to get because there is a class in the Microsoft.TeamFoundation namespace called LinkingUtilities that supports retrieving links.  It has a way to use the TeamProject name to get its URI and then the ToolSpecificId.  The Person Id isn’t so easy. 

I would have like to follow the same pattern that I used for retrieving a FactEntry, just call the dataStore.GetFactEntry() method but there is no dataStore.GetDimensionEntry() method in the API.  I know that I could go straight against the TFSWarehouse database using SQL code but I’m trying to keep configuration to a minimum and also stay within the API until I can no longer do so.  This pretty much brought this part of my project to a stand-still until I could figure out how Microsoft did it with their Adapters.

The Research

Next step…post a question on the MSDN Team Foundation Server – Reporting & Warehouse forum.  Within a day I had a response from Dave Brokaw at Microsoft.  This was not the answer I was expecting!  Dave said:

“When inserting into the database, the SaveDimensionMember API automatically converts from a “business” or “op store” key for a given DimensionUse, such as a string SID for the stock Person dimension, to the actual integer or “surrogate” key used in the warehouse database.  Thus, you should be in good shape if you can set the DimensionUses’ value in the FactEntry to the appropriate SID.

In the data warehouse world the Int fields are often known as “surrogate keys”, differentiated from the “business keys” (or in some cases more realistically “op store keys”) that are used in the source databases.  They generally provide better performance via faster joins.
Having the API automatically find and link up the related dimensions in this way also helps performance, since fewer round trips to the database are required.  Additionally, it allows related dimension members to be created transactionally when a fact arrives that references them, even if they haven’t been populated yet by the adapter that normally creates them.  For example, the Work Item Tracking adapter might run before the CSS adapter, and save a Work Item fact that references a Person that the CSS adapter hasn’t pulled in yet.  These are known in the warehousing world as “early-arriving facts”.  Prepopulating dimension members as needed ensures that the cube processing does not see “dangling” references in such facts.”             – Dave Brokaw

So I don’t have to figure out the Id for the used Dimension fields, all I have to do is pass the value of the KeyField and the SaveDimensionMember API will automatically figure out the Id and stick it into the field on the FactEntry.

The Solution

What does this mean in my case?  That I need to figure out what the KeyField is for each of my Dimensions.  [I’ve colored them in Red on the Database Diagram above.]

I start by opening the warehouseschema.xml file that can be found on the TFS App Tier in C:Program FilesMicrosoft Visual Studio 2008 Team Foundation ServerTools.  This XML file contains the schema of the TFS Warehouse.  If you scroll down the file you get to the Dimensions section.  In here I’m looking for the Person, Date and Team Project Dimension entries.

The Team Project Dimension is defined like this:

 1: <Dimension>
 2:  <Name>Team ProjectName>
 3:  <FriendlyName>Team ProjectFriendlyName>
 4:  <Fields>
 5:  <Field>
 6:  <Name>Project UriName>
 7:  <FriendlyName>Project UriFriendlyName>
 8:  <Type>NVARCHARType>
 9:  <Length>256Length>
 10:  <Visible>trueVisible>
 11:  <RelationalOnly>trueRelationalOnly>
 12:  <CalculatedMembers />
 13:  Field>
 14:  <Field>
 15:  <Name>Team ProjectName>
 16:  <FriendlyName>Team ProjectFriendlyName>
 17:  <Type>NVARCHARType>
 18:  <Length>256Length>
 19:  <Visible>trueVisible>
 20:  <RelationalOnly>falseRelationalOnly>
 21:  <CalculatedMembers />
 22:  Field>
 23:  <Field>
 24:  <Name>Is DeletedName>
 25:  <FriendlyName>Is DeletedFriendlyName>
 26:  <Type>BITType>
 27:  <Length>0Length>
 28:  <Visible>trueVisible>
 29:  <RelationalOnly>trueRelationalOnly>
 30:  <CalculatedMembers />
 31:  Field>
 32:  Fields>
 33:  <KeyFieldName>Project UriKeyFieldName>
 34:  <Levels />
 35: Dimension>

Notice on Line 33 that the KeyFieldName value is Project Uri.  This means that we can pass the Project Uri value in our FactEntry on Line 8 of CreateTimesheetFactEntry().  When we save the FactEntry this will be resolved to the actual Id of the Dimension and that Id value will be stored in the Fact.

Now we need to look at the Person Dimension.  It is defined like this:

 1: <Dimension>
 2:  <Name>PersonName>
 3:  <FriendlyName>PersonFriendlyName>
 4:  <Fields>
 5:  <Field>
 6:  <Name>PersonName>
 7:  <FriendlyName>PersonFriendlyName>
 8:  <Type>NVARCHARType>
 9:  <Length>256Length>
 10:  <Visible>trueVisible>
 11:  <RelationalOnly>falseRelationalOnly>
 12:  <CalculatedMembers />
 13:  Field>
 14:  <Field>
 15:  <Name>SIDName>
 16:  <FriendlyName>SIDFriendlyName>
 17:  <Type>NVARCHARType>
 18:  <Length>256Length>
 19:  <Visible>trueVisible>
 20:  <RelationalOnly>trueRelationalOnly>
 21:  <CalculatedMembers />
 22:  Field>
 23:  ... Additional Fields Removed ...
 24:  Fields>
 25:  <KeyFieldName>SIDKeyFieldName>
 26:  <Levels />
 27: Dimension>

The Person Dimension’s KeyFieldName is found on Line 25.  It shows that the SID is the value that we can use in our Fact.  So we can put the user’s SID value in our FactEntry on Line 9 of CreateTimesheetFactEntry() and as above, the warehouse will look-up the SID on Save and replace it with the Id of that SID’s row in the Person Dimension table.

Lastly, we need to look at the Date Dimension.  It is defined as:

 1: <Dimension>
 2:  <Name>DateName>
 3:  <FriendlyName>DateFriendlyName>
 4:  <Fields>
 5:  <Field>
 6:  <Name>UTCDateTimeName>
 7:  <FriendlyName>UTCDateTimeFriendlyName>
 8:  <Type>DATETIMEType>
 9:  <Length>0Length>
 10:  <Visible>trueVisible>
 11:  <RelationalOnly>trueRelationalOnly>
 12:  <CalculatedMembers />
 13:  Field>
 14:  ... Additional Fields Removed ...
 15:  Fields>
 16:  <KeyFieldName>UTCDateTimeKeyFieldName>
 17:  <Levels>
 18:  ... Levels Removed ...
 19:  Levels>
 20: Dimension>

As you can see on Line 16, the KeyFieldName is UTCDateTime.  This makes my life much easier because the EntryDate of my Timesheet Entry is already defined as a DateTime type.  All I have to do is pass the EntryDate.ToUniversalTime value to the Fact on Line 7 of CreateTimesheetFactEntry().

NOTE:  The Date dimension’s UTCDateTime field always has a time of 12:00:00 am (midnight) so make sure that the date you pass as the Date Dimension KeyField has a time component of midnight. (See below)

Select Top 10 __Id, UTCDateTime From Date

__Id UTCDateTime
----------------------- -----------------------
2008-04-09 00:00:00.000 2008-04-09 00:00:00.000
2008-04-10 00:00:00.000 2008-04-10 00:00:00.000
2008-04-11 00:00:00.000 2008-04-11 00:00:00.000
2008-04-12 00:00:00.000 2008-04-12 00:00:00.000
2008-04-13 00:00:00.000 2008-04-13 00:00:00.000
2008-04-14 00:00:00.000 2008-04-14 00:00:00.000
2008-04-15 00:00:00.000 2008-04-15 00:00:00.000
2008-04-16 00:00:00.000 2008-04-16 00:00:00.000
2008-04-17 00:00:00.000 2008-04-17 00:00:00.000
2008-04-18 00:00:00.000 2008-04-18 00:00:00.000

(10 row(s) affected)

So with all this, our code changes to something like this:

 1: private static FactEntry CreateTimesheetFactEntry(IDataStore dataStore, string teamProjectUri, int id, double hours, string entryUserSID, DateTime entryDate)
 2:  {
 3:  FactEntry fe = dataStore.CreateFactEntry(FactName);
 4:  fe.TrackingId = id.ToString(CultureInfo.InvariantCulture);
 5:  fe["Logical Tracking Id"] = id;
 6:  fe["Hours"] = hours;
 7:  fe["Date"] = entryDate.ToUniversalTime();
 8:  fe["Team Project"] = teamProjectUri;
 9:  fe["Person"] = entryUserSID;
 10:  
 11:  return fe;
 12:  }

Conclusion

To make a long story; if you have a Fact table that uses Dimensions you don’t have to figure out the Ids that need to be stored in the Fact entry.  All you have to do is give the FactEntry object a KeyField value for that particular Dimension and it will be replaced by the correct Id value when the FactEntry is saved.  This is possible because the SaveDimensionMember API will do a lookup of the passed value on the KeyField to get the matching Dimension row.

You can find out what the KeyField is for a given Dimension table by looking at the Dimension definition in the warehouseschema.xml file and noting the KeyFieldName entry’s value.

You can get the Team Project Uri from the Team Project name by using the LinkingUtilities class in the Microsoft.TeamFoundation namespace.

Thanks

Thanks go out to Dave Brokaw and all the folks on the Team System team at Microsoft.  I have always found them to be quick in their responses and immensely helpful. 

If you ever have a question about Team System or TFS, I highly recommend posting your question to the MSDN Team System forums.  There are folks inside the Team System team, MVPs and regular users that are always looking at new questions and are quick to answer or help you dig deeper to find the right question for your situation.