Archive for October, 2012

Modeling Several Boolean Attributes Using a Junk Dimension

October 26, 2012

One of the discussions that I recently had involved modeling a business process associated with several Boolean attributes. The current model treated each of these attributes as a separate dimension. While there is nothing necessarily wrong with this approach, when reviewing a model I always try to think the way Ralph Kimball and his colleagues at the Kimball Group would think. If you're not modeling a process the way they would model it, you probably should re-evaluate and redesign.

So how would the Kimball Group model this scenario? Most likely with a Junk dimension, since a Junk dimension consolidates several low-cardinality attributes into a single useful dimension. Since each of these attributes has only two possible values, Yes or No, this scenario is a very likely candidate for a Junk dimension!

There’s several advantages to modeling this scenario with a Junk dimension. Below are three of them (and you’re always free to disagree)

  1. The fact table contains a single key column rather than a whole bunch of columns referencing these flags (more on that later),
  2. Greater flexibility in defining the labels of each attribute, especially when a view layer abstraction sits between the source and the user presentation layer, and
  3. Easier enhancement of the cube should additional Boolean attributes be identified later in the project lifecycle.

Defining the Junk Dimension

While in most cases the key for a dimension is a meaningless number assigned by an Identity or Sequence, in the case of a Junk dimension consolidating several Boolean attributes, a better approach is to calculate this key. For instance, if I've consolidated four Boolean attributes, I can calculate the key by multiplying the value of each Boolean by a corresponding power of 2 (2^0, 2^1, 2^2, or 2^3, depending on the identity of the attribute) and summing the results. A T-SQL statement that generates the rows of a table consolidating four Boolean attributes is given below.

-- cteFlags enumerates the two possible flag values (0 and 1); the four
-- CROSS JOINs below generate all 16 combinations.
WITH cteFlags AS (
    SELECT    0 AS Value
    UNION ALL
    SELECT    1
)
SELECT    
    (
        Power(2, 3.0) * Flag3.Value 
     +  Power(2, 2.0) * Flag2.Value 
     +  Power(2, 1.0) * Flag1.Value 
     +  Flag0.Value
    ) AS ID,
    Flag0.Value AS Flag0,
    Flag1.Value AS Flag1,
    Flag2.Value AS Flag2,
    Flag3.Value AS Flag3
FROM 
    cteFlags Flag0
CROSS JOIN 
    cteFlags Flag1
CROSS JOIN 
    cteFlags Flag2
CROSS JOIN 
    cteFlags Flag3
ORDER BY
    ID

The ETL to load fact tables is now slightly more complex. Rather than each flag being represented as a separate dimension column, a single column is used. But since the surrogate key for the Junk dimension is not incrementally assigned but derived from business domain values, it can be calculated during the load rather than looked up from the individual column values.
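A minimal sketch of such a load follows; the table names (stg.FactSales, dbo.FactSales) and columns (DateKey, Flag0 through Flag3, SalesAmount) are illustrative assumptions rather than objects defined above.

-- Hypothetical fact load: the Junk key is computed inline from the
-- staged flag values, so no lookup against the Junk dimension is needed.
INSERT INTO dbo.FactSales (DateKey, JunkDimensionKey, SalesAmount)
SELECT
    s.DateKey,
    8 * s.Flag3 + 4 * s.Flag2 + 2 * s.Flag1 + s.Flag0 AS JunkDimensionKey,
    s.SalesAmount
FROM stg.FactSales AS s;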

Greater Flexibility

One of the problems with including each Boolean attribute as a separate dimension is that, when modeled as role-playing dimensions in a multidimensional database, you end up with a whole bunch of Yes and No or True and False entities. When querying this model, it's not too difficult to get completely lost in the meaning of each of those values. Using a Junk dimension, one can easily manipulate the labels applied to each of those entities. So the True flag for the Base Fee Schedule Overridden? attribute can be changed from Yes to Base Fee Schedule Not Used or some other description more in line with the business domain, as sketched below.
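A minimal sketch of that relabeling through the view layer mentioned earlier (the view and table names and the label text are illustrative assumptions; only one flag's CASE expression is shown):

-- Hypothetical presentation view: translate the raw 1/0 flags into
-- labels that match the business domain.
CREATE VIEW dbo.vwJunkDimension AS
SELECT
    j.ID AS JunkDimensionKey,
    CASE j.Flag0
        WHEN 1 THEN 'Base Fee Schedule Not Used'
        ELSE 'Base Fee Schedule Used'
    END AS BaseFeeScheduleOverridden
    -- ...one CASE expression per remaining flag
FROM dbo.JunkDimension AS j;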

Easier to Enhance

Another advantage is the ease with which additional Boolean attributes can be added to an existing data mart. The values of these attributes are likely to be False for the vast majority of cases, since such attributes are generally used to identify exceptions. Therefore, by simply adding another column to your Junk dimension with False as its default value, as sketched below, you've already accurately included the additional attribute for the vast majority of fact records without even touching the fact table. How Agile is that?
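A sketch of that single change, assuming the Junk dimension table is named dbo.JunkDimension and the new attribute becomes Flag4:

-- Every existing dimension row defaults to False (0), so every existing
-- fact record is immediately correct for the new attribute.
ALTER TABLE dbo.JunkDimension
    ADD Flag4 bit NOT NULL
    CONSTRAINT DF_JunkDimension_Flag4 DEFAULT (0);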

To complete inclusion of the exceptions, two steps need to be executed. First, new rows need to be added to the Junk dimension; these are simply the existing Junk dimension combinations with the true value of the additional attribute appended. To illustrate using the previous case, the query defining the new rows for the Junk dimension created above is simply the following:

WITH cteFlags AS (
    SELECT    0 AS Value
    UNION ALL
    SELECT    1
)
SELECT    
    (
        Power(2, 4.0) * New_Attribute.Value
     +  Power(2, 3.0) * Flag3.Value 
     +  Power(2, 2.0) * Flag2.Value 
     +  Power(2, 1.0) * Flag1.Value 
     +  Flag0.Value
    ) AS ID,
    Flag0.Value AS Flag0,
    Flag1.Value AS Flag1,
    Flag2.Value AS Flag2,
    Flag3.Value AS Flag3,
    New_Attribute.Value AS Flag4
FROM 
    cteFlags Flag0
CROSS JOIN 
    cteFlags Flag1
CROSS JOIN 
    cteFlags Flag2
CROSS JOIN 
    cteFlags Flag3
CROSS JOIN 
    cteFlags New_Attribute
WHERE    
    New_Attribute.Value = 1
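    -- keep only the 16 new combinations, i.e. the rows with IDs 16 through 31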
ORDER BY
    ID

With the new dimension records added, the second step is to update the fact records identified as "exceptions". Again, rather than completely reloading the fact table to include the additional attribute, the Junk dimension key for these exceptions can be updated in place. One way to complete this task is to load a temporary staging table with the identifying fact fields of all records having a true New_Attribute value. By joining to this table and applying the update JunkDimensionKey = JunkDimensionKey + 2^4 (that is, adding 16) to the corresponding records, you've now successfully included a new Boolean attribute in the model. If there are many records where the New_Attribute value is true and your fact table is very large, your best approach would likely be to use a technique that I described in a previous blog post nearly two years ago.
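A minimal sketch of that update (the staging table stg.NewAttributeExceptions and the TransactionID join key are illustrative assumptions):

-- Shift the Junk key by 2^4 = 16 for just the facts flagged as
-- exceptions; no other fact rows are touched.
UPDATE f
SET    f.JunkDimensionKey = f.JunkDimensionKey + 16
FROM   dbo.FactSales AS f
JOIN   stg.NewAttributeExceptions AS e
    ON e.TransactionID = f.TransactionID;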

Categories: Data Warehousing

SSRS–Creating the Stephen Few Data Visualization

October 23, 2012

In my last blog post, I described an approach in which the DateTool dimension of Marco Russo was extended to include an Aggregate Month attribute. When referenced in an MDX query in combination with a member of the Aggregation attribute, the Aggregate Month attribute produces the time periods that compose that aggregation. Enhancing your SSAS multidimensional cube in this manner provides functionality similar to the Time Intelligence functionality available in PerformancePoint Services, a sort of dynamic, date-specific named set. I feel these enhancements are advantageous for several reasons, though I'm a bit biased. For one thing, these "dynamic time-based sets", such as the last twelve months prior to the selected month, are available in any front-end, not just PPS. And as referenced in this SSAS forum post, the approach was successfully used to provide a "Date Sets Alternative" in Tableau.

The original purpose of the enhancement was to facilitate creating the Stephen Few sales performance dashboard data visualization from Information Dashboard Design. The goal was to make creating the visualization simple enough that it could be done in Report Builder. The end result, using the modified Adventure Works source, looks something like the following.

[Figure: the completed dashboard, one row per Country, showing a Sparkline, an Indicator, a Bullet Graph, and the YTD sales value]

In the visualization above, the Year-Over-Year YTD Reseller Sales Growth is shown by Country. For this demonstration, a target YTD YoY growth ratio of 115% (15% growth) is expected. Any growth ratio at or below 85% is considered Poor, while any value between 85% and 115% is considered Satisfactory, as indicated by the linear ranges of the bullet graph shown in the fourth column of the visualization.

To create this visualization, the following steps need to be executed.

  1. Create a single dataset to source all elements,
  2. Create a Tablix control bound to the dataset with a Grouping on Country,
  3. Add Sparkline, Bullet Graph, and Indicator controls, and
  4. Add the legend and scale adornments to the visualization.

Creating the Dataset

Having made the modifications to the Adventure Works cube described in the previous blog post, creating a single dataset to source all of the data visualization components is greatly simplified. For each Country shown in the figure above, the dataset needs to include the following data elements.

  1. Reseller Sales Amount by month for the Past 12 Months,
  2. YTD Reseller Sales (horizontal bar in bullet graph above, fifth column, and Indicator value),
  3. Prior Year YTD Sales to calculate target value (vertical line in bullet graph above), and
  4. Columns that represent the boundary values for the linear ranges defining the bullet graph.

The challenge in creating such a dataset is that the date context and the date aggregation are at different grains within the same query. To create the dataset, first define a data source pointing to the Adventure Works cube. Next, create a new dataset and, in the Query Designer window, drag the [Date].[Month] and [Date Utility].[Aggregation] attribute hierarchies to the Filter Pane. Check the Parameters checkbox for the [Date].[Month] attribute so that consumers can dynamically change the filter context, and specify [Last 12 Months] as the filter for the [Date Utility].[Aggregation] attribute. (I also added a filter limiting Countries to Canada, France, the United Kingdom, and the United States, only because the Adventure Works cube is pretty sparse.) The initial dataset was defined by dragging [Geography].[Country], [Date Utility].[Aggregate Month], and [Measures].[Reseller Sales Amount] to the Data Pane. You now have a query that produces the Reseller Sales Amount by Country for the Last 12 Months relative to whatever month the user selects when running the report. How easy was that?
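For orientation, a hand-written sketch of an equivalent query is shown below (simplified: the designer actually parameterizes the month filter and generates subselect syntax; member names follow the modified cube from the previous post):

SELECT
    { [Measures].[Reseller Sales Amount] } ON COLUMNS,
    NON EMPTY
    {
        [Geography].[Country].[Country].Members *
        [Date Utility].[Aggregate Month].[Aggregate Month].Members
    } ON ROWS
FROM
    [Adventure Works]
WHERE
    (
        [Date].[Month].[May, 2008],  // parameterized in the actual report
        [Date Utility].[Aggregation].[Last 12 Months]
    )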

To complete the query, the Actual, Goal, and boundary values for each of the linear ranges need to be added. Since their Aggregation context differs from the [Last 12 Months] member used as the filter, calculated members are created to overwrite this context. These calculated members are created in the Calculated Members Pane of the designer and have the names and definitions given in the following table.

[Table: names and definitions of the Value, Goal, and linear-range boundary calculated members]

Two additional points were left out of the Definition column above. Because the Value and Goal measures overwrite the current context of the [Date Utility].[Aggregate Month] attribute, these metrics were wrapped in an IIf function so that values are returned only for the months that compose the aggregation rather than for every member of the [Date Utility].[Aggregate Month] hierarchy (i.e. IIf(([Measures].[Reseller Sales Amount], [Date Utility].[Aggregate Month].CurrentMember) = 0, NULL, …)). Additionally, since the constants as defined would otherwise return a value for every member of the [Date Utility].[Aggregate Month] hierarchy, these constants were multiplied by [Measures].[Reseller Sales Growth Value]/[Measures].[Reseller Sales Growth Value] so that NULLs are returned for months outside of the Last 12 Months range.
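To make the pattern concrete, here is a hedged sketch of how the YTD Reseller Sales element might be defined; the [Year To Date] aggregation member name is an assumption based on Russo's DateTool, and the authoritative definitions were those given in the table above.

// Hedged sketch: return YTD Reseller Sales for the months that compose
// the Last 12 Months set, and NULL for all other Aggregate Months.
IIf(
    (
        [Measures].[Reseller Sales Amount],
        [Date Utility].[Aggregate Month].CurrentMember
    ) = 0,
    NULL,
    (
        [Measures].[Reseller Sales Amount],
        [Date Utility].[Aggregation].[Year To Date],
        [Date Utility].[Aggregate Month].DefaultMember
    )
)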

Defining the Tablix Control

Adding the Tablix control is the simplest step in the process. Insert a Table control into the report and set its DatasetName property to the dataset created above. Add two additional columns to the table, create a grouping on Country (deleting the row group column without deleting the group), and delete the Details row from the report. Bind the third column to the Country field and set the fifth column to Max(Reseller Sales Growth Value). The last bit of formatting involved setting the BorderStyle to None for all textboxes except the bottom border of the column headers. That's it. The framework for the data visualization is now defined, and the more difficult process of defining the data visualizations begins.

Creating the Visualizations

Insert a Sparkline in the first column of the table. Set the Category Groups to Aggregate Month and the Values to the Reseller Sales Amount. That’s it for that.

Defining the Indicator is not much more difficult. The only thing to watch out for is that the expressions for Values and States should use the Max, Min, or Avg aggregation of Reseller Sales Growth Value and Reseller Sales Growth Goal rather than the default of Sum. Once completed, four of the five columns are done. The only one left is the bullet graph.

Rather than detailing how the bullet graph was formatted here, I'll note that most of the steps I followed are described in Tim Kent's post from 2009, without all the jiggery-pokery that was required in previous versions of SSRS. The linear ranges were defined using the Max value of the dataset columns explicitly created for that purpose. The linear ranges were given a start and end width of 75%, while the solid fill colors for the Poor, Satisfactory, and Good ranges were set to Gray, Silver, and WhiteSmoke respectively, providing the contrast necessary to distinguish them in the figure shown previously. That's it for the visualizations.

Completing the Data Visualization

To complete the process, legends and scales were added to the Tablix. The legend is the simpler task. The rightmost report header columns were merged, and a rectangle control was inserted to allow free-form placement within that cell. Three additional small rectangles and three textboxes were added to identify the Poor, Satisfactory, and Good linear ranges. The BackgroundColor property of each rectangle was set to the appropriate color identified above.

The scale was added by inserting a Country group summary row and adding another bullet graph to the fourth column of that row. The Width properties for all linear ranges and pointers were set to zero so that these elements of the Gauge control are effectively hidden. Trial and error determined the optimal Scale properties; the change most likely to be necessary is increasing the Scale's font size to an absurdly large value. For the visualization shown above, the font size ended up being 48pt.

The completed report is available on my SkyDrive account.

Summary

The greatest challenge in replicating the Stephen Few data visualization with SSRS is the creation of a single dataset in which all visualizations can be sourced. By enhancing the DateTool dimension of Marco Russo to include an additional Aggregate Month attribute, the enhanced Date Utility dimension discussed in the previous blog post greatly simplifies the necessary steps.

Categories: SSRS

Enhancing the DateTool Dimension

October 8, 2012

I had this issue in the past and addressed it in a very inelegant manner involving a many-to-many relationship that I won't reproduce here. To all you poor souls who inherited that approach, I sincerely apologize. I was young (not really) and inexperienced (also not really), and I knew when I implemented it that it was not the right way (definitely true). However, the users liked it, needed it, and once deployed, it was really difficult to retract.

The scenario that I'm addressing here has its roots in that same place. Say I want to reproduce the sales performance dashboard data visualization that Stephen Few presents in the last chapter of his book, Information Dashboard Design, within a SQL Server Reporting Services Tablix control. An example of this data visualization can be found here. The visualization includes a Sparkline, an Indicator, and a Bullet Graph for various financial performance metrics. The difficulty in reproducing this visualization in SSRS is that all data used to generate these visualizations must come from a single dataset. While the Indicator and Bullet Graph are populated by Year-to-Date values, the Sparkline is sourced from current-month values over the last twelve months. Therefore, two different date aggregation contexts (Year-to-Date and Last 12 Months) and two different date contexts (the selected month for the Bullet Graph and Indicator, and the component months for the Sparkline) need to be included within the dataset sourcing the visualization.

To produce a dataset that could be used to generate this data visualization, the DateTool dimension described by Marco Russo in this blog post was enhanced; the dimension is referred to as the [Date Utility] dimension in the remainder of this post. The enhancements added new utility attributes representing the component members involved in a date aggregation. To illustrate using the Adventure Works cube: if [Date].[Calendar].[May, 2008] and [Date Utility].[Aggregation].[Last 12 Months] are specified in the slicer, placing the newly added [Date Utility].[Aggregate Month] attribute on the row axis generates a set equivalent to [Date].[Calendar].[Jun, 2007] : [Date].[Calendar].[May, 2008]. This set represents the time periods that compose the ([Date].[Calendar].[May, 2008], [Date Utility].[Aggregation].[Last 12 Months]) cell within the current cube context.

The first modification to Russo's DateTool dimension was to simply cross join the [DateTool].[Aggregation] and [DateTool].[Comparison] views with another view that includes each possible [Date].[Calendar].[Month] member. (For this case, detail at the granularity of the Day level was not needed.) Even though the approach involves cross joining three views, each has a relatively small number of rows. The definitions of the [DateTool].[CalendarMonths] and [DateTool].[DateTool] views are given below.

CREATE VIEW [DateTool].[CalendarMonths] AS
	SELECT	
		dt.MonthKey,
		dt.MonthDesc,
		dt.MonthStartDate,
		dt.QuarterKey,
		dt.QuarterDesc,
		dt.QuarterStartDate,
		dt.SemesterKey,
		dt.SemesterDesc,
		dt.SemesterStartDate,
		dt.Year
	FROM	DateTool.DimDate dt
	WHERE	dt.DateValue		= dt.MonthStartDate 
GO
CREATE VIEW [DateTool].[DateTool] AS
	SELECT	
		a.ID_Aggregation, 
		s.ID_Comparison,
		m.MonthKey, 
		(
			CAST( ID_Comparison AS VARCHAR ) + ' - ' 
			+ CAST( ID_Aggregation AS VARCHAR ) 
			+ ' - ' + CAST(m.MonthKey AS varchar)  
		) AS Description
	FROM	DateTool.DateAggregation a
	CROSS JOIN DateTool.DateComparison s
	CROSS JOIN DateTool.CalendarMonths m
GO

The second modification was to add an Aggregate Year –> Aggregate Quarter –> Aggregate Month natural hierarchy to the [Date Utility] dimension. A Business Information Model of the Date Utility dimension is shown below.

[Figure: Business Information Model of the Date Utility dimension]

The last modification is to define MDX script calculations that return cell values for the Aggregate Month, Aggregate Quarter, and Aggregate Year attributes. These additional calculations are placed between the script defining the Aggregation members and the script defining the Comparison members; this placement allows an aggregated comparison to also be decomposed into its constituent time periods.

The entire solution (at least with the Aggregate Month calculations defined) is available from my SkyDrive account. To describe the approach, consider the subcube defined by the [Date Utility].[Aggregation].[Last 12 Months] member and the [Date Utility].[Aggregate Month].[Aggregate Month] level, shown in the code below. A non-NULL value is returned only when a single member of the [Date].[By Calendar].[Month] level is included in the current context. If that check passes, an Intersect function call determines whether the current member of the [Date Utility].[Aggregate Month] attribute falls within the last 12 months of [Date].[By Calendar].CurrentMember. If so, a tuple is returned that assigns the equivalent of LinkMember( [Date Utility].[Aggregate Month].CurrentMember, [Date].[Month] ); otherwise, NULL is returned.

SCOPE( 
    [Date Utility].[Aggregation].[Last 12 Months], 
    [Date Utility].[Aggregate Month].[Aggregate Month] 
);
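    // Assign a value only when exactly one [Date].[By Calendar] Month
    // member is in the current context.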
    THIS = IIf(
        COUNT(EXISTING { [Date].[By Calendar].[Month].Members } ) = 1,
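        // Check whether the current Aggregate Month falls within the
        // trailing 12-month window ending at the current calendar month.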
        IIf(
            Intersect(
                { [Date Utility].[Aggregate Month].CurrentMember },
                { 
                    LinkMember( 
                        [Date].[By Calendar].CurrentMember, 
                        [Date Utility].[Aggregate Month] 
                    ).Lag(11)
                    :
                    LinkMember( 
                        [Date].[By Calendar].CurrentMember, 
                        [Date Utility].[Aggregate Month] 
                    )
                }
            ).Count > 0,
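            // In the window: return the value of the corresponding real
            // month via LinkMember, resetting the utility attributes.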
            (
                LinkMember( 
                    [Date Utility].[Aggregate Month].CurrentMember, 
                    [Date].[Month] 
                ),
                [Date Utility].[Aggregation].DefaultMember,
                [Date Utility].[Aggregate Month].DefaultMember
            ),
            NULL
        ),
        NULL
    );
END SCOPE;

Lastly, to demonstrate its application, consider the following MDX statement, which includes a single member of the [Date].[By Calendar] hierarchy and a single member of the [Date Utility].[Aggregation] attribute in the WHERE clause.

SELECT    
    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    NON EMPTY
    {
        [Date Utility].[Aggregate Month].[Aggregate Month].Members 
    } ON ROWS
FROM    
    [Adventure Works]
WHERE
    (
        [Date].[By Calendar].[May, 2008],
        [Date Utility].[Aggregation].[Last 12 Months]
    )    

Including the [Date Utility].[Aggregate Month] attribute on the ROWS axis returns the constituent members of the [Last 12 Months] aggregation, as shown below.

Aggregate Month Internet Sales Amount
Jun, 2007 $554,799.23
Jul, 2007 $886,668.84
Aug, 2007 $847,413.51
Sep, 2007 $1,010,258.13
Oct, 2007 $1,080,449.58
Nov, 2007 $1,196,981.11
Dec, 2007 $1,731,787.77
Jan, 2008 $1,340,244.95
Feb, 2008 $1,462,479.83
Mar, 2008 $1,480,905.18
Apr, 2008 $1,608,750.53
May, 2008 $1,878,317.51

While this query doesn't really illustrate why you would include an Aggregate Month enhancement, a future post will use the enhanced Date Utility dimension to show how the Stephen Few data visualization, including an Indicator, a Bullet Graph, and a Sparkline, can be generated using Report Builder.

Categories: SSAS