Modeling Several Boolean Attributes Using a Junk Dimension

October 26, 2012

One of the discussions that I recently had involved modeling a business process associated with several Boolean attributes. The current model addressed each of these attributes as a separate dimension. While there is nothing necessarily wrong with this approach, when reviewing a model, I always try to think the way Ralph Kimball and colleagues at the Kimball Group would think. If you’re not modeling the way that they would model that process, you probably should re-evaluate and redesign.

So how would the Kimball Group model this scenario? The answer is most likely with a Junk dimension, since a Junk dimension consolidates several low-cardinality attributes into a single useful dimension. Since each of these attributes has only two possible values, Yes or No, this scenario seems like a very likely candidate for a Junk dimension!

There are several advantages to modeling this scenario with a Junk dimension. Below are three of them (and you're always free to disagree):

  1. The fact table contains a single key column rather than a whole bunch of columns to represent these flags. More on that later.
  2. Greater flexibility on defining the labels of each attribute, especially when utilized with a view layer abstraction between the source and the user presentation layer, and
  3. Easier to enhance the cube should additional Boolean attributes be identified later in the project lifecycle.

Defining the Junk Dimension

While in most cases the key for a dimension is a meaningless number assigned by an Identity or Sequence, in the case of a Junk dimension consolidating several Boolean attributes, a better approach is to calculate this key. For instance, if I've consolidated four Boolean attributes, I can calculate the key by multiplying the value of each Boolean by a corresponding power of 2, either 2^0, 2^1, 2^2, or 2^3, depending on the identity of the attribute. A T-SQL statement that defines a table consolidating four Boolean attributes is given below.

WITH cteFlags AS (
    SELECT    0 AS Value
    UNION ALL
    SELECT    1
)
SELECT    
    (
        Power(2, 3.0) * Flag3.Value 
     +  Power(2, 2.0) * Flag2.Value 
     +  Power(2, 1.0) * Flag1.Value 
     +  Flag0.Value
    ) AS ID, 
    Flag0.Value,
    Flag1.Value,
    Flag2.Value,
    Flag3.Value
FROM 
    cteFlags Flag0
CROSS JOIN 
    cteFlags Flag1
CROSS JOIN 
    cteFlags Flag2
CROSS JOIN 
    cteFlags Flag3
ORDER BY
    ID

The ETL to load fact tables is now slightly more complex. Rather than each flag being represented as a separate column, a single column is used. But since the surrogate key for the Junk dimension is not incrementally assigned but coded based on business domain values, it can be calculated rather than looked up based on the individual column values.
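
As a minimal sketch, the fact load query might compute the key inline, assuming a staging table stg.FactSource with 0/1 integer flag columns Flag0 through Flag3 (all object names here are hypothetical, not from the original model).

SELECT    
    s.OrderKey,
    s.SalesAmount,
    (
        Power(2, 3) * s.Flag3 
     +  Power(2, 2) * s.Flag2 
     +  Power(2, 1) * s.Flag1 
     +  s.Flag0
    ) AS JunkDimensionKey    -- computed directly from the flags; no dimension lookup required
FROM 
    stg.FactSource s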

Greater Flexibility

One of the problems with inclusion of each Boolean attribute as a separate dimension is that when modeled as a role-playing dimension in a multidimensional database, you’ll end up with a whole bunch of Yes and No or True and False entities. When querying this model, it’s not too difficult to get completely lost in the meaning of each of those values. Using a Junk dimension, one can easily manipulate the labels or names applied to each of those entities. So the True flag applied to the Base Fee Schedule Overridden? attribute can be changed from Yes to Base Fee Schedule Not Used or some other description more in line with the business domain.

Easier to Enhance

Another advantage is the ease with which additional Boolean attributes can be added to an existing data mart. The values for these attributes are likely to be False for the vast majority of cases, as these attributes are generally used to identify exceptions. Therefore, by simply adding another column to your Junk dimension with False as its default value, you've already accurately included the additional attribute for the vast majority of fact records without even touching the fact table. How Agile is that?
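
As a minimal sketch, assuming the Junk dimension table is named dbo.DimOrderFlags (a hypothetical name), the change is a single ALTER TABLE; existing rows pick up the default of 0 (False).

ALTER TABLE dbo.DimOrderFlags
    ADD NewAttributeFlag bit NOT NULL
        CONSTRAINT DF_DimOrderFlags_NewAttributeFlag DEFAULT (0)   -- existing rows default to False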

To complete inclusion of the exceptions, there are two steps that need to be executed. First, new rows need to be added to the Junk dimension. These are simply the existing combinations extended with a True value for the additional attribute. To illustrate using the previous case, the query defining the new rows for the Junk dimension created above would simply be the following:

WITH cteFlags AS (
    SELECT    0 AS Value
    UNION ALL
    SELECT    1
)
SELECT    
    (
        Power(2, 4.0) * New_Attribute.Value
     +  Power(2, 3.0) * Flag3.Value 
     +  Power(2, 2.0) * Flag2.Value 
     +  Power(2, 1.0) * Flag1.Value 
     +  Flag0.Value
    ) AS ID, 
    Flag0.Value,
    Flag1.Value,
    Flag2.Value,
    Flag3.Value,
    New_Attribute.Value AS Flag4
FROM 
    cteFlags Flag0
CROSS JOIN 
    cteFlags Flag1
CROSS JOIN 
    cteFlags Flag2
CROSS JOIN 
    cteFlags Flag3
CROSS JOIN 
    cteFlags New_Attribute
WHERE    
    New_Attribute.Value = 1
ORDER BY
    ID

With the new dimension records added, the second step is to update the fact records identified as "exceptions". Again, as opposed to a complete reload of the fact table to include the additional attribute, the Junk dimension key value for these exceptions can be updated in place. One way to complete this task is to load a temporary staging table that has the identifying fact fields for all records with a true New_Attribute value. By joining to this table and applying the update JunkDimensionKey = JunkDimensionKey + 2^4 (i.e., + 16) to the corresponding records, you've now successfully included a new Boolean attribute in the model. If there are many records where the New_Attribute value is true and your fact table is very large, your best approach would likely be to use a technique that I described in a previous blog post nearly two years ago.
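
As a minimal sketch, assuming a fact table dbo.FactSales and a staging table stg.NewAttributeExceptions that holds the business keys of the exception rows (all object names here are hypothetical), the update might look like the following.

UPDATE    f
SET       f.JunkDimensionKey = f.JunkDimensionKey + 16    -- 16 = 2^4, the bit assigned to the new attribute
FROM      dbo.FactSales f
INNER JOIN stg.NewAttributeExceptions e
          ON e.SalesOrderNumber = f.SalesOrderNumber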

Categories: Data Warehousing

SSRS–Creating the Stephen Few Data Visualization

October 23, 2012

In my last blog post, I described an approach where the DateTool dimension of Marco Russo was extended to include an Aggregate Month attribute. When referenced in an MDX query in combination with a member of the Aggregation attribute, the Aggregate Month attribute produces the time periods that compose that aggregation. Enhancing your SSAS multidimensional cube in this manner provides functionality that is similar to the Time Intelligence functionality available in PerformancePoint Services, sort of a way of providing dynamic, date-specific named sets. I feel they're advantageous for several reasons, but I'm a bit on the biased side. For one thing, these "dynamic time-based sets", such as the last twelve months prior to the selected month, will be available in any front-end and not just PPS. And as referenced in this SSAS forum post, this approach was successfully used to provide a "Date Sets Alternative" in Tableau.

The original purpose of the enhancement was to facilitate creation of the Stephen Few sales performance dashboard data visualization from Information Dashboard Design. The goal was to make creation of the visualization simple enough that it could be created in Report Builder. Using the modified Adventure Works source, the end result looks something like the following.

[Figure: the Stephen Few-style sales performance dashboard rendered in SSRS]

In the visualization above, the Year-Over-Year YTD Reseller Sales Growth is shown by Country. For this demonstration, a targeted YTD YoY growth of 15% is expected. Any YoY growth rate of 85% or below is considered Poor, while any value between 85% and 115% is considered Satisfactory, as indicated by the linear ranges of the bullet graph shown in the fourth column of the visualization.

To create this visualization, the following steps need to be executed.

  1. Create a single dataset to source all elements,
  2. Create a Tablix control bound to the dataset with a Grouping on Country,
  3. Add Sparkline, Bullet Graph, and Indicator controls, and
  4. Add the legend and scale adornments to the visualization.

Creating the Dataset

Having made the modifications to the Adventure Works cube described in the previous blog post, creating a single dataset to source all data visualization components is simplified greatly. For each Country shown in the diagram above, the dataset produced will need to include the following data elements.

  1. Reseller Sales Amount by month for the Past 12 Months,
  2. YTD Reseller Sales (horizontal bar in bullet graph above, fifth column, and Indicator value),
  3. Prior Year YTD Sales to calculate target value (vertical line in bullet graph above), and
  4. Columns that represent the boundary values for the linear ranges defining the bullet graph.

The challenge in creating such a dataset is that the date context and date aggregation are at different grains within the same query. To create the dataset, first define a data source to the Adventure Works cube. Next, create a new dataset and, in the Query Designer window, drag the [Date].[Month] and [Date Utility].[Aggregation] attribute hierarchies to the Filter Pane. Check the Parameters checkbox for the [Date].[Month] attribute so that consumers can dynamically change the filter context, and specify [Last 12 Months] as the filter for the [Date Utility].[Aggregation] attribute. [I also added a Filter to limit Countries to Canada, France, the United Kingdom, and the United States only because the Adventure Works cube is pretty sparse.] The initial dataset was defined by dragging the [Geography].[Country], [Date Utility].[Aggregate Month], and [Measures].[Reseller Sales Amount] fields to the Data Pane. Now you have a query that produces the Reseller Sales Amount by Country for the Last 12 Months relative to whatever month the user selects when running the report. How easy was that?

To complete the query, the Actual, Goal, and boundary values for each of the linear ranges need to be added to the query. Since the Aggregation context differs from the [Last 12 Months] created as a filter, calculated members will be created to overwrite this context. These calculated members are created in the Calculated Members Pane of the designer and have the names and definitions as given in the following table.

[Table: the names and definitions of the calculated members for the Actual, Goal, and linear range boundary values]

Two additional points should be noted that were left out of the Definition column above. As the Value and Goal measures overwrite the current context of the [Date Utility].[Aggregate Month] attribute, these metrics were wrapped in an IIf function so that a value was not repeated for every member of the [Date Utility].[Aggregate Month] hierarchy. (i.e. IIf(([Measures].[Reseller Sales Amount], [Date Utility].[Aggregate Month].CurrentMember) = 0, NULL, …..) ) Additionally, since the constants as defined will also return a value for every member of the [Date Utility].[Aggregate Month] hierarchy, these constants were multiplied by [Measures].[Reseller Sales Growth Value]/[Measures].[Reseller Sales Growth Value] so that NULLs were returned for months outside of the Last 12 Months range.

Defining the Tablix Control

Adding the Tablix control is the simplest step in the process. Insert a Table control into the report and set its DataSetName property equal to the dataset created above. Add two additional columns to the table, create a grouping on Country (deleting the row group column that gets created without deleting the group itself), and delete the Details section from the report. Bind the third column to the Country field and set the fifth column to Max([Reseller Sales Growth Value]). The last little bit of formatting involved setting the BorderStyle to None for all textboxes except for the bottom border of the column headers. That's it. The framework for the data visualization is now defined, and now the more difficult process of defining the data visualizations begins.

Creating the Visualizations

Insert a Sparkline in the first column of the table. Set the Category Groups to Aggregate Month and the Values to the Reseller Sales Amount. That’s it for that.

Defining the Indicator is not much more difficult. The only thing to watch out for is that the expressions for Values and States should use the Max, Min, or Avg aggregations of Reseller Sales Growth Value and Reseller Sales Growth Goal rather than the default value of Sum. Once completed, four out of five columns are now there. The only one left is the bullet graph.

Rather than detail how the bullet graph was formatted, I'll note that most of the steps I followed are described in detail in Tim Kent's post from 2009, without all the jiggery-pokery that was required in previous versions of SSRS. The linear ranges were defined using the Max value of the dataset columns explicitly created for that purpose. The linear ranges were given a start and end width of 75%, while the solid fill colors for the Poor, Satisfactory, and Good ranges were set to Gray, Silver, and WhiteSmoke respectively, providing the contrast necessary to distinguish the ranges in the figure shown previously. That's it for the visualizations.

Completing the Data Visualization

To complete the process, legends and scales were added to the Tablix. The legend is the simpler task. The rightmost report header columns were merged and a rectangle control was inserted to allow free-form entry into that cell. Three additional small rectangles and three textboxes were added to identify the Poor, Satisfactory, and Good linear ranges. The BackgroundColor property for each of the rectangles was set to the appropriate color as identified above.

The scale was added by inserting a Country group summary row and adding another bullet graph to the fourth column of that row. The width properties for all linear ranges and pointers were set to zero so that these elements of the Gauge control were effectively not displayed. Trial and error was used to determine the optimal Scale properties. The most likely change that will be necessary is to increase the Font size of the Scale to an absurdly large amount. For the visualization shown above, the Font size ended up being 48pt.

The completed report is available on my Skydrive account.

Summary

The greatest challenge in replicating the Stephen Few data visualization with SSRS is the creation of a single dataset from which all visualizations can be sourced. By enhancing the DateTool dimension of Marco Russo to include an additional Aggregate Month attribute, the enhanced Date Utility dimension discussed in the previous blog post greatly simplifies the necessary steps.

Categories: SSRS

Enhancing the DateTool Dimension

October 8, 2012

I had this issue in the past and addressed it in a very inelegant manner involving a many-to-many relationship that I won’t reproduce here. For all you poor souls that inherited that approach, I sincerely apologize. I was young (not really) and inexperienced (also, not really) and knew when I implemented it, it was not the right way (definitely true). However, the users liked it, needed it, and once deployed, it was really difficult to retract.

The scenario that I'm addressing here has its roots in that same place. Say that I want to reproduce the Sales Performance Dashboard data visualization that Stephen Few presents in the last chapter of his book, Information Dashboard Design, within a SQL Server Reporting Services Tablix control. An example of this data visualization can be found here. The visualization includes a Sparkline, an Indicator, and a Bullet Graph for various financial performance metrics. The difficulty in reproducing this visualization in SSRS is that all data used to generate these visualizations must be included in the same dataset. While the Indicator and Bullet Graph are populated by Year-to-Date values, the Sparkline is sourced by current month values over the last twelve months. Therefore, two different date aggregation contexts, Year-to-Date and Last 12 Months, and two different date contexts, the selected month for the Bullet Graph and Indicator and the component months for the Sparkline, need to be included within the dataset sourcing the visualization.

To produce a dataset that could be used to generate this data visualization, the DateTool dimension described by Marco Russo in this blog post was enhanced. The dimension will be referred to as the [Date Utility] dimension in the remainder of this blog post. These enhancements included adding new utility attributes to represent the component members involved in a date aggregation. To illustrate using the Adventure Works cube, if [Date].[Calendar].[May, 2008] and [Date Utility].[Aggregation].[Last 12 Months] are specified in the slicer, including the newly added [Date Utility].[Aggregate Month] attribute on the row axis would generate a set equivalent to [Date].[Calendar].[Jun, 2007] : [Date].[Calendar].[May, 2008]. This set represents the time periods that compose the ([Date].[Calendar].[May, 2008], [Date Utility].[Aggregation].[Last 12 Months]) calculated cell within the current cube context.

The first modification to Russo's Date Utility dimension was to simply Cross Join the [DateTool].[Aggregation] and [DateTool].[Comparison] views with another view that includes each possible [Date].[Calendar].[Month] member. (For this case, detail at the granularity of the Day level was not needed.) Even though the approach involves cross joining three views, each has a relatively small number of possibilities. The definitions of the [DateTool].[CalendarMonths] and [DateTool].[DateTool] views are given below.

CREATE VIEW [DateTool].[CalendarMonths] AS
	SELECT	
		dt.MonthKey,
		dt.MonthDesc,
		dt.MonthStartDate,
		dt.QuarterKey,
		dt.QuarterDesc,
		dt.QuarterStartDate,
		dt.SemesterKey,
		dt.SemesterDesc,
		dt.SemesterStartDate,
		dt.Year
	FROM	DateTool.DimDate dt
	WHERE	dt.DateValue		= dt.MonthStartDate 
GO
CREATE VIEW [DateTool].[DateTool] AS
	SELECT	
		a.ID_Aggregation, 
		s.ID_Comparison,
		m.MonthKey, 
		(
			CAST( ID_Comparison AS VARCHAR ) + ' - ' 
			+ CAST( ID_Aggregation AS VARCHAR ) 
			+ ' - ' + CAST(m.MonthKey AS varchar)  
		) AS Description
	FROM	DateTool.DateAggregation a
	CROSS JOIN DateTool.DateComparison s
	CROSS JOIN DateTool.CalendarMonths m
GO

The second modification to Russo's Date Utility dimension was to add an Aggregate Year –> Aggregate Quarter –> Aggregate Month natural hierarchy to the [Date Utility] dimension. A Business Information Model of the Date Utility dimension is shown below.

[Figure: Business Information Model of the Date Utility dimension]

The last modification is to define MDX script calculations to return cell values for the Aggregate Month, Aggregate Quarter, and Aggregate Year attributes. These additional calculations are placed between the script defining the Aggregation members and the script defining the Comparison members. This placement allows an aggregated comparison to also be decomposed into its constituent time periods.

The entire solution (at least with the Aggregate Month calculations defined) is available from my Skydrive account. To describe the approach, consider the subcube defined by the [Date Utility].[Aggregation].[Last 12 Months] member and the [Date Utility].[Aggregate Month].[Aggregate Month] level, as shown in the code below. Only when a single member of the [Date].[By Calendar].[Month] level is included in the current context will a non-NULL value be returned. If that check passes, the next check is to determine whether the current member of the [Date Utility].[Aggregate Month] attribute is within the last 12 months of [Date].[By Calendar].CurrentMember, using an Intersect function call. If so, a tuple is returned that assigns the equivalent of LinkMember( [Date Utility].[Aggregate Month].CurrentMember, [Date].[Month] ); otherwise, a NULL value is returned.

SCOPE( 
    [Date Utility].[Aggregation].[Last 12 Months], 
    [Date Utility].[Aggregate Month].[Aggregate Month] 
);
    THIS = IIf(
        COUNT(EXISTING { [Date].[By Calendar].[Month].Members } ) = 1,
        IIf(
            Intersect(
                { [Date Utility].[Aggregate Month].CurrentMember },
                { 
                    LinkMember( 
                        [Date].[By Calendar].CurrentMember, 
                        [Date Utility].[Aggregate Month] 
                    ).Lag(11)
                    :
                    LinkMember( 
                        [Date].[By Calendar].CurrentMember, 
                        [Date Utility].[Aggregate Month] 
                    )
                }
            ).Count > 0,
            (
                LinkMember( 
                    [Date Utility].[Aggregate Month].CurrentMember, 
                    [Date].[Month] 
                ),
                [Date Utility].[Aggregation].DefaultMember,
                [Date Utility].[Aggregate Month].DefaultMember
            ),
            NULL
        ),
        NULL
    );
END SCOPE;

Lastly, to demonstrate its application, consider the following MDX statement which includes a single member of the [Date].[By Calendar] hierarchy and a single member of the [Date Utility].[Aggregation] attribute in the WHERE clause.

SELECT    
    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    NON EMPTY
    {
        [Date Utility].[Aggregate Month].[Aggregate Month].Members 
    } ON ROWS
FROM    
    [Adventure Works]
WHERE
    (
        [Date].[By Calendar].[May, 2008],
        [Date Utility].[Aggregation].[Last 12 Months]
    )    

Including the [Date Utility].[Aggregate Month] attribute on the ROWS axis returns the constituent members of the [Date Utility].[Aggregation] member, as shown below.

Aggregate Month    Internet Sales Amount
Jun, 2007          $554,799.23
Jul, 2007          $886,668.84
Aug, 2007          $847,413.51
Sep, 2007          $1,010,258.13
Oct, 2007          $1,080,449.58
Nov, 2007          $1,196,981.11
Dec, 2007          $1,731,787.77
Jan, 2008          $1,340,244.95
Feb, 2008          $1,462,479.83
Mar, 2008          $1,480,905.18
Apr, 2008          $1,608,750.53
May, 2008          $1,878,317.51

While this query doesn't really illustrate why you would include an Aggregate Month enhancement, in a future post the enhanced Date Utility dimension will be used to show how the Stephen Few data visualization, including an Indicator, a Bullet Graph, and a Sparkline, can be generated using Report Builder.

Categories: SSAS

Excel 2010, Connection files on the Network, SharePoint Data Connection Libraries, and the Enterprise

June 17, 2012

As shown below, in the Workbook Connections dialog of Excel 2010, when a user attempts to add a new connection, a list of existing Office Data Connection (.odc) files is presented. One of the categories in this list is Connection files on the Network. I searched, and searched, and searched the internet for a way to get SharePoint Data Connection Libraries (DCLs) listed in this section, and the only "workable" solution that I came to was David Lean's blog post of nearly two years ago. To get a DCL visible on the client, David had to hack the registry of the client computer even after specifying the URL of the DCL in the Publish Links to Office Client Applications functionality of a User Profile Service Application (UPSA). Even David says in that post that this seems insane and that he hadn't thoroughly researched it. Over a year and a half later, still nothing new had been published, at least that I could find. Wanting to know why, I figured that if I stopped the UPSA this would probably still work, and what do you know, I disabled the UPSA and was still able to access my DCL directly through Excel. Being the forever academician, I had to know what was happening and why it worked the way it did. This blog post is the fruits and spoils of that investigation.

[Figure: the Existing Connections dialog in Excel 2010, showing the Connection files on the Network category]

His solution, though "workable", is actually a hybrid of two different ways the ultimate objective can be achieved. One approach is to use Group Policy exclusively to push registry entries out directly to the client, as David explains in his blog post. This approach is the purest way to implement and is applicable if you'll only have a very small number of predefined DCLs and Document Libraries that you want to publish. This method requires that your domain admin set up the entries that define each published link. The other way is to use the Publish links to Office Client Applications functionality of a UPSA. A UPSA is an absolute pain in the $#*$ to install. How bad? Well, when SQL Server 2012 RC3 was released last summer, there was a tweet of Chris Webb's where he stated, "Installing SQL Server 2012 and SharePoint 2010 is a lot like brain surgery; one mistake and you're screwed." Having now installed SQL Server 2012 several times as well as installing the UPSA several times, I can say the SharePoint Reporting Services Service Application installation process is an absolute breeze compared to the UPSA installation process; more on that later.

Why?

Before describing the process, I thought I'd briefly digress and discuss why publication of a DCL is important in an enterprise reporting architecture built on the Microsoft Business Intelligence suite. And since my career pretty much revolves around SSAS, the reasons described here will be almost exclusively SSAS centric. There are several reasons besides the one already mentioned in David Lean's blog post. First, connections to SSAS databases can be created and managed in a central location without users having any knowledge of the connection details. It'll just work, without non-technical personnel having to know how, why, or where. Second, when using Excel Services in a reporting architecture against SSAS, the connection details must be stored in a trusted site in your SharePoint farm. Religiously using connections stored in DCLs published in the Connection files on the Network section will minimize Excel Services issues. And lastly, when using the UPSA Publish Links to Office Client Applications feature, creation of new connections and the publication of those new connections to the appropriate audiences is greatly facilitated.

So now knowing why, how? I'll describe the steps that I used to accomplish this, beginning with the UPSA approach, from soup to nuts. The end result is that registry entries are created on the client machine. Those registry entries, the same ones mentioned in David Lean's blog post, are what enable Office client applications to see published DCLs and Document Libraries. The only difference between the two approaches is how those registry entries are created.

The User Profile Service Application Approach

Creating the User Profile Service Application

I'm not going to present an exhaustive step-by-step guideline to installing the UPSA; Spencer Harbar has already published the definitive guide for that. I can't stress enough that to get a UPSA working you should follow that guide step-by-step and not skip a single step. I'm only including this section to document some of the issues I ran into when creating a UPSA with SP1.

  • First issue – When attempting to Manage the UPSA for the first time, I was greeted with a nasty, nondescript error. Nothing was logged to the Event log. Turns out that a Windows update had been applied that broke the UPSA when SharePoint SP1 was applied. To resolve the issue, the patch related to KB2560890 had to be reinstalled. More can be read on that issue here.
  • Second issue – When attempting to create an Active Directory connection, I received an “Unable to process Create message” error. Turns out this was also another SP1 issue. To resolve the issue, I had to install the August 2011 Cumulative Update. Turns out the cumulative update would have also rectified the first issue, but I didn’t know that at the time.

Once the August 2011 Cumulative Update was applied, I finally had a working UPSA connected to my AD. To add a published link, I followed the approach described on Technet. Since I was testing with the SharePoint Business Intelligence Center site template, I added a link to the http://testsite:15000/Data Connections for PerformancePoint location. That should complete all the SharePoint server configuration that was needed. [I was using a different port for my My Sites site collection, hadn’t created a host header, nor added DNS entries because frankly, I was just testing this functionality out. In a real environment, you would want to do it the right way.]

There is one other task that I performed that I cannot confirm was needed. One person states that for the push of published links to client applications to function correctly, the Desktop Experience feature has to be installed on the SharePoint server. I'm not sure if that's the case, but in my situation I had this feature installed.

Setting the Default My Site Location

Now, if you're like me, you thought you were done. If I just launch Excel 2010 from my client machine, I should now see my published links in the Save and Send menu. However, it didn't quite work that way. What I discovered, mainly from reading how this functionality worked with SharePoint 2007 and Office 2007, was that I needed to open a browser and navigate to my SharePoint site, http://testsite. I then needed to select the My Site menu option. When I clicked on the link corresponding to my logon id, what I was expecting to see, based upon the SharePoint 2010 documentation, was an option to specify this My Site as my default My Site. However, that functionality seemed to disappear with Office 2010. Unfortunately, that's a problem, because in order for the Publish links to Office client applications feature to function, a critical registry entry is needed, and the Set as Default My Site option is the exact option that creates that entry. Back to the internet and the resolution of the Third issue.

  • Third issue – You'll need to hack the registry for the Set as Default My Site functionality to show up with Office 2010. Paul LieBrand has described the correction for this in his blog post: http://paulliebrand.com/2011/09/27/set-as-default-my-site-sharepoint-sites-or-my-sharepoint-sites-missing-in-office-2010-on-sharepoint-2007/ I was trying to avoid a solution where I'd have to hack the registry on each client machine, but we'll get to a fix for that later. There's a far better way to accomplish what the Set as Default My Site functionality does anyway, using Group Policy. Once the registry corrections are applied, when you navigate to your My Site location and click on the My Content menu option, a modal dialog box will be displayed asking you if you wish to set this location as your default My Site location. Once this location is set, the dialog box is not shown again. What does this step actually do? All that has been done thus far is that a registry entry, PersonalSiteURL, has been set under the HKCU\Software\AppDataLow\Microsoft\Office\14.0\Common\Portal registry key as shown below:

[Figure: the PersonalSiteURL registry entry under HKCU\Software\AppDataLow\Microsoft\Office\14.0\Common\Portal]

    So unless you can get everyone that needs to use the DCL to create their My Site location, click on the My Content menu option, and set their default My Site location, you’re going to be fielding complaints.

Publish Links

So now, when I launch Excel and choose the option to add an External Data Connection, I'm expecting to see my published DCLs shown in the Connection files on the Network section of my Existing Connections dialog. The first time I click on the Add button of the Workbook Connections dialog box, I see nothing. Why isn't it working? Actually, it did just work, but it needed me to click on the Add button for it to be kick-started. When I clicked on the Add button, a couple of registry entries were created. First, a LinkPublishingTimeStamp registry entry was created in the HKCU\Software\AppDataLow\Microsoft\Office\14.0\Common\Portal registry key, the same key where my PersonalSiteURL registry entry was created. This registry entry records the time the SharePoint UPSA was last checked for published links. Since the list of published links displayed in Existing Connections and other dialogs should be relatively static, and to reduce the amount of network traffic created by this feature, Office will not query the UPSA again until 24 hours past this recorded timestamp value rather than querying it every time.

Along with the LinkPublishingTimeStamp registry entry, a whole slew of registry keys are created, one registry key for each link defined in the Publish Links to Office Client Applications feature in Central Administration. These links are stored underneath the HKCU\Software\Microsoft\Office\14.0\Common\Server Links\Published\My Site registry key as shown below.

[Figure: the published link registry keys under HKCU\Software\Microsoft\Office\14.0\Common\Server Links\Published\My Site]

These registry keys are ultimately the original goal that we set out to achieve. When I attempt to add a new connection to my Excel workbook, I see a new option, Sharepoint Connections (i.e., the name of the registry key in the picture above), in the Connection files on the Network section of my Existing Connections dialog, as shown below.

[Figure: the Sharepoint Connections folder appearing in the Connection files on the Network section of the Existing Connections dialog]

A folder will be shown for each key listed under the HKCU\Software\Microsoft\Office\14.0\Common\Server Links\Published\My Site registry key that has a LinkType QWORD registry entry with a hexadecimal value of 4000000. This LinkType value identifies the published link as a Data Connection Library. Contrary to what some blog posts have implied, the IsMember and IsPublished registry entries are not optional; they must also be present, with values of 0 and 1 respectively, as shown in the figure above.

Once these registry entries exist for a user on a given computer, the shared DCL corresponding to the registry key will be listed whenever the Existing Connections dialog box is displayed. So this seems like a lot of work, with a lot of dependencies and a lot of things that could go wrong, just to display a shared DCL. It seems like there may be a better way in simple deployments where only a single DCL is needed. That better way is to create the corresponding registry entries through Group Policy. Now to discuss that approach.

Group Policy

Set Default My Site

Before discussing how a shared DCL might be pushed out to clients using a GPO, I thought I would discuss a use of a GPO that is applicable even if the UPSA approach is used. Rather than require all users to create their own My Site and set that location as the default My Site on any computer that they might use, populating the HKCU\Software\AppDataLow\Microsoft\Office\14.0\Common\Portal registry key using a GPO is a far superior approach. Plus, one of the advantages of using a GPO in this case is that the registry problems discussed above don't have to be addressed on each and every client machine.

Since I don't claim to be much of an expert in domain administration, creating a GPO was unfamiliar territory for me. But then again, SharePoint configuration is not exactly my forte either. Luckily for me, creating a GPO is really pretty easy. The only drawback is that you'll have to have domain admin rights in order to create a GPO. But since I'm using a virtual domain running on my laptop, no problem.

To create a GPO, you'll need to launch Group Policy Management from the Administrative Tools folder, or you can type GPMC.MSC at a command prompt. Expand your forest and domain nodes until the Group Policy Object folder is displayed and then perform the following operations:

  1. Right click on the Group Policy Object folder and select the New option from the popup menu. Enter Personal Site as the name for the GPO and leave the Source Starter GPO set to (none).
  2. Right click on the newly created Personal Site GPO and select Edit from the popup menu.
  3. Expand the User Configuration, Preferences, and Windows Settings nodes in the tree on the left hand side to expose the Registry entry.
  4. Right-click on the Registry node and select the New|Registry Entry from the popup menu.
  5. Change the Action property to Create.
  6. Leave the Hive value set on HKEY_CURRENT_USER and enter Software\AppDataLow\Microsoft\Office\14.0\Common\Portal as the Key Path.
  7. For the Value name, enter PersonalSiteURL.
  8. Leave the Value type as the default of REG_SZ and enter http://testsite:15000/personal/%UserName%/ as the value data parameter. The dialog form should look like the screen shown below.
  9. Since we’re referencing the %UserName% variable to retrieve the current user id, select the Common tab and make sure the Run in logged-on user’s security context (user policy option) is selected.
  10. Close out of the Group Policy editor.

[Figure: the New Registry Properties dialog for the PersonalSiteURL entry]

Now all that’s left to do is to link the GPO you just created to an organizational unit in your AD and this registry entry is generated without a user having to create a personal site.

Publishing a DCL through a GPO

If it’s this easy to push a registry entry out to an AD organizational unit using a GPO, and ultimately, the solution we’re striving for involves just creating these registry entries, why not just publish your DCL and document libraries through GPO? If you’ve got a small number of predefined links and your domain administrator is comfortable with defining the GPO and managing the targeted audiences, GPO will be a better implementation choice. Much less can go wrong with the GPO approach and it’s a much better user experience as all the user has to do to see the published links is to log in.

I would recommend creating a separate GPO for each shared link. Creating separate GPOs for each link will ease maintenance and allow for each published link to be targeted to the appropriate audience. DCL links and document library links are created identically and differ only in the value of the LinkType registry entry. Therefore, I’ll only present the steps necessary to publish a DCL using a GPO.

  1. Right click on the Group Policy Object folder and select the New option from the popup menu. Enter an identifier for the shared link, such as Enterprise Connections and leave the Source Starter GPO set to (none).
  2. Right click on the newly created GPO and select Edit from the popup menu.
  3. Expand the User Configuration, Preferences, and Windows Settings nodes in the tree on the left hand side to expose the Registry entry.
  4. Right-click on the Registry node and select the New|Registry Entry from the popup menu.
  5. Change the Action property to Create.
  6. Leave the Hive value set on HKEY_CURRENT_USER and enter Software\Microsoft\Office\14.0\Common\Server Links\Published\My Site\Sharepoint Connections as the Key Path.
  7. For the Value name, enter URL.
  8. Leave the Value type as the default of REG_SZ and enter the URL of your SharePoint DCL, i.e. http://testsite/Data Connections for PerformancePoint.
  9. Create three additional registry entries with the Name, Data Type, and Value given in the table below.

Name         Data Type  Value
IsMember     REG_DWORD  0
IsPublished  REG_DWORD  1
LinkType     REG_QWORD  DCL: 00 00 00 04 00 00 00 00
                        Document Library: 00 20 00 00 00 00 00 00

Once you’ve linked this GPO to an organizational unit, the correct targeted audience will now be using the same data connection files to define all their Excel Services analyses.

Conclusions

This blog post started with the simple goal of presenting a DCL in the Connection files on the Network section of the Existing Connections dialog in Excel on a client machine. While the solution wasn't necessarily complex, the available documentation is confusing, ambiguous, incomplete, and often erroneous; therefore, this blog post was written. What was found was that links listed in this section appear in the HKCU\Software\Microsoft\Office\14.0\Common\Server Links\Published\My Site registry key. Whether these registry entries appear as a result of the Publish links to Office client applications feature or are populated through a GPO is a choice that each organization will need to make. Below is my opinion on the pros and cons of each choice.

  • Group Policy Object
    • Pros
      • No additional configuration steps required of user. If users will access on different machines, the entries will be created when Group Policy is applied.
      • Easier implementation. No need to install and configure the User Profile Service Application and enable My Sites. (My Sites are not required if the default My Site registry key is set by GPO, though.)
      • Published links will appear even the first time a user attempts to add an external resource.
    • Cons
      • Entries must be created by someone with domain admin rights.
      • Creation of a published link more complex than defining through the user interface of a User Profile Service Application
  • User Profile Service Application
    • Pros
      • Published links created by Sharepoint farm admin rather than domain admin.
      • Definition of published link created through Central Administration
    • Cons
      • The default My Site registry entry must be populated for the push mechanism to work. (Probably best to populate this entry through GPO even if the UPSA approach is used.)
      • Entries will not be available the first time an external resource is added to a workbook.

So that summarizes my experience with this feature, and I'm not going to guarantee that everything stated is completely correct. If the default My Site registry entry is populated through a GPO, many of the problems with the UPSA approach are circumvented. Plus, that approach allows an enterprise to use the UPSA without requiring My Sites to be enabled. If given my choice, that is the approach that I would take.

Categories: Uncategorized

The SSAS Financial Cube–Part 1a–Ragged Hierarchies Cont.

March 3, 2012

In this post, I was supposed to discuss the ByAccount aggregation function. However, before changing focus, I thought I would write some more on Ragged Hierarchies.

In my last post, I discussed how ragged hierarchies can be handled in SSAS using two different approaches. One can model the dimension using a parent-child hierarchy, or one can flatten out a parent-child hierarchy, model each potential level as a separate attribute in that dimension, and hide missing levels in a user-navigable hierarchy using the HideMemberIf property. The advantage of the first approach is that it is more dynamic and easier to implement; the advantage of the second is that it is more performant. But what I left out of the last post are a few additional advantages of the second approach. Rather than move on this week to discuss the ByAccount aggregation function or currency conversion, I want to discuss a situation where you have to address multiple ragged hierarchies in the same dimension. How do you handle the situation where you have an Account dimension and two different, maybe only slightly different but yet still different, rollups? One case that I encountered in the past was with a downstream oil and gas company where product transportation costs were included in COGS in one rollup but, due to their unpredictable nature, were excluded from COGS in another. Essbase handled this situation beautifully as, in our case, the Accounts dimension was modeled as a dense dimension. Using Essbase's Shared Member functionality, a second rollup was created by simply defining that rollup in the Dimension Outline based on the leaf members used to define the primary account rollup.

Implementing multiple ragged hierarchies in the same dimension is a bit more difficult in SSAS than it was with Essbase. If you choose to model the ragged hierarchies as parent-child hierarchies, you'll encounter problems, as SSAS has a limitation that a dimension can have one and only one parent-child hierarchy. One can overcome that limitation by using the Multiple Parent/Child Hierarchies approach described by Marco Russo and Alberto Ferrari starting on page 70 of the often-referenced Many-to-Many Revolution white paper. However, there's a double performance penalty to be paid with that approach. Not only can you not build aggregations on the intermediate levels of the parent-child hierarchy, but the many-to-many relationship now has to be resolved whenever an account hierarchy is included in a query. If one sets the IsAggregatable property to False on an account hierarchy, performance can become unacceptable even with a relatively small cube.

The more performant approach is to flatten out each hierarchy into a separate physical table where the key for each table is the leaf-level account. A separate physical table is created for each hierarchy just to simplify maintenance, as a single procedure that flattens all hierarchies into one physical table becomes difficult to maintain. A view is then created to join all tables representing the distinct hierarchies into a single representation. Since each table is defined by a primary key on the leaf-level account, a view uniting these tables is easy to create; a sketch is shown below. A single SSAS dimension is created from that view where a distinct user-navigable hierarchy is created for each physically flattened table.
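
As a minimal sketch, assume two flattened tables, dbo.AccountHierarchyPrimary and dbo.AccountHierarchyAlternate, each keyed on the leaf-level AccountKey and flattened to three levels (all object names here are hypothetical).

CREATE VIEW dbo.vwDimAccount AS
	SELECT	
		p.AccountKey,
		p.AccountDesc,
		-- level columns sourcing the primary rollup's user-navigable hierarchy
		p.Level1Desc AS PrimaryLevel1,
		p.Level2Desc AS PrimaryLevel2,
		p.Level3Desc AS PrimaryLevel3,
		-- level columns sourcing the alternate rollup's user-navigable hierarchy
		a.Level1Desc AS AlternateLevel1,
		a.Level2Desc AS AlternateLevel2,
		a.Level3Desc AS AlternateLevel3
	FROM	dbo.AccountHierarchyPrimary p
	INNER JOIN dbo.AccountHierarchyAlternate a
		ON	a.AccountKey = p.AccountKey
GO

Each user-navigable hierarchy in the single SSAS dimension is then built from its own set of level columns in this view.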

Which solution works best for your application? Depends. If you need to respond dynamically to changing number of hierarchies or levels, an approach like that described in The Many-to-Many Revolution is probably going to be the best approach. If you need a solution that addresses aggressive performance requirements, flattening out each hierarchy and combining those hierarchies into a single dimension is going to be the better approach.

What’s been your experiences addressing multiple ragged hierarchies in a single dimension?

Categories: SSAS

The SSAS Financial Cube–Part 1–Ragged Hierarchies

February 26, 2012

It's been way too long since I've posted anything. As one of my new year's resolutions was to blog more and it's nearly March, I'm failing miserably at that goal. Not to make excuses, but it seems like at my day job all I do is write, and when I get home, writing is the last thing I want to do. So after what seems like hours of commuting, at home I've been trying to make up for a lack of SharePoint knowledge, a product I had been trying to avoid for nearly a decade but have been unable to avoid over the last few months. Maybe a post or two on that later, just to embarrass myself.

So instead of embarrassing myself with the little SharePoint knowledge that I've gained, I'm going to post on something that I know a bit about. I'm going to try to commit to writing a few posts around building a financial cube in SSAS. I'll try not to completely duplicate previous blogs, particularly those of Garrett Edmondson.

On my first BI engagement over a decade ago, I was involved in a multidimensional database product selection. The potential choices were Hyperion Essbase, Microsoft OLAP Services 7.0 (pre-2000), Cognos (whatever it was called at the time), and Oracle Express. The scope of the project was limited to the Profit & Loss accounts, so we did not need to handle complex per-account semi-additive measures, though I will discuss those in another post. We selected Essbase, and one of the principal reasons was that this product gracefully handled ragged hierarchies while the Microsoft offering did not. Well, since then, the Microsoft offering has matured quite a bit. There have been some bumps along the way, such as that annoying MDX Compatibility setting issue, but it seems the kinks have now ironed themselves out. Ragged hierarchies just seem to work now. However, there's still more than one way to implement them, and probably a better way.

What is a Ragged Hierarchy?

So first, what is a ragged hierarchy? MSDN actually has a great page defining a ragged hierarchy and how to work with one in SSAS. It may be the best page on the site! Succinctly, it's a hierarchy with a varying number of levels depending upon the navigation path taken. Usually, you'll run into ragged hierarchies with financially based dimensions, and particularly with an Accounts dimension. For Cost and Revenue related accounts, the navigation path may have many, many levels depending on the complexity of a business's value chain, while for Operational expenses, the navigation path may contain just two or three levels.

There are two ways to implement a ragged hierarchy: using a parent-child hierarchy, or flattening out that parent-child relationship and taking advantage of the HideMemberIf property in a user-navigable hierarchy. Which approach you use depends on the requirements of your BI solution.

Parent-Child Hierarchy

The most common approach to addressing a ragged hierarchy is with a Parent-Child dimension. In fact, using a parent-child hierarchy is exactly how the Account dimension is defined in the AdventureWorks database as shown below.

Amount                                                                                        USA Operations
Net Income        Operating Profit                Operating Expenses                          $27,661,868.50
                                                  Gross Margin                 Net Sales      $64,626,468.00
                                                                               Total Cost of Sales  $20,236,365.00
                                                  Gross Margin Total                          $44,390,103.00
                  Operating Profit Total                                                      $16,728,234.50
                  Other Income and Expense        Interest Income                             $100,687.00
                                                  Interest Expense                            $154,750.00
                                                  Gain/Loss on Sales of Asset                 ($123,795.00)
                                                  Other Income                                $13,250.50
                                                  Curr Xchg Gain/(Loss)                       $214,625.00
                  Other Income and Expense Total                                              $50,017.50
                  Taxes                                                                       $4,168,749.00
Net Income Total                                                                              $12,609,503.00

So that the raggedness of the hierarchy is more evident, the PivotTable was displayed in tabular form.

What are the pros and cons of a parent-child hierarchy? Ella Maschiach does an excellent job describing the pros in her blog post. It's also the approach that is the easiest to understand and the easiest to implement. However, there's a serious performance penalty that one pays for this simplicity. One of the primary reasons one uses SSAS in a BI solution is for aggregation management. Rather than constructing summary fact tables to address performance problems with the most commonly referenced dimensional attributes, aggregations in SSAS handle that for me. However, with a parent-child hierarchy, as mentioned in the Analysis Services Performance Guide, aggregations will only be created on the leaf level of the parent-child hierarchy and on the All level (and the latter only if the IsAggregatable property is not set to False). Therefore, with very large parent-child hierarchies that are used heavily in end user queries, there will be a tremendous performance penalty. If performance is at a premium, avoid parent-child hierarchies.

Naturalizing a Parent-Child Hierarchy

When your solution contains a parent-child hierarchy with many members, you'll want to naturalize the dimension. This approach flattens out the parent-child hierarchy so that a separate column is explicitly created for each potential level in the hierarchy. You may still want to model your dimension in the source using a recursive relationship because of its ease of maintenance: when creating a new member in the hierarchy, only the parent member has to be specified. Once a dimension has been flattened, creating a new member requires that all ancestors of that member also be specified on that dimension record.

From a recursive relationship, you've got a couple of choices. You can flatten the dimension out first by using a CTE and then build your SSAS dimension from that structure. Itzik Ben-Gan described an approach in a SQL Server Mag article where a recursive relationship was flattened out so that a record was created for each node in the hierarchy. With that table, you can build a dimension where each level of the hierarchy is a separate attribute of the dimension. The dimension structure, once flattened, would look something like the following.

[Figure: the flattened Account dimension structure]

and the Attribute Relationships table for that dimension would define a natural hierarchy from Account Key –> Account Level 3 –> Account Level 2 –> Account Level 1.

[Figure: the attribute relationships defining the Account Key –> Account Level 3 –> Account Level 2 –> Account Level 1 natural hierarchy]

Each level in this case is associated with a Row Number attribute that defines the natural order in which the hierarchy should be displayed.

When flattened in this manner, a ragged hierarchy is converted into a regular hierarchy where every navigation path has the same number of levels. For navigation paths where levels should be missing, the child name is repeated in that level as shown below.

Account Key  Account Desc        Level 3 Key  Level 3 Desc        Level 2 Key  Level 2 Desc      Level 1 Key  Level 1 Desc
1            Net Income          1            Net Income          1            Net Income        1            Net Income
2            Operating Profit    2            Operating Profit    2            Operating Profit  1            Net Income
3            Operating Expenses  3            Operating Expenses  2            Operating Profit  1            Net Income
4            Gross Margin        3            Operating Expenses  2            Operating Profit  1            Net Income
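
As a minimal sketch (not the exact technique from the SQL Server Mag article), a recursive CTE can produce this kind of flattened, repeated-member output, assuming a parent-child source table dbo.DimAccountSource with AccountKey, ParentAccountKey, and AccountDesc columns (all object names here are hypothetical).

WITH cteFlatten AS (
    -- anchor: root accounts occupy Level 1 and repeat themselves at the lower levels
    SELECT  a.AccountKey, a.AccountDesc,
            1 AS Depth,
            a.AccountKey AS Level1Key, a.AccountDesc AS Level1Desc,
            a.AccountKey AS Level2Key, a.AccountDesc AS Level2Desc,
            a.AccountKey AS Level3Key, a.AccountDesc AS Level3Desc
    FROM    dbo.DimAccountSource a
    WHERE   a.ParentAccountKey IS NULL
    UNION ALL
    -- recursion: keep the ancestors already resolved above the child's depth,
    -- and repeat the child itself at its own level and below
    SELECT  c.AccountKey, c.AccountDesc,
            f.Depth + 1,
            f.Level1Key, f.Level1Desc,
            CASE WHEN f.Depth >= 2 THEN f.Level2Key  ELSE c.AccountKey  END,
            CASE WHEN f.Depth >= 2 THEN f.Level2Desc ELSE c.AccountDesc END,
            CASE WHEN f.Depth >= 3 THEN f.Level3Key  ELSE c.AccountKey  END,
            CASE WHEN f.Depth >= 3 THEN f.Level3Desc ELSE c.AccountDesc END
    FROM    dbo.DimAccountSource c
    INNER JOIN cteFlatten f
            ON f.AccountKey = c.ParentAccountKey
)
SELECT  AccountKey, AccountDesc,
        Level3Key, Level3Desc,
        Level2Key, Level2Desc,
        Level1Key, Level1Desc
FROM    cteFlatten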

To “hide” these levels, the HideMemberIf property of each level in the user-navigable hierarchy needs to be set to Parent Name as shown below.

[Figure: the HideMemberIf property set to Parent Name for each level of the user-navigable hierarchy]

Why flatten out a hierarchy when it's much less intuitive and harder to maintain? The simple answer is that while aggregations cannot be created on intermediate levels of a parent-child hierarchy, aggregations can be created on any level of a flattened hierarchy.

So say you have a parent-child hierarchy that’s performing less than optimally. Do you have to do all this to change it? Not exactly. You can use the Naturalize Parent-Child Dimension menu option in Solution Explorer that is installed with BIDSHelper. This added feature will do much of what’s been described in this section for you. Pretty cool.

Next Steps

Handling ragged hierarchies is just one of the requirements in constructing a financial cube. In a couple of future posts, if I ever get around to it, I'll discuss the ByAccount aggregation function, which allows differing aggregation functions to be applied to different sections of an Account hierarchy, and a couple of approaches for addressing currency conversion.

Categories: SSAS

Is SSAS Too Lenient on MDX Syntax?

October 3, 2011

I started my career in IT as a Visual Basic application programmer. VB had the wonderful/horrible data type called a Variant. Variant data types were great in that you could store anything in a Variant. Variants were horrible because you could store anything as a Variant. Because you could store anything in a Variant, you never knew what you had. As I matured as a programmer, I grew to hate the Variant data type. I grew to hate scripting. I grew to love strongly typed languages. Applications written in strongly-typed languages are much easier to maintain, much easier to understand, and are much less likely to incur run-time errors that would have been discovered when the application was initially compiled.

Well, you may be asking yourself, what does this have to do with MDX? When it comes to interpreting and understanding an MDX statement, it just seems that functions and parameters don't care much what you pass them. Everything's a Variant, and the parser will do its best job at artificial intelligence to interpret what you want it to return. And it just seems to me that building the parser in this way makes MDX much harder to understand rather than easier to construct.

For example, there was a recent post on the SSAS forum that inspired me to write this blog, not so much because of the forum post itself, but because of the root cause of the problem. The issue posted was why the following query is acceptable,

SELECT    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    {
        [Customer].[Customer Geography].[City].&[Berkshire]&[ENG]
        * [Product].[Product Categories].[Category].Members
        * [Date].[Calendar].[Calendar Year].[CY 2008]
    } ON ROWS
FROM    [Adventure Works]

but when you switch the [Product].[Product Categories].[Category].Members and [Date].[Calendar].[Calendar Year].[CY 2008] as below

SELECT    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    {
        [Customer].[Customer Geography].[City].&[Berkshire]&[ENG]
        * [Date].[Calendar].[Calendar Year].[CY 2008]
        * [Product].[Product Categories].[Category].Members
    } ON ROWS
FROM    [Adventure Works]
execution of the query results in the following error: Query (5, 3) The * function expects a tuple set expression for the 1 argument. A string or numeric expression was used.

Why should such a small change make such a big difference? The question that probably should be asked is not so much why the second construct generates an error but why the first construct does not. The CROSSJOIN function takes two sets as parameters. In the first construct, the first two operands of the * operator are a member, [Customer].[Customer Geography].[City].&[Berkshire]&[ENG], and a set, [Product].[Product Categories].[Category].Members. In this case, the parser decides that since one of the parameters is a set, you really meant to execute a Crossjoin, so it implicitly converts the member passed as the initial parameter to a set. In the second construct, the parser sees that both parameters passed to the Crossjoin are members and neither is a set. In this case, it chokes like a cat with a hairball. If the first parameter is fully qualified as a set, { [Customer].[Customer Geography].[City].&[Berkshire]&[ENG] }, or the second parameter is qualified as a set, { [Date].[Calendar].[Calendar Year].[CY 2008] }, then the error is resolved.

It just seems to me that when a function requires a set as a parameter, you should have to qualify the parameter as a set. When the function requires a tuple, you should have to qualify it as a tuple. And when you need to pass a member, none of this passing a hierarchy and assuming the default member. Just my opinion, but these features in the language make MDX less tractable rather than more.

Categories: MDX