Archive

Archive for the ‘MDX’ Category

Is SSAS Too Lenient on MDX Syntax?

October 3, 2011 4 comments

I started my career in IT as a Visual Basic application programmer. VB had the wonderful/horrible data type called a Variant. Variant data types were great in that you could store anything in a Variant. Variants were horrible because you could store anything as a Variant. Because you could store anything in a Variant, you never knew what you had. As I matured as a programmer, I grew to hate the Variant data type. I grew to hate scripting. I grew to love strongly typed languages. Applications written in strongly-typed languages are much easier to maintain, much easier to understand, and are much less likely to incur run-time errors that would have been discovered when the application was initially compiled.

Well, you may be asking yourself, what does this have to do with MDX? When it comes to interpreting and understanding an MDX statement, it just seems all functions and parameters don’t care much what you pass them. Everything’s a Variant and the parser will do it’s best job at artificial intelligence to interpret what you want it to return. And it just seems to me that building the parser in this way makes MDX much harder to understand rather than easier to construct.

For example, there was a recent post on the SSAS forum that inspired me to write this blog. Not so much because of the blog post itself. But because of the root cause of the problem. The issue posted was why was the following query acceptable,

SELECT    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    {
        [Customer].[Customer Geography].[City].&[Berkshire]&[ENG]
        * [Product].[Product Categories].[Category].Members
        * [Date].[Calendar].[Calendar Year].[CY 2008]
    } ON ROWS
FROM    [Adventure Works]

but when you switch the [Product].[Product Categories].[Category].Members and [Date].[Calendar].[Calendar Year].[CY 2008] as below

SELECT    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    {
        [Customer].[Customer Geography].[City].&[Berkshire]&[ENG]
        * [Date].[Calendar].[Calendar Year].[CY 2008]
        * [Product].[Product Categories].[Category].Members
    } ON ROWS
FROM    [Adventure Works]
execution of the query results in an the following error: Query (5, 3) The * function expects a tuple set expression for the 1 argument. A string or numeric expression was used.

Why would should a small change make such a big difference? The question that probably should be asked is not so much why the second construct generates an error but more why the first construct does not. The CROSSJOIN function takes two sets as parameters. In the first construct, the first two parameters to the CROSSJOIN function are a member, [Customer].[Customer Geography].[City].&[Berkshire]&[ENG], and a set [Product].[Product Categories].[Category].Members. In this case, the parser decides since one of the parameters is a set, you really meant to execute a Crossjoin so it implicitly converts the member passed as the initial parameter to a set. In the second construct, the parser sees that both parameters passed to the Crossjoin are Members and neither is a set. In this case, it chokes like a cat with a hairball. If the first parameter is fully qualified as a set, { [Customer].[Customer Geography].[City].&[Berkshire]&[ENG] }, or the second parameter is qualified as a set, { [Date].[Calendar].[Calendar Year].[CY 2008] }, then the error is resolved.

Just seems to me is that when a function requires a set as a parameter, you should have to qualify the parameter as a set. When the function requires a tuple, you need to qualify it as a tuple. And when you need to pass a member, none of this passing a hierarchy and the default member is assumed. Just my opinion but these features in the language make MDX less tractable rather than more.

Advertisements
Categories: MDX

Generate, Top Count, and SubTotals Revisited

September 12, 2010 1 comment

In my previous post, I used Generate, TopCount, and EXISTING to produce a list of the top products and the total for all other products associated with a category. The problem this post deals with is only slightly more complicated. Rather than address the top products associated with a product category, in this example, we want to display the top products associated with a sales region and in addition to the top products, we also want to display the totals for the current region’s top 5 products. The approach used in the previous post took advantage of the fact that each product was associated with one and only one product category. Since a product can be purchased in more than one region, a slightly different approach has to be used here.  Like before, the Generate statement combined with the TopCount function can be used to produce the top products for each region, [Region Top Products]. However, unlike the previous post, the sales region has to be included in the second argument of the Generate and included in the resultant set to identify the context of the current region since a single product can be returned more than once.

WITH SET [Region Top Products] AS
    Generate(
        { [Sales Territory].[Sales Territory].[Region].Members },
        [Sales Territory].[Sales Territory].CurrentMember
        * TopCount( 
            { [Product].[Product Categories].[Product].Members },
            5,
            ( [Measures].[Internet Sales Amount], [Sales Territory].[Sales Territory].CurrentMember )
        )
    )

The region subtotals are also slightly more difficult to return. Like before, a new calculated member in the [Product].[Product Categories] hierarchy needs to be created to represent the top 5 product totals. However, once again since both the identity of the region AND the product is relevant, both have to be accounted for. To do so, this region subtotal member is defined as an aggregate over the intersection of the [Region Top Products] set defined above and the crossjoin of the current sales region and products. Since this crossjoin must be re-evaluated in the context of the current sales region in each iteration, the EXISTING declarative must once again be used. The resultant definition is given below.

MEMBER [Product].[Product Categories].[All].[Top 5 Product Total] AS
    Aggregate(
        Intersect(
            [Sales Territory].[Sales Territory].CurrentMember
            * EXISTING { [Product].[Product Categories].[Product].Members },
            [Region Top Products]
        ),
        [Measures].[Internet Sales Amount]
    )

And lastly, to include the subtotals together with the top product list, two sets are combined and ordered by the sales region name. The complete final query is returned below.

WITH SET [Region Top Products] AS
    Generate(
        { [Sales Territory].[Sales Territory].[Region].Members },
        [Sales Territory].[Sales Territory].CurrentMember
        * TopCount( 
            { [Product].[Product Categories].[Product].Members },
            5,
            ( [Measures].[Internet Sales Amount], [Sales Territory].[Sales Territory].CurrentMember )
        )
    )
MEMBER [Product].[Product Categories].[All].[Top 5 Product Total] AS
    Aggregate(
        Intersect(
            [Sales Territory].[Sales Territory].CurrentMember
            * EXISTING { [Product].[Product Categories].[Product].Members },
            [Region Top Products]
        ),
        [Measures].[Internet Sales Amount]
    )
SELECT    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    Order(
        [Region Top Products]
        + CrossJoin(
            { [Sales Territory].[Sales Territory].[Region].Members },
            { [Product].[Product Categories].[Top 5 Product Total] }
        ),
        [Sales Territory].[Sales Territory].CurrentMember.MEMBER_CAPTION,
        BDESC
    ) ON ROWS
FROM    [Adventure Works]
Categories: MDX

Generate, TopCount, and All Others

September 4, 2010 Leave a comment

Recently, there was a post on the Analysis Services forum that I made a very sad attempt at answering. The question asked was how to construct an MDX statement that shows the Top 2 children associated with a parent and also show the amount associated with all other children as an aggregated total. After struggling with this problem and posting several real sorry attempts at solving, I think with a whole lot of guidance from Stefano, a decent, easily explainable solution has been derived. I’ll try to explain here and hope others can use the approach to solve similar problems.

This problem is not a new but one that Chris Webb (and likely many others) have written about. His approach was applicable to MSAS but with the ability to evaluate sets using the current context with the EXISTING declarative, the approach given in that post is overly complex now in 2005 and beyond. I apologize in advance to anyone if this solution has already been posted but I searched and couldn’t find any references. Not that I searched all that hard. Plus, I thought it pretty cool so writing about it again is not such a bad thing.

The problem sounds simple enough. First step is to calculate the top products for each category member. Using the Adventure Works as an example, a Generate statement combined with the TopCount function accomplishes what we wish to achieve. So to calculate the Top 2 products associated with each category member, we can create a named set, [Top Items], defined as the following.

WITH SET [Top Items] AS 
        Generate(
                { [Product].[Category].[Category].Members },
                TopCount(
                        EXISTING [Product].[Product].[Product].members,
                        2,
                        [Measures].[Reseller Sales Amount]
                )
        )

There are a couple points to make about the above definition. One, the EXISTING declarative before [Product].[Product].[Product].Members set is critical. If not used, the TopCount set is evaluated only once and those top products are the only ones returned in the final results. That’s not what we want. Second point to make is that the current [Product].[Category] member is NOT included in the resultant set. Stefano’s solution (and mine on the post) added it in the second argument of the Generate function and then used the Extract function to only return the [Product].[Product] hierarchy members. Just not including the category hierarchy in the second argument of the Generate statement accomplishes the same result without the penalty of performing a CrossJoin and then an Extract.

Once a set of the top two products associated with each category member has been constructed, the next step is to determine the aggregate of all other products associated with each category. To achieve this objective, a new calculated member, [Product].[Product].[All].[All Other Products], is defined as the Aggregate of all products less those included in the [Top Items] set. Again, an EXISTING declarative is necessary so that the [Product].[Product].[Product].Members set is evaluated under the context of the current Category member.

MEMBER [Product].[Product].[All].[All Other Products] AS
        Aggregate( { EXISTING { [Product].[Product].[Product].Members } - [Top Items] } )

And lastly, to wrap the whole thing up, a set of [Top Items] and the [All Other Products] member is created and CrossJoin-ed with Categories along the rows. Adding the [All Products] member to the set just to verify that the results are correct verifies the approach.

WITH SET [Top Items] AS 
        Generate(
                { [Product].[Category].[Category].Members },
                TopCount(
                        EXISTING [Product].[Product].[Product].members,
                        2,
                        ( [Measures].[Reseller Sales Amount] )
                )
        )
MEMBER [Product].[Product].[All].[All Other Products] AS
        Aggregate( { EXISTING { [Product].[Product].[Product].Members } - [Top Items] } )
SELECT        { 
                [Measures].[Reseller Sales Amount] 
        } ON COLUMNS,
        {
                { [Product].[Category].[Category].Members }
                * {
                        [Top Items], 
                        [Product].[Product].[All Other Products], 
                        [Product].[Product].[All] 
                } 
        } ON ROWS
FROM        [Adventure Works]

Voila! And very efficiently as well.

Technorati Tags: ,,,

Categories: MDX