Is SSAS Too Lenient on MDX Syntax?

October 3, 2011 4 comments

I started my career in IT as a Visual Basic application programmer. VB had the wonderful/horrible data type called a Variant. Variant data types were great in that you could store anything in a Variant. Variants were horrible because you could store anything as a Variant. Because you could store anything in a Variant, you never knew what you had. As I matured as a programmer, I grew to hate the Variant data type. I grew to hate scripting. I grew to love strongly typed languages. Applications written in strongly-typed languages are much easier to maintain, much easier to understand, and are much less likely to incur run-time errors that would have been discovered when the application was initially compiled.

Well, you may be asking yourself, what does this have to do with MDX? When it comes to interpreting an MDX statement, functions just don't seem to care much what you pass them. Everything's a Variant, and the parser does its best impression of artificial intelligence to figure out what you want it to return. It seems to me that building the parser this way makes MDX harder to understand, not easier to write.

For example, a recent post on the SSAS forum inspired me to write this blog. Not so much because of the forum post itself, but because of the root cause of the problem. The question posted was why the following query is acceptable,

SELECT    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    {
        [Customer].[Customer Geography].[City].&[Berkshire]&[ENG]
        * [Product].[Product Categories].[Category].Members
        * [Date].[Calendar].[Calendar Year].[CY 2008]
    } ON ROWS
FROM    [Adventure Works]

but when you swap [Product].[Product Categories].[Category].Members and [Date].[Calendar].[Calendar Year].[CY 2008] as below,

SELECT    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    {
        [Customer].[Customer Geography].[City].&[Berkshire]&[ENG]
        * [Date].[Calendar].[Calendar Year].[CY 2008]
        * [Product].[Product Categories].[Category].Members
    } ON ROWS
FROM    [Adventure Works]
execution of the query results in the following error: Query (5, 3) The * function expects a tuple set expression for the 1 argument. A string or numeric expression was used.

Why should a small change make such a big difference? The question that probably should be asked is not so much why the second construct generates an error but why the first construct does not. The CROSSJOIN function takes sets as parameters. In the first construct, the two operands of the first CROSSJOIN are a member, [Customer].[Customer Geography].[City].&[Berkshire]&[ENG], and a set, [Product].[Product Categories].[Category].Members. In this case, the parser decides that since one of the parameters is a set, you really meant to execute a Crossjoin, so it implicitly converts the member passed as the first parameter to a set. In the second construct, the parser sees that both parameters passed to the Crossjoin are members and neither is a set. In this case, it chokes like a cat with a hairball. If the first parameter is fully qualified as a set, { [Customer].[Customer Geography].[City].&[Berkshire]&[ENG] }, or the second parameter is qualified as a set, { [Date].[Calendar].[Calendar Year].[CY 2008] }, then the error is resolved.
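
For instance, the failing query becomes valid simply by wrapping the first member in braces to make it an explicit set (brace-qualifying the [Date] member instead has the same effect):

SELECT    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    {
        { [Customer].[Customer Geography].[City].&[Berkshire]&[ENG] }
        * [Date].[Calendar].[Calendar Year].[CY 2008]
        * [Product].[Product Categories].[Category].Members
    } ON ROWS
FROM    [Adventure Works]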

It just seems to me that when a function requires a set as a parameter, you should have to qualify the parameter as a set. When the function requires a tuple, you should have to qualify it as a tuple. And when you need to pass a member, none of this passing a hierarchy and having the default member assumed. Just my opinion, but these features of the language make MDX less tractable rather than more.

Categories: MDX

Setting up the Denali Client Machine

After the connection to the Tabular database was set up, I also wanted to test Crescent with a PowerPivot model created in the old-fashioned way, by uploading an Excel workbook to a SharePoint PowerPivot gallery. So I downloaded the HelloWorldPicnic demo. I uploaded the included Excel file to the PowerPivot gallery, and when I tried to create a Crescent report, I got the following error.

[error screenshot]

I thought that if I reprocessed the original workbook and uploaded it again, the error would be resolved. However, when I tried to launch the PowerPivot window, I got yet another error.

[error screenshot]

I had downloaded the PowerPivot add-in from the PowerPivot site. However, apparently that isn’t the correct version. To work with Denali, I needed the Microsoft SQL Server Denali PowerPivot for Excel version. Once I downloaded and installed that version, I was able to launch the PowerPivot window and refresh the data. However, re-uploading that workbook to the PowerPivot gallery did not resolve the problem.

The way I ended up resolving the issue was to create a BISM connection to the PowerPivot workbook. However, I could not create it successfully using the method described in the Create a BISM Connection to a PowerPivot Workbook article available on MSDN. I had to create it using the same method I used to create my BISM connection to the tabular project that was described in my last blog post. When a PowerPivot model is deployed to SharePoint, that model is created as a database on the {ServerName}\PowerPivot instance of Analysis Services. (Yes, a full install of Denali will contain three named instances of Analysis Services as described in this blog post by Andrew Fryer). Connecting to that PowerPivot instance and expanding the Databases node in Object Explorer, I got the name of the database corresponding to my HelloWorldPicnicDenaliCTP3 workbook. I used a server name of {ServerName}\PowerPivot and that database name to create my BISM connection. Using that approach, I was able to use the Create “Crescent” Report option from the pull down menu available from the BISM connection to successfully launch Crescent.

Don’t know what my problem was or if this issue should be logged in Connect. Would be interested to know if anyone else experiences this problem.

Categories: SQL Server Denali

Installing Denali CTP 3 for Remote Access

If you’ve read the Getting up and running with SQL Server Denali for Business Intelligence blog that was posted last week and you’ve got as limited an exposure to SharePoint as I do, you’ve probably had the same question I had. Okay, I install SharePoint. I don’t run the SharePoint Products Configuration Wizard. I install SP1. Hey, when do I run the SharePoint Products Configuration Wizard? What’s not clear in that excellent blog post is that when you install the SQL Server PowerPivot for SharePoint feature during the Denali install and then run the PowerPivot configuration tool, much more than just PowerPivot is configured. Your entire SharePoint farm is configured when that tool is executed. The SharePoint configuration databases are created in the PowerPivot named instance of your Denali DB Engine, so there’s no install of a SQL Server 2008 Express Edition database.

I also created a virtual network to give Denali a spin. That article assumes all applications and services will be installed and run on the same machine. I wanted to use a Windows 7 VM that already had Office Professional 2010 installed rather than have to install Office on my Denali box. To enable my Windows 7 VM to access the Denali services, there were quite a few steps that needed to be performed afterwards that are not explained in that post. I’m hoping I’ve caught all of those issues and that this post will help others create a similar environment successfully.

Enable Remote Access to the Default Instance of the Database Engine

By default, remote connections to all the Denali instances will be blocked. You can enable remote access by changing a few configuration settings. These settings include enabling TCP connections to the default instance of the database engine and opening port 1433 to allow remote access. My instructions assume that you followed the blog post mentioned above to the letter. You should have two instances of the database engine: the default instance, which hosts the AdventureWorksDWDenali sample database (assuming that you’ve attached it), and a PowerPivot named instance, which hosts your SharePoint configuration databases.

  • To enable the TCP/IP protocol,
    • Open the SQL Server Configuration Manager.
    • Expand the SQL Server Network Configuration node if it is not already expanded and select the Protocols for MSSQLSERVER node on the left hand side.
    • Right-mouse click on the TCP/IP protocol and choose the Enable option from the popup menu.
    • Restart the SQL Server service so that the change takes effect.
  • To open port 1433, there are several references available online that describe it better than I can in this blog post. One is this video available on TechNet.
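
Once TCP/IP is enabled and the port is open, one quick sanity check is to connect from the remote client and confirm the session really is coming in over TCP on the expected port. Something along these lines works (this is just a verification query, not part of the configuration itself):

-- Run from the remote client after connecting to the default instance
SELECT  net_transport,      -- should show TCP for a remote connection
        local_tcp_port,     -- should show 1433 for the default instance
        client_net_address
FROM    sys.dm_exec_connections
WHERE   session_id = @@SPID;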

Enable Access to Analysis Services

I should confess before going any further that I did not follow the instructions in the above blog post to the letter. I installed the Multidimensional server instance as the default and installed the new Tabular server on a named instance that I appropriately named Tabular. For some reason, named instances of Analysis Services are installed on a dynamic port by default. Since whatever port the Tabular instance is assigned will have to be opened in the firewall, a dynamic port assignment is going to be a problem, as it will be a different port every time the service is restarted. To assign a fixed port, start SSMS and connect to the Tabular instance of Analysis Services. In the Object Explorer pane, right-mouse click on the node representing your instance and select the Properties option from the popup menu. In the Analysis Services Properties window, scroll down until you reach the Port property. Change the value from 0 to a fixed port (somewhere in the 49000 range is what I used) and restart the service. Once you’ve changed to a fixed port, you need to open up the firewall. You’ll need to open the firewall for three ports: 2383 for the default instance, 2382 for the Browser service, and of course, the port that you’ve just assigned to the named instance of Analysis Services. For a more thorough description of the steps needed, see the Configure Windows Firewall for Analysis Services Access article on MSDN.

Setting Up to Give Project Crescent a Spin

I wanted to explore some of the new features of Denali so I worked through the Tabular Modeling Tutorial on MSDN from start to finish. The end result was that I now had a Tabular database to use as a source for a Crescent report. However, there were a few tasks that still had to be completed.

Once the connection to my Tabular database was created, to create a Crescent report, all that needs to be done is to select the Create “Crescent” Report option from the pull down menu on the BISM connection just created. Looking forward to playing with the new tool. Hopefully, I’ll blog about it in the future!

Categories: SQL Server Denali

SSAS–Who/What’s Killing My Server?

February 2, 2011 3 comments

When diagnosing a problem with the database engine, one of the first tools I use is the sp_who2 command to determine who’s running what. Wouldn’t it be nice if there were an equivalent command for SSAS? Using the SSAS 2008 DMVs (Yaniv Mor blog post), one can use linked servers to create a nearly equivalent command. I created a simple DB Engine stored procedure, sp_SSAS_who, that accomplishes what I needed it to do. Below is the script:

CREATE PROCEDURE dbo.sp_SSAS_who AS BEGIN

        SELECT  ssasSessions.SESSION_SPID AS [SPID],
                ssasSessions.SESSION_USER_NAME AS [User ID],
                ssasSessions.SESSION_CURRENT_DATABASE AS [Database],
                -- COMMAND_CPU_TIME_MS is in milliseconds; divide by 1000.0 to report seconds
                ROUND(CAST(ssasCommands.COMMAND_CPU_TIME_MS AS int) / 1000.0, 2) AS [Command Time (in sec)],
                ssasCommands.COMMAND_START_TIME AS [Start],
                ssasCommands.COMMAND_END_TIME AS [End],
                ssasCommands.COMMAND_TEXT AS [MDX Command]
        FROM    OPENQUERY(SSAS_LINKED_SERVER, '
                        SELECT  *
                        FROM    $system.DISCOVER_Sessions
                ') AS ssasSessions
        LEFT    JOIN OPENQUERY(SSAS_LINKED_SERVER, '
                        SELECT  *
                        FROM    $system.DISCOVER_Commands
                ') AS ssasCommands
                ON      ssasSessions.SESSION_SPID = ssasCommands.SESSION_SPID;

END;

SSAS_LINKED_SERVER is a linked server to my SSAS instance. You’d need to create the definition of the linked server on your box before this would be useful. Ideally, I should parameterize this so that the name of the linked server is a variable. One of the bonuses of this approach is that I can execute sp_who2 and sp_SSAS_who in the same window to see what’s going on with both services.
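
For reference, a linked server along these lines is what the procedure expects; the data source and catalog values here are only placeholders for your own SSAS instance and database:

-- Define a linked server to the SSAS instance using the Analysis Services OLE DB provider
EXEC master.dbo.sp_addlinkedserver
        @server     = N'SSAS_LINKED_SERVER',
        @srvproduct = N'',
        @provider   = N'MSOLAP',
        @datasrc    = N'localhost',             -- your SSAS server\instance
        @catalog    = N'Adventure Works DW';    -- default database for the connection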

I apologize if someone else has published this approach elsewhere. I Googled and came up with zilch, so I thought I would post.

Categories: SSAS Tags:

Adding Subtotals using GROUPING SETS

November 7, 2010 Leave a comment

I recently had to convert an existing report constructed against SQL Server 2000 to our new SQL Server 2008 dimensional data warehouse. The output of the report procedure was a single set of aggregated totals by different levels of the date dimension. After reviewing the code for some time, which consisted of several pages of stream-of-consciousness SQL and the creation of a multitude of temp tables, I arrived at the conclusion that there just had to be a better way. That’s when I stumbled upon a little-documented enhancement in SQL Server 2008 called GROUPING SETS.

Using the AdventureWorksDW sample database as a source, the use of GROUPING SETS lets us produce output like below using a single SQL statement.

[SubTotalOutput screenshot]

How is this done? Let’s first consider the case where we want to generate an aggregate by month. Using a GROUP BY statement, this result set is produced by the following:

SELECT  ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName,
        SUM(fact.SalesAmount) AS SalesAmount
FROM    dbo.FactInternetSales fact
INNER   JOIN dbo.DimTime ordDate
        ON   fact.OrderDateKey = ordDate.TimeKey
GROUP BY
        ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName

This statement simply produces a single vector aggregate with CalendarYear, CalendarQuarter, MonthNumberOfYear, and EnglishMonthName as coordinates. I can rewrite this same statement using the new GROUPING SETS function by passing in the same coordinates to a GROUPING SETS statement.
 

SELECT  ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName,
        SUM(fact.SalesAmount) AS SalesAmount
FROM    dbo.FactInternetSales fact
INNER   JOIN dbo.DimTime ordDate
        ON   fact.OrderDateKey = ordDate.TimeKey
GROUP BY
        GROUPING SETS (
            (
                  ordDate.CalendarYear,
                  ordDate.CalendarQuarter,
                  ordDate.MonthNumberOfYear,
                  ordDate.EnglishMonthName
            )
        )

The two statements produce exactly the same output. However, GROUPING SETS lets me specify another vector aggregate, say just Year and Quarter, as below.

SELECT  ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName,
        SUM(fact.SalesAmount) AS SalesAmount
FROM    dbo.FactInternetSales fact
INNER   JOIN dbo.DimTime ordDate
        ON   fact.OrderDateKey = ordDate.TimeKey
GROUP BY
        GROUPING SETS (
            (
                  ordDate.CalendarYear,
                  ordDate.CalendarQuarter,
                  ordDate.MonthNumberOfYear,
                  ordDate.EnglishMonthName
            ),
            (
                  ordDate.CalendarYear,
                  ordDate.CalendarQuarter
            )
        )

This statement produces aggregates at two different levels, Month and Quarter, of the time dimension.

[MonthAndQuarterAggregate screenshot]

To add yearly totals and grand totals, two more tuples, (ordDate.CalendarYear) and (), are added to the GROUPING SETS function so that the final statement looks like the following.

SELECT  ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName,
        SUM(fact.SalesAmount) AS SalesAmount
FROM    dbo.FactInternetSales fact
INNER   JOIN dbo.DimTime ordDate
        ON   fact.OrderDateKey = ordDate.TimeKey
GROUP BY
        GROUPING SETS (
            (
                  ordDate.CalendarYear,
                  ordDate.CalendarQuarter,
                  ordDate.MonthNumberOfYear,
                  ordDate.EnglishMonthName
            ),          -- Month summary
            (
                  ordDate.CalendarYear,
                  ordDate.CalendarQuarter
            ),          -- Quarter summary
            (
                  ordDate.CalendarYear
            ),          -- Year summary
            ()          -- Grand totals
        )

Quite beautiful.
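
One thing to watch for in the output: in the subtotal rows, the rolled-up columns come back as NULL, which can be mistaken for genuine NULL data. If you need to label the rows, the GROUPING function can be layered on top of the same statement. A rough sketch against the same tables:

SELECT  ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.EnglishMonthName,
        -- GROUPING() returns 1 when the column has been rolled away in this row
        CASE
            WHEN GROUPING(ordDate.CalendarYear) = 1       THEN 'Grand Total'
            WHEN GROUPING(ordDate.CalendarQuarter) = 1    THEN 'Year Summary'
            WHEN GROUPING(ordDate.MonthNumberOfYear) = 1  THEN 'Quarter Summary'
            ELSE 'Month'
        END AS RowType,
        SUM(fact.SalesAmount) AS SalesAmount
FROM    dbo.FactInternetSales fact
INNER   JOIN dbo.DimTime ordDate
        ON   fact.OrderDateKey = ordDate.TimeKey
GROUP BY
        GROUPING SETS (
            (ordDate.CalendarYear, ordDate.CalendarQuarter,
             ordDate.MonthNumberOfYear, ordDate.EnglishMonthName),
            (ordDate.CalendarYear, ordDate.CalendarQuarter),
            (ordDate.CalendarYear),
            ()
        )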

Modifying a Huge Table and SQL Server Partition Sliding

November 2, 2010 2 comments

It’s been quite some time since I last posted. Since last time, I’ve moved my blog to WordPress as Live Spaces will be no more. Not that I’ve not had ideas. Just not been inspired to sit down and look at a computer after staring at a monitor for hours all day long. But today is different I guess.

I make a point of scripting out all objects in my data warehouse and saving the object definitions in VSS. For table objects, one can’t just drop the object and recreate it; otherwise, all the existing data gets dropped as well. So for table objects, I use the following formula (I use a template that I’ll post on my SkyDrive account if anyone is interested):

  • Drop all indexes on the table.
  • Drop all constraints, including the primary key constraint.
  • Create a table with the modified structure with a _NEW suffix appended to the name (i.e. TableName_NEW).
  • Migrate the data from the existing table to the new table (i.e. INSERT INTO TableName_NEW (……) SELECT ….. FROM TableName).
  • Check the row counts in both tables and, if they are equal, drop the old table and rename the new one (DROP TABLE TableName; EXEC sp_rename ‘TableName_NEW’, ‘TableName’).
  • Recreate constraints and indexes.
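
The template itself isn’t posted here, but stripped down to its essentials, the tail end of that formula looks roughly like this (table and column names are purely illustrative):

-- 1. Create the modified structure (here, adding a NewDimensionKey column)
CREATE TABLE dbo.TableName_NEW (
    TableNameKey    int   NOT NULL,
    NewDimensionKey int   NOT NULL,
    SalesAmount     money NOT NULL
);

-- 2. Migrate the existing rows, supplying a default for the new column
INSERT INTO dbo.TableName_NEW (TableNameKey, NewDimensionKey, SalesAmount)
SELECT  TableNameKey, -1, SalesAmount
FROM    dbo.TableName;

-- 3. Swap the tables only if every row made it across
IF  (SELECT COUNT(*) FROM dbo.TableName)
  = (SELECT COUNT(*) FROM dbo.TableName_NEW)
BEGIN
    DROP TABLE dbo.TableName;
    EXEC sp_rename 'dbo.TableName_NEW', 'TableName';
END;
-- 4. Recreate constraints and indexes afterwards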

This approach has worked well until recently. In this particular situation, I needed to add another dimension reference to a fact table containing several million rows. The outline above works well except for the step that copies data from the existing table to the new table. As written above, this step is fully logged and, after several hours, eventually fills the transaction log (I’m guessing, as I didn’t actually have the patience to let it finish). A slightly modified approach was needed. Following an approach used by SSIS and taking advantage of partitioning, instead of copying all the data at once, the following script was used:

DECLARE @sSQL varchar(8000);
DECLARE @iPartitionNo int;
DECLARE @iLeftBoundary int;
DECLARE @iRightBoundary int;
DECLARE @sFileGroupName varchar(10);

SET @iLeftBoundary = 0;

-- One row per partition boundary: its number, its upper boundary value, and its filegroup
DECLARE cPartitions INSENSITIVE CURSOR FOR
       SELECT  CAST(prv.boundary_id AS int) AS PartitionNo,
               CAST(prv.value AS int) AS RightPartitionBoundary,
               CAST(ds.name AS varchar(10)) AS FileGroupName
       FROM    sys.partition_range_values prv
       INNER   JOIN sys.partition_functions pfunc
               ON      prv.function_id = pfunc.function_id
       INNER   JOIN sys.partition_schemes psch
               ON      prv.function_id = psch.function_id
       INNER   JOIN sys.destination_data_spaces dsp
               ON      psch.data_space_id = dsp.partition_scheme_id
               AND     prv.boundary_id = dsp.destination_id
       INNER   JOIN sys.data_spaces ds
               ON      dsp.data_space_id = ds.data_space_id
       WHERE   pfunc.[name] = 'MyPartitionFunction'
       ORDER BY prv.boundary_id;

OPEN cPartitions;
FETCH cPartitions INTO @iPartitionNo, @iRightBoundary, @sFileGroupName;

WHILE (@@FETCH_STATUS = 0) BEGIN

       -- Drop the staging table if a previous pass left it behind
       IF EXISTS (
              SELECT  *
              FROM    sys.tables tbl
              INNER   JOIN sys.schemas sch
                      ON      tbl.schema_id = sch.schema_id
              WHERE   tbl.name = 'MyTableTemp'
                AND   sch.name = 'dbo'
       ) BEGIN
              DROP TABLE dbo.MyTableTemp;
       END;

       -- Copy just this partition's rows with a minimally logged SELECT...INTO
       -- (FieldList stands in for the full column list)
       SET @sSQL = '
              SELECT  FieldList
              INTO    dbo.MyTableTemp
              FROM    dbo.MyTable
              WHERE   PartitioningField >= ' + CAST(@iLeftBoundary AS varchar) + '
                AND   PartitioningField <= ' + CAST(@iRightBoundary AS varchar);
       EXEC(@sSQL);

       -- The staging table must sit on the same filegroup as the destination partition
       SET @sSQL = '
              ALTER TABLE dbo.MyTableTemp ADD CONSTRAINT MyTableTemp_PK
              PRIMARY KEY (PartitioningField) ON [' + @sFileGroupName + ']';
       EXEC(@sSQL);

       -- ...and must carry a check constraint matching the partition boundaries
       SET @sSQL = '
              ALTER TABLE dbo.MyTableTemp ADD CONSTRAINT MyTableTemp_CK
              CHECK (PartitioningField >= ' + CAST(@iLeftBoundary AS varchar) + '
                AND  PartitioningField <= ' + CAST(@iRightBoundary AS varchar) + ')';
       EXEC(@sSQL);

       -- Slide the staging table into the corresponding partition of the new table
       SET @sSQL = '
              ALTER TABLE dbo.MyTableTemp SWITCH TO dbo.MyTable_NEW
              PARTITION ' + CAST(@iPartitionNo AS varchar);
       EXEC(@sSQL);

       SET @iLeftBoundary = @iRightBoundary + 1;
       FETCH cPartitions INTO @iPartitionNo, @iRightBoundary, @sFileGroupName;

END;

CLOSE cPartitions;
DEALLOCATE cPartitions;

-- Compare row counts before swapping the tables
DECLARE @OldRowCount int;
DECLARE @NewRowCount int;

SELECT @OldRowCount = COUNT(*)
FROM   dbo.MyTable;

SELECT @NewRowCount = COUNT(*)
FROM   dbo.MyTable_NEW;

SELECT @OldRowCount, @NewRowCount;

IF (@OldRowCount = @NewRowCount) BEGIN
       DROP TABLE dbo.MyTable;
       EXEC sp_rename 'dbo.MyTable_NEW', 'MyTable';
END ELSE BEGIN
       DROP TABLE dbo.MyTable_NEW;
END;

Those who’ve worked with me must be floored that I’m posting TSQL code using a cursor. I normally break into hives when I see a cursor being used. But in this case, short of migrating to SSIS, it was the best I could do.
 
The SELECT… INTO dbo.MyTableTemp operation is minimally logged. So instead of taking minutes to run, it takes seconds. This step on my lightweight development server took about 30 seconds to copy over 10 million rows with over 30 columns (all numeric, of course; it is a fact table after all). Once the temp table is created, in order for it to be slid into the _NEW table, the source and destination partitions have to be located on the same filegroup and the source has to carry constraints proving its rows fit the destination partition. By creating the temp table’s primary key on the same filegroup as the target partition and adding a check constraint that matches the partition boundaries, both conditions are satisfied.
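
Beyond comparing the overall row counts, it’s also easy to confirm that the rows landed in the partitions you expect; this wasn’t part of my script, but something along these lines would do it:

-- Row count per partition of the new table, evaluated with the partition function
SELECT  $PARTITION.MyPartitionFunction(PartitioningField) AS PartitionNo,
        COUNT(*) AS RowCnt
FROM    dbo.MyTable_NEW
GROUP BY $PARTITION.MyPartitionFunction(PartitioningField)
ORDER BY PartitionNo;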
 
If anyone finds themselves in the same predicament, I hope they find this approach useful.

Generate, Top Count, and SubTotals Revisited

September 12, 2010 1 comment

In my previous post, I used Generate, TopCount, and EXISTING to produce a list of the top products and the total for all other products associated with a category. The problem this post deals with is only slightly more complicated. Rather than the top products associated with a product category, in this example we want to display the top products associated with a sales region, and in addition to the top products, we also want to display the total for the current region’s top 5 products. The approach used in the previous post took advantage of the fact that each product was associated with one and only one product category. Since a product can be purchased in more than one region, a slightly different approach has to be used here. Like before, the Generate function combined with the TopCount function can be used to produce the top products for each region, [Region Top Products]. However, unlike the previous post, the sales region has to be crossjoined into the second argument of the Generate and included in the resultant set to identify the context of the current region, since a single product can be returned more than once.

WITH SET [Region Top Products] AS
    Generate(
        { [Sales Territory].[Sales Territory].[Region].Members },
        [Sales Territory].[Sales Territory].CurrentMember
        * TopCount( 
            { [Product].[Product Categories].[Product].Members },
            5,
            ( [Measures].[Internet Sales Amount], [Sales Territory].[Sales Territory].CurrentMember )
        )
    )

The region subtotals are also slightly more difficult to return. Like before, a new calculated member in the [Product].[Product Categories] hierarchy needs to be created to represent the top 5 product total. However, once again, since both the identity of the region AND the product are relevant, both have to be accounted for. To do so, this region subtotal member is defined as an aggregate over the intersection of the [Region Top Products] set defined above and the crossjoin of the current sales region and its products. Since this crossjoin must be re-evaluated in the context of the current sales region in each iteration, the EXISTING keyword must once again be used. The resultant definition is given below.

MEMBER [Product].[Product Categories].[All].[Top 5 Product Total] AS
    Aggregate(
        Intersect(
            [Sales Territory].[Sales Territory].CurrentMember
            * EXISTING { [Product].[Product Categories].[Product].Members },
            [Region Top Products]
        ),
        [Measures].[Internet Sales Amount]
    )

And lastly, to include the subtotals together with the top product list, the two sets are combined and ordered by the sales region name. The complete final query is shown below.

WITH SET [Region Top Products] AS
    Generate(
        { [Sales Territory].[Sales Territory].[Region].Members },
        [Sales Territory].[Sales Territory].CurrentMember
        * TopCount( 
            { [Product].[Product Categories].[Product].Members },
            5,
            ( [Measures].[Internet Sales Amount], [Sales Territory].[Sales Territory].CurrentMember )
        )
    )
MEMBER [Product].[Product Categories].[All].[Top 5 Product Total] AS
    Aggregate(
        Intersect(
            [Sales Territory].[Sales Territory].CurrentMember
            * EXISTING { [Product].[Product Categories].[Product].Members },
            [Region Top Products]
        ),
        [Measures].[Internet Sales Amount]
    )
SELECT    {
        [Measures].[Internet Sales Amount]
    } ON COLUMNS,
    Order(
        [Region Top Products]
        + CrossJoin(
            { [Sales Territory].[Sales Territory].[Region].Members },
            { [Product].[Product Categories].[Top 5 Product Total] }
        ),
        [Sales Territory].[Sales Territory].CurrentMember.MEMBER_CAPTION,
        BDESC
    ) ON ROWS
FROM    [Adventure Works]
Categories: MDX