
Archive for November, 2010

Adding Subtotals using GROUPING SETS

November 7, 2010

I recently had to convert an existing report built against SQL Server 2000 to our new SQL Server 2008 dimensional data warehouse. The output of the report procedure was a single set of aggregated totals at different levels of the date dimension. After reviewing the code for some time, which consisted of several pages of stream-of-consciousness SQL and the creation of a multitude of temp tables, I concluded that there just had to be a better way. That’s when I stumbled upon a little-documented enhancement in SQL Server 2008 called GROUPING SETS.

Using the AdventureWorksDW sample database as a source, GROUPING SETS lets us produce the output below with a single SQL statement.

[Image: SubTotalOutput — aggregated sales totals at each level of the date dimension]

How is this done? Let’s first consider the case where we want to generate an aggregate by month. Using a GROUP BY statement, this result set is produced by the following:

SELECT  ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName,
        SUM(fact.SalesAmount) AS SalesAmount
FROM    dbo.FactInternetSales fact
INNER   JOIN dbo.DimTime ordDate
        ON   fact.OrderDateKey = ordDate.TimeKey
GROUP BY
        ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName

This statement simply produces a single vector aggregate with CalendarYear, CalendarQuarter, MonthNumberOfYear, and EnglishMonthName as coordinates. I can rewrite the same statement using the new GROUPING SETS syntax by passing those same coordinates in as a single grouping set.

SELECT  ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName,
        SUM(fact.SalesAmount) AS SalesAmount
FROM    dbo.FactInternetSales fact
INNER   JOIN dbo.DimTime ordDate
        ON   fact.OrderDateKey = ordDate.TimeKey
GROUP BY
        GROUPING SETS (
            (
                  ordDate.CalendarYear,
                  ordDate.CalendarQuarter,
                  ordDate.MonthNumberOfYear,
                  ordDate.EnglishMonthName
            )
        )

The two statements produce exactly the same output. However, GROUPING SETS lets me specify another vector aggregate, say just Year and Quarter, like below:

SELECT  ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName,
        SUM(fact.SalesAmount) AS SalesAmount
FROM    dbo.FactInternetSales fact
INNER   JOIN dbo.DimTime ordDate
        ON   fact.OrderDateKey = ordDate.TimeKey
GROUP BY
        GROUPING SETS (
            (
                  ordDate.CalendarYear,
                  ordDate.CalendarQuarter,
                  ordDate.MonthNumberOfYear,
                  ordDate.EnglishMonthName
            ),
            (
                  ordDate.CalendarYear,
                  ordDate.CalendarQuarter
            )
      )

This statement produces aggregates at two different levels of the time dimension: Month and Quarter.

[Image: MonthAndQuarterAggregate — output containing both Month-level and Quarter-level aggregates]

To add yearly totals and grand totals, two more grouping sets are added, (ordDate.CalendarYear) and the empty set (), so that the final statement looks like the following:

SELECT  ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName,
        SUM(fact.SalesAmount) AS SalesAmount
FROM    dbo.FactInternetSales fact
INNER   JOIN dbo.DimTime ordDate
        ON   fact.OrderDateKey = ordDate.TimeKey
GROUP BY
        GROUPING SETS (
            (
                  ordDate.CalendarYear,
                  ordDate.CalendarQuarter,
                  ordDate.MonthNumberOfYear,
                  ordDate.EnglishMonthName
            ),          -- Month summary
            (
                  ordDate.CalendarYear,
                  ordDate.CalendarQuarter
            ),          -- Quarter summary
            (
                  ordDate.CalendarYear
            ),          -- Year summary
            ()          -- Grand totals
      )

Quite beautiful.
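As an aside, the rolled-up columns come back as NULL in the subtotal rows, so a downstream consumer can’t always tell a quarter subtotal apart from a detail row that genuinely contains a NULL. The GROUPING() function can label each row. Here is a minimal sketch that reuses the same query and adds three indicator columns:

SELECT  ordDate.CalendarYear,
        ordDate.CalendarQuarter,
        ordDate.MonthNumberOfYear,
        ordDate.EnglishMonthName,
        SUM(fact.SalesAmount) AS SalesAmount,
        GROUPING(ordDate.EnglishMonthName) AS IsMonthRolledUp,   -- 1 on quarter, year, and grand total rows
        GROUPING(ordDate.CalendarQuarter)  AS IsQuarterRolledUp, -- 1 on year and grand total rows
        GROUPING(ordDate.CalendarYear)     AS IsYearRolledUp     -- 1 only on the grand total row
FROM    dbo.FactInternetSales fact
INNER   JOIN dbo.DimTime ordDate
        ON   fact.OrderDateKey = ordDate.TimeKey
GROUP BY
        GROUPING SETS (
            (ordDate.CalendarYear, ordDate.CalendarQuarter,
             ordDate.MonthNumberOfYear, ordDate.EnglishMonthName),
            (ordDate.CalendarYear, ordDate.CalendarQuarter),
            (ordDate.CalendarYear),
            ()
        );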

Modifying a Huge Table and SQL Server Partition Sliding

November 2, 2010

It’s been quite some time since I last posted. Since then, I’ve moved my blog to WordPress, as Live Spaces will be no more. It’s not that I haven’t had ideas; I just haven’t been inspired to sit down in front of a computer after staring at a monitor for hours all day long. But today is different, I guess.

I make a point of scripting out all objects in my data warehouse and saving the object definitions in VSS. For table objects, one can’t just drop and recreate the object; otherwise all the existing data gets dropped as well. So for table objects, I use the following formula, sketched in code after the list. (I use a template that I’ll post on my SkyDrive account if anyone is interested.)

  • Drop all indexes on the table,
  • Drop all constraints, including the primary key constraint,
  • Create a table with the modified structure, with a _NEW suffix appended to the name (i.e. TableName_NEW),
  • Migrate the data from the existing table to the new table (i.e. INSERT INTO TableName_NEW (…) SELECT … FROM TableName),
  • Check the row counts in both tables and, if they are equal, drop the old table and rename the new one (DROP TABLE TableName; EXEC sp_rename 'TableName_NEW', 'TableName'),
  • Recreate constraints and indexes.
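To make the recipe concrete, here is a minimal sketch of the migrate-and-rename steps, using a hypothetical dbo.MyTable that is gaining a new nullable NewDimKey column. The real template also scripts out the indexes and constraints.

-- Minimal sketch of the migrate-and-rename recipe (hypothetical table and columns).
CREATE TABLE dbo.MyTable_NEW (
    MyTableKey  int         NOT NULL,
    ExistingCol varchar(50) NOT NULL,
    NewDimKey   int         NULL        -- the newly added dimension reference
);

INSERT INTO dbo.MyTable_NEW (MyTableKey, ExistingCol)
SELECT MyTableKey, ExistingCol
FROM   dbo.MyTable;

IF (SELECT COUNT(*) FROM dbo.MyTable) = (SELECT COUNT(*) FROM dbo.MyTable_NEW) BEGIN
    DROP TABLE dbo.MyTable;
    EXEC sp_rename 'dbo.MyTable_NEW', 'MyTable';
    -- Recreate the primary key, other constraints, and indexes here.
END;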

This approach had worked well until recently. In this particular situation, I needed to add another dimension reference to a fact table containing several million rows. The outline above works well except for the step that copies data from the existing table to the new one. As written above, that step is fully logged, and after several hours it would eventually fill the transaction log (I’m guessing, as I didn’t actually have the patience to let it finish). A slightly modified approach was needed. Following an approach used by SSIS and taking advantage of partitioning, instead of copying all the data at once, the following script was used:

DECLARE @sSQL varchar(8000);
DECLARE @iPartitionNo int;
DECLARE @iLeftBoundary int;
DECLARE @iRightBoundary int;
DECLARE @sFileGroupName varchar(10);

SET @iLeftBoundary = 0;

DECLARE cPartitions INSENSITIVE CURSOR FOR
       SELECT  CAST(prv.boundary_id AS int) AS PartitionNo,
               CAST(prv.value AS int) AS LeftPartitionBoundary,
               CAST(ds.name AS varchar(10)) AS FileGroupName
       FROM    sys.partition_range_values prv
       INNER   JOIN sys.partition_functions pfunc
               ON      prv.function_id = pfunc.function_id
       INNER   JOIN sys.partition_schemes psch
               ON      prv.function_id = psch.function_id
       INNER   JOIN sys.destination_data_spaces dsp
               ON      psch.data_space_id = dsp.partition_scheme_id
               AND     prv.boundary_id = dsp.destination_id
       INNER   JOIN sys.data_spaces ds
               ON      dsp.data_space_id = ds.data_space_id
       WHERE   pfunc.[name] = 'MyPartitionFunction';

OPEN cPartitions;
FETCH cPartitions INTO @iPartitionNo, @iRightBoundary, @sFileGroupName;

 

WHILE (@@FETCH_STATUS = 0) BEGIN
       IF EXISTS (
              SELECT  *
              FROM    sys.tables tbl
              INNER   JOIN sys.schemas sch
                      ON      tbl.schema_id = sch.schema_id
              WHERE   tbl.name = 'MyTableTemp'
                AND   sch.name = 'dbo'
       ) BEGIN
              DROP TABLE dbo.MyTableTemp;
       END;

       SET @sSQL = '
              SELECT  FieldList
              INTO    dbo.MyTableTemp
              FROM    dbo.MyTable
              WHERE   PartitioningField >= ' + CAST(@iLeftBoundary AS varchar) + '
                AND   PartitioningField <= ' + CAST(@iRightBoundary AS varchar);
       EXEC(@sSQL);

       SET @sSQL = '
              ALTER TABLE dbo.MyTableTemp ADD CONSTRAINT MyTableTemp_PK
              PRIMARY KEY (PartitioningField) ON [' + @sFileGroupName + ']';
       EXEC(@sSQL);

       SET @sSQL = '
              ALTER TABLE dbo.MyTableTemp ADD CONSTRAINT MyTableTemp_CK
              CHECK (PartitioningField >= ' + CAST(@iLeftBoundary AS varchar) + '
                AND  PartitioningField <= ' + CAST(@iRightBoundary AS varchar) + ')';
       EXEC(@sSQL);

       SET @sSQL = '
              ALTER TABLE dbo.MyTableTemp SWITCH TO dbo.MyTable_NEW
              PARTITION ' + CAST(@iPartitionNo AS varchar);
       EXEC(@sSQL);

       SET @iLeftBoundary = @iRightBoundary + 1;
       FETCH cPartitions INTO @iPartitionNo, @iRightBoundary, @sFileGroupName;
END;

CLOSE cPartitions;
DEALLOCATE cPartitions;

 

DECLARE @OldRowCount int;
DECLARE @NewRowCount int;

SELECT @OldRowCount = COUNT(*)
FROM   dbo.MyTable;

SELECT @NewRowCount = COUNT(*)
FROM   dbo.MyTable_NEW;

SELECT @OldRowCount, @NewRowCount;

IF (@OldRowCount = @NewRowCount) BEGIN
       DROP TABLE dbo.MyTable;
       EXEC sp_rename 'dbo.MyTable_NEW', 'MyTable';
END ELSE BEGIN
       DROP TABLE dbo.MyTable_NEW;
END;

Those who’ve worked with me must be floored that I’m posting T-SQL code using a cursor. I normally break into hives when I see a cursor being used. But in this case, short of migrating to SSIS, it was the best I could do.
 
The SELECT… INTO dbo.MyTableTemp operation is minimally logged rather than fully logged, so instead of taking minutes to run, it takes seconds. On my lightweight development server this step took about 30 seconds to copy over 10 million rows with over 30 columns (all numeric, of course; it is a fact table, after all). Once the temp table is created, in order for it to be slid into the _NEW table, the source table and the destination partition have to be located on the same filegroup, and the source has to carry a constraint guaranteeing its rows fall within the destination partition’s boundaries. By adding a primary key constraint on the target partition’s filegroup and attaching the matching check constraint, both conditions are satisfied.
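If you want to double-check that the staging table actually landed on the expected filegroup before attempting the switch, a quick query against the catalog views does the trick (a sketch, assuming the dbo.MyTableTemp name from the script above):

-- Sanity check: which filegroup holds dbo.MyTableTemp's heap or clustered index?
-- Compare the result to @sFileGroupName before running ALTER TABLE ... SWITCH.
SELECT fg.name AS StagingFileGroup
FROM   sys.indexes i
INNER  JOIN sys.filegroups fg
       ON   fg.data_space_id = i.data_space_id
WHERE  i.object_id = OBJECT_ID('dbo.MyTableTemp')
  AND  i.index_id IN (0, 1);    -- 0 = heap, 1 = clustered index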
 
If anyone finds themselves in the same predicament, I hope they find this approach useful.