Adding Subtotals using GROUPING SETS
I recently had to convert an existing report constructed against SQL Server 2000 to our new SQL Server 2008 dimensional data warehouse. The output of the report procedure was a single set of aggregated totals by different levels of the date dimension. After reviewing the code for some time, which consisted of several pages of stream of consciousness SQL and the creation of a multitude of temp tables, I arrived at the conclusion that there just had to be a better way. That’s when I stumbled upon a little documented enhancement in SQL Server 2008 called GROUPING SETS.
Using the AdventureWorksDW sample database as a source, the use of GROUPING SETS lets us produce output like below using a single SQL statement.
How is this done? Lets first consider the case where we want to generate an aggregate by month. Using a GROUP BY statement, this result set is produced by the following:
SELECT ordDate.CalendarYear,
ordDate.CalendarQuarter,
ordDate.MonthNumberOfYear,
ordDate.EnglishMonthName,
SUM(fact.SalesAmount) AS SalesAmount
FROM dbo.FactInternetSales fact
INNER JOIN dbo.DimTime ordDate
ON fact.OrderDateKey = ordDate.TimeKey
GROUP BY
ordDate.CalendarYear,
ordDate.CalendarQuarter,
ordDate.MonthNumberOfYear,
ordDate.EnglishMonthName
SELECT ordDate.CalendarYear,
ordDate.CalendarQuarter,
ordDate.MonthNumberOfYear,
ordDate.EnglishMonthName,
SUM(fact.SalesAmount) AS SalesAmount
FROM dbo.FactInternetSales fact
INNER JOIN dbo.DimTime ordDate
ON fact.OrderDateKey = ordDate.TimeKey
GROUP BY
GROUPING SETS (
(
ordDate.CalendarYear,
ordDate.CalendarQuarter,
ordDate.MonthNumberOfYear,
ordDate.EnglishMonthName
)
)
The two statements produce the same exact output. However, the use of GROUPING SETS lets me specify another vector aggregate, say just Year and Quarter like below
SELECT ordDate.CalendarYear,
ordDate.CalendarQuarter,
ordDate.MonthNumberOfYear,
ordDate.EnglishMonthName,
SUM(fact.SalesAmount) AS SalesAmount
FROM dbo.FactInternetSales fact
INNER JOIN dbo.DimTime ordDate
ON fact.OrderDateKey = ordDate.TimeKey
GROUP BY
GROUPING SETS (
(
ordDate.CalendarYear,
ordDate.CalendarQuarter,
ordDate.MonthNumberOfYear,
ordDate.EnglishMonthName
),
(
ordDate.CalendarYear,
ordDate.CalendarQuarter
)
)
This statement produces aggregates at two different levels, Month and Quarter, of the time dimension.
To add Yearly totals and GrandTotals, two new tuples are added to the GROUPING SETS function, (ordDate.CalendarYear), () so that the final statement looks like the following.
SELECT ordDate.CalendarYear,
ordDate.CalendarQuarter,
ordDate.MonthNumberOfYear,
ordDate.EnglishMonthName,
SUM(fact.SalesAmount) AS SalesAmount
FROM dbo.FactInternetSales fact
INNER JOIN dbo.DimTime ordDate
ON fact.OrderDateKey = ordDate.TimeKey
GROUP BY
GROUPING SETS(
(
ordDate.CalendarYear,
ordDate.CalendarQuarter,
ordDate.MonthNumberOfYear,
ordDate.EnglishMonthName
), — Month summary
(
ordDate.CalendarYear,
ordDate.CalendarQuarter
), — Quarter summary
(
ordDate.CalendarYear
), — Year summary
() — Grand Totals
)
Quite beautiful.
Modifying a Huge Table and SQL Server Partition Sliding
It’s been quite some time since I last posted. Since last time, I’ve moved my blog to WordPress as Live Spaces will be no more. Not that I’ve not had ideas. Just not been inspired to sit down and look at a computer after staring at a monitor for hours all day long. But today is different I guess.
I make the point of scripting out all objects in my data warehouse and saving the object definitions in VSS. For table objects, one can’t just drop the object and recreate otherwise all the existing data gets dropped also. So for table objects, I use the following formula: (I use a template that I’ll post on my SkyDrive account if anyone is interested.)
- Drop all indexes on the table,
- Drop all constraints including the primary key constraint,
- Create a table with the modified structure with a _NEW suffix appended to the end (i.e. TableName_NEW)
- Migrate the data from the existing table to the new table (i.e INSERT INTO TableName_NEW (……) SELECT ….. FROM TableName)
- Check rowcounts in both tables and if equal, drop the old table and rename the new table (DROP TABLE TableName; EXEC sp_rename ‘TableName_NEW’, ‘TableName’
- Recreate constraints and indexes.
This approach has worked well until recently. In this particular situation, I needed to add another dimension reference to a fact table containing several millions of rows. The outline above works well except for the step copying data from the existing to the new table. As written above, this step is logged and after several hours, eventually fills the transaction log (I’m guessing as I didn’t actually have the patience to attempt). A slightly modified approach was needed. Following an approach used by SSIS and taking advantage of partitioning, instead of copying all the data at once, the following script was used:
DECLARE @sSQL varchar(8000);
DECLARE @iPartitionNo int;
DECLARE @iLeftBoundary int;
DECLARE @iRightBoundary int;
DECLARE @sFileGroupName varchar(10);
SET @iLeftBoundary = 0;
DECLARE cPartitions INSENSITIVE CURSOR FOR
SELECT CAST(prv.boundary_id AS int) AS PartitionNo,
CAST(prv.value AS int) AS LeftPartitionBoundary,
CAST(ds.name AS varchar(10)) AS FileGroupName
FROM sys.partition_range_values prv
INNER JOIN sys.partition_functions pfunc
ON prv.function_id = pfunc.function_id
INNER JOIN sys.partition_schemes psch
ON prv.function_id = psch.function_id
INNER JOIN sys.destination_data_spaces dsp
ON psch.data_space_id = dsp.partition_scheme_id
AND prv.boundary_id = dsp.destination_id
INNER JOIN sys.data_spaces ds
ON dsp.data_space_id = ds.data_space_id
WHERE pfunc.[name] = ‘MyPartitionFunction’;
OPEN cPartitions;
FETCH cPartitions INTO @iPartitionNo, @iRightBoundary, @sFileGroupName;
WHILE (@@FETCH_STATUS = 0) BEGIN
IF EXISTS (
SELECT *
FROM sys.tables tbl
INNER JOIN sys.schemas sch
ON tbl.schema_id = sch.schema_id
WHERE tbl.name = ‘MyTableTemp’
AND sch.name = ‘dbo’
) BEGIN
DROP TABLE dbo.MyTableTemp;
END;
SET @sSQL = ‘
SELECT FieldList
INTO dbo.MyTableTemp
FROM dbo.MyTable
WHERE PartitioningField >= CAST(@iLeftBoundary AS varchar)+ ‘
AND PartitioningField <=’ + CAST(@iRightBoundary AS varchar)
EXEC(@sSQL);
SET @sSQL = ‘
ALTER TABLE dbo.MyTableTemp ADD CONSTRAINT MyTableTemp_PK
PRIMARY KEY (PartitioningField) ON [‘ + @sFileGroupName + ‘]’;
EXEC(@sSQL);
SET @sSQL = ‘
ALTER TABLE dbo.MyTableTemp ADD CONSTRAINT MyTableTemp_CK
CHECK ‘+ ‘(PartitioningField >= ‘ +
CAST(@iLeftBoundary AS varchar) + ‘
AND
PartitioningField <= ‘ +
CAST(@iRightBoundary AS varchar) +
‘)’;
EXEC(@sSQL);
SET @sSQL = ‘
ALTER TABLE dbo.MyTableTemp SWITCH TO dbo.MyTable_NEW
PARTITION ‘ + CAST(@iPartitionNo AS varchar);
EXEC(@sSQL);
SET @iLeftBoundary = @iRightBoundary + 1;
FETCH cPartitions INTO @iPartitionNo, @iRightBoundary, @sFileGroupName;
END;
CLOSE cPartitions;
DEALLOCATE cPartitions;
DECLARE @OldRowCount int
DECLARE @NewRowCount int
SELECT @OldRowCount = COUNT(*)
FROM dbo.MyTable
SELECT @NewRowCount = COUNT(*)
FROM dbo.MyTable_NEW
SELECT @OldRowCount, @NewRowCount
IF (@OldRowCount = @NewRowCount) BEGIN
DROP TABLE dbo.MyTable
EXEC sp_rename ‘dbo.MyTable_NEW’, ‘MyTable’
END ELSE BEGIN
DROP TABLE dbo.MyTable_NEW
END