Archive

Posts Tagged ‘SQL Server’

Modifying a Huge Table and SQL Server Partition Sliding

November 2, 2010 2 comments

It’s been quite some time since I last posted. Since last time, I’ve moved my blog to WordPress as Live Spaces will be no more. Not that I’ve not had ideas. Just not been inspired to sit down and look at a computer after staring at a monitor for hours all day long. But today is different I guess.

I make the point of scripting out all objects in my data warehouse and saving the object definitions in VSS. For table objects, one can’t just drop the object and recreate otherwise all the existing data gets dropped also. So for table objects, I use the following formula: (I use a template that I’ll post on my SkyDrive account if anyone is interested.)

  • Drop all indexes on the table,
  • Drop all constraints including the primary key constraint,
  • Create a table with the modified structure with a _NEW suffix appended to the end (i.e. TableName_NEW)
  • Migrate the data from the existing table to the new table (i.e INSERT INTO TableName_NEW (……) SELECT ….. FROM TableName)
  • Check rowcounts in both tables and if equal, drop the old table and rename the new table (DROP TABLE TableName; EXEC sp_rename ‘TableName_NEW’, ‘TableName’
  • Recreate constraints and indexes.

This approach has worked well until recently. In this particular situation, I needed to add another dimension reference to a fact table containing several millions of rows. The outline above works well except for the step copying data from the existing to the new table. As written above, this step is logged and after several hours, eventually fills the transaction log (I’m guessing as I didn’t actually have the patience to attempt). A slightly modified approach was needed. Following an approach used by SSIS and taking advantage of partitioning, instead of copying all the data at once, the following script was used:

DECLARE @sSQL varchar(8000);

DECLARE @iPartitionNo int;

DECLARE @iLeftBoundary int;

DECLARE @iRightBoundary int;

DECLARE @sFileGroupName varchar(10);

 

SET @iLeftBoundary = 0;

DECLARE cPartitions INSENSITIVE CURSOR FOR    

       SELECT  CAST(prv.boundary_id AS int) AS PartitionNo,

               CAST(prv.value AS int) AS LeftPartitionBoundary,

               CAST(ds.name AS varchar(10)) AS FileGroupName

       FROM    sys.partition_range_values prv

       INNER   JOIN sys.partition_functions pfunc

              ON      prv.function_id = pfunc.function_id

       INNER   JOIN sys.partition_schemes psch

              ON      prv.function_id = psch.function_id

       INNER   JOIN sys.destination_data_spaces dsp

              ON      psch.data_space_id = dsp.partition_scheme_id

              AND     prv.boundary_id = dsp.destination_id

       INNER   JOIN sys.data_spaces ds

              ON      dsp.data_space_id = ds.data_space_id

       WHERE   pfunc.[name]            = ‘MyPartitionFunction’;

OPEN cPartitions;

FETCH cPartitions INTO @iPartitionNo, @iRightBoundary, @sFileGroupName;

 

WHILE (@@FETCH_STATUS = 0) BEGIN

       IF EXISTS (

              SELECT  *

              FROM    sys.tables tbl

              INNER   JOIN sys.schemas sch

                      ON      tbl.schema_id = sch.schema_id

              WHERE   tbl.name            = ‘MyTableTemp’

                AND   sch.name            = ‘dbo’

       ) BEGIN

              DROP TABLE dbo.MyTableTemp;

       END;

      

       SET @sSQL =

              SELECT  FieldList

              INTO    dbo.MyTableTemp

              FROM    dbo.MyTable

               WHERE   PartitioningField >= CAST(@iLeftBoundary AS varchar)+    

                 AND   PartitioningField <=’ + CAST(@iRightBoundary AS varchar)

          

       EXEC(@sSQL);

      

       SET @sSQL =

       ALTER TABLE dbo.MyTableTemp ADD CONSTRAINT MyTableTemp_PK

       PRIMARY KEY (PartitioningField) ON [‘ + @sFileGroupName + ‘]’;

       EXEC(@sSQL);

      

       SET @sSQL =

              ALTER TABLE dbo.MyTableTemp ADD CONSTRAINT MyTableTemp_CK

              CHECK ‘+ ‘(PartitioningField >= ‘ +

                      CAST(@iLeftBoundary AS varchar) +

              AND

              PartitioningField <= ‘ +

                      CAST(@iRightBoundary AS varchar) +

              ‘)’;

       EXEC(@sSQL);

      

       SET @sSQL =

              ALTER TABLE dbo.MyTableTemp SWITCH TO dbo.MyTable_NEW

              PARTITION ‘ + CAST(@iPartitionNo AS varchar);

       EXEC(@sSQL);

      

       SET @iLeftBoundary = @iRightBoundary + 1;

       FETCH cPartitions INTO @iPartitionNo, @iRightBoundary, @sFileGroupName;

 

END;

CLOSE cPartitions;

DEALLOCATE cPartitions;

 

DECLARE @OldRowCount int

DECLARE @NewRowCount int

 

SELECT @OldRowCount = COUNT(*)

FROM   dbo.MyTable

 

SELECT @NewRowCount = COUNT(*)

FROM   dbo.MyTable_NEW

      

SELECT @OldRowCount, @NewRowCount

      

IF (@OldRowCount = @NewRowCount) BEGIN

       DROP TABLE dbo.MyTable

       EXEC sp_rename ‘dbo.MyTable_NEW’, ‘MyTable’

END ELSE BEGIN

       DROP TABLE dbo.MyTable_NEW

END

Those who’ve worked with me must be floored that I’m posting TSQL code using a cursor. I normally break into hives when I see a cursor being used. But in this case, short of migrating to SSIS, it was the best I could do.
 
The SELECT… INTO dbo.TableNameTemp operation is not logged. So instead of taking minutes to run, it takes seconds. This step on my lightweight development server took about 30 seconds to copy over 10 million rows with over 30 columns (all numeric, of course. It is a fact table after all) Once the temp table was created, in order to be slid into the _NEW table, the source and destination partitions have to be located on the same file group and the two partitions have to have the same constraints applied. By adding a primary  constraint to the temp table and creating it on the same partition as the target table, both conditions are satisfied.
 
If anyone finds themselves in the same predicament, I hope they find this approach useful.