## Do I really need a data warehouse?

I recently received a link to a BI Requirements for IT white paper from Tableau Software and SQL Server magazine. Though most of the points I strongly agree with, the paper seems to suggest that building a data warehouse is simply unnecessary. Other than the fact that operational systems that run the business and the decision support systems from which the business is analyzed should not share the same physical resources, there is a fundamental mathematical reason data is stored redundantly in a well-architected IT data infrastructure.

Sorry to sound like a pure academic here, but it comes with being in school way too long. And I do mean, way, way, way too long. Like my academic training in chemical physics, data warehousing is also essentially applied elementary linear algebra. So as a simple example, consider the description of a point in space as shown in the figure. The most familiar representation is in a Cartesian or xyz coordinate system. In this case, a point is defined by specifying the distance along the x axis, along with the distance along the y-axis, and the height along the z-axis. In a real world implementation, the x and y coordinates would represent the distance along two walls of a room and the z-axis the distance from the floor.

However, a Cartesian coordinate system is not the only possible representation. Spherical coordinates could also be used. In this case, the point is defined by specifying the distance of a radial vector from the origin, r, and two angles, Θ and Φ. In both Cartesian and spherical coordinates, all three axes are orthogonal or mutually independent. Additionally, a linear transformation can be applied to convert one representation to the other.

Is one representation better than another? The simple answer is it depends. If the physical problem being addressed is solving an equation of motion subjected to a linear force, Cartesian coordinates would be more appropriate. Likewise, if the problem being addressed is the description of a Kepler orbit, spherical coordinates are more appropriate.

So what does this have to do with business systems? In an oversimplified description, there are two principle data models used in business applications. A normalized model is applied when there should be no redundancies in storage and is most appropriate for capturing business transactions. It’s generally a poor model when applied to addressing analysis of a business due to the complexity of the model and the amount of physical resources needed to address joining resources from disparate data sources. Therefore, rather than use the same data model for analyzing a business, a dimensional model as described by Ralph Kimball is more appropriate. And, as a linear transformation can be applied to convert from a Cartesian to a spherical coordinate system, a linear transformation (SQL Server Integration Services if using the Microsoft BI tool stack) can be applied to transform between the normal model and a dimensional model.

I know this is long and boring. However, to answer the question posed in the title, drawing an analogy with a simple physical problem, a data warehouse is absolutely essential.