Data Warehouse

A DataWarehouse is a database that captures transactions, or summaries of transactions, from one or more source systems (SourceSystem). Its main purpose is the analysis of history. A good model for a DataWarehouse is a StarSchema.
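
To make the StarSchema idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_sales, dim_date, dim_product), the columns, and the sample rows are all invented for illustration; a real warehouse would have many more dimensions and far more rows. The point is only the shape: one central fact table of transactions, surrounded by small descriptive dimension tables that it joins to.

```python
import sqlite3

def build_star_schema(conn):
    # One fact table in the middle, with dimension tables around it.
    # All table and column names here are hypothetical.
    conn.executescript("""
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE fact_sales (
            date_id INTEGER REFERENCES dim_date(date_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            quantity INTEGER,
            amount REAL);
    """)

def load_sample(conn):
    # Made-up rows standing in for data captured from a source system.
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2013, 12), (2, 2014, 1)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "widget", "hardware"), (2, "manual", "books")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 3, 30.0), (1, 2, 1, 12.5), (2, 1, 2, 20.0)])

def sales_by_year_and_category(conn):
    # A typical historical question: join facts to dimensions, then total.
    return conn.execute("""
        SELECT d.year, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d ON d.date_id = f.date_id
        JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY d.year, p.category
        ORDER BY d.year, p.category
    """).fetchall()
```

Calling build_star_schema and load_sample on sqlite3.connect(":memory:") and then sales_by_year_and_category returns one total per year and category. Most reporting queries against a star schema follow this same join-then-aggregate pattern.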

Having done a couple of these, may I invite a discussion of ODS (OperationalDataStore) -- i.e. a really big historical database -- and the "classical" Data Warehouse -- i.e. a series of snapshots (marts) of dataset states (often denormalized), frozen in their moment and indexed for research and for the inevitable contorted reporting requirements needed to serve management fads?

I've worked with both. I find that it is common to refer to a (really big) ODS as a warehouse, and I guess I'm sort of okay with that, in that it allows me to say "Data Warehouse" on my resume. To be fair, we really did do warehouse things with the ODS, so that's okay then.


-- GarryHamilton

I am not crazy about the term "warehouse" but I get it.

I like the idea of an OperationalDataStore being the current edge of the DataWarehouse. Most applications just need the ODS. Some applications need to go to the warehouse for more data.


On revisiting this term, I find that the term "warehouse" is, unhappily, too nearly descriptive of the use that the data actually gets.

There is this fascination (fixation?) with the idea of "having all the data ever" in one place, and treating it like an n-dimensional "cube" that can be "cut" along any axis to yield some kind of "interesting" correlation (this is the subject of many DataMining exercises).
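
For readers who have not met the "cube" idea, a rough sketch in plain Python may help. The facts, the axis names (region, product, quarter), and the helper names roll_up and slice_cube are all hypothetical; real OLAP tools do this at scale with precomputed aggregates, which this sketch does not attempt.

```python
from collections import defaultdict

# Made-up fact rows: each one is a point in a 3-dimensional "cube".
FACTS = [
    {"region": "east", "product": "beer",    "quarter": "Q1", "amount": 100},
    {"region": "east", "product": "diapers", "quarter": "Q1", "amount": 40},
    {"region": "west", "product": "beer",    "quarter": "Q1", "amount": 70},
    {"region": "east", "product": "beer",    "quarter": "Q2", "amount": 120},
    {"region": "west", "product": "diapers", "quarter": "Q2", "amount": 30},
]

def roll_up(facts, axes, measure="amount"):
    """Total the measure along the chosen axes, collapsing all the others."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[axis] for axis in axes)
        totals[key] += row[measure]
    return dict(totals)

def slice_cube(facts, **fixed):
    """Keep only the rows where each named axis has the given value."""
    return [row for row in facts
            if all(row[axis] == value for axis, value in fixed.items())]
```

"Cutting along an axis" is then roll_up(FACTS, ["region"]) or roll_up(FACTS, ["product"]), and the two can be combined, as in roll_up(slice_cube(FACTS, quarter="Q1"), ["product"]). The mechanics are trivial; knowing which cut is worth making is the hard part.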

The DataWarehouse is one place where both data manipulation skills and domain knowledge, along with an active imagination, are necessary to get anything meaningful out of it. If you don't know the domain, you won't know what questions to ask, nor what the answers mean, or even if the answers are meaningful at all. If you can't manipulate the data (it helps to be a magician here), then you may know what question to ask, but not how to ask it. And, of course, even domain expertise and data magic won't help if you can't conceive of asking the unasked.

And so the DataWarehouse is often unused or under-used. Specific department heads ask for specific slices of data, staged over time, to prove the point du jour. Controllers and treasurers use different slices to confirm their suspicions or crow over their predictive skills.

The thing that seldom gets asked is, "what's in there that we haven't thought of, that will tell us something new and useful?" Remember, for example, the oft-told (and possibly apocryphal) story of a convenience store chain discovering that diapers (nappies) and beer appeared together on a surprising number of receipts, leading to a shift in where merchandise was displayed. As novel as that finding was, the question was not really that far from the "edge" of sales thinking. What might they have discovered if an "off-the-edge" question had been asked?
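
The diapers-and-beer kind of question amounts to counting which pairs of items show up on the same receipt. Here is a minimal sketch of that count, assuming each receipt is just a list of item names. The function name pair_counts and the sample receipts are invented for illustration; this is not the method any actual chain used.

```python
from collections import Counter
from itertools import combinations

def pair_counts(receipts):
    """Count how many receipts contain each unordered pair of items."""
    counts = Counter()
    for items in receipts:
        # set() so a repeated item counts once per receipt;
        # sorted() so (a, b) and (b, a) land on the same key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts

# Made-up receipts, purely for illustration.
RECEIPTS = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "chips"],
    ["diapers", "wipes", "beer"],
]
```

With these sample receipts, pair_counts(RECEIPTS).most_common(1) puts ("beer", "diapers") on top with a count of 3. A raw count like this only surfaces pairs somebody thought to count, which is rather the point made above.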

I wonder if the term itself ("warehouse") tends to limit the usefulness of the vast quantities of historical stuff sitting in crates in subterranean drive arrays.

After all, who the heck goes "mining" in a bloomin' warehouse? Maybe a different term would encourage better engagement with all this otherwise just-archival information.

-- GarryHamilton

Last edited November 9, 2014