What’s not to love about a good enterprise analytical data consolidation project? You can just see the PowerPoint slide, can’t you? On the left, boxes like stars in a clear night sky, each representing some number of analytical repositories or data sources. On the right, one nice big box. Neat, clean, and easy for management to understand. Mess on the left: bad. Tidiness on the right: good. Go forth and do.

If you’ve been working with corporate data for any length of time, you’ve probably experienced this. Maybe it was following a corporate acquisition. Got to get all of the new company’s data into the warehouse. Maybe it was part of an application rationalization initiative. Got to see all the data those source systems are generating. Maybe it was an attempt to consolidate analytical repositories or tools. Got to get all of the data under the same roof. 

And fast. Just get it all in there and we’ll figure it out later.

No sweat. We’ve been hauling around data for years. We have processes and applications and infrastructure that facilitate data movement. Plus, we get to grow our enterprise analytics estate. More data. Larger environment. More teams coming to us to get stuff. And best of all, we get to claim that we retired a bunch of other repositories. Dollars to the bottom line. Chicken lunches all around. All of the incentives are aligned.

Except for the most important one.

We mistakenly believe that merely bringing the data together will somehow make it more usable.

I suppose if it’s all together in the same database, it’s easier to access. But that’s not the same thing as making it more usable. Usability requires understanding. And in these kinds of projects, we too often just start shoveling data higgledy-piggledy as fast as we can, without any thought as to what it is or how it’s going to be used.

Of course, there’s always time for each individual user to complete a week-long research project to figure out what the data contains before they use it. There’s always time to rewrite the reports that were wrong because the data was misunderstood. There’s always time to clean up the mess that gets created. There’s always time for the next rationalization project.

Consolidation projects are similar to the migration projects I discussed a couple of weeks ago (Part 1, Part 2): you end up doing a whole lot of work, but you don’t really move the company forward much. The problem is that the “moving the company forward” part often doesn’t seem to matter, because it’s more difficult to measure. You can put up a big board with a tally of the tables or environments or applications moved and watch the numbers tick up. It’s harder to quantify the Data Debt. The result is that the focus is on the mechanics of moving the data, not on the data itself.

That effort would have been better spent understanding the data.

It is more important to consolidate the information about the data than it is to consolidate the data itself.

If your objective is to consolidate the data just for the sake of consolidating the data, then I suppose you’ve accomplished your goal. Of course, you lied to your management when you showed them that PowerPoint slide at the beginning. Instead of a single box all neat and tidy on the right side of the slide, you should have shown them a single box with the constellation from the left side jammed inside of it. You haven’t really changed anything except that all those little boxes are now inside the big box. Congratulations! 

It’s even worse if your objective is application or analytics rationalization. All you’ve succeeded in doing is putting off the real work. Rationalization requires that you understand the data, so do that up front. You’re going to have to do it at some point anyway, and if you defer it, the probability that management loses interest and abandons the effort altogether is high. Then you’re left with a bigger mess than you started with, having done a bunch of work without accomplishing much of anything.

On the other hand, when you make the effort to understand the data first, you may save yourself some work. You may discover that some of the applications or data are redundant and don’t need to be moved at all. You may discover that you only need a small number of fields from certain sources. You may discover that some of these applications should be deprecated. And if you’re pursuing a modern data architecture, you may discover that a Data Fabric could obviate the need to move the data at all.

The problem is that this approach takes vision and foresight and discipline. It also means that you have to change the way you measure progress. Don’t count sources and tables and gigabytes moved. Instead, count sources and tables and gigabytes understood, which means, at a minimum, documented definitions and expected content.

Look, or perhaps understand, before you leap.