Show of hands: how many of you have ever heard corporate leadership say something like, “We are moving everything into the cloud and closing down our data center so we can scale faster and save money”? Maybe you heard the slightly constrained corollary, “We are moving all of our analytics into the cloud and getting rid of our on-premise data warehouse so we can scale faster and save money.” How’d that work out?
Let’s focus right now on data and analytics, but the same ideas apply to other areas. It’s so very alluring, especially at first. After all, moving everything to the cloud appears to resolve the two biggest complaints that business has about IT: speed and cost. Everything takes too long and costs too much. The cloud promised to fix it all, at least according to the 8×10 glossies: self-service, unlimited space, and lower cost. And empowerment!
It usually starts this way:
In the cloud we don’t have to wait for the data team. Their backlog is overflowing and it’ll be at least six months before they get to my project. I need answers now. Besides, how do I even know that their data will solve my problem? And don’t get me started with the governance committee and their ivory tower processes. Supposedly I got assigned to be a Data Sewer or Steward or something.
Anyway, I have a couple people who know a couple things. We’ll just pull the data and build it ourselves. Call the cloud platform salesperson who met with the team a couple months ago. He said that we could build it faster and for less than the data team had quoted. Just activate some tools, point-and-click, and we’ve got a data pipeline, repository, access tools, and numbers we can trust (because they’re our numbers). And storage is practically free.
It always starts out free. Free like a puppy.
With apologies to Rush, choosing to not manage data is itself a strategic choice. It’s understandable. It’s self-reinforcing. And for a while the strategy appears to be successful. Results are delivered faster. Chicken lunches all around!! The siren song is heard by leadership in other areas and they want to complete their analyses and deploy their applications just as fast. So, they imitate the pattern, creating their own data feeds and their own repositories. Now I have numbers I trust and I got them fast. More chicken lunches!!
Before long, the analytical environment starts to receive industry kudos. Everybody wants to be the center of attention at the Executive Leadership Summit. Who wants to be bothered with a measly terabyte-sized analytical environment? Our company’s is approaching an exabyte. Do you even know what an exabyte is?
The result is data multiplying uncontrollably like … you know.
And then one day, the numbers shared by two organization heads in an executive meeting don’t match. They haven’t been matching for a while, but at least they were pretty close. Close enough. Now, they’re not so close anymore. A meeting that had been called to address a business issue is now consumed with arguing over whose numbers are correct. A tiger team is created to identify the source of the differences and reconcile the numbers.
It isn’t long before it happens again. And again. We’re going to run out of tiger teams.
But the problems don’t end there. Your one data feed might have appeared to be less expensive to deploy. And his. And hers. And theirs. Over time, the company ends up with duplicate pipelines processing and storing the same data. Each little repository brings with it its own ingestion logic, transformations, storage, and maintenance burden. And then these little repositories become data sources with their own downstream dependencies. Nobody who deploys a private repository because it’s cheaper and easier anticipates having to support anyone else.
The enterprise loses its economies of scale.
Remember those mismatched report numbers and the tiger teams tasked with reconciling them? Maybe there are some people who relish the opportunity to do data reconciliation projects, but I’m not sure I know many, if any. Lineage is hard to trace, and debugging is more like archeology.
What started out as a way to reduce time and cost is now failing on both fronts.
By the time you realize that you have been lured down a bad path, you’ve got a huge mess on your hands. Some companies will doggedly cling to this approach, even convincing themselves that it is the preferred one. I believe they recognize, or eventually recognize, that they are fooling themselves, but they continue in that direction anyway, not because that’s what they want to do, but because they find changing course too daunting. Well, yes. The longer you keep making a mess, the longer it’s going to take to clean up. Entropy only moves in one direction, and it takes effort to reverse it.
There is something between overburdensome governance and data anarchy.
Stay tuned for that in “Taming the Data Rabbits.”