… and your data warehouse / data lake / data lakehouse.
A few months ago, I talked about how nearly all of our analytics architectures are Stuck in the 1990s. Maybe an executive at your company read that article, and now you have a mandate to “Modernize Analytics.” Let’s say that they even understand that just putting everything into the cloud or using cloud tools doesn’t count as modernization. They want to take advantage of some kind of Data Cloth or Fabric or Mesh or something. That’s OK. They’re on the right track. You can fix the vocabulary later. Mainly it’s a good thing that your executive wants to move analytics forward, and now it’s your job to do it.
You start thinking about modern architectures like Data Mesh and Data Fabric, and the Data Products that are fundamental to them. You consider the gap between current state and future state. You begin to list the things that your team will have to do in order to make that leap, and it suddenly hits you that you’ve been looking at your analytics resources in the wrong way.
Data quantity and availability have traditionally been prioritized over quality, understanding, and usability.
The whole self-perception of analytics teams tends to sound something like, “Our Data Lake has ten gazillion petabytes of data from more than a bazillion source systems!!” Demand from business users tends to sound something like, “Just get the data into your lakehouse today and I’ll figure it out from there.”
Information management folks are left chasing the train as it speeds away, and having conversations that tend to sound something like, “How much of that data do you understand, in the sense of knowing what it means or what it’s supposed to contain?” “Some.” “Some? Like what percentage?” “Well, maybe one if you round up.”
You stare into the seemingly overwhelming reality that very, very few of your feeds, streams, or data sets satisfy the requirements for a Data Product.
It amazes me that we’ve managed to be reasonably successful. What a mess. Analytics users at nearly every company with which I’ve discussed this topic have referred derisively to their Data Lake as a Data Swamp or Cesspool or Quagmire. The enterprise analytics team is often viewed no more favorably, most often seen as a bottleneck. After all, the implementation of hundreds, maybe thousands, of data feeds is dependent upon a single team. And the responsibility for all those feeds falls on the back of that team. Would that be tolerated in any other domain? Of course not. This approach is not scalable, it’s not sustainable, and it’s not the best use of resources.
Which leads you to a second realization:
Enterprise analytics modernization cannot happen with an enterprise analytics team that is responsible for the care and feeding of the data.
The typical enterprise analytics team evaluates requests for new data and works with the source systems to establish data file feeds and/or transaction streams. Sometimes the source system team pushes the data into the analytical ecosystem, and sometimes the enterprise team reaches into the source system repository and retrieves the data. The enterprise team then loads the data, manages the feed, and supports the users.
As far as the source system team is concerned, once the data is over the wall that’s it. Hopefully the data gets to where it needs to go. Hopefully it’s usable. And if there’s a question about content or quality, the response is always the same: “The data is good enough to run the business. It’s good enough for you. But if you feel you have to, submit a support ticket and we’ll get to it never.” In short, analytics is not considered to be within the scope of a source system team’s responsibility.
Modernizing analytics requires the material participation of the business process teams, and that will require an enterprise mindset shift.
Nuts. You knew this was coming. Your mandate to “Modernize Analytics” probably came with the “don’t adversely impact any other teams” constraint attached.
Many companies don’t have dedicated business process teams, but rather partnerships between a business team that defines the business process and a system team that implements that business process. And analytics is rarely on their radar.
The first job of executive management in analytics modernization is to make it clear that analytics is part of the business process teams’ responsibility, whether that means the consolidated business process team or the business/system team partnership.
The focus must be on people and process. Not technology. These business process teams, regardless of how they’re constituted, must be responsible for creating Foundational Data Products, which include:
- Content
- Curation, with Definitions and Expected Content at a Minimum
- Transportation into the Analytical Environment (I’ll talk more about architectures later)
- Content Monitoring
- Maintenance and User Support
- Lifecycle Management
And it can’t be “submit a ticket and we’ll get to it never.” It must be a priority. Think of analytics as another business process for which each business process team is responsible.
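To make that list of components a little more concrete, here is a minimal sketch of how a business process team might describe a Foundational Data Product, and how anyone could check which components are still missing. Everything in it is an assumption for illustration: the `FoundationalDataProduct` class, its field names, and the `missing_components` function are hypothetical and do not come from any standard or tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FoundationalDataProduct:
    """Hypothetical descriptor; the field names are illustrative only."""
    name: str
    owning_team: str  # the business process team accountable for the product
    columns: List[str] = field(default_factory=list)                # content
    definitions: Dict[str, str] = field(default_factory=dict)       # column -> meaning
    expected_content: Dict[str, str] = field(default_factory=dict)  # column -> expectation
    transport: str = ""            # how data reaches the analytical environment
    monitoring_enabled: bool = False
    support_contact: str = ""
    lifecycle_stage: str = ""      # for example "draft", "active", "retired"


def missing_components(product: FoundationalDataProduct) -> List[str]:
    """Return the components from the list above that are not yet in place."""
    gaps: List[str] = []
    if not product.columns:
        gaps.append("content")
    # Every column needs a definition and a statement of expected content.
    if not product.definitions or any(
        c not in product.definitions for c in product.columns
    ):
        gaps.append("definitions")
    if not product.expected_content or any(
        c not in product.expected_content for c in product.columns
    ):
        gaps.append("expected content")
    if not product.transport:
        gaps.append("transportation")
    if not product.monitoring_enabled:
        gaps.append("content monitoring")
    if not product.support_contact:
        gaps.append("maintenance and user support")
    if not product.lifecycle_stage:
        gaps.append("lifecycle management")
    return gaps
```

A feed that has only a name and an owner would come back with every component listed as a gap, which is roughly where most existing feeds start.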
Meanwhile, the enterprise analytics team prepares by developing the tools and processes that will enable the business process teams to do their job with as little friction as possible.
The key here is “with as little friction as possible.” Analytics and information management teams are notorious for over-engineering processes. We love process. We love completeness and precision and accuracy. We want everything to be perfect before we release anything. We need to let go a little.
Provide the tools and resources that allow the business process teams to analyze the data and to move it into the analytical ecosystem as easily as possible. Implement monitoring processes and applications that identify and expose errors. Define processes that ensure compliance, but analyze each step to automate and accelerate as much as possible.
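As one illustration of a monitoring process that identifies and exposes errors, the sketch below checks incoming rows against the expected content that a business process team has declared. It is a simplified assumption of how such a check might look, not a description of any particular product: the `find_content_errors` function, the rule format, and the sample column names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Mapping

# A rule takes a single value and returns True when it matches expected content.
Rule = Callable[[object], bool]


def find_content_errors(
    rows: Iterable[Mapping[str, object]],
    rules: Dict[str, Rule],
) -> List[str]:
    """Return one readable message per value that violates its column's rule."""
    errors: List[str] = []
    for i, row in enumerate(rows):
        for column, rule in rules.items():
            value = row.get(column)  # a missing column is checked as None
            if not rule(value):
                errors.append(
                    f"row {i}: unexpected value {value!r} in column '{column}'"
                )
    return errors


# Hypothetical expectations declared by the owning business process team.
example_rules: Dict[str, Rule] = {
    "order_id": lambda v: isinstance(v, str) and v != "",
    "quantity": lambda v: isinstance(v, int) and v > 0,
}

example_rows = [
    {"order_id": "A-1", "quantity": 2},
    {"order_id": "", "quantity": 0},
]

example_errors = find_content_errors(example_rows, example_rules)
```

The point of returning plain messages rather than raising an exception is that the errors get exposed to the people who own the data, without stopping the feed for everyone else. Whether that trade-off is right will depend on the data product.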
Centralize standards. Distribute implementation.
Specifically, the business process teams are responsible for implementation. The enterprise analytics team ensures compliance.
This transformation can proceed incrementally. Some business process teams will be easier to engage than others. But executive support will undoubtedly be required. Sometimes it will take something a little more assertive than support.
Eventually, your data warehouse or data lake will be populated with Foundational Data Products. From the outside it may look very similar to, or maybe even the same as, what it has always been. But functionally it will be very different: the processes are streamlined, the data is understood, and the implementation is distributed. You are now on the path to truly modernizing your analytics architecture.