This tale may sound familiar to companies that have not yet crossed the Data Chasm (and maybe even to some that have).
A project team was modernizing a major operational system. One that had been written decades earlier. One where nearly all the developers and subject matter experts with knowledge of the applications, data, and business rules had long since left the company or retired. One where the source data had not been clearly documented.
Nobody knew where the data came from, what it meant, what it contained, or how it was calculated. All the team could do was to try to reverse-engineer the data through the applications that generated it. Such a Data Forensics exercise is tedious, difficult, often aggravating, and always a waste of time.
What message do you suppose the team would send to the previous generation of developers if they could go back in time?
They would impress upon them the importance of documenting the data. (Besides making the present-day modernization faster and easier, it would also have made that data more useful and usable, and thus more valuable, in the interim.)
Yet, despite this hard-earned insight and experience, this generation of developers turned out to be no better at documenting their data than the previous one. They are doing the same thing to the next generation of operational system developers that was done to them.
The cycle of data neglect continues.
Information management is sometimes sold as an “investment”: put in a little extra work today that will pay greater dividends for years to come. And I do believe that development and business teams recognize the benefits. Yet, as release dates approach, information management is still the first thing jettisoned in the interest of timelines and deliverables.
We therefore cannot appeal to logic. The development and business teams already recognize the benefits, yet that recognition has not been enough to motivate action.
Demand exists for information management, or at least the products of information management. It’s just that nobody wants to do it.
And that’s oddly encouraging.
We see this pattern elsewhere. You want to perform the song, but you don’t want to practice the instrument. You want to lose weight, but you don’t want to change your eating habits. You want to run the race, but you don’t want to train. You want the application to function properly, but you don’t want to spend a lot of time and resources testing.
Speaking of testing, why do we spend so much time and resources on testing?
Nearly all developers spend some non-trivial portion of their time testing, and application teams often have dedicated testers. Some companies have groups or even entire organizations devoted to finding errors in developers’ work products. Testers don’t deliver new features or business capabilities. Seems to me it would be faster and more cost-effective to lose the testers and hire better developers. Ones who don’t make so many errors.
Obviously, I’m being facetious. Companies recognize that it is important for applications to function properly, and no matter how skilled the developers, testing is always necessary. And much of testing is focused on data input and output. Let’s hang on to that one and we’ll come back to it later.
Testing also requires that resources be allocated to the function. As with maintaining a healthy diet, exercising, practicing, and training, I’m not sure that many people really love testing (although I do know some who do). Why do any of it? Because at some point someone decided that it was necessary and that sufficiently bad things would happen if it wasn’t done. Well, sufficiently bad things are happening today because we don’t understand our data properly (remember those 70–95% project failure rates), and if something doesn’t change, they’re going to get worse.
The first step is to recognize that somebody has to start doing something they’re not already doing.
At companies, especially large ones, one of the hardest things to do is something that’s not already being done. Especially when it doesn’t come with incremental headcount. It usually takes a visionary executive to step up and volunteer to do it.
I’m going to assume that you don’t have a visionary executive (when it comes to data, let’s be clear) or a management mandate. What can be done?
When I was in graduate school, a friend had an issue of Cosmopolitan (I think it was) on the coffee table in her living room. As I recall it was roughly a thousand pages of clothing and perfume advertisements, but the cover teaser for an article caught my attention. The title was something like “How to Get Your Partner to Do What You Want Him to Do.” I considered it reconnaissance behind enemy lines. The gist of the article was that you can’t change your partner to do what you want him to do. All you can do is to change your own behavior and perhaps he will change his in response—but you can’t even count on that.
So, if you don’t have a management mandate and you can’t get the development and business teams to engage, then you have to be the one who does something that you’re not already doing. If you don’t, it is unreasonable to expect that anyone else will.
If nothing else, start profiling some data … any data … and start communicating the results. Today.
There’s no reason to delay. You don’t need to spend a lot of time. You don’t need anything fancy or automated or purchased. That can come later once you have some traction. Pick frequently used tables and critical data elements. Write a program or script or SQL query that does a COUNT and GROUP BY. Publish the results on your departmental website or in your quarterly newsletter. Report them during your next project status meeting and publish them with the minutes.
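To show how little is needed, here is a minimal sketch of that COUNT and GROUP BY profile, written in Python against an in-memory SQLite database. Everything specific in it is an assumption made for illustration: the `customer` table, its `status` column, the sample rows, and the helper names `profile_column` and `build_demo_connection` are all hypothetical. Point the same query at one of your own frequently used tables instead.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return (value, row_count) pairs for one column, most frequent first."""
    # Table and column names cannot be bound as query parameters,
    # so quote them as identifiers before building the SQL text.
    quoted_table = '"' + table.replace('"', '""') + '"'
    quoted_column = '"' + column.replace('"', '""') + '"'
    query = (
        f"SELECT {quoted_column} AS data_value, COUNT(*) AS row_count "
        f"FROM {quoted_table} "
        f"GROUP BY {quoted_column} "
        f"ORDER BY row_count DESC, data_value"
    )
    return conn.execute(query).fetchall()


def build_demo_connection():
    """Create a throwaway in-memory table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO customer (status) VALUES (?)",
        [("A",), ("A",), ("a",), ("I",), (None,), ("A",)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_connection()
    # Print each distinct value next to the number of rows that hold it.
    for data_value, row_count in profile_column(conn, "customer", "status"):
        print(f"{data_value!r:>8}  {row_count}")
```

Even in this tiny made-up sample, the profile raises the kind of questions worth reporting at a status meeting: is a lowercase "a" the same status as "A", and what does a missing status mean?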
Storytelling with data is extremely important. We’ll talk more about that in a future article. But for now, this is where you have to start telling the story of your data.
Start generating profile data and asking questions. You may discover that everything is great, and that’s great! But experience suggests that you will quickly find something interesting.
Welcome to Square One.
This is the third in a series of articles that explores the question of why we continue to see overwhelming numbers of analytics, artificial intelligence, machine learning, information management, and data warehouse project failures despite the equally overwhelming availability of resources, references, processes, SMEs, and tools…and what can be done about it.