The “product” metaphor has been useful when making the point that there is more to delivering a table, file, report, dashboard, or summary than just the data itself. At this point, the definition of Data Product is pretty clearly and consistently understood. You can read more about that here and in other Data Product blog articles. And, as always:

The “product” in Data Product is reliability. 

OK. That’s great. But I can’t make “reliability” a user story or put it on the deliverables list. I can’t say that “next Friday we will deliver reliability.” At least not directly. Reliability is an outcome, not a deliverable. Then, what am I actually delivering?

A Data Product is not a single thing, but a bundle of deliverables, all of which are required in order to have a Data Product.

We need curated data, interfaces to find and access it, operational guarantees, and the knowledge required to use it correctly. Delivering the full bundle is the difference between delivering a Data Product and just delivering yet another data something-or-other of questionable provenance.

Most of us have seen memes or videos where someone is asked where the electricity used to charge their electric car or their smartphone comes from. They reply, “The wall,” seemingly unaware of the network of power lines, substations, power plants, and generators behind the outlet. Similarly, food comes from the grocery store, of course.

So, let’s look behind the wall and beyond the store shelf and ask:

How do we know that we’ve successfully delivered a Data Product?

It’s easy to start making a checklist of deliverables, and we will, but it’s useful to articulate what we’re actually looking to accomplish through those checklist items. Don’t miss the forest for the trees. Furthermore, a higher level of understanding allows us to add overlooked items that support the goal and jettison extraneous items that don’t. The Data Product Definition of Done has four components.

Data Product Definition of Done

1. The information objects that comprise the Data Product are demonstrably correctly implemented.

It seems self-evident. After all, the information object is going to be created regardless of anything else that happens, but the last three words are key. Taking them in reverse order:

Implemented. Obviously you have to deliver the table, file, report, dashboard, summary, extract, or whatever. For almost everyone, it's the information object that comes to mind when thinking about a Data Product. Too often, the thinking and the deliverables stop there. That's why they don't have Data Products, why they have problems with Data Quality, and why their AI initiatives underperform. We seem to have become comfortable with problematic data. In reality, the data itself is just one part.

Correct. Is there anybody out there who doesn't think the data needs to be correct? Providing incorrect data is worse than providing no data at all. But how do you know that the data is correct? Which of the following sound familiar: we believe the data is correct because the query ran successfully, because we compared a couple of records to the source system, because we did a SQL code review, or because the users signed off on it. If that's the extent of your validation, I wouldn't bet the farm on the correctness of your data. Everybody wants their data to be correct, but there's a difference between wanting it and actually doing what needs to be done.

Demonstrably. To demonstrate correctness we need to specifically define the expected content and compare our results against that standard. We need to specifically define the semantics and ensure that our results are consistent with that standard. For something that everybody seems to agree is necessary, the fuss and pushback about creating and collecting this information make it seem like some big, scary, overwhelming thing. You'd think we were asking teams to translate War and Peace by hand while abandoning all other queued-up demand. The fact of the matter is that we're not asking anyone to do anything they're not already doing. It's just that they're doing it in ad hoc, undisciplined ways. How can you do testing without knowing what the data means and what it's expected to contain? The details are almost certainly in the requirements, user stories, and even the source system applications. Try leveraging AI to give you a head start.
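To make that concrete, here's a minimal sketch of what "demonstrably" can look like: expected content written down as explicit, named rules that run against the delivered data. The table, columns, and rules here are invented for illustration; in practice they would come from your requirements and semantics definitions, and you'd likely use a proper testing framework rather than bare Python.

```python
# A hypothetical expectation check. The rows, columns, and rules
# below are illustrative, not from any specific system.

rows = [
    {"order_id": 1001, "status": "SHIPPED", "amount": 49.95},
    {"order_id": 1002, "status": "PENDING", "amount": 15.00},
]

# Expected content, stated explicitly rather than assumed.
expectations = {
    "order_id is present and unique":
        lambda rs: len({r["order_id"] for r in rs}) == len(rs),
    "status is from the agreed domain":
        lambda rs: all(r["status"] in {"PENDING", "SHIPPED", "CANCELLED"} for r in rs),
    "amount is non-negative":
        lambda rs: all(r["amount"] >= 0 for r in rs),
}

for name, check in expectations.items():
    result = "PASS" if check(rows) else "FAIL"
    print(f"{result}: {name}")
```

The point isn't the tooling. The point is that the expectations exist as named artifacts you can read, run, and show to someone.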

2. The Data Product can be found and properly used without assistance.

No hand-holding or training. The key here is clarity. Information about the Data Product should be easy to find. Consider airport signage. First of all, there’s lots of it. Second, it’s mostly pictures, block lettering, and arrows designed to help even non-native speakers navigate the space and accomplish key tasks. Where do your consumers have to look to find information about your Data Product? How intuitive is it to navigate there? And then is the information about the Data Product easy to consume? Review and edit for completeness and clarity.

Of course, this all presupposes some sort of metadata repository or catalog, but it doesn't have to be anything fancy. At least not at first. I talked about that last week. You don't have to start with technology. Don't start with technology. If all you've got is low-tech, then be low-tech. Just be sure it's clear low-tech.
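To illustrate, here's a hypothetical, deliberately low-tech "catalog entry": one consistent record per Data Product, kept somewhere everyone can find it. The field names and values are made up; a shared file or wiki page works just as well as the Python record used here.

```python
# A hypothetical low-tech catalog entry. All field names and values
# are invented for illustration.

catalog_entry = {
    "name": "customer_orders_daily",
    "description": "One row per order, refreshed nightly from the order system.",
    "owner_role": "Order Data Steward",  # a role, not a person
    "contact": "data-products@example.com",
    "location": "warehouse.sales.customer_orders_daily",
    "refresh_schedule": "daily by 06:00 UTC",
    "semantics": "Amounts are in USD; status values: PENDING, SHIPPED, CANCELLED.",
    "known_limitations": "Cancelled orders before 2023 are not included.",
}

for field, value in catalog_entry.items():
    print(f"{field:20} {value}")
```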

Throughout development, continuously collect feedback from consumers and make incremental improvements. Again, AI can help here. Better yet, have several people who were not involved in the development of the Data Product try to find and use it. You'll know pretty quickly.

Here’s the test: If the original data team disappeared tomorrow, would the Data Product continue to create value? If yes, then you have a Data Product. If no, you have yet another ongoing data project.

3. The support processes are assigned, active, and visible, and the ownership is durable.

The most obvious process is the one that creates, populates, and/or appends data to the Data Product on an ongoing basis. That one is always going to happen regardless, but many more are needed if you want a Data Product: operational support processes like monitoring application execution, measuring performance, calculating SLOs, and raising alerts; technical support processes like maintenance, bug fixes, and enhancements; user support processes like training and help desk; and content support processes like ensuring data quality and semantic consistency over time. Check out these other articles and classes for details.
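As one concrete example from the operational bucket, here's a minimal sketch of a freshness SLO check: did the daily load complete by its deadline often enough? The deadline, target, and completion times are invented; a real version would read from your scheduler's logs and feed your alerting system.

```python
# A hypothetical freshness SLO check. The deadline, target, and
# history below are all illustrative.

from datetime import time

SLO_TARGET = 0.99
DEADLINE = time(6, 0)  # loads expected to finish by 06:00 UTC

# Completion times for recent daily loads (None = the load failed).
completion_times = [time(5, 41), time(5, 55), time(6, 12), time(5, 48), None]

on_time = sum(1 for t in completion_times if t is not None and t <= DEADLINE)
attainment = on_time / len(completion_times)

print(f"Freshness SLO attainment: {attainment:.1%} (target {SLO_TARGET:.0%})")
if attainment < SLO_TARGET:
    print("ALERT: freshness SLO breached; notify the owning role.")
```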

What’s missing, more often than not, is clear assignment of responsibility. Accountability must be attached to a role, not to an individual person. This clarifies escalation paths and allows ownership to survive organizational change. Most importantly, ownership must include not just responsibility but authority. Responsibility without authority is a recipe for disaster.

4. Value is defined, observed, and revisited. If nobody depends on it, it’s not really delivered.

Two more processes, in addition to those previously mentioned, merit special attention. The first is identifying the business actions that are enabled or facilitated by the Data Product. From there, business value can be calculated.

Business value is measured differently for different Data Product categories. A Foundational Data Product is domain-aligned, sits close to the operational system, and is used by multiple downstream consumers. The business value is therefore often indirect and delayed. Here, measuring downstream utilization is key. On the other hand, Composed and Packaged Data Products should generate business value immediately. After all, they were created in response to specific business needs.
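For the Foundational case, measuring downstream utilization can start as simply as counting who is actually querying the Data Product and how often. Here's a minimal sketch with an invented access-log format; real numbers would come from your warehouse's query history.

```python
# A hypothetical utilization summary computed from access-log-style
# records. The log format and consumer names are invented.

from collections import Counter

access_log = [
    {"consumer": "finance_dashboard", "queries": 120},
    {"consumer": "churn_model",       "queries": 45},
    {"consumer": "ops_report",        "queries": 3},
]

queries_by_consumer = Counter()
for entry in access_log:
    queries_by_consumer[entry["consumer"]] += entry["queries"]

print(f"Distinct downstream consumers: {len(queries_by_consumer)}")
for consumer, queries in queries_by_consumer.most_common():
    print(f"  {consumer}: {queries} queries this period")
```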

The second is to establish a process to collect qualitative feedback regarding utilization, not just of the Data Product but also of similar information objects. Are consumers using workarounds, or using your Data Product in ways that were not intended and perhaps are not appropriate? Are they exporting or making copies of the data because it's hard to use directly? Are they creating similar data assets elsewhere? These are early warning signals that utilization metrics alone won't catch, indicating that your Data Product may not be fully satisfying user needs. Catching them requires awareness of the analytical estate generally, as well as constant contact with the consumers. Sometimes an adjustment needs to be made to the Data Product or its support processes, and sometimes this points to the need for a new one.

It seems like a lot, I understand, and it can be overwhelming when considered all at once. But when you break it down, much of this is already happening as standard operating procedure, and the rest should be happening but has been neglected. Actually, the rest is probably already happening too, just inefficiently, as questions or problems arise, leading to a greater overall expenditure of effort. Just because you're not doing it intentionally or as part of a project plan doesn't mean you're not doing it. And it doesn't mean the enterprise isn't suffering when it isn't done.

(A side note: most organizations closely measure project-related activities but lump any “clean-up” afterward into the overhead or support bucket so it’s easy to overlook. Associating these activities with the original project will provide a more accurate picture of its true cost.) 

So, I’ll conclude with this question: Which components of this Definition of Done can be skipped without negatively impacting reliability?

When a Data Product is incompletely delivered, you don't have a Data Product, you have a future problem. You just don't know it yet.

Now that we’re clear about what we’re delivering, upcoming articles will look more closely at how to deliver it.