In a previous article I covered the foundation that needs to be laid before you even begin thinking about deploying Data Products. You’ve worked through and completed Phase Negative One, and now you are ready to start thinking about deploying Data Products.

Thinking. It’s not time to start coding yet.

Moving from the broad foundation, the next step is to lay the specific foundation for the specific Data Product that you’re implementing. Even if you’re using an agile methodology. Especially if you’re using an agile methodology. Agile isn’t an excuse to start coding and figure the rest out later. And the core value of “Working software over comprehensive documentation” doesn’t mean that understanding the Data Product isn’t necessary.

I think that the statement “Working software over comprehensive documentation” is too often misused the same way Minimum Viable Product is misused: as an excuse. An excuse to not produce any documentation or to produce poor quality documentation. Companies that took that statement to the extreme are now discovering that “documentation” is required, and that its absence or incompleteness is causing problems with their AI initiatives. Just because you don’t have to produce reams and reams of “comprehensive” documentation doesn’t mean you don’t have to understand the data before you start.

The key here is that every bit of documentation must have value. Just like every Data Product must have value. If it doesn’t, there’s no point in creating it. And just like Data Products, you should know who you’re creating it for and how they’re going to use it.

Now that we’ve completed Phase Negative One, have a Definition of Done, and understand the importance of documentation, we can move on to:

Phase Zero: Framing and Charter

Phase Zero treats semantics and a ratified charter as first-class deliverables, with semantic alignment as the first order of business. Data Products don’t fail because they don’t deliver data. Organizations rarely struggle to move data anymore. We’ve been doing that for decades. Data Products fail because they are not reliable, reusable, or understandable. Sound familiar?

The primary Phase Zero deliverable is the Data Product Charter.

The Data Product Charter consists of:

  • Domain Alignment
  • Data Product Name
  • Business Outcomes Enabled
  • Primary Consumers and Their Core Business Questions
  • At Least One Concrete Downstream Dependency
  • Success Signals
  • Primary Contributing Operational Systems
  • Ownership
  • Mechanisms for Ensuring Ongoing Accountability
  • Change Management and Versioning Discipline
  • Sunset/Decommissioning Criteria

The Data Product Charter establishes alignment, outcomes, ownership, and accountability.

Approval of all Data Product Charter components by all stakeholders is required for Phase Zero to be considered complete and before ANY coding begins.
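
To make the completeness rule concrete, here is a minimal sketch of the charter as a record with a Phase Zero gate. The class, field, and method names are my own illustrative assumptions, not a standard; the only rules it encodes are the ones above: every component is filled in and every stakeholder has approved.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataProductCharter:
    """Hypothetical charter record; field names are illustrative only."""
    domain_alignment: str = ""
    data_product_name: str = ""
    business_outcomes: list[str] = field(default_factory=list)
    # consumer -> the core business questions that consumer needs answered
    consumers_and_questions: dict[str, list[str]] = field(default_factory=dict)
    downstream_dependencies: list[str] = field(default_factory=list)
    success_signals: list[str] = field(default_factory=list)
    contributing_systems: list[str] = field(default_factory=list)
    # responsibility -> owning role (a role, not a named individual)
    ownership: dict[str, str] = field(default_factory=dict)
    accountability_mechanisms: list[str] = field(default_factory=list)
    change_management: str = ""
    sunset_criteria: list[str] = field(default_factory=list)
    # Approval tracking, kept separate from the eleven charter components.
    stakeholders: set[str] = field(default_factory=set)
    approvals: set[str] = field(default_factory=set)

    def missing_components(self) -> list[str]:
        """Names of charter components that are still empty."""
        tracking = {"stakeholders", "approvals"}
        return [f.name for f in fields(self)
                if f.name not in tracking and not getattr(self, f.name)]

    def phase_zero_complete(self) -> bool:
        """True only when every component is filled in and all stakeholders approved."""
        return (not self.missing_components()
                and bool(self.stakeholders)
                and self.stakeholders <= self.approvals)
```

An empty charter reports all eleven components as missing, and a fully populated charter still fails the gate until the last stakeholder signs off.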

Let’s look at these components in more detail.

Domain Alignment

Everything that follows depends upon this alignment. It’s important to get it right. I think that most of us have found ourselves in the middle of two or more groups arguing over a definition. At some point the camps dig in: “You can define it any way you want to as long as I don’t have to do anything differently or as long as it’s my way.” Nobody wants to give in. Meetings become contentious. Then conversation stops.

But you are still expected to deliver, so assumptions and arbitrary decisions are made. Usually by a developer who shouldn’t be making those decisions or by management who shouldn’t be making those decisions. You side with one group and maybe they use your deliverable. Try to side with both groups and nobody uses it. So many phone calls and emails asking “Why did you do it that way?” and “Why didn’t you do it my way?” Everybody has something to say now. Skipping semantic rigor causes quality and usability issues, and ultimately creates lots and lots of rework. 

It’s easy to underestimate the amount of cross-functional negotiation or to overestimate the amount of agreement. Oftentimes, both the IT and business teams will vigorously resist alignment. When you’re leading a Data Product development project, it’s important that you drive these discussions to a resolution. I’ve watched projects languish for years because of irreconcilable differences over domain definition.

Now, it’s easy for me to say that if you don’t have agreement you shouldn’t move forward, but sometimes you don’t have that choice. If there truly is an impasse, you may need to define two different domains. Perhaps a Sales Product in addition to the enterprise Product. Be extremely explicit. Include the intransigent group in the name. Make them responsible. It’s not optimal, but spinning wheels for years isn’t either.

Data Product Name

We do this all the time and I don’t need to go into a whole lot of detail. Naming is especially straightforward for a domain-aligned Foundational Data Product. Just always be sure to be descriptive and be complete.

Business Outcomes Enabled
Primary Consumers and Their Core Business Questions
At Least One Concrete Downstream Dependency
Success Signals

I talked about these with the Definition of Done for Data Products. This is where they get agreed upon and documented. Alignment on why the Data Product exists and what problem it will solve is critical before ANY work starts. Success signals answer two questions: how will we know that the Data Product is successful, and how will we know that there are problems? At this stage they are primarily qualitative, focused on whether the product enables the intended decisions; quantitative metrics can complement them later. Ultimately, success is when everyone can explain the Data Product in one sentence and agrees that it is worth building.

Significantly, starting Data Product development this way makes delivery a requirement, not an afterthought. Say that you, as the enterprise data team, look at your retail business, know that your users are going to need Inventory data, and decide to build an Inventory Foundational Data Product. You’re probably right to assume that if you build it they will come, but there’s more to it than that. By having at least one specifically identified consumer and downstream dependency, you know that even a Foundational Data Product will be generating value immediately.

Primary Contributing Operational Systems

This is as close to implementation as we come right now and we know how to do this. It’s on the list here so that we can establish ownership and accountability.

Ownership
Mechanisms for Ensuring Ongoing Accountability

Ownership is not just for today, but throughout the life of the Data Product. If there’s an expectation that ownership will change in six months, make that explicit in the charter. Also, ownership is attached to roles, not individuals. When someone leaves the role, ownership automatically transfers to whoever assumes it next.

Outcomes, definitions, business value, and adoption are owned by the business. Source system reliability and implementation constraints are owned by IT. Data Product infrastructure, feasibility, and coordination are owned by the Enterprise Data team. I discussed distributed ownership with Data Product architecture. This is not a one-time governance exercise. Without explicit business and IT ownership, the Data Product ends up being the responsibility of the data team alone, which leads to a lack of prioritization, responsibility, and feedback, and eventually an orphaned data product. In other words, pretty much the situation we have today.

Process and structure make getting stuff done easier. So, define how the required Data Product creation, definition, deployment, and management processes will fit into your existing SOP. Minimize the net new. Tracking and auditing ensure ongoing accountability. Measure it. Make it public.

There’s one additional point about ownership that I want to address now, and that’s sustainability, especially when priorities change. It happens in every organization. It’s normal. 

What is not acceptable is allowing priorities to shift in a way that erodes reliability. 

Once a Data Product is delivered, maintaining its reliability is operational work no different than maintaining a production application is operational work. Reliability takes precedence over enhancements. Commitments to consumers cannot be ignored simply because a new shiny object appears.

Maintaining Data Product reliability is non-negotiable operational work.

It is not discretionary. This makes the priority rules different from project development. Notice that it’s the maintenance of Data Product reliability that’s key here, not just that the Data Product application is running and producing data when it’s supposed to.

Data quality defects, semantic inconsistencies, and SLA/SLO violations are operational incidents, not backlog items or new feature requests.

Of course, enhancements to the Data Product go through the normal backlog and prioritization processes. And if priorities truly shift, then the existence of the Data Product is reevaluated. That reevaluation may require that ownership be transferred to another team or that the Data Product be replaced or retired.
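
The priority rule can be written down as a tiny triage function. This is only a sketch under my own assumptions: the issue-kind labels and the `WorkType` names are hypothetical, and a real ticketing system will have its own vocabulary. The point is simply that anything threatening reliability is routed as an incident, and everything else goes to the backlog.

```python
from enum import Enum


class WorkType(Enum):
    OPERATIONAL_INCIDENT = "operational incident"
    BACKLOG_ITEM = "backlog item"


# Hypothetical labels for the reliability issues named in the text.
RELIABILITY_ISSUES = {
    "data_quality_defect",
    "semantic_inconsistency",
    "sla_slo_violation",
}


def triage(issue_kind: str) -> WorkType:
    """Route reliability issues as incidents; everything else is backlog work."""
    if issue_kind in RELIABILITY_ISSUES:
        return WorkType.OPERATIONAL_INCIDENT
    return WorkType.BACKLOG_ITEM


def prioritize(issue_kinds: list[str]) -> list[str]:
    """Order work so that incidents come before backlog items (stable within each group)."""
    return sorted(issue_kinds,
                  key=lambda kind: triage(kind) is not WorkType.OPERATIONAL_INCIDENT)
```

Because the sort is stable, work items keep their original relative order within each group; only the incident-before-enhancement rule is imposed.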

Change Management and Versioning Discipline

This is often dealt with on the back-end, but change itself must be managed as a first-class concern. Will Data Product content be backward-compatible when changes occur? Do the semantics of an existing data element change, or is a new data element created? How are consumers notified? Reliability isn’t something that’s just implemented once and then never revisited. It must be continuously tended, like a garden which, if neglected, will sprout as many weeds as vegetables.
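
One way to keep those questions from being answered by accident is to compare the published data elements before and after a proposed change. The sketch below is an assumption about how that could look, not a prescribed mechanism: `Element` and `breaking_changes` are hypothetical names, and it treats a removed element, a changed type, or a changed definition as breaking, while a newly added element is not.

```python
from typing import NamedTuple


class Element(NamedTuple):
    """A published data element: its type and its agreed business definition."""
    dtype: str
    definition: str


def breaking_changes(old: dict[str, Element], new: dict[str, Element]) -> list[str]:
    """List the changes that would break existing consumers.

    Adding a new element is treated as backward-compatible; removing an
    element, or changing its type or its meaning, is not.
    """
    problems = []
    for name, elem in old.items():
        if name not in new:
            problems.append(f"{name}: removed")
        elif new[name].dtype != elem.dtype:
            problems.append(f"{name}: type changed {elem.dtype} -> {new[name].dtype}")
        elif new[name].definition != elem.definition:
            problems.append(f"{name}: definition changed")
    return problems
```

Under this rule, redefining an existing element shows up as a breaking change, which nudges the team toward creating a new element and notifying consumers instead of silently changing semantics.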

Sunset/Decommissioning Criteria

When you first get started with Data Products, my guess is that they are all going to be frequently used and in high demand. After all, that demand drives the introduction of Data Products. Over time, though, you will accumulate more and more. It’s inevitable.

I once reviewed an analytics environment where more than a thousand reports were being regularly generated anywhere from annually to hourly. They all had consumers and customers. At least, they all had someone that was receiving the report or being notified that the report was available. According to the recipients, though, fewer than ten percent of the reports were actually being looked at, and fewer than ten percent of those were actually driving business actions and decisions. In other words, at most about one percent of the reports generated any value. All were consuming resources.

It’s not really feasible in a large organization to do a Data Product utilization audit manually. Evaluating the ongoing usefulness of your Data Products needs to be automated. My preferred approach is to require consumers to periodically articulate the business actions and decisions that are being driven by the Data Product in order to maintain access. I like quarterly but others have argued that once or twice a year is sufficient. This way you can support the investment in the Data Product through quantified ROI, and ensure that it is still having an impact. When a Data Product isn’t driving business actions or decisions anymore, then it needs to be retired. When nobody depends on it anymore it needs to be retired. This is especially important for AI training data where stale or irrelevant data quietly degrades the model.
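
A minimal sketch of that automation, assuming each consumer’s most recent attestation date is recorded somewhere. The function names and the 90-day default (roughly quarterly) are my own illustrative choices. Consumers whose attestation has lapsed lose access, and a Data Product with no current attestations becomes a retirement candidate.

```python
from datetime import date, timedelta


def consumers_to_revoke(last_attested: dict[str, date],
                        today: date,
                        max_age_days: int = 90) -> list[str]:
    """Consumers whose latest attestation is older than the allowed window."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, attested in last_attested.items()
                  if attested < cutoff)


def is_retirement_candidate(last_attested: dict[str, date],
                            today: date,
                            max_age_days: int = 90) -> bool:
    """True when no consumer has a current attestation (including no consumers at all)."""
    stale = consumers_to_revoke(last_attested, today, max_age_days)
    return len(stale) == len(last_attested)
```

Changing `max_age_days` to 180 or 365 gives the twice-yearly or annual cadence that others prefer.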

Maybe we should call the charter a Data Product Commitment, because the collective team creating, deploying, and supporting the Data Product is making a reliability commitment to the enterprise. Let’s NOT call it a Data Product Contract. Contract is already used in a different context and I’ll talk about that in an upcoming article. The last thing we need is more confusing vocabulary. Charter will work.
