This week we continue our Data Product dissection by looking at the data itself. What could be more fundamental? After all, it’s in the name!
Data Products come in three flavors: Foundational, Composed, and Packaged. It’s important for providers to understand these different types because they are conceived, implemented, and managed differently. I’ll cover that in a future article. Consumers, on the other hand, need to be aware of a few best practices but for the most part they shouldn’t have to understand or care. All they want is reliable data. So, let’s dive in.
Foundational Data Products are aligned to data domains and support expected business need.
Foundational Data Products are the raw material from which all Data Products are created, and they are critical to ensuring the reliability of all Data Products.
They are typically associated with operational system data from Authoritative Sources. After all, operational systems implement business functions, and data domains categorize the information associated with those same business functions. As a result, an operational system is (usually) focused on a particular data domain. Other sources include data feeds from third-party data providers and master data management systems.
The question frequently arises whether raw operational data (again, from an Authoritative Source) can be a Data Product. Many definitions exclude raw data. I prefer to operate from first principles, so let’s return to the Data Product requirements. Can raw data have data, access, metadata, and lifecycle management? Yes, it can. And if it does, then it is a Data Product.
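That first-principles test is simple enough to write down. Here is a minimal sketch in Python, assuming each candidate can be described by four yes/no answers. The `Candidate` class, its field names, and the two example candidates are hypothetical illustrations, not part of any real catalog or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical description of something that might be a Data Product."""
    name: str
    has_data: bool
    has_access: bool
    has_metadata: bool
    has_lifecycle_management: bool


def is_data_product(candidate: Candidate) -> bool:
    """Return True only if all four requirements are satisfied."""
    return all((
        candidate.has_data,
        candidate.has_access,
        candidate.has_metadata,
        candidate.has_lifecycle_management,
    ))


# Raw operational data that has been given access, metadata, and management.
curated_raw_orders = Candidate("curated_raw_orders", True, True, True, True)

# The same raw data dumped somewhere with no metadata and nobody managing it.
unmanaged_dump = Candidate("unmanaged_dump", True, True, False, False)

print(is_data_product(curated_raw_orders))
print(is_data_product(unmanaged_dump))
```

Nothing in the check asks whether the data is raw or transformed. Only the four requirements matter.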
In many ways, Foundational Data Products are what the Data Warehouse was originally intended to be: organized by domain, drawn from Authoritative Sources, curated with information that facilitates user access, and made available for analytical consumption.
Also like the original Data Warehouse concept, Foundational Data Products are not necessarily created in response to an identified business need, but are instead expected to support a broad range of anticipated business questions. If you are a retailer, it seems likely that analysts will be interested in sales, customer, inventory, and product data. You don’t need to wait.
Operational systems usually align one-to-one with Foundational Data Products, but not always. Sometimes multiple operational systems implement the same data domain, and therefore contribute to the same domain-oriented Data Product. This is especially prevalent in organizations where legacy systems are not fully retired when replaced. Different systems might manage different sets of customers, or different sets of attributes. We can curse the failure to complete the migration, but we’re stuck with it. This situation creates challenges that I’ll talk more about later.
Foundational Data Products are becoming increasingly important as analytical environments are modernized to incorporate Data Mesh and Data Fabric. Data Products are a key Data Mesh concept, and their implementation is often the first step in analytical ecosystem modernization.
Composed Data Products are aligned to consumption patterns and support identified business need.
These are what most business users consider to be “Data Products”: the common data sets, summaries, aggregations, and APIs that codify business knowledge, maximize reuse, promote consistency, avoid redundant computation, and accelerate results delivery.
Trust is critical. Clear and complete metadata is critical. The consumers are now multiple steps removed from the underlying data. They need to be confident that the source data is accurate, that the correct calculations were used, and that the correct transformations were applied.
Consider a Data Product that provides a corporate source for the Net Revenue generated by each customer. The SEC defines Net Revenue as the total amount of money generated by the sale of goods and services over a period of time, minus certain expenses such as returns, promotions, and discounts. Net Revenue is incorporated into many different analyses across multiple business units. A marketing analyst uses it to segment customers while a sales manager uses it to evaluate employee performance. But Marketing might exclude returns from its definition of Net Revenue while Sales might exclude certain discounts or promotions. Which flavor of Net Revenue was implemented in the “Net Revenue” Data Product? Consumers need to know.
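To see how far the flavors can drift apart, here is a small worked example in Python. All of the figures are made up, and the `CustomerSales` class and `net_revenue` function are hypothetical. I am also assuming that "exclude" means the item is simply not deducted from gross sales, and that Sales deducts only returns. Your organization's definitions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerSales:
    """Hypothetical per-customer totals for one period, in whole dollars."""
    gross_sales: int
    returns: int
    promotions: int
    discounts: int


ALL_DEDUCTIONS = ("returns", "promotions", "discounts")


def net_revenue(sales: CustomerSales, deductions=ALL_DEDUCTIONS) -> int:
    """Gross sales minus only the deductions named in `deductions`."""
    return sales.gross_sales - sum(getattr(sales, name) for name in deductions)


# Made-up figures for a single customer.
acme = CustomerSales(gross_sales=1000, returns=100, promotions=50, discounts=30)

corporate = net_revenue(acme)                                # all deductions: 820
marketing = net_revenue(acme, ("promotions", "discounts"))   # ignores returns: 920
sales_team = net_revenue(acme, ("returns",))                 # returns only: 900

print(corporate, marketing, sales_team)
```

Three defensible calculations produce three different numbers for the same customer. The metadata has to say which deductions were applied.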
Composed Data Products could be considered analogous to traditional Data Marts, except that a Data Mart must also provide the curation and maintenance functions to be considered a Data Product. They can be created a priori through a product development process or a posteriori by analyzing query patterns.
In theory, Composed Data Products could be chained arbitrarily, but in practice this is not a good idea. Complexity increases exponentially with data chain length. A problem or change to any link impacts the entire chain. Corporate processes often involve orchestrating data chains with dozens of links, and it should not come as a surprise that making substantive changes is time-consuming and error-prone.
Instead, whenever possible Composed Data Products should be produced from Foundational Data Products. Business knowledge from a set of Composed Data Products can be incorporated into another Composed Data Product, but that’s as far as it should go.
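This guideline can be checked mechanically if you record what each Data Product reads from. The sketch below assumes a hypothetical catalog held as a plain dictionary. The product names, the `composed_depth` function, and the limit of two Composed layers are my illustration of the guideline, not a feature of any particular platform. It also assumes the catalog has no circular dependencies.

```python
# Hypothetical catalog: product name -> names of the products it reads from.
# A product with no upstream products is treated as Foundational.
CATALOG = {
    "sales": [],
    "customer": [],
    "net_revenue_by_customer": ["sales", "customer"],
    "customer_segments": ["net_revenue_by_customer", "customer"],
    "segment_forecast": ["customer_segments"],
}

# Composed-from-Foundational is depth 1, Composed-from-Composed is depth 2.
MAX_COMPOSED_DEPTH = 2


def composed_depth(name: str, catalog: dict[str, list[str]]) -> int:
    """Number of Composed layers between this product and Foundational data."""
    upstreams = catalog[name]
    if not upstreams:
        return 0  # Foundational
    return 1 + max(composed_depth(upstream, catalog) for upstream in upstreams)


def chained_too_deep(catalog: dict[str, list[str]]) -> list[str]:
    """Names of products that exceed the allowed number of Composed layers."""
    return sorted(
        name for name in catalog
        if composed_depth(name, catalog) > MAX_COMPOSED_DEPTH
    )


print(chained_too_deep(CATALOG))
```

In this made-up catalog only `segment_forecast` is flagged, because it sits three Composed layers above the Foundational data.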
Composed Data Products, especially with multiple layers, can create performance and resource consumption challenges. What appears to be a single query against a single data object may end up spawning an unexpectedly large workload. Many data management platforms have difficulty optimizing multiply nested queries, especially when accessing disparate sources. It’s important to be aware of this possibility when creating Data Products, because it’s not reasonable to expect users to be concerned about this.
Finally, unless the Data Product is materialized, it should not be used for transactional or even tactical application support.
Packaged Data Products are reusable analytical artifacts that directly deliver business value.
Packaged Data Products are strictly consumer-focused and include analyses, dashboards, reports, visualizations, triggers, alerts, models, and applications. They offer benefits similar to those of Composed Data Products, such as maximizing reuse, promoting consistency, and avoiding redundant computation.
Sometimes queries, metrics, calculations, and algorithms are considered Data Products but, again, let’s go back to the requirements: data, access, metadata, and lifecycle management. All of these generally have metadata and some kind of lifecycle management. One might even be able to access them through a function call, graphical development interface component, or API. But they don’t include data. A summary that implements a metric can be a Data Product, assuming that it satisfies the other requirements, but the metric definition itself cannot.
The natural next question is: What data needs to be in a Data Product? I prefer to approach the question from the opposite direction: What data doesn’t need to be in a Data Product? That one’s much easier to answer: Data that’s not going to be consumed by anybody else. If the data (or aggregation or report or API) is going to be used by somebody who’s not you, then you need the metadata, the access, and the management. In other words, Data Product stuff.
Data Products are often described using the metaphor of a grocery store. Foundational Data Products are like ingredients: produce, flour, sugar, eggs, meat, etc. Composed Data Products are like packaged foods. If you want to make a cake you can buy the individual ingredients and assemble it from scratch, or you can buy a cake mix and just add water. Packaged Data Products are like a cake you pick up from the bakery.
Of course, the consumer doesn’t care about the type of Data Product they’re using or where it came from or who’s supporting it. They don’t care if it’s “raw” or “cooked.” They have a business need to fulfill. They just want to find the data, access the data, and be confident in the quality of the data. Recognizing the difference between Data Product types is important to the Data Product provider, especially when it comes to implementation and management. And we’ll cover those in future articles.