This is the first of an occasional series of articles that will examine data that we encounter in our everyday lives: metrics like unemployment, inflation, stock market averages, crime rates, and the like. Most of us don’t take day-to-day action based upon these numbers, but for many they contribute to a general feeling of either comfort or uneasiness about the world around us. A deeper understanding can remove some of the mystery and allow us to better differentiate between insights, propaganda, and conspiracy theories.

I take it as axiomatic that government statistics are created so as to allow them to support a political narrative, whatever that narrative happens to be at the time. We all know the three kinds of lies.

So, if we are going to be informed consumers of this information, what do we need to know? 

This sounds like a job for Data Quality. 

Accuracy: What exactly is the statistic measuring? What is it trying to communicate? How close is the underlying data collected and recorded to the real-world instances of the objects or phenomena?

Lineage: Where did the underlying data come from?

Completeness: Are all instances collected and included in the metric? If not, how was the subset selected, and is it a representative cross-section?

Precision: Is the underlying data collected to an appropriate granularity, sufficiently fine-grained to accurately represent the instances?

Consolidation: How is the data accumulated, aggregated, summarized, and manipulated?

Consistency: Is the same data collected in the same way from the same sources from period to period? Are the calculations and adjustments the same period to period? If not, how do they differ? Are the metrics adjusted or do they evolve over time? What triggers an adjustment? What is different about the new data? Where did it come from?

Let’s illustrate using a simple example: the Dow Jones Industrial Average (DJIA).

The DJIA dates back to 1896, and for more than a century has been a mainstay of morning newspapers and evening news broadcasts. It is, for many, a primary indicator of the country’s economic health as well as a real-time reflection of reaction to world, political, and financial events. But what is it actually measuring? How accurate an indicator is it? How much should I really be paying attention to it and which, if any, decisions should I be making based upon it? Drilling into Data Quality dimensions provides some insight.

Accuracy: It is important to separate the metric from its interpretation. Yes, the metric accurately captures and calculates the stock price data. And yes, significant events, both positive and negative, impact the index. But no, most experts do not consider it to be an accurate proxy for the state of the overall American economy. We’ll return to this one shortly when we have a little more context. 

Lineage, Precision: Everything is good here since the stock price data comes from the New York Stock Exchange (NYSE) and NASDAQ.

Completeness: The number of companies represented grew from the original twelve to the current thirty in 1928, and the companies are selected by a committee. Thirty companies, though, represent less than one half of one percent of the more than 6,000 stocks traded on the NYSE and NASDAQ exchanges. The index originally included companies that produced commodities like oil, coal, rubber, lead, cotton, tobacco, and sugar. Through the years, it has become less industrially focused with the addition of financial services, insurance, and technology companies. It does not include transportation companies or utilities, which are reported separately.

Consolidation: Calculating the DJIA starts by summing the 30 companies’ stock prices. The sum is then divided by a factor called the “Dow Divisor,” whose purpose is to keep the index consistent by counteracting structural changes such as stock splits, mergers and acquisitions, stock dividends, and changes in index composition.
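
As a rough sketch of that arithmetic, here is the sum-then-scale step in Python. The three prices, the divisor value, and the function name are all made up for illustration; none of them is real market data or the published Dow Divisor.

```python
def price_weighted_index(prices, divisor):
    """Sum the component share prices and scale the total by the divisor."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return sum(prices) / divisor


# Hypothetical three-stock "index"; the real DJIA has thirty components.
example_prices = [500.0, 150.0, 50.0]
example_divisor = 0.5  # illustrative value only, not the actual Dow Divisor

index_level = price_weighted_index(example_prices, example_divisor)
print(index_level)
```

Note that with a divisor below one, a one-dollar move in any single component moves the index by more than one point.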

Consistency: The data is collected in the same way from the same sources, but the index composition changes every few years as the committee identifies companies that it believes more accurately represent the broader economy. The Dow Divisor also changes periodically, and is calculated in such a way that the index is the same before and after the precipitating event.
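
To see how a divisor change can leave the index untouched, consider a minimal sketch of a 2-for-1 stock split. The prices and starting divisor are hypothetical, and the helper name is my own; the idea is simply to solve for the new divisor that reproduces the pre-event index level from the post-event prices.

```python
def adjusted_divisor(prices_before, prices_after, divisor_before):
    """Return the divisor that keeps the index level unchanged across an event."""
    index_before = sum(prices_before) / divisor_before
    return sum(prices_after) / index_before


# Hypothetical event: the 500.00 stock splits 2-for-1 and now trades at 250.00.
before = [500.0, 150.0, 50.0]
after = [250.0, 150.0, 50.0]
old_divisor = 0.5  # illustrative value only

new_divisor = adjusted_divisor(before, after, old_divisor)

# The index level is the same (up to floating-point rounding) on both sides.
index_before = sum(before) / old_divisor
index_after = sum(after) / new_divisor
print(new_divisor, index_before, index_after)
```

The split halves one company's share price without changing the company's value, so the divisor shrinks to absorb the difference rather than letting the index drop.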

It is clear that one of the primary objectives of the DJIA is relative consistency over time, enabling reasonable historical comparisons. It also has some significant limitations.

  • Less than one half of one percent of the more than 6,000 stocks traded on the NYSE and NASDAQ exchanges are represented. Critics question whether such a small number can adequately represent overall market performance.
  • The entire transportation and utilities sectors are excluded. They have their own separate indices and all three (industrial, transportation, and utilities) comprise the Dow Jones Composite Index.
  • Because the calculation starts by simply summing the companies’ stock prices, those with high prices have a disproportionately greater influence on the index than those with low prices. This morning the share price of UnitedHealth Group was more than ten times that of Cisco, meaning that UnitedHealth Group’s weight in the index is more than ten times greater.
  • Key factors including industry size and market capitalization are not considered. Walmart’s 2023 revenue was more than 13x that of Goldman Sachs, but Walmart has one-sixth the index weight.
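
The last two points can be made concrete with a small comparison of price weighting against market-capitalization weighting. This is only a sketch: the two companies, their prices, and their share counts are invented, and the function names are mine.

```python
def price_weights(prices):
    """Weight each company by share price alone (the price-weighted approach)."""
    total = sum(prices.values())
    return {name: price / total for name, price in prices.items()}


def cap_weights(prices, shares_outstanding):
    """Weight each company by market capitalization (price times shares)."""
    caps = {name: prices[name] * shares_outstanding[name] for name in prices}
    total = sum(caps.values())
    return {name: cap / total for name, cap in caps.items()}


# Hypothetical companies: one with a high share price and few shares,
# one with a low share price and many shares.
prices = {"HighPriceCo": 500.0, "LowPriceCo": 50.0}
shares = {"HighPriceCo": 1_000_000, "LowPriceCo": 100_000_000}

pw = price_weights(prices)
cw = cap_weights(prices, shares)
for name in prices:
    print(f"{name}: price weight {pw[name]:.1%}, cap weight {cw[name]:.1%}")
```

In this made-up case the company with the high share price dominates under price weighting even though it is by far the smaller company by market value; under capitalization weighting the ranking reverses.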

Now that we’ve evaluated the metric through our Data Quality dimensions, we understand it more completely and can make more informed decisions about how we choose to use it. The DJIA is ubiquitous, reflects world events, and holds a special place in American finance and the public’s psyche. If you’re interested in a coarse-grained indicator that is largely consistent through more than a century of history, then this is a good pick. It is also useful as a reasonably-sized market basket of blue-chip stocks (please note that I am not a qualified investment advisor and this is not intended as investment advice).

It’s all about Data Fitness!

On the other hand, it’s clearly not the best proxy for the state of the overall American economy. If that’s our objective then we should look elsewhere, perhaps the S&P 500 which covers about 80% of the market. I’ll leave answering these same Data Quality questions for that index as an exercise for the reader. And, of course, this assumes that the stock market is a reasonable proxy for the state of the overall American economy at all. 

Many companies manage to the metric. Executives receive stock and stock options as performance and retention incentives. Board members and shareholders demand increased value. Companies will often take cost-reduction actions like outsourcing, offshoring, layoffs, and shrinking research and development budgets. These actions look good on the balance sheet in the near term, but are frequently detrimental to the economy and to the company in the long term.

No, as a proxy for the state of the overall American economy, the stock market is just one of several factors that need to be considered collectively.

Once you know what to look for, you can dig into any metric, index, or statistic to determine whether it’s answering the question you’re asking or providing the information you’re seeking. The details almost always accompany the reports, but very few people read the small print. I look forward to exploring more of these in the coming months. If any are of particular interest to you, let me know and I’ll add them to the agenda.