In a 1993 Seinfeld episode, Jerry describes the psychological leverage that doctors use on their patients. They tell you to go into the little room and take your pants off and wait. Then, “anybody that comes in with pants on seems like they know what they’re talking about. In any difference of opinion, pants always beats no pants.”
Similarly, data can be used as a tool to convince people to agree with you.
Data beats no data.
Data confers an air of authority. If you’re the only one holding up a chart or report, proclaiming, “Here are the numbers,” more often than not you will carry the argument. Data Storytelling classes will even teach you how to incorporate the data into your narrative.
But do you understand your data well enough to use it properly? Is the data sufficiently trustworthy? Are you sufficiently trustworthy? Are you a wolf, a sheep, a sloth, or an owl?
A wolf won’t care to understand the data or to use it ethically. The only thing that matters to the wolf is that the results support its position. Leave it to the other side to disprove it.
A sheep won’t care either. The data and narrative are taken unquestioningly at face value. When challenged, escalation of commitment kicks in and sheep become defensive about the narrative without really understanding why. Sheep are among the wolves’ best advocates.
Consider a couple of examples.
A Warehouse Manager’s performance metrics were among the best in the company year after year. He was held up as the role model to which all other Warehouse Managers should aspire. When he retired, the President of the company attended his send-off.
Sometimes data is used to promote self-interest.
His successor’s numbers were not as exemplary. Unlabeled cartons, drums, and pallets were routinely reported, and often found in the wrong place, including in a pit behind the warehouse. An audit team was dispatched to build the case to fire the new Warehouse Manager, but discovered something unexpected.
They found that when the previous Warehouse Manager came across an unlabeled carton or some other inventory anomaly, he would simply remove it and throw it into the pit outside. Everything inside remained spotless. None of the metrics captured “stuff thrown into the pit.” None of the metrics sought to balance the inventory to ensure that everything on-site was included. When the new manager arrived, he started by assessing the state of the entire warehouse, both building and grounds. There was a lot to clean up. His poor numbers reflected the “technical debt” that he had inherited from his predecessor and that the metrics did not capture. The metric data was accurate, but it did not measure all aspects of performance. And the previous Warehouse Manager used this to his advantage.
The second example is a report published in the fall of 2023 by the FBI through its Uniform Crime Reporting Program. It has been widely cited in news stories with headlines like “Violent crime is dropping fast in the U.S.—even if Americans don’t believe it” and “Most people think the U.S. crime rate is rising. They’re wrong.”
Sometimes data is used for bullying.
These stories accurately interpret the published data, but there’s more–actually less–to the data than meets the eye.
In 2021 the FBI changed the way that it accepted data into the system. Nearly a third of the country’s police agencies, covering a quarter of the population, were not included. Reporting improved slightly in 2022, but seven of the nineteen largest law enforcement agencies were still missing, including the two largest municipalities where crime rates appear to be rising the fastest: New York and Los Angeles. Furthermore, too few municipalities reported data for the first two quarters of 2023 for the FBI to publish quarterly national estimates, and the third quarter still did not include Chicago or Los Angeles. The report also appears to exclude many federal crimes, including illegal border crossing.
Curiously, neither article mentioned the missing data, and others that did acknowledge it glossed over it as largely inconsequential. Were questions about data completeness not asked because the authors didn’t want to know (or want you to know), or because they didn’t know to ask?
In both examples, wolves took advantage of a lack of data understanding to advance their own agendas. And the sheep didn’t ask questions.
On the other hand, the owl is observant and informed. The owl uses data ethically, primarily for illumination, not solely for support. The owl casts a critical eye on both the data and the narrative; neither is unquestioningly accepted nor reflexively dismissed. The owl asks questions guided by data quality dimensions, including:
- How well does the data reflect reality? (Accuracy)
- Does the data include every object or event it is supposed to represent, or have any been excluded? (Completeness)
- Do the data representing the same object or event in different systems match, both syntactically and semantically? (Consistency)
The data and those interpreting it must earn the owl’s trust.
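These questions can also be put to the data directly. The sketch below, in Python with pandas, is illustrative only: the function and column names are hypothetical, and accuracy usually requires comparison against an outside source of truth, but completeness and consistency checks like these are straightforward to automate.

```python
import pandas as pd

def completeness_gaps(df: pd.DataFrame, key_col: str, expected_keys: set) -> set:
    """Completeness: which expected objects or events are missing from the data entirely?"""
    return expected_keys - set(df[key_col].dropna())

def consistency_mismatches(sys_a: pd.DataFrame, sys_b: pd.DataFrame,
                           key_col: str, value_col: str) -> pd.DataFrame:
    """Consistency: on which objects do two systems disagree about the same value?"""
    merged = sys_a.merge(sys_b, on=key_col, suffixes=("_a", "_b"))
    return merged[merged[value_col + "_a"] != merged[value_col + "_b"]]
```

Checks like these will not make the data trustworthy on their own, but they tell the owl where to start asking questions.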
Last is the sloth. The sloth generally uses data ethically and without an agenda, but doesn’t bother to understand it. Get a question. Run a query. Publish the results. Next. The results might be correct. Or they might not. Next.
A Sales Analyst was asked to identify which products transited through a particular distribution center on their journey from warehouse to store. This required a simple three-way join between warehouse, distribution center, and store data. A report was published. Next.
Fortunately, one of the report’s recipients noticed that it included only about half the products. After some research, he discovered that the distribution center codes used by the warehouse systems were different from the codes used by the store systems. They overlapped by about half, which led to the partial results. Another source had to be found that mapped the distribution center codes between the warehouse and store systems.
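Had the analyst paused to profile the data, the mismatch would have surfaced immediately. The sketch below, in Python with pandas, uses hypothetical tables, codes, and column names, but it shows how a naive join silently drops the products whose distribution center codes differ between systems, and how a mapping table recovers them.

```python
import pandas as pd

# Hypothetical data: four products, but the warehouse and store systems
# use different codes for the same distribution center on half of them.
warehouse = pd.DataFrame({"product_id": [1, 2, 3, 4],
                          "dc_code": ["W-10", "W-11", "W-12", "W-13"]})
stores = pd.DataFrame({"product_id": [1, 2, 3, 4],
                       "dc_code": ["W-10", "W-11", "S-205", "S-206"]})

# The sloth's join: match on dc_code and silently lose every product
# whose codes differ between the two systems.
naive = warehouse.merge(stores, on=["product_id", "dc_code"])
print(len(naive))   # 2 -- only the overlapping half survives

# The fix: a mapping table that reconciles the two coding schemes.
dc_map = pd.DataFrame({"dc_code": ["W-10", "W-11", "W-12", "W-13"],
                       "store_dc_code": ["W-10", "W-11", "S-205", "S-206"]})
fixed = (warehouse
         .merge(dc_map, on="dc_code")
         .merge(stores.rename(columns={"dc_code": "store_dc_code"}),
                on=["product_id", "store_dc_code"]))
print(len(fixed))   # 4 -- every product is accounted for
```

A simple row count or a reconciliation against the product master would have caught the gap before the report went out.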
Partial results…actually, any results…are often worse than no results in the hands of a sloth.
The bottom line is that it is wise to assume that all data is suspect until it has been proven trustworthy, or until you understand it well enough to interpret it correctly.
Practice extreme data skepticism.
Do not allow yourself to be bullied with data. Demand details. Use data quality dimensions to direct your examination of the data. And on the other side, be a trustworthy data source and a trustworthy interpreter. Be transparent, complete, accurate, and consistent. Fit your storytelling to the data, not the other way around.
Be an owl.