Through the years I’ve run a number of surveys. Sometimes it’s assessing business value. Sometimes it’s connecting talent and need. Sometimes it’s inventorying tools. Sometimes it’s understanding resource utilization patterns. Regardless of the purpose or venue, I have one very strict rule that applies to every question in every survey:

Before the question can be asked, the action to be taken or decision to be made based upon its answer must be articulated.

Curiosity is not sufficient. Not only does this information shape the composition of the survey, but it can also improve the data collection itself. Whether respondents know the “why” can affect how accurately a question is answered and how much thought and care goes into the answer. You would think that shouldn’t matter, but human nature suggests otherwise.

Imagine you’re a salesperson required to complete a call report within a week of each customer visit. The timely completion of these reports is a heavily weighted performance evaluation factor, and failure to make the one-week deadline will get you yelled at by your boss. 

But it doesn’t appear that anybody ever looks at what you write. One day, feeling particularly testy, you write “I wonder if anybody actually reads these” in the middle of a report. Crickets.

It seems that the only purpose of the report is to confirm that you actually completed the customer visit and to simplify counting visits toward your quota. Well then, instead of writing a call report, why not create a simple webform connected to your calendar where you certify that you completed the customer visit? Click Yes or No. Finished.

But maybe your boss isn’t the only call report data consumer. Maybe one day the customer calls claiming that he had been promised something that he hadn’t received. Now, the detailed documentation in that call report becomes very important. The terms recorded in the report are very different from those demanded by the customer. The lawyers take it from there.

Now, this might happen only a few times a year. Only a fraction of the call reports are ever referenced again. But when they are, it’s critical, and can mean a tremendous amount of money saved or lost. It also matters to you, since you’re going to have to explain what you did or didn’t put into your report. It’s going to be a very uncomfortable conversation if it’s the latter.

What if you knew, when you were writing your call reports, that one day lawyers would be asking you questions about them?

Would that be an incentive to be clear and complete?

Would you still dash through them just to get them done?

How should the manager communicate the need for complete and timely call reports?

Are the incentives (and punishments) aligned with the need? (Answer: No.)

Let’s look at another example. In 2023, the Consumer Financial Protection Bureau fined a major U.S. bank $12 million for submitting false mortgage lending information to the federal government. Lenders are required to collect and report demographic information about the people applying for loans. Applicants can decline to provide that information, in which case a note to that effect is made in the application. This bank had a higher-than-normal rate of non-responses, so it started tracking the non-response rate for each loan officer. It discovered that the rate was 100% for hundreds of loan officers. Those officers simply weren’t asking the questions; they were just checking the non-response box.
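
The audit that surfaces this kind of problem takes only a few lines once you have the application records. Below is a minimal sketch; the record fields (officer_id, demographics_declined) and the flagging threshold are made up for illustration.

```python
from collections import defaultdict

def non_response_rates(applications):
    """Per-officer non-response rate.

    applications: iterable of dicts with (hypothetical) keys
    'officer_id' and 'demographics_declined' (True/False).
    """
    totals, declined = defaultdict(int), defaultdict(int)
    for app in applications:
        totals[app["officer_id"]] += 1
        declined[app["officer_id"]] += app["demographics_declined"]
    return {officer: declined[officer] / totals[officer] for officer in totals}

def flag_officers(rates, threshold=0.99):
    """Officers at or near a 100% non-response rate are the red flags."""
    return [officer for officer, rate in rates.items() if rate >= threshold]
```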

Why did this happen? The loan officers were being evaluated on the number of calls they completed. It was faster to record that the customer declined to answer the demographic questions than it was to ask the questions and take the time to let the customer answer them.

I find it hard to believe that the loan officers were never told how important it was to collect the information completely and accurately. It is clear, though, that the more directly quantifiable and measurable call-volume metric took precedence. Managing and balancing multiple incentives and objectives is a common challenge.

The central aim of my doctoral dissertation research was to develop an approach for training an expert system without employing an expert. Now, self-organizing systems had been around for a long time, but I wanted a rule-based expert system whose decisions could be examined to determine how each rule contributed to the final decision. (Thirty years later, explainable AI is now a very hot topic.)

I started with a simple expert-system controller for a classic problem: balancing a pole hinged upright on top of a cart that moves along a horizontal track. It’s kind of like balancing a baseball bat in the palm of your hand. Each instance of the expert system was represented by a bit string and evaluated on how well it balanced the pole. Populations of these strings were trained using a genetic algorithm. For the first several generations every instance failed. Eventually, one succeeded. Within a short time they all succeeded.
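
To make the setup concrete, here is a minimal sketch of that kind of system, not the dissertation code itself: each individual is a bit string encoding a handful of threshold rules over the pole angle and angular velocity, fitness is how long the pole stays upright in a crude cart-pole simulation, and a plain genetic algorithm evolves the population. The encoding, constants, and physics are all illustrative assumptions.

```python
import math
import random

# Crude stand-in for the cart-pole task; constants are illustrative,
# not taken from the original dissertation system.
GRAVITY, POLE_LEN, MASS = 9.8, 0.5, 1.0
DT, FORCE = 0.02, 10.0
FAIL_ANGLE, MAX_STEPS = 12 * math.pi / 180, 500

# Bit-string rule encoding, 5 bits per rule:
#   bits 0-1: angle condition    (01 = angle < 0, 10 = angle > 0, else don't care)
#   bits 2-3: velocity condition (same scheme, applied to angular velocity)
#   bit 4   : action             (0 = push left, 1 = push right)
BITS_PER_RULE = 5

def matches(cond, value):
    return cond in (0, 3) or (cond == 1 and value < 0) or (cond == 2 and value > 0)

def apply_rules(bits, theta, omega):
    """First matching rule fires; if no rule matches, apply no force."""
    for i in range(0, len(bits), BITS_PER_RULE):
        r = bits[i:i + BITS_PER_RULE]
        if matches(2 * r[0] + r[1], theta) and matches(2 * r[2] + r[3], omega):
            return FORCE if r[4] else -FORCE
    return 0.0

def simulate(bits):
    """Count how many timesteps the rule set keeps the pole within FAIL_ANGLE."""
    theta, omega = random.uniform(-0.02, 0.02), 0.0
    for t in range(MAX_STEPS):
        force = apply_rules(bits, theta, omega)
        # Heavily simplified inverted-pendulum dynamics.
        alpha = (GRAVITY * math.sin(theta) - force * math.cos(theta) / MASS) / POLE_LEN
        omega += alpha * DT
        theta += omega * DT
        if abs(theta) > FAIL_ANGLE:
            return t
    return MAX_STEPS

def fitness(bits, trials=5):
    return sum(simulate(bits) for _ in range(trials)) / trials

def evolve(n_rules=5, pop_size=50, generations=60, p_mut=0.02):
    """Plain genetic algorithm: tournament selection, one-point crossover, mutation."""
    length = n_rules * BITS_PER_RULE
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for gen in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in pop), reverse=True)
        print(f"gen {gen:3d}  best fitness {scored[0][0]:6.1f}")
        if scored[0][0] >= MAX_STEPS:         # every trial ran to the time limit
            break
        def pick():                           # tournament of two
            a, b = random.sample(scored, 2)
            return (a if a[0] > b[0] else b)[1]
        nxt = [scored[0][1], scored[1][1]]    # keep the two best (elitism)
        while len(nxt) < pop_size:
            cut = random.randrange(1, length)
            child = pick()[:cut] + pick()[cut:]
            child = [bit ^ (random.random() < p_mut) for bit in child]
            nxt.append(child)
        pop = nxt
    return scored[0][1]

if __name__ == "__main__":
    best = evolve()
    print("best rule set:", best)
```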

Encouraged by the success, I wanted to try to find more efficient solutions with fewer rules. In the next set of experiments, the evaluation function considered the number of rules in addition to how well the pole was balanced: the fewer the rules, the better. What happened next should not have come as a surprise: within a couple of generations every instance shrank to a single rule and no instance ever balanced the pole. I tried scaling the relative weights of the evaluation function’s factors, even to the point where the number of rules contributed a microscopic fraction of the score. The same thing happened.
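
The second experiment amounts to nothing more than a change in the evaluation function. A minimal sketch, reusing fitness and BITS_PER_RULE from the code above; the weights are illustrative, and the dissertation encoding let the number of rules vary (which the fixed-length sketch does not), so treat this purely as an illustration of the weighting problem.

```python
def combined_fitness(bits, w_balance=1.0, w_rules=1e-4):
    """Weighted multi-objective score: balance performance minus a rule-count penalty."""
    n_rules = len(bits) // BITS_PER_RULE
    return w_balance * fitness(bits) - w_rules * n_rules

# Early on, every individual scores near zero on balancing, so even a microscopic
# w_rules is the only signal selection can act on: shedding rules is cheap,
# balancing the pole is hard, and the cheap objective wins.
```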

This was because one of the objectives was very easy to attain while the other was much more difficult. Jettisoning rules was far easier than solving the pole-balancing problem. I tried waiting until the system could solve the problem before considering the number of rules, but by then the solution was already locked in and the number of rules never changed.

Two different objectives could not be optimized simultaneously.

I ended up running multiple sets of simulations in which each population had a different, fixed number of rules. In one simulation each instance had 10 rules, in the next 9, and so forth down to 3. It took longer to solve the problem with fewer rules. To be successful, the second optimization objective had to be imposed externally.
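
In code terms, the constraint moves out of the fitness function and into the experiment design: a sweep over fixed rule counts, shown here reusing the hypothetical evolve() and fitness() from the sketch above.

```python
# Impose the second objective externally: one run per fixed rule count.
for n_rules in range(10, 2, -1):      # 10, 9, ..., 3
    best = evolve(n_rules=n_rules)    # each population keeps a fixed rule count
    print(f"{n_rules} rules -> fitness {fitness(best):.1f}")
```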

The difference between a human being and a neural network or a genetic algorithm is that a human being can walk and chew gum at the same time. Or rub their belly and pat their head. Or whatever.

Data entry almost always has an immediate and apparent operational purpose. It also almost always has additional downstream purposes that may be less apparent but just as consequential.

It would be nice if everyone entered data properly and completely simply for the sake of doing it properly and completely, but that just isn’t human nature. We are going to optimize for our immediate constraints. We will pursue the greatest rewards and do what we need to do to avoid beatings. Like water, we will take the path of least resistance, accomplishing the goals that are most easily attained.

Management must therefore be clear about not only the immediate operational purpose of the data, but also its downstream uses. Communicate the consequences of errors, and don’t just scold people for failing to get the TPS reports submitted on time. Align the request with the purpose. This might take a little effort, but the information is available. Understanding the purpose is a natural driver of compliance.

