In the holiday classic, It’s the Great Pumpkin, Charlie Brown, and earlier in the October 25, 1961 Peanuts comic strip, Linus Van Pelt famously says:
There are three things I’ve learned never to discuss with people: religion, politics, and the Great Pumpkin.
Unfortunately, it’s hard to talk about Data Ethics without brushing up against at least two of those three. I’ll do my best.
I’ve said it before. Several times. And I wasn’t the first to say it. Data Ethics isn’t some separate special thing. It is simply ethics applied to data. Moral principles that inform the collection, processing, and use of data.
Until recently, Data Ethics was most likely to be discussed within the context of privacy and security. What data should be collected? How should it be managed? Who should be able to access it? Today, as in so many areas of information management, the explosive growth of AI has brought renewed attention to Data Ethics, particularly as it relates to the trustworthiness of these models. Ethics is now considered a branch of Data Governance. We’re going to have to add another spoke to the DMBOK wheel.
For this conversation, though, I am less interested in the specific ethical requirements themselves than in their foundations.
Data Ethics is similar to Data Quality in that a reference standard is required.
That reference standard is expressed in principles and policies. Let’s take a brief detour here. A principle is a belief that serves as the foundation for actions and decision-making. For example, “It is the responsibility of every company to minimize its carbon footprint in order to mitigate the existential threat of climate change.” A policy expresses the specific implementation and enforcement of the principle. Something like, “Consolidated International Incorporated will use only LED light bulbs.” The principle must always precede the policy; otherwise, the policy is baseless and arbitrary. The corporate equivalent of “because I said so.”
Interestingly, the same policy can rest on widely differing underlying principles. Sticking with the LED light bulb example, that same policy could be derived from the principle, “It is the responsibility of every company to maximize value for its customers and shareholders.” Using LED bulbs, which cost less to operate, contributes directly to that goal.
But what if you agree with the policy but disagree with the principle that motivated it? I’ll leave that as an exercise for the reader.
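To make the principle-to-policy relationship concrete, here is a minimal, hypothetical sketch of how a governance catalog might record each policy with an explicit link back to the principle that motivates it. The class names, IDs, and fields are illustrative assumptions, not a standard or any particular tool’s API.

```python
# Hypothetical sketch: every policy carries a reference to its underlying principle,
# so a policy with no traceable principle can be flagged as "because I said so."
from dataclasses import dataclass


@dataclass
class Principle:
    id: str
    statement: str


@dataclass
class Policy:
    id: str
    statement: str
    principle_id: str  # the principle this policy implements


principles = {
    "P-ENV-01": Principle(
        "P-ENV-01",
        "Every company is responsible for minimizing its carbon footprint.",
    ),
    "P-VAL-01": Principle(
        "P-VAL-01",
        "Every company is responsible for maximizing value for customers and shareholders.",
    ),
}

policies = [
    # The same policy text could just as easily cite P-VAL-01 instead of P-ENV-01.
    Policy("POL-LED-01", "Use only LED light bulbs.", "P-ENV-01"),
]


def ungrounded(policies, principles):
    """Return the IDs of policies whose cited principle does not exist."""
    return [p.id for p in policies if p.principle_id not in principles]


print(ungrounded(policies, principles))  # [] means every policy traces back to a principle
```

The point of the sketch is only that the link is recorded explicitly; which principle a policy cites is exactly the judgment call the surrounding discussion is about.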
It has been said that “data is neutral.” Whether that’s true or not largely depends on your interpretation of the word “data.” 150, TRUE, and “JL” are simply what they are: a number, a Boolean value, and a character string.
Yes, 150 is just a number. If it’s my cholesterol level, then it means that I’m doing OK (at least in that respect), but all things considered I’d rather that information be kept private. If it’s the weight of my suitcase, then it means that I’ll have to pay more to check it, but I’m less concerned about who knows it.
“Data” is created to represent something and exists within a context. It’s the interpretation and understanding of the data that gives it meaning. Where there’s interpretation there’s the potential for misunderstanding. Where there’s use there’s the potential for misuse. Ethics provides the guardrails.
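As a small illustration of that point, the sketch below (assumed names and fields, not a real library) attaches context and a sensitivity label to the same raw value from the earlier example. The number 150 is identical in both records; only the surrounding context tells you what it means and how carefully it should be handled.

```python
# Illustrative only: the raw value is the same; meaning and guardrails come from context.
from dataclasses import dataclass


@dataclass
class DataPoint:
    value: int
    meaning: str      # what the value is understood to represent
    sensitivity: str  # e.g. "private" vs. "public"


cholesterol = DataPoint(150, "total cholesterol reading", "private")
suitcase = DataPoint(150, "checked-suitcase weight (overweight fee applies)", "public")

for dp in (cholesterol, suitcase):
    # Same number, very different handling once context is attached.
    print(dp.value, "->", dp.meaning, "| handle as:", dp.sensitivity)
```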
The most obvious reference standard for ethics is religion. The second is political agenda. And if you’re Linus on Halloween night, then The Great Pumpkin supersedes all. Standards emerge (and vanish) as societal norms evolve.
Ethics provides insight into people’s values and what is important to them.
What is acceptable to the majority of a population changes over time. What was socially acceptable, or taboo, in the 1950s differs greatly from today. The 1850s even more so. It can change from year to year, or even month to month. A significant, perhaps traumatic, event may make something that was perfectly fine yesterday anathema today.
Personal information privacy is an ethical standard that has been largely agreed upon by society. I say “largely” not because there’s any question about keeping personal information private, but because of disagreement over what qualifies as personal information. Personal data is legally defined in the EU by the GDPR, but lacks a single, uniform definition in the United States.
But suppose for a moment that your world view places the collective above the individual, and that personal property ownership is discouraged or prohibited. In that case, you might not have any issue with making any data about anybody, yourself included, available to everybody. You don’t own or control the data any more or less than you own or control anything else.
Similarly, if you believe that human-generated carbon dioxide is going to destroy the world in the next ten years, then you are going to be very concerned about the power consumption required for AI. That will factor significantly into your policies: whether neural network model training should be limited, or whether its use should be restricted. If you’re not worried about human-generated carbon dioxide destroying the planet, then your AI training policies will be built on other principles, like maximizing model quality or availability.
And that’s what I mean about evaluating the framework for ethics. We need to recognize our own biases and our own preferences in the creation of those frameworks. Generally speaking, the targets will be somewhere in between the extremes, but we often have a hard time agreeing exactly where those lines should be drawn.
The more diverse the population, the less likely the agreement on a uniform, consistent ethical framework.
That’s not to say that one won’t exist. It will. Even if it’s by default. But there will always be tension and “the line” will always be moving. Expect it. Prepare for it.
Most importantly, be explicit in the principles that provide the foundation for your Data Ethics, and in the policies that implement them.
Next week I’ll dive more deeply into one of the most significant emerging areas of Data Ethics: generative AI model bias.