I have long held that the main reason corporate adoption of Artificial Intelligence has been slow is that leadership would have to accept that the AI will sometimes be wrong. The demand for AI has always been strong, but the demand for “deterministic AI” has been even stronger.
Enter generative AI. Exit rational thinking. Pick your own “crowd stampedes to grab the shiny object” metaphor. We went from being reluctant to even contemplate the possibility of an error rate greater than zero, to…well…these:
“An AI-powered coding tool wiped out a software company’s database, then apologized for a ‘catastrophic failure on my part.’”
Fortune Magazine Online, 23 July 2025. That’s pretty scary, but what actually happened is even scarier. An AI Agent was being used on a development platform. It made changes to the live, production environment. Even though the system was in a “code and action freeze.” And the agent had explicit instructions not to proceed without human approval. And the agent was not authorized to run the commands it ran. In the old days, a “runaway process” might consume a whole bunch of CPU or disk, or just hang the system, but this takes running amok to a whole new level.
Let’s say a developer did that. Can you imagine the conversation between the CIO and the CEO? “Yeah, Stu on the Back-Office Development team just deleted all of our applications and data even though we’re in a code freeze, he didn’t have authorization, and he was explicitly told not to do what he did. He’s really sorry, though.” Not sure Stu needs to block time for the staff meeting this week. Here’s another recent headline:
“Google Gemini deletes user code: ‘I have failed you completely and catastrophically.’”
MSN Online, 25 July 2025. A product manager, not a developer, was exploring AI Agent development with Gemini and simply asked an agent to create a new Windows directory and move files into it. The command to create the directory failed, but the agent didn’t realize it. Instead, it tried to move the files into the non-existent folder, which meant each move landed on the same path and the files overwrote each other into oblivion. Unlike the first example, this was experimental, but it’s still super-annoying and a good cautionary tale. Take frequent backups, including periodic backups stored offline.
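To make that failure mode concrete, here is a minimal Python sketch of the kind of check that was apparently missing. It is illustrative only, not a reconstruction of Gemini’s actual tooling; the paths and the function name are made up. The idea is simply to verify that the destination directory really exists before moving anything into it, and to refuse to overwrite files that are already there.

```python
import shutil
from pathlib import Path

def safe_move_into(files: list[Path], dest_dir: Path) -> None:
    """Move files into dest_dir, but only after confirming dest_dir exists."""
    dest_dir.mkdir(parents=True, exist_ok=True)

    # This is the check the agent reportedly skipped: confirm the create
    # step actually worked before acting as if it did.
    if not dest_dir.is_dir():
        raise RuntimeError(f"Destination {dest_dir} was not created; aborting.")

    for src in files:
        target = dest_dir / src.name
        # Refuse to silently overwrite anything already at the target path.
        if target.exists():
            raise FileExistsError(f"{target} already exists; refusing to overwrite.")
        shutil.move(str(src), str(target))

if __name__ == "__main__":
    # Hypothetical usage: archive the text files in a scratch folder.
    docs = list(Path("./scratch").glob("*.txt"))
    safe_move_into(docs, Path("./scratch/archive"))
```

The specific code matters less than the habit: any operation that can destroy files deserves an explicit existence check and an explicit refusal to overwrite. Here’s one more: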
“Airline held liable for its chatbot giving passenger bad advice”
BBC Online, 23 February 2024. An Air Canada passenger was making arrangements to attend his grandmother’s funeral. He was interacting with a chatbot, which told him that he would be eligible for a bereavement fare. When he applied for the discount, he was told that the chatbot was wrong and that he was not eligible for the bereavement fare. The Civil Resolution Tribunal decided in the passenger’s favor. Ultimately, the incident only cost Air Canada about eight hundred dollars and some reputational damage, but its arguments to the tribunal are telling. From the decision [emphasis added]:
Air Canada argues it cannot be held liable for information provided by one of its agents, servants, or representatives – including a chatbot. It does not explain why it believes that is the case. In effect, Air Canada suggests the chatbot is a separate legal entity that is responsible for its own actions. This is a remarkable submission. While a chatbot has an interactive component, it is still just a part of Air Canada’s website. It should be obvious to Air Canada that it is responsible for all the information on its website. It makes no difference whether the information comes from a static page or a chatbot.
I find Air Canada did not take reasonable care to ensure its chatbot was accurate. While Air Canada argues Mr. Moffatt could find the correct information on another part of its website, it does not explain why the webpage titled “Bereavement travel” was inherently more trustworthy than its chatbot. It also does not explain why customers should have to double-check information found in one part of its website on another part of its website.
Mr. Moffatt says, and I accept, that they relied upon the chatbot to provide accurate information. I find that was reasonable in the circumstances. There is no reason why Mr. Moffatt should know that one section of Air Canada’s webpage is accurate, and another is not.
In other words, don’t blame us, it was the chatbot’s fault. It was the AI Agent’s fault. Again, what would you do if one of your employees did this?
In the rush to implement AI, especially Agentic AI, companies are (often unknowingly) handing their car keys to an eighth grader.
Handing over the car keys like that seems obviously irresponsible; handing them to Agentic AI apparently not so much. Debugging Large Language Models, AI Agents, and Agentic AI, as well as implementing guardrails, are topics for another time, but it’s important to recognize that many companies are handing over those car keys. Willingly. Enthusiastically. Would you put that eighth grader in charge of your Marketing department? Of autonomously creating collateral that goes out to your customers without checking it first? Of course not. You wouldn’t do that with an intern, a new hire, or maybe even a seasoned professional. What about your Customer Service interactions? Finance? Your code repository? Of course not.
The guardrails cannot be high enough. Plan for the impossible.
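Guardrail implementation is, as noted above, a topic for another time, but here is one minimal, hypothetical sketch of what “plan for the impossible” can look like in practice: every destructive action an agent proposes is hard-stopped until a named human approves it. The action names and the approval mechanism are assumptions for illustration, not any particular framework’s API.

```python
from typing import Optional

# Actions we never allow an agent to run on its own. Hypothetical names.
DESTRUCTIVE_ACTIONS = {"drop_table", "delete_file", "deploy_to_production"}

class ApprovalRequired(Exception):
    """Raised when an agent proposes a destructive action without a human sign-off."""

def execute_agent_action(action: str, payload: dict, approved_by: Optional[str] = None) -> None:
    """Run an agent-proposed action, requiring a named approver for anything destructive."""
    if action in DESTRUCTIVE_ACTIONS and approved_by is None:
        raise ApprovalRequired(f"Action '{action}' needs human approval before it runs.")
    print(f"Executing {action} (approved by: {approved_by or 'n/a'}) with {payload}")

# The agent can propose whatever it likes; the destructive path still stops here.
execute_agent_action("summarize_logs", {"source": "app.log"})
try:
    execute_agent_action("drop_table", {"table": "customers"})
except ApprovalRequired as err:
    print(f"Blocked: {err}")
```

The key design choice is that the gate lives outside the model. A prompt that says “do not proceed without human approval” is a request; a check enforced in code is a control.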
Then ask yourself how often you’re OK with your AI Agent doing something wrong, and what you are going to do when something does go wrong. And it will.
IBM recognized this in a 1979 training manual that sums up the issue perfectly:
“A computer can never be held accountable; therefore, a computer must never make a management decision.”
We want AI Agents and Agentic AI to make decisions, but we need to be intentional about which decisions we allow them to make. What are the stakes, personally, professionally, and for the organization? What is the potential liability when something goes wrong? And something will go wrong. Probably something you never considered going wrong.
And maybe think about the importance of the training data. Isn’t that what we say when an actual person does something wrong? They weren’t adequately trained. Same thing here.