Recent advances in large language models have reignited interest in Artificial General Intelligence (AGI). It’s coming soon! Just like it has every couple of decades since the 1950s. But the trajectory of recent successes makes it seem more plausible this time.
It doesn’t help that AGI is a nebulous concept. Definitions vary, but it is typically described as the ability of a computer system to learn, reason, adapt, and solve problems at least as proficiently as a human. We’ve been told that we’ll recognize it when we see it.
One day about a year ago I was playing around with ChatGPT and asked it to multiply two moderately large numbers. It confidently responded with the wrong answer. I informed it that it was wrong and asked it to try again. It apologized and then confidently responded with a different wrong answer.
I wrote about the experience at the time when I revealed Neural Network (and ChatGPT) Magic Secrets. But since then, I got to wondering whether this is a reasonable thing for a neural network to be able to do, and whether it is useful to even try. That led me to the following hypothesis:
Cooper’s (First) Conjecture:
When Artificial General Intelligence, or something that approaches it, is achieved, these systems will still require dedicated computational engines to do math and to store and retrieve specific facts.
In other words, the computers will use computers.
We’ve long recognized that people are naturally good at certain cognitive activities and computers are naturally good at certain computational activities. The development of AGI moves associative systems closer to cognition, but in doing so, farther away from computation. AGI mimics human cognition, and just as humans require calculators and databases, so too will those AGI systems.
Maybe someone else has made this observation, but I haven’t yet run across it. If so, let me know and I will compliment them on their brilliance. 🙂
I suppose one could include in a large language model’s training set every combination of every mathematical function. Oh, wait. No, we can’t. The training set would have to be infinite. (For the set theorists out there, the set of all functions on the real numbers is as large as the power set of the reals, which is strictly larger than the continuum itself.)
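To put a rough number on it, even giving up on “every function” and restricting ourselves to a single operation over a bounded range leaves an absurd number of facts to memorize. A back-of-the-envelope illustration (the 12-digit cutoff is my own arbitrary choice):

```python
# Back-of-the-envelope: distinct "facts" needed to memorize multiplication alone,
# just for ordered pairs of operands below 10**12 (at most 12 digits each).
pairs = (10 ** 12) ** 2
print(f"{pairs:.1e}")   # 1.0e+24 multiplication facts, for one operation alone
```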
Besides, people don’t memorize every combination of every function. We don’t have to. We memorize the ones that we use frequently or base cases like times tables, and then learn processes like long multiplication and division for accomplishing the calculations that we haven’t memorized. We learn in geometry class that the cosine of 60 degrees is 0.5, but we use a calculator to find that the cosine of 53 degrees is approximately 0.6. (And before calculators with scientific functions were invented, somebody did the calculation by hand or by slide rule and published the result in one of the CRC Handbooks.)
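If you want to see that idea as code, here is a toy sketch I put together purely for illustration: memorize only the single-digit times table, then let the long-multiplication procedure handle everything else.

```python
# A toy illustration: memorize only the single-digit times table,
# then apply the long-multiplication procedure to everything else.
# (Python's built-in * already does this better, of course.)

TIMES_TABLE = {(a, b): a * b for a in range(10) for b in range(10)}  # the memorized base cases

def long_multiply(x: int, y: int) -> int:
    """Schoolbook multiplication using only memorized single-digit products."""
    result = 0
    for i, xd in enumerate(reversed(str(x))):        # each digit of x, least significant first
        carry, partial = 0, 0
        for j, yd in enumerate(reversed(str(y))):    # each digit of y
            prod = TIMES_TABLE[(int(xd), int(yd))] + carry
            carry, digit = divmod(prod, 10)
            partial += digit * 10 ** j
        partial += carry * 10 ** len(str(y))         # leftover carry past the last digit
        result += partial * 10 ** i                  # shift by the place value of xd
    return result

assert long_multiply(6837, 5204) == 6837 * 5204
```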
The same applies to “remembering” the details of a large number of transactions. That’s what databases are for. Imagine asking a person to remember every one of your company’s sales, and then asking them to write out an invoice for each customer from memory. Ridiculous. Now imagine asking a large language model to do the same thing. Billions of transactions. Would you trust the result? Is this even an appropriate use of the resource?
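As a sketch of the database version of the same point (the table and column names here are made up for illustration), exact recall becomes a query with an auditable answer rather than an act of memory:

```python
# Made-up schema, purely for illustration: exact transaction recall is a query,
# not an act of memory.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Acme", "widget", 19.50), ("Acme", "gadget", 42.00), ("Globex", "widget", 19.50)],
)

# "Write out an invoice for Acme" becomes a query with an exact, auditable answer.
rows = conn.execute(
    "SELECT item, amount FROM sales WHERE customer = ?", ("Acme",)
).fetchall()
total = sum(amount for _, amount in rows)
print(rows, total)   # [('widget', 19.5), ('gadget', 42.0)] 61.5
```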
Cooper’s (Second) Conjecture:
AGI requires that the learning system recognize its limitations and invent ways to compensate for them without being explicitly trained on those tools or their creation.
A system with Artificial General Intelligence would recognize that using a calculator or database would be easier and more accurate, create those resources, and offload the questions best answered by them. How would it know? The same way that we know: through feedback saying that it got the wrong answer. Even better would be the self-awareness that it is not equipped to provide the answer in the first place, instead of confidently presenting the wrong answer.
To me, that’s a better indicator of AGI.
It’s not about a system recognizing what it knows, but rather recognizing what it doesn’t know and doing something about it.
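Here is a deliberately crude sketch of what that offloading might look like. The routing rule and function names are mine, invented for illustration, not how any particular system works: recognize the arithmetic, hand it to a calculator, and admit it when neither path applies.

```python
# A crude, invented routing rule: offload arithmetic to a dedicated engine,
# and admit the limitation when no engine applies.
import re

def calculator(expression: str):
    """A dedicated arithmetic engine, restricted to digits and basic operators."""
    if not re.fullmatch(r"[\d\s+\-*/().]+", expression):
        raise ValueError("not a pure arithmetic expression")
    return eval(expression)  # tolerable here only because of the whitelist above

def answer(question: str) -> str:
    expr = re.search(r"\d[\d\s+\-*/().]*\d", question)
    if expr and any(op in expr.group() for op in "+-*/"):
        return str(calculator(expr.group()))                 # offload the math; don't guess
    return "I don't have a reliable way to answer that."     # admit the limitation

print(answer("What is 48291 * 7351?"))       # 354987141, exact
print(answer("Who will win the league?"))    # admits it can't answer
```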
Throughout history, people with a seeming surplus of free time have indulged in the thought exercise of identifying the last person who knew everything there was to know. Some believe it was Aristotle. Others, Leonardo da Vinci. When I first ran across this question, Francis Bacon was cited. It’s not a testable theory, and surely something was discovered somewhere that never made it back to one’s candidate of choice. But nobody disputes that whoever it was died at least a century ago, and probably much earlier than that. In any case, that kind of universal knowledge isn’t necessary.
Instead, a baseline set of knowledge is possessed by the general population. From there, different people develop expertise in different areas and collectively “know everything.” In The Hitchhiker’s Guide to the Galaxy, Douglas Adams presents the Earth as a massive supercomputer designed to find the Ultimate Question of Life, the Universe, and Everything. It’s like that.
We’re discovering that large language models behave in much the same way. Once a model acquires a baseline set of information, it can be trained on additional, specific expertise much more easily than it could be trained on both the baseline and the specialty from scratch.
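The mechanics are familiar from transfer learning. Here is a minimal sketch, using a generic PyTorch model as a stand-in for an actual LLM: freeze the broadly trained base and train only a small, domain-specific head on the specialty data.

```python
# Minimal transfer-learning sketch; the toy model stands in for a pretrained LLM.
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))  # the "baseline" knowledge
for p in base.parameters():
    p.requires_grad = False            # keep the general knowledge fixed

specialist_head = nn.Linear(512, 10)   # the new, domain-specific expertise
optimizer = torch.optim.Adam(specialist_head.parameters(), lr=1e-3)

x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))   # a stand-in "specialty" batch
loss = nn.functional.cross_entropy(specialist_head(base(x)), y)
loss.backward()                        # gradients flow only into the small head
optimizer.step()
```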
Cooper’s (Third) Conjecture:
Although some AGI definitions expect a single system to know everything, comprehensive breadth of knowledge will be achieved through the federation of many systems.
We’re currently training models with trillions of parameters. Thirty years ago, who would have thought?! It isn’t a stretch to envision parameter sets in the thousands of trillions (quadrillions) or millions of trillions (quintillions). These networks will be trained by processors whose throughput dwarfs today’s most powerful supercomputers. Nevertheless, the brute-force approach grows exponentially in cost and is unnecessary. Before brute-force training is even feasible, groups of networks will be trained with specific areas of specialization. Perhaps systems will compete and/or evolve in parallel, a survival of the fittest among networked systems. This approach is already being used in commercially available learning platforms.
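A sketch of the federation idea, with placeholder specialists and a keyword router that I invented for illustration; real systems would use something far more capable to decide who answers:

```python
# Placeholder specialists and a made-up keyword router, for illustration only.
SPECIALISTS = {
    "math": lambda q: "routed to the math engine",
    "chemistry": lambda q: "routed to the chemistry model",
    "law": lambda q: "routed to the legal model",
}

KEYWORDS = {
    "math": ["integral", "multiply", "equation"],
    "chemistry": ["molecule", "reaction", "benzene"],
    "law": ["contract", "liability", "statute"],
}

def route(question: str) -> str:
    """Send each question to the specialist whose keywords best match it."""
    q = question.lower()
    scores = {name: sum(word in q for word in words) for name, words in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return "No specialist claims this question."   # the federation knows what it doesn't know
    return SPECIALISTS[best](question)

print(route("Is this contract clause enforceable?"))   # routed to the legal model
```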
Progress in Artificial Intelligence, especially in neural networks, accelerates greatly following specific insights and innovations. This time it’s the Transformer architecture and large language models. It’s unclear, though, whether this alone will take us all the way to AGI or whether another innovation will be required. Who knows? Maybe one of these LLMs will point us in the direction of that next innovation. Maybe that would be evidence of AGI.