MIT researchers have discovered that large language models (LLMs) may develop their own understanding of reality as they improve their language abilities, challenging previous assumptions about artificial intelligence and language comprehension.
The research involved training an LLM on solutions to small Karel puzzles, which involve giving instructions to a robot in a simulated environment. Although the model was never shown how those solutions actually worked, after extensive training it generated correct instructions 92.4 percent of the time.
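To make the setup concrete, here is a minimal sketch of that kind of training regime, not the researchers' actual code: a small language model learns only next-token prediction over Karel-style instruction sequences, and the simulated grid world never appears in the training data. The vocabulary, model architecture, and tiny corpus below are illustrative assumptions.

```python
# Minimal sketch (not the study's code): next-token training on Karel-style
# solution programs only. Token names and model size are assumptions.
import torch
import torch.nn as nn

# A toy vocabulary of Karel-like instructions; the real token set may differ.
VOCAB = ["<pad>", "<bos>", "<eos>", "move", "turnLeft", "turnRight",
         "pickMarker", "putMarker"]
stoi = {t: i for i, t in enumerate(VOCAB)}

def encode(program):
    """Map an instruction sequence to token ids with begin/end markers."""
    return torch.tensor([stoi["<bos>"]] + [stoi[t] for t in program] + [stoi["<eos>"]])

class TinyLM(nn.Module):
    """A small recurrent language model standing in for the study's transformer."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.head(h), h  # logits and hidden states (probed later)

# Hypothetical corpus: solution programs only, never the grid worlds themselves.
programs = [["move", "move", "turnLeft", "putMarker"],
            ["turnRight", "move", "pickMarker"]]

model = TinyLM(len(VOCAB))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):
    for prog in programs:
        ids = encode(prog).unsqueeze(0)      # shape (1, T)
        logits, _ = model(ids[:, :-1])       # predict each next token
        loss = loss_fn(logits.reshape(-1, len(VOCAB)), ids[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
```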
Using a technique called "probing," the researchers peered into the model's internal representations. They found that the LLM had spontaneously developed its own conception of the underlying simulation, indicating an understanding of the instructions that went beyond mere mimicry.
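In practice, a probe is typically just a small classifier trained to read a specific piece of world state out of the network's activations. The sketch below illustrates that idea with synthetic stand-in data rather than the study's real activations; the "facing direction" label and the array shapes are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: in the real experiment, `hidden` would hold the LM's hidden
# states at each step of a solution program, and `facing` the simulated robot's
# true facing direction (0=N, 1=E, 2=S, 3=W) at that step.
rng = np.random.default_rng(0)
facing = rng.integers(0, 4, size=2000)
hidden = rng.normal(size=(2000, 64))
hidden[:, 0] += facing  # inject a weak synthetic signal so the demo is non-trivial

X_tr, X_te, y_tr, y_te = train_test_split(hidden, facing, test_size=0.25, random_state=0)

# The probe itself: a plain linear classifier. If it can decode the world state
# from held-out activations, that state is (linearly) present in the model.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```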
To validate their findings, the team conducted a "Bizarro World" experiment, altering the meanings of the instructions before training a new probe. The new probe struggled to decode these altered meanings, suggesting that the original semantics were embedded in the model itself rather than supplied by the probe.
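One way to picture that control, as a rough sketch rather than the study's actual protocol: keep the same activations, relabel each step according to an alternative interpreter with flipped instruction meanings, and train a fresh probe on those labels. If the fresh probe decodes the flipped state much worse than the original probe decoded the true state, the semantics plausibly live in the model rather than in the probe. The data below is synthetic and the flipping rule is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_accuracy(states, labels):
    """Fit a fresh linear probe on the given labels and report held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(states, labels, test_size=0.25, random_state=0)
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# Synthetic stand-ins: `hidden` plays the role of the LM's activations and
# carries a signal for the true simulated state only.
rng = np.random.default_rng(1)
true_state = rng.integers(0, 4, size=2000)                        # real semantics
hidden = rng.normal(size=(2000, 64))
hidden[:, 0] += true_state                                        # signal for the true state
bizarro_state = (true_state + rng.integers(1, 4, size=2000)) % 4  # flipped-meaning interpreter

print("probe on original semantics:", probe_accuracy(hidden, true_state))
print("probe on bizarro semantics: ", probe_accuracy(hidden, bizarro_state))
```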
While the study used a simple programming language and a relatively small model, it opens new avenues for research into AI language comprehension. Future work may build on these insights to improve LLM training methods and to explore whether these models actively use their internal understanding to reason about their (perceived) reality.