MIT researchers have discovered that advanced AI language models, despite their impressive performance, lack genuine understanding of real-world scenarios, potentially limiting their reliability for scientific applications and everyday tasks.

The research, presented at the Conference on Neural Information Processing Systems, specifically examined large language models (LLMs) similar to those powering popular AI systems.
Dr. Ashesh Rambachan, assistant professor of economics and principal investigator at MIT's Laboratory for Information and Decision Systems, led a team that uncovered concerning gaps in AI's comprehension capabilities. Their findings challenge the widespread assumption that AI systems develop accurate internal models of the world through their training.
The researchers conducted a revealing experiment using New York City navigation as a test case. While the AI could provide accurate turn-by-turn directions under normal conditions, its performance dropped dramatically when faced with minor changes like street closures or detours. According to the MIT News article, when lead author Keyon Vafa and the team closed just 1 percent of possible streets, accuracy "immediately plummets from nearly 100 percent to just 67 percent."
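The setup can be pictured as a simple stress test: take the street map the model was trained to navigate, close a small fraction of streets, and check whether its directions still follow open roads. The Python sketch below is a minimal illustration of that idea, assuming a toy grid map and a deliberately naive `model_route` stand-in for the LLM; it is not the researchers' actual evaluation code.

```python
import random

def build_grid(n):
    """Toy 'street map': an n-by-n grid of intersections joined by blocks."""
    edges = set()
    for x in range(n):
        for y in range(n):
            if x + 1 < n:
                edges.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < n:
                edges.add(frozenset({(x, y), (x, y + 1)}))
    return edges

def model_route(start, goal):
    """Stand-in for the LLM's turn-by-turn directions: walk east, then north.

    It ignores closures entirely, mimicking a model that memorized routes
    on the intact map rather than learning the map itself.
    """
    path = [start]
    x, y = start
    while x < goal[0]:
        x += 1
        path.append((x, y))
    while y < goal[1]:
        y += 1
        path.append((x, y))
    return path

def route_is_valid(path, open_edges):
    """A route is valid only if every block it uses is still open."""
    return all(frozenset({a, b}) in open_edges for a, b in zip(path, path[1:]))

random.seed(0)
edges = build_grid(20)
# Close roughly 1 percent of streets, as in the experiment described above.
closed = set(random.sample(sorted(edges, key=sorted), k=max(1, len(edges) // 100)))
open_edges = edges - closed

trials = [((0, 0), (random.randrange(20), random.randrange(20))) for _ in range(1000)]
valid = sum(route_is_valid(model_route(s, g), open_edges) for s, g in trials)
print(f"directions still valid after closures: {valid / len(trials):.1%}")
```

The stand-in planner ignores closures by design, which is precisely the failure mode the experiment probes: directions memorized on the intact map break as soon as the map changes slightly.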
"One hope is that, because LLMs can accomplish all these amazing things in language, maybe we could use these same tools in other parts of science," Rambachan tells MIT News. "But the question of whether LLMs are learning coherent world models is very important if we want to use these techniques to make new discoveries."
The team developed two new metrics to evaluate AI's understanding: "sequence distinction" and "sequence compression." These measurements revealed that even when AI systems appeared to perform tasks successfully, they often relied on flawed internal representations. As reported by MIT News, when researchers recovered the AI's internal maps of New York City, they found "imagined" versions with "hundreds of streets crisscrossing overlaid on top of the grid" and "random flyovers above other streets or multiple streets with impossible orientations."
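Both metrics trace back to a classic idea from automata theory: a model with a coherent world model should treat any two sequences that lead to the same underlying state as interchangeable (compression), and should tell apart sequences that lead to different states (distinction). The Python sketch below illustrates that intuition on a toy state machine; the automaton, the `model_next_tokens` stand-in, and the single-step next-token comparison are simplifying assumptions rather than the paper's exact procedure.

```python
from itertools import combinations, product

# Toy "world": a three-state machine over the tokens "a" and "b".
TRANSITIONS = {0: {"a": 1, "b": 0}, 1: {"a": 2, "b": 0}, 2: {"a": 0, "b": 1}}

def true_state(seq):
    """Ground-truth state reached after consuming a token sequence."""
    state = 0
    for tok in seq:
        state = TRANSITIONS[state][tok]
    return state

def model_next_tokens(seq):
    """Stand-in for the model's judgment of which next tokens are valid.

    This fake model only inspects the last token, so it cannot actually
    track the underlying state, mirroring the flawed internal maps above.
    """
    return {"a"} if seq and seq[-1] == "a" else {"a", "b"}

sequences = [seq for n in range(1, 5) for seq in product("ab", repeat=n)]

same = same_ok = diff = diff_ok = 0
for s1, s2 in combinations(sequences, 2):
    if true_state(s1) == true_state(s2):
        # Sequence compression: sequences reaching the same state should
        # be treated interchangeably (identical valid continuations).
        same += 1
        same_ok += model_next_tokens(s1) == model_next_tokens(s2)
    else:
        # Sequence distinction: sequences reaching different states should
        # be told apart (their valid continuations should differ).
        diff += 1
        diff_ok += model_next_tokens(s1) != model_next_tokens(s2)

print(f"compression: {same_ok / same:.1%}  distinction: {diff_ok / diff:.1%}")
```

Scoring near 100 percent on both checks is a necessary condition for a coherent world model; the paper's point is that a model can generate flawless directions or moves while still failing them badly.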
Similar tests with the board game Othello demonstrated that AI could make valid moves without truly understanding the game's rules, much like a student memorizing answers without comprehending the underlying concepts.