The AP reports that OpenAI's Whisper, an artificial intelligence transcription tool marketed for its "human level robustness and accuracy," is regularly inventing text that was never actually spoken. Through interviews with more than a dozen software engineers, developers, and academic researchers, AP uncovered that these fabrications include racial commentary, violent rhetoric, and even imaginary medical treatments.
A recent study by Cornell University's Assistant Professor Allison Koenecke and University of Virginia's Professor Mona Sloane, reviewed thousands of audio snippets from Carnegie Mellon University's TalkBank repository. According to their findings reported to AP, nearly 40% of the tool's hallucinations were harmful or concerning, because they misrepresented the original speaker.
The AP investigation revealed the scope of the problem through multiple sources: a University of Michigan researcher found hallucinations in eight out of every 10 public meeting transcriptions, a machine learning engineer discovered fabrications in roughly half of over 100 hours of transcriptions, and another developer found false content in virtually all of the 26,000 transcripts analysed.
"This seems solvable if the company is willing to prioritise it," William Saunders, a former OpenAI engineer who quit in February over concerns with the company's direction, told AP. "It's problematic if you put this out there and people are overconfident about what it can do and integrate it into all these other systems."
Despite OpenAI's warnings against using Whisper in "high-risk domains," AP reports that over 30,000 clinicians and 40 health systems, including Children's Hospital Los Angeles and the Mankato Clinic in Minnesota, are using a Whisper-based tool, developed by Nabla, to transcribe doctor visits.
The AP notes that the Deaf and hard of hearing community faces particular risks, stressing tha this population has no way to identify fabrications.
As reported by AP, experts, advocates, and former OpenAI employees are now calling for federal government intervention through AI regulations, while urging OpenAI to address these critical flaws in their technology.