OpenAI’s Whisper transcription tool has hallucination issues, researchers say
Software engineers, developers, and academic researchers have serious concerns about transcriptions from OpenAI’s Whisper, according to a report in the Associated Press.
While there’s been no shortage of discussion around generative AI’s tendency to hallucinate — basically, to make stuff up — it’s a bit surprising that this is an issue in transcription, where you’d expect the transcript to closely follow the audio being transcribed.
Instead, researchers told the AP that Whisper has introduced everything from racial commentary to imagined medical treatments into transcripts. That could be particularly disastrous as Whisper is adopted in hospitals and other medical contexts.
A University of Michigan researcher studying public meetings found hallucinations in eight out of every 10 audio transcriptions. A machine learning engineer studied more than 100 hours of Whisper transcriptions and found hallucinations in more than half of them. And a developer reported finding hallucinations in nearly all of the 26,000 transcriptions he created with Whisper.
An OpenAI spokesperson said the company is “continually working to improve the accuracy of our models, including reducing hallucinations” and noted that its usage policies prohibit using Whisper “in certain high-stakes decision-making contexts.”
“We thank researchers for sharing their findings,” they said.