Why captions are suddenly everywhere and how they got there

People with hearing loss have a new ally in their efforts to navigate the world: captions that aren’t limited to their television screens and streaming services.

The COVID pandemic disrupted daily life for people everywhere, but many of those with hearing loss took the resulting isolation especially hard. “When everyone wears a mask they are completely unintelligible to me,” said Pat Olken of Sharon, Massachusetts, whose hearing aids were insufficient. (A new cochlear implant has helped her a lot.)

So when her grandson’s bar mitzvah was streamed on Zoom early in the pandemic, well before the service offered captions, Olken turned to Otter, an app created to transcribe business meetings. Reading along with the ceremony’s speakers made the app “a tremendous resource,” she said.

People with hearing loss, a group estimated at roughly 40 million U.S. adults, have long adopted technologies to help them make their way in the hearing world, from Victorian-era ear trumpets to modern digital hearing aids and cochlear implants. But today’s hearing aids can cost upward of $5,000, often aren’t covered by insurance and don’t work for everyone. The devices also don’t snap audible sound into focus the way glasses immediately correct vision. Instead, hearing aids and cochlear implants require the brain to interpret sound in a new way.

“The solutions out there are clearly not a one-size-fits-all model and do not meet the needs of a lot of people based on cost, access, a lot of different things,” said Frank Lin, director of the Cochlear Center for Hearing and Public Health at Johns Hopkins University. That’s not just a communication problem; researchers have found correlations between untreated hearing loss and higher risks of dementia.

Cheaper over-the-counter hearing devices are on the way. But for now, only about 20% of those who could benefit from a hearing aid use one. Captions, by contrast, are usually a lot easier to access. They’ve long been available on modern television sets and are cropping up more frequently in videoconferencing apps like Zoom, streaming services like Netflix, social media video on TikTok and YouTube, movie theaters and live arts venues.

Recent years have also brought a wave of smartphone apps: Otter; Google’s Live Transcribe; Ava; InnoCaption, for phone calls; and GalaPro, for live theater performances. Some are aimed at people with hearing loss and use human reviewers to make sure captions are accurate. Others, like Otter and Live Transcribe, instead rely on what’s called automatic speech recognition, which uses artificial intelligence to learn and capture speech. ASR can be inaccurate and can lag behind the spoken word; built-in biases can also make transcriptions less accurate for the voices of women, people of color and deaf people, said Christian Vogler, a professor at Gallaudet University who specializes in accessible technology.