Text-to-Speech Narration is Being Forced on Audio Description Users

A debate continues among audio description users: Should audio description (AD) narrators perform in a neutral style which mirrors the objective quality of description or opt for a more performance-oriented cadence that reacts to each scene’s tone? A case can be made for either style but, despite this disagreement, AD users seem to agree on at least one thing: Text-to-Speech (TTS) narration is terrible.

Users’ complaints about audio description are often peeves, issues that could use some massaging to improve the experience by a small degree. However, grievances concerning a TTS narrator nearly always describe a ruined experience and an inability to suffer through this type of narration.

If it seems obvious that a grating computer voice is no substitute for mellifluous human tones, that’s because it is. The thousands of complaints and internet comments on the subject merely confirm what is all but a fact.

Given the obvious drop in quality from a human voice actor to a TTS narrator, we must conclude that the latter’s use is willful ignorance on the part of providers. It’s especially upsetting that the offending streaming services use a mixed bag of TTS and human description. This tactic intentionally makes it harder for consumers to ‘vote with their dollars’ because in on-demand marketplaces they have no way of knowing if a title has TTS description before purchasing it.

This issue widens a familiar chink in the armor of the otherwise fabulous 21st Century Communications and Video Accessibility Act. The act specifies that a certain percentage of a company’s content must be described but does not ensure the quality of the audio description. Companies that provide description only to stay out of legal trouble therefore have free rein to produce unlistenable narration tracks.

If the threat of litigation is the only thing that will motivate some providers, we need legislative protections against the production of this type of low-end content. Therefore, I urge the reader to reach out to the American Council of the Blind or similar organizations that consolidate the voice of the VI community. Make these representatives aware of the egregiousness of this issue and how common your grievance is.

Some visually impaired users think that a Devil’s bargain can be struck. They believe that while TTS is lower quality, its automated nature would proliferate audio description more quickly. This misconception stems from some users’ belief that text-to-speech audio description is also written by a computer. This is not so. A program advanced enough to decide what images best serve a visual story and craft a supplemental narrative has not yet been built. Given that scriptwriting is the longest, most costly part of audio description production, further implementing TTS would have a marginal impact on audio description’s availability.

Wadjet will never produce a description track with a TTS narrator. We are committed to hiring wonderful performers who not only voice our scripts with verve and style, but who also reflect the cultures and life experiences represented in the programs and visual narratives we proudly make accessible.

2 comments

Tansy Alexander says:

December 14, 2021 at 12:35 am

Thanks for illuminating the subject of writing costs and for this:

“…TTS is lower quality. Its automated nature would proliferate audio description more quickly. This misconception stems from some users’ belief that text-to-speech audio description is also written by a computer. This is not so.”

Jane Purcell says:

September 23, 2024 at 8:39 am

I write and voice audio description for ITV. The RNIB had sent out a survey to its members which included six audio described clips and asked everyone to feedback about them. The script was written by an experienced describer but voiced by AI. And the response was 100% ‘NOPE’. Nobody wanted AI voices – they can always tell – and it’s not the obvious stuff like getting the emphasis on a word wrong – it’s the lack of warmth and engagement.

The other thing is that when I first started, I was always told that AD needed to be objective, but when it comes to voicing, I think the viewer needs some engagement and involvement. I remember one viewer commenting about a police procedural that had a lot of descriptions of the wild scenery of Newcastle, which was almost another character. The viewer said that it felt like an audio book.

Given that profit is the bottom line everywhere, AI does have a place but for the moment it seems like our viewers don’t want it as it stands.
