Speech Recognition to Go

Source: The ATA Chronicle

Repetitive stress injuries have a long history in medical literature. The first known report of this condition appeared more than 300 years ago, written by Bernardino Ramazzini, an Italian physician who described the suffering of clerks and industrial workers.¹ Today, these cumulative trauma conditions account for about a third of workers’ compensation cases in the United States.² Many translators, writers, and editors are all too familiar with the consequences of long sessions with keyboards and devices such as trackballs and computer mice.

In the past two decades, speech recognition technology, which originated at Bell Labs in the early 1950s with a system for single-speaker digit recognition, has evolved into a technology that helps people avoid such injuries. Speech recognition also offers significant working advantages to those who spend much of their time composing on computers. The following offers an overview of developments in this area, including some of the most popular software options.

What’s Available?

In my opinion, Nuance³ provides some of the highest-quality, best-known commercial speech recognition solutions currently available. These solutions have formed the basis for voicemail recognition applications from Cisco, Apple’s Siri, and other popular platforms. The solution best known to desktop computer users is Dragon NaturallySpeaking (DNS), which is currently available for seven languages in DNS version 13. But the real action for speech recognition is now with mobile devices. For these, Nuance offers high-quality speech recognition for more than 40 languages, including Arabic, Chinese, and Russian. The application programming interfaces for these solutions are available to developers, in some cases at no cost.⁴

Among the most popular mobile speech recognition solutions are Apple’s integrated iOS recognition for iPhones, iPads, and iPods, the free Dragon Dictation app for iOS, and Swype + Dragon Dictation for Android. These offer many possibilities for translators and interpreters.

But Does It Really Work?

Although I suffer from painful carpal tunnel syndrome for which speech recognition technology offers relief, I became a “true believer” only after I discovered how much more relaxed I am when I “write” by speaking and have my hands free to touch my computer screen and use my fingers to mark points of reference for untangling particularly long, nasty German patent claim sentences.

The quality of my draft text also tends to be better, although identifying “dictos” (transcription errors from automated speech recognition) can be tricky. These errors can’t always be found through a spell check, so different methods are needed for effective post-editing. Since “dictos” are almost always spelled correctly and may even sound plausible in context, careful attention to source-text correspondence is often necessary during final review. In English, the speech recognition engine I use has an annoying tendency to confuse definite and indefinite articles. Recognition quality is generally best if you speak entire phrases, clauses, or even sentences. I find that casting a relaxed eye on the phrase or sentence immediately after it is transcribed is the most effective way to catch my “dictos” (some of which can be quite entertaining).
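
Because confused articles pass any spell check, one possible aid for post-editing is to make them stand out visually. The following is only a minimal sketch in Python, not a feature of any tool named in this article: the function name `mark_articles` and the word list are assumptions chosen for illustration.

```python
import re

# Words to highlight for review; this list is an assumption for illustration.
ARTICLES = {"a", "an", "the"}

def mark_articles(text):
    """Wrap each English article in square brackets for a focused review pass."""
    def repl(match):
        word = match.group(0)
        return f"[{word}]" if word.lower() in ARTICLES else word
    # Visit every run of letters and bracket only the articles.
    return re.sub(r"[A-Za-z]+", repl, text)

print(mark_articles("The device comprises a housing and the sensor."))
# prints: [The] device comprises [a] housing and [the] sensor.
```

The brackets only draw the eye to each article. Deciding whether it is the right one still requires checking against the source text.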

Claims of improved working speed are real. (I know colleagues who produce in excess of 10,000 words per day of reasonable translation work with the help of speech recognition.) I’ve also relied on speech recognition when facing tight deadlines. However, the real value for me is that I can work with greater concentration, consider what I want to say with less distraction, and produce a better text. Given the improved quality of my texts composed with the help of speech recognition, using it would be worthwhile even if it slowed me down a bit, but fortunately that has not been the case.

Are There Other Options Besides Dragon NaturallySpeaking?

After I moved to Portugal two years ago, I saw how colleagues working into Portuguese were at a disadvantage without the speech recognition solutions I enjoy for English and German. Initially, we thought that no commercial options were available for Portuguese. But early this year, when David Hardisty at the Universidade Nova in Lisbon shared his experience with speech recognition using the Macintosh Yosemite operating system, we learned of a small treasure trove of options for those working in languages not served by Dragon NaturallySpeaking.

I discovered the free app Dragon Dictation for iOS and began to test novel dictation workflows for writing blog posts and translating. I developed a three-stage integrated translation workflow while using this app:

Phase 1: I dictate the draft text on my iPhone (advancing the cursor in my favorite CAT tool to access the reference information) and e-mail it to my desktop PC.

Phase 2: As a first review, I align the e-mailed text with the source text.

Phase 3: As a second review, I translate the source text from the alignment, with subsequent correction, tagging, and automated quality assurance.
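
The alignment step in Phase 2 is normally done in a translation environment tool, but the basic idea can be sketched in a few lines of Python. This is a rough illustration only, assuming the dictated draft contains one sentence per source segment in the same order. The function names and the sample sentences are hypothetical, and the sentence splitting is deliberately naive.

```python
import re
from itertools import zip_longest

def split_sentences(text):
    """Naive split after ., ! or ? followed by whitespace; abbreviations will break it."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def pair_segments(source_segments, dictated_text):
    """Pair source segments with dictated sentences in order; None marks a gap."""
    targets = split_sentences(dictated_text)
    return list(zip_longest(source_segments, targets))

# Hypothetical sample data for illustration.
source = [
    "Die Vorrichtung umfasst ein Gehäuse.",
    "Das Gehäuse enthält einen Sensor.",
]
draft = "The device comprises a housing. The housing contains a sensor."

for src, tgt in pair_segments(source, draft):
    flag = "" if src and tgt else "  <-- check alignment"
    print(f"{src!r} | {tgt!r}{flag}")
```

A pair containing `None` means the dictated draft has more or fewer sentences than the source has segments, which is a cue to look at that spot during the first review.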

Many of those with whom I shared this solution prefer to have the transcribed text written directly into their working translation environment, whether that be a word processor or tools like Kilgray’s memoQ or SDL Trados Studio.

The first mobile solution we found for this was myEcho for iOS, which uses the secure remote Nuance servers and allows text dictation from an iPhone or iPad at the cursor location of any connected PC running Windows.⁵

At memoQ Fest 2015, Jim Wardell spoke about the Swype⁶ virtual keyboard (another Nuance app) for Android, which includes an Android version of Dragon Dictation. It allows an entire text to be dictated into an e-mail message for subsequent alignment. Other tools, such as Google’s Chrome Remote Desktop, can be used for direct dictation from the Android mobile device into an application such as Microsoft Word or a translation environment tool running under Windows.

The various free or inexpensive mobile apps for speech recognition as well as the integrated speech recognition in the Mac operating system can also be improved for translation or other writing work by adding customized vocabularies. Although the mobile solutions do not offer all of the editing functions available in Dragon NaturallySpeaking for Windows or Dragon Dictate for MacOS, their accuracy may be higher. The greater number of languages available with these mobile apps makes the proven benefits of speech recognition accessible to an estimated two billion users around the world.⁷

A Level Playing Field, New Possibilities

We’re nearing the point where those who need to compose text on a computer in any common language will be able to work comfortably in most software applications using voice recognition integrations through mobile applications, web browsers, or other means. The mobile methods generally require an Internet connection, but the cost of these solutions (free or just a few dollars) is lower than that of the stand-alone PC software solution Dragon NaturallySpeaking. Even better, smartphone users are literally just a few finger taps away from testing the benefits for themselves. As for data security and privacy, at memoQ Fest 2015, representatives from Nuance revealed that their online servers meet the highest security standards and are trusted by the U.S. government and companies like IBM.⁸

Nuance also announced recently that Dragon Anywhere will be released this year.⁹ This subscription mobile device app will bring full voice-controlled editing to smartphones and tablets and integrate with the company’s speech recognition tools for desktop and laptop computers. Custom vocabulary and other features will be synchronized among all devices. It’s not yet known whether this new app will offer speech recognition for every language available for Dragon Dictation or Swype, or whether the possibilities will be more limited, as with the current generation of Nuance recognition software for Mac and Windows operating systems.

Improved Occupational Health

Regardless of the software you use, the health benefits of getting your hands off the keyboard and mouse are clear. The use of speech recognition in schools, at home, and in the workplace can help reduce the appalling incidence of repetitive stress injuries. At the same time, it allows users to maintain, or possibly increase, their work volume. It’s also possible that such technology leads to better writing as a result of more focused, relaxed work. Though these benefits are anecdotal and will vary among individuals, they are often cited by happy users of speech recognition. As one of my colleagues noted, “It’s harder to speak a stupid-sounding sentence than to type one.”

Notes

Ramazzini, Bernardino. De Morbis Artificum Diatriba [Diseases of Workers] (Modena, Italy, 1700), http://bit.ly/Ramazzini.
Bierma, Paige. “Repetitive Stress Injury (RSI),” HealthDay (March 11, 2015), http://bit.ly/Bierma.
www.nuance.com
www.nuance.com/for-developers/index.htm
http://myechoapp.com
https://itunes.apple.com/us/app/swype/id916365675?mt=8
Jim Wardell’s presentation of speech recognition and Swype at memoQ Fest 2015 is available at https://youtu.be/icKcrs4CAls.
Ibid.
Lossner, Kevin. “Enter the Dragon, Anywhere!” Translation Tribulations (August 18, 2015), http://bit.ly/dragon-anywhere.

Kevin Lossner is a certified memoQ trainer with three decades of experience teaching the use of software tools for practical work and problem solving. A former research chemist, information technology systems developer, and consultant, he now devotes his time to translating patents, technical marketing, and other specialized texts from German into English. You can find his blog at translationtribulations.com. Contact: translation@lossner.net.
