Here are a few new approaches to machine translation that go beyond post-editing, along with some practical tools in interactive and adaptive translation technology.
Post-editing was never meant to be the future of machine translation (MT). For researchers seeking fully automatic translation, post-editing is considered more of a failure mode. For human translators, it often forces them to correct erroneous output. But translation memory (TM), which is essentially a deterministic MT system augmented with heuristics, is not the future either. So what should we expect? Is the current post-editing technology our best hope? To help answer these questions, let’s begin with a short history of post-editing before delving into more recent developments in machine-assisted translation technology.
1960s: First Experiences with Post-Editing
In the January 1965 issue of Physics Today, Robert Beyer, a professor of physics at Brown University, described his experience post-editing a scientific paper from Russian into English. He was participating in a National Science Foundation (NSF) program started in 1955 for the purpose of translating Soviet physics journals. Ten years later, the program was generating 15,000 pages annually at a cost of $500,000, which was covered through subscriptions. It was a popular and necessary program. For example, Beyer observed that Sputnik was not an instantaneous achievement, but it had been “foreshadowed in their [Russian] literature, but this was largely unknown in the West.” Beyer noted that a language barrier, as even the tourist knows, is an effective way of discussing secrets in plain view.
The problem was how to broaden NSF’s program to other languages and other academic fields to cover an ever-growing range of content. For example, the Journal of Experimental Physics of the USSR alone had expanded from 1,500 to 4,500 pages per year during NSF’s program. Beyer enumerated possibilities for handling this information deluge, including MT. He provided an anecdotal evaluation of the latest MT technology, calling the experience a “slave’s eye view”:
It seemed like a good idea … [but] I must confess that the results were most unhappy. I found that I spent at least as much time in editing as if I had carried out the entire translation from the start. Even at that, I doubt if the edited translation reads as smoothly as one which I would have started from scratch … Someday, perhaps, the machines will make it, but I as a translator do not yet believe that I must throw my monkey wrench into the machinery in order to prevent my technological unemployment.1
It turned out that “someday” was not to come any time soon. In 1966, federal funding for MT research was reduced dramatically, resulting in an “MT winter” that lasted about 20 years.
1990s: Do Statistics Improve Post-Editing?
In 1993, Ken Church and Ed Hovy, then at AT&T Bell Labs, wrote a widely cited paper with the provocative title “Good Applications for Crummy Machine Translation.”2 Statistical MT had been invented in the late 1980s, superseding the rule-based systems of the 1960s. Commercial MT systems had been successful in narrow domains such as translating weather forecasts (e.g., the METEO system used at Environment Canada from 1982 until 2001), but the state of the art was still “crummy.”
Instead of continuing the interminable quest for general-purpose MT, Church and Hovy argued, why not identify more “high-payoff” niche applications for MT? A good niche application would have several characteristics, among them attractiveness to intended users. Church and Hovy stated that post-editing “would appear to be a natural way to get value out of a state-of-the-art MT system … unfortunately, the application fails to meet most of the desiderata” for a niche application. They explained that MT had also failed “to gain much acceptance among the intended target audience of professional translators because post-editing turns out to be an extremely boring, tedious, and unrewarding chore.”
Church and Hovy suggested that post-editing could be more attractive “if the user interface were made more flexible and user-friendly.” However, the best applications for MT would be those in which quality could be traded for speed, convenience, or cost. This turned out to be a prescient suggestion, considering that the highest-impact application of Google Translate and Microsoft Translator is fast and free cross-lingual web browsing.
Present Day: Post-Editing Integrated into CAT Tools
In the 22 years since Church and Hovy made their suggestions, MT systems have been integrated into every major computer-assisted translation (CAT) tool. However, the basic experience remains as unimaginative as ever: users are presented with a pre-populated, mutable text box.
Are translators more receptive to MT today? A 2015 paper by Joss Moorkens and Sharon O’Brien, both researchers at the Center for Next-Generation Localization at Dublin City University, compared professional and novice translators’ use of post-editing along three dimensions: throughput, number of edits, and attitude.3 They found that professionals tended to be faster at post-editing even with more edits per segment. Nevertheless, Moorkens and O’Brien were pessimistic about the results of their study. Only one of nine professionals surveyed rated the experience positively (three were neutral and five negative). The reasons given for the lackluster rating were “lack of creativity, tediousness of the task, [and] limited opportunity to create quality.”
Fifty years of MT technology development has not made a fundamentally tedious task less tedious. One reason may be that the standard post-editing interface violates several basic precepts of human-computer interaction (HCI) design. For example, if the MT system proposes a bad translation, the standard interface requires the translator to undo the suggestion. Also, most MT systems do not learn, so mistakes are repeated. Translation memories (TM), despite being an antiquated technology, do not make these mistakes, perhaps explaining their continued popularity.
Translation as Human-Machine Interaction
The shortcomings of post-editing were recognized as early as the late 1960s, when Martin Kay, the pioneering computational linguist, began to envision interactive translation systems. In his landmark 1980 position paper, Kay suggested the following approach to machine-assisted translation:
I want to advocate a view of the problem in which machines are gradually, almost imperceptibly, allowed to take over certain functions in the overall translation process. First they will take over functions not essentially related to translation. Then, little by little, they will approach translation itself. The keynote will be modesty. At each stage, we will do only what we know we can do reliably. Little steps for little feet!4
In Kay’s scheme, the machine always defers to the human. The human remains in control; the machine is subservient. Standard post-editing interfaces err by inverting these roles, strongly encouraging the human to correct the mistakes created by the machine.
Interactive MT has historically been a peripheral research topic—the MT community is more interested in fully automatic translation—but there has been a surge of recent interest. Now that even basic post-editing has been shown to increase throughput, researchers are attempting to integrate MT more deeply into CAT environments.5 The research prototypes that have been built offer a likely preview of what commercial CAT tools will become.
Recent Prototype Interactive MT Systems
Three significant interactive MT systems have been built over the past decade: TransType, CasmaCat, and Predictive Translation Memory (PTM). Only CasmaCat, which has been commercialized as MateCat, is probably known to the professional translation community. The major innovations illustrated by these systems are:
- Predictive typing for dynamic completions of partial translations as the user types.
- Model adaptation, in which the MT system learns words and phrases from confirmed translations.
- Confidence measures for suggested words and phrases.
- Advanced interaction with and visualization of source-target word alignments, translation alternatives, and source coverage.
TransType
TransType (and its successor TransType2) was the first interactive system based on modern statistical MT.6 Funded by the Natural Sciences and Engineering Research Council of Canada in 1997, the development of TransType was motivated by the need for faster translation of Canadian government proceedings, which by law had to be published in French and English. The principal innovation was a dynamic autocomplete box that appeared in the text editor. (See Figure 1 below.) Since the machine based its suggestions on human input, this mode came to be called human-centered MT. The autocomplete box could display both character-level and word-level completions. TransType presaged Trados AutoSuggest, which appeared in Studio 2009, by a decade.
The implementation details of the autocomplete feature are significant. Users of Google Translate might not know that MT systems can generate very long lists of alternative translations that include differences in vocabulary, word order, and length. After TransType filtered its list according to the user’s partial translation, plenty of alternatives often remained. Contrast this design with modern CAT tools, which cull a few alternatives from TM matches or single-best translations from Google Translate. This diminishes the value of autocomplete, as the translator is likely to diverge from this short list of alternatives.
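To make this concrete, here is a minimal sketch of prefix-filtered autocompletion over an n-best list. The function name, scores, and example hypotheses are invented for illustration; this is the general technique TransType relied on, not its actual code.

```python
# Sketch of TransType-style word completion: filter a long n-best list of MT
# hypotheses by the translator's partial translation, then rank the possible
# next words. The hypotheses and scores below are invented for illustration.

def next_word_suggestions(nbest, prefix):
    """Return candidate next words, best-scoring first, drawn from hypotheses
    that are compatible with the translator's partial translation."""
    prefix_tokens = prefix.split()
    best_score = {}
    for score, hypothesis in nbest:
        tokens = hypothesis.split()
        # Keep only hypotheses that begin with what the user has typed so far.
        if tokens[:len(prefix_tokens)] == prefix_tokens and len(tokens) > len(prefix_tokens):
            word = tokens[len(prefix_tokens)]
            best_score[word] = max(best_score.get(word, float("-inf")), score)
    return sorted(best_score, key=best_score.get, reverse=True)

# Hypothetical n-best list (model score, English translation of a French source).
nbest = [
    (-2.1, "the committee approved the report"),
    (-2.4, "the committee adopted the report"),
    (-3.0, "the commission approved the report"),
]
print(next_word_suggestions(nbest, "the committee"))  # ['approved', 'adopted']
```

Because the underlying list of hypotheses is long, useful completions usually survive the filtering step even after the translator departs from the single best translation.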
CasmaCat/MateCat
CasmaCat was developed by a consortium of European universities starting in 2011.7 The research program was designed to improve upon TransType, which had yielded disappointing productivity results in user studies. In addition, a central goal of CasmaCat was to develop adaptive MT so that the system could improve with use: repeating mistakes even after human correction had been a common criticism of TransType.8
Figure 2 shows the CasmaCat interface, which should be familiar to users of MateCat, the commercial counterpart. Documents are arranged in a two-column view similar to TransType. For the current segment, the source text appears on the left and the target text entry box on the right. As shown in Figure 2, the user has partially entered a translation (black text), and an MT system (in this case, Moses) has predicted a completion. Color encodes a confidence score for each suggested word. Suggestions from additional sources, such as TM and Google Translate, are shown below the main editing area.
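How might those per-word confidence scores be computed? One simple heuristic, sketched below under the assumption that the decoder exposes a scored n-best list (not necessarily how CasmaCat actually does it), is to use the probability-weighted frequency of each word of the best hypothesis across all hypotheses:

```python
import math
from collections import defaultdict

# Sketch of word-level confidence from an n-best list: the confidence of each
# word in the best hypothesis is its probability-weighted frequency at that
# position across all hypotheses. The data below is invented for the example.

def word_confidences(nbest):
    """nbest: list of (log_score, translation) pairs, best first."""
    weights = [math.exp(score) for score, _ in nbest]
    total = sum(weights)
    counts = defaultdict(float)   # (position, word) -> accumulated weight
    for w, (_, hyp) in zip(weights, nbest):
        for i, word in enumerate(hyp.split()):
            counts[(i, word)] += w
    best = nbest[0][1].split()
    return [(word, counts[(i, word)] / total) for i, word in enumerate(best)]

nbest = [
    (-0.2, "press the red button"),
    (-0.9, "push the red button"),
    (-1.6, "press the red knob"),
]
for word, conf in word_confidences(nbest):
    print(f"{word:>6s}  {conf:.2f}")
```

Words that most hypotheses agree on score close to 1.0 and can be rendered normally, while disputed words receive lower scores and can be colored as warnings.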
The considerable MT innovations supporting the CasmaCat interface are largely invisible to the human user. The system generates very long lists of alternatives so that the autocomplete predictions remain robust as the user edits. Moreover, when the user confirms a segment by pressing the “Translated” button, the MT system extracts words, phrases, and statistics from the new sentence pair. The system can then provide exact and even sub-segment matches for future input.
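As a rough illustration of this adaptation loop, the sketch below folds each confirmed sentence pair back into a suggestion store and later surfaces exact and sub-segment matches. The class and its simple n-gram indexing are assumptions made for the example, not CasmaCat's actual phrase-extraction machinery.

```python
# Minimal sketch of an adaptive suggestion store: confirmed sentence pairs are
# indexed so that exact and sub-segment matches can be offered on later input.
# This is a toy illustration, not CasmaCat's actual adaptation pipeline.

class AdaptiveStore:
    def __init__(self):
        self.pairs = []          # confirmed (source, target) segments
        self.subsegments = {}    # source n-gram -> targets seen with it

    def confirm(self, source, target, max_n=3):
        """Called when the translator confirms a segment."""
        self.pairs.append((source, target))
        tokens = source.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngram = " ".join(tokens[i:i + n])
                self.subsegments.setdefault(ngram, set()).add(target)

    def suggest(self, source):
        """Return exact matches first, then segments sharing sub-segments."""
        exact = [t for s, t in self.pairs if s == source]
        tokens = source.lower().split()
        fuzzy = set()
        for n in (3, 2):
            for i in range(len(tokens) - n + 1):
                fuzzy |= self.subsegments.get(" ".join(tokens[i:i + n]), set())
        return exact, fuzzy - set(exact)

store = AdaptiveStore()
store.confirm("Appuyez sur le bouton rouge", "Press the red button")
print(store.suggest("Appuyez sur le bouton vert"))  # sub-segment match resurfaces
```

The point is simply that the system's knowledge grows with every confirmation, so the same correction should not have to be made twice.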
Predictive Translation Memory/Lilt
Predictive Translation Memory (PTM) was an interactive MT system developed at Stanford University in 2012.9 The system was organized according to the principles of mixed-initiative design, an HCI term for collaborative human-machine interfaces. Mixed-initiative principles such as “graceful degradation” and “learning by observing” speak directly to the shortcomings of post-editing, and they were applied explicitly in the design of both the interface and the statistical MT system that provided suggestions.
PTM demonstrated several new interface concepts, among them minimizing the distance between source and target segments (via an interleaved layout), reordering of target words and phrases via keyboard interaction, and dynamic shading of translated source words. These functions were supported by a statistical MT backend system that could regenerate suggestions at typing speed based on the user’s partial translation. PTM was evaluated for English>German and French>English translation by 32 professional translators. It was the first interactive MT system to show a measurable throughput improvement relative to both post-editing and translating from scratch.
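As one small illustration of the dynamic shading mentioned above, the sketch below marks a source word as covered once a target word aligned to it appears in the translator's partial translation. The function and alignment data are hypothetical; PTM derives its alignments from the MT system itself.

```python
# Toy illustration of source-coverage shading: given word alignments between a
# source sentence and the translator's partial target text, mark which source
# words have already been translated. The alignment data here is invented.

def covered_source_words(source_tokens, partial_target_tokens, alignments):
    """alignments: set of (source_index, target_index) pairs from the MT system."""
    typed = len(partial_target_tokens)
    covered = set()
    for s_idx, t_idx in alignments:
        if t_idx < typed:          # the aligned target word has been typed
            covered.add(s_idx)
    return [(tok, i in covered) for i, tok in enumerate(source_tokens)]

source = "la maison bleue".split()
partial = "the blue".split()
# Hypothetical alignment: la->the, maison->house, bleue->blue
alignments = {(0, 0), (1, 2), (2, 1)}
print(covered_source_words(source, partial, alignments))
# [('la', True), ('maison', False), ('bleue', True)]
```

In the interface, covered words can then be shaded so the translator sees at a glance what remains to be translated.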
The next version of PTM is a commercial product called Lilt. Figure 3 shows the interface, which features interleaved layout, predictive typing with suggestions from both MT and subsegment-level TM, and system adaptation. In the text area shown in Figure 3, the user’s partial translation is in black, and the system’s best prediction is just below the text area. At the right is a dictionary/concordance that exposes the millions of sentence pairs used to train the MT system. The user can interactively explore the parallel entries from which both MT and TM derive their suggestions.
Where Do We Go From Here?
Neither post-editing nor TM is the future of machine-assisted translation. In his July “Geekspeak” column in this magazine, Jost Zetzsche called for “deeper and different integration of machine translation into our translation environments.”10 We have seen that this proposal dates to at least the late 1960s, but commercial systems implementing it have only recently begun to appear. SDL has announced an adaptive MT product that will likely bring a deeper level of MT integration to Trados. The interactive extensions of CasmaCat might soon appear in its commercial counterpart, MateCat. And Lilt, the commercial version of PTM, will be available by the time you read this article.
Perhaps now is the time for translators who were put off by past experiences with MT to consider giving the more interactive integrations of the technology another look.
Notes
1. Beyer, Robert T. “Hurdling the Language Barrier,” Physics Today (volume 18, 1965), 46–52, bit.ly/Beyer-1965.
2. Church, K. W., and E. Hovy. “Good Applications for Crummy Machine Translation,” Machine Translation (volume 8, 1993), 239–258, bit.ly/Church-machine.
3. Moorkens, Joss, and Sharon O’Brien. “Post-Editing Evaluations: Tradeoffs between Novice and Professional Participants” (Dublin City University, 2015), bit.ly/moorkens.
4. Kay, Martin. “The Proper Place of Men and Machines in Language Translation,” Technical Report CSL-80-11 (Xerox Palo Alto Research Center, 1980), bit.ly/Kay-Martin.
5. Guerberof, Ana. “Productivity and Quality in the Post-Editing of Outputs from Translation Memories and Machine Translation,” International Journal of Localization (volume 7, 2009), 11–21, bit.ly/Guerberof. See also: Plitt, Mirko, and François Masselot. “A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localization Context,” The Prague Bulletin of Mathematical Linguistics (January 2010), 7–16, bit.ly/Plitt-Masselot.
6. Casacuberta, Francisco, Jorge Civera, Elsa Cubel, Antonio Lagarda, Guy Lapalme, Elliott Macklovitch, and Enrique Vidal. “Human Interaction for High-Quality Machine Translation,” Communications of the ACM (Association for Computing Machinery, October 2009), 135–138, bit.ly/ACM-machine.
7. Sanchis-Trilles, Germán, Vicent Alabau, Christian Buck, Michael Carl, Francisco Casacuberta, Mercedes García-Martínez, et al. “Interactive Translation Prediction versus Conventional Post-Editing in Practice: A Study with the CasMaCat Workbench,” Machine Translation (November 2014), 1–19, bit.ly/Sanchis-Trilles.
8. Macklovitch, Elliott. “TransType2: The Last Word,” Proceedings of the 5th International Conference on Language Resources and Evaluation (May 2006), bit.ly/Macklovitch.
9. Green, Spence, Jason Chuang, Jeffrey Heer, and Christopher D. Manning. “Predictive Translation Memory: A Mixed-Initiative System for Human-Language Translation” (Association for Computing Machinery Symposium on User Interface Software and Technology, 2014), bit.ly/Green-predictive.
10. Zetzsche, Jost. “Where Are We Headed?” The ATA Chronicle (July 2015), 32, bit.ly/GeekSpeak-July.
Spence Green is a co-founder of Lilt, a provider of interactive translation systems. He has a PhD in computer science from Stanford University and a BS in computer engineering from the University of Virginia. Contact: spence@lilt.com.