Machine translation (MT) and post-editing are inextricably connected for many within the translation world. No matter how good MT output might be, it cannot be trusted for publication-ready quality without a human post-editor evaluating the accuracy and correcting the translation. There are some exceptions, such as the Microsoft knowledgebase, but even that is post-edited, albeit with the P3 (post-published post-editing) process, a form of end-user post-editing that is strongly advocated by Chris Wendt, who leads Microsoft’s program management team for MT development.
Although it might have gone almost unnoticed in the “MT camp,” professional translators’ real use of MT is integrated increasingly into existing processes. True, there are still the “traditional” post-editors who work primarily on raw MT, but as any translation vendor who has tried to hire one can tell you, they’re hard to find. Why? Well, it’s a process for which the typical translator wasn’t trained, and it generally doesn’t match the expectation that translators bring to their jobs. Recognizing both this situation and the existence of valuable data, even in publicly available general MT engines, translation environment tool vendors looked at ways to bring that data into the workflow (aside from just displaying full-segment suggestions from MT systems that often aren’t particularly helpful). Here are some examples:
- A number of tools, including Wordfast Classic and Anywhere, Trados Studio, Déjà Vu, and CafeTran, use auto-suggest features that propose subsegments of MT suggestions (which invariably are more helpful than the whole segment). In some cases, such as with Wordfast and Déjà Vu, these even come from a number of different MT engines.
- Déjà Vu uses MT fragments to “repair” fuzzy translation memory (TM) matches.
- Star Transit uses a process called “TM-validated MT,” in which the communication goes the other way: content in the TM is used to evaluate MT suggestions. A similar process is currently being developed for OmegaT.
- Lift uses MT to identify subsegment matches in TMs so that even a TM with very little content can produce valid subsegment suggestions. (Freelance translator Kevin Flanagan developed Lift as his PhD project at Swansea University. Kevin now works for SDL, and his technology will surely see the light of day in various SDL products.)
- Another tool, Lilt, uses a system that updates the MT engine with every finished segment and interactively changes the MT suggestion with every word you enter.
- Lilt also uses MT to determine the formatting of the target segment automatically.
- And there is clearly more to come in the creative use of MT as a productivity aid to the professional translator.
Considering all this, it’s clear that the old, used-up paradigm of being paid by the word will no longer work for a good part of the translation world. Why? Because what we first tried in the infancy of TM in the 1990s—when we didn’t tell our clients that we’d implemented new ways of reusing content and were able to really jack up our profits heavily for some projects—isn’t going to work anymore. We’re past that kind of clandestine dealing, both in an ethical sense and in a general 21st-century kind of way where processes are much more transparent.
With TM-based translation, it was eventually relatively easy (though painful for some) to share some of the savings with clients (whether language services providers or direct clients). There’s no translation environment tool that doesn’t allow for a perfect/fuzzy match and repetition analysis, and it was (and is) a matter of negotiation between you and your clients on how to deal with those.
When post-editing MT entered the picture more prominently some five years ago, new ways of calculating compensation had to be developed. Some used a time-based paradigm; others assumed that MT output in general equals the quality of a TM match of a certain percentage. But probably the most transparent measurement was the edit distance (i.e., measuring how many edits were made to any one segment), which could then be used in a fuzzy-match-like scheme to come up with fair compensation.
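To make the edit-distance idea concrete, here is a minimal sketch in Python. It assumes a character-level Levenshtein distance between the raw MT suggestion and the final translation, normalized by the longer of the two strings. The function names and the pay bands are hypothetical and purely illustrative; real schemes are negotiated between translator and client and may count edits by word rather than by character.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(mt_output: str, final: str) -> float:
    """Share of the segment left untouched, from 0.0 to 1.0."""
    longest = max(len(mt_output), len(final))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(mt_output, final) / longest


# Hypothetical bands: (minimum similarity, fraction of the full word rate).
# These numbers are assumptions for illustration, not an industry standard.
PAY_BANDS = [(1.00, 0.10), (0.85, 0.40), (0.70, 0.70), (0.00, 1.00)]


def pay_fraction(sim: float) -> float:
    """Map a similarity score onto a fuzzy-match-like pay band."""
    for threshold, fraction in PAY_BANDS:
        if sim >= threshold:
            return fraction
    return 1.0
```

Calling `similarity(mt_suggestion, final_translation)` for each segment and passing the result to `pay_fraction` yields a per-segment rate, much like a fuzzy-match grid. Note that the sketch presumes exactly one MT suggestion per segment that the translator then edits.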
Our use of new technology, particularly MT, has evolved into an activity that I think is virtually impossible to measure. MT is no longer post-edited but is deeply integrated (and will be forevermore) into our existing processes, and there might be many different MT sources that provide resources for us rather than just one. Will it make us more productive? Well, it had better; otherwise there’s no good reason for us to use it in the first place. Will the added productivity be consistent enough to use as a measuring mechanism? I’m absolutely certain that’s not the case.
So what to do?
At a recent conference in Reykjavik, it was suggested that this evolving technology will require us to move away completely from pricing by the word, line, or page and learn how to quote by project and/or time. It makes sense. After all, virtually everyone else in the professional world (outside of translation) operates this way. You can imagine what the immediate response was: “My clients will never go for that!”
Well, maybe not, but we were the ones who taught our clients to expect quotes on the basis of word counts. Now that word-based pricing is moving into the realm of the impossible, it’s up to us again to teach our clients that we now charge differently. Comparative pricing for any given project will help our clients understand the ultimate benefit to their bottom line.
I can’t wait to throw off the shackles of word counts and operate like a professional who can figure out how much to charge for a project, just like my electrician or lawyer. And ironically, this will happen (I think) because of advancements in technology. Who would have thought?
Jost Zetzsche is the co-author of Found in Translation: How Language Shapes Our Lives and Transforms the World, a robust source for replenishing your arsenal of information about how human translation and machine translation each play an important part in the broader world of translation. Contact: email@example.com.