How important is interoperability in the translation industry? What is it and how is it achieved? What is being done to improve the interoperability of the tools we use?
In what follows, we hope to help readers understand what interoperability means in the context of the translation industry and why it’s critical to how translators and translation companies work. Read on to find out what has been accomplished so far in this area, what’s preventing better interoperability, and which strategies could improve the situation.
What Is Interoperability?
Usually when we talk about interoperability we are talking about compatibility between one translation tool and another (e.g., your preferred computer-assisted translation tool, translation environment tool, or translation memory system). In other words, translation companies and the translators with whom they work should be able to use whatever tool works for them. Then they can all import the files, work on them, and export them without any problems. (Figure 1 below illustrates how this should work.)
Interoperability does not mean that all tools will work in the same way, offer the same functionality, or give you the same results. In an ideal scenario you would be able to work seamlessly with whatever tool you prefer for each project. But things are rarely ideal.
Still, the importance of giving translators the freedom to choose is quite evident. If the tools are not interoperable, a translator might have to use Trados to work with one customer, memoQ for another, and Wordfast for yet another. Let’s not forget the corresponding time and financial investment involved with purchasing, learning, and troubleshooting each tool. Now just imagine if the tools were interoperable. Translators could then pick their favorite tool, master it, and use it for all of their customers.
Another important virtue of interoperability is that it would help reduce “file format panic.” Translators spend a great deal of time every day trying to decipher how to turn the files they receive from a customer into something that they can feed into their translation tools (e.g., translation memory). Having exchangeable, interoperable formats would simplify this task, allowing translators to focus more time on the core value of their business: translation.
What Stands in the Way of Better Interoperability?
If interoperability is such a good thing, what’s preventing this dream from becoming reality? The answer is competition.
Translation technology is a small but highly competitive sector, and each tool vendor tries to maximize any perceived advantage of its tool over the others. As a result, most tool vendors have developed their own proprietary file formats. There’s an obvious advantage for a provider in being able to work with the proprietary file formats of its competitors. However, those formats are the result of considerable investment by their developers and, as in any other sector, it’s bad news when you see something that you created being used by others in a way that diminishes your revenue.
Another factor slowing down interoperability is the cost of developing the technology and functionality to support it. Even if you are the biggest fan of interoperability, making it happen can prove to be quite a financial struggle. For that reason, most of our hopes for interoperability are vested in language industry standards. Over the years, many different XML standards have been through the same lifecycle: a standard is proposed, sufficient work is done to make it a reality, and then, rather than stopping its development or finalizing it, we let the proposed standard stagnate. TMX, a file format created as a standard for the exchange of translation memory data, is one example. Little work has been done on it in the past 10 years, but it’s still widely used.
Why does this happen? The truth is that there are very few people working on standards worldwide, and much of that work is done pro bono. Such altruistic efforts can hardly keep up with the evolution of the industry in general and the roadmaps of the tool providers.
In recent years, however, one standard has come closer to breaking the stagnation cycle than the rest and has given us some hope. We are talking about the XML Localisation Interchange File Format, better known as XLIFF.1
XLIFF is an interesting case. The original idea was to develop a single XML-based file format to standardize the way localizable data pass between tools during a localization process. Whatever tool you used, you would be able to import the source documents into it and store them into this fully multilingual and context-rich format. If all tools then used this same file format, it would be fairly easy to switch from one tool to another. However, that’s not what happened.
XLIFF 1.2 (the first really popular version) was a very powerful file format that allowed for “extensions,” which are points in the XLIFF code where a tool vendor could add some tool-specific code to make it match their particular needs. For example, SDL Trados Studio uses SDLXLIFF, so tools such as memoQ need to add functionality to handle this file format. Likewise, memoQ uses MQXLIFF, so tools such as SDL Trados Studio also need to add functionality.
Almost all tool vendors added numerous extensions, which meant that instead of having one common file format, we ended up having a different “flavor” (as the different customizations are usually called) for each tool. The result was that when XLIFF from one tool was imported into another, the best you could hope for was that only a small part of the data wouldn’t be recognized and captured by the other tool. At worst, it would be impossible to import the file at all.
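To make the idea of a “flavor” concrete, here is a minimal sketch in Python. It assumes a made-up XLIFF 1.2 file in which a hypothetical vendor namespace (vnd, pointing at example.com) adds one attribute and one element to a translation unit. Neither is taken from any real tool. The function separates what the XLIFF 1.2 namespace defines from what a receiving tool would have to understand separately or lose.

```python
import xml.etree.ElementTree as ET

XLIFF_12 = "{urn:oasis:names:tc:xliff:document:1.2}"
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical "flavored" file: the vnd: namespace and its names are invented.
SAMPLE = """<xliff version="1.2"
    xmlns="urn:oasis:names:tc:xliff:document:1.2"
    xmlns:vnd="http://example.com/hypothetical-vendor">
  <file original="hello.txt" source-language="en" target-language="es"
        datatype="plaintext">
    <body>
      <trans-unit id="1" vnd:locked="true">
        <source>Hello</source>
        <target>Hola</target>
        <vnd:match-info score="87"/>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def _text(element):
    """Return the plain text of an element, or None if it is missing."""
    return "".join(element.itertext()) if element is not None else None


def split_core_and_extensions(xliff_text):
    """List each trans-unit's standard text and any non-XLIFF names on it."""
    root = ET.fromstring(xliff_text)
    units = []
    for tu in root.iter(XLIFF_12 + "trans-unit"):
        units.append({
            "id": tu.get("id"),
            "source": _text(tu.find(XLIFF_12 + "source")),
            "target": _text(tu.find(XLIFF_12 + "target")),
            # Child elements that live outside the XLIFF 1.2 namespace.
            "extension_elements": [
                child.tag for child in tu
                if not child.tag.startswith(XLIFF_12)
            ],
            # Namespaced attributes other than xml:lang / xml:space.
            "extension_attributes": [
                name for name in tu.attrib
                if name.startswith("{") and not name.startswith(XML_NS)
            ],
        })
    return units


if __name__ == "__main__":
    for unit in split_core_and_extensions(SAMPLE):
        print(unit)
```

A tool that reads only the standard names still gets the source and target text here, but the lock flag and the match information are exactly the kind of data that goes missing when the file moves to another tool.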
Other issues with XLIFF relate to how it is structured. For instance, an XLIFF file can contain one or more “file” elements (i.e., a specific type of tag that is set inside the code—see Figure 2 for an example). Each element corresponds to a document or file for translation. During the (initial) adoption of XLIFF, some tools recognized only one file element and ignored subsequent ones. So, if an XLIFF file from Tool A, which had three file elements corresponding to three documents, was imported into Tool B, which recognized only one file element, the application would not “see” the second and third documents. As a result, these documents wouldn’t be imported by the tool for translation.
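The multiple-file problem described above can be sketched in a few lines of Python. The sample XLIFF 1.2 document and the two function names are invented for illustration. The first function mimics a reader that stops at the first file element, and the second walks all of them.

```python
import xml.etree.ElementTree as ET

XLIFF_12 = "{urn:oasis:names:tc:xliff:document:1.2}"

# Invented sample: one XLIFF 1.2 document carrying three source documents.
THREE_DOCS = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="a.txt" source-language="en" datatype="plaintext"><body>
    <trans-unit id="1"><source>First document</source></trans-unit>
  </body></file>
  <file original="b.txt" source-language="en" datatype="plaintext"><body>
    <trans-unit id="1"><source>Second document</source></trans-unit>
    <trans-unit id="2"><source>Second document, continued</source></trans-unit>
  </body></file>
  <file original="c.txt" source-language="en" datatype="plaintext"><body>
    <trans-unit id="1"><source>Third document</source></trans-unit>
  </body></file>
</xliff>"""


def _summarize(file_el):
    """Return (original file name, number of trans-units) for one file element."""
    unit_count = sum(1 for _ in file_el.iter(XLIFF_12 + "trans-unit"))
    return (file_el.get("original"), unit_count)


def first_document_only(xliff_text):
    """The pitfall: find() returns only the first matching file element."""
    root = ET.fromstring(xliff_text)
    return [_summarize(root.find(XLIFF_12 + "file"))]


def all_documents(xliff_text):
    """The fix: findall() returns every file element under the root."""
    root = ET.fromstring(xliff_text)
    return [_summarize(f) for f in root.findall(XLIFF_12 + "file")]


if __name__ == "__main__":
    print(first_document_only(THREE_DOCS))
    print(all_documents(THREE_DOCS))
```

The two functions differ by a single call, which suggests how easily a tool can end up silently skipping the second and third documents.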
Skeletons are another frequent structural issue. XLIFF files can have a skeleton (the non-translatable part of the original document, such as its layout and formatting code) that is used to recreate the file in its original format after translation. Some tools embed the skeleton within their XLIFF files, whereas others store it in a separate external file (which may not have been provided to the translator). Again, it’s bad news if Tool A doesn’t know how to handle skeletons from Tool B.
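As a rough illustration of the skeleton issue, the Python sketch below checks each file element of an XLIFF 1.2 document for an embedded skeleton (internal-file) or a reference to a separate one (external-file). The sample document, its file names, and the function name are invented for this example.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Invented sample: one embedded skeleton, one external skeleton, one without.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="a.html" source-language="en" datatype="html">
    <header><skl>
      <internal-file form="text">&lt;p&gt;%%1%%&lt;/p&gt;</internal-file>
    </skl></header>
    <body/>
  </file>
  <file original="b.html" source-language="en" datatype="html">
    <header><skl><external-file href="b.html.skl"/></skl></header>
    <body/>
  </file>
  <file original="c.txt" source-language="en" datatype="plaintext">
    <body/>
  </file>
</xliff>"""


def skeleton_report(xliff_text):
    """Return (original, kind, href) per file; kind is internal/external/none."""
    root = ET.fromstring(xliff_text)
    report = []
    for file_el in root.findall("x:file", NS):
        original = file_el.get("original")
        skl = file_el.find("x:header/x:skl", NS)
        external = skl.find("x:external-file", NS) if skl is not None else None
        internal = skl.find("x:internal-file", NS) if skl is not None else None
        if internal is not None:
            report.append((original, "internal", None))
        elif external is not None:
            # The skeleton lives in another file that must travel with the XLIFF.
            report.append((original, "external", external.get("href")))
        else:
            report.append((original, "none", None))
    return report


if __name__ == "__main__":
    for row in skeleton_report(SAMPLE):
        print(row)
```

A check like this, run before a file is sent out, would at least tell the sender which external skeleton files need to accompany the XLIFF.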
How Can We Improve the Situation?
While XLIFF provides an example of what can go wrong, it also showcases how things can be improved. When XLIFF 1.2 was becoming more widely used, the issues with each tool vendor having their own “flavor” of XLIFF became more apparent. People complained that the result was the creation of many new file formats rather than a single interoperable one. A group called Interoperability Now! was formed with the goal of creating better interoperability between tools. Ironically, the failure of a single XLIFF format stirred up the debate around interoperable standards.
XLIFF was developed by a technical committee of the Organization for the Advancement of Structured Information Standards (OASIS), a standards body committed to the development of XML standards for business. The XLIFF technical committee took industry feedback and set out to make XLIFF more interoperable and modular with stricter conformance. However, making it more interoperable meant making some difficult decisions in removing some of its functionality.
The result of this effort is XLIFF 2.0, which was published by OASIS in 2014. This is a leaner, more modular version of XLIFF that will hopefully lead to better interoperability. This was achieved by making it harder to deviate from a core XLIFF module. The basic principle behind it is to keep the core functionality the same for everyone while pushing customizations to optional modules. In that way, every tool would be able to understand at least the “nucleus” of every XLIFF file (meaning, the source and target text and some basic metadata) and allow the user to choose the tool that works best for the situation.
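The “nucleus” idea can be sketched as follows. The Python snippet assumes a small, invented XLIFF 2.0 document in which one unit carries data from the optional metadata module alongside the core source and target. The category and type values in the metadata are hypothetical. The reader looks only at core elements and never needs to understand the module.

```python
import xml.etree.ElementTree as ET

CORE = "{urn:oasis:names:tc:xliff:document:2.0}"

# Invented sample: core content plus optional metadata-module data.
SAMPLE = """<xliff version="2.0" srcLang="en" trgLang="es"
    xmlns="urn:oasis:names:tc:xliff:document:2.0"
    xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0">
  <file id="f1">
    <unit id="u1">
      <mda:metadata>
        <mda:metaGroup category="hypothetical-tool-data">
          <mda:meta type="status">reviewed</mda:meta>
        </mda:metaGroup>
      </mda:metadata>
      <segment>
        <source>Click <pc id="1">here</pc> to continue.</source>
        <target>Pulse <pc id="1">este enlace</pc> para continuar.</target>
      </segment>
    </unit>
  </file>
</xliff>"""


def _text(element):
    """Flatten an element's text, including text inside inline codes."""
    return "".join(element.itertext()) if element is not None else None


def read_core_segments(xliff_text):
    """Return (unit id, source text, target text) using core elements only."""
    root = ET.fromstring(xliff_text)
    rows = []
    for unit in root.iter(CORE + "unit"):
        # Module elements such as mda:metadata are simply never visited.
        for segment in unit.findall(CORE + "segment"):
            rows.append((
                unit.get("id"),
                _text(segment.find(CORE + "source")),
                _text(segment.find(CORE + "target")),
            ))
    return rows


if __name__ == "__main__":
    for row in read_core_segments(SAMPLE):
        print(row)
```

Flattening the inline codes to plain text is a simplification for display. A real tool would need to preserve them when writing the file back.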
The Next Steps
Obviously, it would be great if there were a single file format for translation material that all tools recognized. However, this alone would hardly solve the need for interoperability. Consider, for instance, translation management systems (TMS). Currently, a lot of translation is done through a TMS that a tool vendor develops for use by project managers at a translation company or enterprise, in combination with the vendor’s proprietary translation tool, which the translators use. Vendors often send packages containing not only an XLIFF file but also other resources, such as translation memories, termbases, and reference material. There is a need for compatibility here, but this type of work is still at an early stage and driven by commercial needs. (For instance, memoQ can work with an SDL Trados Studio or STAR Transit package.)
What’s next? A standard type of package that all tools would be able to ingest? Yes! A universal application programming interface allowing us to import content from one TMS to another? Yes! A generic type of machine translation training data that you could import and export seamlessly between engines? Why not!
However, all this is unlikely to happen unless we harness the power of the translator community. Translators need to make sure that their voices are heard in the debate about where interoperability should be improved. We would like to encourage all translators and ATA to get involved in the development of standards that enable compatibility in all aspects of the translation profession. There’s a lot of work to be done, but translators have a strong voice that will certainly make a difference for the better.
Organization for the Advancement of Structured Information Standards
ISO TC 37 web pages
XLIFF web page
Jose Palomares is the administrator of ATA’s Language Technology Division. He is a technology strategist at Venga Global, a multi-tier globalization company. Before joining Venga, he already had over 14 years of experience in the translation and localization industry, serving in several different roles along the supply chain—translator, project manager, language engineer, tester, quality assurance specialist, computer-assisted technology trainer, machine translation coach, and language technology consultant. A certified trainer in multiple tools, he is also the director of the Institute of Localization Professionals. Contact: email@example.com.
Peter Reynolds is the project editor for ISO 17100, which is the successor to the European EN 15038 standard. He is the executive director of Kilgray Translation Technologies. Prior to Kilgray, he worked at Idiom Technologies Inc. (now SDL PLC), Berlitz GlobalNet, Bowne Global Solutions, and Lionbridge. He has been involved in the development and promotion of standards (notably XLIFF) for over a decade. He is an Irish expert to ISO and the deputy chair of the Irish ISO TC 37 Mirror Committee. He has a BSc and an MBA from Open University. Contact: firstname.lastname@example.org.