The Okapi Framework is a free, open-source, cross-platform project offering a variety of tools that can be quite helpful for translators. However, there’s a caveat. The project was initially developed as a toolset for localization engineers, not translators, so some of its tools assume a fair amount of technical knowledge.
At its core, the framework is a set of components that are meant to be put together to create processes for doing various translation-related tasks. Think of Okapi as a Lego set with which developers or technical-minded users can build very powerful utilities. But this isn’t very practical for most translators, since they would rather have something more concrete with which to work.
Fortunately, among the different things Okapi offers, there are a few high-level applications ready to use “out-of-the-box” that anyone can take advantage of without any programming skills.
Simply put, Rainbow is a desktop application that allows you to run a wide range of functions on sets of files. There are many predefined utilities, but you can also easily construct your own pipelines. Here are some of the predefined utilities:
Translation Kit Creation: This utility takes a set of source files in various formats (e.g., DOCX, IDML, and HTML) and extracts the translatable text into a translation kit you can use in computer-assisted translation (CAT) tools. Rainbow has access to all the Okapi filters, so you can process a wide range of file formats.1 The pipeline includes steps to segment the source text and leverage translation resources such as translation memory (TM) and machine translation (MT) engines against the text to translate. (See Figure 1 below.)
Translation Kit Post-Processing: This utility merges back the files prepared by the Translation Kit Creation utility after they have been translated. It creates the translated files in their original source formats.
File Format Conversion: This utility allows you, for example, to create a TMX file from a set of translated XLIFF documents, Portable Object files, or other bilingual files. Other output formats are also supported.
Translations Comparison: This utility will compare your final translation with the MT candidates offered by two different MT engines. It will also provide you with some metrics indicating how bad the MT candidates are compared to your flawless translation.
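To give an idea of what such a comparison metric can look like, here is a minimal sketch in Python. It is not Rainbow’s own scoring method, which this article does not describe; it simply uses the standard difflib module to score how closely each MT candidate matches the final translation, word by word. The engine names and sample sentences are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(reference: str, candidate: str) -> float:
    """Return a 0-100 score; 100 means the candidate has the same words in the same order."""
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    return round(100 * SequenceMatcher(None, ref_tokens, cand_tokens).ratio(), 1)


# Hypothetical data: the translator's final text and two MT candidates.
final_translation = "Ouvrir le fichier"
mt_candidates = {
    "engine_a": "Ouvrir le fichier",
    "engine_b": "Fichier ouvert",
}

scores = {name: similarity(final_translation, text) for name, text in mt_candidates.items()}
for name, score in scores.items():
    print(f"{name}: {score}")
```

A word-based comparison like this one is case-sensitive and ignores meaning, so it should be read only as a rough indicator of how much editing a candidate would need.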
For more information on the many tasks you can perform from within Rainbow, please see Rainbow’s Wiki page at the link above.
Figure 1: Rainbow
Tikal is a command-line tool that offers several functions similar to Rainbow, but it also has a few extra features. For example, you can query various TM engines directly. One TM that’s available for free is Amagama, which is the one set up by default for the Translate Toolkit TM connector.2 The following command line, C:\>tikal -q "Open file" -tt -sl en -tl fr, will provide you with the matches shown in Figure 2.
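If you need to run the same kind of query for many strings, the command can be wrapped in a small script. The following Python sketch only reuses the options shown in the command above (-q, -tt, -sl, -tl); the function name is hypothetical, and the script assumes that a launcher named tikal is available on your PATH. When it isn’t, the script builds the command but does not run it.

```python
import shutil
import subprocess


def build_tikal_query(text: str, source_lang: str, target_lang: str) -> list:
    """Build the argument list for a Translate Toolkit TM query (hypothetical helper)."""
    return ["tikal", "-q", text, "-tt", "-sl", source_lang, "-tl", target_lang]


command = build_tikal_query("Open file", "en", "fr")

# Assumption: Tikal is installed and its launcher is on the PATH.
launcher = shutil.which("tikal")
if launcher:
    # Use the resolved launcher path so the call also works with tikal.bat on Windows.
    result = subprocess.run([launcher] + command[1:], capture_output=True, text=True, check=False)
    print(result.stdout)
else:
    print("Tikal not found; the command would be:", " ".join(command))
```

Passing the arguments as a list rather than as one string avoids quoting problems when the text to look up contains spaces.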
You can also query MT engines to machine-translate files in any format supported by the Okapi filters. This provides you with the machine-translated version in the original format by using a single command line (handy for pseudo-translation tasks). You can also export a file to XLIFF and merge back the translation. For more information, see Tikal’s Wiki page at the link above (or just type "tikal -h" on the command line).
Figure 2: Matches shown using Tikal
CheckMate is an application that allows you to run most of Okapi’s verification and quality check steps on different types of bilingual files and handle the report interactively. It features verifications for inline codes, segment length, missing or extra special characters, comparison of patterns between the source and target (e.g., if there’s a URL in the source, there should be one in the target), and much more. (See Figure 3 below.) You can spell-check, verify the translation against lists of terms, and even run the verifications offered by the LanguageTool library.3
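The source/target pattern comparison is easy to picture with a small example. The Python sketch below is not CheckMate’s implementation; it only illustrates the idea of the URL check mentioned above, using a deliberately simple regular expression and made-up segments.

```python
import re

# Simplified URL pattern for illustration; real-world checks are usually more thorough.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def missing_urls(source: str, target: str) -> list:
    """Return the URLs found in the source segment that do not appear in the target."""
    target_urls = set(URL_PATTERN.findall(target))
    return [url for url in URL_PATTERN.findall(source) if url not in target_urls]


# Hypothetical bilingual segments (source, target).
segments = [
    ("See https://example.org/help for details.", "Voir https://example.org/help pour plus de détails."),
    ("See https://example.org/help for details.", "Voir l'aide pour plus de détails."),
]

issues = [missing_urls(source, target) for source, target in segments]
for number, problem in enumerate(issues, start=1):
    if problem:
        print(f"Segment {number}: URL missing in target: {problem}")
```

The same approach works for any pattern that should survive translation unchanged, such as e-mail addresses or product codes.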
One handy feature of CheckMate is that it can check for changes in the input file automatically. This means that when you make changes in any translation tool and save the target file, the list of issues displayed in CheckMate is updated at the same time.
Figure 3: CheckMate
While it’s unlikely that many users will need Ratel very often, it can be very handy on occasion. Ratel is a simple desktop application that can be used to create, test, and maintain Segmentation Rules eXchange (SRX) rules. (SRX is a standard file format for representing segmentation rules.4) The goal of the tool is to make it easier to create rules without having to learn the syntax of SRX. You can also test your rules directly on a portion of text or on text files. (See Figure 4 below.) The rules you create with Ratel can be used with Okapi’s segmentation component, but also with any translation system that implements the SRX standard.
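For readers curious about what a segmentation rule does, here is a simplified Python sketch of the general idea behind SRX: each rule pairs a pattern for the text before a position with a pattern for the text after it and says whether to break there, and the first rule that matches at a position decides. This is an illustration only, not Okapi’s segmenter, and the two rules are examples rather than a complete rule set.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    is_break: bool  # True = break here, False = exception (do not break)
    before: str     # regex that must match the text ending at the position
    after: str      # regex that must match the text starting at the position


# Example rules: exceptions come first so that they win over the general break rule.
RULES = [
    Rule(False, r"\b(?:Mr|Mrs|Dr|e\.g|i\.e)\.", r"\s"),
    Rule(True, r"[.?!]", r"\s+"),
]


def segment(text: str, rules=RULES) -> list:
    """Split text into segments by testing every position against the rules in order."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for rule in rules:
            if re.search(f"(?:{rule.before})$", text[:pos]) and re.match(rule.after, text[pos:]):
                if rule.is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first matching rule decides for this position
    segments.append(text[start:])
    return [part.strip() for part in segments if part.strip()]


result = segment("Dr. Smith arrived. He sat down.")
print(result)
```

Without the exception rule, the text would also be split after "Dr.", which is exactly the kind of problem custom segmentation rules are meant to solve.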
Figure 4: Ratel
Olifant is a program that can be very useful when you have to work on TMX files. It’s a bit different from the other Okapi tools because it comes from the pre-2008 Okapi Framework that was developed in C# and runs only on Windows. That application has not been ported to the new Java-based framework.
As a TMX editor, Olifant lets you do many things with TMX files (e.g., open, group, split, prune, and clean up TMX entries). Search and replace, code removal, joining entries, and many more functions are also available. For example, Olifant includes a powerful way to flag entries corresponding to a set of given conditions (e.g., duplicated sources or targets, empty targets, source equal to the target, or entries matching a specified regular expression). The function can be applied to existing flagged entries and you can reverse the flags. (See Figure 5 below.) Overall, this allows you to select a set of entries and then delete them or export them to a separate TMX file.
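The flagging conditions described above can be sketched in a few lines of Python. This is not how Olifant works internally; it is a minimal illustration that reads a tiny, made-up TMX sample with the standard xml.etree module and flags duplicated sources, empty targets, and targets identical to the source. The function names and flag labels are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under this namespaced name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample data for illustration.
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="sketch" creationtoolversion="0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu><tuv xml:lang="en"><seg>Open file</seg></tuv><tuv xml:lang="fr"><seg>Ouvrir le fichier</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Open file</seg></tuv><tuv xml:lang="fr"><seg>Ouvrir un fichier</seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>Save</seg></tuv><tuv xml:lang="fr"><seg></seg></tuv></tu>
    <tu><tuv xml:lang="en"><seg>OK</seg></tuv><tuv xml:lang="fr"><seg>OK</seg></tuv></tu>
  </body>
</tmx>"""


def read_entries(tmx_text: str, source_lang: str = "en", target_lang: str = "fr") -> list:
    """Return a list of (source, target) text pairs, one per translation unit."""
    entries = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        texts = {tuv.get(XML_LANG): "".join(tuv.find("seg").itertext()) for tuv in tu.iter("tuv")}
        entries.append((texts.get(source_lang, ""), texts.get(target_lang, "")))
    return entries


def flag_entries(entries: list) -> list:
    """Return one set of flag labels per entry; an empty set means nothing was flagged."""
    source_counts = Counter(source for source, _ in entries)
    flags = []
    for source, target in entries:
        reasons = set()
        if source_counts[source] > 1:
            reasons.add("duplicate-source")
        if not target.strip():
            reasons.add("empty-target")
        elif source == target:
            reasons.add("source-equals-target")
        flags.append(reasons)
    return flags


entries = read_entries(TMX_SAMPLE)
flags = flag_entries(entries)
unflagged = [entry for entry, reasons in zip(entries, flags) if not reasons]
for (source, target), reasons in zip(entries, flags):
    print(f"{source!r} -> {target!r}: {sorted(reasons) or 'ok'}")
```

In this sample every entry meets at least one condition, so the list of unflagged entries is empty; with real data, the flagged and unflagged sets are what you would delete or export separately.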
Figure 5: Olifant
Ocelot is an XLIFF editor with strong support for quality metadata through the Internationalization Tag Set (ITS) standard. It allows you to gather and manipulate information about quality, and to track changes and who made them. (See Figure 6 below.) Several other types of properties defined in the ITS standard are also supported, such as text analysis and terminology. Ocelot supports both XLIFF 1.2 and XLIFF 2 files. Another interesting aspect of Ocelot is that it can use plugins, allowing it to be customized to specific needs.
Figure 6: Ocelot
Filters Plugin for OmegaT
In addition to Okapi’s standalone tools, there is also the Okapi Filters Plugin for OmegaT. As its name indicates, it’s a plugin for OmegaT that allows you to use some of the Okapi filters directly from OmegaT. (See Figure 7 below.) This seamless integration adds support in OmegaT for quite a few formats, including SDLXLIFF, Markdown, TTX, YAML, JSON, ITS-driven XML, XLIFF 2, and InDesign.
Figure 7: Okapi Filters Plugin for OmegaT
More Links for Information on Okapi
Okapi Tools User Group on Yahoo
Overall, while some of the components of the Okapi Framework may be a bit technically challenging, there are a few tools that are relatively easy to use. Many of the end users in the Okapi Tools user group on Yahoo are translators, so you should not hesitate to ask any questions you may have there. (See the above sidebar for this link, as well as other useful resources.)
Remember, if you have any ideas and/or suggestions regarding helpful resources or tools you would like to see featured, please e-mail Jost Zetzsche at firstname.lastname@example.org.
1. See the information on the file formats Okapi supports at http://bit.ly/Okapi-supported-formats.
2. See http://amagama.translatehouse.org.
3. See https://languagetool.org/development.
4. See the SRX recommendation on the website of the Globalization and Localization Association, www.gala-global.org/srx-20-april-7-2008.
Yves Savourel has been involved in internationalization and localization for 27 years. He has worked on various localization standards, including TMX, SRX, XLIFF, and ITS. He is the author of XML Internationalization and Localization. He is part of the Okapi Framework open source project and currently works for Argos Multilingual. Contact: email@example.com.