Although PDF files are helpful for reference, providing them for translation could prove quite inconvenient. Knowing how to handle these files in various cases will better prepare you to help clients understand the challenges posed by the files they send.
Translating PDF files can turn into a nightmare on some translation projects. Receiving them without their source files often causes issues that probably wouldn’t have arisen had the client simply sent the text in its original format. So, why do some clients send PDF files for translation? There’s certainly no single answer to this question. What can we do when there’s no chance of receiving the source files? Should we charge more when working on PDF files? To give you a better idea of the issues involved, let’s explore some aspects of handling PDFs during a translation project.
Why Can Receiving Only PDF Files for a Translation Project Be an Issue?
Nowadays, most translators overwrite the source text they receive electronically. They open the original file in the same program the author (or the publishing team) used, or process it in a translation memory (TM) tool. Any layout work required on the target text is then done in the source program, either by the translators or by desktop publishing (DTP) specialists.
PDF files are not normally meant to be overwritten. They are generated from the original files and used to exchange documents easily, since no specific program is required to read the content. This is why receiving them as a reference during a translation project can be highly practical. They can be given to linguists who don’t have the original program and/or who work in TM tools that don’t display the actual layout. They can also be a useful reference during DTP to ensure the same final output is obtained. They can even be sent for quality assurance or client validation, since it’s easy to insert comments on the translated text or layout.
Although these files are helpful for reference, providing them for translation can prove quite inconvenient. Overwriting text in Acrobat Pro is feasible but not very practical, because it becomes problematic when the target text is longer (or much shorter) than the source. Although more and more TM tools support the PDF format, the result isn’t always optimal. Sometimes the text to be translated isn’t extracted correctly. More issues may appear when the original layout is complex or when the translation has to be delivered in a particular format, such as Adobe InDesign or Microsoft PowerPoint.
As we’ll see later in this article, there are techniques to extract the text from a PDF file and rework the layout. However, when the files contain images and have to be printed in high resolution, these processes might not provide the expected results. That’s why it’s generally advisable to ask clients for the source files and only use the PDF files for reference.
Why Do Some Clients Send PDF Files for Translation?
Unfortunately, not all clients send the files to be translated in their original format. Some think it’s easy to overwrite the content in the PDF itself. Others assume that translators create new files (e.g., in Microsoft Word) in which they type the target content directly. Many clients fail to realize that it’s far easier to receive their editable document and overwrite it with the translation, which in most cases also simplifies the final layout process. I remember a client who sent us a PDF file generated from a text he had just written himself. We immediately asked for the source file, but his reply was quite puzzling: after generating the PDF file, he had deleted the editable file, not realizing anyone would ever need it. Fortunately, a simple explanation will often persuade clients to send the appropriate files for our translation projects.
Some clients send PDF files because they judge the source files, such as Adobe InDesign documents, too complex for translators. It’s true that not all linguists own this layout program, but providing them with a format compatible with TM tools (in this case, IDML files) means that the source text can be overwritten and the original formatting easily recovered for the target layout.
Unfortunately, we might occasionally come across clients who aren’t fully responsive. They feel we should find the solution on our own or be talented enough to handle any file format.
In some companies, the people requesting a translation (for themselves or others) haven’t worked on the source files at all. They were provided with PDF files or retrieved them from a common repository and don’t have any idea who created them. In these cases, it can prove difficult for them to provide us with editable files for translation. When the source files are created by an external team, such as a public relations agency, it can be virtually impossible to get editable files. These agencies might only deliver PDF files to their clients and not share the original material they created. This could be because they want to charge extra for the layout of target languages, or perhaps they simply fail to understand the requirements of the translation team.
Finally, there are exceptional situations, such as when the source text is only available on paper and has been scanned. With the best will in the world, the client won’t be able to send us the sources in an editable format.
How Can We Handle PDF Files?
The methods used to handle PDF files depend on the programs we work with, the complexity of the PDF files, our own skills, and even the client’s expectations.
Retype: Creating a new document and typing the target text isn’t really complex. Nonetheless, translators used to overwriting may find that typing from scratch takes them longer. This is less of an issue if you use dictation software, since dictating from a printout or a second screen can be relatively fast. For repetitive texts or similar projects, however, you’ll also lose the advantage of retrieving existing translations from a TM. And when the client asks for the layout to be retained, formatting the target file to match the original PDF can increase project time significantly. Still, this method is sometimes the recommended option for scanned text or portions of text that can’t be extracted.
Copy and Paste: As long as the source content wasn’t scanned, you can select the text of a PDF file, copy it, and paste it into a new document. Even for plain text, some adjustments will usually be needed, such as removing the line breaks at the end of each line. You might also need to redo tables. Obviously, the more complex the original layout, the more work you’ll have to do, not only to obtain an editable text but also to reproduce a format similar to the original one.
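If you copy and paste from PDFs often, the line-break cleanup can be partly automated. The following is a minimal Python sketch, not a feature of any particular tool; it assumes that paragraphs in the pasted text are separated by a blank line and treats every other line break as a layout artifact. It does not rejoin words hyphenated at the end of a line, since those hyphens can’t reliably be told apart from real ones without checking.

```python
import re


def unwrap_pdf_text(raw: str) -> str:
    """Rejoin lines that a PDF copy-paste broke at the end of each visual line.

    Assumption: paragraphs are separated by at least one blank line, so any
    single line break inside a paragraph is treated as a layout artifact.
    """
    # Normalize Windows ("\r\n") and old Mac ("\r") line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    cleaned = []
    # Split into paragraphs on blank lines (which may contain whitespace).
    for para in re.split(r"\n\s*\n", text):
        # Join the wrapped lines of one paragraph with single spaces.
        joined = " ".join(line.strip() for line in para.split("\n") if line.strip())
        # Collapse runs of spaces left over from justified or columned layouts.
        joined = re.sub(r" {2,}", " ", joined)
        if joined:
            cleaned.append(joined)
    return "\n\n".join(cleaned)


pasted = "PDF files are\nhelpful for\nreference.\n\nSecond   paragraph\nhere."
print(unwrap_pdf_text(pasted))
```

Tables and lists would still need to be rebuilt by hand, since their line breaks carry meaning.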
Save As: You can also save the accessible PDF content in an editable format. Most of the tools that do this must be paid for, but they produce fairly good output that needs only a few adjustments. Nevertheless, some source formats might not be properly supported, and complex layouts, frames, tables, organizational charts, etc., will more often than not complicate the task and require preparation of the source content and/or major work on the final target layout.
Use a TM Tool: For some time now, more and more TM tool vendors have been adding PDF support, and some tools even include features for handling scanned text in PDF files. The result is often very good for a simple layout, and even heavily formatted content, with tables, graphics, etc., might be processed correctly. It’s strongly recommended to check whether all the segments to be translated are actually available to the translator and whether any adjustments are needed (e.g., correcting double or missing spaces).
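A spacing check of this kind can also be run on segments exported from the TM tool. Here is a small Python sketch under stated assumptions: the segments are available as a plain list of strings (how you export them depends on your tool), and the patterns are rough English-oriented heuristics that flag segments for a manual look rather than fixing them. They would need adapting for languages such as French, which require a space before certain punctuation marks.

```python
import re

# Rough heuristics for English-style punctuation (assumption): adapt them for
# languages such as French, which require a space before some punctuation marks.
DOUBLE_SPACE = re.compile(r"\S {2,}\S")
# Lowercase letter, punctuation, then uppercase letter, as in "end.Next".
MISSING_SPACE = re.compile(r"[a-z][.,;:!?][A-Z]")


def flag_spacing_issues(segments: list[str]) -> list[tuple[int, str]]:
    """Return (segment index, issue) pairs for segments worth checking by hand."""
    issues = []
    for i, seg in enumerate(segments):
        if DOUBLE_SPACE.search(seg):
            issues.append((i, "double space"))
        if MISSING_SPACE.search(seg):
            issues.append((i, "possible missing space after punctuation"))
        if seg != seg.strip():
            issues.append((i, "leading or trailing space"))
    return issues


segments = ["Clean segment.", "Two  spaces here.", "End.Next sentence", " padded"]
for index, issue in flag_spacing_issues(segments):
    print(index, issue)
```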
As far as layout is concerned, the TM tool output could suit the client, possibly with some adaptation. However, if the client expressly requests a specific format other than the proposed output, major formatting work may be necessary, potentially leading to the recreation of full-page layouts using the same program(s) as the source file creator.
Extract the Content into an Editable Format: Several PDF extraction tools are available, some of them free. However, if a tool requires uploading files to a website, make sure that doing so doesn’t contravene any non-disclosure agreement and/or contract you’ve signed with the client.
These tools may let you select the required output format, and most of them will extract the text correctly while keeping most of the original layout intact. Once again, the result will mostly depend on the complexity of the source material. A preparation or pre-layout step is therefore recommended, particularly when projects involve multiple target languages: anything fixed once in the source won’t have to be fixed again in each target language, which shortens DTP time. Translators should be aware of any issues that may occur during extraction. You may decide to fix the source text before starting to translate, or fix problematic segments as you go along. When you receive extracted files from a translation agency or a client, check whether the extracted source text was cleaned up first. If not, let the client or agency know that the extractions might need to be fixed.
Sometimes one extraction tool might suit all your needs; other times, you might need several tools to extract various content types into different formats. For example, tables in some PDF files might be extracted with one tool, while organizational charts require another. Some tools are limited to a single output format, such as Microsoft Word, while others can properly extract content into an Excel spreadsheet, a PowerPoint presentation, or even an Adobe InDesign file.
Use Optical Recognition: Instead of being generated from a specific application, some PDF files result from a scanning process. In this case, you can turn to optical character recognition (OCR) software. The output will vary greatly depending on the quality and resolution of the scanned document and on correct language detection. (If possible, define the source language of the PDF to be processed.) It also goes without saying that you should check the file carefully before launching into the translation. Spotting mistakes, not only in the formatting or missing text but also in badly recognized letters or figures, is often crucial to preventing serious quality problems in the target text (e.g., an “i” extracted as an “l” or “3 cm³” extracted as “3 cm²”).
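Some OCR mistakes can be flagged automatically before the careful read-through. The Python sketch below is a simple heuristic, not a feature of any OCR product: it lists tokens that mix letters and digits (such as “l0cal” or “2O18”) and words followed by a superscript two or three, so they can be compared with the scan. It cannot detect one letter swapped for another, like an “i” read as an “l,” so the human check remains essential.

```python
import re

# Simple heuristics (assumption, not an OCR feature): they only flag tokens
# for review and cannot tell whether a flagged token is actually wrong.
# Tokens mixing ASCII letters and digits, such as "l0cal" or "2O18".
MIXED_TOKEN = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")
# Words immediately followed by a superscript two or three, such as "cm³".
UNIT_SUPERSCRIPT = re.compile(r"\b\w+[²³]")


def flag_ocr_suspects(text: str) -> list[tuple[int, str]]:
    """Return (line number, token) pairs to compare against the original scan."""
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pattern in (MIXED_TOKEN, UNIT_SUPERSCRIPT):
            for match in pattern.finditer(line):
                suspects.append((line_no, match.group()))
    return suspects


ocr_output = "The rod is 3 cm³ long.\nPrinted in 2O18 by the l0cal office."
for line_no, token in flag_ocr_suspects(ocr_output):
    print(line_no, token)
```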
What Are the Costs Linked to PDF Translation?
Basically, the key is to assess the steps needed to produce the expected result from a PDF file and make sure you’re compensated for the extra work. Sometimes it’s quite hard to make the right guess, but often a few minutes is all you need to make an estimate. You can base this estimate on your experience or on some tests (e.g., opening the file in a TM tool, or checking the rough output from a text extractor).
If the client only expects translated content, without any layout, the extra effort required might be minimal. You then need to decide whether the work should be paid like any other job or whether you should charge slightly more, either by increasing the rate (per word, line, character, etc.), adding billable minutes or hours, or applying a flat rate (e.g., specifying the extra amount charged for processing PDF files).
If the request is to deliver the target text following the original file layout, I would advise you to analyze the scope of the task and rate it, especially for a complex layout. For instance, you might add preparation hours to the quote as well as the usual DTP work negotiated per page and/or illustration. Or you could increase the rate per page for DTP when PDF files have to be processed.
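To see how these options add up, here is a rough Python sketch of a quote for a PDF job. The field names and all the figures are hypothetical illustrations, not recommended rates, and no currency is implied.

```python
from dataclasses import dataclass


@dataclass
class PdfQuote:
    """Hypothetical quote for a PDF job; every field and figure is illustrative."""

    words: int
    rate_per_word: float
    pdf_surcharge_pct: float = 0.0   # option: raise the word rate for PDF work
    extra_hours: float = 0.0         # option: bill preparation time by the hour
    hourly_rate: float = 0.0
    flat_fee: float = 0.0            # option: flat fee for processing the PDF
    dtp_pages: int = 0               # layout work when the original layout must be kept
    dtp_rate_per_page: float = 0.0

    def total(self) -> float:
        translation = self.words * self.rate_per_word * (1 + self.pdf_surcharge_pct / 100)
        preparation = self.extra_hours * self.hourly_rate
        dtp = self.dtp_pages * self.dtp_rate_per_page
        return round(translation + preparation + self.flat_fee + dtp, 2)


quote = PdfQuote(words=2000, rate_per_word=0.12, pdf_surcharge_pct=10,
                 extra_hours=2, hourly_rate=40, dtp_pages=8, dtp_rate_per_page=15)
print(quote.total())
```

With these made-up figures, 2,000 words at 0.12 per word plus a 10% PDF surcharge (264), two preparation hours at 40 per hour (80), and eight DTP pages at 15 per page (120) come to 464.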
In any case, the first action I would recommend is to always ask clients for the editable source files with illustrations containing editable text layers, any proprietary fonts, templates, etc. Explain to them that the goal is not only to ease the linguists’ work, but also to reduce costs and guarantee a proper file resolution. Looking at the PDF file properties (e.g., via the File menu in Adobe Reader) may also give you a good indication of what the source files were.
Being Clear with the Client Helps Everyone
We might encounter clients who won’t be able to send us any other format than PDF files for translation and who might even forget to unlock them at times. Knowing how to handle these files in the various cases is extremely helpful. Whether you write a brand new text, make a basic extraction without any layout, prepare the file for the client so they can format it easily, or recreate the full layout yourself, I suggest you measure the approximate time and effort it takes to complete the tasks and include the related costs in your project price. Clients should understand that extra work means extra charges. But they might need some clear explanations on the challenges posed by the files they send, and they should definitely be warned in advance of any potential price increase.
Nancy Matis has been involved in the translation business for more than 20 years, working as a translator, reviser, technical specialist, project manager, and teacher, among other roles. After earning degrees in translation and social and economic sciences, she worked for an international language services firm for several years. She currently manages her own company based in Belgium, specializing in localization, translation project management, consulting, and training. She also teaches translation project management at Université Lille 3 (France), KU Leuven (Belgium), Université Libre de Bruxelles (Belgium), and through webinars. Besides publishing articles on project management and the importance of teaching this subject to future translators, she has also written about terminology management in projects and quality assurance in translation (www.translation-project-management.com). She is the author of How to Manage Your Translation Projects, which is available on her website. Contact: nancy@nmatis.be.