September 11, 2017

Language Sketching

Resources
Source: The ATA Chronicle

Dragos Ciobanu, from the translation department of the University of Leeds, wrote to me a few weeks ago:

The reason I’m writing is to ask whether you’ve had a chance to play with Sketch Engine (see sketchengine.co.uk). In Leeds, we’ve been using it in our corpus linguistics work a lot. It’s got brilliant features, from the terabytes of super useful multilingual data which it already comes with, to features for term extraction, specialized corpus building, thesaurus, collocations, and tons more! It’s really, really cool and I’m only writing to you because the translators I know who have been playing with it also like it a lot.

Not sure whether you could tell, but Dragos really likes Sketch Engine. And in a way, I could stop this column right here, because he already said it all—sort of.

After spending some time looking at Sketch Engine, I felt embarrassed that I hadn’t known more about it. As Dragos said, it’s really, really cool. It’s also a monster of a tool (size-wise) and it’s not particularly easy to navigate when you first encounter it. (According to Ondrej Matuška of the Sketch Engine team, making the product more user-friendly is one of the areas they’re trying to focus on in the immediate future.)

But first, what exactly is Sketch Engine and what does it do?

It’s a corpus tool developed by the Czech company Lexical Computing Limited. Lexical Computing was founded in 2003 by the late Brit Adam Kilgarriff and Pavel Rychlý, a professor at Masaryk University in Brno. The idea of corpus tools, and this corpus tool in particular, is to find out how language behaves based on large collections of data. For this purpose, Sketch Engine built corpora in more than 80 languages (as well as “time-stamped” corpora in a slightly different set of 18 languages for the purpose of comparing word usage over time). The sizes of the corpora differ widely (from just a few million words in Maori to more than 800 billion in English), and they are available for a number of analysis purposes for any paying or trial user. (The annual subscription price is 100 euros for non-academic users, with the trial period ending after 30 days.)

The analyses you can do on these corpora with Sketch Engine include the following:

Word Sketches: This is where the program got its name, and it’s what Kilgarriff brought to the table. A word sketch is a summary of a word’s grammatical and collocational behavior (collocation refers to how often a word co-occurs with other words or phrases). (See Figure 1 below.) Since the data in the corpora is lemmatized (i.e., words are analyzed so they can be brought back to their base or dictionary form), the results are a lot more meaningful than what most of our translation environment tools provide when they’re unable to relate different forms of one word to each other. Another word sketch option that Sketch Engine offers is the comparison of word sketches of similar words.

Figure 1: An example of a Word Sketch
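
To make the idea of lemmatized collocation counts concrete, here is a minimal Python sketch. It is not how Sketch Engine computes word sketches (which also take grammatical relations into account); the tiny lemma table, the `collocates` function, and the sample sentence are all made up for illustration.

```python
from collections import Counter

# Hypothetical, hand-written lemma table; a real corpus tool relies on a
# full lemmatizer rather than a lookup like this.
LEMMAS = {
    "translated": "translate",
    "translates": "translate",
    "translating": "translate",
    "documents": "document",
    "texts": "text",
}


def lemmatize(token):
    """Map a word form to its base form, falling back to the form itself."""
    return LEMMAS.get(token, token)


def collocates(tokens, node, window=2):
    """Count lemmas that occur within `window` tokens of the lemma `node`."""
    lemmas = [lemmatize(t.lower()) for t in tokens]
    counts = Counter()
    for i, lemma in enumerate(lemmas):
        if lemma != node:
            continue
        start = max(0, i - window)
        end = min(len(lemmas), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[lemmas[j]] += 1
    return counts


text = ("She translated legal documents . He translates legal texts . "
        "They are translating a legal document .")
result = collocates(text.split(), "translate")
print(result.most_common(1))  # [('legal', 3)]
```

Because "translated," "translates," and "translating" are all reduced to "translate," the three occurrences of "legal" are counted together instead of being split across three separate word forms.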

Thesaurus: The ability to retrieve a detailed list or a graphical word cloud with similar words, including links to create reports on word sketch differences for those terms to understand the exact differences in actual usage.

Concordance: Searches for single words, terms, or even longer phrases. Since the data in the supported languages is tagged, it’s also possible to search for specific classes of words or specific classes of words that surround the word in question.
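
A concordance search is essentially a "keyword in context" listing. The following is only a rough sketch of that idea in Python, assuming plain whitespace-separated tokens and no tagging; the `kwic` function and the sample text are illustrative and not part of Sketch Engine.

```python
def kwic(tokens, query, width=3):
    """Return (left context, match, right context) for each hit of `query`.

    `query` may be a single word or a longer phrase; matching ignores case.
    """
    q = [w.lower() for w in query.split()]
    lowered = [t.lower() for t in tokens]
    hits = []
    for i in range(len(tokens) - len(q) + 1):
        if lowered[i:i + len(q)] == q:
            left = " ".join(tokens[max(0, i - width):i])
            match = " ".join(tokens[i:i + len(q)])
            right = " ".join(tokens[i + len(q):i + len(q) + width])
            hits.append((left, match, right))
    return hits


sample = ("The translation memory stores segments . "
          "A translation memory speeds up work .").split()
for left, match, right in kwic(sample, "translation memory", width=2):
    print(f"{left:>10} | {match} | {right}")
```

Searching by word class, as described above, would additionally require part-of-speech tags on each token, which this sketch leaves out.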

Parallel Corpus: Retrieval of bilingual or multilingual sets of words or phrases within their contexts. (See Figure 2 below.) Presently this is available only for on-screen data viewing, but it will soon be offered as downloadable data. This is especially helpful when uploading your own translation memories (see below).

Figure 2: The parallel corpus feature

Word Lists: The possibility of creating lists of words and the number of occurrences, either as lemmas (the base form of each word) or in each word form.
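
The difference between counting word forms and counting lemmas can be shown in a few lines of Python. This is a toy sketch under stated assumptions: the lemma table is hypothetical and hand-written, and the `word_lists` function is not a Sketch Engine feature.

```python
from collections import Counter

# Hypothetical lemma table for illustration only.
LEMMAS = {"corpora": "corpus", "words": "word"}


def word_lists(tokens):
    """Return two frequency lists: one by word form, one by lemma."""
    forms = Counter(t.lower() for t in tokens)
    lemmas = Counter(LEMMAS.get(t.lower(), t.lower()) for t in tokens)
    return forms, lemmas


tokens = "corpus corpora word words words".split()
forms, lemmas = word_lists(tokens)
print(forms["words"], forms["word"])  # 2 1
print(lemmas["word"])                 # 3
```

The form-based list keeps "word" and "words" apart, while the lemma-based list merges them into a single entry.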

Creating Your Own Corpus: This is likely the most exciting feature for translators. You can either upload your own translation memories or use the tool’s own search engine mechanism (which relies on Microsoft Bing) to create a list of bilingual websites that contain the terms that are relevant to your field, have them automatically aligned, and form a corpus. I don’t need to explain to you the possibilities this offers to translators who don’t have the privilege of having high-quality translation memories or termbases for a particular subject matter that they need to translate. As a logical extension of this feature, not only can you perform any of the functions mentioned earlier, but it’s also possible to run a keyword search on the user-created corpus, identify the terms that are relevant, and download them into an Excel or TBX file. This feature is currently available for Chinese, Czech, Dutch, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, and Spanish. The bilingual version of this is just around the corner.

By the way, you can find an example of the up-and-coming increased user-friendliness of Sketch Engine in OneClick Terms (terms.sketchengine.co.uk) that allows you to extract terms from TMX, XLIFF, PDF, DOC, DOCX, HTML, or TXT files in essentially one or two clicks. (See Figure 3 below.)

Figure 3: Preview of OneClick Terms

Translators have been one of the primary target groups for the makers of Sketch Engine. One immediate result of that focus is the availability of a plug-in for SDL Trados Studio (see http://bit.ly/SDL-appstore-sketch and http://bit.ly/user-guide-sketch). The plug-in itself is free, but it requires a trial or paid registration to be usable. It allows you to perform collocation, thesaurus, and concordance searches and will soon offer term extraction. According to Ondrej Matuška at Sketch Engine, talks with makers of other translation environment tools are under way to offer plug-ins or add-ons for those tools as well.

Can you believe you’ve never heard about this tool before? Well, maybe you were quicker than I to find this, but the good thing is that now we all know.


Jost Zetzsche is chair of ATA’s Translation and Interpreting Resources Committee. He writes the “Geekspeak” column for The ATA Chronicle. He is also the co-author of Found in Translation: How Language Shapes Our Lives and Transforms the World, a robust source for replenishing your arsenal of information about how human translation and machine translation each play an important part in the broader world of translation. Contact: jzetzsche@internationalwriters.com.

Remember, if you have any ideas and/or suggestions regarding helpful resources or tools you would like to see featured, please e-mail me at jzetzsche@internationalwriters.com.
 
