By Mark Herman

Once again I’m happy to report that my alma mater, the Columbia School of Engineering and Applied Science, is still at the forefront of natural language processing, an extremely important subfield of artificial intelligence. In a previous column (March-April 2016), I discussed attempts by Julia Hirschberg, a professor of computer science, to produce better lie detectors.

According to an article¹ in the Spring 2017 issue of Columbia Engineering magazine:

because there are more than 500 languages in Nigeria, most of them uncategorized in terms of syntax, grammar, or lexicon, international disaster relief teams often run into an impenetrable barrier: the inability to understand these so-called low-resource languages (LRLs). (19)

Contrary to what many works of science fiction suggest, there is no such thing as a universal translator. But such a device may not in fact be necessary:

Working with a four-year Defense Advanced Research Projects Agency (DARPA) grant, Computer Science Professors Kathleen McKeown and Julia Hirschberg are leading the development of a universal sentiment and emotion detection system that will enable disaster relief workers confronted with a LRL to figure out who needs help the most, ideally within a day of their arrival in the region. (19)

But is it really possible to provide “situation awareness by identifying elements of information…such as topics, names, events, sentiments, and relationships” (19) in a totally unknown language? Disasters present “complex, dynamic scenarios” (19), and the relief worker must be able to understand “not just…the status of the disaster but…the causative events and the potential hazards ahead.” (19)

Natural language processing systems learn through ingesting massive amounts of data. No data means no way to train the system, and the very term “low-resource language” basically states the…major challenge…“[This] is entirely new ground,” McKeown said. “No one has really done it before.” (20)

One way to tackle the problem is to use:

speech data labeled for emotion in high-resource languages to train systems for identifying the same emotion in low-resource languages. (20)

This may seem ludicrous on its face, but so far the system has been able to identify anger and stress in an LRL after being trained on an entirely different language, with accuracy about 17% above chance.

Another approach, unfortunately not very well explained in the article, involves the use of “deep neural networks.” (20)

In a second article in the same issue,² Professor Hirschberg talks more generally about her research on natural language processing. At Bell Labs, she worked on text-to-speech synthesis. She explains that:

when people speak, they rarely speak in a monotone. They use falling pitch to indicate statements, rising pitch for yes/no questions, and a variety of other contours to express uncertainty, incredulity, surprise, and other types of information. (23)

These contours are important if artificial systems like Siri are to sound natural. Eventually, Hirschberg and her colleagues were able to scan voice-mail messages automatically and distinguish, with good accuracy, between personal and business messages.

Hirschberg then goes on to discuss her studies on lying, described in my previous column, and the topic of her own PhD thesis:

computational approaches to interpreting conversational implicature – information listeners infer but which is not explicitly said. (24)

Looking ahead, she describes future directions for research in natural language processing (NLP):

  • build[ing] classifiers from larger and larger amounts of data, which makes NLP of increasing interest for business and medical applications
  • human-robot interactions
  • identif[ication] of medical conditions, such as depression and autism (25)

Finally, and perhaps less benevolently:

  • analyses of political candidates and movement leaders to determine their ability to attract followers and win elections (25).

I will end this column with a linguistic matter that artificial intelligence currently handles miserably: typographical fonts. Certainly, when my computer substitutes a “closely related” font for one that is used in an imported text but not installed on my machine, it usually picks a font so different from the original that the results are visually grotesque. This completely wrecks the layout of tables and figure captions, to say nothing of advertising copy, and leads some people to stick with Times New Roman whenever possible.

But fonts do have strong characteristics, enough to make some people wax poetic about them. Here are two such waxings by Arthur Graham:³

[Image: two poems about fonts by Arthur Graham]



1. Harris, Marilyn. “Designing Artificial Intelligence to Solve Global Problems,” Columbia Engineering (Spring 2017), 16-22.

2. Hvala, Joanne. “Q&A with Professor Julia Hirschberg,” Columbia Engineering (Spring 2017), 23-25.

3. Graham, Arthur. Rimes of an Ancient Typographer (Polyglot Press, 2017).


Submit items for future columns via e-mail to mnh18@columbia.edu. Discussions of the translation of humor and examples thereof are preferred, but humorous anecdotes about translators, translations, and mistranslations are also welcome. Include copyright information and permission if relevant.


The ATA Chronicle © 2018 All rights reserved.