Natural Language Processing has undoubtedly made tremendous strides over the last few years. One should expect the next five or ten years to bring a Cambrian Explosion of sorts in the sector, not just in language-focused technology like translation and language learning, but across the continuum of modern technology. That said, we are still out of step on some critical issues. This is not to say there are no teams of brilliant, motivated people well on the road to solving these problems, but the solutions have not reached the public yet.
There is evidence of this in a very pervasive application, Google Translate. For anyone familiar with multiple languages, it is unfortunately clear that most of the polish in machine translation has come through Anglo eyes. Translations to and from English are improving, but the same cannot be said for other language pairings. Progress is noticeable among the other European languages, but still lacking. Even drastic improvements in Google’s technology, such as zero-shot translation built on a ‘multilingual neural machine translation’ system, have not fully solved the problem of bridging two lesser-used languages.
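For the curious, here is a minimal sketch of how that multilingual mechanism works, following Google’s published description: an artificial token naming the target language is prepended to the source sentence, and one shared model is trained on the pooled language pairs. The helper name and toy sentences below are my own illustration, not Google’s code.

```python
# Sketch of multilingual NMT conditioning: one model serves many language
# pairs because each source sentence carries a token naming the target
# language. Zero-shot translation (a pair never seen together in training)
# falls out of sharing the same model across all pairs.

def prepare_example(source_sentence: str, target_lang: str) -> str:
    """Prepend an artificial token naming the target language."""
    return f"<2{target_lang}> {source_sentence}"

# Training data mixes many directions through the same model:
corpus = [
    (prepare_example("Where is the station?", "he"), "איפה התחנה?"),   # EN -> Hebrew
    (prepare_example("איפה התחנה?", "en"), "Where is the station?"),   # Hebrew -> EN
    (prepare_example("Where is the station?", "th"), "สถานีอยู่ที่ไหน"),  # EN -> Thai
]

# At inference, a Hebrew -> Thai request reuses the same model with a
# token-plus-source combination never seen in training: the zero-shot case.
zero_shot_input = prepare_example("איפה התחנה?", "th")
print(zero_shot_input)  # "<2th> איפה התחנה?"
```

The catch, as noted above, is that zero-shot quality between two lesser-used languages still lags far behind pairs that include English, because English dominates the pooled training data.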
Changing this will require native speakers of other languages, especially bilinguals, to focus on the technology, since they can contribute most effectively. An Israeli who speaks Hebrew and picked up some Thai over a couple of years in his early twenties will do far more to improve Hebrew-Thai translation than algorithms developed by English speakers alone.
Accent recognition has become somewhat of a joke for a lot of people, with videos appearing on YouTube that compare how well Siri, Alexa, and Google can decipher the spoken accent of a given speaker. In English, there has been some progress on this front, but again that is a product of Silicon Valley’s central role in developing the technology.
Again, I don’t want to make it seem like I’m treading new ground here. An Amazon patent for “Accent Translation,” filed in June 2018, might be the first breakthrough from what investor Lou Kerner dubs FAMGA (Facebook, Apple, Microsoft, Google, and Amazon).
Every language has a variety of accents, some thicker and some weaker (thicker accents being a particular issue with older speakers of any language), and some even modified by health issues. This issue also relates to the next point:
English speakers tend to think of dialects mostly in terms of accents: Bostonians, Brooklynites, and Chicagoans share some features but differ in important ways. Dialect, however, is actually richer than that in most cases. English has true dialects, like West Virginian English and African-American Vernacular English, a.k.a. AAVE (respectively dismissed by many as under-educated ‘redneck’ or ‘ebonic’ slang). But these are rich communication systems that incorporate not only different pronunciation but their own grammar rules. Social scientists have suggested that differences between AAVE and more ‘white’ dialects of American English might widen the chasm between the two groups, resulting in increased discrimination and a host of minor misunderstandings between black and white Americans:
“Although this has certainly resulted in subtle and non-subtle human prejudice being codified into machine learning, it also presents an opportunity to create programs that may correct for human prejudices. Data-driven models often work best when the linguistic field is narrowed, actually, because human language is so broad. Because of this, I wonder if there will be — could be — machine translation settings in the future that take into account the dialect of English being spoken; certainly this is a question that speech-to-text MT applications have to take into account. And I wonder if this, somehow, could be used to ‘translate’ dialects in places like the courtroom, for the benefit of everyone.” — Katie Botkin, managing editor of MultiLingual
German and Italian have more sharply drawn boundaries between their dialects, and many foreign learners are unaware that the standard German and Italian of their textbooks are compromises, mostly based on one local dialect with features of others incorporated. Germany is littered with varieties that are borderline separate languages: Bavarian, Alemannic, Swabian; Italy likewise: Sicilian, Neapolitan, Sardinian, and more.
These variations in the spoken word are critical to map among older populations around the world, who have not experienced the dialect leveling of their grandchildren, brought on by a common standard of language education and the proliferation of mass media that reflects a relatively unified manner of speaking.
There are papers addressing the use of machine translation for dialects that are ‘low on resources,’ i.e., those with little associated text to draw from. Some examples, each focused on a specific group of dialects, include Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German, Machine Translation for Arabic Dialects, and Multi-Dialect Machine Translation (MuDMaT), the latter specifically for North African dialects of Arabic. Microsoft is also working on the problem.
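A common strategy in that Swiss German work is to normalize the dialect toward the standard language first, so that an existing Standard German translation model can take over. Here is a deliberately tiny sketch of the idea; the lexicon and function are my own toy illustration, since real systems learn these mappings from dialect corpora rather than hand-written tables.

```python
# "Normalize, then translate": map dialectal forms toward the standard
# language so a high-resource MT model can handle the rest. The lexicon
# below is illustrative only; it is nowhere near real coverage.

NORMALIZATION_LEXICON = {
    "chuchichäschtli": "küchenschrank",  # kitchen cupboard
    "öppis": "etwas",                    # something
    "hüt": "heute",                      # today
    "nöd": "nicht",                      # not
}

def normalize(dialect_sentence: str) -> str:
    """Map known dialect words to standard forms; pass the rest through."""
    words = dialect_sentence.lower().split()
    return " ".join(NORMALIZATION_LEXICON.get(w, w) for w in words)

# Roughly "I saw something today" in Swiss German:
print(normalize("hüt han i öppis gseh"))
# -> "heute han i etwas gseh": only partially normalized, because unseen
#    words pass through untouched. That coverage gap is exactly what the
#    low-resource research above is trying to close.
```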
Emotion is an obscure one and admittedly tougher to measure, but it is equally critical to producing an accurate understanding of language. A paper by Microsoft describes the challenge of recognizing emotion in text. Microsoft calls this field emotion analysis and ties it closely to sentiment analysis, and it has applications beyond translation or language learning.
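In practice, emotion analysis is usually framed as text classification. Here is a minimal sketch of that framing; the six training sentences and three labels are invented for illustration, and real systems rely on large annotated corpora and far richer models.

```python
# Emotion analysis as text classification: bag-of-words features plus a
# linear classifier. Crude, but it shows the shape of the task that
# translation systems would need to account for.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "I am so happy about this!", "This is wonderful news",
    "I can't believe you did this to me", "This is infuriating",
    "I miss her so much", "Everything feels hopeless today",
]
labels = ["joy", "joy", "anger", "anger", "sadness", "sadness"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["What fantastic news!"]))  # likely ["joy"]
```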
Some companies like Beyond Verbal are using emotional analytics to measure changes in subjects’ health. Affectiva is using it to get a read on drivers’ emotions to improve safety on the road.
The technology can also be used to improve translation, picking up emotional nuances, motivations, and even sarcasm, which together create a much finer understanding of speech when it is interpreted across borders. This will go a long way toward eliminating mistranslations, which often stem from cultural misunderstandings and unfamiliarity with a language’s particular turns of phrase and nuances.
A common theme runs through the issues above: spoken language often does not reflect written language. The shortcoming of Microsoft’s paper mentioned above, its merits aside, is that it focuses on the written word. Unfortunately, that may well typify the translation market.
The close correspondence between English speech and writing has obscured this fact, especially since so much development work is done in English with a focus on large English-speaking audiences.
Even when it might not seem necessary for that target market, further and deeper fine-tuning of these elements of natural language processing, the ones that deviate from our perception of written language, will benefit English speakers, and inevitably people across language boundaries around the world.