Since the dawn of civilization, translation has embodied humanity’s desire to bring ideas across language and cultural barriers. Historians regard the Akkadian translation of the Sumerian Epic of Gilgamesh as the earliest translated text. (No, I’m not referring to Don Lee in Marvel’s Eternals, and no, he is not Wong from Dr. Strange.) Closer to modern history, linguists raced to decipher the Egyptian hieroglyphs on the Rosetta stone. More recently, translation in the form of ciphers sent nations rushing to develop automatic translator devices, like the Enigma in World War II and IBM’s Thinking Machine during the Cold War.

“Does this mean the end of human translators? Yes for scientific and technical material but as regards poetry and novel, no I don't think we'll ever replace these translators” - Paramount News (1954)

Computer science has advanced far from the 1950s. With the recent renaissance of deep learning, state-of-the-art machine translation systems have achieved ‘human parity’, reached translation quality comparable to human professionals, and correctly translated the essence of a French poem.

Is this the final frontier of machine translation? Technologists, scientists, and big tech companies will use any opportunity to tout record-breaking achievements. And this is the point where articles would normally do some fear-mongering: “The end of human translators” or “Translators replaced by machines in the fourth industrial revolution”.

But this isn’t that kind of article. Instead, I would like to introduce readers to the simple notion of Translation Memory (TM).

There is no doubt that machine translation makes mistakes, and human intervention is absolutely necessary for high fidelity translations (for now). Translation applications range from non-critical translations of Zelda games to precarious situations where a doctor needs to translate medical reports to give an accurate prognosis.

The correct translation is critical, especially in medical and pharmaceutical translations that require specialist world knowledge. For example, a machine without some knowledge base would never be able to properly translate “przedawkowanie paracetamolu” (Polish) to “Acetaminophen/Tylenol overdose” (American English); “paracetamolu” would have been translated to “Paracetamol”, which is common in British English. For these situations, a human is needed to edit the machine translation, or a good Translation Memory should be able to take care of the terminology replacement.

“A machine … would never be able to translate “przedawkowanie paracetamolu” (Polish) to American English.”

Translation Memory is just a database

The simplest form of Translation Memory (TM) is a database of translated texts curated by human translators. Typically, before a translator translates a document, they use translation editing software that first searches for matches in the TM database and pre-populates the translations for segments that are perfect or near-perfect matches.

Translation Memories are a very useful tool for humans and machines. They usually feature:

Smart Remembering: A translator has previously translated “水のように” to “be like water” and “火” to “fire”; now the TM is able to find translations for “火のように”.
Reduce Repetition: Imagine the boredom of translating and re-translating websites and contract boilerplates, or translating “Link… Ganon's power grows...” every blood moon.
Introduce Knowledge: Other than the medical scenarios illustrated above, there are other translation jobs with equally high stakes that require specific terminological translations, e.g. getting sued for mistranslating car manuals (technical knowledge) or translation errors that could have started a war (cultural knowledge).

Wait a minute, isn’t TM just the training data for Machine Translation?

Yes, it can be. But it can be a lot more than just the training data.
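To make the “perfect or near-perfect matches” lookup from the database section concrete, here is a minimal sketch of a fuzzy TM search. It is only an illustration under stated assumptions: the function name `tm_lookup`, the 0.8 threshold, and the use of Python's standard-library `difflib.SequenceMatcher` as the similarity measure are my own choices, not how any particular translation editing software works.

```python
from difflib import SequenceMatcher


def tm_lookup(segment, tm, threshold=0.8):
    """Return (translation, score) for the closest TM entry.

    The translation is None when no entry reaches the threshold.
    `tm_lookup` and the 0.8 default are hypothetical, for illustration only.
    """
    best_source, best_score = None, 0.0
    for source in tm:
        # ratio() is 1.0 for identical strings and approaches 0.0 as they diverge.
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return tm[best_source], best_score
    return None, best_score


# A toy TM reusing the examples from this article.
tm = {
    "przedawkowanie paracetamolu": "Tylenol overdose",
    "水のように": "be like water",
}

exact, exact_score = tm_lookup("przedawkowanie paracetamolu", tm)  # perfect match
near, near_score = tm_lookup("Przedawkowanie paracetamolu.", tm)   # near-perfect match
miss, miss_score = tm_lookup("ból głowy", tm)                      # no usable match
```

A near-perfect match would normally be pre-populated for the translator to review rather than accepted blindly, which is why the sketch returns the score alongside the translation.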
There are several scenarios in which a TM can integrate with machine translation (MT) beyond serving as training data. Consider the following:

The TM is only available after the MT model is trained
The TM has constant updates and additions/deletions
The TM is used to correct MT mistakes

In the first scenario, companies and individuals that do not build their own machine translation engines have no other choice but to plug in the TM as an ad-hoc if-else, e.g.

from aomame import GoogleTranslator


gt = GoogleTranslator(host="translation.googleapis.com", key="*******")

def translate(text, source_lang, target_lang, tm):
    # An exact TM hit takes priority over the machine translation.
    if text in tm:
        return tm[text]
    # Otherwise fall back to the MT engine.
    return gt.translate(text, source_lang, target_lang)

tm = {"przedawkowanie paracetamolu": "Tylenol overdose"}

source_text = "przedawkowanie paracetamolu"

translate(source_text, "pl", "en", tm)

For the second scenario, imagine that the medical director decrees that all documents are to use generic drug names, i.e. “Acetaminophen” instead of “Tylenol”. Even if Google somehow managed to get the right translation for the example above at some point, you still can’t go to the Google office and enforce only generic drug names in the translation.

And in the last scenario, after exhausting all your means of training/tuning or complaining to Google, there is no way for the model to learn the right translation of “przedawkowanie paracetamolu” to “Acetaminophen overdose”, so you would have to resort to using the TM on top of the MT for specific translations.

If the model ain’t learning, you ain’t tuning hard enough

It’s possibly true that the model will eventually learn the right translation after concocting the right mixture of training data with the TM and turning the knobs on the hyperparameter ham-radio. But at what cost would it be to fix that particular translation? One should consider the ROI of:

Infrastructure effort and computing cost of setting up a model tuning mechanism
Human effort of hyperparameter tuning or writing the code to tune hyperparameters
The time it takes to deliver the right translations to the user

Are there sentences that machine translation just can’t get right, no matter how much data/tuning you throw at it?

Regardless of the task, there will always be a data point that a machine can’t get right, especially when humans sometimes struggle too. The “Chihuahua or Muffin” problem exists in machine translation too.

Shiba or marshmallow? - Karen Zack, @teenybiscuit (2016)

Since the days of the GALE project, we have understood that some texts are harder to translate than others, most notably web text; the misspelling-prone comments, abbreviations, slang, and fat-fingered “covfefe” make translations of web texts challenging.

Why does Google Translate do so well on web texts translations?
The magic in machine learning is often data and, indirectly, human-created data. Have you wondered why there is sometimes a human figure symbol next to the translation on Google? That is an example of ad-hoc translation memory usage in machine translation. “Covfefe” was stored as a human-validated translation; most probably it is a frequently translated word, and Google wants to enforce preserving the word as the right translation.

Even if public translation APIs don’t explicitly tell you that humans curate their translation data behind the scenes, data cleaning is critical to a state-of-the-art machine learning model. So much so that there is a specific translation data cleaning shared task.

Summary: Translation Memory (TM)

Expectation management has been a bane to the existing hype around MT systems stealing jobs from human translators since the 1950s. While NLP/MT technology is accelerating at an unprecedented pace, languages and translations will always contain nuances that even humans find hard to grasp.

As we advance the state of machine translation, translation memory has its place in today’s translation tech stack, benefiting both MT users and human translators. Even if tech giants don’t explicitly tell you that humans creating data are the key ingredient that makes MT possible, they definitely do hire lots of translators indirectly by buying from language data brokers.

Google

Microsoft

Twitter


TMNT: Translation Memory and Neural Translation
