I Built an LLM Translation Service — and It Changed How I See Language

Written by viratgohil | Published 2026/01/21
Tech Story Tags: llm-based-translation | translation | machine-translation | ai-translation | translation-tools | language-translation | neural-machine-translation | translation-models

TLDR: English hides an enormous amount of meaning behind context and tone. LLMs need help reconstructing all the invisible scaffolding English leaves out.

I began this project the way I begin most engineering projects: with a checklist. Build the pipeline, tune the model, test edge cases, measure output quality. I expected it to be difficult, but difficult in familiar, technical ways.

It took only a few weeks to realize I was wrong.
The model was fine. The code was fine. The infrastructure was fine.

The real challenge was language.

Not “language” in the abstract, but the way each one makes different assumptions about meaning — assumptions that English simply doesn’t carry.

I won’t go into architecture or implementation, but building this system forced me to understand linguistics in a way I never had before. It became less an engineering project and more a journey through how humans structure thoughts.


English Leaves Too Much Unspoken

English was my starting point, and quickly became the first obstacle.

English hides an enormous amount of meaning — gender, hierarchy, intent, politeness, formality, evidentiality, specificity, even time — behind context and tone. Other languages insist you say these things explicitly.

LLMs don’t intuit what English fails to mention.
They need to be told how to resolve what English leaves ambiguous.

In practice, this meant translation could not be a single instruction. A prompt like:

“Translate this into Japanese”

worked surprisingly well — until it didn’t. As soon as we moved beyond single sentences, reviewers started flagging issues that weren’t wrong, just off.

It felt less like translating English and more like recovering the meaning English chose not to express.


The Moment a Simple Sentence Changed Everything

One turning point arrived through a line so ordinary I barely noticed it:

“I went to play soccer.”

In English, this unfolds naturally:
I → went → to play → soccer.

Then I saw the natural Japanese version:

サッカーをしに行った。

(Soccer → to play → went.)

Same meaning. Completely different choreography.

If you preserve the English order, a Japanese speaker feels the same jolt an English speaker feels reading:

“Soccer to play went I.”

Understandable? Yes.
Natural? Not at all.

That was the moment I realized translation isn’t mapping words.
It’s retelling the idea in the order the language expects.

In practice, this meant we could no longer ask the model to “translate” and hope for the best. We had to be explicit about structure and register. A typical instruction looked something like this:

Translate the following English text into Japanese.

Constraints:
- Do not preserve English word order if it sounds unnatural.
- Reorder clauses to match native Japanese sentence flow.
- Maintain a formal, polite register throughout.
- Prefer natural phrasing over literal translation.

From an engineering standpoint, this forced a concrete change: we stopped treating “structure preservation” as neutral, and started explicitly instructing the model to reorder sentences to match target-language flow, even if that meant aggressively breaking English structure.
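To make that concrete, here is a minimal Python sketch of how a prompt like this can be assembled. It is illustrative only: call_llm is a hypothetical stand-in for whatever model client you use, and the constraint text simply mirrors the instruction block above.

# A minimal sketch of assembling the structural constraints into a prompt.
# `call_llm` is a hypothetical helper around your model API, not part of
# the actual system described in this article.

STRUCTURE_CONSTRAINTS = """\
Translate the following English text into Japanese.

Constraints:
- Do not preserve English word order if it sounds unnatural.
- Reorder clauses to match native Japanese sentence flow.
- Maintain a formal, polite register throughout.
- Prefer natural phrasing over literal translation.
"""

def build_translation_prompt(source_text: str) -> str:
    """Combine the fixed constraint block with the text to translate."""
    return f"{STRUCTURE_CONSTRAINTS}\nText:\n{source_text}"

# Usage (call_llm is an assumed stand-in for your model client):
# translated = call_llm(build_translation_prompt("I went to play soccer."))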


Topic vs. Subject: A Different Way of Organizing Thought

Once I saw this, other patterns surfaced.

English organizes sentences around the subject:

“My car broke down.”

But topic-oriented languages like Japanese and Korean prioritize the topic:

“As for my car… it broke down.”

A small shift, but one that changes:

  • what must be repeated
  • what can be omitted
  • where emphasis naturally falls
  • what the listener expects next

It’s not just grammar — it’s a worldview.


Politeness: From a Word to a System

Because I know Hindi, I was already familiar with politeness in language.
A well-placed -ji softens a sentence, conveys respect, or signals warmth.
But Hindi expresses politeness lexically — you add polite words.

Japanese and Korean operate on a completely different axis.

These languages express politeness grammatically.
Not with particles, but with entire verb systems and levels of speech.

Choosing a politeness level alters:

  • verb endings
  • pronouns (or their omission)
  • vocabulary
  • structure
  • whether you elevate the listener
  • whether you humble yourself

And these levels aren’t optional.
They’re not seasoning — they’re the recipe.

Early in the project, linguists reviewing the outputs pointed out something striking:
the model was often grammatically correct, but socially wrong.
Too casual. Too intimate. Too humble. Too formal.

Hindi has polite words.
Japanese and Korean have polite systems.

From a system perspective, politeness had to be treated as a hard constraint, not a stylistic preference. We explicitly locked register in the prompt:

Set politeness level to FORMAL_POLITE.
Do not switch register mid-paragraph.
If the source text is ambiguous, default to the safer, more formal register.

Without this, the model would often produce grammatically correct output that reviewers still flagged as socially inappropriate.
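As a sketch of what "hard constraint" means in code, register can be modeled as an explicit parameter rather than free-form prose. The enum names and wording below are illustrative, not the production system's exact values:

from enum import Enum

class Register(Enum):
    CASUAL = "CASUAL"
    POLITE = "POLITE"
    FORMAL_POLITE = "FORMAL_POLITE"

def register_constraints(register: Register) -> str:
    # Render the politeness rules prepended to every translation prompt.
    return (
        f"Set politeness level to {register.value}.\n"
        "Do not switch register mid-paragraph.\n"
        "If the source text is ambiguous, default to the safer, "
        "more formal register.\n"
    )

print(register_constraints(Register.FORMAL_POLITE))

Locking the register in one place meant no individual prompt could quietly drift back to a casual default.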


Gender: Familiar From Hindi, But Surprisingly Diverse

Gendered grammar wasn’t a surprise to me either. Hindi makes verbs, adjectives, and objects dance to the rhythm of gender. I already knew that:

“I went.”
becomes
मैं गया। (male)
मैं गई। (female)

And that everyday objects have gender.

What surprised me was seeing how other languages interpret gender differently.

Hebrew and Arabic extend gender into plural verbs and imperatives.
German assigns gender in ways that feel whimsical unless you grew up with them.
Turkish and Chinese dispense with grammatical gender entirely.

It wasn’t the existence of gender that taught me something new.
It was the variety — the way languages decide what deserves gender and what doesn’t.

In implementation terms, this turned gender into a stateful decision. Once a gender was inferred, it had to remain consistent across the entire passage. We made that explicit:

If the source text does not specify gender:
- Choose a single inferred gender for the passage.
- Apply gender agreement consistently across all sentences.
- Do not switch gender unless explicitly stated.

Every sentence could be “correct” on its own, but inconsistency across sentences immediately broke the voice.
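Here is a rough sketch of that statefulness, assuming a placeholder infer_gender helper (in the real system this decision came from context or an explicit user setting, not keyword matching):

from typing import Optional

def infer_gender(source_text: str) -> Optional[str]:
    # Placeholder: return a gender only if the source makes it explicit;
    # None means the model must pick one and keep it for the whole passage.
    words = source_text.lower().split()
    if "she" in words:
        return "female"
    if "he" in words:
        return "male"
    return None

def gender_constraints(gender: Optional[str]) -> str:
    if gender is None:
        return (
            "If the source text does not specify gender:\n"
            "- Choose a single inferred gender for the passage.\n"
            "- Apply gender agreement consistently across all sentences.\n"
            "- Do not switch gender unless explicitly stated.\n"
        )
    return f"Use {gender} gender agreement consistently across the passage.\n"

print(gender_constraints(infer_gender("I went to play soccer.")))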


How Writing Shapes Sound

Japanese writing introduced me to an entirely new dimension.

Three scripts, each with a job:

  • Kanji for meaning
  • Hiragana for grammar
  • Katakana for emphasis and loanwords

But what struck me most was how loanwords aren’t just borrowed —
they’re phonetically naturalized.

“Computer” becomes:

コンピューター (konpyūtā)

Streamlined into Japanese rhythm, made pronounceable within its syllable structure.

Soccer becomes サッカー. (sakkā)
Taxi becomes タクシー. (takushī)
Ice cream becomes アイスクリーム. (aisu kurīmu)

It’s not transliteration.
It’s adaptation — a cultural reshaping of sound.

Reviewers would immediately point out when a script was wrong or a word wasn’t properly naturalized. That’s when I understood: script choices aren’t cosmetic. They’re part of the identity of the language.


Consistency: The Hidden Challenge

And then there was consistency — the problem I didn’t even see until linguists pointed it out.

One reviewer said:

“Every sentence is correct.
But together, they don’t sound like the same person.”

They were right.

The politeness level shifted.
Gendered verb forms wobbled.
A classifier appeared in one place and vanished in another.
A borrowed word switched scripts between sentences.

Individually, nothing was “wrong.”
Collectively, the voice fractured.

That was when I understood something essential:

Fluency isn’t correctness — it’s coherence.

The entire passage needs to think in one language, not many.

This is where we introduced a post-translation evaluation step. Instead of using the model only to generate translations, we also used it to review them — acting as a linguistic quality checker before human review.

You are a bilingual translation reviewer.

Evaluate the translation for:
- Register and politeness consistency
- Gender agreement consistency (if applicable)
- Structural naturalness (avoid English-like phrasing)
- Script correctness and loanword usage (if applicable)
- Paragraph-level coherence

Flag anything that would feel unnatural to a native reader,
even if the sentence is grammatically correct.

This evaluator didn’t replace linguists — it surfaced exactly the kinds of issues they would later flag, but much earlier in the pipeline.

In practice, each translation followed a simple loop:

translate → evaluate → tighten constraints → regenerate
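Here is a minimal sketch of that loop in Python. Everything in it is illustrative: call_llm stands in for your model client, MAX_ROUNDS is an arbitrary cap, and the reviewer instruction is a condensed version of the evaluator prompt above.

MAX_ROUNDS = 3

def call_llm(prompt: str) -> str:
    # Stand-in for a real model API call; wire this to your provider.
    raise NotImplementedError

def translate_with_review(source_text: str, constraints: str) -> str:
    # translate -> evaluate -> tighten constraints -> regenerate
    translation = ""
    for _ in range(MAX_ROUNDS):
        translation = call_llm(f"{constraints}\nText:\n{source_text}")
        issues = call_llm(
            "You are a bilingual translation reviewer. List any register, "
            "gender, structure, or script issues; reply NONE if clean.\n\n"
            f"Source:\n{source_text}\n\nTranslation:\n{translation}"
        )
        if issues.strip() == "NONE":
            return translation
        # Fold the flagged issues back into the constraints and retry.
        constraints += f"\nAdditionally, fix these issues:\n{issues}"
    return translation  # last attempt falls through to human review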


Translation Isn’t Enough

By the time this project neared completion, one truth had crystallized:

Translation is not enough.

This is why professional translators talk about transcreation — not translating content, but recreating the message so it feels born in the target language. Translation converts text. Transcreation carries meaning.

Working on this didn't just make me better at translation. It made me hear languages differently — not as code, but as culture.

Languages don't just describe the world. They recreate it.


Written by viratgohil | A software architect, accidental GenAI expert, and home-lab tinkerer who treats network setups like critical systems.
Published by HackerNoon on 2026/01/21