Masumi Nakamura


Reverse engineering visual novels 101, part 2

Hey, it’s been a while since my last post, so I thought a quick reminder of what’s going on here would be helpful. We’re dabbling in reverse engineering, and visual novels are our target. In the first part of this article, we learned to analyze container formats using the marvelous Kaitai Struct tool, which makes it pretty easy for just about everyone who knows a little about computers to try their hand at proper “clean room” reverse engineering, without deep knowledge of CPUs, registers, disassembly, etc.

Our main target is still Koisuru Shimai no Rokujuso, a lesser-known but still nicely done visual novel from PeasSoft. Last time we discovered that it uses an engine that’s probably called “Yuka”. We successfully extracted game files from .ykc containers and got our hands on visuals, sounds and music, but the final challenge still awaits: the script.

Meditation on novel’s internals

Before diving into the vast depths of binary dumps, let’s take a moment to reflect on how most visual novel engines work. A visual novel consists of text (dialogue, narration), visuals and sounds. To make it all come together, some sort of control program is needed. Technically it is possible to just stuff it all as machine code into the middle of the .exe file, but in 99% of cases (ok, I’m lying: in 100% of the cases I’ve seen with my very own eyes) engine authors don’t do that. Instead, these control instructions are kept in a separate file in some sort of (improvised) domain-specific scripting language. That’s what is universally called “the script”. Let’s take a peek at how it may look:

$ tarot = 0
$ memory = 0
scene bg01_1 with dissolve
play music "bgm/8.mp3" fadein (2.0)
play ambience "amb/forest.mp3" fadein (3.0)
"Not my favourite time of the day."
"The morning is when you’re not awake enough to do anything…"

This is a small fragment of script’s source code of one of the VNs using Ren’Py — that’s one of the most well-known free-as-in-beer & free-as-in-speech engines. Leaving aside the question of whether it’s a good or bad idea to use Ren’Py per se, let’s just investigate what’s typical for visual novel’s script (and that’s exactly what we’ll be looking for):

  • the text: it might be the narrative (i.e. not spoken by a character, but part of the story told by a narrator) or some dialogue lines (attributed to a certain character)
  • commands to deal with visuals: backgrounds / sprites (`scene bg01_1`) / character graphics / event graphics, sometimes augmenting them with some visual effect (`with dissolve`)
  • commands to play back the music or sound effects (`play music`, `play ambience`), sometimes also coming up with some extra arguments, most frequently with durations of fade-ins and fade-outs (i.e. slow gradual increase or decrease in sound’s volume)
  • dealing with variables: setting (`$ tarot = 0`), getting, checking conditions and branching
  • some extra stuff, like playing back voice files pronounced by certain characters, flow control and various service features, like comments, labels, macros, etc

Of course, in the real world we usually won’t have access to the script’s source code; that would be too easy. It’s been probably half a century since people learned to create compilers (as opposed to interpreters), so nowadays the source code is most likely compiled into executable byte code, which is then executed by the VN engine using some sort of simple (or not so simple) virtual machine. Sometimes you’ll get lucky: for some popular engines you’ll be able to get a ready-made toolset (compilers, decompilers, debuggers, script validators, etc.), but most of the time life’s not that convenient.

Let’s get back to the visual novel that we started to dissect in the previous article, Koisuru Shimai no Rokujuso. After unpacking its archives, we found images, sounds, music files, and the most mysterious artifact so far: a handful of files with the .yks extension. Presumably, that’s where the novel’s script is (remember, the engine is presumably called Yuka, so YKS = YuKa Script). To be precise, it’s not just one script; there are quite a few of them:



That makes 103 files total in YKS/all/. Let me remind you that we downloaded the trial version, but it looks like the developers were too lazy to strip the full game content properly, so we’ve got trial scripts in trial/ and full game scripts in all/.

Judging from my experience, visual novel authors tend to go one of two ways: either packing every bit of script into one huge file, or making tons of files, one for every scene / event in the game. Looks like it’s the latter we’re facing here. By the way, note that there’s also a distinct “ScriptStart.yks”, but it probably won’t be as interesting to us as the rest of the files. You see, engine developers frequently want their engine to be as all-purpose as possible and choose to implement elements like UI, saving/loading, CG galleries, menus, options screens, etc. using the engine itself. It’s definitely possible to disassemble all that stuff too, but (a) it’s certainly not the best place to start, given that it lacks any text or other visual hints, and (b) it’s not that interesting for our purposes. So, I propose to give ourselves a head start and begin with more promising scripts.

Where’s my martini, shaken, not stirred?

As in our previous article, we start with basic intelligence first. So, get your Aston Martin ready and let’s scout the surroundings.

First of all, this is a Windows game, so it’s quite a viable idea to just find some Windows box, run it and see what happens. That’s what we get right after hitting the “Start new game” button in the main menu:

The very first text that one encounters in the visual novel.

The start of the story greets us. We’ve got a background here (a half-minute search reveals that it’s “bg01_01.png” from the BG/ directory), and we’ve got some text. Searching for text is usually a very good place to start, so let’s retype it from the screen:


If you have any problems typing Japanese (and I bet at least some of you have absolutely no idea where to even start), here’s a quick primer on how to do it, with some effort, in several easy steps.

Typing Japanese on your PC without knowing anything about it

As you can see, Japanese text consists of individual characters (usually fitting into a square), so we take them one by one and:

  • check if it’s a punctuation mark first using the following line:

If you’ve got lucky, just copy-n-paste it. Note that Japanese “commas”, “periods” and “brackets” are all special.

  • if that’s not a punctuation, look for the character in this line:
  • then this line:
  • still no luck? Say you’ve got 恋: then we’re doomed, it’s kanji. Zoom your font up to 300–500% to make the details clearer and go to the “search by radical” section of a kanji dictionary. Once there, look closely at the table of elements that form kanji characters (these are called “radicals”) and look for the ones that resemble parts of your kanji. Taking 恋 as an example, a quick meditation reveals that the bottom part of this character is 心. Selecting this radical shortens the list of possible kanji dramatically: it’s no longer multiple thousands, but a few dozen. Browsing through them, we’re bound to find the fifth character in the “10” section, and that is the 恋 we’re looking for.

Sure, it might be time-consuming, but it’s better than nothing. Another semi-cheating method is to go to Google Translate, turn on Japanese input there, switch into “drawing” mode and try to draw what you see. In 80% of the cases you’ll be lucky and get your kanji right away. For the other 20%, looking the kanji up in a dictionary is the ultimately foolproof method.

Another thing I’d like to point out is that I’m not sure if that would be “Ver2” (ordinary ASCII) or “Ver2” (full-width). Note that it’s not different fonts I’m using here, but so-called “full-width characters”: completely distinct characters, to be found in Unicode at U+FF01..U+FF5E. Obviously, if we’re comparing binary representations, we need to be exact.
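To see why the distinction matters for a byte-level search, here is a minimal Ruby sketch. The helper names `to_fullwidth` and `sjis_hex` are made up for this illustration; the only fact relied on is that the full-width forms sit at a fixed distance of 0xFEE0 from their ASCII twins in Unicode.

```ruby
# Offset between an ASCII character and its full-width twin:
# U+0021..U+007E map onto U+FF01..U+FF5E.
FULLWIDTH_OFFSET = 0xFEE0

# Hypothetical helper: convert printable ASCII to full-width forms.
def to_fullwidth(text)
  text.each_char.map { |c|
    (0x21..0x7E).cover?(c.ord) ? (c.ord + FULLWIDTH_OFFSET).chr(Encoding::UTF_8) : c
  }.join
end

# Hypothetical helper: hex dump of a string's Shift_JIS bytes.
def sjis_hex(text)
  text.encode("Shift_JIS").unpack("C*").map { |b| format("%02x", b) }.join(" ")
end

narrow = "Ver2"
wide = to_fullwidth(narrow)
puts "#{narrow}: #{sjis_hex(narrow)}"
puts "#{wide}: #{sjis_hex(wide)}"
```

The ASCII spelling takes one byte per character in ShiftJIS, the full-width spelling takes two, so a search for one will never find the other.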

The text we’ve copied will help us in two ways. First of all, we can translate it and find out what’s going on. Even if you don’t read Japanese, you can always stick it into Google Translate or some other automatic translator and get a rough idea. So, actually, it’s not the beginning of the story. The authors thank us for downloading the trial version, so it’s not the “first chapter”, but some kind of preface, a few words from the authors. Second, we can take this line, encode it as ShiftJIS (as we figured out in the previous article, chances are really high that this engine uses ShiftJIS everywhere internally) and search the files for it. Let’s take a piece of it and do the encoding:

$ echo -n 'ダウンロード頂きありがとうございます' | iconv -t sjis | hd
00000000 83 5f 83 45 83 93 83 8d 81 5b 83 68 92 b8 82 ab
00000010 82 a0 82 e8 82 aa 82 c6 82 a4 82 b2 82 b4 82 a2
00000020 82 dc 82 b7

So, here’s the byte sequence we need to search for in all our files. Alas, we fail here: it’s nowhere to be found. Life’s not gonna be that easy.
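For those who prefer to stay in one language, the same encode-and-search step can be sketched in Ruby. The function names and the `YKS/**/*.yks` location are assumptions made for this example, not something the game or the engine dictates.

```ruby
# Encode a UTF-8 string as Shift_JIS and return it as a raw byte string.
def sjis_needle(text)
  text.encode("Shift_JIS").b
end

# Return the paths of all files matching `glob` that contain `needle`.
def files_containing(needle, glob)
  Dir.glob(glob).select { |path| File.binread(path).include?(needle) }
end

needle = sjis_needle("ダウンロード頂きありがとうございます")
puts needle.unpack1("H*").scan(/../).join(" ")

# "YKS/**/*.yks" is a hypothetical location of the unpacked scripts.
hits = files_containing(needle, "YKS/**/*.yks")
puts hits.empty? ? "no matches" : hits
```

Reading files with `File.binread` and converting the needle with `String#b` keeps both sides as raw bytes, so the comparison is not affected by any string encoding rules.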

ShiftJIS: how does it work

Time to make yet another excursion into Japanese PC culture and get ourselves familiar with the ShiftJIS encoding. They say that there are as many Japanese characters as stars in the sky. I dunno if that’s the case, but it’s hard to argue that there are many more characters in Japanese than in English. Thus, having 1 byte to encode them all (with 256 possible values) is barely feasible, and ShiftJIS uses either 1 or 2 bytes per character. As the ShiftJIS code table shows, byte values of 00..7F are (almost) equal to ASCII, which keeps ShiftJIS compatible with ASCII, while a first byte of 81..9F or E0..EF means that it’s a 2-byte combo. Note that the second byte is not arbitrary either: it falls within 40..7E or 80..FC, so it can coincide with a printable ASCII character, but never with a control character or a zero byte.
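The 1-or-2-byte rule can be sketched as a tiny splitter. This is only an illustration under stated assumptions: the lead-byte test below goes up to 0xFC, which also covers vendor extensions, and `split_sjis` is a name invented here.

```ruby
# Lead bytes that start a 2-byte Shift_JIS character. The upper range is
# taken up to 0xFC here, which also covers vendor extensions (an assumption
# that is broader than the plain JIS X 0208 repertoire).
def sjis_lead_byte?(byte)
  (0x81..0x9F).cover?(byte) || (0xE0..0xFC).cover?(byte)
end

# Split raw bytes into arrays of 1 or 2 bytes, one array per character.
def split_sjis(bytes)
  chars = []
  i = 0
  while i < bytes.length
    len = sjis_lead_byte?(bytes[i]) && i + 1 < bytes.length ? 2 : 1
    chars << bytes[i, len]
    i += len
  end
  chars
end

sample = [0x41, 0x83, 0x5f, 0x82, 0xa0, 0x21] # "A", ダ, あ, "!"
split_sjis(sample).each do |char|
  puts char.map { |b| format("%02x", b) }.join(" ")
end
```

Note how the second byte of ダ (0x5f) would be an underscore if read as ASCII, which is why one has to walk the bytes from the start instead of looking at them in isolation.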

Shortest crash course in Japanese language ever possible

Looks like we won’t escape yet another dive into Japanese language basics. So, to put it bluntly, Japanese uses 3 groups of symbols:

  • hiragana — look something like ありがとうございます — i.e. simple round cursive shapes; ~50 glyphs, but there are a few variations like “big i = い, small i = ぃ”; 1 syllable = 1 glyph.
  • katakana — look something like ダウンロード — i.e. square-like, straight lines, pretty useful for typesetting; it corresponds to the same sounds as hiragana, but is mostly used to write down the loanwords (ダウンロード = da-u-n-lo:-do = download).
  • kanji — look something like 体験版 — i.e. complex drawings with lots of elements, fit into square shape; most of the time it’s kanji that give the most pain for Japanese language learners, and it’s many thousands of them.

Also, there are a few other symbols, like punctuation. For our purposes (although a proper Japanese scholar would want to kill me for such an oversimplification), let’s say that they are mostly equivalent to those of European languages. There are:

  • a “full stop”: 。
  • a “comma”: 、
  • ellipses: …
  • quotes: 「」
  • full-width question mark: ?
  • full-width exclamation mark: !
  • and some other, less frequently used symbols

But there’s a catch, as you might have seen: there are no spaces in Japanese. The trick is simple: Japanese text has “significant” words (written with a mix of kanji and hiragana) and particles, which are always written in hiragana. This way, one can often detect where words start by watching where the script changes. Let’s take the name of our game as an example: 恋する姉妹の六重奏

  • 恋 — kanji
  • する — hiragana
  • 姉妹 — kanji
  • の — hiragana
  • 六重奏 — kanji
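The breakdown above can be reproduced mechanically by cutting the text wherever the script changes. The following Ruby sketch uses the same code point ranges as the statistics script in this article; `script_of` and `script_runs` are names invented for the example.

```ruby
# Classify a single character by Unicode code point range.
def script_of(ch)
  case ch.ord
  when 0x3041..0x309F then :hiragana
  when 0x30A0..0x30FF then :katakana
  when 0x4E00..0x9FCC then :kanji
  else :other
  end
end

# Cut the text into runs of consecutive characters of the same script.
def script_runs(text)
  text.each_char
      .chunk_while { |a, b| script_of(a) == script_of(b) }
      .map { |run| [script_of(run.first), run.join] }
end

script_runs("恋する姉妹の六重奏").each do |script, run|
  puts "#{run}: #{script}"
end
```

This is a rough heuristic, not a real word segmenter: it only shows where the script changes, which is all we need to recognize what Japanese text looks like statistically.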

Ok, so what are we gonna do with all that stuff? Knowing it helps a lot in detecting obfuscated Japanese. For starters, let’s do a very simple thing: frequency stats. Taking the full ready-made script of any Japanese novel (there are a couple of them available for download at tlwiki, for instance), we quickly look up the Unicode code point ranges for all 3 scripts we’re interested in and run the following script on that (pun intended):

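# Count characters read from standard input by script class.
stats = {}
$stdin.each_char { |c|
  # Anything outside the three ranges ends up under the nil key.
  t = case c.ord
      when 0x3041..0x309F then :hiragana
      when 0x30A0..0x30FF then :katakana
      when 0x4E00..0x9FCC then :kanji
      end
  stats[t] ||= 0
  stats[t] += 1
}
p stats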

and, voila, we’ve got something like that:

{nil=>72384, :kanji=>5731, :hiragana=>15377, :katakana=>2241}

Ignoring everything that falls outside these three ranges (the nil bucket), that means a typical Japanese text has roughly 25% kanji, 65% hiragana and 10% katakana:

Knowing distribution of different classes of characters in Japanese aids statistical analysis methods
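The percentages follow directly from the counts printed by the script. Here is a short sketch of the arithmetic, with `script_shares` being a helper name made up for the example and the nil bucket left out of the total:

```ruby
# Counts exactly as printed by the statistics script.
stats = { nil => 72384, kanji: 5731, hiragana: 15377, katakana: 2241 }

# Share of each script among the classified characters, in percent.
def script_shares(stats)
  counted = stats.reject { |script, _| script.nil? }
  total = counted.values.sum.to_f
  counted.transform_values { |n| (100.0 * n / total).round(1) }
end

script_shares(stats).each { |script, pct| puts "#{script}: #{pct}%" }
```

With these counts the shares come out at about 24.5% kanji, 65.9% hiragana and 9.6% katakana.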

Enough of the theory, give me some hands-on!

Ok, time to get our tools ready and dive into it. As a very quick reminder for those who have forgotten our first article, we’re using a new tool named Kaitai Struct to reverse engineer the binary files of unknown structure.

Kaitai Struct allows one to make some templates in .ksy markup language, which can be applied to the file (or, better yet, the files) to quickly visualize their inner structure in a nice tree view. As soon as you finish writing a .ksy, you’ve got a huge bonus as well: you can just compile the .ksy into a module in any supported target language.

By the way, since the last article, Kaitai Struct has received lots of love and now, besides Java, JavaScript, Python & Ruby, it supports C++, C#, Perl and PHP as well. If we take a look at some top programming language list, that means the top 10 are covered, and, out of the top 20, if we’re not considering domain-specific stuff, only Delphi, Visual Basic, Swift and Go are missing. That said, I haven’t seen anyone using Delphi in something like a decade, and I can hardly imagine anyone using ancient Visual Basic (*not* the modern .NET one) for reverse engineering.

We went over the basic syntax of Kaitai Struct templates in the first article, so if you still haven’t read it or just want to brush up, now is the best time to do it.

Ok, let’s quickly take a look at hex dumps of a couple of files and hack up the following quick template for a start:

meta:
  id: yks
  application: Yuka Engine
  endian: le
seq:
  - id: magic
    contents: ["YKS001", 1, 0]
  - id: magic2
    contents: [0x30, 0, 0, 0, 0, 0, 0, 0, 0x30, 0, 0, 0]
  - id: unknown1
    type: u4
  - id: unknown2
    type: u4
  - id: unknown3
    type: u4
  - id: unknown4
    type: u4
  - id: unknown5
    type: u4
  - id: unknown6
    type: u4
  - id: unknown7
    type: u4

One might see similarities with the YKC format right away. Given that the YKC format started with a header that had its own length at the very beginning, I’d say it’s a safe bet that the fixed 0x30 in the “magic2” field is actually the size of the first header, so I went straight ahead and parsed everything up to 0x30. That gives us 7 integers, and we’ve got to guess what is what.
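For comparison, the same 0x30-byte header can be read without Kaitai Struct using Ruby’s `String#unpack`. This is a rough sketch: `parse_yks_header` is a name invented here, and the input is a synthetic header built from the values the article lists for Yoyaku.yks rather than a real file.

```ruby
HEADER_SIZE = 0x30
MAGIC = "YKS001\x01\x00".b
MAGIC2 = [0x30, 0, 0, 0, 0, 0, 0, 0, 0x30, 0, 0, 0].pack("C*")

# Parse the fixed 0x30-byte header: 8-byte magic, 12-byte magic2 and
# seven little-endian 32-bit integers ("V" in Ruby's unpack notation).
def parse_yks_header(data)
  raise ArgumentError, "file too short" if data.bytesize < HEADER_SIZE
  magic, magic2, *unknowns = data.byteslice(0, HEADER_SIZE).unpack("a8 a12 V7")
  raise ArgumentError, "bad magic" unless magic == MAGIC && magic2 == MAGIC2
  unknowns
end

# Synthetic header carrying the values quoted for Yoyaku.yks.
fake = MAGIC + MAGIC2 + [1845, 7428, 795, 20148, 7593, 25, 0].pack("V7")
parse_yks_header(fake).each_with_index do |value, i|
  puts "unknown#{i + 1} = #{value}"
end
```

For a real file one would pass `File.binread(path)` instead of the synthetic string.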

So, for Yoyaku.yks (file size is 27741 bytes):

[.] @unknown1 = 1845
[.] @unknown2 = 7428
[.] @unknown3 = 795
[.] @unknown4 = 20148
[.] @unknown5 = 7593
[.] @unknown6 = 25
[.] @unknown7 = 0

For trial_00100.yks (file is 91267 bytes long):

[.] @unknown1 = 6433
[.] @unknown2 = 25780
[.] @unknown3 = 2376
[.] @unknown4 = 63796
[.] @unknown5 = 27471
[.] @unknown6 = 5
[.] @unknown7 = 0

And, for the sake of comparison, something from “all”, for example, “all_00010.yks” (12968 bytes long):

[.] @unknown1 = 933
[.] @unknown2 = 3780
[.] @unknown3 = 353
[.] @unknown4 = 9428
[.] @unknown5 = 3540
[.] @unknown6 = 1
[.] @unknown7 = 0

What do we see here? First of all, I can bet a million that we’re looking at either offsets or sizes of some sections in a given file: for the 91K file the large numbers are between 25K and 63K, and for the 12K file they stay between 3K and 9K. Closer inspection reveals that probably only “unknown2”, “unknown4” and “unknown5” are offsets or sizes: they are large enough, and “unknown2” and “unknown4” are divisible by 4 in all three files. “unknown7” always seems to be zero. “unknown6” is small, so it’s probably a counter for some entities. It might be the number of scenes / sprites / backgrounds, the size of a memory pool reserved for variables in our virtual machine, or something like that.

After 0x30, even with the naked eye (I mean, with just a hex editor), one can see a range of steadily increasing numbers (ok, almost steadily increasing). Probably it is not the byte code: in byte code you’d expect to see lots of repetitive patterns. Most likely these are offsets of some kind: for example, offsets to the starts of statements in the byte code, or offsets to variable-length strings in some other section. Given that we’ve identified the first possible section and that we have 7 unknown values in the header, it’s only natural to check whether any of them looks like:

  • a size of this section
  • an absolute offset of the end of this section = start of next section
  • number of 4-byte integers in this section

That said, we hit the bull’s-eye on the first or second shot. “unknown1” proves to match the number of entities in our first section, and “unknown2” seems to be a pointer to the start of the next section. In practice, that means that the following is always true:

unknown2 = 0x30 + unknown1 * 4
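The relation is easy to verify against the three headers quoted above. A short sketch, with `sect1_fits?` being a name made up for the check:

```ruby
HEADER_SIZE = 0x30

# [unknown1, unknown2] pairs quoted above, keyed by file name.
SAMPLES = {
  "Yoyaku.yks"      => [1845, 7428],
  "trial_00100.yks" => [6433, 25780],
  "all_00010.yks"   => [933, 3780],
}

# True when unknown2 points right past unknown1 four-byte entries.
def sect1_fits?(unknown1, unknown2)
  unknown2 == HEADER_SIZE + unknown1 * 4
end

SAMPLES.each do |name, (qty, ofs)|
  puts "#{name}: 0x30 + #{qty} * 4 = #{HEADER_SIZE + qty * 4}, " \
       "unknown2 = #{ofs}, match: #{sect1_fits?(qty, ofs)}"
end
```

All three files satisfy the equation, which is about as much confirmation as one can get from three samples.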

Let’s jot that down, and while we’re at it, we’ll move our header description into distinct type “header”, and we’ll use names like sect1..sectX for the following sections that we’ll discover:

seq:
  - id: header
    type: header
  - id: sect1
    size: header.sect2_ofs - 0x30
    type: sect1
types:
  header:
    seq:
      - id: magic
        contents: ["YKS001", 1, 0]
      - id: magic2
        contents: [0x30, 0, 0, 0, 0, 0, 0, 0, 0x30, 0, 0, 0]
      - id: sect1_qty
        type: u4
      - id: sect2_ofs
        type: u4
      - id: unknown3
        type: u4
      - id: unknown4
        type: u4
      - id: unknown5
        type: u4
      - id: unknown6
        type: u4
      - id: unknown7
        type: u4
  sect1:
    seq:
      - id: entries
        type: u4
        repeat: expr
        repeat-expr: _root.header.sect1_qty

With our new .ksy, “trial_00100” looks like this:

[-] @header
  [.] @magic = 59 4b 53 30 30 31 01 00
  [.] @magic2 = 30 00 00 00 00 00 00 00 30 00 00 00
  [.] @sect1_qty = 6433
  [.] @sect2_ofs = 25780
  [.] @unknown3 = 2376
  [.] @unknown4 = 63796
  [.] @unknown5 = 27471
  [.] @unknown6 = 5
  [.] @unknown7 = 0
[-] @sect1
  [-] @entries (6433 = 0x1921 entries)
    [.] 0 = 6
    [.] 1 = 7
    [.] 2 = 3
    [.] 3 = 3
    [.] 4 = 4
    ...
    [.] 6425 = 2371
    [.] 6426 = 2372
    [.] 6427 = 34
    [.] 6428 = 1
    [.] 6429 = 2373
    [.] 6430 = 2374
    [.] 6431 = 1
    [.] 6432 = 2375

It is obvious now that our assumption about “sect1” entries being increasing offsets or something is not entirely correct. There are not only increasing numbers, there is a fair deal of repetitions, thus these actually could be the bytecode. Increasing numbers seem to follow the pattern of starting with 0 or 1 and increasing steadily by 1 up to 2375. As one might notice, “unknown3” = 2376 — and it looks just like the total number of these values. That probably means that our proposed “bytecode” in sect1 references some other table with exactly 2376 different values (probably indexed from 0 up to 2375 inclusive). What could it be?

Increasing numbers in 16-byte records neatly lined up

I guess it’s evident that these are 16-byte long records (exactly 1 line each here), and inside they carry something that looks very much like steadily increasing indices or offsets. A stupid guess: maybe there are 2376 of them? Let’s check that out, renaming “unknown3” to “sect2_qty” and adding this trivial piece to collect “sect2” as 16-byte records:

  - id: sect2
    size: 16
    repeat: expr
    repeat-expr: header.sect2_qty

and it looks like we hit the jackpot. It’s exactly that:

End of “sect2”, in assumption that there are exactly “sect2_qty” 16-byte records in it
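The same check can be done with plain arithmetic on the header values quoted earlier. The sketch below computes where “sect2” would end if it holds exactly “sect2_qty” 16-byte records. The quoted numbers also happen to line up with “unknown4” and “unknown5”, but reading those two fields as “end of sect2” and “bytes remaining” is only an observation made here, not something established at this point.

```ruby
RECORD_SIZE = 16

# Header values quoted earlier: sect2_ofs (unknown2), sect2_qty (unknown3),
# unknown4, unknown5 and the file size in bytes.
SAMPLES = {
  "Yoyaku.yks"      => { ofs: 7428,  qty: 795,  unknown4: 20148, unknown5: 7593,  file_size: 27741 },
  "trial_00100.yks" => { ofs: 25780, qty: 2376, unknown4: 63796, unknown5: 27471, file_size: 91267 },
  "all_00010.yks"   => { ofs: 3780,  qty: 353,  unknown4: 9428,  unknown5: 3540,  file_size: 12968 },
}

# Offset of the first byte after the last 16-byte record.
def sect2_end(ofs, qty)
  ofs + qty * RECORD_SIZE
end

SAMPLES.each do |name, h|
  finish = sect2_end(h[:ofs], h[:qty])
  puts "#{name}: sect2 ends at #{finish}, unknown4 = #{h[:unknown4]}, " \
       "#{h[:file_size] - finish} bytes remain, unknown5 = #{h[:unknown5]}"
end
```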

Even with the naked eye one can see that these strict 16-byte records end exactly at our assumed end of “sect2”, and that means that there are exactly “sect2_qty” of them. Something completely different lies beyond them. How does it look? Those are definitely not 4-byte integers: almost everything seems to be non-zero. After looking at them for several minutes, I don’t think I can make a good guess about what it is. There are no periodic patterns, nothing. The only thing worth noticing is lots of 0xaa bytes. There are tons of 0x28 too, and these always occur on every second byte: no two 0x28 come in sequence. Let’s check out the end of the file: does it look the same there, or have we got something else, i.e. more sections? Nah, the same patterns seem to occur there as well:

End of file. Abundance of 0xaa and 0x28. End of “sect3”?

Thus it’s a safe bet that we’re observing the third and final section of this file. There’s nothing beyond that. Let’s jump back to the very beginning and recall what we were going to find here. Header? Check. Bytecode? Check. Text? Nope. The strings are probably here, but encoded in some way to make them less evident. Are they compressed? Probably not: if they were, there wouldn’t be so many repetitions of 0xaa and 0x28. Actually, repetitive 0x28 in something like

28 08 28 1b 28 0e 28 6c 26 6f 28 07 3a 14 28 6b

looks really, really suspicious. Rewind to our short excursion into Japanese ShiftJIS encoding and get a specimen of a normal Japanese text in ShiftJIS:

82 a0 82 e8 82 aa 82 c6 82 a4 82 b2 82 b4 82 a2

Is it just me, or is it a simple substitution cipher, where every single byte is replaced with some other single byte using some mapping table? Ok, what shall we do to encode 0x82 as 0x28? Of course, that could be any arbitrary table, but most of the time humans are lazy and thus opt for ready-made functions. It’s not like there are tons of them. Actually, I can only name 3 off the top of my head:

  • addition/subtraction — one can just add (or subtract, which is the same operation from CPU’s point of view) some constant to every byte, wrapping the result around in case of overflow
  • ROL/ROR — a circular shift by a certain amount of bits; note that circular shift right or left by 4 bits just swaps 2 hexadecimal digits in every byte
  • exclusive or (XOR) — one can do XOR operation, combining each byte with some other fixed byte; an approach which is definitely most overused, but still effective and thus still popular

There are quite a few heavy-duty tools, like XORSearch, that can help you brute-force such simple algorithms, but in this case it’s even simpler, so I got a match on the second try. Lots of 0xaa bytes allow us to guess that there are many zeroes in the stream, which, XORed with 0xaa, give us lots of 0xaa’s. Besides, 0x82 ^ 0xaa is mysteriously 0x28 (and the XOR operation is so versatile that you can derive the key by doing 0x82 ^ 0x28 = 0xaa).

To be frank, 0xaa is one of the most abused XOR patterns, which one has to try almost immediately (akin to passwords like “123456”, “qwerty”, or the top 10 from any list of most common passwords when brute-forcing passwords). 0xaa is 0b10101010, that is, XORing a value with 0xaa toggles every 2nd bit in it. Yeah, a lot of people still think it’s a cool idea.
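To illustrate why the abundance of 0xaa matters, here is a small Ruby sketch. For each of the three kinds of operation it picks the parameter that turns 0x28 into 0x82, then shows what the same operation does to 0xaa. The parameters are derived here by simple arithmetic; this is not a record of which guesses were actually tried.

```ruby
# Three single-byte transforms that each turn 0x28 into 0x82.
CANDIDATES = {
  "add 0x5a" => ->(b) { (b + 0x5a) & 0xff },
  "rol 4"    => ->(b) { ((b << 4) | (b >> 4)) & 0xff },
  "xor 0xaa" => ->(b) { b ^ 0xaa },
}

CANDIDATES.each do |name, f|
  puts format("%-8s 0x28 -> 0x%02x, 0xaa -> 0x%02x", name, f.call(0x28), f.call(0xaa))
end

# Applying the XOR candidate to the suspicious sample from the text.
sample = %w[28 08 28 1b 28 0e 28 6c 26 6f 28 07 3a 14 28 6b].map(&:hex)
puts sample.map { |b| format("%02x", CANDIDATES["xor 0xaa"].call(b)) }.join(" ")
```

All three map 0x28 to 0x82, but only XOR turns the frequent 0xaa bytes into zeroes, which is what one would expect between null-terminated strings. The decoded sample also shows 0x82 on every second byte, just like the ShiftJIS specimen.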

Fortunately, Kaitai Struct supports such cases with a process: clause. It’s enough to add the following:

  - id: sect3
    size-eos: true
    process: xor(0xaa)

and, here you go, we’re finally observing the rich inner world of string constants in our script:

000000: 69 66 00 c8 00 00 00 47 6c 6f 62 61 6c 46 6c 61
000010: 67 00 3d 00 ff ff 00 00 01 00 00 00 3d 00 7b 00
000020: 0d 00 00 00 57 69 6e 64 6f 77 4e 61 6d 65 53 65
000030: 74 00 97 f6 82 b7 82 e9 8e 6f 96 85 82 cc 98 5a
000040: 8f 64 91 74 28 83 66 83 6f 83 62 83 4f 29 81 7c
000050: 46 69 6c 65 20 3a 20 74 72 69 61 6c 68 5f 6d 61
000060: 79 75 2e 79 6b 73 00 7d 00 09 00 00 00 44 72 61
000070: 77 53 74 6f 70 00 47 72 61 70 68 69 63 48 69 64
000080: 65 00 0a 00 00 00 54 72 61 6e 73 69 74 69 6f 6e
000090: 00 02 00 00 00 64 00 00 00 0a 00 00 00 0b 00 00
0000a0: 00 47 72 61 70 68 69 63 4c 6f 61 64 00 00 00 00

or, in ASCII:

000000: if.....GlobalFla
000010: g.=.........=.{.
000020: ....WindowNameSe
000030: t........o.....Z
000040: .d.t(.f.o.b.O).|
000050: File : trialh_ma
000060: yu.yks.}.....Dra
000070: wStop.GraphicHid
000080: e.....Transition
000090: .....d..........
0000a0: .GraphicLoad....

To be honest, we got really lucky here: there are a lot of ASCII strings, which make our life easier. At first glance, one might say that there’s just a bunch of C-style zero-terminated strings mashed together, but careful inspection reveals that this is not exactly the case. Sure, strings are in there, but other than that, we’ve got more binary data in between too, like

ff ff 00 00 01 00 00 00


02 00 00 00 64 00 00 00 0a 00 00 00 0b 00 00 00

There’s one printable ASCII symbol in it (`d` = 0x64), but probably these are not strings. And, finally, we’ve got the original novel text in Japanese in ShiftJIS with all these 0x82s.

Hooray, we’re done?

Ok, let’s summarize what we’ve got:

  1. sect1, consisting of 4-byte integers (presumably, it is the bytecode); at least some of these 4-byte integers are references to 16-byte records in sect2
  2. sect2, consisting of 16-byte records with some increasing numbers inside (presumably — offsets referencing some other section, maybe sect3?)
  3. sect3, consisting mostly of null-terminated ShiftJIS-encoded strings and some more intricate data (presumably — it’s the string resources and other constants that bytecode references)
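Put together, the layout summarized above can be sketched as a single Ruby function. It follows the .ksy fragments from this article: field offsets come from the header description, and sect3 is assumed to start right after the last sect2 record and run to the end of the file. The name `split_yks` and the synthetic test file are made up for the example.

```ruby
HEADER_SIZE = 0x30
RECORD_SIZE = 16
XOR_KEY = 0xaa

# Split raw .yks bytes into the three sections found so far.
def split_yks(data)
  # sect1_qty, sect2_ofs and sect2_qty follow the 20 bytes of magic + magic2.
  sect1_qty, sect2_ofs, sect2_qty = data.byteslice(0x14, 12).unpack("V3")
  sect3_ofs = sect2_ofs + sect2_qty * RECORD_SIZE
  {
    sect1: data.byteslice(HEADER_SIZE, sect1_qty * 4).unpack("V*"),
    sect2: data.byteslice(sect2_ofs, sect2_qty * RECORD_SIZE)
               .unpack("a#{RECORD_SIZE}" * sect2_qty),
    sect3: data.byteslice(sect3_ofs..-1).to_s.bytes.map { |b| b ^ XOR_KEY }.pack("C*"),
  }
end

# Tiny synthetic file: 2 sect1 entries, 1 empty sect2 record, "if\0" in sect3.
header = "YKS001\x01\x00".b + [0x30, 0, 0x30].pack("V3") +
         [2, 0x38, 1, 0, 0, 0, 0].pack("V7")
fake = header + [6, 0].pack("V2") + ("\x00".b * RECORD_SIZE) +
       "if\x00".bytes.map { |b| b ^ XOR_KEY }.pack("C*")

parts = split_yks(fake)
p parts[:sect1]
p parts[:sect2].length
p parts[:sect3]
```

How the sections reference each other is still unknown at this point, so the sketch only cuts the file into pieces and undoes the XOR.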

And, while we’re still enjoying this little victory, I guess we’ll call it a day. This article became much longer than originally expected (and definitely much longer than your average article). Actually, if we were translating this novel, what we’ve achieved today would be enough to pass the torch to the translators. For that, all we need to do is rip all the Japanese text out of sect3 and hand it over for translation. Just iterating over the sect3 contents and ignoring all the unprintable stuff (of course, you need to convert it to your system’s encoding, probably UTF-8, if you’re not on a ShiftJIS system) yields something like:

恋する姉妹の六重奏(デバッグ)-File : trialh_mayu.yks
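One possible way to do that extraction is sketched below. It rests on a few assumptions that are not confirmed by anything above: strings are separated by zero bytes, text is decoded as Windows-31J (the Windows flavor of ShiftJIS), and anything shorter than two characters or containing control characters is treated as noise. The sample bytes are copied from the decoded dump shown earlier.

```ruby
MIN_CHARS = 2 # arbitrary cut-off to drop one-character noise

# Extract printable strings from already XOR-decoded sect3 bytes.
def extract_strings(decoded)
  decoded.b.split("\x00".b).map { |chunk|
    text = chunk.dup.force_encoding("Windows-31J")
    next unless text.valid_encoding?
    begin
      utf8 = text.encode("UTF-8")
    rescue EncodingError
      next
    end
    utf8 if utf8.length >= MIN_CHARS && utf8 !~ /[[:cntrl:]]/
  }.compact
end

# Bytes 0x24..0x75 of the decoded dump shown earlier.
sample = %w[
  57 69 6e 64 6f 77 4e 61 6d 65 53 65 74 00 97 f6 82 b7 82 e9 8e 6f 96 85
  82 cc 98 5a 8f 64 91 74 28 83 66 83 6f 83 62 83 4f 29 81 7c 46 69 6c 65
  20 3a 20 74 72 69 61 6c 68 5f 6d 61 79 75 2e 79 6b 73 00 7d 00 09 00 00
  00 44 72 61 77 53 74 6f 70 00
].map(&:hex).pack("C*")

puts extract_strings(sample)
```

A filter this crude will also let through binary values that happen to look like text, so the output still needs a human eye before it goes to translators.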

What’s next?

My thanks to everyone who had the patience to follow along this far. Stay tuned for the next article, where we’ll dig into the bytecode itself and try to understand how all the discovered sections are interlinked. Be sure to subscribe if you don’t want to miss it.


If you feel like translating this tutorial into your language to help more people learn reverse engineering — it is CC-BY-SA-licensed, so you’re totally welcome!

