About a week ago, a friend asked me a question that would end up consuming me for many late nights:
“I’ve been studying Chinese for a few months now and I’ve learned all of these characters to make basic words. I know a lot of them can be rearranged and put in different orders to make other words that I would be able to read, write, and speak more easily, but I don’t know what those words would be…How can I figure out what words I should already be able to form and if it’s worthwhile to learn them?”
The question intrigued me. Having studied Chinese for 5 years, I’ve often wondered the same thing. But this time, the question resonated differently with me. I started to think not just about what characters my friend could learn, but also how she should learn if she wanted to grasp Chinese as quickly as possible.
With this question in mind, I embarked on a journey to determine the quickest order in which one could learn Chinese characters.
(Skip this section if you’ve already studied Chinese language or how the language works.)
Written Chinese is classified as a syllable-based logography, a writing system in which each syllable is represented by a graphical character. Some of these characters represent full words on their own (just like the monosyllabic “I” or “me” in English). In other cases, stringing multiple characters together creates a full word (think of the polysyllabic “iodine” or “meander” in English).
Basics of Chinese Characters. Chinese words can be made from one character or from several combined characters.
In alphabet-based languages, there is a direct link between reading and pronunciation, so students only need to memorize a word’s pronunciation or spelling to be able to fully use that word. In Chinese, however, there is no direct link between character and pronunciation (see Assumption 3 below), so students need to memorize a word’s pronunciation (romanized with the Latin alphabet in a system called pinyin) and character-based “spelling” in order to be able to fully read, write, speak, and understand that word.
For example, I can show you the Spanish word for house, casa, and you immediately have an idea of how to pronounce it. You can study this word with a two-sided flashcard, with the Spanish on one side and the English on the other. If I show you the Chinese character for house, 家, you have no clear signal of how the character should be pronounced (jiā). In order to study a Chinese word, you would need a three-sided flashcard, with the characters on one side, the pinyin pronunciation on another, and the English meaning on a third.
This complexity is one of the reasons that studying Chinese is so hard (see the US State Department’s classification). Since learning each character takes significant effort in Chinese, it is critically important to determine which characters will give students the most value for their effort.
So how can we order the characters that students memorize so that they learn the most Chinese as quickly as possible?
Our first instinct might be to teach students words based on how frequently those words are used and to require students to memorize the characters in those words. This is a relatively standard approach to language learning, and an intuitive one for non-character-based languages. It’s also the likely basis of most Chinese courses, since it makes sense to teach students the words that they’ll encounter most frequently.
But in a character-based language like Chinese, the standard approach might not be optimal. It is entirely possible that the most common words in Chinese contain uncommon characters, or that students can gain more command of the language with fewer characters so long as they make maximum use of those they already know. These “low hanging fruit” characters are the premise of my friend’s question and this exploration.
To test whether prioritizing these “low hanging fruit” (LHF) in the Chinese learning process would increase learning efficiency, we can consider three different learning methods:
**Method 1: Standard** As stated above, students should learn Chinese words in the order of their usage frequency. By extension, students should study characters in the order in which they appear in these frequency-arranged words. This method prioritizes everyday usage and ease of communication.
**Method 2: LHF Words** Students should study the characters that will give them the most mastery over the Chinese language as measured by the words they can form with those characters. For each character a student is about to learn, we take into consideration all of the words that the student can make with that character and the characters they already know, as opposed to just focusing on the most common word. This method prioritizes efficiency in learning characters.
**Method 3: Combined Approach of Standard with LHF Words** Students should learn Chinese in the order of the most frequent words, but when they learn characters that they can use to form other words (LHF), they should learn those LHF words before attempting to learn another character. This combined approach uses principles from both Method 1 and Method 2. It prioritizes everyday usage and ease of communication, while also making efficient use of the characters that a student has already learned.
Application of Ordering Methods. Here, we see a visual representation of a set of Chinese words and their frequencies (f). Colored blocks correspond to Chinese characters, with blocks of the same color representing the same character. Method 1 optimizes based on word frequency. Method 2 optimizes based on the best path to find LHF words. Method 3 finds LHF while going through the most frequent words. For each method, I show the characters learned at every step (cl) and the overall mastery that a student would have gained (m). Note how cl and m vary for each method.
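To make the three orderings concrete, here is a minimal Python sketch on a toy lexicon. It is an illustration under stated assumptions, not the code behind the simulation: the words and frequencies in `TOY_LEXICON` are made up, the function names are hypothetical, and Method 2 is interpreted as a greedy search that picks the next character by the total frequency of the words it unlocks (with a fallback to the most frequent remaining word when no single character unlocks anything).

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: word -> share of usage in percent (made-up numbers).
TOY_LEXICON: Dict[str, float] = {
    "的": 50.0, "好": 20.0, "和": 10.0, "看": 8.0,
    "好看": 5.0, "好的": 4.0, "看好": 3.0,
}


def formable(word: str, known: Set[str]) -> bool:
    """True if every character of the word is already known."""
    return all(ch in known for ch in word)


def method1_standard(lexicon: Dict[str, float]) -> List[str]:
    """Method 1: learn words in descending order of frequency."""
    return sorted(lexicon, key=lexicon.get, reverse=True)


def method3_combined(lexicon: Dict[str, float]) -> List[str]:
    """Method 3: take the most frequent unlearned word, then every word
    that has become formable (LHF), before moving to the next word."""
    order: List[str] = []
    known: Set[str] = set()
    remaining = method1_standard(lexicon)
    while remaining:
        word = remaining.pop(0)
        order.append(word)
        known.update(word)
        lhf = [w for w in remaining if formable(w, known)]
        order.extend(lhf)
        remaining = [w for w in remaining if w not in lhf]
    return order


def method2_lhf(lexicon: Dict[str, float]) -> List[str]:
    """Method 2 (assumed greedy reading): pick the character that unlocks
    the most total frequency, then learn every word it unlocks."""
    order: List[str] = []
    known: Set[str] = set()
    remaining = method1_standard(lexicon)
    while remaining:
        candidates = sorted({ch for w in remaining for ch in w} - known)

        def gain(ch: str) -> float:
            return sum(lexicon[w] for w in remaining if formable(w, known | {ch}))

        best = max(candidates, key=gain)
        if gain(best) == 0:
            # Assumed fallback: no single character unlocks a word, so start
            # on the most frequent remaining word.
            best = next(ch for ch in remaining[0] if ch not in known)
        known.add(best)
        unlocked = [w for w in remaining if formable(w, known)]
        order.extend(unlocked)
        remaining = [w for w in remaining if w not in unlocked]
    return order


for name, method in [("Method 1", method1_standard),
                     ("Method 2", method2_lhf),
                     ("Method 3", method3_combined)]:
    print(name, " ".join(method(TOY_LEXICON)))
```

On this toy data the three methods produce three different orders. Method 2 learns 看 before the more frequent 和 because 看 also unlocks 好看 and 看好, while Method 3 sticks to frequency order and only picks up 好的 early because both of its characters are already known.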
I decided to put these methods to the test to determine how each would alter a student’s ability to learn Chinese. Given the time it takes to absorb a useful amount of Mandarin, rather than teaching a group of students, I instead opted to teach my computer Mandarin via a simulation. Here’s how:
My process required me to make the following assumptions:
After running all three methods, I compared how each would affect a student’s mastery of Chinese.
To start, we can consider what percentage mastery a student will have after each word learned. Even before running our exploratory simulations, we could have intuited that the standard approach would be the most effective on a per-word basis, since that is precisely what it optimizes for (learn the next most frequent word at any given time). What might have been less obvious is how dramatically Methods 2 and 3 would influence a student’s mastery on a per-word basis. Let’s take a look at the data:
Methods 2 and 3 have students learn all words that they can make with their current character set before learning any new characters. Some of those words will be common, high-frequency words, whereas others may be more obscure LHF words. For example, Method 2 instructs the student to learn the word 好看 (hǎokàn, attractive) for 0.38% mastery before learning the word 和 (hé, and) for 4.48% mastery, simply because the student already knows the characters 好 (hǎo, good) and 看 (kàn, look). We should note that Method 3 covers common words somewhat better than Method 2 (by about 2.3% at its widest, around 1172 words), given that it prioritizes the most frequent words by default. Still, both Methods 2 and 3 fall short of Method 1 here, with a gap as wide as 11.6% at 59 words. Lastly, it is important to note that in both the per-word and per-character results, all three methods converge at the end of the simulation, since they must all end up teaching the student the same 5000 words and 2067 characters regardless of order.
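The per-word comparison can be sketched in a few lines of Python. This is only an illustration: it assumes that “mastery” means the share of total word frequency covered by the words learned so far, and the lexicon, the two learning orders, and the function names are all hypothetical toy values rather than the simulation’s data.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> share of usage in percent (made-up numbers).
TOY_LEXICON: Dict[str, float] = {
    "的": 50.0, "好": 20.0, "和": 10.0, "看": 8.0,
    "好看": 5.0, "好的": 4.0, "看好": 3.0,
}

# Hypothetical learning orders: frequency order vs. an LHF-first order.
ORDER_STANDARD = ["的", "好", "和", "看", "好看", "好的", "看好"]
ORDER_LHF = ["的", "好", "好的", "看", "好看", "看好", "和"]


def per_word_mastery(order: List[str], lexicon: Dict[str, float]) -> List[float]:
    """Cumulative mastery (percent of total frequency) after each word learned."""
    total = sum(lexicon.values())
    covered = 0.0
    curve = []
    for word in order:
        covered += lexicon[word]
        curve.append(100.0 * covered / total)
    return curve


def largest_gap(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Return (gap, words_learned) at the point where curve a leads b the most."""
    gaps = [(x - y, n) for n, (x, y) in enumerate(zip(a, b), start=1)]
    return max(gaps)


standard_curve = per_word_mastery(ORDER_STANDARD, TOY_LEXICON)
lhf_curve = per_word_mastery(ORDER_LHF, TOY_LEXICON)
print(standard_curve)
print(lhf_curve)
print(largest_gap(standard_curve, lhf_curve))
```

With these toy numbers the frequency order leads by at most 7 points, at 6 words learned, and both curves end at 100%, which mirrors the convergence described above.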
At first, we might view this as a weakness of Methods 2 and 3, particularly if our goal is to be able to communicate with the most frequent words as quickly as possible (as is often the case in Mandarin conversational classes which don’t include reading and writing). But this is actually the entire point of our optimization, which is based on the idea that learning Chinese characters takes significantly more effort than learning only pinyin. (In another world, it might be more convenient if all of Mandarin were written in pinyin and we could revert to a two-sided flashcard way of learning, but, alas, it would be somewhat sacrilegious to roll back the 3,000+ years of history backing the character system, its beauty, and its ability to keep written Chinese consistent over millennia.) Instead, it is much more informative to look at the per-character results of each method:
Looking at the per-character results tells a much more informative and interesting story about how these learning methods perform.
As we would expect, Methods 2 and 3 perform much better than Method 1 on this basis, since they both account for LHF and make the most of the characters that a learner has mastered. That being said, two important conclusions arise from the results.
First, it is readily apparent that Methods 2 and 3 offer a significant advantage over Method 1 on a per-character basis. Once a student has learned 491 characters using Method 2, they are able to access a whole 5.2% more Chinese words. That is to say, this student can read 5.2% more Chinese than a peer just by optimizing which characters to learn.
Second, we note that Methods 2 and 3 are neck and neck throughout the per-character results, even more so than in the per-word results. Their largest gap in mastery on a per-character basis is 1.7% at 25 characters, as compared to a gap of 2.3% at 1172 words in the per-word data. Functionally, this is because both methods search for LHF and increase their per-character mastery, but Method 3 does so while also prioritizing per-word mastery. Essentially, Method 3 is a happy medium between Methods 1 and 2, though it behaves more similarly overall to Method 2.
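The per-character view can be sketched the same way. As before, this is a hedged illustration and not the simulation itself: it assumes mastery is the share of total word frequency covered by the words learned so far, reads it off at each count of distinct characters known, and uses a made-up lexicon, made-up learning orders, and hypothetical function names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: word -> share of usage in percent (made-up numbers).
TOY_LEXICON: Dict[str, float] = {
    "的": 50.0, "好": 20.0, "和": 10.0, "看": 8.0,
    "好看": 5.0, "好的": 4.0, "看好": 3.0,
}

# Hypothetical learning orders: frequency order vs. an LHF-first order.
ORDER_STANDARD = ["的", "好", "和", "看", "好看", "好的", "看好"]
ORDER_LHF = ["的", "好", "好的", "看", "好看", "看好", "和"]


def per_char_mastery(order: List[str], lexicon: Dict[str, float]) -> Dict[int, float]:
    """Map 'distinct characters known' -> best mastery reached with that many."""
    total = sum(lexicon.values())
    known: Set[str] = set()
    covered = 0.0
    curve: Dict[int, float] = {}
    for word in order:
        known.update(word)
        covered += lexicon[word]
        # Mastery only grows, so the last value stored for a count is its maximum.
        curve[len(known)] = 100.0 * covered / total
    return curve


def largest_lead(a: Dict[int, float], b: Dict[int, float]) -> Tuple[float, int]:
    """Return (gap, characters_known) where curve a leads curve b the most."""
    return max((a[c] - b[c], c) for c in a if c in b)


standard_chars = per_char_mastery(ORDER_STANDARD, TOY_LEXICON)
lhf_chars = per_char_mastery(ORDER_LHF, TOY_LEXICON)
print(standard_chars)
print(lhf_chars)
print(largest_lead(lhf_chars, standard_chars))
```

On the toy data the LHF-first order leads by 10 points at 3 characters known, the reverse of the per-word picture, and the two orders meet again at 100% once all 4 characters are learned.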
So what does this mean to an actual Chinese language learner?
The results of this study are quite conclusive. If you learn Chinese characters and simultaneously study all of the low hanging fruit words associated with them, you can more quickly gain mastery of Chinese reading and writing (knowing characters matters less for speaking). However, doing so comes at the cost of learning the most common words first, since you would end up learning some LHF words that are not as useful in everyday life. Essentially:
If you want to learn to read and write Chinese, use Method 2 or 3 to study.
If you only want to learn conversational Chinese, use Method 1.
Use the Quizlet Flashcards I’ve generated to start learning Chinese characters more efficiently!
Ready to get started? Here are ordered Quizlet flashcards (password: “Medium”) I’ve created for Method 2 (Deck 1, Deck 2, Deck 3) and Method 3 (Deck 1, Deck 2, and Deck 3). (Note that translations for these decks were created by Google Translate and are not my own.) For Method 1, you can find a frequency-based dictionary here or the original Cai and Brysbaert lexicon here.
I’m excited about the potential for this work to change how learners access written and spoken Chinese, but there is always more work to be done. Here are some ideas on how the above can be improved upon in the future:
Thoughts on what might be the best next step for this exploration? Find the flashcards or thought process particularly useful? Comment below and be sure to recommend/share this post with others! Find other posts from me here and follow me for future updates!