We’re starting a fascinating new series about the history of large language models (LLMs).
“To fundamentally push the deep learning research frontier forward, one needs to thoroughly understand what has been attempted in the history and why current models exist in present forms”
Haohan Wang and Bhiksha Raj, “On the Origin of Deep Learning”
Large Language Models (LLMs) have a fascinating history that dates back to the early 1930s, when the first ideas of computational linguistics were born. You may argue that this tracing is excessive and that LLMs have nothing in common with old-fashioned, prehistoric computer systems. You may also argue that LLMs are based on real, hard-core deep learning. However, deep learning itself originated in 1943, when McCulloch and Pitts proposed the first mathematical model of an artificial neuron. Exactly 80 years ago! What took us so long to get to modern LLMs?
This series isn't about drowning you in technical details. While we provide an extensive list of references for those who want to delve deeper, our main goal is to captivate your attention and share the influential developments that have shaped LLMs. Consider it a springboard for further exploration, a chance to find something in history that might inspire your next ML discovery. It's an invitation to immerse yourself in the story of LLMs, which made such a splash last year.
In this episode, we'll take you on a time-travel adventure from 1933 to 1966. Ready? Let’s dive into The Era of Mechanical Translation and How It Crashed!
The concept of mechanical translation (MT) has always been a distant dream that tickled the imagination of many inventors, but it wasn't until the early 20th century that engineers and mathematicians began to develop the first concrete ideas about how to make it a reality.
In 1933, significant progress was made when two inventors, the French-Armenian George Artsrouni and the Russian Petr Smirnov-Troyanskii, independently patented their ideas for mechanical translation systems.
1933 – George Artsrouni and Petr Smirnov-Troyanskii independently secure patents detailing the first proposals of systems for mechanical translation
George Artsrouni designed a storage device on paper tape that could be used to find the equivalent of any word in another language. Troyanskii proposed a three-stage approach, in which humans would handle the initial and final stages of translation, with the machine serving as an intermediary. Troyanskii firmly believed that, in the future, the entire translation process could be fully mechanized.
1937 – Artsrouni demonstrates his first prototype
Troyanskii's ideas held greater significance than Artsrouni's, yet their impact remained largely confined within the borders of the USSR. The lack of international awareness of Troyanskii's work limited the recognition and influence his ideas could have had on a global scale.
It wasn't until 1947 that occasional discussions about mechanical translation began in the United States. By that time, progress in the field was limited to the development of a program capable of performing dictionary-based lookups, emulating one of the tasks performed by human translators.
1947 – First discussions of MT in the US
Challenged by limited resources and a lack of formal support, the United Kingdom had difficulty establishing itself in the field of mechanical translation. Andrew Booth and Richard Hook Richens could devote only the spare time left over from their normal university duties to this unexplored domain. At odd moments, they collaborated on a detailed description of a dictionary that could potentially be used in conjunction with computing machines.
1947 – The start of Booth and Richens's collaboration on the dictionary
That same year, 1947, Warren Weaver, who had been exposed to computer design problems during the war and understood the capabilities of modern electronic computers, envisioned the possibility of using them for translation. He wrote to the famous MIT professor Norbert Wiener, expressing the idea of designing a computer for translation to address the significant communication challenges between peoples (“for the constructive and peaceful future of the planet”). He even speculated that the problem of translation could be approached as a cryptographic problem.
However, Professor Wiener, in his response, expressed skepticism about the feasibility of mechanical translation due to the vague boundaries of words in different languages and the extensive emotional and international connotations attached to them. Despite Weaver’s attempt to persuade Wiener by suggesting that a computer could handle the vocabulary and combinations of words, the discussion did not lead to any concrete progress in the field of translation at that time.
But in 1949, Warren Weaver went ahead and published “Translation,” a memorandum that brought the concept of mechanical translation to global attention. This event inspired a wave of research at the University of Washington, the University of California at Los Angeles, and the Massachusetts Institute of Technology.
1949 – Warren Weaver presents his memorandum “Translation”
In terms of machinery, the first notable advancement got underway in 1952, when Leon Dostert, in collaboration with International Business Machines (IBM), initiated the Georgetown-IBM Experiment, giving birth to the Georgetown Machine, the world's first mechanical translation marvel. It symbolized a promising glimpse into a future where words could effortlessly cross language barriers. From this moment, events began to unfold much faster for MT.
1952 – Georgetown University and IBM begin building the Georgetown Machine, the first machine for mechanical translation
Around the same time, Yehoshua Bar-Hillel was appointed the first full-time machine translation (MT) researcher at MIT. He went on to organize the first international conference focused entirely on machine translation, for which he published a 10-page overview of the state of research on mechanical translation. This and the IBM-MIT Memory Conference the following year emphasized two essential needs: long-term basic research and a demonstration of MT in action. Together, they laid the groundwork and defined the main directions of MT research for the years that followed.
Meanwhile, the Soviet Union was busy building the hardware on which its own MT research would soon run. By the spring of 1951, nearly fifty engineers were working on the machine, and by the autumn of 1952, the BESM-1, the first computer of the S.A. Lebedev Institute of Precision Mechanics and Computer Engineering, was in operation. At the time, it was one of the fastest electronic computers in Europe. It also served as the prototype for the first Chinese computer, built with the help of Soviet engineers.
BESM-1 had 1024 words of read-write memory and 1024 words of read-only memory. It also had external storage: four magnetic tape units of 30,000 words each, plus fast magnetic drum storage with a capacity of 5120 words and an access rate of 800 words per second. An incredible capability for that time!
1952 – USSR completes the creation of BESM-1
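How much is 1024 words, really? BESM-1 operated on 39-bit words, so a quick back-of-the-envelope sketch in Python (assuming that word length applies to every store listed above) translates those capacities into modern units:

```python
# Back-of-the-envelope conversion of BESM-1's storage into modern units.
# Assumption: the machine's 39-bit word length applies to every store.
WORD_BITS = 39

def words_to_kib(words: int) -> float:
    """Convert a BESM-1 word count into kibibytes (1 KiB = 8192 bits)."""
    return words * WORD_BITS / 8 / 1024

print(f"Read-write memory: {words_to_kib(1024):.1f} KiB")       # ~4.9 KiB
print(f"Read-only memory:  {words_to_kib(1024):.1f} KiB")       # ~4.9 KiB
print(f"Magnetic drum:     {words_to_kib(5120):.1f} KiB")       # ~24.4 KiB
print(f"Four tape units:   {words_to_kib(4 * 30000):.1f} KiB")  # ~571.3 KiB
```

In other words, all of BESM-1's fast memory taken together comes to roughly 10 KiB, and that was cutting-edge hardware in 1952.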
Two years later, in January 1954, the Georgetown Machine took center stage with a public demonstration, showcasing its capabilities. A thoughtfully curated set of 49 Russian sentences was translated into English. Notably, this translation was accomplished using a highly limited vocabulary of only 250 words and a mere six grammar rules.
This demonstration generated widespread publicity and became one of the most influential events in the history of machine translation. The experiment was a collaborative effort between two IBM staff members, Cuthbert Hurd and Peter Sheridan, and Leon Dostert and Paul Garvin from the Institute of Languages and Linguistics at Georgetown University.
The captivating showcase of the Georgetown Machine's capabilities ignited a surge of optimism and excitement. More importantly, the demonstration's success attracted significant funding and support, providing a solid foundation for further advances in the field of machine translation.
1954 – The first public demonstration of an MT system using the Georgetown Machine
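To get a feel for how such a system actually operated, here is a minimal, purely illustrative Python sketch of a dictionary-plus-rules translator in the Georgetown spirit. The toy vocabulary and the single rearrangement rule are invented for this example; the real system relied on its 250-word dictionary and six rule types:

```python
# A toy 1950s-style dictionary-plus-rules translator. Illustrative only:
# this is NOT the actual Georgetown dictionary or rule set.

# Word-for-word dictionary: transliterated Russian -> English.
DICTIONARY = {
    "kachestvo": "quality",
    "uglya": "of coal",
    "opredelyaetsya": "is determined",
    "kaloriynostyu": "by calorific value",
    "gyeneral": "general",
    "mayor": "major",
}

# A rule of the "rearrangement" type: these words swap with the word
# before them, since Russian and English word order differ
# ("gyeneral mayor" should come out as "major general").
SWAP_WITH_PREVIOUS = {"mayor"}

def translate(sentence: str) -> str:
    """Translate by per-word lookup plus one reordering rule."""
    out = []
    for token in sentence.lower().split():
        word = DICTIONARY.get(token, f"<{token}?>")  # mark unknown words
        if token in SWAP_WITH_PREVIOUS and out:
            out.insert(len(out) - 1, word)  # swap with the previous word
        else:
            out.append(word)
    return " ".join(out)

print(translate("Kachestvo uglya opredelyaetsya kaloriynostyu"))
# -> quality of coal is determined by calorific value
print(translate("gyeneral mayor"))
# -> major general
```

Even this toy exposes the approach's limitation: every new sentence pattern demands fresh dictionary entries and fresh rules, precisely the wall that early MT research would soon run into.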
Meanwhile, across the Atlantic, the Nuffield Foundation made a generous grant to Birkbeck College, University of London, allowing it to pursue MT research on a full-time basis in Great Britain.
1955 – The Nuffield Foundation grant to Birkbeck College, University of London
In 1956, several impressive demonstrations of machine translation took place at Birkbeck College, utilizing the APEXC (All Purpose Electronic (X) Computer) computing machine.
The same year, a modest, roughly 8-week workshop was organized by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon: The Dartmouth Summer Research Project on Artificial Intelligence. It was at this workshop that Artificial Intelligence was first named as a field of science.
In these years, enthusiasm around MT ran high; the topic was so hot that research and development groups appeared everywhere. In 1957, Ray Solomonoff (one of the original ten invitees to the Dartmouth workshop) published the first paper on machine learning, "An Inductive Inference Machine."
1957 – Ray Solomonoff publishes the first paper on machine learning, "An Inductive Inference Machine"
In 1959, IBM showed some muscle and unveiled its masterpiece: the first member of the Automatic Language Translator system family, named "Mark I."
It consisted of a 65,000-word dictionary and a custom tube-based computer to do the lookups. Texts were hand-copied onto punched cards using custom Cyrillic terminals and then input into the machine for translation.
A system installed for the US Air Force produced translations for many years. It was a custom computer that used a high-speed optical disc storing 170,000 words and phrases to translate Russian documents into English.
1959 – IBM demonstrates the Automatic Language Translator "Mark I"
It was not the only MT system in use in the US. A group under Michael Zarechnak at Georgetown University developed the method that was adopted and named Georgetown Automatic Translation (GAT). It was successfully demonstrated in 1961 and 1962, and as a result, Russian-English systems were installed at Euratom in 1963 and at the Oak Ridge National Laboratory of the US Atomic Energy Commission in 1964.
1962 – The Georgetown Automatic Translation (GAT) method, developed by Michael Zarechnak's group at Georgetown University, is successfully demonstrated
In the mid-1960s, the US government started to question whether MT was financially reasonable and whether it could match human translators. Among the many research groups established around the world, an understanding emerged that MT was much more difficult than anticipated. The following years were a time of disillusionment.
Then, in 1964, the United States government formed the Automatic Language Processing Advisory Committee (ALPAC) to evaluate the progress and potential of machine translation.
1964 – The Automatic Language Processing Advisory Committee (ALPAC) is formed to assess the state of MT research
The publication of the ALPAC report in November 1966 crushed machine translation. It led to significant reductions in funding and a shift toward more theoretical research in computational linguistics. The report's impact was so substantial that it even gave rise to discussions among researchers about the possibility of similar assessments in the future. Yet while ALPAC's notoriety endures, the report's actual content is often forgotten or misunderstood. Titled "Languages and machines: computers in translation and linguistics," it addressed not only machine translation but also the broader field of computational linguistics, although in practice, NLP research was mostly focused on MT at that time.
1966 – The publication of the ALPAC report
Some condemned the ALPAC report as narrow, biased, and shortsighted. However, its influence was profound. It brought a virtual end to MT research in the United States for many years, and MT was perceived as a complete failure.
What happened in the next decade that allowed researchers to return to the "languages and machines" idea? That pivotal period would witness a shift in focus toward practical applications of natural language processing, and the advent of statistical models would breathe new life into the dream of breaking down language barriers. But more on that, with all the details, in the next episode!
To be continued…
If you liked this issue, subscribe to receive the second episode of LLM history straight to your inbox. Oh, and please share this article with your friends and colleagues. Because... to fundamentally push the research frontier forward, one needs to thoroughly understand what has been attempted in history and why current models exist in present forms.