We’re starting a fascinating new series about the history of large language models (LLMs).
“To fundamentally push the deep learning research frontier forward, one needs to thoroughly understand what has been attempted in the history and why current models exist in present forms”
Haohan Wang and Bhiksha Raj, “On the Origin of Deep Learning”
Large Language Models (LLMs) have a fascinating history that dates back to the early 1930s, when the first ideas of computational linguistics were born. You may argue that this tracing is excessive and that LLMs have nothing in common with old-fashioned, prehistoric computer systems. You may also argue that LLMs are based on real, hard-core deep learning. However, deep learning itself originated in 1943, when McCulloch and Pitts proposed the first mathematical model of an artificial neuron. Exactly 80 years ago! What took us so long to get to modern LLMs?
This series isn't about drowning you in technical details. While we provide an extensive list of references for those who want to delve deeper, our main goal is to captivate your attention and share the influential developments that have shaped LLMs. Consider it a springboard for further exploration, a chance to find something in history that might inspire your next ML discovery. It's an invitation to immerse yourself in the story of LLMs, which made such a splash last year.
In this episode, we'll take you on a time-travel adventure from 1933 to 1966. Ready? Let’s dive into The Era of Mechanical Translation and How It Crashed!
The concept of mechanical translation (MT) has always been a distant dream that tickled the imagination of many inventors, but it wasn't until the early 20th century that engineers and mathematicians began to develop the first concrete ideas about how to make it a reality.
In 1933, significant progress was made when two inventors, the French-Armenian George Artsrouni and the Russian Petr Smirnov-Troyanskii, independently patented their ideas for mechanical translation systems.
1933 – George Artsrouni and Petr Smirnov-Troyanskii independently secure patents detailing the first proposals of systems for mechanical translation
George Artsrouni designed a storage device on paper tape that could be used to find the equivalent of any word in another language. Troyanskii proposed a three-stage approach, in which humans would handle the initial and final stages of translation, with the machine serving as an intermediary. Troyanskii firmly believed that, in the future, the entire translation process could be fully mechanized.
1937 – Artsrouni demonstrates his first prototype
Troyanskii's ideas held greater significance than Artsrouni's, yet their impact remained largely confined within the borders of the USSR. The lack of international awareness of Troyanskii's work limited the recognition and influence his ideas could have had on a global scale.
It wasn't until 1947 that occasional discussions about mechanical translation began in the United States. By that time, progress in the field was limited to the development of a program capable of performing dictionary-based lookups, emulating one of the tasks performed by human translators.
1947 – First discussions of MT in the US
Challenged by limited resources and a lack of formal support, the United Kingdom had difficulty establishing itself in the field of mechanical translation. Andrew Booth and Richard Hook Richens could devote only the spare time left over from their normal university duties to this unexplored domain. At odd moments, they collaborated on a detailed description of a dictionary that could potentially be used in conjunction with computing machines.
1947 – The start of Booth and Richens's collaboration on the dictionary
That same year, 1947, Warren Weaver, who had been exposed to computer design problems during the war and understood the capabilities of modern electronic computers, envisioned the possibility of using them for translation. He wrote to the famous MIT professor Norbert Wiener, expressing the idea of designing a computer for translation to address the significant communication challenges between peoples (“for the constructive and peaceful future of the planet”). He even speculated that the problem of translation could be approached as a cryptographic problem.
However, Professor Wiener, in his response, expressed skepticism about the feasibility of mechanical translation due to the vague boundaries of words in different languages and the extensive emotional and international connotations attached to them. Despite Weaver’s attempt to persuade Wiener by suggesting that a computer could handle the vocabulary and combinations of words, the discussion did not lead to any concrete progress in the field of translation at that time.
But in 1949, Warren Weaver went ahead and published “Translation,” a memorandum that brought the concept of mechanical translation to global attention. This event inspired a wave of research at the University of Washington, the University of California at Los Angeles, and the Massachusetts Institute of Technology.
1949 – Warren Weaver presents his memorandum “Translation”
In terms of machinery, the first notable advancement got underway in 1952, when Leon Dostert, in collaboration with International Business Machines (IBM), initiated the Georgetown-IBM Experiment, giving birth to the Georgetown Machine, the world's first mechanical translation marvel. It symbolized a promising glimpse into a future where words could effortlessly cross language barriers. From this moment, events began to unfold much faster for MT.
1952 – Georgetown University and IBM begin building the Georgetown Machine, the first machine for mechanical translation
Around the same time, Yehoshua Bar-Hillel was appointed the first full-time machine translation (MT) researcher at MIT. He went on to organize the first international conference focused entirely on machine translation, for which he published a 10-page overview of the state of research on mechanical translation. This and the IBM-MIT Memory Conference the following year emphasized two essential needs: long-term basic research and a demonstration of MT in action. Together, they laid the groundwork and defined the main directions of MT research for the years that followed.
Meanwhile, the Soviet Union was busy building the hardware on which its own MT research would soon run. By the spring of 1951, nearly fifty engineers were working on the machine, and by the autumn of 1952, the BESM-1, the first computer of the S.A. Lebedev Institute of Precision Mechanics and Computer Engineering, was in operation. At the time, it was one of the fastest electronic computers in Europe. It also served as the prototype for the first Chinese computer, built with the help of Soviet engineers.
BESM-1 had 1024 words of read-write memory and 1024 words of read-only memory. It also had external storage: four magnetic tape units of 30,000 words each, plus fast magnetic drum storage with a capacity of 5120 words and an access rate of 800 words per second. An incredible capability for that time!
1952 – USSR completes the creation of BESM-1
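How much is 1024 words, really? BESM-1 operated on 39-bit words, so a quick back-of-the-envelope sketch in Python (assuming that word length applies to every store listed above) translates those capacities into modern units:

```python
# Back-of-the-envelope conversion of BESM-1's storage into modern units.
# Assumption: the machine's 39-bit word length applies to every store.
WORD_BITS = 39

def words_to_kib(words: int) -> float:
    """Convert a BESM-1 word count into kibibytes (1 KiB = 8192 bits)."""
    return words * WORD_BITS / 8 / 1024

print(f"Read-write memory: {words_to_kib(1024):.1f} KiB")       # ~4.9 KiB
print(f"Read-only memory:  {words_to_kib(1024):.1f} KiB")       # ~4.9 KiB
print(f"Magnetic drum:     {words_to_kib(5120):.1f} KiB")       # ~24.4 KiB
print(f"Four tape units:   {words_to_kib(4 * 30000):.1f} KiB")  # ~571.3 KiB
```

In other words, all of BESM-1's fast memory taken together comes to roughly 10 KiB, and that was cutting-edge hardware in 1952.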
Two years later, in January 1954, the Georgetown Machine took center stage with a public demonstration, showcasing its capabilities. A thoughtfully curated set of 49 Russian sentences was translated into English. Notably, this translation was accomplished using a highly limited vocabulary of only 250 words and a mere six grammar rules.
This demonstration generated widespread publicity and became one of the most influential events in the history of machine translation. The experiment was a collaborative effort between two IBM staff members, Cuthbert Hurd and Peter Sheridan, and Leon Dostert and Paul Garvin from the Institute of Languages and Linguistics at Georgetown University.
The captivating showcase of the Georgetown Machine's capabilities ignited a surge of optimism and excitement. More importantly, the demonstration's success attracted significant funding and support, providing a solid foundation for further advances in the field of machine translation.
1954 – The first public demonstration of an MT system using the Georgetown Machine
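To get a feel for how such a system actually operated, here is a minimal, purely illustrative Python sketch of a dictionary-plus-rules translator in the Georgetown spirit. The toy vocabulary and the single rearrangement rule are invented for this example; the real system relied on its 250-word dictionary and six rule types:

```python
# A toy 1950s-style dictionary-plus-rules translator. Illustrative only:
# this is NOT the actual Georgetown dictionary or rule set.

# Word-for-word dictionary: transliterated Russian -> English.
DICTIONARY = {
    "kachestvo": "quality",
    "uglya": "of coal",
    "opredelyaetsya": "is determined",
    "kaloriynostyu": "by calorific value",
    "gyeneral": "general",
    "mayor": "major",
}

# A rule of the "rearrangement" type: these words swap with the word
# before them, since Russian and English word order differ
# ("gyeneral mayor" should come out as "major general").
SWAP_WITH_PREVIOUS = {"mayor"}

def translate(sentence: str) -> str:
    """Translate by per-word lookup plus one reordering rule."""
    out = []
    for token in sentence.lower().split():
        word = DICTIONARY.get(token, f"<{token}?>")  # mark unknown words
        if token in SWAP_WITH_PREVIOUS and out:
            out.insert(len(out) - 1, word)  # swap with the previous word
        else:
            out.append(word)
    return " ".join(out)

print(translate("Kachestvo uglya opredelyaetsya kaloriynostyu"))
# -> quality of coal is determined by calorific value
print(translate("gyeneral mayor"))
# -> major general
```

Even this toy exposes the approach's limitation: every new sentence pattern demands fresh dictionary entries and fresh rules, precisely the wall that early MT research would soon run into.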
Meanwhile, across the Atlantic, the Nuffield Foundation made a generous grant to Birkbeck College, University of London, allowing it to pursue MT research on a full-time basis in Great Britain.
1955 – The Nuffield Foundation grant to Birkbeck College, University of London
In 1956, several impressive demonstrations of machine translation took place at Birkbeck College, utilizing the APEXC (All Purpose Electronic (X) Computer) computing machine.
The same year, a modest, roughly 8-week workshop was organized by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon: The Dartmouth Summer Research Project on Artificial Intelligence. It was at this workshop that Artificial Intelligence was first named as a field of science.
In these years, enthusiasm around MT ran high; the topic was so hot that research and development groups appeared everywhere. In 1957, Ray Solomonoff (one of the original ten invitees to the Dartmouth workshop) published the first paper on machine learning, "An Inductive Inference Machine."
1957 – Ray Solomonoff publishes the first paper on machine learning, "An Inductive Inference Machine"
In 1959, IBM showed some muscle and unveiled its masterpiece: the first member of the Automatic Language Translator system family, named "Mark I."
It consisted of a 65,000-word dictionary and a custom tube-based computer to do the lookups. Texts were hand-copied onto punched cards using custom Cyrillic terminals and then input into the machine for translation.
A system installed for the US Air Force produced translations for many years. It was a custom computer that used a high-speed optical disc storing 170,000 words and phrases to translate Russian documents into English.
1959 – IBM demonstrates the Automatic Language Translator "Mark I"
It was not the only MT system in use in the US. A group under Michael Zarechnak at Georgetown University developed the method that was adopted and named Georgetown Automatic Translation (GAT). It was successfully demonstrated in 1961 and 1962, and as a result, Russian-English systems were installed at Euratom in 1963 and at the Oak Ridge National Laboratory of the US Atomic Energy Commission in 1964.
1962 – The Georgetown Automatic Translation (GAT) method, developed by Michael Zarechnak's group at Georgetown University, is successfully demonstrated
In the mid-1960s, the US government started to question whether MT was financially reasonable and whether it could match human translators. Among the many research groups established around the world, an understanding emerged that MT was much more difficult than anticipated. The following years were a time of disillusionment.
Then, in 1964, the United States government formed the Automatic Language Processing Advisory Committee (ALPAC) to evaluate the progress and potential of machine translation.
1964 – The Automatic Language Processing Advisory Committee (ALPAC) is formed to assess the state of MT research
The publication of the ALPAC report in November 1966 crushed machine translation. It led to significant reductions in funding and a shift toward more theoretical research in computational linguistics. The report's impact was so substantial that it even gave rise to discussions among researchers about the possibility of similar assessments in the future. Yet while ALPAC's notoriety endures, the report's actual content is often forgotten or misunderstood. Titled "Languages and machines: computers in translation and linguistics," it addressed not only machine translation but also the broader field of computational linguistics, although in practice, NLP research was mostly focused on MT at that time.
1966 – The publication of the ALPAC report
Some condemned the ALPAC report as narrow, biased, and shortsighted. However, its influence was profound. It brought a virtual end to MT research in the United States for many years, and MT was perceived as a complete failure.
What happened in the next decade that allowed researchers to return to the "languages and machines" idea? That pivotal period would witness a shift in focus toward practical applications of natural language processing, and the advent of statistical models would breathe new life into the dream of breaking down language barriers. But more on that, with all the details, in the next episode!
To be continued…
If you liked this issue, subscribe to receive the second episode of LLM history straight to your inbox. Oh, and please share this article with your friends and colleagues. Because... to fundamentally push the research frontier forward, one needs to thoroughly understand what has been attempted in history and why current models exist in present forms.