History of Using AI in Education

Author: (1) Mohammad AL-Smad, Qatar University, Qatar (e-mail: malsmadi@qu.edu.qa).

2. History of Using AI in Education

The history of using AI in education dates back to the 1960s, with the development of early intelligent tutoring systems. These systems were designed to provide students with personalized instruction tailored to their individual needs and learning styles. However, before turning to the evolution of generative AI in education, we need to understand the history and evolution of generative AI models.

2.1. The History and Evolution of Generative AI Models

Generative Artificial Intelligence (AI) models, particularly Large Language Models (LLMs), have made remarkable progress over the years, transforming natural language processing and a wide array of other creative tasks (Susarla et al., 2023). In this section, we trace the historical roots and evolutionary trajectory of these models, highlighting the key milestones that have shaped their development.

• Early Days of Language Modeling: The development of LLMs traces back to the 1950s and 1960s and the emergence of statistical Natural Language Processing (NLP). In this early period, language models relied on statistical methods to estimate the probability of a given word or word sequence within its linguistic context. N-gram models, which predict each word from the n-1 words that precede it, were a fundamental technique during this period (Russell & Norvig, 2010).

• From N-grams to Word Embeddings: A pivotal shift from n-gram-based models to word embeddings gained momentum in the early 2010s, most notably with the introduction of the Word2Vec algorithm in 2013 (Mikolov et al., 2013). This approach represents each word as a dense vector, so that words with similar meanings lie close together in the vector space and the vectors capture semantic meaning. This breakthrough laid the groundwork for subsequent developments in language modeling.

• Advances in Text-based Deep Learning Models (i.e., Sequence-to-Sequence NLP): The integration of word embeddings into language modeling ushered in a new era. These vector representations served as input to deep learning models such as recurrent neural networks (RNNs) and, later, RNN-based encoder-decoder (sequence-to-sequence) architectures. This shift had a profound impact on NLP tasks such as machine translation (Sutskever et al., 2014) and text summarization. The ability to capture semantic context through vector representations significantly enhanced the quality and depth of generated content.

• The Transformer Architecture Revolution: The introduction of the Transformer architecture in 2017 (Vaswani et al., 2017) is considered a turning point in NLP and computer vision research, and in language modeling research in particular. The Transformer represented a paradigm shift in NLP by relying entirely on a self-attention mechanism, which lets each token attend directly to other tokens in the sequence. This innovation enabled models to capture long-range dependencies within sequences, improving the coherence and contextual relevance of generated content. Several deep learning models, such as BERT (Devlin et al., 2018), have since been built on the Transformer, which laid the foundation for the subsequent development of LLMs.

• The Emergence of LLMs: In recent years, the field of AI has witnessed the proliferation of Large Language Models (LLMs). These models, also known as "foundation models", are trained on vast and diverse datasets encompassing books, news articles, web pages, and social media posts, and have billions of parameters (Bommasani et al., 2021). This unprecedented scale of data, coupled with advances in model architecture and training techniques, marked a significant turning point. These foundation models exhibit remarkable adaptability to a wide range of tasks, including tasks they were not originally trained for. ChatGPT is an exemplary case of a generative AI model in action. Launched in November 2022, it is fine-tuned from the generative pre-trained transformer GPT-3.5, which was originally trained on a large dataset of text and code (Neelakantan et al., 2022). ChatGPT is trained with Reinforcement Learning from Human Feedback (RLHF), a technique that has shown considerable promise in aligning LLMs with human intent (Christiano et al., 2017). ChatGPT's strong performance points to a potential paradigm shift in the training of generative AI models. This shift involves adopting instruction-alignment techniques, such as reinforcement learning from human feedback (Christiano et al., 2017), prompt engineering (Brown et al., 2020), and chain-of-thought (CoT) prompting (Wei et al., 2022), as a collective step toward building an ecosystem of intelligent services based on generative AI models.

Together, these advances have produced generative AI models with a remarkable capacity to process and generate realistic, media-rich content (including text, images, audio, and video). Such capabilities have led to their wide adoption in application domains such as education.

Despite these advances, concerns and challenges have arisen in the generative AI landscape (Susarla et al., 2023). The ease with which models like ChatGPT can be adapted to new tasks raises questions about the depth of their understanding. Experts in AI fairness have warned that these models may perpetuate societal biases encoded in their training data (Glaser, 2023), describing them as "stochastic parrots" (Bender et al., 2021).

2.2. Evolution of Using Generative AI in Education

Using AI in education is not new. The first attempts can be traced back to the early 1960s, when researchers at the University of Illinois at Urbana-Champaign developed an intelligent tutoring system (ITS) called PLATO (Programmed Logic for Automatic Teaching Operations) (Bitzer et al., 1961). PLATO was the first computer system that enabled students to interact, through graphical user interfaces, with educational materials that were developed and adapted to their needs using AI. Another early example of using AI in education is the "Automatic Grader" system, developed around 1960 to automatically grade student programs in programming classes (Hollingsworth, 1960).

The advent of personal computers accelerated the development of ITSs during the 1970s. One system developed in that period is TICCIT (Time-shared, Interactive Computer-Controlled Instructional Television) (Stetten, 1971), an early ITS developed in the early 1970s at the University of Pittsburgh. TICCIT was an early attempt to deliver individualized, multimedia-based content at scale to users in homes and schools. Advances in ITS development during the 1960s and 1970s were backed by learning theories and principles that value one-to-one, individualized tutoring of students in classrooms (see, for example, B.F. Skinner's pioneering work on the "programmed instruction" movement and Benjamin Bloom's work on "mastery learning"; Block & Burns, 1976). The ITSs developed during that period were mainly rule-based systems. Advances in AI and the advent of microcomputers in the 1970s influenced the way ITSs were designed and developed (Reiser, 2001a). Since the 1980s, computer-based instruction, and AI-based education in particular, has evolved to automate several instructional activities (Reiser, 2001b).
The arrival of the World Wide Web (WWW) in the 1990s brought a major shift in the delivery medium of intelligent educational services (Chen et al., 2020). ITSs evolved to deliver intelligent, adaptive, and personalized learning services underpinned by machine learning models. Despite these advances in how ITSs were developed and delivered to users, their capabilities remained limited to delivering individualized instruction and learning. The evolution of the WWW into the so-called "Web 2.0", with its added capabilities for collaborative and social interaction, paved the way for a new era in ITS development. Data collected from users' interactions with Web 2.0 services, together with the ability to train software agents on these data using various machine learning algorithms, led to further advances in applying learning analytics to adapt and personalize learning (Clow, 2013).

The 21st century has witnessed several breakthroughs in using AI in education. These breakthroughs were driven by advances in: (i) hardware capabilities and performance (Nickolls & Dally, 2010), (ii) big data mining (Wu et al., 2013), and (iii) AI models and architectures, notably the advent of deep learning models (LeCun et al., 2015). The advent of the Transformer deep learning architecture in 2017 (Vaswani et al., 2017) is considered a turning point in the history of developing intelligent software in general (see Section 2.1). Many models, such as generative pre-trained transformers (GPT), appeared shortly afterward (Radford et al., 2018). In November 2022, OpenAI released ChatGPT, which is based on the GPT-3.5 model, and it reached over 100 million users within a few months. Since then, generative AI-based educational tools have been developed to provide students with personalized instruction, adaptive learning, and engaging learning experiences (see Section 4.2).
This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.