
by Sanjay Kumar, December 15th, 2022

An introductory article explaining the fundamental principles of, and differences between, “model-based” and “instance-based” learning in artificial intelligence and machine learning.

- Introduction
- The instinctive idea behind "Generalization" and "Memorization"
- The concept behind "Model-based" learning
- The concept behind "Instance-based" learning
- Summary
- References

“Instance-based” and “model-based” are two different learning approaches used by machine learning algorithms to perform their tasks.

We know that the final objective of any predictive model is to learn the hidden patterns inside the data and predict values with reasonable accuracy based on that learned knowledge. Algorithms use two different approaches to learn about the data-

- Generalization
- Memorization

Let's go through a simple story before moving to the mathematical concepts. John and Joseph are best friends who always score good marks in examinations. There was another student in their school named Kevin. Since Kevin was a little weak in his studies, he requested both of them to help him study so that he could also score good marks in the examinations. Both John and Joseph agreed to teach him their subjects.

On the first day, Kevin went to John’s house to learn mathematics. John explained all the in-depth concepts to Kevin and taught him various scenarios and approaches for solving different kinds of problems. He also trained Kevin on many sample problems and made him understand topics and questions with content and weightage similar to the examination. Kevin felt very confident and happy. He thanked John and left his house.


On the second day, Kevin went to Joseph’s house to learn science. Joseph asked him whether he wanted to understand all the in-depth concepts and theories of the subject, or whether he just wanted the list of questions that would appear on the question paper, because by memorizing all the important answers it is possible to score good marks even without understanding the concept behind each one. Kevin was intrinsically a lazy boy, so he said that he didn't want to put effort into learning the concepts and just needed the list of important questions so that he could memorize the answers. Joseph gave him a list of 50 important questions and answers and asked him to memorize the entire content.


Finally, the exam days came. The first examination was mathematics. The question paper had a lot of tough questions, but Kevin had the good conceptual understanding he had learned from John. He solved almost all the problems and was confident of getting 90% marks.

The second examination was science. When Kevin received the question paper, he was surprised because the majority of the problems were from the list of questions and answers he had already memorized. He recollected all the answers and wrote them neatly. Hence, in science too, he was very confident of getting 90% marks. Even though he understood nothing conceptually, he wrote down everything he had memorized and achieved his objective.


The learning pattern followed for mathematics is called **"Generalization"** and the learning pattern followed for science is called **"Memorization"**. Hope you liked the story. Now we can move to the machine learning explanation.

In Generalisation, models always try to learn about the intrinsic pattern, behavior, and overall concept of the problem.

For example,

We all know the formula for *"Linear regression"*. It is represented as-

Y = m1x1 + m2x2 + ... + mnxn + c

Where,

- Y is the dependent variable
- x1, x2, ... xn are the independent variables
- m1, m2, ... mn are the slopes of the corresponding independent variables
- c is the intercept

Let's assume that we developed a linear regression model that can predict the weight of a person based on his/her age, height, and parents' heights. The mathematical representation of the model will be as follows-

Weight = 0.3*(Height) + 0.2*(Age) + 0.4*(Father's height) + 0.1*(Mother's height) + 2

Here, 0.3, 0.2, 0.4, and 0.1 are the slope values that the model learned while being fitted to the training data (they are parameters estimated during training, not hyperparameters). Similarly, 2 is the value of the intercept for the regression plane.
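To make this concrete, here is a minimal sketch of how such slope and intercept values are actually obtained: by fitting an ordinary-least-squares regression to training examples. The dataset below is synthetic and invented purely for illustration, with the weight generated from the assumed relationship plus noise-

```python
import numpy as np

# Hypothetical data: columns are height, age, father's height, mother's height.
rng = np.random.default_rng(42)
X = rng.uniform(low=[150, 10, 160, 150], high=[190, 60, 190, 180], size=(200, 4))

# Generate weights from the assumed true relationship plus a little noise.
true_coefs = np.array([0.3, 0.2, 0.4, 0.1])
y = X @ true_coefs + 2.0 + rng.normal(scale=0.5, size=200)

# "Training": solve ordinary least squares for the slopes and intercept.
A = np.column_stack([X, np.ones(len(X))])   # append a column of 1s for the intercept
params, *_ = np.linalg.lstsq(A, y, rcond=None)
slopes, intercept = params[:4], params[4]
```

After fitting, `slopes` comes out close to the true coefficients (0.3, 0.2, 0.4, 0.1): the model has generalized the linear pattern from the data rather than storing the samples themselves.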

The visual representation will somewhat look like this-

*Image Illustrated by the author*

Here, each feature is a dimension, and the data points are projected into this multidimensional space. We then derive a regression plane that passes through this space. The predicted value (weight) for a particular “Height”, “Age”, "Father's height", and "Mother's height" is nothing but the value of this regression plane at the corresponding coordinates of the feature dimensions.

In other words, this model tried to understand the linear relationship between variables like age, height, etc., and derived an imaginary hyperplane that can approximate the predicted value, relying on the standard statistical assumptions of linear regression such as linearity, homoscedasticity, and the absence of autocorrelation.

The model will try to fit the hyperplane in a generalized way such that the overall prediction error is low, i.e., the distance between the data points and the regression plane is as low as possible. It is able to derive this generalized hyperplane because of the learning it does on the data to find patterns in the space as part of the ML training activity.

Let's go through one more example with another algorithm named *“Support vector machine”*.

The support vector machine is a supervised machine learning algorithm that is popularly used for classification, i.e., predicting the category of a data point after learning from labeled examples.

For example-

- Predicting whether a person is male or female
- Predicting whether the fruit is an apple or orange
- Predicting whether a student will pass or fail the exams etc.

SVM uses an imaginary plane that can span multiple dimensions for its prediction purpose. These imaginary planes are called hyperplanes. It is very difficult for the human brain to imagine higher dimensions, since we are naturally capable of visualizing only up to 3 dimensions.

Let’s take a simple example to understand this scenario.

We have a classification problem to predict whether a student will pass or fail the examination. We have the following features as independent variables-

- Marks in internal exams
- Marks in projects
- Attendance percentage

So, these 3 independent variables become 3 dimensions of a space like this-

*Image Illustrated by the author*

Let’s consider that our data points look like this where-

- The green color represents the students who passed the examination
- The red color represents the students who failed the examination

*Image illustrated by the author*

Now, SVM will create a hyperplane that travels through these 3 dimensions in order to differentiate the failed and passed students-

*Image Illustrated by the author*

So, technically, the model now understands that every data point that falls on one side of the hyperplane belongs to the students who passed the exams, and vice versa. As with linear regression, the SVM hyperplane is the final result of the learning done by the ML model as part of its training activity (along with the tuning of hyperparameters such as the kernel and the regularization strength).
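As a brief sketch of this idea, scikit-learn's `SVC` can fit such a separating hyperplane for the pass/fail problem. The tiny dataset below is invented for illustration only-

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: [internal marks, project marks, attendance %].
X_train = np.array([
    [85, 90, 95], [78, 82, 88], [90, 75, 92], [88, 85, 80],   # passed
    [35, 40, 50], [42, 30, 45], [30, 55, 60], [45, 38, 40],   # failed
])
y_train = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = pass, 0 = fail

# Training fits the separating hyperplane; the kernel and C are the
# hyperparameters a researcher would actually tune.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X_train, y_train)

# The learned hyperplane has the form w.x + b = 0.
w, b = clf.coef_[0], clf.intercept_[0]

# Prediction is just checking which side of the hyperplane a point falls on.
new_student = np.array([[80, 70, 85]])
prediction = clf.predict(new_student)
```

Once `clf` is fitted, predicting any new student only requires evaluating the learned function w.x + b; the training samples themselves are no longer consulted.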

Do you find any similarity in the learning approach of the two algorithms mentioned above?

Both of them tried to learn about the nature of the entire space, hidden patterns among the data points, and various optimization techniques to minimize the errors thereby deriving a generalized mathematical function to solve the problem. This approach is called **"Model-based learning"**.

*The learning approach of models that follow the generalization procedure for prediction purposes is called Model-based learning.*

Now let's come to another example where we need to implement the *"K nearest neighbor"* algorithm.

We can consider the same scenario that we assumed for the SVM example. Here also, we need to predict whether a student will pass or fail the examination. Our data looks like this-

*Image illustrated by the author*

Now, as per the KNN algorithm, we should decide on a value for *“K” (the number of neighbors)* and note the class of the “K” closest neighbors for each unlabelled data point. The predicted value for the unlabelled data point will be the class that has majority participation among the “K” nearest neighbors.

Assume that we assigned the value of K =3. Also, data points “a”, “b”, and “c” are unlabelled data points for which we need to predict the class using this model.

- For data point “a”, all of the 3 neighbors are “red”. Hence we can predict that this student will probably fail the examination.
- For data point “b”, 2 of the 3 neighbors are “red” and 1 neighbor is “green”. The majority of the “K” nearest neighbors belong to the “fail” class. Hence we can predict that this student will probably fail the examination. If at least 2 out of 3 neighbors had been “green”, we would have predicted that this student would pass the examination, since the majority would then support the “pass” class.
- For data point “c”, all of the 3 neighbors are “green”. Hence we can predict that this student will probably pass the examination.

*Image Illustrated by the author*
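The voting procedure described above can be sketched directly in a few lines of NumPy. Note that there is no fitting step at all; the "model" is simply the stored training data (which is hypothetical here, mirroring the pass/fail features used earlier)-

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by a majority vote among its k nearest training points.

    There is no training step: the model memorizes the training data and
    defers all the work to prediction time."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # distance to every sample
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]                 # majority class

# Hypothetical student data: [internal marks, project marks, attendance %].
X_train = np.array([[85, 90, 95], [78, 82, 88], [90, 75, 92],
                    [35, 40, 50], [42, 30, 45], [30, 55, 60]])
y_train = np.array(["pass", "pass", "pass", "fail", "fail", "fail"])

print(knn_predict(X_train, y_train, np.array([80, 85, 90])))  # all 3 neighbors are "pass"
```

Notice that every prediction recomputes distances to the entire training set; nothing has been generalized into a compact function.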

Did you observe any significant difference between the working procedure of KNN and the other 2 algorithms mentioned earlier?

Actually, KNN didn't undergo any training process. It didn't learn patterns among the data points, make mathematical assumptions about the space, or even try to derive a mathematical function mapping the independent variables to the dependent variable. The only variable a researcher needs to carefully optimize is the value of “K”. It simply memorizes the procedure of picking the majority class among a point's neighbors and claiming it as the predicted value. It does not use any generalization technique in the form of a mathematical function; instead, it memorizes the principle of voting and repeats that task for every unlabelled data point. This process is called **"Memorization"**.

*The learning approach of models that follow the memorization procedure for prediction purposes is called Instance-based learning.*

- Model-based learning focuses on discovering the hidden patterns among the data points, optimizing the model's parameters through **training on the entire dataset**. Instance-based learning doesn't train on the entire dataset. Instead, it predicts each unlabelled data point by following some simple rules configured by the researcher.

- In model-based learning, we can **remove the training data** from the system once the model has learned all the patterns from that dataset. However, in instance-based learning, the training data must be kept as it is, since the model uses the labels of all or part of the training samples for prediction.

- In model-based learning, prediction is a **fast process**. However, in instance-based learning, prediction is comparatively slow because there is no mathematical function through which the input values can quickly be passed to derive the output. Instead, the model needs to spend time on comparison and rule-based decision-making for each unlabelled data point, comparing it with various training samples. In other words, these models delay processing until a new instance must be classified. For this reason, they are also called lazy learners.

- Instance-based learners can be easily fooled by **irrelevant features**. However, in model-based learning, models come to know the importance of various features, since they go through various optimization techniques.

- Instance-based learners are good at **handling noisy data** and they don't lose any information. However, model-based learners cannot manage noisy data points well. Outliers and anomalies are usually eliminated from the dataset in the pre-modeling stage to counter this challenge. But note that eliminating outliers can result in the loss of some information about the overall characteristics of the dataset, which might affect the predictive ability of the model.
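The speed and storage differences above can be sketched in a few lines of NumPy (with synthetic, made-up data): a model-based predictor needs only its learned weights, while an instance-based predictor must compare the query against every stored training sample-

```python
import numpy as np

# Synthetic training set: 1000 samples with 3 features and a linear target.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))
y_train = X_train @ np.array([0.3, 0.2, 0.4]) + 2.0

x_new = np.array([1.0, 2.0, 3.0])

# Model-based prediction: one dot product with the learned weights;
# the training data itself is no longer needed.
weights, intercept = np.array([0.3, 0.2, 0.4]), 2.0
model_pred = x_new @ weights + intercept

# Instance-based prediction: compare the query against all 1000 stored
# samples (here, 1-nearest-neighbor regression), so the data must be kept.
dists = np.linalg.norm(X_train - x_new, axis=1)
instance_pred = y_train[np.argmin(dists)]
```

The model-based path touches a handful of numbers per prediction; the instance-based path touches the whole training set every single time, which is exactly why it scales poorly at prediction time.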

- Walter Daelemans; Antal van den Bosch (2005). *Memory-Based Language Processing*. Cambridge University Press.
- Russell, Stuart J.; Norvig, Peter (2003). *Artificial Intelligence: A Modern Approach* (2nd ed.). Upper Saddle River, New Jersey: Prentice Hall, p. 260. ISBN 0-13-790395-2.
- D. Randall Wilson; Tony R. Martinez (2000). "Reduction techniques for instance-based learning algorithms". *Machine Learning*.
- Newton S. Lee (1990). "A computational paradigm that integrates rule-based and model-based reasoning in expert systems". *International Journal of Intelligent Systems*. Wiley. 5 (2): 135–151. doi:10.1002/int.4550050202.
