This is an introductory article explaining the basic intuition, mathematical idea & scope of radial basis functions in the development of predictive machine learning models.
In machine learning, hyperplane-based algorithms depend heavily on how the data points are distributed in the space. However, real-world data rarely follows theoretical assumptions.
There are many transformation functions that can convert the natural shape of the data points into theoretically recommended distributions while preserving the hidden patterns of the data. The radial basis function is one such well-known function, and it is discussed in many machine learning textbooks. In this article, we will learn about the basic intuition, types and usage of the radial basis function.
The radial basis function is a mathematical function that returns a real value which depends only on the distance between the input point and a fixed point (the centre) chosen elsewhere in the space.
This function is popularly used in many machine learning and deep learning algorithms such as Support Vector Machines, Artificial Neural Networks, etc.
Let us understand the concept and the usage of this mathematical function.
In practice, whenever we solve complex machine learning problems using algorithms such as SVM, we need to project all of our data points into an imaginary multidimensional space where each feature is a dimension.
Let's assume we have a classification problem to predict whether a student will pass or fail the examination.
We have the following features as independent variables:
So, these 3 independent variables become 3 dimensions of a space like this-
Let’s consider that our data points look like this where-
The green colour represents the students who passed the examination
The red colour represents the students who failed the examination
Now, SVM will create a hyperplane that travels through these 3 dimensions in order to differentiate the failed and passed students-
So, technically, the model now understands that every data point which falls on one side of the hyperplane belongs to the students who passed the exam, and every data point on the other side belongs to the students who failed.
In our example, it was easy to create the hyperplane because a straight, linear hyperplane was enough to discriminate the 2 categories. But in complex real-world projects, this assumption is violated in many scenarios. Especially when you have hundreds of independent variables, there is no guarantee of a linear relationship between the data points, so it can be difficult to create an optimal hyperplane.
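To make the "which side of the hyperplane" idea concrete, here is a minimal sketch. The weights, the offset and the student feature values are made up for illustration; a trained SVM would learn the weights and offset from the data.

```python
import numpy as np

# Hypothetical hyperplane parameters; a trained SVM would learn these from data.
w = np.array([0.8, 0.5, 0.3])   # one weight per feature (3 dimensions)
b = -50.0                       # offset of the hyperplane

def predict(points):
    """Label points by the side of the hyperplane w.x + b = 0 they fall on."""
    scores = points @ w + b     # signed score relative to the hyperplane
    return np.where(scores >= 0, "pass", "fail")

# Two made-up students, each described by three made-up feature values.
students = np.array([
    [60.0, 40.0, 30.0],   # 48 + 20 + 9 - 50 = 27   (positive side)
    [20.0, 30.0, 10.0],   # 16 + 15 + 3 - 50 = -16  (negative side)
])
labels = predict(students)
print(labels)
```

The sign of the score is all the model needs: a positive score means one class, a negative score the other.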
In such scenarios, researchers usually apply the Radial basis function to each of the data points so that they will be able to pass a linear hyperplane across the data points to easily solve the problem.
Consider that our data points are looking like this in the space-
It is clear that we cannot use a linear hyperplane such that it can group the data points according to their classes.
RBF will help us in these kinds of scenarios.
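As a rough illustration of this situation, the sketch below compares a linear kernel with an RBF kernel using scikit-learn's SVC. The dataset is a synthetic stand-in (two concentric rings from make_circles), not the data shown in this article, and the parameter values are arbitrary choices.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic stand-in data: one class forms a small ring inside the other.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)

# Same algorithm, two kernels: a straight hyperplane vs. an RBF-based one.
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear kernel training accuracy: {linear_acc:.2f}")
print(f"RBF kernel training accuracy:    {rbf_acc:.2f}")
```

Note that scikit-learn's RBF kernel compares pairs of data points with each other rather than measuring distance from one hand-picked centre, so it is a close relative of the picture described in this article rather than a literal copy of it.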
Some researchers first project these data points into a much higher-dimensional space so that the distances between the data points increase, and then apply some function (RBF or any other function) to build a hyperplane. But going to higher dimensions is not always necessary; it is the decision of the statistician/researcher who understands the patterns in the data.
Next, we have to mark an imaginary point in the space like this wherever we need.
After that, we need to draw some concentric circles based on this imaginary point.
The distance between the centre and any data point lying on the boundary of a circle is called the radius of that circle.
After calculating the radius, we need to pass this value inside a mathematical function (RBF) that will return a real value. The returned value will be the transformed magnitude of a particular data point used for further proceedings.
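The two steps above (measure the radius, then pass it through the function) can be sketched as follows. This assumes the Gaussian form exp(-(epsilon * r)^2), which is one common choice of radial basis function; the centre, the points and the value of epsilon are made up for illustration.

```python
import numpy as np

def gaussian_rbf(r, epsilon=1.0):
    """Gaussian radial basis function: exp(-(epsilon * r)^2)."""
    return np.exp(-(epsilon * r) ** 2)

center = np.array([0.0, 0.0])      # the chosen fixed point (assumed at the origin)
points = np.array([[0.0, 0.0],     # sits exactly on the centre
                   [3.0, 4.0],     # radius 5
                   [1.0, 0.0]])    # radius 1

# Step 1: radius = Euclidean distance from each point to the centre.
radii = np.linalg.norm(points - center, axis=1)

# Step 2: pass each radius through the RBF to get the transformed value.
values = gaussian_rbf(radii, epsilon=0.5)
print(radii, values)
```

With this particular function, a point on the centre gets the value 1 and the value shrinks towards 0 as the radius grows.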
There are multiple types of radial basis functions. Each of them will transform the input value in a different way. Some of them are-
Where,
The function will look like this with respect to time,
Where,
Where,
I will explain intuitively what these functions do in the space. There are 2 different processes that are carried out by these functions-
The process of expansion will visually somewhat look like this-
The process of compression will visually somewhat look like this-
After the expansion and compression, the data points would have been transformed like this-
Now, we can easily construct a linear hyperplane that can classify the data points like this-
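Here is a small sketch of the whole idea on synthetic data. It assumes two classes arranged as rings around a centre at the origin and a Gaussian RBF with epsilon = 0.5; all of these are illustrative assumptions, not values from this article. After the transformation each point is described by a single number, and a single threshold on that number (a linear boundary in one dimension) separates the classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def ring(radius, n):
    """Sample n points on a circle of the given radius around the origin."""
    angles = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])

inner = ring(1.0, 50)    # class 0: close to the centre
outer = ring(3.0, 50)    # class 1: far from the centre
X = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

# Transform every point: radius from the centre, then Gaussian RBF of the radius.
center = np.zeros(2)
r = np.linalg.norm(X - center, axis=1)
feature = np.exp(-(0.5 * r) ** 2)

# Threshold halfway between the RBF values of the two ring radii.
threshold = (np.exp(-0.25) + np.exp(-2.25)) / 2.0
pred = np.where(feature > threshold, 0, 1)
accuracy = (pred == y).mean()
print(accuracy)
```

In the original two-dimensional coordinates no straight line separates the two rings, but in the transformed feature one cut-off value is enough.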
Sometimes, RBF is also used in artificial neural networks with one hidden layer. In such networks, RBFs are used as the activation functions of the hidden layer. Apart from the hidden layer, there is an input layer containing several neurons, each of which represents a feature variable, and an output layer that takes a weighted sum of the hidden layer's outputs to form the network outputs.
Such networks are called RBF networks.
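A forward pass through such a network can be sketched in a few lines. This is a minimal illustration, assuming Gaussian activations in the hidden layer; the centres, weights and bias below are hypothetical values, whereas a real RBF network would learn them during training.

```python
import numpy as np

def rbf_network_forward(X, centers, epsilon, weights, bias):
    """Input layer -> Gaussian RBF hidden layer -> weighted-sum output layer."""
    # Distance between every input row and every hidden-unit centre.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    hidden = np.exp(-(epsilon * dists) ** 2)   # hidden-layer activations
    return hidden @ weights + bias             # weighted sum of hidden outputs

X = np.array([[0.0, 0.0], [1.0, 1.0]])         # 2 samples, 2 feature variables
centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # 2 hidden units (hypothetical centres)
weights = np.array([1.0, -1.0])                # hypothetical output weights
out = rbf_network_forward(X, centers, 1.0, weights, 0.0)
print(out)
```

Each hidden unit responds most strongly to inputs near its own centre, and the output layer simply mixes those responses.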
In this article, we discussed one of the most useful transformation functions in machine learning. I have tried to explain this complicated concept without many in-depth mathematical calculations in a lucid manner targeting beginners in the AIML learning space.
Ready-made implementations of this function are available in popular libraries for data science-oriented programming languages such as Python or R. Hence, it is easy to implement once you understand the theoretical intuition. I have added links to some advanced materials in the references section, where you can dive deeper into the complex calculations if you are interested.