An introductory article offering a first look at the broadening prospects of foundation models in the AI industry.
There is no doubt that the recent growth of the artificial intelligence industry has paved the way for innovations and research with the potential to solve problems we considered intractable only a couple of years ago. Although we understand the algorithmic computations and sophisticated methodologies needed to develop AI models, the computational power and large training datasets required have always been a key obstacle for small startups and AI enthusiasts who cannot easily access these high-cost resources. "Foundation models", a term that has become popular among practitioners in recent times, is expected to be a boon in overcoming these challenges and delivering state-of-the-art AI results for researchers and students, mainly in predictive modelling. In this article, I will take you through various aspects and prospects of foundation models, to give you an idea of why this concept is set to become a big deal in the AI industry in the coming years.
To appreciate why foundation models matter, we first need to go back a little in the timeline and understand some of the challenges faced in AI model development a decade ago.
One of the primary challenges faced by researchers is the availability of large-scale datasets for solving downstream tasks such as regression or classification. It is no news that AI models are data-hungry, and it is the researcher's responsibility to ensure that the model is fed with diverse, extensive, and accurate data points during training. Any compromise on the quality of the training data can have severe consequences for the model's inferences.
Another important hurdle is the computational power needed to train such complex models with billions of parameters. Training can take days or weeks even with highly sophisticated infrastructure, which itself comes at a high cost.
With all these points on the table, an idea emerged:
“Why can't we train a complex AI model once to create a generic representation of the training data that is not specific to any downstream task, store it in a data store, and later reuse this trained model to solve multiple diversified problems by just tweaking it?”
Yes, this curiosity paved the way for a new term: the “foundation model”.
The term was coined in the paper “On the Opportunities and Risks of Foundation Models”, published by researchers at Stanford University. As per the definition given in that paper:
“A foundation model is any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks.”
Some examples of such models are BERT, GPT-3, CLIP etc.
Let's consider a simple example. A model named “AlexNet” was developed for image recognition problems. Trailblazing techniques of its time, such as ReLU activation, dropout, and local response normalisation, were part of the optimisation and development of this model.
This neural network has eight layers with learnable parameters: five convolutional layers (some followed by max pooling) and three fully connected layers, with ReLU activation used in every layer except the output layer. It was trained on the ImageNet dataset; the competition subset used for training contains about 1.2 million images across a thousand classes, while the full ImageNet collection holds over 14 million images. In the ILSVRC-2012 competition, AlexNet proved that it could deliver state-of-the-art outcomes, yet it remains difficult to replicate the training of such a big network even when enough resources are available.
If we have a problem based on image processing, why can't we take a trained version of the AlexNet model and tweak some of its parameters according to our downstream task?
This is exactly the idea behind the usage and consumption of foundation models.
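To make this idea concrete, here is a minimal sketch (not part of the original work on AlexNet) of reusing a pretrained AlexNet, assuming PyTorch and torchvision are installed; the 10-class downstream task is purely hypothetical:

```python
# A minimal sketch of reusing a pretrained AlexNet for a new downstream task.
# Assumes PyTorch and torchvision are installed; the 10-class task is hypothetical.
import torch.nn as nn
from torchvision import models

# Load AlexNet with weights pretrained on ImageNet
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)

# Swap the final fully connected layer (originally 1000 ImageNet classes)
# for one that matches our own downstream task
num_classes = 10
model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)

# The rest of the network keeps its pretrained weights and can now be
# fine-tuned ("tweaked") on a much smaller task-specific dataset.
```

Only the new layer starts from scratch; everything the network already learned from ImageNet is carried over.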
Let's take a deeper dive into one more real-world example, this time from natural language processing.
In NLP, there is an important concept called “embedding”. It is the process of converting raw text into mathematical vectors, because a machine learning model cannot directly consume input in text format for its internal computations.
The embedding process carried out by algorithms such as Word2vec, GloVe, etc. is called word embedding or static embedding.
Here, a large text corpus containing a lot of words is passed into the model for training. The model assigns a corresponding mathematical vector to each word, on the assumption that words which appear close to each other more frequently are similar. After this process, the derived vectors are used for further calculations.
For example,
Consider that our text corpus contains a handful of sentences in which the words “King” and “Queen” frequently appear close to each other.
Since these words co-occur so often, the model will assume that there could be some similarity between them. When they are transformed into mathematical values, they will be placed a small distance apart when represented in a multidimensional space.
[Figure: the vectors for “King” and “Queen” placed close together in the embedding space. Illustrated by the author.]
Now imagine another word, “Road”. Logically, it won't appear alongside “King” and “Queen” as frequently in a large text corpus. Hence, that word will be placed farther apart in the space.
[Figure: the vector for “Road” placed far away from “King” and “Queen” in the embedding space. Illustrated by the author.]
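As a rough illustration of this behaviour, here is a toy sketch using the gensim library (assuming it is installed). The tiny corpus below is made up, so the similarity numbers will be noisy; the point is the workflow, not the exact values:

```python
# A toy sketch of static word embeddings with gensim's Word2Vec.
# The corpus is invented for illustration; real training uses a huge corpus.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "ruled", "beside", "the", "queen"],
    ["the", "queen", "and", "the", "king", "hosted", "a", "feast"],
    ["the", "king", "spoke", "to", "the", "queen"],
    ["a", "long", "road", "crossed", "the", "valley"],
]

# Train small word vectors on the toy corpus
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=200)

# Words that co-occur often should end up closer in the vector space
print(model.wv.similarity("king", "queen"))  # expected to be relatively high
print(model.wv.similarity("king", "road"))   # expected to be lower
```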
We have a lot of downstream tasks in NLP, such as:
- Sentiment analysis
- Named-entity recognition
- Machine translation
- Question answering
- Text summarisation
For each of the above tasks, the first step would be to clean a large text corpus and represent the data in the form of vectors (i.e. embeddings), as in the example we discussed above.
Then, why can't we build a foundation model that already represents the English language in the form of vectors, learned from a large text corpus, and later apply some fine-tuning according to the downstream task we are going to implement in our project?
Such a foundation model could be accessed by NLP researchers worldwide and would make their lives much easier.
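This is essentially what pretrained language models such as BERT offer today. Here is a minimal sketch using the Hugging Face transformers library (assuming it is installed), adapting the pretrained "bert-base-uncased" model to a hypothetical two-label sentiment task:

```python
# A minimal sketch of adapting a pretrained language foundation model (BERT)
# to a downstream sentiment-classification task.
# Assumes the Hugging Face "transformers" library is installed; the 2-label
# sentiment setup is hypothetical.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g. positive / negative reviews
)

# The pretrained encoder already carries a general representation of English;
# only the small classification head on top is new and needs fine-tuning
# on our labelled reviews.
inputs = tokenizer("The movie was surprisingly good!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]); the head is untrained, so the
                             # values are not meaningful until fine-tuning is done
```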
In order to accept an AI model as a foundation model, we expect it to possess some key properties, such as the following.
Highly sophisticated, validated and scalable architecture
A foundation model will, in most cases, be a deep neural network. Its architecture should be able to withstand most of the challenges it is likely to face in the future with respect to the data, portability across deployment platforms, provisions for fine-tuning by researchers, and so on.
The implementation of complex computations such as batch normalisation, dropout, and activation functions should be free of errors or miscalculations, since this model is going to be the base for multiple projects and applications in the future. In predictive modelling especially, even a minute misestimation in the metrics can lead to Type 1 or Type 2 errors, which can have severe consequences for the business.
Trained with broad and diversified data
The developers who implement foundation models have the responsibility of ensuring that accurate, diversified, and generic data points are selected for training.
Don’t forget the saying - “Garbage In- Garbage Out :)”
Even when the collection and cleaning of data points are guided by proper statistical treatment, researchers sometimes forget to preserve the generic nature of the overall data distribution.
In the case of the NLP data we saw in one of the above paragraphs, it should be ensured that the text corpus used for training does not carry any bias with respect to topics. Say most of the texts used for training the foundation model are about “Politics” or “Sports”; the model might not yield good results if the downstream task is to analyse the sentiment of a movie review, which falls under “Entertainment”. Hence, the dataset should not be biased toward any section of the population, and the training sample should have sufficient representation of all sections of the data distribution and be adequate for recognising the hidden patterns in that space.
Adaptable for a variety of downstream tasks
As mentioned earlier, a foundation model's architecture should be designed in such a way that, with minimal fine-tuning, researchers can adapt and reuse the pre-trained version for as many traditional use cases as possible (a short code sketch after the lists below illustrates this idea).
In the case of a model that represents the English language in vector space, we can adapt it for multiple use cases such as:
- Sentiment analysis of reviews
- Text classification and topic tagging
- Question answering
- Machine translation
In the case of image processing, researchers should be able to adapt the model for a wide variety of use cases such as:
- Image classification
- Object detection
- Image segmentation
- Face recognition
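To illustrate this adaptability, here is a minimal sketch, assuming PyTorch and torchvision are available, in which a single pretrained backbone (ResNet-18 is used here purely as an example) is frozen and shared by two hypothetical task-specific heads:

```python
# A minimal sketch of adapting one pretrained backbone to several downstream tasks.
# Assumes PyTorch and torchvision; the two heads and their sizes are hypothetical.
import torch
import torch.nn as nn
from torchvision import models

# Shared foundation: a ResNet-18 pretrained on ImageNet, used as a frozen feature extractor
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()          # drop the original ImageNet classifier
backbone.eval()                      # inference mode for the frozen backbone
for param in backbone.parameters():
    param.requires_grad = False      # keep the pretrained weights fixed

# Lightweight task-specific heads reusing the same 512-dimensional features
classification_head = nn.Linear(512, 5)    # e.g. a 5-way image classification task
presence_score_head = nn.Linear(512, 1)    # e.g. a simple "is an object present?" score

features = backbone(torch.randn(1, 3, 224, 224))   # a (1, 512) feature vector
class_logits = classification_head(features)
object_score = torch.sigmoid(presence_score_head(features))
```

Only the small heads are trained for each use case; the expensive pretrained backbone is reused as-is.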
Any technological advancement is a natural step in the evolution of computer science, and each step forward brings both opportunities and challenges once it is exposed to real-world implementations. Let's look at some of the pros and cons of consuming foundation models in AI projects.
We can start with the positive aspects and advantages-
Reduction in the development time of real-world AI projects
The most time-consuming parts of a predictive-modelling use case are data treatment and model training. A foundation model has already completed a major chunk of this work before we start the actual code-level prototyping of our ideated solution. The main task of a researcher who consumes a foundation model starts from fine-tuning, which saves a lot of development time on AI projects.
Reduction of Infrastructure cost
As we already discussed, developing standard AI models from scratch requires a lot of financial investment in infrastructure, including processing units, data storage, development platforms, etc. A ready-made pre-trained model lets us skip many of these initial stages of AI development and considerably reduce the cost.
Possibility of achieving state-of-the-art AI results
Most foundation models that become popular have already achieved state-of-the-art results. This is because tons of brainstorming, research, design, and architectural work have already gone into building such solid foundation models. Replicating outputs of that standard is not easy for someone who develops an AI model from scratch, so consuming a foundation model is a shortcut to outcomes close to the best currently achievable.
On the cons side, we can list down some of the challenges such as-
Bias propagation
Bias propagation is an important challenge in consuming foundation models. It arises when the sample used for training the model is not sufficiently generic and is therefore unfit for learning the important patterns. This can introduce critical consequences when downstream tasks are later built on the foundation model: any bias or fault in the patterns learned from the sample data will be propagated to every downstream task implemented on this base model.
Black box source code of the foundation model
Since the companies that implement and publish foundation models spend tons of money on development, they often do not allow third parties to go through the source code of the model.
Generative Pre-trained Transformer 3 (GPT-3) is a standard example of this scenario. It is an autoregressive language model that uses deep learning to produce human-like text and has delivered highly accurate results. However, it is not possible to explore the source code or the parameters inside this architecture for a better understanding of the solution.
This kind of black-boxing can sometimes act as a hindrance to researchers who consume the foundation model and want to introspect and hyper-tune the existing architecture more conveniently according to their task-specific objective.
I hope this article has given you an introductory picture of the concepts behind “foundation models”. If you are an AI enthusiast or a working professional in the AI domain, this overview might help you think about introducing foundation models in your real-world projects, where they have the potential to significantly increase the success rate of your models. With the rapid growth of AutoML platforms and cloud-based model training methodologies, there is little doubt that foundation models will be a vital part of every AI company in the future. At the same time, keeping their downsides in mind, we should find ways to grasp the advantages of this approach for the benefit of AI/ML use cases.