In simple terms, transfer learning is a machine learning approach in which a model that has already been trained on one data set for one task is reused as the starting point for training on a different data set and a different task.
Transfer learning is a popular approach for model training, and pre-trained computer vision and natural language processing models are commonly used as a starting point for specific user applications. At Modzy, our models are designed to work out of the box, but we also provide the option for CPU-based retraining, using transfer learning and domain adaptation so that our models are even more closely tailored to your specific applications.
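As a small illustration of this starting-point idea (a sketch using the Hugging Face transformers library; the model name and label count are illustrative choices, not a description of Modzy's internal tooling), a pre-trained language model can be loaded and given a fresh classification head for a new task:

```python
# Sketch: reuse a pre-trained NLP model as the starting point for a new task.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Loads weights learned on the source (pre-training) corpus and attaches a
# randomly initialized classification head sized for the new target task.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5
)

# Fine-tuning on the target data set then starts from these transferred
# weights rather than from a random initialization.
```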
Transfer learning works surprisingly well across a wide range of computer vision and natural language processing applications ([1]). Transfer learning and domain adaptation refer to situations where what was learned in one setting is exploited to improve generalization in another setting ([2]).
There is always a mismatch between the source and target data distributions, but the reason for the mismatch differs from one setting to another. Transfer learning and domain adaptation approaches are designed to address this issue.
The problem to solve here is a challenging one because one of the core assumptions in machine learning is that the training and test data sets are drawn from the same probability distribution. However, this is rarely the case in real-world applications. It is therefore important to leverage the knowledge a model has gained from training on the source task to solve a different target problem, given that the source and target may exhibit a distribution mismatch ([3]).
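The practical impact of such a mismatch is easy to see on synthetic data. The sketch below (purely illustrative, with made-up numbers rather than any real data set) fits a simple classifier on a source distribution and evaluates it on a shifted target distribution:

```python
# Toy illustration: the labeling structure is unchanged, but the target
# features are translated, so a classifier fit on the source domain
# generalizes poorly to the target domain.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_domain(offset, n=1000):
    # Two Gaussian classes; `offset` shifts the whole domain in feature space.
    x0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2)) + offset  # class 0
    x1 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(n, 2)) + offset  # class 1
    X = np.vstack([x0, x1])
    y = np.array([0] * n + [1] * n)
    return X, y

X_src, y_src = make_domain(offset=np.array([0.0, 0.0]))  # source domain
X_tgt, y_tgt = make_domain(offset=np.array([4.0, 4.0]))  # shifted target domain

clf = LogisticRegression().fit(X_src, y_src)
print("source accuracy:", clf.score(X_src, y_src))  # high
print("target accuracy:", clf.score(X_tgt, y_tgt))  # much lower under the shift
```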
The underlying assumption made for transfer learning and domain adaptation is that the source and target domains differ in terms of their marginal data distributions but share the same set of labels. There are also cases where the marginal distributions of the source and target data sets are related, but the source and target tasks have different label sets.
Depending on the specific application, the transferred knowledge can be in the form of data instances, feature representations, or model parameters. In our specific solution, we focus on features learned when the model was trained on the source data set for the source task, then adapt those features to a new target data set and target task.
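One common way to reuse learned feature representations (sketched here with torchvision; the backbone and class count are illustrative assumptions, and our actual retraining pipeline may differ) is to freeze the pre-trained feature extractor and train only a new head for the target task:

```python
# Sketch: transfer learned feature representations by freezing them and
# training only a new task-specific head on the target data set.
import torch
import torch.nn as nn
from torchvision import models

# Backbone pre-trained on the source data set (ImageNet here, for illustration).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the source-learned feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier head for a hypothetical 10-class target task;
# only these new parameters are updated during retraining.
model.fc = nn.Linear(model.fc.in_features, 10)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```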
As an example, if we have a YOLO-based object detection model trained to detect buildings in one data set, the features learned by that model can be reused to detect buildings in a target data set with a different pixel distribution; this is done by performing a limited re-training of a few of the layers in the YOLO model.
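A minimal sketch of that limited re-training step is below; `build_yolo_model` is a hypothetical stand-in for however the pre-trained detector is constructed, and the freezing logic is the part being illustrated rather than Modzy's exact procedure:

```python
# Sketch: freeze every layer of a pre-trained detector except the last few,
# then retrain briefly on the target imagery.
import torch.nn as nn

def prepare_for_limited_retraining(detector: nn.Module, num_trainable: int = 3) -> nn.Module:
    """Freeze all top-level layers of `detector` except the last `num_trainable`."""
    layers = list(detector.children())
    for layer in layers[:-num_trainable]:
        for param in layer.parameters():
            param.requires_grad = False
    return detector

# detector = build_yolo_model("source_buildings_weights.pt")  # hypothetical loader
# detector = prepare_for_limited_retraining(detector, num_trainable=3)
# ...then fine-tune on the target data set with its different pixel distribution.
```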
At Modzy, we approach transfer learning and domain adaptation from a “learning to learn” perspective. The learning to learn ability, shared by humans and animals, implies that as a biological cognitive system gains more experience, it becomes better at learning new tasks. We train our models on large data sets consisting of data points from different probability distributions.
Our objective is for our pre-trained models to have very low generalization error. Further, we provide the opportunity for customized retraining of some of our models on the user's data set, subject to certain conditions.
Continual learning, or learning to learn, is an important topic in the AI field, and domain adaptation and transfer learning are emerging solutions within it for computer vision and natural language processing tasks.
At Modzy, we develop our models not only to perform well on a range of data sets and applications, but also to transfer well to very specific applications for which large data sets may not exist. We use limited re-training options based on transfer learning and domain adaptation to bridge the probability distribution gap between the source and target data sets, minimizing the effects of domain-induced changes in the learned feature distribution. In this way, we can reduce the generalization error as we apply a model to different tasks.
References