In this article, I will introduce you to the concept of few-shot learning, focusing on SetFit, a widely used approach for few-shot text classification.
We'll start by revisiting traditional machine learning techniques, and then move into the realm of few-shot learning with SetFit, applied to text classification on an E-Commerce dataset.
In supervised machine learning, a substantial dataset is used to train a model; after training, we feed the model test data to obtain predictions. The model's ability to make accurate predictions comes from extensive training on that large dataset. A notable drawback of this conventional supervised approach, however, is that it requires a large, error-free training dataset, which is not always available in every domain. This limitation is what gave rise to the concept of few-shot learning.
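To make the contrast concrete, here is a minimal sketch of such a conventional pipeline: a TF-IDF vectorizer feeding a logistic-regression classifier, trained on a few made-up product descriptions (in practice this approach needs thousands of labeled examples to work well):
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for what would normally be a large labeled dataset
texts = ["great battery life", "soft cotton t-shirt", "a gripping mystery novel"]
labels = ["Electronics", "Clothing & Accessories", "Books"]

# Fit a classic supervised text classifier and predict on unseen text
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["wireless noise-cancelling headphones"]))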
Before delving into Sentence Transformer Fine-tuning (SetFit), let's briefly revisit a crucial concept in Natural Language Processing (NLP): few-shot learning.
Few-shot learning involves training a model on a very small training dataset; the model learns from these small collections of labeled examples, referred to as support sets. The key distinction is that few-shot models are taught to discern both the similarities and the differences among the training examples. For instance, instead of instructing the model to classify images as cats or dogs, we instruct it to grasp the commonalities and distinctions among various animals. This approach of learning the similarities and distinctions within input data is commonly referred to as meta-learning, or learning-to-learn.
A few-shot setup is commonly described as k-way n-shot learning, where "k" is the number of classes in the support set; in binary classification, for instance, k equals 2. "n" is the number of samples available for each class within the support set; if there are 10 data points for the positive class and 10 for the negative class, n equals 10. That support set would therefore be described as 2-way 10-shot learning.
Now that we have a fundamental understanding of Few-shot learning, let's explore its practical application through SetFit.
SetFit, an efficient open-source framework for few-shot text classification, was jointly developed by teams from Hugging Face and Intel Labs. You can find comprehensive information about it in the project repository.
Using just 8 labeled examples per class from the Customer Reviews (CR) sentiment dataset, SetFit achieves results on par with fine-tuning RoBERTa Large on the complete training set consisting of three thousand examples. It's worth highlighting that the fine-tuned RoBERTa model is three times larger than the SetFit model utilized in this comparison.
Below is the architecture of SetFit.
SetFit is fast and efficient to train, with performance competitive with much larger models such as GPT-3 and T-Few.
The image below shows that SetFit outperforms RoBERTa in the few-shot regime.
In this instructional guide, we'll be working with a unique E-Commerce dataset comprising four distinct categories: Books, Clothing & accessories, Electronics, and Household items. The primary objective of this dataset is to categorize product descriptions sourced from E-Commerce websites into these specified labels.
To facilitate a few-shot training approach, we will select eight samples from each of the four categories, resulting in a total of 32 training samples. The remaining samples will be reserved for testing purposes.
The support set we use here is therefore 4-way 8-shot.
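If you need to build such a split yourself, setfit ships a sample_dataset helper that draws a fixed number of examples per class. Below is a minimal sketch, assuming full_train_dataset is a Hugging Face Dataset holding the full labeled training data (the CSVs used later in this article already contain this 8-per-class selection):
from setfit import sample_dataset

# Draw 8 examples per class: a 4-way 8-shot support set of 32 samples
# (full_train_dataset is a hypothetical Dataset with "Text" and "Label" columns)
few_shot_train = sample_dataset(full_train_dataset, label_column="Label", num_samples=8)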
A sample of the custom E-Commerce dataset is shown in the figure below.
We use the pre-trained Sentence Transformers model "all-mpnet-base-v2" to transform textual data into vector embeddings. This model generates 768-dimensional embeddings for the input text.
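As a quick sanity check of this embedding step (a small standalone sketch, separate from the training code below):
from sentence_transformers import SentenceTransformer

# Encode one sample product description into a 768-dimensional vector
encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
embedding = encoder.encode("Hardcover edition of a classic mystery novel")
print(embedding.shape)  # (768,)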
We will start the implementation of SetFit by installing the required packages into our conda environment.
!pip3 install setfit
!pip3 install scikit-learn
!pip3 install transformers
!pip3 install sentence-transformers
Having installed the packages, we can now load our dataset.
from datasets import load_dataset

dataset = load_dataset('csv', data_files={
    "train": 'E_Commerce_Dataset_Train.csv',
    "test": 'E_Commerce_Dataset_Test.csv'
})
Let's take a look at the number of training and test samples.
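A quick way to check is to print the DatasetDict, which lists the columns and the row count of each split:
# Shows the features and num_rows of the train and test splits
print(dataset)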
To convert the text labels into numeric labels, we will use LabelEncoder from scikit-learn.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
With the LabelEncoder we encode both the training and test labels and write the encoded values back into the "Label" column of the dataset. Note that the encoder is fitted on the training labels and only applied (not re-fitted) to the test labels, so both splits share the same mapping.
# Fit the encoder on the training labels and replace the text labels with integers
Encoded_Product = le.fit_transform(dataset["train"]["Label"])
dataset["train"] = dataset["train"].remove_columns("Label").add_column("Label", Encoded_Product)

# Reuse the fitted encoder (transform, not fit_transform) so test labels get the same mapping
Encoded_Product = le.transform(dataset["test"]["Label"])
dataset["test"] = dataset["test"].remove_columns("Label").add_column("Label", Encoded_Product)
Now we will initialize the SetFit model from the pre-trained sentence-transformers checkpoint and set up the trainer.
from setfit import SetFitModel, SetFitTrainer
from sentence_transformers.losses import CosineSimilarityLoss

# Initialize the SetFit model from the pre-trained sentence-transformers checkpoint
model_id = "sentence-transformers/all-mpnet-base-v2"
model = SetFitModel.from_pretrained(model_id)

trainer = SetFitTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    loss_class=CosineSimilarityLoss,  # contrastive loss used to fine-tune the embeddings
    metric="accuracy",
    batch_size=64,
    num_iterations=20,  # number of text pairs generated per sample for contrastive learning
    num_epochs=2,       # number of epochs for contrastive fine-tuning
    column_mapping={"Text": "text", "Label": "label"}  # map our columns to the expected names
)
Having initialized the model and trainer, we now call the training procedure.
trainer.train()
Once the training is completed for 2 epochs, we will evaluate the trained model on the eval_dataset.
trainer.evaluate()
Our trained model achieved an accuracy of 87.5%.
While an accuracy of 87.5% might not typically be considered high, remember that our model was trained on just 32 samples. Given the limited dataset size, achieving 87.5% on the test dataset is actually quite impressive, and it indicates that the model performs admirably under these circumstances.
SetFit also lets you save the trained model to local storage and load it back from disk for future predictions.
# Save the trained model to local storage
trainer.model.save_pretrained("SetFit_ECommerce_Output/")

# Load the saved model back from disk
model = SetFitModel.from_pretrained("SetFit_ECommerce_Output/", local_files_only=True)
To make predictions on new data:
input = ["Campus Sutra Men's Sports Jersey T-Shirt Cool-Gear: Our Proprietary Moisture Management technology. Helps to absorb and evaporate sweat quickly. Keeps you Cool & Dry. Ultra-Fresh: Fabrics treated with Ultra-Fresh Antimicrobial Technology. Ultra-Fresh is a trademark of (TRA) Inc, Ontario, Canada. Keeps you odour free."]
output = model(input)
The predicted output is 1, which the label encoder maps to "Clothing & Accessories".
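Rather than reading the mapping off by hand, the fitted LabelEncoder can decode the prediction directly (assuming le and output from the steps above are still in scope):
# Convert the encoded prediction back into its original text label
print(le.inverse_transform([int(output[0])]))  # ['Clothing & Accessories']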
Our model therefore demonstrates both accuracy and efficiency when compared with traditional ML models, which require substantial training resources, in both time and data, to become proficient.
After reading this article, I believe you now have a fundamental grasp of few-shot learning and how to apply it using SetFit. To deepen your understanding, I strongly advise you to pick a practical scenario, create a dataset, write the code, and enjoy the learning process. Those who are intrigued may also explore zero-shot and one-shot learning. Keep your learning journey going!