Mastering Few-Shot Learning with SetFit for Text Classification

by Shyam Ganesh S, September 13th, 2023

Too Long; Didn't Read

SetFit, a few-shot learning framework from Hugging Face and Intel Labs, fine-tunes sentence transformers for text classification using only a handful of labeled examples per class. This article revisits traditional supervised learning, introduces few-shot (k-way, n-shot) learning, and walks through training a SetFit model on a custom E-Commerce dataset with just 8 samples per class, reaching 87.5% accuracy.


In this article, I will acquaint you with the notion of few-shot learning, focusing on the widely used SetFit approach for text classification.


To begin our learning journey, we'll start by revisiting Traditional Machine Learning techniques. Afterward, we'll transition into the realm of Few-shot Learning using SetFit for the purpose of text classification on an E-Commerce dataset.


Traditional ML

In supervised machine learning, a substantial dataset is employed for model training. After completing the training process, we utilize test data to obtain predictions from the model. The model's ability to make precise predictions is honed through extensive training on this large dataset. However, a notable drawback of this conventional supervised learning approach is the necessity for an extensive and error-free training dataset, which may not always be accessible across all domains. Consequently, this has led to the emergence of the concept of few-shot learning.


Prior to delving into Sentence Transformer fine-tuning (SetFit), it is essential to briefly revisit a crucial aspect of Natural Language Processing (NLP) known as few-shot learning.


Few-shot Learning

Few-shot learning involves training a model using limited training datasets, where the model gains knowledge from these small sets referred to as support sets. The key distinction lies in teaching few-shot models to discern both the similarities and differences among the training data. For instance, instead of instructing the model to classify images as cats or dogs, we instruct it to grasp the commonalities and distinctions among various animals. This approach, which focuses on understanding the similarities and distinctions within input data, is commonly referred to as meta-learning or learning-to-learn.


A few-shot setup is often described as k-way, n-shot learning, where "k" is the number of classes in the support set and "n" is the number of samples available for each class. In binary classification, for instance, k equals 2; and if there are 10 data points for the positive class and 10 for the negative class, n equals 10. Such a support set is described as 2-way, 10-shot learning.
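As a toy illustration with made-up data, that 2-way, 10-shot support set could be written as:

# Hypothetical 2-way, 10-shot support set:
# k = number of classes, n = labeled examples per class
support_set = {
    "positive": [f"positive review {i}" for i in range(10)],
    "negative": [f"negative review {i}" for i in range(10)],
}
k = len(support_set)              # 2 classes  -> 2-way
n = len(support_set["positive"])  # 10 samples -> 10-shot
print(f"{k}-way, {n}-shot support set")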


Now that we have a fundamental understanding of Few-shot learning, let's explore its practical application through SetFit.



SetFit: Sentence Transformer fine-tuning

SetFit, an open-source framework for efficient few-shot classification, was jointly developed by teams from Hugging Face and Intel Labs. You can find comprehensive information about it in the project repository.


Using just 8 labeled examples per class from the Customer Reviews (CR) sentiment dataset, SetFit achieves results on par with fine-tuning RoBERTa Large on the complete training set consisting of three thousand examples. It's worth highlighting that the fine-tuned RoBERTa model is three times larger than the SetFit model utilized in this comparison.


Below is the architecture of SetFit.


SetFit architecture

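In brief, SetFit trains in two stages: it first fine-tunes the sentence-transformer body with a contrastive objective on pairs of labeled examples (same-class pairs are pulled together, different-class pairs pushed apart), then fits a lightweight classification head on the resulting embeddings. The following is a conceptual sketch of the second stage only, using hypothetical data and a scikit-learn head; it illustrates the idea rather than the library's internals:

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Stage 1 (contrastive fine-tuning of the body) is elided here
body = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Stage 2: embed the few labeled texts and fit a simple classification head
texts = ["sample labeled text A", "sample labeled text B"]  # hypothetical data
labels = [0, 1]
embeddings = body.encode(texts)
head = LogisticRegression().fit(embeddings, labels)

print(head.predict(body.encode(["a new, unseen text"])))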


Implementation of Few-shot learning: SetFit

SetFit is fast and efficient to train, with competitive performance compared to much larger models like GPT-3 and T-Few.



SetFit comparison with T-Few 3B model



The image below shows that SetFit outperforms RoBERTa in the few-shot regime.

SetFit comparison with RoBERTa



Dataset

In this instructional guide, we'll be working with a unique E-Commerce dataset comprising four distinct categories: Books, Clothing & accessories, Electronics, and Household items. The primary objective of this dataset is to categorize product descriptions sourced from E-Commerce websites into these specified labels.


To facilitate a few-shot training approach, we will select eight samples from each of the four categories, resulting in a total of 32 training samples. The remaining samples will be reserved for testing purposes.


The support set used here therefore corresponds to 4-way, 8-shot learning.
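Here the 8-per-class split is prepared ahead of time as a separate CSV. If you instead start from a single labeled pool, the setfit library provides a sample_dataset helper that draws a fixed number of examples per class; a minimal sketch, assuming a hypothetical E_Commerce_Dataset_Full.csv with the same Text/Label columns:

from datasets import load_dataset
from setfit import sample_dataset

# Hypothetical single CSV holding the full labeled pool
full_dataset = load_dataset("csv", data_files="E_Commerce_Dataset_Full.csv")["train"]

# Draw 8 examples per class to form the few-shot training split
few_shot_train = sample_dataset(full_dataset, label_column="Label", num_samples=8)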


A sample of the custom E-Commerce dataset is shown in the figure below.


sample of the custom E-Commerce dataset



We employ the Sentence Transformers pre-trained model called "all-mpnet-base-v2" to transform textual data into vector embeddings. This model generates vector embeddings with a dimensionality of 768 for the input text.
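As a quick sanity check, a minimal sketch with the sentence-transformers API confirms the embedding size:

from sentence_transformers import SentenceTransformer

# Encode one sample text and inspect the embedding dimensionality
st_model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
embedding = st_model.encode("A sample product description")
print(embedding.shape)  # (768,)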


We will start the implementation of SetFit by installing the required packages into a conda environment.

!pip3 install setfit
!pip3 install scikit-learn
!pip3 install transformers
!pip3 install sentence-transformers
!pip3 install datasets


Having installed the packages, we can now load our dataset.

from datasets import load_dataset

# Load the pre-split train/test CSV files
dataset = load_dataset('csv', data_files={
    "train": 'E_Commerce_Dataset_Train.csv',
    "test": 'E_Commerce_Dataset_Test.csv'
})


We will take a look at the number of training and test samples.
Train and Test data
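A quick way to check the split sizes from code (the counts below follow this tutorial's setup):

# The train split should hold 32 rows: 8 per class across 4 categories
print(dataset["train"].num_rows)
print(dataset["test"].num_rows)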


To convert text labels into encoded labels, we will use LabelEncoder from the scikit-learn package.

from sklearn.preprocessing import LabelEncoder 
le = LabelEncoder()



With the LabelEncoder, we will encode both the training and testing labels and add the encoded values back into the "Label" column of the dataset.

# Fit the encoder on the training labels and replace the text labels
Encoded_Product = le.fit_transform(dataset["train"]['Label'])
dataset["train"] = dataset["train"].remove_columns("Label").add_column("Label", Encoded_Product).cast(dataset["train"].features)

# Reuse the fitted encoder (transform, not fit_transform) so the test set
# gets the same label-to-integer mapping
Encoded_Product = le.transform(dataset["test"]['Label'])
dataset["test"] = dataset["test"].remove_columns("Label").add_column("Label", Encoded_Product).cast(dataset["test"].features)



Now we will initialize the SetFit model from a pre-trained sentence-transformers checkpoint and set up the trainer.

from setfit import SetFitModel, SetFitTrainer
from sentence_transformers.losses import CosineSimilarityLoss

# Load the pre-trained sentence-transformers checkpoint as the SetFit body
model_id = "sentence-transformers/all-mpnet-base-v2"
model = SetFitModel.from_pretrained(model_id)

trainer = SetFitTrainer(
  model=model,
  train_dataset=dataset["train"],
  eval_dataset=dataset["test"],
  loss_class=CosineSimilarityLoss,  # contrastive loss used to fine-tune the embeddings
  metric="accuracy",
  batch_size=64,
  num_iterations=20,  # text pairs generated per sample for contrastive learning
  num_epochs=2,
  column_mapping={"Text": "text", "Label": "label"}  # map our columns to the names SetFit expects
)



With the model and trainer initialized, we can call the training procedure.

trainer.train()


Once the training is completed for 2 epochs, we will evaluate the trained model on the eval_dataset.

metrics = trainer.evaluate()
print(metrics)  # e.g. {'accuracy': 0.875}


Our trained model produced a peak accuracy of 87.5%.

While an accuracy of 87.5% might not typically be considered high, it's important to take into account that our model was trained with just 32 samples. Given the limited dataset size, achieving an 87.5% accuracy on the test dataset is actually quite impressive, indicating that the model performs admirably well under these circumstances.



SetFit also provides the capability for you to save the trained model to your local storage and subsequently load it from the disk for use in future predictions.

# Save the fine-tuned model locally, then reload it for later predictions
trainer.model.save_pretrained(save_directory="SetFit_ECommerce_Output/")
model = SetFitModel.from_pretrained("SetFit_ECommerce_Output/", local_files_only=True)



To make predictions on new data:

input = ["Campus Sutra Men's Sports Jersey T-Shirt Cool-Gear: Our Proprietary Moisture Management technology. Helps to absorb and evaporate sweat quickly. Keeps you Cool & Dry. Ultra-Fresh: Fabrics treated with Ultra-Fresh Antimicrobial Technology. Ultra-Fresh is a trademark of (TRA) Inc, Ontario, Canada. Keeps you odour free."]
output = model(input)



The predicted output is 1, which the LabelEncoder maps to "Clothing & Accessories".
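To recover the original text label from the encoded prediction, you can reuse the fitted LabelEncoder; a small sketch (depending on the SetFit version, the model output may first need converting to a plain integer array):

import numpy as np

# Map the encoded prediction back to the original category name
pred = np.asarray(output).astype(int)
print(le.inverse_transform(pred))  # e.g. ['Clothing & Accessories']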


Therefore, our model demonstrates both accuracy and efficiency in contrast to traditional ML models, which require substantial training resources, including time and data, to become proficient.


After perusing this article, I believe you now possess a fundamental grasp of the concept of few-shot learning and how to apply it using SetFit. To gain a more profound comprehension, I strongly advise you to select a practical scenario, create a dataset, write code, and enjoy the learning process. Those who are intrigued may also explore Zero-shot Learning and One-shot Learning. Keep your learning journey going!