In machine learning and deep learning, more training data generally helps your models perform better. When collecting more data is not practical, you can create it with data augmentation: a technique that enlarges a dataset by generating modified copies of the existing samples.
“We don’t have better algorithms. We just have more data.” - Peter Norvig
It is a good practice to use data augmentation techniques if you have a small dataset for your project or you want to reduce overfitting in your ML or deep learning (DL) models.
In this article, you will learn how to perform data augmentation by using a new open-source library from Facebook called AugLy.
What is AugLy?
AugLy is a data augmentation library that can help you evaluate and improve the robustness of your models. The library supports four modalities (audio, video, image, and text) and it contains over 100 ways to perform data augmentations.
If you are working on a machine learning or deep learning project that uses audio, video, image, or text datasets, you can use this library to increase your data and improve your model performance.
The library was developed by Joanna Bitton (a software engineer at Facebook AI), Zoe Papakipos (a research engineer at FAIR), and other researchers and engineers at Facebook.
The library has been used in different projects such as:
- Image Similarity Challenge - a NeurIPS 2021 competition run by Facebook AI with $200k in prizes. It has produced the DISC21 dataset, which will be made publicly available after the challenge concludes!
- DeepFake Detection Challenge - a Kaggle competition run by Facebook AI in 2020 with $1 million in prizes; also produced the DFDC dataset.
- SimSearchNet - a near-duplicate detection model developed at Facebook AI to identify infringing content on the platforms.
How to Install AugLy
AugLy is a Python 3.6+ library. It can be installed with:
pip install augly
Note: The above command installs only the base requirements needed for the image and text modalities. For the audio and video modalities, you can install the extra dependencies with:
pip install augly[av]
In some environments, pip doesn't install python-magic as expected. In that case, you will need to additionally run:
conda install -c conda-forge python-magic
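Before moving on, it can help to confirm that the packages are visible to your Python environment. The snippet below is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper, and it assumes python-magic is imported under the module name `magic`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "magic" is assumed to be the import name of the python-magic package.
for name in ("augly", "magic"):
    print(name, "installed" if is_installed(name) else "missing")
```

If either name is reported as missing, rerun the matching install command above in the same environment.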
Data Augmentation Techniques for Text Data
The first step is to import the text modality, which contains the augmentation techniques for text data.
import augly.text as textaugs
Then create a simple text input.
# Define input text
input_text = "Hello, world! Today we learn Data Augmentation techniques"
Now we can apply various augmentations as follows:
(a) Simulate Typos
This function simulates typos in each text using misspellings, keyboard distance, and swapping techniques.
print(textaugs.simulate_typos(input_text))
Hello, world! Today ew leanr Dtaa Augmentation techniques
As you can see, this technique misspells some words and swaps characters in others.
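To illustrate the swapping part of this idea, here is a minimal pure-Python sketch. It is not AugLy's implementation: `swap_adjacent_chars` and `swap_typos` are hypothetical helpers, and `word_p` and `seed` are illustrative parameters that control how many words are changed and make the result repeatable.

```python
import random


def swap_adjacent_chars(word, rng):
    """Swap two neighbouring characters in a word, a common typing mistake."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def swap_typos(text, word_p=0.3, seed=0):
    """Apply a swap typo to each space-separated word with probability word_p."""
    rng = random.Random(seed)
    words = text.split(" ")
    return " ".join(
        swap_adjacent_chars(w, rng) if rng.random() < word_p else w for w in words
    )


print(swap_typos("Hello, world! Today we learn Data Augmentation techniques"))
```

Swapping only reorders characters, so the output always has the same length and the same letters as the input.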
(b) Insert Punctuation Chars
You can insert punctuation characters in each input text.
print(textaugs.insert_punctuation_chars(input_text))
['H,e,l,l,o,,, ,w,o,r,l,d,!, ,T,o,d,a,y, ,w,e, ,l,e,a,r,n, ,D,a,t,a, ,A,u,g,m,e,n,t,a,t,i,o,n, ,t,e,c,h,n,i,q,u,e,s']
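The sample output above can be reproduced in plain Python by joining the characters of the text with a comma. This is only a sketch of the character-level case: `insert_between_chars` is a hypothetical helper and the fixed `","` separator is an assumption based on the output shown, not a description of how AugLy picks punctuation.

```python
def insert_between_chars(text, punct=","):
    """Insert a punctuation character between every pair of characters."""
    return punct.join(text)


input_text = "Hello, world! Today we learn Data Augmentation techniques"
print(insert_between_chars(input_text))
```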
(c) Replace Bidirectional
This technique reverses each word (or part of the word) in each input text and uses bidirectional marks to render the text in its original order. It reverses each word separately which keeps the word order even when a line wraps.
print(textaugs.replace_bidirectional(input_text))
['\u202eseuqinhcet noitatnemguA ataD nrael ew yadoT !dlrow ,olleH\u202c']
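The output above is the input reversed and wrapped in two Unicode control characters: U+202E (right-to-left override) at the start and U+202C (pop directional formatting) at the end. A renderer that honours these marks displays the reversed characters right to left, so the text looks unchanged to a reader while the underlying string differs. The sketch below reproduces that sample with a hypothetical helper, `reverse_with_bidi_marks`; it reverses the whole string and does not cover the per-word behaviour described above.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def reverse_with_bidi_marks(text):
    """Reverse the text and wrap it in marks that flip the display order."""
    return RLO + text[::-1] + PDF


input_text = "Hello, world! Today we learn Data Augmentation techniques"
augmented = reverse_with_bidi_marks(input_text)
print(repr(augmented))
```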
(d) Replace Similar Characters
This replaces letters in each text with similar characters.
print(textaugs.replace_similar_chars(input_text))
Hello, wor7d! T()day we learn Data Augm3^tati[]n techniques
As you can see, the character “l” has been replaced with the number 7, “o” with “()” in one place and “[]” in another, “e” with the number 3, and “n” with “^”.
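The same idea can be sketched with a small lookup table. The table below is hypothetical and contains only the substitutions visible in the sample output; `replace_lookalikes`, `char_p`, and `seed` are illustrative names, not AugLy's API.

```python
import random

# Hypothetical lookalike table, taken from the substitutions in the sample output.
LOOKALIKES = {"l": ["7"], "o": ["()", "[]"], "e": ["3"], "n": ["^"]}


def replace_lookalikes(text, char_p=0.3, seed=0):
    """Replace each mapped character with a lookalike with probability char_p."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        options = LOOKALIKES.get(ch)
        if options and rng.random() < char_p:
            out.append(rng.choice(options))
        else:
            out.append(ch)
    return "".join(out)


print(replace_lookalikes("Hello, world! Today we learn Data Augmentation techniques"))
```

Raising `char_p` makes the text harder to read for people and for models, which is useful when you want to test robustness at different noise levels.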
(e) Replace Upside Down
This flips words in the text upside down depending on the granularity.
print(textaugs.replace_upside_down(input_text))
sǝnbᴉuɥɔǝʇ uoᴉʇɐʇuǝɯɓnⱯ ɐʇɐᗡ uɹɐǝl ǝʍ ʎɐpoꞱ ¡plɹoʍ 'ollǝH
(f) Split Words
This function splits words in the text into subwords.
print(textaugs.split_words(input_text))
He llo, world! To day we learn Data Augmentation techniques
Data Augmentation Techniques for Image Data
The first step is to import the image modality, which contains the augmentation techniques for image data, along with the helper modules used to display the results.
import os
import augly.image as imaugs
import augly.utils as utils
from IPython.display import display
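Now we can apply various augmentations as follows. Note that each example reassigns input_img, so every augmentation is applied on top of the previous result.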
(a) Image Scaling
The scale function can help you to alter the resolution of an image. You can use an argument called factor to define the ratio by which the image should be downscaled or upscaled.
input_img_path = "images/simple-image.jpg"
# We can use the AugLy scale augmentation
input_img = imaugs.scale(input_img_path, factor=0.2)
display(input_img)
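As a quick illustration of what factor means for the output resolution, the sketch below computes the scaled dimensions for a given image size. It assumes both sides are multiplied by the factor and truncated to whole pixels; `scaled_size` is a hypothetical helper, and the exact rounding AugLy uses is not confirmed here.

```python
def scaled_size(width, height, factor):
    """Return the (width, height) obtained by scaling both sides by factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return int(width * factor), int(height * factor)


# A factor below 1.0 downscales, a factor above 1.0 upscales.
print(scaled_size(1000, 500, 0.5))
print(scaled_size(1000, 500, 2.0))
```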
(b) Blur the Image
The blur function takes a radius argument: the larger the radius, the blurrier the image.
input_img = imaugs.blur(input_img, radius=5.0)
display(input_img)
(c) Change the Brightness of the Image
To change the brightness you need to adjust the factor argument in this function. Values less than 1.0 darken the image and values greater than 1.0 brighten the image. Setting the factor to 1.0 will not alter the image's brightness.
Let's set the factor to 1.5.
input_img = imaugs.brightness(input_img, factor=1.5)
display(input_img)
Then let’s set the factor to 0.5 to make the image darker.
# Make it darker
input_img = imaugs.brightness(input_img, factor=0.5)
display(input_img)
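To make the effect of factor concrete, here is a minimal pure-Python sketch on a couple of 8-bit pixel values. It assumes brightness is applied by multiplying each value by the factor and clamping the result to the 0-255 range; that is a common formulation, but it is an assumption here rather than AugLy's confirmed implementation, and `adjust_brightness` is a hypothetical helper.

```python
def adjust_brightness(pixels, factor):
    """Multiply 8-bit pixel values by factor and clamp them to 0-255."""
    return [max(0, min(255, int(p * factor))) for p in pixels]


pixels = [100, 200]
brighter = adjust_brightness(pixels, 1.5)  # 200 * 1.5 = 300 is clamped to 255
darker = adjust_brightness(brighter, 0.5)  # applied to the brightened values
print(brighter, darker)
```

Because the second call works on the already brightened values, pixels that were clamped at 255 do not return to their original levels. The same applies to the examples above, which reuse input_img between calls.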
(d) Change the Aspect Ratio of the Image
In this function, the ratio argument is the width divided by the height of the new image you want to create.
input_img = imaugs.change_aspect_ratio(input_img, ratio=0.8)
display(input_img)
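One way to turn a target ratio into concrete dimensions is to keep the pixel area roughly constant and solve for the new width and height. The sketch below does that; keeping the area constant is an assumption made for illustration, and `dims_for_aspect_ratio` is a hypothetical helper rather than part of AugLy.

```python
import math


def dims_for_aspect_ratio(width, height, ratio):
    """Return (new_width, new_height) with width/height close to ratio and a similar area."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    area = width * height
    new_width = int(math.sqrt(ratio * area))
    new_height = int(area / new_width)
    return new_width, new_height


new_w, new_h = dims_for_aspect_ratio(1000, 500, 0.8)
print(new_w, new_h, round(new_w / new_h, 3))
```

A ratio below 1.0 produces a portrait image, and a ratio above 1.0 produces a landscape one.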
(e) Alter the Contrast of the Image
In this function, the factor argument controls everything. A factor of zero gives a grayscale image, values below 1.0 decrease the contrast, a factor of 1.0 gives the original image, and a factor greater than 1.0 increases the contrast.
input_img = imaugs.contrast(input_img, factor=1.7)
display(input_img)
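A common way to express a contrast change is to push each pixel value away from, or pull it toward, the mean intensity. The sketch below uses that formulation on a few 8-bit values; it is an assumption for illustration, not AugLy's confirmed implementation, and `adjust_contrast` is a hypothetical helper.

```python
def adjust_contrast(pixels, factor):
    """Blend each value with the mean intensity: mean + factor * (value - mean)."""
    mean = sum(pixels) / len(pixels)
    return [max(0, min(255, int(mean + factor * (p - mean)))) for p in pixels]


pixels = [50, 100, 150]
print(adjust_contrast(pixels, 0.0))  # every value collapses to the mean
print(adjust_contrast(pixels, 1.0))  # unchanged
print(adjust_contrast(pixels, 2.0))  # values spread further from the mean
```

Under this formulation, a factor of zero removes all variation and leaves a flat gray, which is why low factors look washed out.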
(f) Crop the Image
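To crop the image, you need to define the positions of the left (x1), right (x2), top (y1), and bottom (y2) edges of the cropped image. Each position is a fraction between 0 and 1, relative to the width or height of the original image.
input_img = imaugs.crop(
    input_img,
    x1=0.25,
    x2=0.75,
    y1=0.25,
    y2=0.75,
)
display(input_img)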
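To see what these fractional edges mean in pixels, the sketch below converts them into a (left, top, right, bottom) box for a given image size. `crop_box` is a hypothetical helper written for illustration, and truncating to whole pixels is an assumption.

```python
def crop_box(width, height, x1=0.25, y1=0.25, x2=0.75, y2=0.75):
    """Convert fractional crop edges into a (left, top, right, bottom) pixel box."""
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError("edges must satisfy 0 <= x1 < x2 <= 1 and 0 <= y1 < y2 <= 1")
    return (int(width * x1), int(height * y1), int(width * x2), int(height * y2))


left, top, right, bottom = crop_box(800, 600)
print((left, top, right, bottom), "->", (right - left, bottom - top))
```

With the values used above, the crop keeps the central half of the image in each dimension, so an 800x600 image becomes 400x300.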
Final Thoughts on Data Augmentation with the AugLy Library
In this article, you have learned why data augmentation matters in an ML or DL project, and how to perform it on image and text data with the AugLy library.
As mentioned earlier, the library has over 100 augmentation techniques, and most of them were not covered in this article.
If you want to learn how to perform data augmentation for audio and video data, please read the README for each modality:
- Audio - https://github.com/facebookresearch/AugLy/tree/main/augly/audio
- Video - https://github.com/facebookresearch/AugLy/tree/main/augly/video
If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in the next post!
You can also find me on Twitter @Davis_McDavid.