How Deep Learning Can Help Quantify, Monitor, and Remove Marine Plastic: The DeepPlastic Way by@gautamtata




Along with my team, I’ve been involved in a project called DeepPlastic, a novel approach that uses deep learning to identify marine plastic, and I wanted to share the findings:

85% mean-average precision!

Our model detects epipelagic plastic in the ocean with a mean Average Precision (mAP) of 85%. We achieve this level of precision with a neural network architecture called YOLOv5-S. Below, we’ve attached a video containing examples of marine plastic that we’ve run our model on:
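For readers unfamiliar with the metric: mAP is the mean, over classes, of the average precision (AP), which is the area under the precision-recall curve built from detections sorted by confidence (with a single class, mAP equals AP). A minimal pure-Python sketch; the function name and toy inputs are illustrative, not from our codebase:

```python
def average_precision(is_tp, num_gt):
    """AP for one class: area under the interpolated precision-recall curve.

    is_tp  -- True/False per detection, sorted by descending confidence
    num_gt -- number of ground-truth objects for this class
    """
    tp, fp, points = 0, 0, []
    for hit in is_tp:
        tp, fp = tp + hit, fp + (not hit)
        points.append((tp / num_gt, tp / (tp + fp)))  # (recall, precision)

    ap, prev_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(points):
        # Interpolated precision: best precision at any recall >= this one.
        precision = max(p for _, p in points[i:])
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Toy example: 4 detections scored against 4 ground-truth objects.
print(average_precision([True, True, False, True], num_gt=4))  # 0.6875
```

Real evaluations also sweep an IoU threshold to decide which detections count as true positives; that part is omitted here for brevity.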

Where’s the Paper and The Code?

This article serves mainly as a no-code summary of our research paper. If you’d like to skip straight to the nitty-gritty details of the neural network or the code, you can access them here:

arXiv pre-print:

Codebase and Dataset:

If you’d like me to write another blog post explaining the code, data augmentation, etc., let me know in the comments below!

How Did I Get Involved In This?

I’ve always been passionate about the oceans. After graduating from California State University, Monterey Bay, I knew that I wanted to build tools and software to help protect the ocean. Once I had collected, annotated, and curated the data and trained the model, I realized the precision metrics were significant and that there was a manuscript worth publishing here. So, I asked Dr. Sara-Jeanne Royer, an expert in marine plastics, for help, along with Jay Lowe and Olivier Poirion for the writing and publishing of the manuscript.


The quantification of positively buoyant marine plastic debris is critical to understanding how concentrations of trash gather across the world’s oceans and to identifying high-concentration garbage hotspots in dire need of trash removal.

Currently, the most common monitoring method for quantifying floating plastic is a manta trawl. The need for physical removal before analysis incurs high costs and requires intensive labor, preventing scalable deployment of a real-time marine plastic monitoring service across the oceans. Without better monitoring and sampling methods, the total impact of plastic pollution on the environment as a whole, and the details of its impact within specific oceanic regions, will remain unknown.

This study presents an automated workflow that takes videos and images captured within the epipelagic layer of the ocean as input and produces real-time quantification of marine plastic to support accurate monitoring and removal.

YOLOv5-S was the best-performing model, operating at a mean Average Precision (mAP) of 0.851 and an F1 score of 0.89 while maintaining near real-time speed. In addition, our method can use off-the-shelf camera equipment and standard low-cost GPUs to monitor and quantify epipelagic plastic in near real time.

Goal For The Project

We wanted to build a generalized object detector capable of identifying and quantifying sub-surface plastic around the world.

Now that we understand our goal and how to achieve it, let us jump into the workflow.


Curating The Dataset

Finding a dataset that contained annotated pictures of marine debris was incredibly hard. No existing dataset had images of marine plastic in the epipelagic layer of the ocean, so I decided to create one. I bought a GoPro Hero 9, a wetsuit, and snorkeling equipment and headed out to various locations in California, along with two plastic bags and two plastic bottles.

The locations I visited were Lake Tahoe, Bodega Bay, and San Francisco Bay. Here, I shot videos of the plastic in 4K and later broke them into images frame by frame. (All plastics used were sanitized and removed from the environment after I finished capturing the videos.) The initial dataset was more than 100,000 images, which I then painstakingly went through one by one, choosing the best images and annotating them. The final dataset, along with images scraped from the internet, was a whopping 4,000 images.
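Breaking the 4K footage into frames is easy to script. Here is a sketch of how it might be done with OpenCV; the helper names are mine, and the sampling interval is an assumption (saving every single frame of 4K video produces heavy near-duplicates):

```python
import os

def frame_indices(total_frames, keep_every):
    """Which frame numbers to keep, e.g. every 30th frame (~1/sec at 30 fps)."""
    return list(range(0, total_frames, keep_every))

def extract_frames(video_path, out_dir, keep_every=30):
    """Save a subsample of a video's frames as JPEGs; returns the count saved."""
    import cv2  # OpenCV, only needed for the actual extraction
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    keep = set(frame_indices(total, keep_every))
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx in keep:
            cv2.imwrite(os.path.join(out_dir, f"frame_{idx:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```

Subsampling like this keeps the dataset diverse; consecutive frames of the same bottle add little new information.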

I tried my best to replicate real-world conditions such as occlusion and varying brightness, for example by burying objects in sand or shooting them against the sun.


(Bottom Right) Photo by Naja Bertolt Jensen on Unsplash; (Top Right) Photo by Nariman Mesharrafa on Unsplash; (Top Left, Bottom Right) Image by Author

Data Formatting

Images were resized to 416x416 and converted into the formats that Darknet (for YOLOv4) and PyTorch (for YOLOv5) require.
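For reference, YOLO-style labels store one line per object: a class index followed by the box center and size, all normalized to [0, 1]. A small sketch of the pixel-to-YOLO conversion (the function and variable names are illustrative, not from our codebase):

```python
def to_yolo_label(box, img_w, img_h, class_id=0):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box into a YOLO
    label line: 'class x_center y_center width height', all normalized."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# A box covering the top-left quarter of a 416x416 image:
print(to_yolo_label((0, 0, 208, 208), 416, 416))
# 0 0.250000 0.250000 0.500000 0.500000
```

Because the coordinates are normalized, the same label file stays valid after the image is resized, which is why resizing to 416x416 does not require re-annotating.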

Data Augmentation

Since the final dataset contained only 4,000 images, I thought the best way to increase its size was to augment it. So, I used Flip, Rotate, and Brightness augmentations to replicate oceanic environments.

I also used grayscale (B&W) so the model would not overfit to color, and Cutout to simulate occlusion.
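To make those transforms concrete, here is a dependency-free sketch on a tiny grayscale "image" represented as a list of rows. In practice a library such as Albumentations or torchvision would apply these to real images; the toy functions below are mine:

```python
def hflip(img):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in img]

def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def brightness(img, factor):
    """Scale pixel values, clamped to the 0..255 range."""
    return [[min(255, int(p * factor)) for p in row] for row in img]

def cutout(img, top, left, size):
    """Zero out a square patch to simulate occlusion."""
    out = [row[:] for row in img]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[0]))):
            out[r][c] = 0
    return out

img = [[10, 20], [30, 40]]
print(hflip(img))            # [[20, 10], [40, 30]]
print(rotate90(img))         # [[30, 10], [40, 20]]
print(brightness(img, 2.0))  # [[20, 40], [60, 80]]
print(cutout(img, 0, 0, 1))  # [[0, 20], [30, 40]]
```

Each augmented copy counts as a new training image, so a handful of transforms can multiply an undersized dataset several times over.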

Building the Neural Network

Building a neural network was a straightforward task. I had two goals for model selection: the model had to be reasonably accurate, and it had to be fast. Fast enough to be used on buoys and UAVs. I tried many models, such as Faster R-CNN, EfficientDet, and SSD, but stuck with two: YOLOv4-Tiny and YOLOv5-S.

Interested in a code run down for YOLOv5? Let me know in the comments below or reach out to me.

Things to know/Tuning Hyperparameters:

- I used the Adam optimizer, which adapts per-parameter learning rates, together with a decaying learning rate.
- I used a package called W&B (Weights & Biases) to continuously monitor the loss.
- I used a softmax as the final layer and only a single class, called trash_plastic.
- I used Google Colab Pro with NVIDIA V100 GPUs to train the models.
- I used transfer learning from weights trained on Underwater Scenes and the Deep Sea Debris (JAMSTEC JEDI) dataset.
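A decaying learning-rate schedule of the kind described above can be sketched as plain exponential decay. The function and constants below are illustrative, not our exact hyperparameters:

```python
def decayed_lr(base_lr, decay_rate, step, decay_steps):
    """Exponential decay: the learning rate is multiplied by decay_rate
    once every decay_steps training steps (fractional steps interpolate)."""
    return base_lr * decay_rate ** (step / decay_steps)

# e.g. a base LR of 1e-3 halving every 10 epochs:
for epoch in (0, 10, 20):
    print(epoch, decayed_lr(1e-3, 0.5, epoch, 10))
```

In PyTorch, the equivalent setup pairs `torch.optim.Adam` with a scheduler such as `torch.optim.lr_scheduler.ExponentialLR`, stepped once per epoch.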

All the code used for the models, including the architectures, can be found here: YOLOv4, YOLOv5

Interested in our code? Find it here 


After a lot of experimenting with training methods, data augmentations, and fine-tuning hyperparameters, we finally reached a point where the results were good enough to be used in real-world deployments.

Best model, YOLOv5-S: Precision: 96%, mean Average Precision: 85%, F1 score: 0.89, inference speed: 2.1 ms/image (roughly 475 images per second).


(Bottom Row) Photos by JAMSTEC JEDI; (Second Row, First Image; Bottom Right) Photo by Naja Bertolt Jensen on Unsplash; All Other Images by Author

What’s Next?

At the moment, we are in the process of getting our paper published. We are also working to get the model into the hands of other researchers for testing, and we are developing innovative ways to synthesize more data.

If you’re interested, want to contribute, or want to chat, you can reach me here: [email protected]

This article was first published here