How to Use TensorFlow and Cleanvision to Detect Starfish Threats in the Great Barrier Reef

by aravindput...January 25th, 2024

Too Long; Didn't Read

The blog outlines an innovative approach to protect Australia's Great Barrier Reef using AI and machine learning. It describes the threat posed by crown-of-thorns starfish (COTS) to the reef and how traditional Manta Tow surveys are limited in efficiency and scale. The Great Barrier Reef Foundation, in partnership with CSIRO and Google, initiated a program to use underwater cameras and AI for more effective COTS detection. The technology leverages TensorFlow, CleanVision, KerasCV, and YOLOv8 to identify starfish in underwater videos. CleanVision is used to clean image data for machine learning, ensuring high-quality inputs. The process includes downloading a Kaggle dataset, preparing and augmenting the data, and visualizing bounding boxes. The model, based on YOLOv8, is trained to detect COTS accurately. This AI-powered approach exemplifies the union of technology and ecology for sustainable conservation, highlighting the potential of AI in environmental protection, especially in complex habitats like the Great Barrier Reef.

The Great Barrier Reef in Australia is stunningly beautiful and the largest coral reef in the world, home to an incredible diversity of marine life.

Unfortunately, the reef faces threats from crown-of-thorns starfish (COTS) that eat coral. To control COTS outbreaks, reef managers use a method called Manta Tow surveys where divers are towed behind a boat, visually assessing sections of the reef.

However, this method has limitations in efficiency, data quality, and scalability.

To improve COTS surveillance and control, the Great Barrier Reef Foundation started an innovation program. A key part is using underwater cameras to collect reef images and applying AI to detect COTS automatically.

To develop the machine learning technology for this video-based surveying at scale, Australia's national science agency CSIRO has partnered with Google. Their goal is to create systems that can accurately and efficiently analyze large volumes of imagery to pinpoint COTS outbreaks across the vast Great Barrier Reef in near real-time.

This would greatly assist conservation efforts and help secure the long-term protection of the reef ecosystem.

In summary, AI-powered image analysis is being applied to automatically detect crown-of-thorns starfish in footage of the Great Barrier Reef, so that outbreaks can be monitored and controlled more efficiently and at a much larger scale than manual surveys allow.

The goal of the associated Kaggle competition (tensorflow-great-barrier-reef) is to accurately identify starfish in real time by building an object detection model trained on underwater videos of coral reefs. In this blog post, we will build an ML pipeline that analyzes reef images and predicts the presence and position of crown-of-thorns starfish.

We will use CleanVision, an open-source package that applies data-centric AI to find issues in image data, together with TensorFlow, KerasCV, and YOLOv8, which are used for computer vision tasks like object detection and image classification.

!pip install -q cleanvision keras-cv kaggle

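Because the dataset is large, we download it directly from Kaggle with the Kaggle CLI. For that, you need to place your Kaggle API credentials (kaggle.json) where the CLI can find them and restrict the file's permissions. You also need to accept the competition rules on Kaggle before the download will work.

! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json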
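The same credential setup can be done from Python, which is handy outside a notebook. The following is a minimal sketch using only the standard library; the helper name `install_kaggle_credentials` and its arguments are illustrative and not part of the Kaggle CLI.

```python
import os
import shutil
import stat


def install_kaggle_credentials(src_path, dest_dir):
    """Copy a Kaggle API token into dest_dir and make it readable by the owner only."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, "kaggle.json")
    shutil.copyfile(src_path, dest_path)
    # Equivalent of `chmod 600`: read/write for the owner, nothing for anyone else
    os.chmod(dest_path, stat.S_IRUSR | stat.S_IWUSR)
    return dest_path
```

A typical call would be `install_kaggle_credentials("kaggle.json", os.path.expanduser("~/.kaggle"))`.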

Downloading and Extracting the Kaggle Dataset

!kaggle competitions download -c tensorflow-great-barrier-reef

!unzip /content/tensorflow-great-barrier-reef.zip

Using Cleanvision to Clean Up the Data

CleanVision is an open-source data-centric AI package that automatically identifies problematic images that could negatively impact computer vision systems. It scans image datasets and detects common data issues like blur, inadequate lighting, duplicate images, etc.

Fixing these problems in the training data is an essential first step before developing machine learning models to ensure their reliability.

CleanVision provides a simple unified interface - just run the same few lines of Python code to audit any image collection, regardless of size or origin. This makes it extremely easy to clean dirty image data.

By proactively flagging low-quality, duplicate, and otherwise undesirable images upfront, CleanVision helps save significant time and improves outcomes when building computer vision applications.

It's a fast way to boost data hygiene as a prerequisite before applying machine learning to image-based tasks.

from cleanvision import Imagelab

# Specify path to folder containing the image files in your dataset
imagelab = Imagelab(data_path="train_images/")

# Automatically check for a predefined list of issues within your dataset
imagelab.find_issues()

# Produce a neat report of the issues found in your dataset
imagelab.report()

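import os

# Each duplicate set lists the paths of identical images: keep the first, delete the rest
for st in imagelab.info['exact_duplicates']['sets']:
    for duplicate_path in st[1:]:
        os.remove(duplicate_path)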
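To make the idea of exact duplicates concrete, here is a small standard-library sketch that groups files by a hash of their bytes and keeps one file per group. It illustrates the concept only and is not CleanVision's implementation; the function names `find_exact_duplicate_sets` and `remove_extra_copies` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def find_exact_duplicate_sets(folder):
    """Return lists of file paths whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for root, _, files in os.walk(folder):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_hash[digest].append(path)
    # Only hashes shared by two or more files are duplicate sets
    return [paths for paths in by_hash.values() if len(paths) > 1]


def remove_extra_copies(duplicate_sets):
    """Keep the first file of every set and delete the remaining copies."""
    removed = []
    for paths in duplicate_sets:
        for path in paths[1:]:
            os.remove(path)
            removed.append(path)
    return removed
```

Hashing the raw bytes only catches files that are identical down to the last byte; near-duplicates such as re-encoded or slightly shifted frames need a perceptual comparison instead.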

Importing Relevant Libraries

# import miscellaneous modules
import os
import time

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import tensorflow as tf
import keras
import keras_cv  # used by the augmentation, visualization, and model cells below

# show images inline
%matplotlib inline

Preparing Dataset for Preprocessing

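We will read the dataset and then preprocess it so that it can be used for model training. We begin by loading the train CSV file, dropping frames without annotations, and parsing the annotations column, which stores each frame's boxes as a string, into Python lists of dictionaries with x, y, width, and height keys.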

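import ast

df_train = pd.read_csv("train.csv")

# Keep only frames that contain at least one annotated starfish
df_train = df_train.loc[df_train["annotations"].astype(str) != "[]"].copy()
# Parse the string form of each annotation list (literals only, no arbitrary code execution)
df_train['annotations'] = df_train['annotations'].apply(ast.literal_eval)

df_train['image_path'] = "train_images/video_" + df_train['video_id'].astype(str) + "/" + df_train['video_frame'].astype(str) + ".jpg"
# Drop frames whose image file no longer exists (for example, removed as a duplicate)
df_train = df_train[df_train['image_path'].apply(os.path.exists)]
df_extrain = df_train.explode('annotations')  # Single annotation per row
df_extrain.reset_index(inplace=True)
df_extrain.head()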

df_extrain_main=pd.DataFrame(pd.json_normalize(df_extrain['annotations']), columns=['x', 'y', 'width', 'height']).join(df_extrain)
df_extrain_main['class']=0
df_extrain_main=df_extrain_main[['image_path','x','y','width','height','class','video_id','video_frame']]
df_extrain_main.head(10)
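To illustrate what the parsing step does to a single row, the sketch below uses only the standard library and `ast.literal_eval`, a stricter alternative to `eval` that only accepts Python literals. The sample string is made up for illustration, but it follows the x, y, width, and height keys used above.

```python
import ast

# Hypothetical value of the `annotations` column for one video frame
raw = "[{'x': 559, 'y': 213, 'width': 50, 'height': 32}, {'x': 100, 'y': 40, 'width': 20, 'height': 10}]"

# Turn the string into a list of dictionaries
annotations = ast.literal_eval(raw)

# One row per annotation, mirroring what DataFrame.explode does for this frame
rows = [
    {"x": a["x"], "y": a["y"], "width": a["width"], "height": a["height"], "class": 0}
    for a in annotations
]

for row in rows:
    print(row)
```

A frame with two starfish therefore becomes two rows that share the same image path.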

After the CSV processing, we convert each annotation from the [x, y, width, height] format to corner coordinates [xmin, ymin, xmax, ymax] and collect the image paths, bounding boxes, and class labels into lists that form the preliminary dataset, ready to be processed further.

def create_tf_example(rowss, data_df):
    # data_df is unused; it is kept so that existing calls keep working

    # Convert [x, y, width, height] ---> [xmin, ymin, xmax, ymax]
    xmin = rowss['x']
    xmax = rowss['x'] + rowss['width']
    ymin = rowss['y']
    ymax = rowss['y'] + rowss['height']

    return rowss['image_path'], xmin, ymin, xmax, ymax

paths = []
bboxes = []
classes = []
# Group the per-annotation rows by image so each image appears once with all of its boxes
for image_path, group in df_extrain_main.groupby('image_path', sort=False):
    boxes = []
    for _, row in group.iterrows():
        _, xmin, ymin, xmax, ymax = create_tf_example(row, df_extrain_main)
        boxes.append([float(xmin), float(ymin), float(xmax), float(ymax)])
    paths.append(image_path)
    bboxes.append(boxes)
    classes.append([0] * len(boxes))  # single class: crown-of-thorns starfish
print('Processed {0} images.'.format(len(paths)))
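As a quick sanity check of the box conversion, here is a self-contained sketch with a made-up annotation. The helpers `xywh_to_xyxy` and `xyxy_to_xywh` are hypothetical and simply restate the arithmetic in create_tf_example, assuming x and y mark the top-left corner as the conversion above does.

```python
def xywh_to_xyxy(x, y, width, height):
    """Convert a top-left corner plus size into corner coordinates."""
    return [float(x), float(y), float(x + width), float(y + height)]


def xyxy_to_xywh(xmin, ymin, xmax, ymax):
    """Inverse conversion, useful for checking that nothing was lost."""
    return [float(xmin), float(ymin), float(xmax - xmin), float(ymax - ymin)]


box = xywh_to_xyxy(559, 213, 50, 32)
print(box)                 # [559.0, 213.0, 609.0, 245.0]
print(xyxy_to_xywh(*box))  # [559.0, 213.0, 50.0, 32.0]
```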

Here, we are using tf.ragged.constant to create ragged tensors from the bbox and classes lists. A ragged tensor is a type of tensor that can handle varying lengths of data along one or more dimensions.

This is useful when dealing with data that has variable-length sequences, such as text or time series data.

In this case, the bbox and classes lists have different lengths for each image, depending on the number of objects in the image and the corresponding bounding boxes and classes. To handle this variability, ragged tensors are used instead of regular tensors.

Later, these ragged tensors are used to create a tf.data.Dataset using the from_tensor_slices method. This method creates a dataset from the input tensors by slicing them along the first dimension.

By using ragged tensors, the dataset can handle varying lengths of data for each image and provide a flexible input pipeline for further processing.

bbox = tf.ragged.constant(bboxes)
classes = tf.ragged.constant(classes)
image_paths = tf.ragged.constant(paths)

data = tf.data.Dataset.from_tensor_slices((image_paths, classes, bbox))

num_val = int(bbox.shape[0] * 0.2)

# Split the dataset into train and validation sets
val_data = data.take(num_val)
train_data = data.skip(num_val)
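The reason a regular tensor does not fit can be seen without TensorFlow. In the plain-Python sketch below (made-up boxes, hypothetical helper `pad_boxes`), images with different numbers of boxes can only be stored in a rectangular array by padding with a sentinel value, which is the bookkeeping that ragged tensors avoid.

```python
# Made-up boxes in xyxy format: three images with 2, 1 and 3 starfish
boxes_per_image = [
    [[10.0, 20.0, 60.0, 50.0], [100.0, 40.0, 120.0, 50.0]],
    [[5.0, 5.0, 25.0, 30.0]],
    [[0.0, 0.0, 10.0, 10.0], [30.0, 30.0, 45.0, 50.0], [200.0, 100.0, 260.0, 140.0]],
]


def pad_boxes(boxes_per_image, pad_value=-1.0):
    """Pad every image's box list to the same length so it becomes rectangular."""
    max_boxes = max(len(boxes) for boxes in boxes_per_image)
    padded = []
    for boxes in boxes_per_image:
        missing = max_boxes - len(boxes)
        padded.append(boxes + [[pad_value] * 4 for _ in range(missing)])
    return padded


row_lengths = [len(boxes) for boxes in boxes_per_image]
dense = pad_boxes(boxes_per_image)

print(row_lengths)              # [2, 1, 3]
print([len(b) for b in dense])  # [3, 3, 3]
```

With padding, every downstream step has to remember which rows are real and which are filler; a ragged tensor stores the row lengths alongside the values instead.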

KerasCV includes pre-trained models for popular computer vision datasets, such as ImageNet, COCO, and Pascal VOC, which can be used for transfer learning. KerasCV also provides a range of visualization tools for inspecting the intermediate representations learned by the model and for visualizing the results of object detection and segmentation tasks.

In this notebook, we use YOLOv8, so the dataset must be converted to the input format that the KerasCV YOLOv8 model expects. The bounding boxes should be provided as follows.

bounding_boxes = {
    # num_boxes may be a Ragged dimension
    'boxes': Tensor(shape=[batch, num_boxes, 4]),
    'classes': Tensor(shape=[batch, num_boxes])
}

The dictionary has two keys, 'boxes' and 'classes', each of which maps to a TensorFlow RaggedTensor or Tensor object. The 'boxes' Tensor has a shape of [batch, num_boxes, 4], where batch is the number of images in the batch and num_boxes is the maximum number of bounding boxes in any image.

The 4 represents the four values needed to define a bounding box: xmin, ymin, xmax, ymax.

The 'classes' Tensor has a shape of [batch, num_boxes], where each element represents the class label for the corresponding bounding box in the 'boxes' Tensor. The num_boxes dimension may be ragged, which means that the number of boxes may vary across images in the batch.

The final model input looks something like this:

{"images": images, "bounding_boxes": bounding_boxes}

def load_image(image_path):
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)
    return image


def load_dataset(image_path, classes, bbox):
    # Read Image
    image = load_image(image_path)
    bounding_boxes = {
        "classes": tf.cast(classes, dtype=tf.float32),
        "boxes": tf.cast(bbox, dtype=tf.float32),
    }
    return {"images": tf.cast(image, tf.float32), "bounding_boxes": bounding_boxes}

Data Augmentation

Applying effective data augmentation is critical yet challenging when building object detection models. Transformations like crops, flips, etc. must update the bounding box coordinates correctly. Doing this manually is complex and error-prone.

KerasCV provides specialized layers for bounding-box-aware augmentation. It offers a wide range of transformations that automatically adjust the boxes to match the augmented images.

Because these layers run inside tf.data pipelines, the augmentation can be done on the fly during training.

This gives more diverse training data and better generalization in object detection models without manual handling of coordinates; the layers take care of that complexity behind the scenes.

augmenter = keras.Sequential(
    layers=[
        keras_cv.layers.RandomFlip(mode="horizontal", bounding_box_format="xyxy"),
        keras_cv.layers.RandomShear(
            x_factor=0.2, y_factor=0.2, bounding_box_format="xyxy"
        ),
        keras_cv.layers.JitteredResize(
            target_size=(640, 640), scale_factor=(0.75, 1.3), bounding_box_format="xyxy"
        ),
    ]
)
resizing = keras_cv.layers.JitteredResize(
    target_size=(640, 640),
    scale_factor=(0.75, 1.3),
    bounding_box_format="xyxy",
)
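To show what adjusting the boxes means in practice, here is the arithmetic for a horizontal flip written out in plain Python. This is an illustrative sketch, not the KerasCV implementation; the function name and the sample image width and box are assumptions.

```python
def flip_box_horizontally(box, image_width):
    """Mirror an xyxy box around the vertical centre line of the image."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, measured from the other side
    return [image_width - xmax, ymin, image_width - xmin, ymax]


image_width = 1280
box = [559.0, 213.0, 609.0, 245.0]
flipped = flip_box_horizontally(box, image_width)

print(flipped)  # [671.0, 213.0, 721.0, 245.0]
# Flipping twice returns the original box
print(flip_box_horizontally(flipped, image_width) == box)  # True
```

Forgetting this step would leave the label pointing at the mirror-image location, which is exactly the kind of silent error that box-aware layers prevent.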

Training Dataset With Augmentation

We are preparing the training dataset with augmentation for our object detection model. We start by setting the batch_size to 4, and then use the load_dataset function to load the data.

Next, we shuffle the data to ensure that our model doesn't overfit on a particular kind of data. We then use the ragged_batch function to batch the data with a fixed size of 4 and drop any remaining data.

Finally, we apply the augmenter function to the dataset to perform data augmentation, which helps to increase the diversity of the dataset and improve the model's accuracy.

The num_parallel_calls parameter is set to tf.data.AUTOTUNE to allow TensorFlow to dynamically tune the number of parallel calls to improve performance.

BATCH_SIZE = 4
train_ds = train_data.map(load_dataset, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.shuffle(BATCH_SIZE * 4)
train_ds = train_ds.ragged_batch(BATCH_SIZE, drop_remainder=True)
train_ds = train_ds.map(augmenter, num_parallel_calls=tf.data.AUTOTUNE)
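The effect of drop_remainder=True on the number of batches is simple arithmetic. The sketch below mimics it with plain Python lists; `batch` is a hypothetical helper written for this illustration, not a tf.data API.

```python
def batch(items, batch_size, drop_remainder=True):
    """Split items into consecutive batches, optionally dropping a short final batch."""
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    if drop_remainder and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


frames = list(range(10))  # stand-in for 10 training examples

print(batch(frames, 4))                        # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(batch(frames, 4, drop_remainder=False))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With a batch size of 4, at most three examples are left out per pass over the data, in exchange for every batch having the same shape.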

Validation Dataset

In this code block, we create the validation dataset by mapping the load_dataset function over val_data, with num_parallel_calls set to tf.data.AUTOTUNE for parallel loading.

We then shuffle with a buffer of BATCH_SIZE * 4 and build a ragged_batch of BATCH_SIZE with drop_remainder=True so that all batches have the same size. Shuffling has no effect on validation metrics; it only varies which images appear when we visualize a batch.

Finally, we apply the resizing layer defined earlier so that validation images are brought to the same 640x640 size as the augmented training images, without the flip and shear used for training.

val_ds = val_data.map(load_dataset, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.shuffle(BATCH_SIZE * 4)
val_ds = val_ds.ragged_batch(BATCH_SIZE, drop_remainder=True)
val_ds = val_ds.map(resizing, num_parallel_calls=tf.data.AUTOTUNE)

Bounding Box Visualisation

We use the visualization utilities in keras_cv to inspect the bounding boxes. The code below imports the necessary packages and defines a class mapping for the dataset. The visualize_dataset() function takes one batch from a dataset and uses plot_bounding_box_gallery() to display the images with the bounding boxes overlaid. We call it on both the train_ds and val_ds datasets.

import keras_cv
from keras_cv import bounding_box
from keras_cv import visualization

# Single class: crown-of-thorns starfish (COTS)
class_mapping = {0: 'starfish'}

def visualize_dataset(inputs, value_range, rows, cols, bounding_box_format):
    inputs = next(iter(inputs.take(1)))
    images, bounding_boxes = inputs["images"], inputs["bounding_boxes"]
    visualization.plot_bounding_box_gallery(
        images,
        value_range=value_range,
        rows=rows,
        cols=cols,
        y_true=bounding_boxes,
        scale=5,
        font_scale=0.7,
        bounding_box_format=bounding_box_format,
        class_mapping=class_mapping,
    )


visualize_dataset(
    train_ds, bounding_box_format="xyxy", value_range=(0, 255), rows=2, cols=2
)

visualize_dataset(
    val_ds, bounding_box_format="xyxy", value_range=(0, 255), rows=2, cols=2
)

Next, we unpack the dataset to be able to feed it into the model training function of the Keras API.

def dict_to_tuple(inputs):
    return inputs["images"], inputs["bounding_boxes"]


train_ds = train_ds.map(dict_to_tuple, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.prefetch(tf.data.AUTOTUNE)

val_ds = val_ds.map(dict_to_tuple, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.prefetch(tf.data.AUTOTUNE)

Model Building

YOLOv8 is a recent iteration of the popular YOLO (You Only Look Once) family of models used for computer vision tasks like object detection and image classification. It was created by Ultralytics, building on their previous YOLOv5 model.

Compared to prior versions, YOLOv8 integrates numerous architectural upgrades for improved accuracy and performance. The creators also aimed to enhance the overall developer experience.

At the time of writing, YOLOv8 is among the strongest widely used object detectors, combining algorithmic advances with usability improvements for practitioners.

KerasCV offers YOLOv8 in several sizes: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. For this example, we choose the YOLOv8s backbone and load it with COCO pre-trained weights.

Next, we build a YOLOv8 model using YOLOV8Detector. It accepts a feature extractor as the backbone argument; a num_classes argument that specifies the number of object classes to detect (one here, matching the size of class_mapping); a bounding_box_format argument that tells the model the format of the boxes in the dataset; and an fpn_depth argument that sets the depth of the feature pyramid network (FPN).

backbone = keras_cv.models.YOLOV8Backbone.from_preset(
    "yolo_v8_s_backbone_coco"  # We will use yolov8 small backbone with coco weights
)
yolo = keras_cv.models.YOLOV8Detector(
    num_classes=1,
    bounding_box_format="xyxy",
    backbone=backbone,
    fpn_depth=1,
)

Compiling the Model

The training process of a YOLOv8 model involves the use of two types of losses - classification loss and box loss.

The classification loss measures the difference between the predicted class probabilities and the true class probabilities for each detected object. Binary cross entropy loss is utilized in this case, as each detected item either belongs to a particular class or does not. This loss function helps optimize the model for accurate object classification.
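As a worked example of the classification term, here is the textbook binary cross-entropy for a single prediction in plain Python. It is a sketch of the formula only; the exact clipping and reduction used inside KerasCV may differ, and the function name is an assumption.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Loss for one prediction: y_true is 0 or 1, p_pred is the predicted probability."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


# A confident correct prediction is penalized lightly, a confident wrong one heavily
print(round(binary_cross_entropy(1, 0.9), 4))  # 0.1054
print(round(binary_cross_entropy(1, 0.1), 4))  # 2.3026
```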

On the other hand, the box loss calculates the dissimilarity between the predicted bounding boxes and the ground truth boxes. YOLOv8 uses Complete IoU (CIoU) instead of basic IoU for box loss calculation. CIoU considers additional factors like the boxes' aspect ratio, center distance, and scale. This helps better represent box similarity by looking at more box attributes beyond just overlap.

By jointly optimizing for object classification and precise localization during training, the classification and box losses help YOLOv8 generate highly accurate object detection by minimizing discrepancies between outputs and label data across classes and bounding box coordinates.

optimizer = tf.keras.optimizers.Adam(
    learning_rate=10e-3,  # note: 10e-3 is 0.01, not 0.001; lower it (e.g. 1e-3) if the loss diverges
)

yolo.compile(
    optimizer=optimizer, classification_loss="binary_crossentropy", box_loss="ciou"
)
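To make the box loss more tangible, the following plain-Python sketch computes IoU and CIoU for two boxes in xyxy format, following the commonly published CIoU definition (IoU minus a centre-distance penalty and an aspect-ratio penalty). It is an illustration under that assumption, not the KerasCV implementation; the function names and sample boxes are made up. The corresponding loss is 1 - CIoU.

```python
import math


def iou(a, b):
    """Intersection over union of two xyxy boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def ciou(pred, truth, eps=1e-9):
    """Complete IoU: IoU minus a centre-distance penalty and an aspect-ratio penalty."""
    overlap = iou(pred, truth)
    # Squared distance between the two box centres
    dx = (pred[0] + pred[2]) / 2 - (truth[0] + truth[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (truth[1] + truth[3]) / 2
    rho2 = dx ** 2 + dy ** 2
    # Squared diagonal of the smallest box that encloses both
    cw = max(pred[2], truth[2]) - min(pred[0], truth[0])
    ch = max(pred[3], truth[3]) - min(pred[1], truth[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_t, h_t = truth[2] - truth[0], truth[3] - truth[1]
    v = (4 / math.pi ** 2) * (math.atan(w_t / h_t) - math.atan(w_p / h_p)) ** 2
    alpha = v / (1 - overlap + v + eps)
    return overlap - rho2 / c2 - alpha * v


truth = [0.0, 0.0, 10.0, 10.0]
shifted = [5.0, 0.0, 15.0, 10.0]

print(round(iou(shifted, truth), 4))   # 0.3333
print(round(ciou(shifted, truth), 4))  # 0.2564
print(round(ciou(truth, truth), 4))    # 1.0
```

The shifted box has the same shape as the ground truth, so the aspect-ratio term is zero and the whole difference between IoU and CIoU comes from the distance between the centres.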

Training the Model

During each epoch, the model iteratively adjusts its parameters to minimize the difference between its predicted output and the actual output. This process helps the model learn how to accurately detect crown-of-thorns starfish in the underwater videos of coral reefs.

The number of epochs can be adjusted based on the training dataset size, complexity of the model, and desired accuracy. After training, the model can be used to predict the presence and position of starfish in real-time, helping researchers and conservationists monitor and control COTS outbreaks more efficiently and effectively.

yolo.fit(
    train_ds,
    epochs=3,
)

Conclusion

The integration of CleanVision, TensorFlow, KerasCV, and YOLOv8 in this project exemplifies the power and versatility of AI in environmental conservation, particularly for the Great Barrier Reef.

By leveraging the strengths of these tools, we've built and trained a model for identifying crown-of-thorns starfish in complex underwater environments; a longer training run and evaluation on the validation set would be needed to confirm how accurate it is.

This not only marks a significant step forward in protecting one of the world's natural wonders but also demonstrates the immense potential of AI in addressing ecological challenges.