After the success of image classification dataset challenges and the rise of deep learning, tackling video was an obvious next step. As with images, classification is the most straightforward starting point on the path to general video understanding models. As for the specific labels being classified, the computer vision research community has gravitated toward human actions.

One of the earliest human action recognition video datasets, predating the deep learning boom, was the KTH dataset from 2004. Action recognition datasets have come a long way since then, with some focusing on clips from Hollywood movies and others focusing on sports.

In 2017, DeepMind released one of the largest and most impactful human action recognition datasets yet, Kinetics. As of the writing of this post, four versions of the Kinetics dataset have been released: 400, 600, 700, and 700-2020. The version number indicates the number of action classes. Additionally, each version adds new videos to replace those that have been deleted from YouTube over time.

This post walks through the integration of Kinetics into the open-source dataset curation and model analysis tool, FiftyOne. This integration includes a sophisticated way to download the dataset, as well as examples of how to evaluate and improve models trained on the dataset.

Downloading Kinetics is now as easy as:

import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("kinetics-600")

Setup

To run the examples in this post, you need to install FiftyOne:

pip install fiftyone

You will also need to install Pytube, which FiftyOne uses to download videos from YouTube:

pip install pytube

Downloading Kinetics

Until recently, the only way to access the Kinetics dataset was to download each video directly from its source on YouTube. This resulted in numerous issues, including videos having been deleted, YouTube throttling downloads, and inefficiencies in clipping videos.

The Common Visual Data Foundation (CVDF) has collaborated with the Kinetics dataset maintainers to host all versions of the dataset on AWS for the general public to download. Note that the CVDF-hosted version does not include every sample in the original dataset, only those that were available on YouTube at the time the CVDF version was created.

The CVDF has made it much easier to gain access to the dataset. However, you still need to handle the challenges of visualizing, wrangling, and subsetting the dataset to meet your needs. In some cases, you may not want to download the entire dataset to begin with.

This is where the integration of Kinetics into the FiftyOne Dataset Zoo comes in. With just one line of Python code, you can specify the version, the split, and the classes that you want, and then visualize the result in the FiftyOne App with one more line of code.

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "kinetics-700-2020",
    split="validation",
    classes=["grooming cat", "grooming dog"],
    max_samples=10,
)

session = fo.launch_app(dataset)

Training and Evaluating a Model

Once you have downloaded Kinetics, you can start using it to train action recognition models. Since the dataset is already in FiftyOne, it is easy to use libraries like PyTorch or PyTorch Lightning Flash to train a model directly on the dataset.

pip install lightning-flash lightning-flash[video] torchvision pytorchvideo

import torch

from flash import Trainer
from flash.video import VideoClassificationData, VideoClassifier

import fiftyone as fo
import fiftyone.zoo as foz

classes = [
    "swimming backstroke",
    "swimming breast stroke",
    "swimming butterfly stroke",
    "swimming front crawl",
]

# Load Kinetics
dataset = foz.load_zoo_dataset(
    "kinetics-700-2020",
    splits=["train", "validation"],
    classes=classes,
    max_samples=50,
    shuffle=True,
)

# Replace spaces in class names with underscore
labels = dataset.distinct("ground_truth.label")
labels_map = {l: l.replace(" ", "_") for l in labels}
dataset = dataset.map_labels("ground_truth", labels_map).clone()

# Create views for dataset splits
train_view = dataset.match_tags("train")
val_view = dataset.match_tags("validation")

# Create the Flash Datamodule
datamodule = VideoClassificationData.from_fiftyone(
    train_dataset=train_view,
    val_dataset=val_view,
    predict_dataset=val_view,
    label_field="ground_truth",
    batch_size=1,
    clip_sampler="uniform",
    clip_duration=1,
    decode_audio=False,
)

# Build the model
model = VideoClassifier(
    backbone="x3d_xs",
    labels=datamodule.labels,
    pretrained=True,
)

trainer = Trainer(
    max_epochs=10,
    limit_train_batches=5,
    gpus=torch.cuda.device_count(),
)

# Finetune the model
trainer.finetune(model, datamodule=datamodule, strategy="freeze")

After your model is trained, you can then generate predictions on the validation and test splits and use FiftyOne to evaluate the performance of the model.

from itertools import chain

from flash.core.classification import FiftyOneLabelsOutput

def get_fo_label_preds(model, datamodule, trainer):
    # Return a flat list of predictions as FiftyOne classification labels
    predictions = trainer.predict(
        model,
        datamodule=datamodule,
        output=FiftyOneLabelsOutput(return_filepath=False, labels=datamodule.labels),
    )
    predictions = list(chain.from_iterable(predictions))  # flatten batches
    return predictions

predictions = get_fo_label_preds(model, datamodule, trainer)
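
The flattening step in `get_fo_label_preds` matters because predictions come back grouped per batch, while the dataset expects one value per sample. As a minimal sketch of that step alone, assuming the predict call returns a list of per-batch lists (the string labels below are hypothetical stand-ins for the real label objects):

```python
from itertools import chain

# Hypothetical stand-in for the per-batch output of a predict call:
# one inner list per batch, possibly of different lengths
batched_predictions = [
    ["swimming_backstroke"],
    ["swimming_front_crawl", "swimming_butterfly_stroke"],
    [],
]

# chain.from_iterable concatenates the inner lists in their original order
flat_predictions = list(chain.from_iterable(batched_predictions))

print(flat_predictions)
```

Order is preserved by the flattening, which is what allows the resulting list to be matched one-to-one against the samples in the view it was generated from.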

# Add predictions to FiftyOne dataset
val_view.set_values(
    "predictions", predictions
)

session = fo.launch_app(val_view)

# The predictions field comes first; ground truth is passed via gt_field
results = val_view.evaluate_classifications(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)

The results of the evaluation can be used for things like plotting confusion matrices and precision-recall curves.

pip install ipywidgets

underscore_classes = [c.replace(" ", "_") for c in classes]

plot = results.plot_confusion_matrix(classes=underscore_classes)
plot.show()

As the confusion matrix shows, since we only finetuned the model on a few dozen samples, it is overfitting to the backstroke and butterfly stroke classes. This suggests that we should download additional samples of the other two classes and continue training.

Analyzing the model to find the best and worst-performing samples can shed light on the best ways to improve your model’s performance.

from fiftyone import ViewField as F

# Keep samples the evaluation marked incorrect, then keep only confident predictions
eval_view = val_view.match(F("eval") == False).filter_labels(
    "predictions", F("confidence") > 0.6
)
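
The view above is meant to surface predictions that are both wrong and made with confidence above 0.6. For intuition, the same selection can be sketched in plain Python over a list of records. The record layout, the `confident_mistakes` helper, and the sample values are all hypothetical and are not part of FiftyOne:

```python
# Hypothetical records standing in for samples in the validation view
records = [
    {"id": "a", "ground_truth": "swimming_front_crawl",
     "prediction": "swimming_backstroke", "confidence": 0.91},
    {"id": "b", "ground_truth": "swimming_backstroke",
     "prediction": "swimming_backstroke", "confidence": 0.88},
    {"id": "c", "ground_truth": "swimming_breast_stroke",
     "prediction": "swimming_butterfly_stroke", "confidence": 0.42},
    {"id": "d", "ground_truth": "swimming_breast_stroke",
     "prediction": "swimming_butterfly_stroke", "confidence": 0.73},
]


def confident_mistakes(records, threshold=0.6):
    """Return records predicted incorrectly with confidence above threshold."""
    return [
        r for r in records
        if r["prediction"] != r["ground_truth"] and r["confidence"] > threshold
    ]


# Sort so that the most confident mistakes come first
mistakes = sorted(
    confident_mistakes(records), key=lambda r: r["confidence"], reverse=True
)
mistake_ids = [r["id"] for r in mistakes]

print(mistake_ids)
```

Sorting by confidence is an optional extra here; it simply puts the most surprising errors at the top, which tend to be the ones most worth inspecting by hand.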

session.view = eval_view

The following shows one of the top examples in this evaluation view of highly confident but incorrectly predicted samples.

There are multiple issues with this sample. First, the footage is first-person, which is rare in this dataset. If we want to predict on first-person videos, then more of them should be added to the training set. Second, the video contains both breaststroke and backstroke, so it would be difficult to assign a single label. Third, the ground truth label is front crawl, which does not appear at all in the video.

Using FiftyOne to get hands-on and analyze specific samples can lead to findings like these, which highlight ways to improve the dataset itself. Since Kinetics is a very large dataset, we could easily download additional videos to replace problematic samples that we may want to exclude from training. Improvements to your dataset can lead to easier gains in model performance than working on the model architecture itself.

Summary

The integration of the Kinetics dataset into FiftyOne makes it easier than ever to download exactly the subset of Kinetics that you want, or even the dataset in its entirety. Additionally, FiftyOne allows for in-depth evaluation and analysis of video models, leading to better datasets and higher-performing models.

-- (Originally posted here)

Kinetics Dataset - Training and Evaluating Models for Video Classification
