Strategy for Incorporating Data Engineering for Computer Vision in Autonomous Driving

Written by aleksandradanilova | Published 2026/03/12
Tech Story Tags: computer-vision | data-engineering | autonomous-driving | strategy | data-version-control | synthetic-data | technology | data-annotation

TL;DR: Autonomous vehicle perception systems rely heavily on high-quality datasets for tasks like object detection, tracking, and segmentation. This article explores the data engineering strategies that support these systems, including setting up annotation workflows, using data augmentation and synthetic generation to expand datasets, and implementing data version control to manage massive training data pipelines.

Introduction

Perception acts as the eyes of an autonomous vehicle: it interprets the surrounding environment through computer vision tasks such as object detection, classification, tracking, and image segmentation.

Working in the autonomous driving field therefore means dedicating a significant amount of time to datasets. Either large volumes of raw data arrive and need to be annotated, or data for a given problem is scarce and the volume must be enlarged artificially. The objective, then, is to incorporate data engineering into autonomous driving perception projects. The action plan of the strategy is presented below.

First of all, identify and analyze the requirements of the task being solved, such as the characteristics of the data and the available resources:

  • accessibility of the annotation team;
  • hardware requirements;
  • the essential amount of data;
  • the sources to collect data;
  • the relevant type of annotation.

Secondly, implement the tactics below:

  1. Set up data annotation processes for real data;
  2. Replenish datasets with augmentations or synthetics;
  3. Use data version control.

Let’s examine them closely.

Tactic I. Set Up the Environment for Data Annotation

In autonomous driving, a common situation is that public datasets with real-world data are not enough to train a model: they do not cover the desired amount of data, the required scenarios, or objects specific to a particular area. That is why the need arises to gather real data yourself. The collected data will be raw, without the annotations that publicly available datasets come with. Without annotation, the model cannot learn, for example, to detect traffic signs, and consequently the vehicle will not be able to adapt its behavior to the situation on the road automatically.

There are multiple data annotation platforms. From an annotator's point of view, they differ little in day-to-day use. The main difference lies in how you, as the engineer and coordinator, manage a platform for the purposes and needs of your project. Let's discuss in more detail how to achieve your goals.

First of all, collect all available information about the data and how it will be used later in the project, so that a suitable platform can be chosen:

  • type of the input data (separate images or, as is most often the case, streams such as video);
  • type of task to be solved (detection and tracking of pedestrians and vehicles, detection of lanes, detection of parking slots, etc.);
  • specific scenarios to be implemented (for example, cross-annotation and further analysis of annotations, such as calculating progress statistics or finding those who annotate differently from others);
  • and so forth.

Secondly, taking into account the information gathered above and the particular features maintained by specific data annotation platforms, choose a platform and build the infrastructure accordingly. Below, some platform capabilities that deserve attention are covered in more detail.

For example, platforms differ in which data formats they support. Additionally, some of them can pre-label images with a built-in or custom neural network for a specific list of objects. This can be useful and effective because it reduces manual work: there is almost no need to create annotations from scratch, only to correct the annotations proposed by the network. Besides image annotation, some tools also support 3D point cloud annotation.

Another way to simplify and accelerate video annotation is to use track mode, which has become a widespread platform feature. Annotations created on one frame appear on the next frame automatically, and the identification number assigned to a specific object remains associated with it across frames. Thus, track mode speeds up the work and makes annotations more consistent.
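Under the hood, track mode is often based on interpolating annotations between manually labeled keyframes. The sketch below illustrates the idea with plain linear interpolation; the `Box` type and function names are illustrative assumptions, and real platforms use more elaborate propagation schemes.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A tracked bounding box; track_id is retained across frames."""
    track_id: int
    x: float
    y: float
    w: float
    h: float


def interpolate_box(start: Box, end: Box, t: float) -> Box:
    """Linearly interpolate one track's box between two keyframes (0 <= t <= 1)."""
    assert start.track_id == end.track_id, "interpolation is per track"
    lerp = lambda a, b: a + (b - a) * t
    return Box(start.track_id, lerp(start.x, end.x), lerp(start.y, end.y),
               lerp(start.w, end.w), lerp(start.h, end.h))


def fill_track(key_start: Box, key_end: Box, n_frames: int) -> list[Box]:
    """Propagate annotations for the n_frames frames between two keyframes."""
    return [interpolate_box(key_start, key_end, i / (n_frames + 1))
            for i in range(1, n_frames + 1)]
```

An annotator then only corrects the interpolated boxes that drift away from the object, instead of drawing every frame from scratch.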

What is more, for convenience, images can be distributed across annotators or jobs in batches of a configurable size, depending on the platform: this size determines how many images appear inside a tab or a job, respectively. Also, some platforms support switching a status to indicate that the work is finished and ready for review. Nonetheless, there can be limitations; for example, there may be no explicit tracking of each annotator's individual progress.
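Splitting a dataset into jobs of a configurable size can be sketched in a few lines; the function name and signature below are illustrative, not tied to any particular platform:

```python
def split_into_jobs(image_ids: list[str], job_size: int) -> list[list[str]]:
    """Distribute images across jobs of a configurable size, mirroring how
    annotation platforms cut a dataset into per-annotator jobs or tabs."""
    if job_size <= 0:
        raise ValueError("job_size must be positive")
    # The last job may be smaller when the dataset size is not divisible.
    return [image_ids[i:i + job_size]
            for i in range(0, len(image_ids), job_size)]
```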

Also, there can be cases where cross-annotation is desired. Figure 1 visualizes the essence of cross-annotation: each annotator has their own images to label, plus shared images that must be annotated by every annotator. The main purpose of cross-annotation is to compare annotations of the same images labeled by several annotators, both for research and for analyzing annotation quality.

Some open-source platforms provide convenient functionality to implement cross-annotation via an API. This is easiest when the tool has a modular architecture open to custom extensions, for instance, when each annotator has their own workspace, such as a tab. A tab is created in the project for each annotator, and images are distributed randomly between tabs, with the planned intersections taken into account. Progress can then be tracked per tab and overall, and data inside a tab can be filtered by various conditions, for instance, hiding images that are already labeled.
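The distribution step with intersections can be sketched as follows; the function and its parameters are illustrative and platform-agnostic:

```python
import random


def distribute_with_overlap(images: list[str], annotators: list[str],
                            shared_fraction: float = 0.2, seed: int = 0):
    """Split images among annotators, reserving a shared subset that every
    annotator labels, so the overlapping annotations can be compared later."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = images[:]
    rng.shuffle(shuffled)
    n_shared = int(len(shuffled) * shared_fraction)
    shared, rest = shuffled[:n_shared], shuffled[n_shared:]
    # Every annotator gets the shared images plus a round-robin slice of the rest.
    assignment = {a: list(shared) for a in annotators}
    for i, img in enumerate(rest):
        assignment[annotators[i % len(annotators)]].append(img)
    return assignment, shared
```

The returned `shared` list is what later feeds the quality analysis, e.g. computing agreement between annotators on exactly those images.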

Moreover, uploading data to the server and downloading annotations can be automated for the entire project or for individual tasks via the API of the chosen data annotation platform.
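A minimal sketch of such automation might look like the following. Note that the base URL, endpoint layout, and token-based authorization here are invented for illustration; a real platform's API will differ, so consult its documentation for the actual routes.

```python
import json
import urllib.request

# Hypothetical platform endpoint, used only to illustrate the pattern.
BASE_URL = "https://annotation.example.com/api/v1"


def export_url(task_id: int, fmt: str = "COCO") -> str:
    """Build the (hypothetical) endpoint for downloading a task's annotations."""
    return f"{BASE_URL}/tasks/{task_id}/annotations?format={fmt}"


def download_annotations(task_id: int, token: str) -> dict:
    """Fetch annotations for one task; looping over task ids automates the
    export for the whole project."""
    req = urllib.request.Request(
        export_url(task_id),
        headers={"Authorization": f"Token {token}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```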

All in all, it is essential to think through the expectations from the annotation process and capabilities provided by the platform and choose the platform accordingly.

Tactic II. Identify and Implement Methods to Replenish Datasets

There may be times when the real-world data already gathered is not enough to train a neural network, and collecting more before training is not feasible. It may also be impossible to cover all required cases with real-life data, for example, certain weather conditions (fog, sun glare, etc.). Yet a large amount of accurately annotated data is still required. So how can you obtain data that is already annotated, and how can you add more of it?

There are two broad ways to increase the amount of data: transforming existing data or generating new data from scratch.

The first approach is known as data augmentation. It applies some modifications to the data to enlarge the number of samples with already existing scenarios. The second method produces synthetic data by creating new samples from scratch, without transforming existing data. It can be used to increase not only the volume of datasets, but also their diversity by creating rare situations and conditions, which helps to improve the generalization ability of deep learning models. Let’s take a closer look at both methods.

Data augmentation can be applied to data from different sensors, such as cameras that capture images and LiDAR systems that produce point clouds of the environment. For images, it is straightforward to implement by transforming the image: changes in hue, saturation, and brightness; rotations and flipping; perspective transformations; and so forth. This can be beneficial for obtaining nighttime-like images or objects from new angles of view. Open-source libraries ship more than 70 ready-made transformations, including snow and rain effects. Moreover, data augmentation is widely used in self-supervised learning, for instance when data is not labeled and annotation is expensive: embeddings of augmentations of the same image are pulled together, while embeddings of augmentations of other images are pushed apart.

For LiDAR data, there are global and local augmentations. Global augmentations transform the entire point cloud, for instance by rotating it or translating all points along a specific axis. Local augmentations, on the contrary, transform only the points that belong to specific objects inside the point cloud.
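Both kinds of augmentation can be sketched in a few lines of NumPy. The transformations below (brightness scaling, a horizontal flip, and a global yaw rotation of a point cloud) are simplified illustrations of the techniques above, not a replacement for a full augmentation library:

```python
import numpy as np


def augment_image(img: np.ndarray, brightness: float = 1.2,
                  flip: bool = True) -> np.ndarray:
    """Simple photometric + geometric augmentation of an HxWx3 uint8 image."""
    out = np.clip(img.astype(np.float32) * brightness, 0, 255).astype(np.uint8)
    return out[:, ::-1] if flip else out  # horizontal flip along the width axis


def rotate_point_cloud(points: np.ndarray, yaw_rad: float) -> np.ndarray:
    """Global LiDAR augmentation: rotate an Nx3 point cloud around the z axis."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return points @ rot.T
```

In practice the same geometric transform must also be applied to the annotations (boxes, masks, or per-point labels) so that labels stay aligned with the augmented samples.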

As for synthesizing data, simulators or generative neural networks can be used to create samples. Simulators allow programming a setup for the synchronized collection of camera images along with accurate corresponding annotations; they can be resource-intensive when a development build is used to program customized solutions or improvements. Another way to simulate data is to create 3D models of objects in 3D computer graphics software and then render 2D images from the 3D scenes. Annotations for the segmentation task can be extracted by rendering masks, which can in turn be converted into bounding boxes for detection. The neural-network approach is also actively studied: generative deep learning, such as generative adversarial networks and diffusion models, can be used to create realistic, high-quality images, diverse trajectories, and structured LiDAR point clouds.
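The mask-to-bounding-box step mentioned above is simple to sketch; the function below assumes the renderer outputs a binary per-object mask as a NumPy array:

```python
import numpy as np


def mask_to_bbox(mask: np.ndarray):
    """Derive a detection bounding box (x_min, y_min, x_max, y_max) from a
    binary segmentation mask rendered by a simulator or 3D engine."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # the object is not visible in this render
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

This way, a single rendered mask yields annotations for both segmentation and detection at no extra labeling cost.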

To sum up, while enriching datasets, it is crucial to consider their domains and possible constraints and to choose a suitable way to add more content.

Tactic III. Utilize Data Version Control

Usually, datasets and code are stored separately because Git has performance issues storing large files, and datasets in autonomous driving are large: they may reach dozens of terabytes or even hundreds of petabytes. It is easier and more flexible to maintain code and data in separate repositories since they have independent lifecycles: datasets can grow and be updated over time without touching the software, and the software can run without changes to the data.

Therefore, a simple and effective way to manage the modifications of the datasets is to incorporate a version control system for your data. Below you will find the reasons to consider using data version control in computer vision projects.

First of all, versioning the data allows tracking large files, such as images and videos, along with the corresponding per-frame annotation files. It is a way to control the changes made to a dataset, which means you can switch between dataset versions.

Secondly, many systems are easy to get started with, especially for those who are familiar with Git. They are either built on top of Git or provide Git-like capabilities, utilizing branches and having similar commands.

Thirdly, some data versioning tools provide an environment to track not only the data but whole machine learning experiments, including source code, models, parameters, metrics and so forth.

Last but not least, data versioning systems can be lightweight when they do not require any services or databases. Such systems store data in a local cache or in remote repositories, which is also convenient for team collaboration. For example, disk space can be saved by creating file links to images instead of storing duplicate copies.
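The file-link idea can be sketched as a tiny content-addressed cache: each file is stored once under the hash of its contents, and dataset paths become links into the cache. This is an illustration of the principle, not how any specific tool implements it.

```python
import hashlib
import os


def add_to_cache(path: str, cache_dir: str) -> str:
    """Store a file in a content-addressed cache and replace the original
    with a hard link to the cached copy, avoiding duplicates on disk."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    os.makedirs(cache_dir, exist_ok=True)
    cached = os.path.join(cache_dir, digest)
    if not os.path.exists(cached):
        os.replace(path, cached)   # first copy moves into the cache
    else:
        os.remove(path)            # duplicate content: drop the extra copy
    os.link(cached, path)          # link back so the dataset path still works
    return digest
```

Two images with identical bytes then occupy the space of one, and the digest doubles as a stable version identifier for the file.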

Thus, data version control is a powerful tool for your project and workflow since it brings reliability, scalability and flexibility.

Conclusion

In conclusion, computer vision data engineering in the autonomous driving field is a broad area: it requires thinking through many details and making choices at the initial stages of a project in order to simplify later decisions in the workflow. This article covered concrete actions for incorporating data engineering, and highlighted the features that are worth paying attention to.


Written by aleksandradanilova | Computer Vision Data Engineer, Course Instructor, Mentor.
Published by HackerNoon on 2026/03/12