Is your model not performing well? Try digging into your data. Instead of chasing marginal gains by searching for state-of-the-art models, you can often improve your model’s accuracy far more by improving the quality of your data. Since most data scientists are adapting off-the-shelf algorithms to specific business applications, one of the most difficult challenges they face today is creating a continuous workflow that consistently feeds high-quality training data into their algorithms. At the same time, your model is learning, and you want to be able to leverage this increasingly capable model to label the rest of your data set. Building annotation infrastructure that integrates with your model, and managing that workflow, is one of the most challenging parts of machine learning.

Iteration => Accuracy & Consistency

The axiom of garbage in, garbage out can be masked in training. Even when fed random noise, such as random labels or unstructured pixels, certain models are capable of overfitting to the point of attaining 0% training error (Understanding Deep Learning Requires Rethinking Generalization). This is because recent high-capacity models like deep neural networks can memorize even massive data sets. While these models commit no errors during training, when tested they perform no better than random guessing.

Therefore, iteration and rigorous QA/QC processes are essential to a proper data labeling workflow. “Quality evaluation methods can be classified in three main families: (i) automatic, (ii) by direct inspection of the job provider and (iii) methods using the crowd itself as evaluator” (Worker Ranking Determination in Crowdsourcing Platforms using Aggregation Functions). Since, in most cases, automated evaluation without human input is either impossible or guarantees only minimal quality, we will discuss how to implement QA/QC methods of the latter two categories to help improve confidence in the quality of your training data:

- Test questions
- Direct inspection
- Consensus

Test questions and direct inspection are QA/QC methods that fit into category (ii), where the job provider, or data scientist, is directly responsible for evaluating quality. Test questions are a standard technique among companies: a set of data that the data scientist has already labeled correctly is distributed randomly among labelers to test their accuracy. Direct inspection is the process of visually inspecting your labeled data to gauge accuracy. Visual screening is basic functionality that everyone should have, both for preprocessing data and for reviewing labels for accuracy afterward. In the article Why You Need To Improve Your Training Data, And How To Do It, Pete Warden recommends randomly browsing through your data. This basic practice can reveal valuable information about your data set, such as “unbalanced number of examples in different categories, corrupted data (for example PNGs labeled with JPG file extensions), incorrect labels, or just surprising combinations.” For more practical tips on improving your data quality, read his article here. While most open source tools do not provide this essential feature, Labelbox is a repository of labeled data where you can visually browse and manage your data in one place.

While the QA/QC methods of category (ii) are extremely useful, they have two inherent drawbacks. First, they do not scale, since the resources that the job provider, or data scientist, can spend evaluating the accuracy of crowdsourced labels are finite. Second, in order to perform these methods, the correct answers must already be known.

Consensus, on the other hand, is both inherently scalable and useful when the correct answers are unknown. Consensus requires multiple annotators to provide labels for the same piece of data.
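
To make the idea of comparing several annotators’ labels for the same item concrete, here is a minimal sketch that scores agreement on bounding boxes using Intersection over Union (IoU). It is an illustration under stated assumptions, not a description of how any particular tool computes consensus: the box format `(x_min, y_min, x_max, y_max)`, the choice of mean pairwise IoU as the agreement score, the function names, and the `REVIEW_THRESHOLD` cutoff are all hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def consensus_score(boxes: Sequence[Box]) -> float:
    """Mean pairwise IoU across all annotators' boxes for one item."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        raise ValueError("need at least two annotations to measure agreement")
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


# Three hypothetical annotators labeling the same object in one image.
annotations = [(10, 10, 50, 50), (12, 11, 52, 49), (30, 30, 90, 90)]
score = consensus_score(annotations)

# Hypothetical cutoff: items below it are routed to a human reviewer.
REVIEW_THRESHOLD = 0.5
needs_review = score < REVIEW_THRESHOLD
print(f"agreement={score:.2f}, needs_review={needs_review}")
```

A low score here does not say which annotator is wrong, only that the item deserves a closer look, either because a label is poor or because the example itself is ambiguous.
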
With labels from several annotators in hand, consensus computes Intersection over Union (IoU) to average out idiosyncrasies across labelers and better separate the signal from the noise. In other words, the answers to the same question are compared to determine the rate of agreement. High agreement is indicative of a high-quality data set, while low agreement typically points to poor data quality but can also indicate ambiguous examples. Labelbox offers a built-in consensus tool so you can monitor your quality metrics in real time. Read more about how the Labelbox Consensus tool works here.

[Figure: Consensus]

Diminishing Marginal Returns

Google published a study showing that even when you think you have enough data, adding more can make your model perform even better (The Unreasonable Effectiveness of Data). And yet, the answer is more complicated than “more is always better.” The core question to ask is not whether you have enough data, but whether you have hit the efficient frontier where the marginal costs of labeling exceed the marginal gains in model performance. To visualize this, plot the model’s performance on held-out evaluation data as the training set grows. For example, start with 1000 samples to train your model and evaluate it on 200 held-out samples to measure your starting accuracy. Then collect another 1000 samples and repeat the experiment on the combined 2000 samples, evaluating against the same 200 held-out samples. The model is expected to do better with 2000 examples because it is learning to see natural variations in the data and to filter out idiosyncrasies while attending more closely to the signal.

Workflow Transparency

It is common practice to use a labeling service where you send out your data and get labeled data in return. However, if you are outsourcing your data labeling but have no way of measuring the quality of the labeling service, you are essentially gambling with your investment. Outsourced labeling services can be a good go-to for basic object classification, like labeling cars or dresses.
If you need to assemble a large labeling workforce for a specific subject matter, there are Business Process Outsourcing (BPO) firms that can accommodate particular specialized knowledge categories. Through Labelbox, you can connect with our partner BPOs, monitor the quality of your outsourced data labeling services, and create and manage your own workflow, all on a single unified platform.

To Sum it Up, Clean it Up

Your model is only as good as your training data. Now that you know how to ensure that your training data is consistent, accurately labeled, and sufficient in size, go clean it up! Visit www.labelbox.com to explore Labelbox for free or speak to one of our team members about an enterprise solution for your business.

Originally published at medium.com on November 12, 2018.

It all Boils Down to the Training Data
