Machine learning is now everywhere and seems able to solve almost any problem. But running neural networks is not easy without a proper data science team, so more and more companies are issuing A.I.-focused Requests For Tender.
One key challenge is: how do you measure the accuracy of an algorithm?
More often than not, we see requirements tied to a single accuracy metric:
Your Machine Learning algorithm needs to have over 90% accuracy. - another artificial intelligence Request For Tender
This article will show that a high score can hide poor business performance.
Cornis has been inspecting wind turbines since 2011. By 2019, we had acquired 4 million pictures. Over the same period, our experts annotated only 6,000 high-criticality defects.
On average, only about 0.1% of the pictures contain a high-criticality defect.
The worst defect detection algorithm would be a system that predicts no defect at all. Such a ridiculous system would still be about 99.9% accurate on this database (6,000 defects out of 4 million images is roughly 0.1%).
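A minimal sketch of this point, using the article's numbers (the labels are hypothetical, only the proportions are real):

```python
# Accuracy of a classifier that always predicts "no defect"
# on a heavily class-imbalanced dataset.
total_images = 4_000_000
defects = 6_000  # positive class

# The dumb classifier predicts "no defect" for every image.
true_negatives = total_images - defects  # every defect-free image is "correct"
false_negatives = defects                # every real defect is missed

accuracy = true_negatives / total_images
print(f"accuracy = {accuracy:.3%}")  # 99.850%, i.e. ~99.9%
```

Accuracy alone rewards the classifier for the overwhelming majority class, which is exactly what makes it misleading here.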
Other metrics give better insight into performance. Since high-criticality defects are what we cannot afford to miss, we can count how many of them the algorithm detects. This metric is called the true positive rate (also known as recall).
If an algorithm detects all 6,000 defects in our database, it has a 100% true positive rate. If it misses 600 defects, it has a 90% true positive rate.
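The computation is straightforward; a small sketch with the article's figures (the helper function name is ours):

```python
# True positive rate (recall): the share of actual defects found.
def true_positive_rate(detected_defects: int, total_defects: int) -> float:
    return detected_defects / total_defects

actual_defects = 6_000
print(true_positive_rate(6_000, actual_defects))        # 1.0 -> 100%
print(true_positive_rate(6_000 - 600, actual_defects))  # 0.9 -> 90%
```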
Is this the score you should ask for? Does a perfect 100% true positive rate mean that every detection is a defect? Not at all.
This score only tells you that the solution doesn't miss any defect. But missing very few defects has a cost: the number of false detections.
Imagine a dumb algorithm that says every single image contains a defect. It will indeed detect 100% of the defects, but it will also flag 100% of the images without defects...
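On our database, this all-positive classifier would have perfect recall and almost zero precision. A sketch with the article's numbers:

```python
# The opposite dumb classifier: flag every image as defective.
total_images = 4_000_000
defects = 6_000

true_positives = defects                  # every real defect is flagged...
false_positives = total_images - defects  # ...but so is every clean image

recall = true_positives / defects
precision = true_positives / (true_positives + false_positives)
print(f"recall = {recall:.0%}, precision = {precision:.2%}")
# recall = 100%, precision = 0.15%
```

Recall and precision pull in opposite directions, which is why neither metric is meaningful on its own.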
To judge a machine learning algorithm, you need to check at least the four counts of the confusion matrix: true positives, false positives, true negatives, and false negatives.
Even if these four metrics are mandatory, they may not reflect any business value on their own. You still need to combine them to compute business KPIs.
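A minimal sketch of computing these four counts from labels and predictions (the toy vectors below are hypothetical):

```python
# Count the four confusion-matrix cells for binary labels
# (1 = defect, 0 = no defect).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)  # 2 1 4 1
```

From these four counts you can derive any of the usual rates (accuracy, recall, precision, false positive rate) and, more importantly, plug them into your own cost model.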
To inspect a wind turbine, we process images of the four faces of its three blades. We slice the blade panoramas into small 256x256-pixel images to detect defects. On average, that makes 150,000 images to analyse per wind turbine.
Even if the turbine has no defects at all, an algorithm that correctly rejects 99% of defect-free images will still flag the remaining 1%: on average about 1,500 images.
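The arithmetic, assuming a 1% false positive rate on the article's 150,000 images per turbine:

```python
# False alarms on a defect-free turbine with a 1% false positive rate
# (i.e. 99% of defect-free images are correctly rejected).
images_per_turbine = 150_000
false_positive_rate = 0.01

false_alarms = images_per_turbine * false_positive_rate
print(int(false_alarms))  # 1500
```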
The main inspection KPI is the time spent per turbine. How much time do we gain by keeping only 1,500 images for the whole turbine? We discovered the hard way that the answer ranges from 99% of the inspection time saved to no time saved at all.
If all the detections are clustered in the same zone, the inspection can take only a few minutes.
If detections are scattered all over the blade, the inspection takes longer than it would without A.I.
In 2019, Cornis announced inspections twice as fast thanks to A.I.
The underlying algorithm is nowhere near a magic 99% score on any single metric. But with our tools, you can perform a quality inspection in half the time you used to.
When you want to assess the quality of a machine learning algorithm, don't ask for a magic 99% number. Ask for several evaluations, and focus on how the algorithm will improve your processes.