How to unwrap wine labels programmatically
Our application has a feature similar to Vivino's: identifying a wine from a photo of its label. Under the hood we use two third-party services: TinEye to find the best-matching label and Google Vision to read the text on it. The latter is needed to pin down the exact product, because image search gives no extra weight to the regions of a label that matter most, which are usually textual: the vintage and the varietal.
However, accuracy goes down due to cylinder distortion.
It is especially noticeable with OCR: Google Vision can barely read any text away from the center of the label, while a human recognizes it easily. In this article I'll describe how to reverse cylinder distortion and, as a result, improve product matching accuracy.
First, let's look at the nature of the distortion.
A rectangular label stuck to a cylinder takes on a characteristic barrel-like shape (b in the picture above). The curve ABC is then, to quite a good approximation, an arc of an ellipse, because we see a circle (the cylinder's cross-section) at an angle. Every horizontal line of the label corresponds to an ellipse in the picture.
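To make the ellipse claim concrete, here is a small sketch (not taken from the repository). It projects the cylinder's circular cross-section for a camera tilted by a hypothetical `tilt` angle and checks that the result lies on an ellipse. It assumes an orthographic projection; a real camera adds perspective, which is why the ellipse is only an approximation.

```python
import math

def project_circle(radius, tilt, count=8):
    """Orthographically project a horizontal circle seen by a camera
    that looks down at `tilt` radians from the horizontal.

    Returns (x, y) image points. Hypothetical helper for illustration.
    """
    points = []
    for i in range(count):
        t = 2 * math.pi * i / count
        # Point on the circle in 3D: x to the right, z towards the camera.
        x3, z3 = radius * math.cos(t), radius * math.sin(t)
        # Tilting the camera squashes depth by sin(tilt); width is unchanged.
        points.append((x3, z3 * math.sin(tilt)))
    return points

radius, tilt = 10.0, math.radians(20)
a, b = radius, radius * math.sin(tilt)  # semi-axes of the resulting ellipse
for x, y in project_circle(radius, tilt):
    # Every projected point satisfies x^2/a^2 + y^2/b^2 = 1.
    assert abs((x / a) ** 2 + (y / b) ** 2 - 1.0) < 1e-9
```

With `tilt` at zero the ellipse collapses into a straight line, which matches the intuition that a label photographed exactly side-on has straight top and bottom edges.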
The most interesting part is that just six markers (ABCDEF) are enough to fully unwrap a label:
From these markers we can build a mesh that covers the cylinder surface.
Using the mesh above, we can transform each cell individually and thus recover the original flat surface of the label.
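The mesh construction could look roughly like the sketch below. It is an illustration, not the code from the repository, and it rests on several assumptions: the bottle is upright, the markers are ordered A (top left), B (top middle), C (top right), D (bottom right), E (bottom middle), F (bottom left), and A/C and F/D sit at the ends of the horizontal axes of their ellipses. The names `ellipse_arc` and `build_mesh` are made up for this example.

```python
import math

def ellipse_arc(left, mid, right, count):
    """Sample `count` points on the half-ellipse running left -> mid -> right.

    Assumes `left` and `right` share the same y and `mid` lies directly
    below (or above) their midpoint.
    """
    cx = (left[0] + right[0]) / 2
    cy = (left[1] + right[1]) / 2
    a = (right[0] - left[0]) / 2   # horizontal semi-axis
    b = mid[1] - cy                # signed vertical semi-axis
    points = []
    for i in range(count):
        # Equal angle steps correspond to equal steps around the cylinder.
        t = math.pi * i / (count - 1)
        points.append((cx - a * math.cos(t), cy + b * math.sin(t)))
    return points

def build_mesh(A, B, C, D, E, F, cols=5, rows=3):
    """Return rows x cols mesh points between the top arc ABC and bottom arc FED."""
    top = ellipse_arc(A, B, C, cols)
    bottom = ellipse_arc(F, E, D, cols)
    mesh = []
    for r in range(rows):
        k = r / (rows - 1)  # 0 on the top arc, 1 on the bottom arc
        mesh.append([(tx + (bx - tx) * k, ty + (by - ty) * k)
                     for (tx, ty), (bx, by) in zip(top, bottom)])
    return mesh

# Hypothetical marker coordinates in pixels (y grows downwards).
mesh = build_mesh(A=(100, 200), B=(200, 230), C=(300, 200),
                  D=(300, 500), E=(200, 540), F=(100, 500))
```

Each quadrilateral cell of such a mesh can then be mapped to a rectangle in the output image, for instance with OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`. Sampling the arcs at equal angles rather than equal horizontal steps is what stretches the compressed sides of the label back out.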
The code is available on GitHub: https://github.com/Nepherhotep/unwrap_labels
The main advantage of the method is that the input parameters of the unwrap function (the corners and the top and bottom curves) can be detected visually, which allows us to fully automate the process.
The next part of the article is devoted to marker detection; that code is not public.
The first step is to convert the image to grayscale.
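As a minimal sketch of this step in plain Python: the article does not say which conversion was used, so the ITU-R BT.601 luma weights below are an assumption, and `to_grayscale` is a hypothetical helper. In practice an image library would do this in one call.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to rows of luminance values.

    The BT.601 weights are an assumed choice, not necessarily the
    conversion used in the original pipeline.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row]
            for row in image]

# A tiny 1x3 "image": red, green, and white pixels.
gray = to_grayscale([[(255, 0, 0), (0, 255, 0), (255, 255, 255)]])
```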
Then we need to detect the label edges. We can use the Sobel operator for that: https://en.wikipedia.org/wiki/Sobel_operator
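For readers who have not met the Sobel operator, here is a dependency-free sketch of it. It is written for clarity rather than speed and is not the detection code from the project; `sobel_magnitude` is a name invented for this example, and border pixels are simply left at zero.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # responds to vertical edges
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # responds to horizontal edges

def sobel_magnitude(gray):
    """Return the gradient magnitude of a 2D list of grayscale values.

    Border pixels are left at 0 for simplicity.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    value = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * value
                    gy += SOBEL_Y[dy][dx] * value
            out[y][x] = math.hypot(gx, gy)
    return out

# Dark left half, bright right half: the response peaks at the boundary.
gray = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(gray)
```

Flat regions give a zero response, so only the columns next to the dark-to-bright boundary light up in `edges`.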
Next we need to detect the longest lines in the image, which are usually the edges of the bottle. That holds for this particular picture, but not necessarily for a bottle standing among other bottles.
To detect those lines, let's use the Hough transform: https://en.wikipedia.org/wiki/Hough_transform
The idea is to take every line that crosses the picture (say, every line running from the top border to the bottom border) and calculate the average pixel value along it. We put these averages into a new coordinate system, in which each point stands for one line, and get something like a heat map. The two maxima of the heat map are the edges of the bottle.
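The averaging described above can be sketched as follows. Since the detection code is not public, this is only a guess at its general shape: lines are parameterized by where they cross the top and bottom borders (`x_top`, `x_bottom`) rather than by the usual angle and distance, and both function names are made up for this example.

```python
def line_heat_map(edges):
    """Average edge strength along every top-to-bottom line.

    heat[x_top][x_bottom] is the mean value along the straight line from
    (x_top, 0) to (x_bottom, height - 1).
    """
    h, w = len(edges), len(edges[0])
    heat = [[0.0] * w for _ in range(w)]
    for x_top in range(w):
        for x_bottom in range(w):
            total = 0.0
            for y in range(h):
                # Linear interpolation of x along the line, rounded to a pixel.
                x = round(x_top + (x_bottom - x_top) * y / (h - 1))
                total += edges[y][x]
            heat[x_top][x_bottom] = total / h
    return heat

def strongest_lines(heat, count=2):
    """Return the `count` (x_top, x_bottom) pairs with the highest average."""
    cells = [(value, xt, xb) for xt, row in enumerate(heat)
             for xb, value in enumerate(row)]
    return [(xt, xb) for _, xt, xb in sorted(cells, reverse=True)[:count]]

# Synthetic edge image with two vertical lines at x = 1 and x = 6.
edges = [[0, 9, 0, 0, 0, 0, 7, 0] for _ in range(6)]
print(strongest_lines(line_heat_map(edges)))  # → [(1, 1), (6, 6)]
```

On a real photo an edge is several pixels wide, so the second-highest cell is often a neighbour of the first. A practical version would probably need to suppress the area around the first maximum before picking the second.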
The image below shows how the left edge of the bottle is converted to a point on the heat map:
Ellipses are a bit harder to deal with, but the Hough transform can be applied to any curve that can be described mathematically. An ellipse is such a curve, so let's do that.
First, we need to reduce the problem to two dimensions. Since the bottle is symmetric, let's use its center axis as the new “Y” and the left axis as the new “X” (see the image below). As values in the new coordinate system we will use pixel averages collected along a set of ellipses drawn between the X and Y axes. This works because there is only one such ellipse connecting a given point on X with a given point on Y. That may not be obvious at first glance, but it is easy to see from the parametric form of the ellipse, where a and b are the semi-axes:
x = a * cos(t)
y = b * sin(t)
As with the lines, we find two maxima, which define the top (A-B) and bottom (F-E) curves. Now that we have all the required markers (the left and right lines and the top and bottom ellipses), we can apply the algorithm described in the first part of the article and undo the cylinder distortion.
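The ellipse search can be sketched in the same spirit. This is an interpretation, not the non-public original: it assumes an upright bottle, takes the "left axis" to be the left edge of the bottle, and describes each candidate curve by the row where it leaves the left edge (`y_left`) and how far it sags by the time it reaches the center axis (`sag`). All names are hypothetical.

```python
import math

def arc_points(x_left, x_center, y_left, y_center, samples=32):
    """Pixels on the quarter-ellipse from (x_left, y_left) on the left
    bottle edge to (x_center, y_center) on the center axis."""
    a = x_center - x_left   # horizontal semi-axis: half the bottle width
    b = y_center - y_left   # vertical semi-axis: how far the curve sags
    points = []
    for i in range(samples):
        t = (math.pi / 2) * i / (samples - 1)
        points.append((round(x_center - a * math.cos(t)),
                       round(y_left + b * math.sin(t))))
    return points

def arc_average(edges, x_left, x_center, y_left, y_center, samples=32):
    """Average edge strength along one quarter-ellipse."""
    points = arc_points(x_left, x_center, y_left, y_center, samples)
    return sum(edges[y][x] for x, y in points) / len(points)

def ellipse_heat_map(edges, x_left, x_center, max_sag):
    """heat[y_left][sag] for every start row and every sag in 0..max_sag.

    Arcs that would leave the image are scored 0.
    """
    h = len(edges)
    return [[arc_average(edges, x_left, x_center, y_left, y_left + sag)
             if y_left + sag < h else 0.0
             for sag in range(max_sag + 1)]
            for y_left in range(h)]

# Synthetic test image: draw one arc, then score it and a distant one.
image = [[0.0] * 12 for _ in range(20)]
for x, y in arc_points(0, 10, 5, 9):
    image[y][x] = 1.0
heat = ellipse_heat_map(image, x_left=0, x_center=10, max_sag=6)
```

Here `heat[5][4]` is the cell for the arc that was drawn, and it reaches the highest possible average. Only the left half of each curve is scanned because, by symmetry, the right half carries the same information.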
What can be improved? Firstly, the algorithm does not account for perspective distortion of the ellipses, so the left and right sides of the label end up stretched a bit more than they should be. Fixing that requires knowing the camera angle, or at least the most typical angle for mobile phone photos (which can be found empirically).
Secondly, the Hough transform becomes unstable in complicated cases, such as a bottle standing among other bottles.
Thirdly, if the label is not rectangular (say, elliptical), the markers will be detected incorrectly and the subsequent transform will distort the label even more.
In practice, it is much more interesting to use a neural network for marker detection, since it can be trained on complicated cases as well and can even signal when it was unable to detect the label (so that the label is left untransformed). That will be a topic for another article.