This blog post on Canny Edge Detection is the second in the series about OpenCV and Computer Vision. Other parts: Part 1.
With edge detection, we hope to identify edges, boundaries, or silhouettes of an object or region inside the image. Humans have a natural ability to recognize silhouettes such as famous landmarks, logos, and even people. Computers also have this ability, but maybe not as you and I imagined it. Especially not compared to our favorite police procedural drama series. Everything we need we can already find in OpenCV. All we need is a simple nudge from the developer.
OpenCV has many edge-detection filters already preinstalled inside the library. Most of them are based on the same premise. Edge detection algorithms try to analyze the intensity of an image and the significant changes in the intensity within the image. The resulting image is a duotone image, with non-edge regions being black and edge regions having a white stroke. Depending on the image and the algorithm used, the results can be prone to noise which can be mitigated by blurring and/or thresholding the image during the preprocessing steps.
Some of OpenCV's most commonly used edge-detection algorithms are Canny, Sobel, Scharr, and Laplacian. Let us quickly and simply go through how one of those algorithms works. We'll look into Canny as it is one of the most commonly used, and then we'll try to implement it from scratch, although a pretty simplified version.
Canny edge detection is a complex, multilayered algorithm that detects edges. It was named after its creator John Canny, and it is widely used in Computer Vision, even in computationally intensive scenarios, due to its accuracy and effectiveness. During the whole process, the algorithm takes additional steps that suppress noise and minimize false detections in order to get the best edge-detection results. The complete list of steps involved are:
Denoising using a Gaussian Filter
Gradient calculation
NMS (non-maximum suppression) applied on the edges
Applying the double threshold
Tracking edges
After we explain the steps involved, we'll also implement a rudimentary Canny edge detection algorithm.
If you need to know anything about noise in edge detection algorithms, know this - it is highly, highly problematic. Noise is considered bad because it introduces false edges and reduces the accuracy of existing ones. In order to reduce noise, we smooth the image by applying a blurring algorithm. A blurring algorithm uses kernels. The kernel is a small square matrix (3x3, 5x5, or 7x7) that defines a set of coefficients/weights used in computer vision algorithms. Basically, this kernel is mathematically utilized on each and every pixel in the image.
The math behind the Gaussian blur is not in the scope of this article, but below, you can see the formula of the Gaussian Kernel. We are providing two input properties, the kernel size and sigma. Kernel size determines the size of the square kernel matrix, and sigma controls the standard deviation of the Gaussian distribution. Larger kernel sizes increase the blurring effect but can blur out edges. Sigma controls the spread of the Gaussian filter.
G(x) = (1 / sqrt(2 * pi * sigma^2)) * exp(-((x - mu)^2) / (2 * sigma^2))
In this step, we are calculating the magnitude and direction gradients of the blurred image.
Gradients are helpful as they give us more information about the changes in intensity in different directions by analyzing the rate of change. The most commonly used algorithm to calculate gradients is the Sobel operator, which is a spatial filter that convolves the image with a small kernel in order to approximate the gradients.
What does it mean to convolve the kernel? With convolving, we apply a kernel to an image. The kernel is moved through the entire image, and on each location, a weighted sum of pixel values is computed. We actually used the same process when we reduced noise using the Gaussian filter. We'll show how this works mathematically in the implementation section. The new image generated represents the convoluted result.
With non-maximum suppression, we are refining the detected edges. They should be thin and clear. It is a post-processing step during which we preserve the edges we already have while removing weak edges. Only the local maxima will survive this filtering. The steps are also relatively simple; we are comparing each pixel in the gradient magnitude image. If the current pixel is not the maximum among the three selected pixels, suppress the value and set it to zero. With this algorithm, we will keep the existing edges but also thin edges that are too thick.
Thresholding is a segmentation process in which we classify pixels based on categories defined by predefined threshold values. With thresholding, we can distinguish between strong and weak edges. The process is quite simple in theory; we define two threshold values, those values will separate the strong and weak edges, then we’ll compare each pixel in the gradient image to the thresholds.
Based on the comparison of the threshold, we will classify the pixel as either a weak edge, strong edge, or no edge. What happens then is further analysis, but most commonly, strong edges are kept, weak edges require more investigation, and no edges are discarded.
Edge tracking is the finishing step and aims to transform weak pixels into strong ones in order to create continuous sections with detected edges. The ultimate goal of edge tracking is to improve the completeness and wholeness of the edge map. After these many steps, it is extremely common to have sections of an image that are broken. In order to complete and connect the disconnected section, we’ll check the neighborhood around a pixel for strong pixels. Most commonly, those pixel sections are part of the same edge.
Ok, let us start with the Gaussian filter denoising. We already know what we want. We want to convolve the image with a 2D kernel. Let us first create the kernel:
def generate_gaussian_kernel(kernel_size, sigma):
kernel = np.fromfunction(
lambda x, y: (1 / (2 * np.pi * sigma**2))
* np.exp(
-((x - (kernel_size - 1) / 2) ** 2 + (y - (kernel_size - 1) / 2) ** 2)
/ (2 * sigma**2)
),
(kernel_size, kernel_size),
)
kernel /= np.sum(kernel)
return kernel
gauss_kernel = generate_gaussian_kernel(3, 1)
print(gauss_kernel)
blurred_image = cv2.filter2D(img, -1, gauss_kernel)
<class 'numpy.ndarray'>
[[0.07511361 0.1238414 0.07511361]
[0.1238414 0.20417996 0.1238414 ]
[0.07511361 0.1238414 0.07511361]]
The first thing that might be interesting is numpy.fromfunction
, that constructs an array by executing a function over each element. The producing element is an array from the numerical Python library called – numpy.ndarray
. lambda
is the anonymous function (without the function name) that calculates the kernel.
We will play with the kernel size and sigma to produce the best product when we merge all steps together, but now for the sake of learning, we will create a 3×3 bell-curve-shaped kernel with a sigma deviation of 1 (meaning no deviation). Let us take a step back and ask a really important question. How do we actually convolve the image? By calculating the weighted average at a certain pixel. Check this:
[[168 167 169]
[165 163 168]
[170 165 167]]
For the sake of this example, we’ll imagine this is our input matrix. Actually, it is only a part of the input image sliced at a certain position. We are convoluting the matrix by computing the weighted average, that is:
kernel_{0, 0} * matrix_{0, 0} + kernel_{0, 1} * matrix_{0, 1} + kernel_{0, 2} * matrix_{0, 2} + ... + kernel_{w, h} * matrix_{w, h}
If we multiply the kernel above with the slice and sum up the elements, we’ll get:
print(np.sum(slice * gauss_kernel))
166.2624376159791
And if we print out the convoluted image at the same index position, we get:
Blurred
[[167 167 167]
[167 166 167]
[166 166 166]]
The value of the middle pixel is 166, which we actually calculated manually! Amazing! Of course, we won’t do this by hand for every pixel. Thankfully the filter2D
method will do everything that we did manually (and more!). It will actually apply the filter to the image and produce a new one.
gauss_kernel = generate_gaussian_kernel(3, 1.2)
blurred_image = cv2.filter2D(img, -1, gauss_kernel)
The next step is to calculate a gradient. We will convolve the blurred image with two Sobel kernels for the horizontal and vertical direction, and we will produce two gradient values. Then we’ll calculate two new values. gradient_magnitude
is a measure of how much the intensity changes around a pixel and gradient_direction
represents the angle of the gradient at each pixel. We’ll use both of those variables in the algorithm as the magnitude is used to identify edge points, and direction is used to refine the edges itself. We are using asType(np.uint8))
to convert the magnitude and direction values to 8-bit integer values.
We do this because after calculating the Euclidean norm and the arctan function, we most probably have floating-point data. Additionally, the gradient_direction
variable is, by default, returned in radians. That is why we will need to convert it to degrees when we use it.
def gradient_calculation(input_image):
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
gradient_x = cv2.filter2D(input_image, -1, sobel_x)
gradient_y = cv2.filter2D(input_image, -1, sobel_y)
gradient_magnitude = np.hypot(gradient_x, gradient_y)
gradient_direction = np.rad2deg(np.arctan2(gradient_y, gradient_x))
gradient_magnitude = gradient_magnitude.astype('uint8')
gradient_direction[gradient_direction < 0] += 180
return gradient_magnitude, gradient_direction
Let us continue with the non-maximum suppression. Now we have information about the strength and direction of the intensity change at each pixel. Using non-maximum suppression, we’ll try to thin the edges by preserving only the maximum values. We’ll go through each pixel in the gradient_magnitude
image and check the gradient direction. Based on the direction, we’ll identify the neighboring pixels and compare the values with the neighboring values.
If our pixel has the maximum magnitude, it is considered an edge pixel, and the value is set to 255 (white); otherwise, it is set to 0. The resulting image should contain thin edges.
def non_maximum_suppression(magnitude, direction):
size = magnitude.shape
suppressed = np.zeros(size)
for i in range(1, size[0] - 1):
for j in range(1, size[1] - 1):
if (0 <= direction[i, j] < 22.5) or (157.5 <= direction[i, j] <= 180):
value_to_compare = max(magnitude[i, j - 1], magnitude[i, j + 1])
elif (22.5 <= direction[i, j] < 67.5):
value_to_compare = max(magnitude[i - 1, j - 1], magnitude[i + 1, j + 1])
elif (67.5 <= direction[i, j] < 112.5):
value_to_compare = max(magnitude[i - 1, j], magnitude[i + 1, j])
else:
value_to_compare = max(magnitude[i + 1, j - 1], magnitude[i - 1, j + 1])
if magnitude[i, j] >= value_to_compare:
suppressed[i, j] = magnitude[i, j]
suppressed = np.multiply(suppressed, 255.0 / suppressed.max())
return suppressed
Let us apply the double threshold and edge tracking. After finishing the non-maximum suppression, we have an edge map with edge candidates. We’ll apply the double threshold by setting two threshold values (low and high) in order to distinguish strong and weak edges. We’ll mark pixels in the edge map that are greater than the high values as strong edges and the ones between low and high as weak edges.
We also apply a process called hysteresis or edge tracking, which will link weak edges and strong edges in order to get a more complete picture of the edges. The end result is the edge map that only has strong and connected edges remaining.
def double_threshold_and_edge_tracking(image, low_threshold, high_threshold):
weak = 50
strong = 255
size = image.shape
result = np.zeros(size)
weak_x, weak_y = np.where((image > low_threshold) & (image <= high_threshold))
strong_x, strong_y = np.where(image >= high_threshold)
result[strong_x, strong_y] = strong
result[weak_x, weak_y] = weak
dx = np.array((-1, -1, 0, 1, 1, 1, 0, -1))
dy = np.array((0, 1, 1, 1, 0, -1, -1, -1))
size = image.shape
while len(strong_x):
x = strong_x[0]
y = strong_y[0]
strong_x = np.delete(strong_x, 0)
strong_y = np.delete(strong_y, 0)
for direction in range(len(dx)):
new_x = x + dx[direction]
new_y = y + dy[direction]
if((new_x >= 0 & new_x < size[0] & new_y >= 0 & new_y < size[1]) and (result[new_x, new_y] == weak)):
result[new_x, new_y] = strong
np.append(strong_x, new_x)
np.append(strong_y, new_y)
result[result != strong] = 0
return result
NOTE: Keep in mind if you want to replicate the code we have written until now to change the config variables depending on the picture you provide to the algorithm (when generating the Gaussian kernel and/or thresholding the edge map.
You know, there is a better alternative to all of the work above. We could just use the cv2.Canny()
method that comes with the OpenCV library, which is highly superior to the custom edge detection algorithm we created above. We can provide lots of different arguments, such as the image (of course), the output edge map, threshold values for the hysteresis procedure, aperture size for the Sobel algorithm, and the boolean value for the gradient algorithm. Below, the Canny algorithm produced the following edge detection result only with the image and the two threshold values.
canny_algorithm = cv2.Canny(img,100,200)
The complete code for this edge detection project can be found here.
Also published here.