This blog post on Canny Edge Detection is the second in the series about OpenCV and Computer Vision. Other parts: Part 1.

With edge detection, we hope to identify edges, boundaries, or silhouettes of an object or region inside an image. Humans have a natural ability to recognize silhouettes such as famous landmarks, logos, and even people. Computers also have this ability, but maybe not in the way you and I imagine it, and especially not the way our favorite police procedural drama series portrays it. Everything we need can already be found in OpenCV; all it takes is a simple nudge from the developer.

OpenCV ships with many edge-detection filters, and most of them are based on the same premise: edge-detection algorithms analyze the intensity of an image and look for significant changes in that intensity. The resulting image is a duotone image, with non-edge regions being black and edge regions drawn as white strokes. Depending on the image and the algorithm used, the results can be prone to noise, which can be mitigated by blurring and/or thresholding the image during the preprocessing steps.

Some of OpenCV's most commonly used edge-detection algorithms are Canny, Sobel, Scharr, and Laplacian. Let us quickly and simply go through how one of those algorithms works. We'll look into Canny, as it is one of the most commonly used, and then we'll try to implement it from scratch, although a pretty simplified version.

## Canny Edge Detection

Canny edge detection is a complex, multilayered algorithm that detects edges. It was named after its creator, John Canny, and it is widely used in Computer Vision, even in computationally intensive scenarios, due to its accuracy and effectiveness. Throughout the process, the algorithm takes additional steps that suppress noise and minimize false detections in order to get the best edge-detection results. The complete list of steps involved is:

1. Denoising using a Gaussian filter
2. Gradient calculation
3. Non-maximum suppression (NMS) applied to the edges
4. Applying the double threshold
5. Tracking edges

After we explain the steps involved, we'll also implement a rudimentary Canny edge detection algorithm.

## Noise reduction using a Gaussian Filter

If you need to know anything about noise in edge-detection algorithms, know this: it is highly, highly problematic. Noise is considered bad because it introduces false edges and reduces the accuracy of existing ones. In order to reduce noise, we smooth the image by applying a blurring algorithm.

A blurring algorithm uses kernels. A kernel is a small square matrix (3x3, 5x5, or 7x7) that defines a set of coefficients/weights used in computer vision algorithms. This kernel is applied mathematically to each and every pixel in the image.

The math behind the Gaussian blur is not in the scope of this article, but below you can see the formula of the 2D Gaussian kernel, evaluated at each position (x, y) relative to the kernel center (mu_x, mu_y). We provide two input properties: the kernel size and sigma. The kernel size determines the size of the square kernel matrix, and sigma controls the standard deviation of the Gaussian distribution. Larger kernel sizes increase the blurring effect but can blur out edges; sigma controls the spread of the Gaussian filter.

G(x, y) = (1 / (2 * pi * sigma^2)) * exp(-((x - mu_x)^2 + (y - mu_y)^2) / (2 * sigma^2))
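To get a feel for how these two parameters interact before we build our own kernel later on, here is a minimal sketch using OpenCV's built-in `cv2.GaussianBlur` (the image path is a placeholder):

```python
import cv2

# Placeholder path - substitute your own image.
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# A small kernel with a small sigma smooths only lightly...
lightly_blurred = cv2.GaussianBlur(img, (3, 3), 1)

# ...while a larger kernel with a larger sigma blurs aggressively
# and can wash out the very edges we want to detect.
heavily_blurred = cv2.GaussianBlur(img, (7, 7), 2)
```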
## Gradient calculation

In this step, we are calculating the gradient magnitude and direction of the blurred image. Gradients are helpful as they give us more information about the changes in intensity in different directions by analyzing the rate of change. The most commonly used algorithm to calculate gradients is the Sobel operator, a spatial filter that convolves the image with a small kernel in order to approximate the gradients.

### What does it mean to convolve the kernel?

When convolving, we apply a kernel to an image. The kernel is moved across the entire image, and at each location a weighted sum of the pixel values is computed. We actually used the same process when we reduced noise using the Gaussian filter, and we'll show how this works mathematically in the implementation section. The newly generated image represents the convolved result.

## Non-maximum suppression

With non-maximum suppression, we refine the detected edges: they should be thin and clear. It is a post-processing step during which we preserve the edges we already have while removing weak ones; only the local maxima survive this filtering. The steps are relatively simple: we compare each pixel in the gradient magnitude image with its two neighbors along the gradient direction, and if the current pixel is not the maximum among the three selected pixels, we suppress its value and set it to zero. With this algorithm, we keep the existing edges but also thin edges that are too thick.

## Double thresholding

Thresholding is a segmentation process in which we classify pixels into categories defined by predefined threshold values. With thresholding, we can distinguish between strong and weak edges. The process is quite simple in theory: we define two threshold values that separate strong and weak edges, then compare each pixel in the gradient image against those thresholds. Based on that comparison, we classify the pixel as a strong edge, a weak edge, or no edge. What happens then is further analysis, but most commonly, strong edges are kept, weak edges require more investigation, and non-edges are discarded.

## Edge Tracking

Edge tracking is the finishing step and aims to transform weak pixels into strong ones in order to create continuous sections of detected edges. The ultimate goal of edge tracking is to improve the completeness and wholeness of the edge map. After this many steps, it is extremely common to have sections of an edge that are broken. In order to complete and connect the disconnected sections, we check the neighborhood around a weak pixel for strong pixels; most commonly, those pixel sections are part of the same edge.

## Implementation

OK, let us start with the Gaussian filter denoising. We already know what we want: to convolve the image with a 2D Gaussian kernel. Let us first create the kernel:

```python
import cv2
import numpy as np

def generate_gaussian_kernel(kernel_size, sigma):
    # Evaluate the 2D Gaussian formula at every (x, y) position,
    # centered on the middle of the kernel.
    kernel = np.fromfunction(
        lambda x, y: (1 / (2 * np.pi * sigma**2))
        * np.exp(
            -((x - (kernel_size - 1) / 2) ** 2 + (y - (kernel_size - 1) / 2) ** 2)
            / (2 * sigma**2)
        ),
        (kernel_size, kernel_size),
    )
    # Normalize so the weights sum to 1 and overall brightness is preserved.
    kernel /= np.sum(kernel)
    return kernel

gauss_kernel = generate_gaussian_kernel(3, 1)
print(gauss_kernel)
blurred_image = cv2.filter2D(img, -1, gauss_kernel)
```

```
[[0.07511361 0.1238414  0.07511361]
 [0.1238414  0.20417996 0.1238414 ]
 [0.07511361 0.1238414  0.07511361]]
```

The first thing that might be interesting here is numpy.fromfunction, which constructs an array by executing a function over each coordinate. The produced object is an array type from the numerical Python library, a numpy.ndarray. The lambda is an anonymous function (a function without a name) that calculates the kernel value at each position.
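If `numpy.fromfunction` is new to you, here is a tiny standalone sketch (the lambda is deliberately trivial rather than the Gaussian formula) showing how each output element is computed from its own coordinates:

```python
import numpy as np

# fromfunction calls the lambda with coordinate arrays, so every
# element of the result is a function of its own (row, column) index.
demo = np.fromfunction(lambda x, y: x + y, (3, 3))
print(demo)
# [[0. 1. 2.]
#  [1. 2. 3.]
#  [2. 3. 4.]]
```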
We will play with the kernel size and sigma to produce the best result when we merge all the steps together, but for now, for the sake of learning, we created a 3×3 bell-curve-shaped kernel with a sigma of 1.

Let us take a step back and ask a really important question. How do we actually convolve the image? By calculating the weighted average at a certain pixel. Check this:

```
[[168 167 169]
 [165 163 168]
 [170 165 167]]
```

For the sake of this example, we'll imagine this is our input matrix. Actually, it is a part of the input image sliced at a certain position. We convolve the matrix by computing the weighted average, that is:

kernel_{0, 0} * matrix_{0, 0} + kernel_{0, 1} * matrix_{0, 1} + kernel_{0, 2} * matrix_{0, 2} + ... + kernel_{w, h} * matrix_{w, h}

If we multiply the kernel above with the slice element-wise and sum up the elements, we get:

```python
print(np.sum(slice * gauss_kernel))
```

```
166.2624376159791
```

And if we print out the blurred (convolved) image at the same index position, we get:

```
[[167 167 167]
 [167 166 167]
 [166 166 166]]
```

The value of the middle pixel is 166, which we actually calculated manually! Amazing! Of course, we won't do this by hand for every pixel. Thankfully, the filter2D method will do everything that we did manually (and more!). It applies the filter to the image and produces a new one.

```python
gauss_kernel = generate_gaussian_kernel(3, 1.2)
blurred_image = cv2.filter2D(img, -1, gauss_kernel)
```

The next step is to calculate the gradients. We will convolve the blurred image with two Sobel kernels, one for the horizontal and one for the vertical direction, which produces two gradient values. Then we'll calculate two new values: gradient_magnitude, a measure of how much the intensity changes around a pixel, and gradient_direction, the angle of the gradient at each pixel. We'll use both of those variables in the algorithm, as the magnitude is used to identify edge points and the direction is used to refine the edges themselves.

We use astype(np.uint8) to convert the magnitude values to 8-bit integers, because after calculating the Euclidean norm and the arctangent we most probably have floating-point data. Additionally, the gradient_direction variable is, by default, returned in radians, which is why we convert it to degrees before using it.

```python
def gradient_calculation(input_image):
    # Sobel kernels approximating the horizontal and vertical derivatives.
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

    # Use a float output depth so negative gradient responses are not clipped.
    gradient_x = cv2.filter2D(input_image, cv2.CV_64F, sobel_x)
    gradient_y = cv2.filter2D(input_image, cv2.CV_64F, sobel_y)

    # Euclidean norm for the magnitude, arctangent for the direction.
    gradient_magnitude = np.hypot(gradient_x, gradient_y)
    gradient_direction = np.rad2deg(np.arctan2(gradient_y, gradient_x))

    # Clip into the displayable range before converting to 8-bit integers.
    gradient_magnitude = np.clip(gradient_magnitude, 0, 255).astype(np.uint8)
    # Fold negative angles so every direction lies in [0, 180].
    gradient_direction[gradient_direction < 0] += 180

    return gradient_magnitude, gradient_direction
```
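To make the magnitude and direction computations concrete, here is a one-pixel sketch with made-up gradient values:

```python
import numpy as np

# Hypothetical Sobel responses at a single pixel.
gx, gy = 30.0, 40.0

magnitude = np.hypot(gx, gy)                 # sqrt(30^2 + 40^2) = 50.0
direction = np.rad2deg(np.arctan2(gy, gx))   # ~53.13 degrees
print(magnitude, direction)
```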
Let us continue with the non-maximum suppression. Now we have information about the strength and direction of the intensity change at each pixel. Using non-maximum suppression, we'll try to thin the edges by preserving only the local maxima. We'll go through each pixel in the image and check the gradient direction; based on the direction, we'll identify the neighboring pixels and compare their gradient_magnitude values with ours. If our pixel has the maximum magnitude, it survives (and the surviving values are finally rescaled so the strongest edge pixels become 255, white); otherwise, it is set to 0. The resulting image should contain thin edges.

```python
def non_maximum_suppression(magnitude, direction):
    size = magnitude.shape
    suppressed = np.zeros(size)

    for i in range(1, size[0] - 1):
        for j in range(1, size[1] - 1):
            # Pick the two neighbors that lie along the gradient direction.
            if (0 <= direction[i, j] < 22.5) or (157.5 <= direction[i, j] <= 180):
                value_to_compare = max(magnitude[i, j - 1], magnitude[i, j + 1])
            elif 22.5 <= direction[i, j] < 67.5:
                value_to_compare = max(magnitude[i - 1, j - 1], magnitude[i + 1, j + 1])
            elif 67.5 <= direction[i, j] < 112.5:
                value_to_compare = max(magnitude[i - 1, j], magnitude[i + 1, j])
            else:
                value_to_compare = max(magnitude[i + 1, j - 1], magnitude[i - 1, j + 1])

            # Keep the pixel only if it is the local maximum along that direction.
            if magnitude[i, j] >= value_to_compare:
                suppressed[i, j] = magnitude[i, j]

    # Rescale so that the strongest surviving edge becomes 255 (white).
    suppressed = np.multiply(suppressed, 255.0 / suppressed.max())
    return suppressed
```

Let us apply the double threshold and edge tracking. After finishing the non-maximum suppression, we have an edge map with edge candidates. We apply the double threshold by setting two threshold values (low and high) in order to distinguish strong and weak edges. Pixels in the edge map that are greater than the high value are marked as strong edges, and the ones between low and high as weak edges.

We also apply a process called hysteresis, or edge tracking, which links weak edges to strong edges in order to get a more complete picture of the edges. The end result is an edge map in which only strong, connected edges remain.

```python
def double_threshold_and_edge_tracking(image, low_threshold, high_threshold):
    weak = 50
    strong = 255
    size = image.shape
    result = np.zeros(size)

    # Classify every pixel as strong, weak, or (implicitly) no edge.
    weak_x, weak_y = np.where((image > low_threshold) & (image <= high_threshold))
    strong_x, strong_y = np.where(image >= high_threshold)
    result[strong_x, strong_y] = strong
    result[weak_x, weak_y] = weak

    # Offsets of the eight neighbors around a pixel.
    dx = (-1, -1, 0, 1, 1, 1, 0, -1)
    dy = (0, 1, 1, 1, 0, -1, -1, -1)

    # Starting from every strong pixel, promote neighboring weak pixels
    # to strong and keep following them until no more can be promoted.
    strong_x, strong_y = list(strong_x), list(strong_y)
    while strong_x:
        x = strong_x.pop()
        y = strong_y.pop()
        for direction in range(len(dx)):
            new_x = x + dx[direction]
            new_y = y + dy[direction]
            if (0 <= new_x < size[0] and 0 <= new_y < size[1]
                    and result[new_x, new_y] == weak):
                result[new_x, new_y] = strong
                strong_x.append(new_x)
                strong_y.append(new_y)

    # Everything that was never promoted to strong is discarded.
    result[result != strong] = 0
    return result
```

NOTE: If you want to replicate the code we have written so far, remember to tune the configuration variables (when generating the Gaussian kernel and/or thresholding the edge map) depending on the picture you feed to the algorithm.

## Conclusion

You know, there is a better alternative to all of the work above. We could just use the cv2.Canny() method that comes with the OpenCV library, which is highly superior to the custom edge-detection algorithm we created above. We can provide lots of different arguments, such as the image (of course), the output edge map, the threshold values for the hysteresis procedure, the aperture size for the Sobel operator, and a boolean that selects the more accurate L2 norm for the gradient magnitude. Below, the Canny algorithm produced its edge-detection result with only the image and the two threshold values:

```python
canny_algorithm = cv2.Canny(img, 100, 200)
```

The complete code for this edge detection project can be found here.

## Resources

- https://github.com/StefanPitur/Edge-detection—Canny-detector
- https://towardsdatascience.com/canny-edge-detection-step-by-step-in-python-computer-vision-b49c3a2d8123
- https://docs.opencv.org/3.4/da/d22/tutorial_py_canny.html

Also published here.