Sudoku Solver with OpenCV 3.2 and Go

A couple of weeks ago I had an itch to start learning Go… and OpenCV. I decided to scratch that itch by working on a Sudoku puzzle solver web app powered by OpenCV and Go with a little machine learning to boot. The idea was simple enough:

Upload an image and “parse” the puzzle with OpenCV e.g. find the grid and digits
Use a test set of digits from sample puzzles to train a machine learning model to identify the digits
Solve the puzzle using some constraint propagation algorithms I learned while working through the Udacity AI nanodegree program

Along the way, diving into OpenCV led to cutting my teeth on C++, spending a fair amount of time understanding CGO (the bridge between Go and C/C++) and a few new nuggets of Docker knowledge. Here’s a quick demo of how it ended up:

Read on for how it came together…

“Parsing” Sudoku Images with OpenCV

The current 3.x version of OpenCV exposes a C++ API so I installed the C/C++ Tools plugin for Visual Studio Code and started reading sample OpenCV code. I wanted to be able to detect both “digital puzzle” images such as the one on the left and pictures of puzzles that may have some skew and poor lighting like the example on the right.

Original Sudoku Puzzle Images

I ended up with the following image processing steps:

Convert to Grayscale and Resize — Subsequent steps don’t rely on color and assuming a consistent size made some of the later processing more simple

Extract and “Un-warp” the Grid—The next task was detecting the Sudoku grid in the image. Supporting the “digital puzzle” images only would’ve been significantly easier but the newspaper images required more work. It started with some denoising and adaptive thresholding.

Thresholding and Denoising Applied

These steps prepare for the Canny edge detection algorithm (OpenCV docs) which further reduces the information in the image to just object edges like the grid we’re after.

Canny Edge Detection Algorithm

Now we can use OpenCV’s [findContours](http://docs.opencv.org/3.2.0/d3/dc0/group__imgproc__shape.html#ga17ed9f5d79ae97bd4c7cf18403e1689a) algorithm to detect shapes in the image; the [RETR_EXTERNAL](http://docs.opencv.org/3.2.0/d3/dc0/group__imgproc__shape.html#ga819779b9857cc2f8601e6526a3a5bc71) mode is used to limit the search to only extreme outer contours skipping any that lie inside other contours.

Contour Detection (there is a single contour in the image on the left if you look close)

With an array of contours, which are simply lists of X/Y coordinates that “trace” the shape, the next task was to find the largest contour with the help of [contourArea](http://docs.opencv.org/3.2.0/d3/dc0/group__imgproc__shape.html#ga2c759ed9f497d4a618048a2f56dc97f1) and then identify which X/Y coordinates are the corners of the grid by finding the outermost point in the contour within each quadrant of the area covered by the contour.

Detecting Perspective Transform Points for the Largest Contour

These corners are then used to “un-warp” the grid section of the image with OpenCV’s [getPerspectiveTransform](http://docs.opencv.org/3.2.0/da/d54/group__imgproc__transform.html#ga8c1ae0e3589a9d77fffc962c49b22043) and [warpPerspective](http://docs.opencv.org/3.2.0/da/d54/group__imgproc__transform.html#gaf73673a7e8e18ec6963e3774e6a94b87). While not much has changed in the “digital puzzle”, now our newspaper photo and “digital puzzle” image are on more equal footing for subsequent processing.

Perspective Transform Applied to Detected Grid

Eliminate Grid Lines — Before we looking for the digit regions of the image, it’s useful to get rid of the grid lines. A structuring element (I used 1px horizontal and vertical kernels) are passed to [erode](http://docs.opencv.org/3.2.0/d4/d86/group__imgproc__filter.html#gaeb1e0c1033e3f6b891a25d0511362aeb) and [dilate](http://docs.opencv.org/3.2.0/d4/d86/group__imgproc__filter.html#ga4ff0f3318642c4f469d0e11f242f3b6c) in order clean up the grid lines. [findContours](http://docs.opencv.org/3.2.0/d3/dc0/group__imgproc__shape.html#ga17ed9f5d79ae97bd4c7cf18403e1689a) is again used but this time the contours are filtered by aspect ratio to just those that approximate horizontal and vertical lines.

Grid lines detected in each image (pink is just the image border)

Sometimes the newspaper lines were still too “curvy” or indistinct to pass as grid lines (in this implementation) as in the example here. These regions are then expanded slightly and subtracted from the image yielding the following result.

Removing Grid Lines and Denoising to Prepare for Digit Detection

Finding Digits — Finally it’s time to detect the digit regions of the image. Once again [findContours](http://docs.opencv.org/3.2.0/d3/dc0/group__imgproc__shape.html#ga17ed9f5d79ae97bd4c7cf18403e1689a) is used followed by another aspect-ratio filter inspired by my friend Kevin Kazmierczak’s article. This filter allows us to skip over contours for some of the horizontal artifacts left in the newspaper image on the right.

Detecting Digit Regions of the Image

Machine Learning with OpenCV

Having found digit regions in the image, the next step was to identify what number was present in each digit image. I eyeballed several sample images and put together a simple csv data file of sample sudoku images and the digits they contained as follows:

Then each sample image was fed through the “parsing” logic described earlier to find the digit regions. The center of each digit region was matched to the nearest center point of a 9x9 grid imposed on the image and then assigned a digit value based on the data in the .csv file. Each digit was then resized to a consistent size and combined into an overall “training” image where the top row contains all the “1” images, the second row contains all the “2” images, etc. This becomes our raw data to train a machine learning model.

Digit regions extracted from several sample Sudoku puzzle images and combined together

There are plenty of digit recognition tutorials out there but I found Satya Mallick’s of particular value for it’s clear and detailed explanation. I adapted his approach for this project. To quickly summarize, a Histogram of Gradients Descriptor (see also Satya’s great explanation of the HOG Descriptor) is computed for each digit image. 80% of these descriptors along with the digit label (inferred from the row in our training image) are then fed as our “observations” into OpenCV’s [SVM](http://docs.opencv.org/3.2.0/d1/d2d/classcv_1_1ml_1_1SVM.html) class to train the model. The trained SVM model is then used to predict the appropriate digit for the remaining 20% of the digits.

I built a small C++ CLI tool to invoke training process and ended up with 97.73% accuracy on the albeit limited set of sample puzzles I gathered. The trained SVM model is saved and loaded again later for use in solving novel puzzles.

OpenCV 3.x C++ Integration with Go via CGO

All the work thus far was done with C++ (if you look at the code be gentle, I’m a C++ newbie as of this writing). Invoking this from Go turned out to be trickier than I anticipated, not an easy scenario for my first work in Go.

[cgo](https://golang.org/cmd/cgo/) enables Go to call C code using a special comment called a preamble to specific import of a “C” package (this import should be on a line by itself) like this. Notice that the preamble supports platform-specific values to support cross-compiling the safe codebase on different platforms e.g. darwin (my local setup) and linux (the docker container created to run this app).

Quoting the documentation (emphasis mine):

When the Go tool sees that one or more Go files use the special import “C”, it will look for other non-Go files in the directory and compile them as part of the Go package. Any .c, .s, or .S files will be compiled with the C compiler. Any .cc, .cpp, or .cxx files will be compiled with the C++ compiler.

That seemed simple enough but I lost time due to a couple of the points I emphasized above:

CGO let’s you call C code (not call C++ code) — So what’s the deal with the mention of .cpp files being compiled with the C++ compiler? Well, as a newbie to C/C++ it took me a while to understand that cgo will compile C++ that is referenced from the C code that is being imported. However, you cannot directly import C++ code which means C++ types such as [string](http://www.cplusplus.com/reference/string/string/) and [vector](http://www.cplusplus.com/reference/vector/vector/?kw=vector) have to be converted to their C equivalent types.
CGO compiles files in the same directory ONLY — My original C++ Sudoku puzzle parsing code was organized into src/ and include/ directories and my cgo preamble was trying to include a C header from a subdirectory. That did not work.

I ended up with the following directory structure to get cgo properly compiling and linking my C++ OpenCV code into the imported “C” package

There were a few more notable gotchas:

Debugging with CGO — Delve, the Go debugger is awesome. However, I wasn’t able get it working once I started using cgo. I couldn’t tell if it was actually supported or I just had issues. I found it easier to launch and debug the C++ code separately
Watch Your Pointers!— Early on I was passing back a string from C++ which represented the “parsed” sudoku puzzle and getting strange garbage results intermittently. I ended up allocating C memory beforehand and passing pointers into the C code allowing it to read/write from that allocated memory. Go’s defer and C.free() ensure we free it up when we’re done. See below for an example.

The result of this call is simply an 81 character string identifying the digits that were detected from the image:

Solving the Sudoku Puzzle

Going into how to solve a Sudoku puzzle given the digits is not a primary focus of this article; if that’s what you’re after, checkout Peter Norvig’s approach and explanation. However, I’ll just note that the main technique is constraint propagation, where we essentially use constraints on the problem domain to limit the number of options to search. If we naively conceive of a Sudoku board as a 9x9 grid in which each cell can take a value between 1–9 then we have 9⁸¹ possible board states (over 3 billion). Of course we know there are many constraints which limit the search space:

The known starting digits
Each row must contain all digits 1–9
Each column must contain all digits 1–9
etc.

By applying these constraints we can drastically reduce the possible search space. Once a constraint is applied to narrow the search space we can re-evaluate other constraints to iteratively narrow in on valid values for each cell in the puzzle. Combined with a search algorithm for cases when constraints do not eliminate all possibilities, most puzzles can be solved in a few seconds or less. Check out [sudokuboard.go](https://github.com/jamesandersen/go-sudoku/blob/master/sudokuboard.go) for the Go implementation used in this project.

In addition to the Sudoku solving code, the Go portion of the project contains some unremarkable code for serving a static directory of web content where the UI of the project lives and accepting a multipart file upload to which Sudoku puzzle images are sent. It also uses the NY Times Gzip handler to compress responses.

Docker Deployment

OK, so we’ve got OpenCV code to “parse” a puzzle image (passed as a byte array), we’ve integrated that with a simple Go web app that takes in a puzzle image, hands it off to be parsed by OpenCV and then solves the missing digits. Great! But where to deploy it?

The dependency on OpenCV means we do not have a self-contained binary and rules out something like the Go Standard Environment on Google Cloud Platform. A containerized solution was the next logical choice so off to DockerHub…

Go image? ✔

OpenCV image? Plenty of ‘em!

Go and OpenCV 3.x? ☹

… so I cobbled together my first contribution to DockerHub, an Alpine Linux based image with Go 1.8.3 and OpenCV 3 libraries that are fortunately available in Alpine Linux Edge. There were a few difficulties here such as getting symbolic links setup to the version specific opencv libs but eventually this container, plus the platform specific cgo preamble seen earlier got me successfully building the app inside a docker container.

Having read Kelsey Hightower’s article on tiny Docker containers for Go apps, I was conscious that, while I could run my app from this image, it would be inflated because all the Go tooling was still present. Luckily for me, the Docker team added multi-stage builds in Docker 17.05 since Kelsey wrote his article and now, in a single Dockerfile I can build the app, discard the build image with SDK tools, etc. and copy the Go binary from a previous stage to a clean, lighter weight image! Great work Docker team!

Thanks to folks at Hyper.sh for offering a free tier for dead-simple hosting of a docker container.

Conclusion

If you’ve hung in this long… thanks! I hope there’s been something valuable for you. The solver is by no means foolproof; it still has trouble with some images and either will fail to parse them or parse them incorrectly leading to failure to solve them (because it may have parsed as an invalid puzzle). The goal of course was to ramp up on some new technologies and the project has been valuable from that perspective.

The solver is live (as of this writing) at http://gosudoku.jander.me/ and source code is on GitHub if you’re interested. Drop me a note if you find it useful or have any follow-up questions. Thanks!