Map Room envisions a future for the city that is also wise as a result of a shared understanding of lived experience amongst its citizens.
— Jer Thorp, Making Mapping more Human
The St. Louis Map Room was an experiment in collaborative mapping as a source of community engagement. A vacant school’s gymnasium provided a forum where, over the course of a month, 29 groups came in to make 100 square foot maps of their neighborhoods and communities. They were aided by robots and projection, but primarily drew huge maps by hand that speak to their lived experience of their city.
Community groups mapped hand-collected bicycle traffic data, community gardens, churches, magnet schools, and food banks. Groups of students mapped their schools, how they got there, and where it was or wasn’t safe.
Before the project opened, we decided to use robotics to add some fun and to save people work. Map Room’s robots would do the precise but less creative work of laying out the basics of a map, which would fade into the background as humans filled in with their own personal stories.
There were a number of constraints on our design:
We’ll talk through how we addressed those constraints, with a special emphasis on the computer vision we used to make the project a success.
Most drawing machines operate like CNC routers, with only three degrees of freedom: the x and y axes plus a pen that can be raised and lowered. And since they operate on rails, their stepper motors or rotary encoders can be calibrated to position a pen very accurately, down to a millimeter or smaller. Check out classic pen plotters or the more recent AxiDraw for examples of machines like these.
But if you want to fit the capital-R Robot aesthetic, you can’t beat a fully mobile robot. There are a few kinds to choose from. You could use a two-wheeled differential-drive robot, if you’re willing to solve the parallel parking problem. Alternatively, you could use a holonomic-drive robot if you’re willing to do more math. Either way, you’ve got more factors to worry about. You’ve got another degree of freedom (the rotation of the robot) and you’re not on rails like a pen plotter, so you have to devise some kind of positioning system.
That’s the biggest challenge you face with a mobile robot: knowing where it is. One way to solve it is dead reckoning: calculating where you are based on where you’ve been and how you’re moving. It’s the equivalent of oceanic navigation before clocks could be used at sea. Polynesian navigators, faced with this problem, came up with solutions like tracking bird migrations to gather more information. Our robot probably can’t be as clever.
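To get a feel for why dead reckoning alone is risky, here is a minimal sketch. It assumes a hypothetical robot that drives the outline of a 10-foot square and believes every turn is exactly 90 degrees; the 1-degree turn error and the function names are made up for illustration, not measurements from our robots.

```python
import math

def step(pose, distance, turn):
    """Move `distance` along the current heading, then rotate by `turn` radians."""
    x, y, heading = pose
    x += distance * math.cos(heading)
    y += distance * math.sin(heading)
    return (x, y, heading + turn)

def drive_square(side, turn_error=0.0):
    """Drive four sides of a square; each turn is off by `turn_error` radians."""
    pose = (0.0, 0.0, 0.0)
    for _ in range(4):
        pose = step(pose, side, math.pi / 2 + turn_error)
    return pose

def drift(side, turn_error):
    """Distance between where the robot ends up and where it started."""
    x, y, _ = drive_square(side, turn_error)
    return math.hypot(x, y)

# 120-inch sides (10 feet), with a hypothetical 1-degree error on every turn
print(drift(120.0, 0.0))
print(drift(120.0, math.radians(1.0)))
```

With perfect turns the robot ends up back where it started. With a small error on each turn it finishes inches away, and nothing on board tells it so.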
You could also mount a camera on the robot itself and it could figure out where it is. If you’re building a self-driving car, you’ve got to take the cameras with you.
For our robot, however, we have one big advantage: the area that we need to cover is known and we can build specifically for it. In our case, we wanted to cover a 10’x10’ canvas. Unlike a ship, we can track ourselves externally, as if we had a satellite overhead.
On a practical level, that means we’ve got some more tools at our disposal. Movie studios have been using motion capture technology for decades. The Microsoft Kinect brought some of that into people’s living rooms, and these days room-scale virtual reality tracking is advancing by leaps and bounds as well. HTC just released the Vive Tracker that should be able to do 6-DoF tracking for <$1000.
But what if you’re really on a budget? Let’s see what we can do for $100.
We figured the most cost-effective way to track our robot was visually: put a marker on the robot, track it with a camera, and relay that information to the robot.
With a Raspberry Pi 3 ($39), power supply ($10), case ($7), 16 GB SD card ($13), and Raspberry Pi Camera Module v2 ($30), you’ve got all the hardware you need for almost exactly $100. Setting up a Raspberry Pi with OpenCV can be a pain, but I highly recommend pyimagesearch’s Raspbian Jessie + OpenCV 3 setup guide.
In our case, we also needed the tracking system to be wireless — the robots couldn’t have wires, and we had to suspend the camera 14’ in the air in a place we couldn’t easily run Ethernet cables. The Raspberry Pi worked great for this as well.
A word on accuracy: the eventual framerate we were able to get was ~15fps at 960 x 720 px. If the full 10 foot by 10 foot robot area were aligned to 720px, then one pixel would represent 0.17 inches (= 10 feet / 720 px). In our case, we could get the map to fill only about 500 pixels, so a pixel represented about 0.25 inches. That’s pretty good, but an error that size in a drawn line is definitely jarring. Map Room’s goal is to focus on human stories, and the underlying map should not distract from these. We had to do better.
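The arithmetic behind those numbers, as a tiny sketch (the function name is ours, purely for illustration):

```python
def inches_per_pixel(canvas_feet, pixels_across):
    """Physical size of one pixel when the canvas spans `pixels_across` pixels."""
    return canvas_feet * 12.0 / pixels_across

print(round(inches_per_pixel(10, 720), 3))  # canvas fills the full 720 px
print(round(inches_per_pixel(10, 500), 3))  # canvas fills only about 500 px
```

Every pixel the canvas doesn’t fill makes whole-pixel positioning coarser, which is why sub-pixel accuracy matters so much here.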
We considered a few other options, like tracking custom markers with SIFT/SURF, or tracking known shapes, but ArUco had features that we really liked: quick setup and tracking multiple unique markers out of the box. In theory, it would let us do better positioning than the 0.25 inches we calculated above by using refinement algorithms to locate the center of each marker to accuracies better than a pixel.
Any camera that uses a lens — including the Raspberry Pi Camera — has some distortion in the image it produces. This is a problem if you want to be able to use pixel values as positions, like in the case of our robot.
Fortunately, a lot of work has gone into making it easy to calculate exactly how a camera’s lens distorts the world, and it’s quick to do with OpenCV. We wrote a script to use a ChArUco board to perform the calibration quickly and easily (more on ArUco). You just print out the board, hold it in front of the camera until enough frames are recorded, and OpenCV does the rest (see a whole example).
We found from testing that for best results you should use the largest, flattest board you can — a trip to Kinko’s doesn’t hurt to get a large sheet printed. Our final board was 16” x 23”, printed on matte paper and mounted on posterboard. The errors here will propagate down to the rest of the system, so it’s worth getting your camera calibrated really well!
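To see why distortion matters for positioning, here is a rough sketch of a one-term radial distortion model, a simplified version of the kind of lens model that calibration estimates. The coefficient and focal length below are hypothetical values picked for illustration, not our camera’s calibration, and the undistort step is a simple fixed-point iteration rather than OpenCV’s own routine.

```python
import math

# Hypothetical values for illustration only; real ones come from calibration.
K1 = -0.05        # one-term radial distortion coefficient (assumed)
FOCAL_PX = 600.0  # focal length in pixels (assumed)

def distort(x, y, k1=K1):
    """Apply radial distortion to normalized image coordinates."""
    scale = 1.0 + k1 * (x * x + y * y)
    return x * scale, y * scale

def undistort(xd, yd, k1=K1, iterations=20):
    """Invert distort() by fixed-point iteration."""
    x, y = xd, yd
    for _ in range(iterations):
        scale = 1.0 + k1 * (x * x + y * y)
        x, y = xd / scale, yd / scale
    return x, y

def pixel_shift(x, y):
    """How far, in pixels, distortion moves a point at normalized (x, y)."""
    xd, yd = distort(x, y)
    return FOCAL_PX * math.hypot(xd - x, yd - y)

print(pixel_shift(0.1, 0.0))  # near the image center
print(pixel_shift(0.6, 0.0))  # out toward the edge
```

The shift is tiny near the center and grows quickly toward the edges, so an uncalibrated camera would report a robot at the edge of the canvas in noticeably the wrong place.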
After calibration, you can use ArUco to detect markers in your frame. As we mentioned before, ArUco was designed for augmented reality, so it gives you a coordinate system for each marker relative to the camera. In theory, that means we know where the markers are in 3D space!
That’s a good idea, but in practice we had some issues…
The jumpy animation above shows the positions and rotations in space for markers that were all lying still on a flat plane. They should have all been (at least) facing the same way, but clearly have a lot of jitter. Some smoothing would have helped, but to control a robot you need fast update times.
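Here is a small sketch of that smoothing trade-off, assuming an exponential moving average as the filter (one common choice, not something we actually shipped). It counts how many frames the smoothed estimate needs to cover 90% of a sudden move.

```python
def frames_to_settle(alpha, threshold=0.9, max_frames=1000):
    """Frames an exponential moving average needs to cover `threshold`
    of a sudden unit step in the measured position."""
    estimate = 0.0
    for frame in range(1, max_frames + 1):
        estimate += alpha * (1.0 - estimate)  # blend in the new measurement
        if estimate >= threshold:
            return frame
    return None

FPS = 15  # roughly the framerate we got from the camera
for alpha in (1.0, 0.5, 0.2):
    frames = frames_to_settle(alpha)
    print(alpha, frames, round(frames / FPS, 2))  # weight, frames, seconds of lag
```

The heavier the smoothing (smaller alpha), the longer the estimate trails behind the robot’s real position, which is a real problem when you’re steering with it.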
We can use the fact that we’re operating on a flat surface to avoid this problem: let’s not do the pose estimation at all! The marker detection routine actually gives us the four (pixel) corners of the marker in the image. It was straightforward to just take those corners and average them to give the center of the marker. The rotation of the marker can also be found by averaging the “top two” corners of the marker and drawing a vector from the center to that “top” point.
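A minimal sketch of that corner averaging, assuming the four corners arrive in ArUco’s usual clockwise order starting from the marker’s top-left. The function name is ours, and this is an illustration rather than the exact code in the repo.

```python
import math

def marker_center_and_heading(corners):
    """corners: four (x, y) pixel points ordered top-left, top-right,
    bottom-right, bottom-left (relative to the marker itself)."""
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0
    (x0, y0), (x1, y1) = corners[0], corners[1]
    top_x, top_y = (x0 + x1) / 2.0, (y0 + y1) / 2.0  # midpoint of the top edge
    heading = math.atan2(top_y - cy, top_x - cx)      # radians, image axes
    return (cx, cy), heading

# An upright marker: its "top" points toward the top of the image (negative y)
center, heading = marker_center_and_heading([(0, 0), (10, 0), (10, 10), (0, 10)])
print(center, math.degrees(heading))
```

Note that image y coordinates grow downward, so the angle here is measured in the image’s axes. Flip the sign if your robot expects a y-up convention.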
We found these positions to be stable and accurate to about 1/8” from 14’ up (about 3mm at 4m) — that’s really good given the price point of the hardware we’re working with!
Once we’ve figured out where the markers are in the scene, we still have to be able to compare that to what we want to draw. It’d be nice if the camera were mounted perfectly above the center of the canvas, pointed straight down — but it wasn’t in our setup, and probably wouldn’t be in yours either.
We can fix this by applying a perspective transform to the marker positions, which will take the angled view that the camera sees (left above) and make it flat and square (right above) — just like the SVGs we’ll be drawing. This is different than the calibration step above; it’s not the distortion of the lens we’re correcting for, but instead the tilt of the camera relative to the scene.
That transformation is just a 3x3 matrix. OpenCV makes it easy to calculate the matrix from four pairs of points. We wrote a script to build the matrix using ArUco markers, which led to better precision than clicking on corners on the screen since ArUco positions are accurate below a pixel.
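To make the matrix less mysterious, here is a dependency-free sketch of how four point pairs pin it down. In practice OpenCV’s `cv2.getPerspectiveTransform` computes this kind of matrix for you; the version below solves the same eight equations by hand. The pixel coordinates are hypothetical, not our real setup, and the helper names are ours.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def perspective_matrix(src, dst):
    """3x3 matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_perspective(matrix, point):
    """Map one (x, y) point through the matrix, including the divide by w."""
    x, y = point
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    u = (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]) / w
    v = (matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]) / w
    return u, v

# Hypothetical pixel positions of the canvas corners as seen by a tilted camera
CAMERA_CORNERS = [(212, 118), (742, 131), (770, 640), (190, 655)]
# Where those corners sit on the flat canvas, in inches (10 ft x 10 ft)
CANVAS_CORNERS = [(0, 0), (120, 0), (120, 120), (0, 120)]

M = perspective_matrix(CAMERA_CORNERS, CANVAS_CORNERS)
print(apply_perspective(M, (480, 385)))  # a marker somewhere mid-canvas
```

The four source points must not be collinear, or the system has no unique solution and the solver will divide by zero.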
At this point, we’ve got an undistorted, perspective transformed, sub-pixel accurate position for a marker! We can send those coordinates back to the robots, and they’ll know exactly where they are.
Every frame, these steps have to happen:
The fact that this can happen 15 times per second on a $39 computer is pretty astounding! If you have a beefier computer, you could probably get way higher framerates out of this: my MacBook Pro was able to hit 30fps, which was the limit of the camera.
There are a whole lot of other tricks we had to apply to get the tracking system working the way we needed:
Thanks for following along! Want the source code? It’s written in Python and lives on GitHub as O-C-R/maproom-robots/skycam.