How Teleport uses machine vision to peek into every street corner
I’ve teased my readers before about how AI will reorganize the human population. Just like online dating sites are letting AI decide who gets to make babies with whom, we’re letting AI decide who moves where. So…
How exactly do we decide what’s a good city these days?
We read the news, hear good things from our friends, see photos and videos, get the best job offers from there, and quite often we just go and see the city for ourselves.
I’m not going to get into how AI is already involved in deciding which news, friends’ posts, videos or photos of cities we get to see, but I am going to shed some light on how we use AI at Teleport to help you see cities more objectively. And since I’m obsessed with convincing every kid to start coding, I’m going to walk through the process with simple code examples to illustrate how easy it is to start playing with AI using the tools already out there.
The following ideas were mainly inspired by MIT’s StreetScore project.
Visiting cities through the eyes of AI
An obvious benefit of using software to visit places is that Newton ain’t got nothing on software bots: they can accelerate through thousands of places in a heartbeat without worrying about their mass (in case you did not know, the heartbeat is a well-known unit of time under the International System of Units).
So let’s grab the borders of Tartu, Estonia (my university town) from the Teleport Developers API and generate 10,000 random geographical coordinates to visit. Why 10K, you might ask? Because science!
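The sampling step can be sketched in a few lines of Python. The bounding box values below are placeholders I made up for illustration; in practice the real city borders come from the Teleport Developers API:

```python
import random

# Rough bounding box around Tartu, Estonia — placeholder values for
# illustration; the real borders come from the Teleport Developers API.
LAT_MIN, LAT_MAX = 58.34, 58.41
LON_MIN, LON_MAX = 26.65, 26.80

def random_coordinates(n, seed=42):
    """Generate n uniformly random (lat, lon) pairs inside the box."""
    rng = random.Random(seed)
    return [(rng.uniform(LAT_MIN, LAT_MAX), rng.uniform(LON_MIN, LON_MAX))
            for _ in range(n)]

points = random_coordinates(10_000)
```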
Such uniform distributions over geographies would of course leave our perception of the city quite biased, as most of the time we would end up in the woods or on top of roofs with little to see. But luckily with the help of Google Maps API we can bias those random locations out from the middle of nowhere and get something more like this:
As you can see, the locations (red dots) we plan to visit are more or less on roads and streets where we hope to see more action than watching trees grow.
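One way to pull random points out of the woods is Google’s Roads API, whose snapToRoads endpoint returns the nearest road location for each input point. A rough sketch, assuming the v1 REST endpoint (it accepts up to 100 points per request; error handling omitted):

```python
import json
import urllib.parse
import urllib.request

SNAP_URL = "https://roads.googleapis.com/v1/snapToRoads"

def format_path(points):
    """Format (lat, lon) pairs as the pipe-separated 'path' parameter."""
    return "|".join(f"{lat},{lon}" for lat, lon in points)

def snap_to_roads(points, api_key):
    """Snap up to 100 points to the nearest roads via the Roads API."""
    query = urllib.parse.urlencode({"path": format_path(points),
                                    "key": api_key})
    with urllib.request.urlopen(f"{SNAP_URL}?{query}") as resp:
        data = json.load(resp)
    return [(p["location"]["latitude"], p["location"]["longitude"])
            for p in data.get("snappedPoints", [])]
```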
Now that we have our locations, it’s time to get into the seeing bit and get acquainted with the Google Street View Image API. Essentially I’m calling their API with my coordinates and saving the resulting image locally.
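A minimal version of that download loop might look like the following, using the Street View Image API’s size/location/key parameters (extras like heading or fov are left out for brevity):

```python
import urllib.parse
import urllib.request

STREETVIEW_URL = "https://maps.googleapis.com/maps/api/streetview"

def image_url(lat, lon, api_key, size="640x640"):
    """Build a Street View Image API request URL for one coordinate."""
    params = urllib.parse.urlencode(
        {"size": size, "location": f"{lat},{lon}", "key": api_key})
    return f"{STREETVIEW_URL}?{params}"

def save_image(lat, lon, api_key, filename):
    """Download the street-level photo at (lat, lon) to a local file."""
    urllib.request.urlretrieve(image_url(lat, lon, api_key), filename)
```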
After some reverse geocoding magic and religiously following Google’s API quotas, I’m actually getting photos back for each coordinate (as opposed to the “Sorry, we have no imagery here.” response you’d get with a purely uniform distribution).
Here’s a peek into thousands of images of Tartu my code has access to without leaving the room (or my computer?).
What’s in the image, HAL?
This is where the AI bit comes in. To keep things simple, I’m running our set of images through publicly available machine vision APIs. There are plenty of choices out there, like
- Microsoft Computer Vision API
- Google Cloud Vision API
- IBM Vision Recognition API
- Cloud Sight API
but for this example I went with Microsoft’s Project Oxford. I’ve always been impressed with Microsoft Research and, to be honest, their API’s free-use terms are the most favorable. They’ve even been kind enough to provide quick-start code in Python to get you going.
In essence, you just upload an image to their API and get back a line of text describing the scene. Here’s an example image from Tartu along with the output text from the MS Vision API.
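That round trip can be sketched as below. The endpoint URL and response shape are assumptions based on the v1.0 “describe” call as documented at the time (the service has since moved under Azure Cognitive Services, so check the current docs before relying on this):

```python
import json
import urllib.request

# Endpoint and JSON shape are assumptions based on the v1.0
# "describe" call; verify against the current Microsoft docs.
DESCRIBE_URL = "https://api.projectoxford.ai/vision/v1.0/describe"

def describe_image(image_bytes, api_key):
    """POST raw image bytes and return the top scene caption."""
    req = urllib.request.Request(
        DESCRIBE_URL, data=image_bytes,
        headers={"Ocp-Apim-Subscription-Key": api_key,
                 "Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(req) as resp:
        return extract_caption(json.load(resp))

def extract_caption(response):
    """Pull the best caption text out of the API's JSON response."""
    captions = response.get("description", {}).get("captions", [])
    return captions[0]["text"] if captions else ""
```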
Description: “A motorcycle parked in front of a building”
Compression is intelligence
With their API, I mapped 8.5GB of raw image pixel data from Tartu into 255KB of text data (object space). If you’ve ever looked into the relationship between compression and intelligence, then perhaps this roughly 35,000x size reduction speaks volumes.
Anyhow, now that we finally have natural language descriptions of the scenes, we can start looking into statistics to see if we can find something that could possibly factor into someone’s decision to move.
I almost fell asleep reading the list until I read “motorcycle”. The number of motorcycles, or more specifically their proportion to the population, says a lot about how motorcycle-friendly a city potentially is, and that resonates with me personally (I own two motorcycles).
Surely there are other sources for statistics about motorcycles in various cities, but bear in mind that they typically come from different sources in different countries and would require a substantial amount of work to capture and normalize across hundreds of major urban areas. The beauty of Google Street View is how well it covers big cities as a single data source and how well it lends itself to capturing any knowledge visible in images. The premise is that…
If you can see it, then AI can see it better
If not today, then surely tomorrow. “Live in the future, then build what’s missing,” as Paul Graham has eloquently said. Advances in AI, and more specifically deep neural networks, are catching us by surprise every week, and if you’re still in doubt about their potential for extracting more information from images than humans can, consider this list of recent achievements of AI beating humans:
- Google’s AI wins fifth and final game against Go genius Lee Sedol
- Carnegie Mellon Artificial Intelligence beats top poker pros
- Microsoft AI beats humans at speech recognition
Motorcycles, of course, are just a toy example of something I personally care about, but it’s not hard to imagine extracting any knowledge that humans can see (or perhaps don’t even notice) from the images. There are many clues to objective measures of quality of life hidden in the millions of images captured by Google.
Here’s a quick way to sanity-check whether the idea has any merit: test it against something we know to be true. Capture another 10K images for Amsterdam and count the occurrences of bicycles in the image tags:
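The counting itself is almost trivial once you have the captions. A toy sketch, with made-up captions standing in for the real Amsterdam output:

```python
def count_mentions(descriptions, keyword):
    """Count how many scene descriptions mention the keyword."""
    kw = keyword.lower()
    return sum(kw in d.lower() for d in descriptions)

# Made-up captions standing in for the real Amsterdam output
captions = [
    "a group of people riding bicycles down a street",
    "a canal with boats next to a building",
    "a bicycle parked in front of a brick house",
]

bike_count = count_mentions(captions, "bicycle")
```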
Without accounting for population density (or actually counting how many bicycles are in each image), at least the general intuition seems to hold.
From counting bicycles and children to mapping out neighborhoods with run down buildings and graffiti, the opportunities to translate this massive source of image data into ideas that shape our moves are plentiful.
Getting Off the Streets
Certainly not everything we might care about is visible from the streets. Sometimes you have to go off-road. Here’s an example of some work Tanel Pärnamaa did while interning at Teleport: he took a set of public geotagged photos from Flickr and used machine vision tagging to identify beaches. Funnily enough, a lot of sandy golf courses showed up when he limited the signal to the machine vision outputs alone.
What’s more, with Planet just launching 88 more satellites into orbit, we have another amazing image source to let AI loose on. The amount of data out there is growing exponentially, and so is our understanding of quality of life in cities.
Choose your job, choose your city, choose your life!
One of our investors has said that “The spread of computers and the internet will put jobs in two categories: people who tell computers what to do, and people who are told by computers what to do.”
I’ve seen incumbents in our space spending money on hands-on-the-ground teams to capture something that could be done with code. I’m convinced that the efficiencies gained through machine learning and crowd-sourcing are giving us a substantial competitive advantage, and I’m eager to see how it all plays out in the long run.
In the meantime, for those of you more inclined to tell computers what to do, here are some open challenges in the approach described above:
- how to distribute your agents/probes across geographies while minimizing various biases (introduced by urban area size, population density, etc.)
- how to normalize captured data across urban areas to facilitate fair comparisons
- how to avoid double counting objects present in close scenes
- how to reduce the effects of weather, season, time of day, etc. in said statistics
- how to optimize the viewing direction, sampling, or combining of images at each coordinate to take advantage of 360-degree views
- how to build/train models with the specific goal of extracting quality-of-life data from images (general-purpose classifiers like the ones above will only get you so far and are meant more as inspiring examples)
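To make the double-counting challenge concrete, here’s one naive approach: greedily thin out sample points that fall within some minimum distance of an already-kept point, using the haversine formula. The 50-meter threshold is an arbitrary assumption:

```python
import math

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371000 * 2 * math.asin(math.sqrt(a))

def thin_points(points, min_dist_m=50):
    """Greedily drop points closer than min_dist_m to one already kept."""
    kept = []
    for p in points:
        if all(haversine_m(p, q) >= min_dist_m for q in kept):
            kept.append(p)
    return kept
```

A greedy pass like this is quadratic in the number of points; for tens of thousands of coordinates you would likely switch to a spatial index, but it shows the idea.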
And lastly, what cool things can you think of that could be captured from public image data, would correlate with quality of life, and could potentially influence your decision to move somewhere? Come share your thoughts on Twitter!
Don’t forget to click the heart below or share on social if you liked what you read