Machine vision captured tag cloud from the city of Tartu, Estonia

How Teleport uses machine vision to peek into every street corner

I've teased my readers before on how AI will reorganize the human population. Just like online dating sites are letting AI decide who gets to make babies with whom, we're letting AI decide who's moving where.

How exactly do we decide what's a good city these days? We read the news, hear good things from our friends, see photos and videos, get the best job offers from there, and quite often we just go and see the city. I'm not going to get into how AI is involved in deciding which news, friends' posts, videos or photos of cities we get to see, but I am going to shed some light on how we use AI at Teleport to help you see cities more objectively. And since I'm obsessed with convincing every kid to start coding, I'm going to walk through the process with simple code examples to illustrate how easy it is to get started playing with AI with all the tools already out there. The following ideas were mainly inspired by MIT's StreetScore project.

Visiting cities through the eyes of AI

An obvious benefit of using software to visit places is that Newton ain't got nothing on software bots: they can accelerate their way through thousands of places in a heartbeat without worrying about their mass (in case you did not know, the heartbeat is a well known time unit measure under the International System of Units).

So let's grab the borders of Tartu, Estonia (my university town) from the Teleport Developers API and generate 10 000 random geographical coordinates to visit. Why 10K you might ask? Because science!

Such uniform distributions over geographies would of course leave our perception of the city quite biased, as most of the time we would end up in the woods or on top of roofs with little to see.
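The uniform draw itself takes only a few lines of Python. This is a minimal sketch: the bounding box coordinates below are rough placeholders I picked for illustration, whereas the article pulls the real city borders from the Teleport Developers API.

```python
import random

# Approximate bounding box for Tartu, Estonia (placeholder values --
# the real borders come from the Teleport Developers API).
TARTU_BBOX = {"south": 58.34, "north": 58.41, "west": 26.65, "east": 26.80}

def sample_points(bbox, n=10_000, seed=42):
    """Draw n uniformly distributed (lat, lon) pairs inside a bounding box."""
    rng = random.Random(seed)
    return [
        (rng.uniform(bbox["south"], bbox["north"]),
         rng.uniform(bbox["west"], bbox["east"]))
        for _ in range(n)
    ]

points = sample_points(TARTU_BBOX)
```

Sampling from the bounding box rather than the exact city polygon overshoots the borders a little; points outside the polygon can simply be rejected and redrawn.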
But luckily, with the help of the Google Maps API, we can bias those random locations away from the middle of nowhere and get something more like this:

As you can see, the locations (red dots) we plan to visit are more or less on roads and streets, where we hope to see more action than watching trees grow. Now that we have our locations, it's time to get into the seeing bit and get acquainted with the Google Street View Image API. Essentially I'm calling their API with my coordinates and saving the resulting image locally. After some reverse geocoding magic and religious following of Google's API quotas, I'm actually getting photos returned for each coordinate (as opposed to the "Sorry, we have no imagery here." response that you'd get with just the uniform distribution). Here's a peek into thousands of images of Tartu my code has access to without leaving the room (or my computer?).

What's on the image HAL?

This is where the AI bit comes in. To keep things simple, I'm running our set of images through publicly available machine vision APIs. There are plenty of choices out there:

Microsoft Computer Vision API
Google Cloud Vision API
IBM Vision Recognition API
Cloud Sight API
Clarifai
etc.

but for this example I went with Microsoft's Oxford Project. I've always been impressed with the Microsoft Research organization and, to be honest, their API free use terms are the most favorable. They've even been kind enough to provide quick start code in Python to get you going. In essence you just upload an image to their API and get back a line of text describing the scene. Here's an example of an image from Tartu along with the output text from the MS Vision API.

Description: "A motorcycle parked in front of a building"

Compression is intelligence

With their API, I mapped 8.5GB of raw image pixel data from Tartu into 255KB of text data (object space), a 35294x reduction. If you've ever looked into the relationship of compression and intelligence, then perhaps this size reduction speaks volumes.
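The fetch-and-describe pipeline can be sketched roughly as below. Treat this as an illustration, not the article's actual code: the Street View endpoint is real, but `vision_url`, the subscription header name, and the response shape for the captioning call are my assumptions about the Microsoft API of the time and should be checked against current documentation.

```python
import json
import urllib.parse
import urllib.request

STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"

def streetview_url(lat, lon, api_key, size="640x640"):
    """Build a Street View Image API request URL for one coordinate."""
    params = urllib.parse.urlencode(
        {"location": f"{lat},{lon}", "size": size, "key": api_key}
    )
    return f"{STREETVIEW_ENDPOINT}?{params}"

def save_image(lat, lon, api_key, path):
    """Download the street-level photo for a coordinate to a local file."""
    urllib.request.urlretrieve(streetview_url(lat, lon, api_key), path)

def describe_image(image_path, vision_url, vision_key):
    """POST raw image bytes to a vision endpoint and pull out the caption.

    Header name and response shape are assumptions modelled on the
    Microsoft Computer Vision "describe" call; verify before use.
    """
    with open(image_path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        vision_url,
        data=data,
        headers={
            "Content-Type": "application/octet-stream",
            "Ocp-Apim-Subscription-Key": vision_key,
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["description"]["captions"][0]["text"]
```

Looping `save_image` followed by `describe_image` over the 10K coordinates, with a pause between calls to respect the quotas mentioned above, yields one line of text per location.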
Anyhow, now that we finally have natural language descriptions of the scenes, we can start looking into statistics to see if we can find something that could possibly speak to someone's decision to move.

27 of the 1528 most frequent terms from machine vision based analysis of 10K images of Tartu, Estonia

I almost fell asleep reading the list until I read "motorcycle". The number of motorcycles, or more specifically their proportion to the population, says a lot about how motorcycle friendly a city potentially is, and that is something that resonates with me personally (I own two motorcycles). Surely there are other sources for statistics about motorcycles in various cities, but bear in mind that they typically come from different sources for different countries and would require a substantial amount of work to capture and normalize across hundreds of major urban areas. The beauty of Google Street View is how well it covers big cities as a single data source and how well it lends itself to capturing any knowledge visible in images. The premise is that if you can see it, then AI can see it better. If not today, then surely tomorrow.

"Live in the future, then build what's missing" as Paul Graham has eloquently said. Advances in AI, and more specifically deep neural networks, are catching us by surprise every week, and if you're still in doubt about their potential for extracting more information from images than humans can, then consider this list of recent achievements in AI beating humans:

Google's AI wins fifth and final game against Go genius Lee Sedol
Carnegie Mellon artificial intelligence beats top poker pros
Microsoft AI beats humans at speech recognition
etc.

Motorcycles of course are just a toy example of something I personally care about, but it's not hard to imagine extracting any knowledge that humans can see (or perhaps don't even notice) from the images.
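The term statistics above boil down to a word count over the returned captions. Here is a toy sketch, with a few invented captions standing in for the real API output and a hand-picked stopword list as a simplifying assumption:

```python
from collections import Counter

# Invented example captions of the kind the vision API returns
# (illustration only, not real Tartu output).
captions = [
    "a motorcycle parked in front of a building",
    "a group of people walking down a street",
    "a car parked on the side of a building",
    "a motorcycle parked next to a tree",
]

# Minimal stopword list, chosen by hand for this example.
STOPWORDS = {"a", "of", "the", "in", "on", "to", "next", "down", "front", "side"}

def term_frequencies(texts):
    """Count content words across all scene descriptions."""
    words = (w for t in texts for w in t.lower().split() if w not in STOPWORDS)
    return Counter(words)

freqs = term_frequencies(captions)
print(freqs.most_common(3))
```

Running the same count over 10K real captions produces the frequency list the article refers to, with terms like "motorcycle" surfacing among the top entries.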
There are many clues to objective measures describing quality of life hidden in the millions of images captured by Google. Here's a quick way of sanity checking whether the idea has any merit, by checking something we know to be true: capture another 10K images for Amsterdam and count the occurrence of bicycles in the image tags.

Without regard to population density (or actually counting how many bicycles are in each image), at least the general intuition seems to be validated. From counting bicycles and children to mapping out neighborhoods with run down buildings and graffiti, the opportunities to translate this massive source of image data into ideas that shape our moves are plentiful.

Getting Off the Streets

Certainly not everything we might care about is visible from the streets. Sometimes you have to go off-road. Here's an example of some work Tanel Pärnamaa did when interning at Teleport, where he took a set of public geotagged photos from Flickr and used machine vision tagging to identify beaches. Funnily enough, he had a lot of sandy golf courses show up when limiting the signal to the machine vision outputs alone.

Beaches as identified by AI in Flickr photos, by Tanel Pärnamaa

Even more so, with Planet launching 88 more satellites into orbit, we have another amazing image source that we can let AI loose on. The amount of data out there is growing exponentially, and so is our understanding of life quality in cities.

Choose your job, choose your city, choose your life!

One of our investors has said that "The spread of computers and the internet will put jobs in two categories: people who tell computers what to do, and people who are told by computers what to do." I've seen incumbents in our space spending money on hands-on-the-ground teams to capture something that could be done with code. I'm convinced that the efficiencies gained through machine learning and crowd-sourcing are giving us a substantial competitive advantage, and I'm eager to see how it all plays out in the long run.
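The Amsterdam sanity check described above reduces to comparing the share of images whose tags contain a target term. A toy sketch, with invented tag sets standing in for the real 10K-image runs:

```python
def tag_rate(tag_lists, tag):
    """Share of images whose tag set contains the given tag."""
    hits = sum(1 for tags in tag_lists if tag in tags)
    return hits / len(tag_lists)

# Invented tag sets standing in for the per-city image runs
# (illustration only, not the article's actual counts).
tartu = [{"building", "street"}, {"motorcycle"}, {"tree"}, {"bicycle"}]
amsterdam = [{"bicycle", "canal"}, {"bicycle"}, {"street"}, {"bicycle", "bridge"}]

assert tag_rate(amsterdam, "bicycle") > tag_rate(tartu, "bicycle")
```

Using a rate rather than a raw count keeps the comparison fair when the number of usable images differs between cities, though it still ignores multiple bicycles per image, as noted above.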
In the meantime, for those of you more inclined to tell computers what to do, here are some example challenges of the approach described above:

how to distribute your agents/probes across geographies while minimizing various biases (introduced by urban area size, population density, etc.)
how to normalize captured data across urban areas to facilitate fair comparisons
how to avoid double counting objects present in close scenes
how to reduce the effects of weather, season, time of day etc. in said statistics
how to optimize the viewing direction, sampling or combining of images at each coordinate to take advantage of 360 degree views
how to build/train models with the specific goal of extracting quality of life data from images (the general purpose classifiers exemplified above will only get you so far and are meant more as inspiring examples)

And lastly, what cool things can you think of that could be captured from public image data, would correlate with quality of life, and could potentially influence your decisions to move somewhere? Come share your thoughts on twitter!

Don't forget to click the heart below or share on social if you liked what you read.

Silver Keskküla on Twitter
Find your best city with Teleport