Reporting from 22nd November 2016 through 18th January 2017
Welcome to issue #17 of my newsletter analysing all aspects of the AI world. Grab your hot beverage of choice ☕ and enjoy the read! A few quick points before we start:
1. I penned two pieces recently: Why go long on artificial intelligence? and 6 areas of AI and machine learning to watch closely. Do have a read and share your views!
2. I will be in San Francisco ✈️️ from the 22nd January through 5th February. Drop me a line if you fancy catching up on anything AI related: investing, research, building or buying companies.
3. We have a terrific lineup for our next London.AI meetup on Thursday March 2nd. Come along to learn about reinforcement learning, networks with memory and formal verification of algorithms from the brilliant Raia Hadsell (Google DeepMind), Vishal Chatrath (Prowler.io) and Denis Ignatovich (Aesthetic Integration).
4. The 2017 edition of my research and applied AI summit in London is slated for July. Send me ideas for speakers and topics you’d like to hear about 👍
🚗 Department of Driverless Cars
After launching their driverless cars on public roads in SF, Uber’s operations were promptly shut down. Furthermore, an advocacy agency in New York is calling for Uber’s service to be blocked for 50 years.
Meanwhile, Nutonomy has had their robo-taxis navigating 6km worth of streets in Singapore since April 2016. In contrast to purely deep learning-based systems, the car uses formal logic based on hand-crafted rules to prioritise how it drives. While this is interpretable, it will still be important to use an approach like formal verification to prove the correctness and stability of said algorithms.
NVIDIA closed 2016 as the top performing company in the S&P500 thanks to its 225% appreciation in market value. At CES 2017, the company announced their Xavier AI car supercomputer, which packs in an 8-core custom ARM64 CPU and a 512-core Volta GPU, drawing 30W and reaching 30 trillion operations per second. It also presented their in-car AI co-pilot, which watches the front, rear and sides of the vehicle, as well as the driver using face recognition, head tracking, gaze tracking, and lip reading. NVIDIA also signed partnerships with Audi and Mercedes-Benz to ship Level 4 autonomous cars, with Japan’s Zenrin for mapping (adding to Baidu and TomTom), and with ZF and Bosch for auto supplies. More details to follow at the Detroit Auto Show.
Though it doesn’t look to be working with NVIDIA, Ford presented their autonomous development Fusion Hybrid car at CES too. Their Chief Program Engineer also penned a piece on the vehicle’s development. Furthermore, the Baidu Intelligent Vehicle division and BAIC Motor Corporation, which manufactures cars in China, announced they would implement Baidu telematics into vehicles in H1 2017 and road test Level 3 vehicles by the end of 2017.
Alphabet spun out their driverless car project as an independent company, Waymo, led by John Krafcik, a former President and CEO of Hyundai in the US. The Company is said to design and build all the requisite hardware and software for their autonomous technology in-house. They will launch a fleet of 100 autonomous vans, equipped with radars, eight vision modules and three LiDARs, with the latter’s price point dropped by 90%. This is a big move for Alphabet as it seeks new product lines to generate meaningful revenues from non-advertising driven business models. Is this a further sign that CFO Ruth Porat is keeping moonshot projects on a tighter leash?
Elon announced before Christmas that Tesla’s deep learning-based vision systems were and a week ago that a revised Autopilot would be rolled out for 1000 vehicles with second generation hardware and software (October 2016 onwards).
Meanwhile, Elon’s self-proclaimed nemesis, George Hotz of Comma.ai, announced that his Company would be open sourcing two components of its technology as it reframed its mission to become the “Android of self-driving cars”. Comma.ai released openpilot, a package that provides adaptive cruise control and a lane keeping assist system for Hondas and Acuras, and NEO, a hardware kit based on the OnePlus 3 smartphone that can run openpilot. George gave a talk at Udacity on this work. Note the machine learning models are closed-source binary blobs. He talks about how inverse reinforcement learning could be used on their dataset of state/action pairs to learn a driving policy.
Rodney Brooks, founder and CTO of Rethink Robotics, suggests that self-driving cars might become social outcasts and elicit anti-social behavior from their owners. What’s more, he draws attention to the unnecessary media frenzy around the ethics of self-driving cars (i.e. the trolley problem), noting that these situations are hardly ever encountered in the real world, so why should they be a focus point for driverless cars? Solving long-tail perception problems is far more of a roadblock. Put simply, this debate is in his view “pure mental masturbation dressed up as moral philosophy.”
💪 The big guys
Google’s Eric Schmidt opines that we should embrace machine learning, not fear it. He points to examples where ML is helping us solve problems that we can’t on our own, including screening for diabetic retinopathy, a preventable condition that can lead to blindness. Open sourcing leads to democratisation of opportunity, he claims, and ML should not cost society more jobs than it creates. For more, check out a new piece from Backchannel on all there is to gain from the coming AI revolution.
True to his word, Mark Zuckerberg completed his challenge for 2016: building a home automation system, Jarvis. Challenges he faced (and thus opportunities for startups) included: inferring context awareness such that the system faithfully completed a request, connectivity and interoperability of hardware/software and the open-endedness of colloquial human conversation.
For an extensive summary of what happened at CES 2017, refer to these notes courtesy of Evercore ISI. The bank also published a deep dive on the star of the show, Amazon Alexa, with an emphasis on the required investment and the platform’s scalable attributes.
Intel join the open sourcing club by releasing BigDL, a distributed deep learning library for Apache Spark, a powerful in-memory cluster computing framework. It is optimised to run on Intel hardware, where it claims orders of magnitude faster out-of-the-box performance vs. TensorFlow. Intel have some catching up to do vs. NVIDIA, which basically own the GPU market.
An emerging battlefield for AI is simulation environments. These are software products (such as games in fact) that can recapitulate the state, physics and actions one can take in the real world. They offer a sandbox in which to train AI systems, which are able to take actions in the environment in order to achieve goals. Several key movements on this topic have come to bear:
Do note that simulation environments like games can be exploited by reinforcement learning agents: if an agent finds a glitch in the game, it can accrue the most reward without showing the behaviour you actually want. More in this post from OpenAI. Tl;dr think carefully about how you reward your AI agent depending on the behavior you want it to learn. Or use this as a way of finding bugs in your game!
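To make the reward-hacking point concrete, here is a minimal sketch in Python. The game, its reward values and the glitch are all invented for illustration and are not taken from the OpenAI post. The designer wants the agent to reach the finish line, but a bonus that can be collected on every step makes standing still the higher-scoring strategy.

```python
# Toy illustration of reward hacking. The game, rewards and glitch are hypothetical.
FINISH_POSITION = 5   # intended goal: move forward until the finish line
FINISH_REWARD = 10    # one-off reward for finishing
GLITCH_REWARD = 3     # bug: a respawning bonus that can be collected on every step
EPISODE_STEPS = 20    # maximum number of steps per episode


def run_episode(policy):
    """Play one episode and return (total_reward, finished)."""
    position, total, finished = 0, 0, False
    for _ in range(EPISODE_STEPS):
        action = policy(position)
        if action == "forward":
            position += 1
            if position == FINISH_POSITION:
                total += FINISH_REWARD
                finished = True
                break  # the episode ends once the finish line is reached
        elif action == "farm_glitch":
            total += GLITCH_REWARD  # no progress, but reward keeps accruing
    return total, finished


def intended_policy(position):
    """What the designer hoped the agent would learn."""
    return "forward"


def exploit_policy(position):
    """What a pure reward maximiser ends up doing."""
    return "farm_glitch"


intended_result = run_episode(intended_policy)
exploit_result = run_episode(exploit_policy)
print("intended:", intended_result)
print("exploit: ", exploit_result)
```

The exploit policy never finishes the game, yet it scores more than the intended one. Capping or removing the repeatable bonus would realign the reward with the behaviour you want.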
💻 AI in production
While AI is in the news almost every day, very few companies are running AI systems at scale in production. Simon Chan of Salesforce/PredictionIO shares guidelines on how to cross the chasm from experiments to scalable deployments. Teaser: invest in a central infrastructure for your teams, collect data in a single place and choose relevant evaluation metrics. This area, machine learning infrastructure-as-a-service, is another one where I bet we’ll see M&A as incumbents realise the value of owning this in-house.
For a more detailed run-through of the same subject, read through this deck by a Google ML Researcher on how to Nail your next ML gig. Great way of understanding the nuts and bolts of what’s required and where Google services currently play.
Are you curious about how Facebook has evolved their use of AI across the News Feed, images, video and live products? This piece runs through the story. Of note, it mentions that FBLearner Flow (the Company’s ML infrastructure — see, it’s important to have one!) is currently used by more than 40 product development teams within Facebook. It won’t come as a surprise that understanding video is their next immediate frontier.
The Economist run a thorough, high-quality piece on how deep learning has transformed translation, speech recognition and synthesis. It does a great job explaining how these systems work, unlike another recent piece that stated Google’s neural machine translation had “invented its own language to help it translate more effectively.” Ugh. Do remember to pay attention to the details when reading a popularised version of a research paper, especially when it is written by a non-expert. No, the model did not create a new language; it learned internal representations and parameters that could be used for transfer learning.
A Japanese insurance firm, Fukoku Mutual Life Insurance, is said to implement a £1.4m IBM Watson-based system to compute the payouts to policyholders as a function of their medical certificates, histories and procedures. It is expected to save £1m a year and result in the layoffs of 30 employees. For context, the company made £450m in profits for the year ending March 2016.
McKinsey Global Institute published a report, The age of analytics: Competing in a data-driven world, in which they present market research for the impact of AI. Amongst other quotable findings and figures, they identify 120 potential use cases of machine learning in 12 industries. Keep them coming :-)
A team of Canadian and Czech researchers have built an AI agent, DeepStack, that beat professional poker players in heads-up no-limit Texas hold’em, which has 10¹⁶⁰ possible paths of play for each hand. The game is noteworthy because unlike Go or Chess, players do not observe perfect information on the game and each other’s hands. The research paper can be found here.
Meanwhile, Google DeepMind’s AlphaGo has been covertly competing against premier players online, racking up a 60–0 record. One recently defeated grandmaster said, “AlphaGo has completely subverted the control and judgment of us Go players.” We’re yet to reach the zenith of its ability, I’m sure.
On the subject of AIs playing Atari games, researchers at MIT and Harvard present systematic data on how humans learn Atari games. They show that humans learn orders of magnitude faster than a version of DeepMind’s deep RL AI agent. By experimentally manipulating gameplay, the authors show that humans’ rapid learning rate can be explained, in part, by their ability to build a mental model of the game by reading instructions and observing the gameplay of others before them, but not by a prior understanding of the properties of objects in the game.
📚 Policy and governance
The European Parliament’s Committee on Legal Affairs publish a draft report for a Resolution on AI and robotics, stating that both have “become one of the most prominent technological trends of our century.” It recommends designers must: implement kill switches, build privacy by design features and ensure that the decision making of agents is amenable to reconstruction and traceability. The report also recommends the creation of a European Agency for robotics and AI that should “provide the necessary technical, ethical and regulatory expertise to support the relevant public actors.”
The IEEE Standards Association, which is responsible for setting and governing many of the technology standards we use today, published version 1 of Ethically aligned design: A vision for prioritizing human well being with AI and autonomous systems. It includes many requirements and goals, the solutions to which mostly remain open questions. Like many similar reports, it calls for “algorithmic traceability…on what computations led to specific results” and “indirect means of validating results and detecting harms”.
Reminder: the outgoing Obama administration published two reports with high level recommendations available here via HBR.
The Knight Foundation, Omidyar Network and LinkedIn founder Reid Hoffman decided to fund a new center at Harvard/MIT, the Ethics and Governance of Artificial Intelligence Fund, to the tune of $27 million.
The Google Brain team recount major achievements in 2016, focused on their research output, while their London counterpart, DeepMind, share their own incredible round-up of 2016. There is so much terrific research these days that it’s hard to choose which papers to feature, so here are 4 papers that caught my attention and why:
StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. Rutgers, Lehigh, Chinese University of Hong Kong and UNC Charlotte. The tasks of generating photo-realistic images from text descriptions and upscaling the resolution of low quality images to higher ones have been the subject of many papers recently. However, there can be several plausible images that correspond to any given textual description. This problem of modelling multi-modal data and synthesising photo-realistic images is well suited to generative adversarial networks (GANs, a very hot space right now), but GANs produce images that are very small (64x64 pixels) and suffer from a lack of detail. To overcome these issues, the authors use a two-step stacked GAN process (StackGAN). First, the Stage-I GAN is tasked with generating a low resolution 64x64 image from a target textual description by sketching rough shapes and basic colors of the object and painting the background. Second, the authors take this coarse and defect-laden low resolution image and run it through a “Stage-II” GAN (also conditioned on the target textual description) that only has to focus on drawing details and rectifying defects to create a high resolution image. In this way, the Stage-II GAN learns to add in the visually-represented textual information that the Stage-I GAN left out. The results are state-of-the-art. Have a look!
A compositional object-based approach to learning physical dynamics, MIT. In this paper, Joshua Tenenbaum’s lab considers the task of reasoning in the physical world, a cornerstone problem for learning, perception, planning, inference and understanding in AI. In particular, AI agents should exhibit generalisable reasoning over the properties of physical objects, how they relate to one another and how they influence future dynamics of a complex system. Here, the authors present the Neural Physics Engine (NPE), a hybrid model that combines symbolic reasoning and neural networks. The NPE architecture creates object-based state representations that can be used to predict the future state of a system involving multiple moving objects. This is achieved by conditioning the next immediate future timestep as a function composition of the pairwise interactions between one object and other neighboring objects in the scene. In experiments involving worlds of moving balls and obstacles using the matter-js physics engine, the authors show that the NPE model consistently outperforms the baselines in its ability to predict states of the world 50 timesteps in the future. Their comparisons are drawn against the popular LSTM architecture and the NPE without the pairwise combination layer (“NP”) that relates one object’s future as a function of others around it. Moreover, the NPE’s performance continues to improve with training, exhibits little divergence for predictions further into the future, generalises much better and scales to complex dynamics and world configurations (see Figure 3). None of these features can be recapitulated as well (or at all) by either the NP or LSTM models. This is exciting foundational work that inches AI agents closer to solving real world problems.
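The pairwise composition described above can be sketched in a few lines of Python. This is a structural illustration only: in the paper the encoder and decoder are neural networks learned from data, whereas `pairwise_effect` and `decode` below are hand-written stand-ins, and the neighbourhood radius and repulsion rule are assumptions made up for this example.

```python
import math

NEIGHBOURHOOD_RADIUS = 2.0  # assumed cutoff for which objects count as neighbours


def pairwise_effect(focus, other):
    """Stand-in for a learned encoder: a repulsive push that decays with distance.

    Assumes the two objects are not at exactly the same position.
    """
    dx = focus["pos"][0] - other["pos"][0]
    dy = focus["pos"][1] - other["pos"][1]
    dist = math.hypot(dx, dy)
    strength = 0.1 / (dist * dist)
    return (strength * dx / dist, strength * dy / dist)


def decode(focus, summed_effect):
    """Stand-in for a learned decoder: previous velocity plus the summed effect."""
    return (focus["vel"][0] + summed_effect[0], focus["vel"][1] + summed_effect[1])


def predict_velocity(focus, objects):
    """Predict the focus object's next velocity from its neighbours only."""
    total_x, total_y = 0.0, 0.0
    for other in objects:
        if other is focus:
            continue
        gap = math.hypot(focus["pos"][0] - other["pos"][0],
                         focus["pos"][1] - other["pos"][1])
        if gap > NEIGHBOURHOOD_RADIUS:
            continue  # objects outside the neighbourhood are ignored
        effect_x, effect_y = pairwise_effect(focus, other)
        total_x += effect_x
        total_y += effect_y
    return decode(focus, (total_x, total_y))


ball_a = {"pos": (0.0, 0.0), "vel": (1.0, 0.0)}
ball_b = {"pos": (1.0, 0.0), "vel": (0.0, 0.0)}    # close enough to interact
ball_c = {"pos": (10.0, 10.0), "vel": (0.0, 1.0)}  # too far away to matter

next_velocity = predict_velocity(ball_a, [ball_a, ball_b, ball_c])
print(next_velocity)
```

Because each interaction is computed per pair and then summed, the same function works for any number of objects in the scene, which is one intuition for why an object-based model can generalise to new world configurations.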
Generating focussed molecule libraries for drug discovery with recurrent neural networks. University of Münster, AstraZeneca and Shanghai University. A perennial challenge facing the pharmaceutical industry is that of identifying, testing, optimising and clinically validating new drugs from a search space made up of 10⁶⁰ synthetically accessible drug-like molecules. Computational approaches are needed because in vitro high-throughput screening approaches in the lab can only test circa 10⁶ molecules, which barely makes a dent in this broader search space. This study has two main contributions towards computational structure generation and optimisation in a single model. First, the authors train a recurrent neural network (RNN) model with three stacked LSTM layers on 1.4 million molecules represented in SMILES format (a string of characters instead of Lewis structures). This model learns molecular grammar and can output valid (but unfocused) chemical structures when sampled from. In order to obtain novel active drug molecules that are focused against a particular target, the authors then re-fit the general molecule pre-trained RNN on a small dataset of known active molecules for the target. At each iteration of this transfer learning cycle (“epoch”) on the small dataset, the model is sampled to generate novel active molecules against the target. Target prediction models based on gradient boosting trees were used to validate whether generated molecules were active or not. Results show that sampling from early epochs generates molecules that are closely related to the training samples while later epochs yield new chemotypes or scaffold ideas. When trained on 1000 active molecules against Staphylococcus, the model can retrieve 14% of 6051 test molecules.
Scaling down the training set to just 50 molecules (1% of the data), the model can still recover 2.5% of the test set — 21.6x better performance compared to what the general model without re-fitting to active molecules of interest is capable of. The authors suggest future work in framing molecule generation as a reinforcement learning problem. Here, the pre-trained molecule generator can be seen as the policy and the score each generated molecule receives from the target prediction model could be the reward function.
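To put the retrieval figures above into absolute terms, here is a quick back-of-the-envelope calculation in Python. It assumes that the percentages are fractions of the 6051 test molecules and that the 21.6x figure is a ratio of retrieval rates; both are my reading of the summary rather than numbers taken directly from the paper.

```python
TEST_SET_SIZE = 6051        # held-out active molecules
RECALL_1000_ACTIVES = 0.14  # retrieval rate after fine-tuning on 1000 actives
RECALL_50_ACTIVES = 0.025   # retrieval rate after fine-tuning on 50 actives
IMPROVEMENT_FACTOR = 21.6   # fine-tuned on 50 actives vs. the general model

# Approximate number of test molecules recovered in each setting.
retrieved_1000 = round(RECALL_1000_ACTIVES * TEST_SET_SIZE)
retrieved_50 = round(RECALL_50_ACTIVES * TEST_SET_SIZE)

# Implied retrieval rate of the general model without re-fitting (assumption:
# the improvement factor is a simple ratio of retrieval rates).
baseline_recall = RECALL_50_ACTIVES / IMPROVEMENT_FACTOR
baseline_retrieved = round(baseline_recall * TEST_SET_SIZE)

print(retrieved_1000, retrieved_50, baseline_retrieved)
print(f"implied baseline retrieval rate: {baseline_recall:.2%}")
```

Under these assumptions, fine-tuning on just 50 molecules takes the model from recovering a handful of test molecules to roughly 150 of them.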
The Predictron: End-to-end learning and planning. Google DeepMind. Reinforcement learning (RL) comes in two broad flavors: model-based (where we understand how an environment works and how to transition between states) and model-free (where we don’t). Model-based RL is made from two subproblems: a) learning the model and b) using this model to evaluate and select among possible strategies to achieve a goal, i.e. planning. Here, the authors present a new architecture, the predictron, which combines learning and planning into one end-to-end training procedure (instead of separately, which is more common), using raw pixel inputs to output accurate value predictions for possible actions. Taking a video game as an example, the fact that the model is abstract means that when it finds an optimal plan to advance its goal (e.g. predict the future score given an action in the game), this plan will also correspond to an optimal plan for the underlying game. In fact, the model need not use the same state space representation of the game as we humans do to achieve optimal predictions. In experiments where the predictron is pitted against both feedforward and RNN model-free architectures, the authors show that the predictron achieves more accurate predictions of value in RL environments such as predicting whether the bottom-right corner of a maze is connected to the top-left corner.
Nautilus run a brilliant piece on Walter Pitts, one of the central characters in the history of neural networks.
Andrew Ng of Baidu delivered a fantastic lecture at NIPS 2016, a version of which you can watch here. This piece summarises Andrew’s main points, including his categorisation framework for AI (a. general deep learning, b. sequence models, c. computer vision and d. other, i.e. RL/unsupervised learning/GANs etc.) and a 5-step method for building better systems. No equations were harmed in this video :-)
Piotr Mirowski of Google DeepMind delivered a thorough talk, Deep Learning and Playing with Sequences, at a recent London Machine Learning meetup. He explores applications in language modelling and control in 3D gaming environments.
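For readers new to the distinction, the two subproblems of model-based RL can be sketched on a toy problem in Python. This is the conventional approach where the steps are done separately, not the predictron itself: the five-state corridor, its rewards and the tabular model are all invented for this illustration, and planning uses plain value iteration.

```python
# Toy corridor with states 0..4, where state 4 is the goal. Hypothetical setup.
STATES = range(5)
ACTIONS = ("left", "right")
GOAL = 4
GAMMA = 0.9  # discount factor


def environment_step(state, action):
    """The 'real' environment, which the agent can only sample from."""
    if action == "right":
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


# Subproblem (a): learn the model by recording observed transitions.
# The environment is deterministic, so one sample per (state, action) suffices.
model = {}
for state in STATES:
    if state == GOAL:
        continue  # the goal is terminal, nothing to learn there
    for action in ACTIONS:
        model[(state, action)] = environment_step(state, action)


def backed_up_value(state, action, values):
    """One-step lookahead using the learned model only."""
    next_state, reward = model[(state, action)]
    return reward + GAMMA * values[next_state]


# Subproblem (b): plan with the learned model using value iteration.
values = {state: 0.0 for state in STATES}
for _ in range(50):
    for state in STATES:
        if state != GOAL:
            values[state] = max(backed_up_value(state, a, values) for a in ACTIONS)

# Read off the greedy policy from the planned values.
policy = {
    state: max(ACTIONS, key=lambda a: backed_up_value(state, a, values))
    for state in STATES
    if state != GOAL
}
print(policy)
```

Here the model is learned first and planning happens afterwards as a separate step. The predictron's pitch is to train both jointly, so that the model only needs to be good at whatever makes the final value predictions accurate.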
Many have written NIPS summary reports — I found this one to be rather digestible.
Interest is growing on the design implications of AI systems — here is a piece on how eBay navigated their experience creating shopping bots.
Following on from the famed business canvas, here is the machine learning canvas, a framework to connect the dots between data collection, machine learning, and value creation.
Microsoft released their MAchine Reading COmprehension (MS MARCO) dataset of 100,000 questions and answers.
CB Insights publish the results of their call for 100 AI startups. Another infographic to decorate that office of yours :-)
63 companies (48 in US, 12 in EU and 3 in Asia) raised $585m from 157 investors. Median deal size was $3.90m ($4.78m in US vs. $1m in EU) at a pre-money valuation of $19m.
8 companies were acquired, including:
Meanwhile, Lily Robotics, which sold $34m in pre-orders for a camera hooked up to a quadcopter drone that would autonomously take pictures and videos of its user, went out of business. It was founded in 2013, employed 99 FTE and raised $15m in equity to date from Spark Capital, SV Angel, Winklevoss and Dorm Room Fund. Several others like are still (somewhat) flying….
Anything else catch your eye? Just hit reply or hit me on Twitter.