Editor @Hackernoon by day, VR Gamer and Anime Binger by night
In this machine learning news roundup, we will go over some of the biggest stories from 2019 that went viral or made an impact in various fields of AI. We will also briefly cover interesting AI applications and games released in 2019 that you can try today, as well as a few open dataset resources for machine learning projects.
2019 was an eventful year for Tesla, but the company had its fair share of mishaps. Most notably, in May of 2019, a tragic accident involving a Tesla Model 3 vehicle ended in the death of the driver. The accident occurred while the car’s autopilot function was engaged. The Tesla slammed into a stationary truck, failing to make any evasive maneuvers. This incident led to doubts about the widespread use of autonomous vehicles and their safety on public roadways.
Waymo, another major player in the autonomous vehicle industry, also made headlines last year with its own self-driving cars. A subsidiary of Google’s parent company, Alphabet, Waymo sent an email to users of its ride-hailing app informing them that their next Waymo trip might be completely autonomous, without a human driver behind the wheel.
One of the biggest events in the world of Natural Language Processing (NLP) was the release of GPT-2 1.5B in November of 2019. A text-generating neural network from OpenAI, GPT-2 made headlines around the world for its ability to generate natural-sounding text. Some writers have even been able to create entire articles using GPT-2, garnering the attention of numerous machine learning influencers and well-known scientists.
OpenAI had released previous versions of the neural network, but GPT-2 1.5B is the strongest iteration yet.
In this article, OpenAI explains its 5 major findings:
People find output from GPT-2 convincing
The GPT-2 neural network can be fine-tuned for misuse
Detecting synthetic text is challenging
There has been no strong evidence of misuse so far
Standards are needed for studying bias
Deepfakes were one of the biggest machine learning topics of 2019. Unprecedented advancements in deepfake technology have led to widespread misuse and public fear. To understand and prepare for the threats posed by the technology, the US House Intelligence Committee held an open hearing on deepfakes and AI in June of 2019.
This article summarizes the most important points raised by each speaker, the potential dangers of deepfakes, as well as solutions and countermeasures.
Synthetic voice and audio generation is an emerging industry that made great strides last year. Replica Studios is a synthetic voice company that generated a buzz in 2019, attracting the attention of data scientists, celebrities, and game development studios interested in using its software. Part of this virality was due to an impressive proof-of-concept video showcasing the synthetic voices of Sundar Pichai (Google CEO), Jeff Bezos (Amazon CEO), Arnold Schwarzenegger, Kevin Hart, Morgan Freeman, David Attenborough, Snoop Dogg, Ellen DeGeneres, and even Geralt of Rivia (The Witcher).
Impressively, Replica Studios is able to make a synthetic copy of any voice using just a few minutes of speech recordings. In an interview, Replica CEO Shreyas Nivas said the technology was at a point where “Synthetic voices are indistinguishable from real voices and can rival human performances.”
Access to training data is one of the blockers slowing the pace of AI progress today. With deep learning especially, many models require not thousands, but millions of data instances for training. As a result, many data scientists and students turn to dataset aggregators like Kaggle and rely on open data provided by the community. To help improve access to open data, Google released a search engine solely for publishing and downloading datasets.
Google Dataset Search was still in beta throughout 2019, but on January 23, 2020, Google announced that it had indexed nearly 25 million datasets and that the search engine was officially out of beta.
Interesting AI Applications and Resources Released in 2019
Talk To Transformer - A user-friendly implementation of OpenAI’s GPT-2 1.5B that anyone can use. Simply type in a custom prompt, a heading for an article, or the first lyrics of a song and see what the text-generating neural network comes up with.
Google Dataset Search - As mentioned above, this is the free-to-use dataset search engine by Google. You can both search for open datasets and learn how to get your own resources crawled by the search engine.
AI Dungeon 2 - A text adventure game that generates unique storylines with each decision you make. Powered by GPT-2, this game can go almost anywhere, and no two stories are ever the same. Check out an example of how this works here.
Ultimate Dataset Aggregator - This dataset aggregator from Lionbridge AI includes hundreds of open datasets spanning dozens of use cases and subjects, including computer vision, parallel text, life sciences, finance, and more. This page is constantly updated as new datasets are curated.
AI is one of the world’s fastest-growing industries, and there is surely more big machine learning news to come in 2020. I hope one of these AI stories sparked your interest. For more machine learning news and resources for open datasets, please subscribe to my Hacker Noon posts below and don't forget to follow me on Twitter.