15 Must-read Machine Learning Articles for Data Scientists

by Limarc Ambalina, September 5th, 2020

The fields of deep learning and natural language processing (NLP) are as busy as ever. Although quarantine restrictions have hindered many industries around the world, the machine learning industry continues to move forward.

It seems that almost every week new models are released and new startups show off AI-powered technologies that promise to help build a better world. In this article, we will briefly go over some of the biggest recent news in NLP and deep learning, as well as some must-read guides, feature articles, tools, resources, and datasets you may want to check out.

Machine Learning News

1. How Deep Learning Can Keep You Safe with Real-Time Crime Alerts

From Nikunj Aggarwal, the Machine Learning Lead at Citizen, this article gives a great example of how deep learning is being used to build life-changing (or even life-saving) technologies. Citizen is an emergency and safety alert app that warns people in real time about incidents and crimes in their area.

Citizen used a speech-to-text engine and a convolutional neural network to analyze audio from first-responder radio frequencies. This allowed the company to scale its app to multiple cities in the United States. Technology like this could significantly change police and first-responder infrastructure in the years to come.

2. The Release of the OpenAI API

The release of GPT-3 by OpenAI was likely the biggest news in NLP this year. However, many people may have missed the release of OpenAI’s API, which gives people access to new models developed by the company, including GPT-3. This is big news because it marks a shift from the company’s usual practice of open-sourcing its models (as it did with GPT-2). In the article, the company explains why it decided to release a commercial product, why it moved away from open source this time around, and how it plans to control potential misuse of the API.

3. IBM will no longer offer, develop, or research facial recognition technology

In a letter to Congress, IBM CEO Arvind Krishna publicly stated that the company would halt development and service offerings of general-purpose facial recognition technology.

This was a huge step for the company and a big message to the data science community as a whole. IBM’s move to prioritize ethics and safety might have encouraged other large tech companies (including Microsoft) to do the same.

4. Introducing the Model Card Toolkit for Easier Model Transparency Reporting

As deep learning models grow larger and more complex, it becomes increasingly difficult to communicate their intended use cases and other key information to downstream users. To help solve this problem, researchers at Google developed the “Model Card Toolkit”, which makes model transparency reports easier to create.
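To make the idea of a model transparency report more concrete, here is a minimal sketch of a model card as a structured record rendered to Markdown. This is not the Model Card Toolkit’s API: the `ModelCard` class, its fields, and the example values are hypothetical, loosely modeled on sections model cards commonly include (overview, intended uses, limitations, metrics).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelCard:
    """Hypothetical minimal model card (NOT the Model Card Toolkit API)."""

    name: str
    version: str
    overview: str
    intended_uses: List[str] = field(default_factory=list)
    limitations: List[str] = field(default_factory=list)
    metrics: Dict[str, float] = field(default_factory=dict)

    def to_markdown(self) -> str:
        """Render the card as a short Markdown transparency report."""
        lines = [f"# {self.name} (v{self.version})", "", self.overview, ""]

        lines.append("## Intended uses")
        # Fall back to an explicit placeholder so missing sections stay visible.
        lines += [f"- {use}" for use in self.intended_uses] or ["- Not documented"]

        lines += ["", "## Limitations"]
        lines += [f"- {item}" for item in self.limitations] or ["- Not documented"]

        lines += ["", "## Metrics"]
        lines += [f"- {k}: {v:.3f}" for k, v in self.metrics.items()] or ["- Not documented"]
        return "\n".join(lines)


# Usage with placeholder values (not results from any real model).
card = ModelCard(
    name="toy-sentiment-classifier",
    version="0.1",
    overview="Classifies short English product reviews as positive or negative.",
    intended_uses=["Triage of customer feedback"],
    limitations=["Not evaluated on non-English text"],
    metrics={"accuracy": 0.5},
)
print(card.to_markdown())
```

Rendering empty sections as “Not documented” instead of omitting them is a deliberate choice in this sketch: a gap in the report stays visible to downstream users.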

5. You Don’t Need College Anymore, Says Google

Do you need a Ph.D. to work in data science? Google’s new certification programs may change the game. On July 14th, 2020, Google announced its new professional certification programs in the fields of UX design, project management, and data analysis.

It remains to be seen whether a Google Certificate in data analysis will be enough to land you a job on a data science team. However, a certification from one of the largest tech companies in the world may end up being worth more than a four-year degree.

6. The Court of Justice invalidates Decision 2016/1250 on the adequacy of the protection provided by the EU-US Data Protection Shield

In July 2020, the Court of Justice of the European Union issued a major decision that may significantly affect data transfers between Europe and the United States. The court invalidated “Decision 2016/1250”, the European Commission decision underpinning the data transfer framework known as the “EU-U.S. Privacy Shield”.

Under the ruling, people whose data are transferred to a country outside the EU must be afforded “a level of protection essentially equivalent to that guaranteed within the EU by the GDPR.” This means that if a company like TikTok wants to transfer data from EU users to servers in the United States for processing, authorities are responsible for prohibiting the transfer if they find that data privacy and security measures in the United States don’t meet GDPR standards.

If you want to learn more about this, TechCrunch’s article is a much easier read than the actual legal document.

Machine Learning Guides & Feature Articles

7. Deep Learning Algorithms — The Complete Guide

From Sergios Karagiannakos, the founder of AI Summer, this article is a comprehensive guide to deep learning. It introduces many topics, from the different kinds of neural networks to deep learning baselines in NLP and computer vision.

8. OpenAI's GPT-3 Language Model: A Technical Overview

As mentioned previously, OpenAI’s launch of GPT-3 was likely the biggest news in NLP so far this year.

For those of you who don’t know, GPT-3 is a text-generating neural network with 175 billion parameters, more than 100 times as many as its predecessor, GPT-2 (1.5 billion parameters). This guide is a great overview of the model, with key takeaways and explanations of the model and the data used to train it.
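The jump in scale from GPT-2 to GPT-3 is easy to check with a quick back-of-the-envelope calculation using the parameter counts quoted above. The memory figure is an added assumption for illustration: it counts only the weights stored as 16-bit floats (2 bytes each) and ignores activations, optimizer state, and serving overhead.

```python
GPT2_PARAMS = 1.5e9   # GPT-2, as quoted above
GPT3_PARAMS = 175e9   # GPT-3, as quoted above


def param_ratio(larger: float, smaller: float) -> float:
    """How many times more parameters the larger model has."""
    return larger / smaller


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone; assumes fp16 (2 bytes/param) by default."""
    return num_params * bytes_per_param / 1e9


print(f"{param_ratio(GPT3_PARAMS, GPT2_PARAMS):.1f}x more parameters")  # 116.7x more parameters
print(f"{weight_memory_gb(GPT3_PARAMS):.0f} GB of fp16 weights")        # 350 GB of fp16 weights
```

Under the same 2-bytes-per-parameter assumption, GPT-2’s weights come to about 3 GB, which helps explain why a model of GPT-3’s size is more practical to offer through a hosted API than as a download.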

9. Philosophers On GPT-3 (updated with replies by GPT-3)

From Daily Nous, this is an interesting thought piece in which nine philosophers take a deep dive into OpenAI’s GPT-3. They explore possible ethical and moral issues, as well as the lingering questions raised by the technology.

10. End to End Multiclass Image Classification Using Pytorch and Transfer Learning

From Rahul Agarwal, a data scientist at WalmartLabs, this guide is a step-by-step tutorial on creating a multiclass image classification model. Furthermore, Agarwal explains what transfer learning is and how to use it to improve your own image classification models.
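The core idea of transfer learning can be sketched without any framework: keep a pretrained feature extractor frozen and train only a new classification head for your own classes. In the sketch below, `frozen_backbone` is a hypothetical stand-in for a pretrained network (in PyTorch this would typically be a pretrained CNN whose parameters have `requires_grad` set to `False` and whose final layer is replaced), and `TOY_DATA` is made-up 2-D data. This is an illustration of the technique, not Agarwal’s code.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[float, float], int]


def frozen_backbone(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a pretrained feature extractor.

    Its "weights" are fixed and never updated during training. The trailing
    1.0 acts as a bias feature for the new head.
    """
    return [x[0], x[1], x[0] * x[1], 1.0]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W: List[List[float]], features: List[float]) -> List[float]:
    return [sum(w * f for w, f in zip(row, features)) for row in W]


def train_head(data: List[Example], num_classes: int,
               epochs: int = 200, lr: float = 0.1, seed: int = 0) -> List[List[float]]:
    """Train only a linear softmax head on top of the frozen features (SGD)."""
    rng = random.Random(seed)
    dim = len(frozen_backbone(data[0][0]))
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in data:
            feats = frozen_backbone(x)  # forward pass only; the backbone is not trained
            probs = softmax(logits_for(W, feats))
            for k in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit k is (p_k - 1[k == y]).
                g = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    W[k][j] -= lr * g * feats[j]
    return W


def predict(W: List[List[float]], x: Sequence[float]) -> int:
    scores = logits_for(W, frozen_backbone(x))
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up three-class toy data: (input, class label).
TOY_DATA: List[Example] = [
    ((1.0, 0.1), 0), ((0.9, -0.1), 0),
    ((0.1, 1.0), 1), ((-0.1, 0.9), 1),
    ((-1.0, -1.0), 2), ((-0.9, -1.1), 2),
]

head = train_head(TOY_DATA, num_classes=3)
print([predict(head, x) for x, _ in TOY_DATA])
```

Because only the small head is trained, this approach needs far less data and compute than training the whole network, which is the main appeal of transfer learning for image classification.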

11. The 100-Year History of Self-Driving Cars

This feature piece from OneZero covers the long history of autonomous vehicles, from early autopilot systems on ships to the self-driving cars we see from the likes of Tesla and Google today.

Machine Learning Tools & Resources

12. 5 Fantastic Natural Language Processing Books

Written by KDnuggets editor Matthew Mayo, this useful guide introduces five NLP books from his personal library. Unlike many book lists you may find online, this one comes from someone who has personally read all of the books and vouches for their quality. Note that these books are not free, so they require a bit of investment on your part.

13. Tools to Spot Deepfakes and AI-Generated Text

With the rampant spread of misinformation on social media, I was very concerned when I saw this spread reach my own inner circles. As it has become easier and easier to create deepfakes and generate fake articles using AI, I wanted to help combat the malicious use of these technologies. This article introduces a few simple methods and browser plugins that may help you detect both deepfakes and AI-generated text.

14. 30 Largest TensorFlow Datasets for Machine Learning

This listicle is a simple curation of the largest datasets available through TensorFlow that may prove useful for improving your deep learning models. It introduces the largest audio, video, image, and text datasets on the platform and some of their intended use cases.

15. Machine Learning Developer Hourly Rate Calculator

From Toptal, this handy tool estimates the average hourly rate for machine learning developers based on location, programming languages, and skills. You can use it to compare average rates for your position in your own country and in other countries, which can help you evaluate your career and plan your next steps.

We hope these NLP and deep learning articles and guides helped you catch up on some of the big things happening in machine learning this year.

Previously published on: https://www.kdnuggets.com/2020/08/must-read-nlp-deep-learning-articles.html