How Data Scientists Can Become More Marketable

by Andrew Ste, July 23rd, 2019

This headline may seem a bit odd to you. After all, if you’re a data scientist in 2019, you’re already marketable. Since data science has a huge impact on today’s businesses, the demand for DS experts is growing. At the moment I’m writing this, there are 144,527 data science jobs on LinkedIn alone.

Still, it’s important to keep your finger on the pulse of the industry to be aware of the fastest and most efficient data science solutions. To help you out, our data-obsessed CV Compiler team analyzed 300 vacancies and identified the data science employment trends of 2019.

The most in-demand data science skills of 2019

The following chart represents the skills employers are seeking from data science engineers in 2019:

For this analysis, we looked at 300 Data Science vacancies from StackOverflow, AngelList, and similar websites. Some terms might have been repeated more than once within one job listing.
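To make the counting caveat concrete, here is a minimal Python sketch of how such a tally might work. The listings and the keyword list are hypothetical placeholders, not the data set behind this article. The sketch only shows the difference between counting every mention of a term and counting the listings that mention it at least once.

```python
import re
from collections import Counter

# Hypothetical job listings; the real vacancies analyzed for the article are not reproduced here.
listings = [
    "We need Python, AWS and Docker. Python experience is a must.",
    "Looking for a data scientist with Tableau and AWS skills.",
    "Spark, Python and Kubernetes required.",
]

# Hypothetical keyword list, chosen only for illustration.
skills = ["Python", "AWS", "Docker", "Tableau", "Spark", "Kubernetes"]


def _pattern(term):
    """Whole-word, case-insensitive pattern for one skill name."""
    return re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)


def count_mentions(texts, terms):
    """Count every occurrence of each term, including repeats within one text."""
    counts = Counter()
    for text in texts:
        for term in terms:
            counts[term] += len(_pattern(term).findall(text))
    return counts


def count_listings(texts, terms):
    """Count how many texts mention each term at least once."""
    counts = Counter()
    for text in texts:
        for term in terms:
            if _pattern(term).search(text):
                counts[term] += 1
    return counts


mentions = count_mentions(listings, skills)
per_listing = count_listings(listings, skills)

for skill in skills:
    print(f"{skill}: {mentions[skill]} mentions in {per_listing[skill]} listings")
```

The first listing repeats "Python", so the two tallies differ for that term. That is the effect the note about repeated terms describes.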

Note: Bear in mind, this research represents the preferences of the employers, rather than the data science engineers themselves.

Key takeaways and Data Science trends

Obviously, Data Science is more about fundamental knowledge than frameworks and libraries, yet there are still some trends and technologies worth noting.

Big Data

According to the 2018 Big Data Analytics Market Study, Big Data adoption in enterprises soared from 17% in 2015 to 59% in 2018. Thus, the popularity of Big Data tools also grew. If we leave aside Apache Spark and Hadoop (we will talk about the latter in detail in the next section), the most popular ones are MapReduce (36) and Redshift (29).

Hadoop

Despite the popularity of Spark and cloud storage, the 'era' of Hadoop hasn't yet ended. Ergo, some employers still expect candidates to be familiar with Apache Pig (30), HBase (32), and similar technologies. HDFS (20) is still being mentioned in vacancies as well.

Real-time data processing

With the increasing use of various sensors, mobile devices, and IoT (18), companies are aiming to get more insights from real-time data processing. Thus, stream analytics platforms such as Apache Flink (21) are popular among some employers.

Feature Engineering and Hyperparameter Tuning

Preparing data and selecting the model parameters are key parts of any data scientist’s job. The term Data Mining (128) is quite popular among employers. Some employers also pay great attention to Hyperparameter Tuning (21). However, as a data scientist, you need to pay attention to Feature Engineering first. Choosing the best features matters because they determine the success of your model at the earliest stage of its creation.
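As a rough illustration of how the two steps fit together, the sketch below first engineers a feature and then tunes a hyperparameter on a validation split. Everything in it is an assumption made for the example: the synthetic data, the `engineer` helper, the choice of a k-nearest-neighbours classifier, and the grid of `k` values. None of it comes from the vacancies we analyzed.

```python
import random


def engineer(x, y):
    """Feature engineering: replace raw coordinates with one informative feature."""
    return (x * x + y * y,)  # squared distance from the origin


def knn_predict(train, point, k):
    """Majority vote among the k training rows nearest to `point`."""
    def distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], point))

    nearest = sorted(train, key=distance)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, valid, k):
    """Share of validation rows classified correctly for a given k."""
    hits = sum(knn_predict(train, feats, k) == label for feats, label in valid)
    return hits / len(valid)


# Synthetic toy data (an assumption): label 1 means the point lies inside a circle.
rng = random.Random(0)
raw = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
labels = [1 if x * x + y * y < 0.5 else 0 for x, y in raw]

# Step 1: feature engineering comes first.
rows = [(engineer(x, y), label) for (x, y), label in zip(raw, labels)]
train, valid = rows[:150], rows[150:]

# Step 2: hyperparameter tuning as a plain grid search over k (odd values avoid ties).
grid = [1, 3, 5, 7]
scores = {k: accuracy(train, valid, k) for k in grid}
best_k = max(scores, key=scores.get)

print("validation accuracy per k:", scores)
print("selected k:", best_k)
```

The order matters here. Because the engineered feature already separates the two classes well, the grid search has little left to fix, which matches the point above about features deciding a model's success early on.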

Data visualization

The ability to process data and extract valuable insights from it is vital. However, Data Visualization (55) is no less important a skill for any data scientist. It’s crucial that you can present the outcomes of your work in a format understandable to any team member or customer. As for data visualization tools, employers prefer Tableau (54).

General trends

In the vacancies, we encountered terms such as AWS (86), Docker (36), and Kubernetes (24). Hence, the general trends in the software development industry apply to the Data Science field, too.

What experts say

The technologies in this rating are on par. However, in Data Science, there are some things that are just as important as coding. It’s the ability to glean insights from “data output” such as final data sets and trends, visualization, and telling the story with that data. Also, it’s the ability to present the findings in a manner that is understandable. Know your audience — if they are Ph.D.’s, talk to them in an appropriate manner, but if they’re from the C Suite, they won’t care about programming — only results and ROI.

Carla Gentry, Data Scientist/Owner at Analytical Solution

The snapshot data is useful to see the current state of the market but it doesn’t represent the trends, so it’s hard to plan for the future based on the snapshot alone. I would say that the usage of R will continue to steadily decline (the same can be said about MATLAB), while the popularity of Python among data scientists will keep rising. Hadoop and Big Data are on the list because the industry has some inertia: Hadoop will disappear (no one seriously invests in it anymore) and big data is no longer a hot trend. Whether one has to invest their time in learning Scala is unclear: Google officially supports Kotlin (also a JVM language), it’s simpler to learn while Scala has a steep learning curve. I’m also skeptical about the future of TensorFlow: academia already switched to PyTorch and academia’s influence is the strongest in data science compared to other industries. (The opinions are mine and might not represent Gartner’s views.)

Andriy Burkov, Director of Machine Learning at Gartner, author of The Hundred-Page Machine Learning Book

PyTorch is the driving force of reinforcement learning with mathematical operations on CUDA tensors with GPUs. It is also a stronger framework for parallelizing code natively on multiple GPUs at the same time, unlike TensorFlow, which requires wrapping each operation to a device. PyTorch also builds dynamic graphs, which are efficient for recurrent neural networks. Theano-based TensorFlow produces static graphs and is more complicated to learn compared to Torch-based PyTorch. TensorFlow reflects the larger community of developers and researchers. PyTorch will show more momentum when it builds machine learning dashboard visualization tools such as TensorBoard. PyTorch is more Pythonic in terms of debugging and data visualization libraries with matplotlib and seaborn. Most of the debugging tools of Python can be leveraged to debug PyTorch as well. TensorFlow comes with its own debugging tool, tfdbg.

Dr. Ganapathi Pulipaka, Chief Data Scientist, Accenture, winner of Top 50 Tech Leader Awards

I think of data science “jobs” differently than data science “careers.” Job listings offer insights into specific skills the market needs now but for a career, one of the most important skills I’ve seen is the ability to learn. Data science is a fast moving field and you need to be able to easily pick up new techniques, tools, and domain knowledge if you’re going to succeed over the long term. Do that by challenging yourself and avoid getting too comfortable.

Lon Riesberg, Founder/curator of Data Elixir, Ex-NASA

Data Science is a fast-evolving and complicated industry, where general knowledge matters as much as experience with particular technologies. We hope this article gives you valuable insights into which skills of both kinds you need to stay marketable in 2019. Good luck!

This article was brought to you by the team of CV Compiler, an online resume enhancement tool for data scientists, machine learning engineers, and other IT professionals.