This headline may seem a bit odd to you. After all, if you’re a data scientist in 2019, you’re already marketable. Since data science has a huge impact on today’s businesses, the demand for DS experts is growing. At the moment I’m writing this, there are 144,527 data science jobs on LinkedIn alone.
Still, it’s important to keep your finger on the pulse of the industry so that you know the fastest and most efficient data science solutions. To help you out, our data-obsessed CV Compiler team analyzed a set of vacancies and identified the data science employment trends of 2019.
The following chart represents the skills employers are seeking from data science engineers in 2019:
For this analysis, we looked at 300 Data Science vacancies from StackOverflow, AngelList, and similar websites. Some terms may have appeared more than once within a single job listing.
Note: Bear in mind that this research represents the preferences of employers rather than those of data science engineers themselves.
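As a rough illustration of the counting described above, here is a minimal Python sketch. The term list and the sample listings are made up for the example, and the assumption that every repeated mention inside one listing is counted is ours; the original analysis may have used a different procedure.

```python
import re
from collections import Counter

# Hypothetical term list and listings, for illustration only.
TERMS = ["AWS", "Docker", "Tableau", "Apache Flink"]
LISTINGS = [
    "We use AWS and Docker. Experience with AWS is a must.",
    "Dashboards in Tableau; pipelines on Apache Flink and AWS.",
    "Strong Python skills; Docker is a plus.",
]


def term_pattern(term):
    """Build a case-insensitive regex that matches the term as a whole word or phrase."""
    return re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)


def count_mentions(listings, terms):
    """Count every mention of each term, including repeats within one listing."""
    counts = Counter()
    for term in terms:
        pattern = term_pattern(term)
        counts[term] = sum(len(pattern.findall(text)) for text in listings)
    return counts


def count_listings(listings, terms):
    """Count how many distinct listings mention each term at least once."""
    counts = Counter()
    for term in terms:
        pattern = term_pattern(term)
        counts[term] = sum(1 for text in listings if pattern.search(text))
    return counts


mentions = count_mentions(LISTINGS, TERMS)
vacancies = count_listings(LISTINGS, TERMS)
for term, total in mentions.most_common():
    print(f"{term}: {total} mentions in {vacancies[term]} listings")
```

The two counters differ whenever a listing repeats a term, which is why a mention count can be higher than the number of vacancies that actually ask for the skill.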
Obviously, Data Science is more about fundamental knowledge than frameworks and libraries, yet there are still some trends and technologies worth noting.
Big Data
According to the 2018 Big Data Analytics Market Study, Big Data adoption in enterprises soared from 17% in 2015 to 59% in 2018, and the popularity of Big Data tools grew with it. Setting aside Apache Spark and Hadoop (we will talk about the latter in detail in the next section), the most popular ones are MapReduce (36) and Redshift (29).
Hadoop
Despite the popularity of Spark and cloud storage, the 'era' of Hadoop hasn't yet ended. Ergo, some employers still expect candidates to be familiar with Apache Pig (30), HBase (32), and similar technologies. HDFS (20) is still being mentioned in vacancies as well.
Real-time data processing
With the increasing use of various sensors, mobile devices, and IoT (18), companies are aiming to get more insights from real-time data processing. As a result, stream analytics platforms such as Apache Flink (21) are popular among some employers.
Feature Engineering and Hyperparameter Tuning
Preparing data and selecting model parameters are key parts of any data scientist’s job. The term Data Mining (128) is quite popular among employers. Some employers also pay great attention to Hyperparameter Tuning (21). However, as a data scientist, you need to pay attention to Feature Engineering first. Choosing the best features matters because they determine the success of your model at the earliest stage of its creation.
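To make the idea of hyperparameter tuning concrete, here is a minimal sketch of a grid search in plain Python. It assumes a toy one-feature k-nearest-neighbors classifier and made-up data with two deliberately mislabeled points; none of it comes from the vacancies we analyzed, and in practice you would typically use a library for this.

```python
from collections import Counter

# Made-up (feature, label) pairs; two points are deliberately mislabeled noise.
TRAIN = [
    (1.0, "a"), (1.5, "a"), (2.0, "a"), (2.5, "a"), (1.8, "b"),
    (6.0, "b"), (6.5, "b"), (7.0, "b"), (7.5, "b"), (6.7, "a"),
]
VALIDATION = [(1.7, "a"), (2.3, "a"), (6.65, "b"), (7.2, "b")]


def knn_predict(train, x, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Share of validation points whose predicted label matches the true one."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


def grid_search(train, validation, candidate_ks):
    """Score every candidate k on held-out data and return the best one."""
    scores = {k: accuracy(train, validation, k) for k in candidate_ks}
    best_k = max(scores, key=scores.get)  # the first candidate wins a tie
    return best_k, scores


best_k, scores = grid_search(TRAIN, VALIDATION, candidate_ks=[1, 3, 5])
print("validation accuracy per k:", scores)
print("selected k:", best_k)
```

The hyperparameter here is k, the number of neighbors that vote. Scoring each candidate on held-out validation data, rather than on the training data, is what keeps the search from simply rewarding the setting that memorizes the noise.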
Data visualization
The ability to process data and extract valuable insights from it is vital. However, Data Visualization (55) is no less important a skill for any data scientist. It’s crucial that you can present the outcomes of your work in a format that any team member or customer can understand. As for data visualization tools, employers prefer Tableau (54).
General trends
In the vacancies, we encountered terms such as AWS (86), Docker (36), and Kubernetes (24). Hence, the general trends in the software development industry apply to the Data Science field, too.
The technologies in this rating are on par. However, in Data Science, there are some things that are just as important as coding. It’s the ability to glean insights from “data output” such as final data sets and trends, visualization, and telling the story with that data. Also, it’s the ability to present the findings in a manner that is understandable. Know your audience — if they are Ph.D.’s, talk to them in an appropriate manner, but if they’re from the C Suite, they won’t care about programming — only results and ROI.
Carla Gentry,
Data Scientist/Owner at Analytical Solution
The snapshot data is useful to see the current state of the market but it doesn’t represent the trends, so it’s hard to plan for the future based on the snapshot alone. I would say that the usage of R will continue to steadily decline (the same can be said about MATLAB), while the popularity of Python among data scientists will keep rising. Hadoop and Big Data are on the list because the industry has some inertia: Hadoop will disappear (no one seriously invests in it anymore) and big data is no longer a hot trend. Whether one has to invest their time in learning Scala is unclear: Google officially supports Kotlin (also a JVM language), it’s simpler to learn while Scala has a steep learning curve. I’m also skeptical about the future of TensorFlow: academia already switched to PyTorch and academia’s influence is the strongest in data science compared to other industries. (The opinions are mine and might not represent Gartner’s views.)
Andriy Burkov,
Director of Machine Learning at Gartner,
author of the Hundred-Page Machine Learning Book.
PyTorch is the driving force of reinforcement learning, with mathematical operations on CUDA tensors running on GPUs. It is also a stronger framework for natively parallelizing code across multiple GPUs at the same time, unlike TensorFlow, which requires wrapping each operation in a device assignment. PyTorch also builds dynamic graphs, which are efficient for recurrent neural networks. Theano-inspired TensorFlow produces static graphs and is more complicated to learn than Torch-based PyTorch. TensorFlow has the larger community of developers and researchers. PyTorch will gain more momentum once it has machine learning dashboard visualization tools such as TensorBoard. PyTorch is more Pythonic in terms of debugging and works with data visualization libraries such as matplotlib and seaborn. Most Python debugging tools can be used to debug PyTorch as well, whereas TensorFlow comes with its own debugging tool, tfdbg.
Dr. Ganapathi Pulipaka,
Chief Data Scientist, Accenture,
winner of Top 50 Tech Leader Awards.
I think of data science “jobs” differently than data science “careers.” Job listings offer insights into specific skills the market needs now but for a career, one of the most important skills I’ve seen is the ability to learn. Data science is a fast moving field and you need to be able to easily pick up new techniques, tools, and domain knowledge if you’re going to succeed over the long term. Do that by challenging yourself and avoid getting too comfortable.
Lon Riesberg,
Founder/curator of Data Elixir,
Ex-NASA.
Data Science is a fast-evolving and complicated industry, where general knowledge matters as much as experience with particular technologies. We hope this article gives you valuable insight into the skills of both kinds that you need to stay marketable in 2019. Good luck!
This article was brought to you by the team of CV Compiler, an online resume enhancement tool for data scientists, machine learning engineers, and other IT professionals.