Data science is a new and maturing field, with a variety of job functions emerging, from data engineering and data analysis to machine and deep learning. A data scientist must combine scientific, creative and investigative thinking to extract meaning from a range of datasets, and to address the underlying challenge faced by the client.
There is an ever-growing amount of data generated in all areas of life — from retail, transport and finance, to healthcare and medical research.
Increases in available computing power and recent advances in artificial intelligence have propelled data scientists — the people who take the raw data, analyze it, and make it useful and usable — into the spotlight.
Data science has topped Glassdoor's list of the 50 best jobs in North America since 2016, ranked on criteria such as earning potential, reported job satisfaction, and the number of job openings.
So what does it take to become a data scientist?
For some pointers on the skills for success, I interviewed Ben Chu, who is a Senior Data Scientist at Refinitiv Labs.
Chu has a background in artificial intelligence, particularly in the areas of linguistics, semantics and graphs, and has worked for Refinitiv Labs in Singapore for two years.
Chu started off our interview by saying that data scientists should think like investigators.
You need to be curious and excited by asking ‘why?’. “It’s a bit like being a detective, joining the dots and finding new clues.”
In finance, data scientists extract meaning from a range of datasets to inform clients and guide their key decisions.
The data scientist has to zoom in on the challenge that the client wants to solve, and to pick up on clues in the data they are working with.
From talking to Chu, I learned how important it is to be able to shift focus and consider the context of the investigation.
The perfect analysis isn’t helpful if it doesn’t solve the underlying problem. Sometimes you need to circle back, try a new approach and reframe the question you are trying to answer. At its heart is curiosity. You need to love questions!
Data scientists use a range of tools to manage their workflows, data, annotations and code.
“I have to be very diligent. I need to measure and track my progress so I can back up and try a new direction, reuse previous work, and compare results.
“It’s important to be scientific, take observations, experiment and document well as you go along, so you can reproduce your findings. I need to organize my observations, so I use Notion as my primary tool to keep all my notes, papers, and visualizations in one place.”
Chu emphasized the need to keep records that cover not just his current investigations but all of his previous findings.
“It’s like data science journaling. I keep good reference points and refer back to them to guide my next steps, whenever I encounter a similar scenario.”
Data science isn’t just about having a scientific approach. The job title can be misleading; you don’t have to come from a scientific background, but you do need to be able to think creatively. Often, alternative thinking is key to the way you tackle a challenge.
“I have to switch between scientific thinking to solve problems, and creative thinking to lead me down new and different pathways of exploration.
“Logical, scientific thinking is essential to helping me arrive at my conclusions, but putting on a creative hat is equally important: I use both good and failed examples as clues to observe new patterns. It’s all about ‘coded intelligence’.”
You need solid coding skills to pre-process data from different sources, and to apply data processing techniques that deal with noisy or incomplete data.
You will also need to be able to create a machine learning pipeline, which will require you to know how to build a model, and use tools and frameworks to evaluate and analyze its performance.
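To make the idea of a pipeline concrete, here is a minimal sketch in Python using scikit-learn, one of the packages Chu mentions. It is an illustration only, not the workflow used at Refinitiv Labs: the toy dataset, the median imputation, the logistic regression model and the helper names `make_toy_data` and `build_pipeline` are all assumptions chosen for brevity.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_toy_data(n_rows=200, missing_rate=0.1, seed=0):
    """Build a synthetic dataset (hypothetical) with some values knocked out."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rows, 3))
    # The label depends on the first two features, computed before any values go missing.
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    # Simulate incomplete data by replacing a random fraction of cells with NaN.
    mask = rng.random(X.shape) < missing_rate
    X[mask] = np.nan
    return X, y


def build_pipeline():
    """Chain pre-processing and a model so both are fitted on training folds only."""
    return make_pipeline(
        SimpleImputer(strategy="median"),   # fill in missing values
        StandardScaler(),                   # put features on a common scale
        LogisticRegression(max_iter=1000),  # a simple baseline classifier
    )


X, y = make_toy_data()
# Evaluate with 5-fold cross-validation rather than a single train/test split.
scores = cross_val_score(build_pipeline(), X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"mean cross-validated accuracy: {mean_accuracy:.2f}")
```

Wrapping the imputer and scaler inside the pipeline means they are re-fitted on each training fold, so the evaluation does not leak information from the held-out data.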
Chu uses Python, as do most data scientists, because of the number of excellent packages available to manipulate and model data.
In fact, Glassdoor took a sample of 10,000 job listings for data scientists placed on their site in the first half of 2017, and found that three particular skills — Python, R, and SQL — form the foundation of most job openings in data science.
Ben Chu’s team relies on open source machine learning tools, such as TensorFlow, PyTorch and BERT.
“We use Confluence primarily as a documentation tool; MLflow, Amazon SageMaker, scikit-learn, TensorFlow, PyTorch and BERT for machine learning; Apache Spark to build speedy data pipelines on large datasets; and Athena as our database to store our processed data.
“We also use Superset to connect the data and to more easily build dashboards to output charts, which makes it more intuitive.”
Chu is now a senior data scientist at Refinitiv Labs, but he wanted to be a musician when he was growing up, and is fascinated by languages. “For my area of work in natural language processing, I need a good understanding of linguistics, particularly semantics and the nuances of language.”
He explains that a data science team needs a range of skills — he and his colleagues have overlapping skills developed from their different backgrounds.
“The skills you need will depend on the domain you work in. For example, I need to have a good understanding of finance.
“For instance, data analytics is being applied to mitigate fraud by building anomaly detection methods to detect fraudulent ‘behaviors’ as irregular patterns in transaction data.
“Data scientists like me need to be well-versed in how to work with various and isolated financial data. It is crucial to know what to combine because without that understanding, I cannot build a successful model.”
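As a rough illustration of what flagging irregular patterns in transaction data can look like, the sketch below applies a modified z-score based on the median and the median absolute deviation. This is a deliberately simple stand-in, not the method Chu's team uses: the transaction amounts are made up, and the function name `flag_anomalies` and the 3.5 threshold are assumptions for the example.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds the threshold."""
    center = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(a - center) for a in amounts])
    if mad == 0:
        # All values are (nearly) identical, so nothing stands out.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - center) / mad > threshold
    ]


# Hypothetical card transactions; one is far larger than the rest.
amounts = [25.0, 31.5, 28.0, 30.0, 27.5, 950.0, 29.0, 26.5]
flagged = flag_anomalies(amounts)
print(flagged)  # [5]
```

Real fraud detection combines many more signals, which is where the domain knowledge Chu describes comes in: deciding which datasets to join and which behaviors count as irregular.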
It isn’t essential to be a computer scientist or mathematician to get into data science. Nobody has all the expertise in every area. You could come from a background in law or economics or the sciences. It’s all about the way you think.
If you can be flexible and systematic, you will be able to develop familiarity with the specifics of the tools, frameworks and datasets as you use them.
For those keen to develop their data science skills, Chu offers a few practical tips that you can easily adopt despite the disruptions caused by COVID-19.
You can seek out research communities, attend webinars and find training courses online. Once in-person networking is feasible again, Chu recommends that you get active in the data science community.
“Go to Meetups and hackathons, which will help you to build a strong network to discuss your ideas, inspire your research and answer your questions.”
Also, remember that the field of data science is new and still maturing.
A variety of job titles is emerging, such as data scientist, data engineer and data analyst, along with machine learning and deep learning engineers. You may find that one role suits your interests and skills better than another.
Tap into your curiosity and creativity, brush up your Python skills and get into data science!
This article appeared originally on Refinitiv Perspectives in early April 2020.