If you are thinking of starting a career in data science, you probably agree that things are a bit confusing in this field! What is a Data Scientist anyway? What’s the difference between a Data Analyst and a Data Scientist? What does a Machine Learning Engineer do? What about a Data Engineer, Business Intelligence (BI) Engineer, and Machine Learning (ML) Researcher…?
In this post, we’ll take inventory of the different roles in data science and explain what they are and how they differ. We’ll also establish an “ideal profile” for each one. This matters for both career satisfaction and job search success: if you apply for a role that you’re a good fit for, you’ll have a better chance of getting the job, and if you do something you enjoy, you’ll look forward to work rather than wanting to escape it every day!
Let’s take a look at data science roles. We’ll expand a bit to cover all the roles that are available to candidates with data skills, i.e. data careers. Broadly speaking, we can break data roles into two types: business-oriented and engineering-oriented. Business-oriented roles require a mix of technical skills and business skills such as communication and presentation, while engineering-oriented roles focus mainly on modeling and software engineering skills.
Roles also differ in age: some traditional roles have been around for a long time, while others came into existence only in the last several years or are still emerging. Let’s look at each role in more detail.
Data Analyst/ Data Scientist
At the core, Data Analysts and Data Scientists are the same since they do the same thing — getting value from data. The value can be in different forms: for Data Analysts, the value means insights, while for Data Scientists, it’s about product development intelligence, on top of insights.
Data analysts analyze data to derive insights and help inform business decisions, e.g. what is driving increased site traffic, or what are the top reasons users leave the site? Data scientists, on the other hand, are more concerned with using machine learning and A/B testing to power and improve products. They may be interested in questions such as “will a larger button increase click-through rate?” and “which customers are likely to cancel their subscriptions?”
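To make the button question concrete, here is a minimal sketch of how such an A/B test might be evaluated with a two-proportion z-test, using only the Python standard library. The function name and the click and view counts are hypothetical and chosen purely for illustration; a real experiment would also need a sample size and a significance threshold decided in advance.

```python
import math

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Compare click-through rates of variant A (control) and variant B.

    Returns (z, p_value) for a two-sided two-proportion z-test.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled click-through rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up numbers: small button (A) vs. larger button (B).
z, p_value = two_proportion_z_test(clicks_a=200, views_a=2000,
                                   clicks_b=260, views_b=2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value would suggest that the difference in click-through rate is unlikely to be due to chance alone. Whether the larger button is worth shipping remains a product decision.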
Data scientists focus on looking forward, i.e. making predictions, while data analysts concentrate more on looking backward, i.e. analyzing historical data. Data scientists are expected to be more experienced and able to tackle business problems with a scientific approach: framing the business question, coming up with a hypothesis, designing and conducting experiments to test the hypothesis, and finally drawing a conclusion. These are essentially research skills, which is why hard science PhDs are sometimes the preferred candidates for data scientist roles. Data analysts, on the other hand, are expected to gather, clean, and analyze data, and to communicate the findings using reports or data visualization techniques.
That is the general distinction between the two roles, but it does not always hold, since data science is still new and far from standardized. You will sometimes find data scientists doing basic analysis work and data analysts doing ML modeling. Regardless of title, these two profiles are the ones employers seek most for analytical roles in data science, so they should be our targets when job searching and thinking about best fit (the same goes for the other roles below).
It’s also important to point out that here we are referring to the general data scientist, who deals with statistical modeling, A/B testing, and machine learning, plus data wrangling and data visualization. We would actually put the ML-focused data scientist in the same category as an ML Researcher/ Scientist, as explained below.
Data Engineer
We’ve discussed data scientists quite a bit, but in reality, data scientists cannot contribute without help from data engineers. Why? Because data engineers build the data pipelines that bring in the data! Think of oil refineries sitting idle because no crude oil is coming in: the oil pipelines have not been built yet.
Let’s say that, as an ad-tech company, we have data coming from various internal and external sources in real time: ad delivery data from ad servers, campaign and client data from our internal database, and campaign performance data from a third-party provider and our internal logs. To build a real-time campaign analytics dashboard and to do further analysis and modeling, we’ll need all the data aggregated to the right level. On top of that, we’ll need to build a data warehouse so that our querying won’t affect the production server’s performance. This is what data engineers can help us with. As you can see, this is basically software engineering for data.
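As a toy illustration of what “aggregated to the right level” can mean, the sketch below rolls raw ad delivery events up to one row per campaign and attaches the client name from a campaign lookup. All field names (`campaign_id`, `impressions`, `clicks`, `client`) and the function itself are assumptions made for this example. A real pipeline would read from ad servers and databases and write to a data warehouse, not work on in-memory lists.

```python
from collections import defaultdict

def aggregate_by_campaign(delivery_events, campaigns):
    """Roll delivery events up to campaign level and join client names.

    delivery_events: iterable of dicts with campaign_id, impressions, clicks.
    campaigns: dict mapping campaign_id to a dict containing "client".
    """
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for event in delivery_events:
        bucket = totals[event["campaign_id"]]
        bucket["impressions"] += event["impressions"]
        bucket["clicks"] += event["clicks"]

    rows = []
    for campaign_id, bucket in sorted(totals.items()):
        impressions = bucket["impressions"]
        rows.append({
            "campaign_id": campaign_id,
            # Campaigns missing from the lookup are kept and flagged.
            "client": campaigns.get(campaign_id, {}).get("client", "unknown"),
            "impressions": impressions,
            "clicks": bucket["clicks"],
            "ctr": bucket["clicks"] / impressions if impressions else 0.0,
        })
    return rows

# Made-up sample data standing in for ad server logs and a campaign table.
events = [
    {"campaign_id": "c1", "impressions": 1000, "clicks": 10},
    {"campaign_id": "c1", "impressions": 1000, "clicks": 30},
    {"campaign_id": "c2", "impressions": 500, "clicks": 0},
]
campaign_table = {"c1": {"client": "Acme"}}
for row in aggregate_by_campaign(events, campaign_table):
    print(row)
```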
ML/ DL/ AI Researcher/ Scientist, ML/ DL/ AI Engineer
An ML Researcher is essentially the same as an ML-focused Data Scientist. Unlike general Data Scientists, who are “full-stack” and handle the whole spectrum of concerns in data science, ML-focused Data Scientists concentrate on ML modeling and/ or the research and development of new machine learning algorithms.
An ML Engineer, on the other hand, is more concerned with productionizing the machine learning model. Think of a recommender ML model built using a public dataset. After fine-tuning the model, we’ve achieved great performance results, but the model is still not helpful since it’s just a piece of software sitting on our computer. To make it useful, we need to deploy the model into a production environment, say our e-commerce website, so that it can make real-time recommendations for users and thus help us increase revenue. Deploying machine learning models into production is an engineering concern that is different from building the model. It involves different types of engineering work, such as integrating an ML model into a software system, optimizing the model for performance and scalability, monitoring the ML system, and re-training it with new data. Of course, there’s always the modeling part as well, i.e. experimenting with and building machine learning models using various ML libraries, plus implementing ML algorithms to fit business needs.
The difference between a researcher/ scientist and an engineer is the “deploying” part, i.e. whether you’ll be responsible for putting your ML model into production. If so, the engineering concerns above apply and the role is an engineering role. Otherwise, it’s a research role.
Business Analyst (Various Functions)
The business analyst we have in mind here is not the traditional IT Business Analyst (BA). Traditional BAs elicit and document business requirements and act as a liaison between business and technology. Instead, we are using Business Analyst as an umbrella title for all those analyst roles that are business-oriented (non-technical) in nature but require significant data skills. Since data is ubiquitous these days, almost all analyst roles require some sort of data skills. As a result, business analyst roles are great job search targets for data-savvy candidates with domain expertise.
The best way to find these roles is to use keywords on a job search engine. For example, on Indeed.com, if you enter “analyst sql” as the keywords, you’ll find many different titles such as Performance Analyst, Healthcare Data Analyst and Demand Planning Analyst. These are different business analyst roles that data-savvy candidates can pursue.
BI Analyst, BI Engineer/ Developer
We also have the traditional Business Intelligence (BI) Analyst and Business Intelligence Engineer roles. Generally speaking, BI refers to data analysis and reporting in a “big corporate” setting using a “well-defined BI infrastructure”, meaning the various enterprise software systems (ERP, CRM…) together with the connecting and reporting BI tools on top of them. We say “big corporate” because traditionally it has been the big enterprises that had the financial resources to build and maintain these BI systems.
BI Analysts are quite similar to Data Analysts since they both analyze and report on data. Generally speaking, they don’t do predictive modeling. The difference would be that BI Analysts work with larger corporations in a structured environment (with BI systems), while data analysts can be anywhere and do not necessarily work with an existing BI infrastructure.
BI Engineers/ Developers relate to BI Analysts much as Data Engineers relate to Data Scientists: BI Engineers build the reporting tools that BI Analysts rely on to produce the analysis the business needs. The Data Engineer role can therefore be viewed as the latest version of the BI Engineer/ Developer role, and BI Engineers/ Developers can be a good fit for Data Engineer roles thanks to a similar skillset.
Data/ ML Product Manager
As we explained above, data is now omnipresent. It is no wonder that products these days also rely on data science, and on machine learning in particular. Products that are machine-learning-centric or heavily dependent on data science are best supported by data-savvy product managers. Candidates with ML expertise and prior experience in product management therefore have the upper hand for this type of role.
Ideal Profiles
We now clearly understand the major roles in data science, but what does an ideal candidate’s skillset look like for each of them? To illustrate in an intuitive way, I created spider plots below using Matplotlib to visualize the ideal profiles. Since this is largely done based on my intuition, we’ll scrape and analyze job posting data from Indeed in future iterations to validate the profile configurations. (The project code can be found here).
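For readers who want to draw a similar chart, a spider plot can be produced in Matplotlib by plotting a closed polygon on a polar axis. The sketch below is not the project code mentioned above. The skill axes, the 0 to 5 scoring scale, the example scores, and the output file name are all invented for illustration.

```python
import math

# Hypothetical skill axes and 0-5 scores, invented for illustration only.
SKILLS = ["Statistics", "Machine Learning", "Programming",
          "Data Wrangling", "Communication"]
EXAMPLE_PROFILE = [4, 3, 3, 4, 5]

def radar_angles(n_axes):
    """Return evenly spaced angles in radians, repeating the first one
    at the end so that the plotted polygon closes."""
    angles = [2 * math.pi * i / n_axes for i in range(n_axes)]
    return angles + angles[:1]

def plot_profile(title, skills, scores, path):
    """Draw one profile as a spider plot and save it to `path`."""
    # Imported here so radar_angles() stays usable without Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend: write files only
    import matplotlib.pyplot as plt

    angles = radar_angles(len(skills))
    values = list(scores) + list(scores[:1])  # close the polygon
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.plot(angles, values, linewidth=2)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(skills)
    ax.set_ylim(0, 5)
    ax.set_title(title)
    fig.savefig(path)
    plt.close(fig)

if __name__ == "__main__":
    plot_profile("Example profile (made-up scores)", SKILLS,
                 EXAMPLE_PROFILE, "example_profile.png")
```

Plotting several roles on the same set of axes makes the profiles easy to compare at a glance, as long as every role is scored on the same scale.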
There you have it: the data science roles and their corresponding ideal profiles! With a solid understanding of the responsibilities of the different roles and the differences between them, you’ll be able to identify the career path that you’re passionate about. The ideal profiles can then be used not only to decide on the best-fit target role, but also as a roadmap for tailoring your resume and personal branding to make your profile relevant.
Best of luck with your career and job search in data!
— — — — — — — — — — — — — — — —
Bio: George is the Founder of Datakademy.com where data enthusiasts acquire data skills in weeks. As a seasoned business professional turned data scientist, George possesses the unique ability to turn complicated theories into easy-to-understand concepts. As a passionate data science mentor, he has coached thousands of students around the world on various data science topics including programming, data wrangling, statistics, and machine learning. As a data career expert with a substantial business background and first-hand job search experience, George has helped numerous students secure job offers from iconic firms including Facebook, BMW, Amazon, Morgan Stanley, Farmers Insurance, Deloitte, Symantec, and many more!
LinkedIn: https://www.linkedin.com/in/georgeliu2/