Data science has come a long way from the early days of the Knowledge Discovery in Databases (KDD) and Very Large Data Bases (VLDB) conferences. The software engineers handling databases in the 1980s-90s evolved into specialized database engineers in the 2000s. Meanwhile, pockets of computer scientists in smaller research labs ran experiments on machine learning and artificial intelligence. Big data and smart algorithms collided in a Cambrian explosion in the 2010s, making “Data Scientist: The Sexiest Job of the 21st Century”. That brings us to a decade later, post-pandemic 2022, asking the question, “Is Data Scientist Still the Sexiest Job of the 21st Century?”

Why are you writing this article?

Pardon the short cut-away, but this article is written in conjunction with the 2022 Noonie Awards. HackerNoon’s 2022 Noonie Awards celebrate the technical writers sharing their best and brightest insights in all things tech.

A formal introduction: Hi, I’m Liling. By day, I am an applied scientist at Amazon, and after work, I code open source and write tech articles on natural language processing and sometimes articles on gaming pop-culture.

It is a joy and honour to be nominated in the Hackernoon Contributor of the Year for Natural Language Processing (NLP) category, and if you have enjoyed the NLP or Machine Translation content that I’ve been sharing, help smash the vote button at https://www.noonies.tech/2022/programming/2022-hackernoon-contributor-of-the-year-natural-language-processing

To celebrate the nomination, I’m writing up this article in an “Ask Me Anything” questions and answers format. As a tech writer, I love to share the emergent technologies in machine learning and I have a particular soft spot for language and translation related technologies.

Learn more about my thoughts and opinions on “what kind of a scientist am I?” in the tech industry in the following sections.

Back to the “Sexiest Job in the 21st Century”

Nowadays, job descriptions for “data scientists” come in different forms and fall broadly under these categories:

- Data Scientist
- Research Scientist
- Applied Scientist
- Data Engineer
- Research Engineer
- Machine Learning (ML) Engineer

If you ask anyone about the difference between the roles and responsibilities of the different job titles, you will most probably end up with a vague line that delineates each of them. In reality, it is usually a fuzzy, overlapping scope of work that differs based on the company’s and team’s role definitions.

The major difference usually comes between the “Scientist” and “Engineer” roles, where the scientist is usually expected to focus more on the data and model quality side of things while the engineer focuses more on model integrity and service reliability.

Q: What data or model quality?

This is usually the responsibility of the “scientists”. In the industry, this is specific to the different tasks and applications the team supports and/or develops. It is similar to academic researchers building machine learning models, but in the industry the practicality of whether the final model is usable usually trumps the need to beat the state-of-the-art results.

Data quality tasks usually involve:

- What open source data can you use to train/improve the model?
- Who owns internal data sources that you can use to train/improve the model?
- How to extract, transform, store and load the data to fit the model?
- How to improve the quality and size of the data?
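To make the last two data quality questions a little more concrete, here is a minimal sketch of a cleaning pass over translation data. Everything in it is an assumption for illustration: the function name `clean_parallel_corpus`, the `max_length_ratio` threshold and the toy sentence pairs are hypothetical, and real pipelines usually layer many more heuristics on top.

```python
def clean_parallel_corpus(pairs, max_length_ratio=3.0):
    """Normalize, filter and deduplicate (source, target) sentence pairs.

    Hypothetical helper for illustration; the threshold is an assumed default.
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        # Transform: collapse repeated whitespace and trim both sides.
        source = " ".join(source.split())
        target = " ".join(target.split())
        # Filter: drop pairs where either side is empty.
        if not source or not target:
            continue
        # Filter: drop pairs whose token counts are wildly mismatched,
        # a common sign of misaligned sentences.
        src_len, tgt_len = len(source.split()), len(target.split())
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_length_ratio:
            continue
        # Deduplicate: keep only the first copy of each normalized pair.
        if (source, target) in seen:
            continue
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


# Toy data (made up for this example).
raw_pairs = [
    ("Hello   world", "Hallo Welt"),
    ("Hello world", "Hallo Welt"),                # duplicate after normalization
    ("", "Leerer Quelltext"),                     # empty source side
    ("Yes", "Dies ist ein viel zu langer Satz"),  # 1 token vs 7 tokens
    ("Good morning", "Guten Morgen"),
]

cleaned_pairs = clean_parallel_corpus(raw_pairs)
print(f"kept {len(cleaned_pairs)} of {len(raw_pairs)} pairs")
```

Even a crude pass like this tells you how much of the “size” of your data is actually usable, which is often the first number a stakeholder asks for.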
Model quality tasks usually involve:

- Finding the right algorithm or network architecture to solve the task
- Defining/refining the evaluation framework used to evaluate the task/application
- Improving the model performance based on a defined evaluation metric/framework
- Optimizing the speed and performance tradeoff for the algorithm to make the model usable in production

Q: What is model integrity and service reliability?

This is usually the responsibility of the “engineers”. Reliability is critical to any modern machine learning application today. It is important to make sure that the scientists’ carbon-emitting efforts to produce the best model for the customers/users deliver the expected performance in production.

A scientist’s “it works on my laptop” statement is unacceptable in the industry, and engineers help to make “it works, anywhere” a dream come true.

Model integrity tasks usually involve:

- Building and maintaining the framework to automate model training and deployment
- Making sure features/improvements made in experimental projects are available in production models
- Incremental improvements to automate experimental setups to reduce/eliminate manual steps in bringing scientists’ models to production

Service reliability tasks usually involve:

- Setting up alerts and monitoring users’ application usage and if/when the machine learning model fails/breaks
- Specifying and limiting users’ access to the model to comply with internal/national/regional regulations
- Making the service accessible to an increasing number of users and a growing load

Nowadays, these engineering responsibilities are sometimes known as Machine Learning Operations (MLOps). Chip Huyen has a good blogpost that gives an overview on MLOps for aspiring ML/Data/Research engineers.

There are many other definitions of what machine learning, data, applied, and research scientists/engineers do, but the above is from my personal industry experience.

Q: Should I go for Scientist or Engineer?

It depends!
And as discussed earlier, it varies from company to team, and everyone should always ask the hiring manager about the expected responsibilities during the job application process.

A good scientist should be able to do some engineering tasks. Vice versa, a good engineer should be able to build some machine learning models.

Personally, as a scientist, this is the advice that I give to aspiring/new scientists:

- Knowing some backend/frontend engineering helps
- Know what’s possible, what’s easy, what’s hard for the engineers
- Learn from engineers (Docker, databases, cloud, app design/dev)
- And let engineers learn what you do

And a final note that I always try to remind myself of:

P/S: An engineer might train a better model than a scientist does.

Q: Let’s talk practical, is there a difference between Data, Research or Applied Scientist?

In terms of roles and responsibilities, they are similar. In practical terms, though, some companies might have a clear demarcation between the different scientist positions, so always ask the human resources (HR) personnel or hiring manager if it’s possible to share the “role guidelines” specific to the position you are applying to. These are especially important to understand the expectations of your role once you have joined the company and team.

Q: Yeah, that’s all nice and good about tech, career, tell me more about the dough ($$$ difference in practical terms) for data, research or applied scientist!

I’m personally a “practicalist” in most cases, but when it comes to “the dough”, https://www.levels.fyi/ and asking friends/seniors in the companies are your best bet to know more about a company and its compensation.

My personal opinion: “Don’t do it for the money” is over-rated. Do it for the love of doing it. I enjoy looking at numbers and language data, thus NLP. But remember to get paid enough for doing it =)

Onwards from the career discussion, now the tech part!
I’ve discussed the differences between scientists and engineers in the machine learning field, and now I’ll try to answer a pressing question that almost all scientists would ask:

Q: I have problem X, which tool / method Y to solve it?

This is usually the worst form of StackOverflow question as per the “How to ask a good question” guide, but I think it is something that the community should try to answer whenever we can.

My personal opinion: There is no “bad” question or “need more focus” to these practical questions. But they do inevitably sometimes attract malicious product/tech advertising.

Here’s my 10-step approach to answering X problem, Y approach, as a “scientist”:

1. Literature review
   - The more you read, the more tools you have at hand
   - But limit your time to avoid rabbit holes, maybe try “Paper-Blitzing” =)
2. Know what datasets are available and what’s in them (noise, quirks, etc.)
3. Find which evaluation metric task X is usually evaluated on
4. Track the oldest relevant citation of the task, read that paper
5. Find the highest cited paper for the task, use that as your baseline
   - Whenever possible, hunt down the datasets in that highest cited paper and the latest shiniest paper
6. Define your success criteria for the task industrially (it might not be the standard eval metric for the task)
7. Try to replicate or reimplement the baseline
8. Communicate your model/libraries to engineers. Can your engineer productionize it?
9. Did the baseline meet the success criteria? Ask the business/project stakeholder whether it’s sufficient
10. Build it, test it, break it, repeat!

Q: Wait a minute, does that mean that there is no “one true algorithm/tool Y” that I can learn to solve task X?

Yes, there isn’t. From personal experience, the tool/model that makes it into your customers’ hands usually depends heavily on Steps 6 to 9 of the approach described above.

Q: What’s next in Machine Learning and NLP (that you’re personally excited about)?
At the moment, I’m spending my free time learning about 🤗 Huggingface, and not just about how to use the different components of the library but more so in understanding what features make it a success and what’s the X-factor that made it gain traction in the machine learning community.

And the next thing that I would invest my time into is quantum ML, if I have even more time =)

- https://developer.nvidia.com/cuquantum-sdk
- https://www.nature.com/articles/s41467-022-32550-3
- https://github.com/XanaduAI/pennylane
- https://medium.com/xanaduai/training-quantum-neural-networks-with-pennylane-pytorch-and-tensorflow-c669108118cc

So long and thank you for the fish!

I hope the above Qs and As give you some insights into “what kind of a scientist I am”. And if there are more burning questions you want to ask, feel free to leave a comment under the post.

Finally, I want to give a huge thanks to the HackerNoon community, staff and sponsors for the Noonie Awards nomination, and if you enjoyed this article, help smash the vote button at https://www.noonies.tech/2022/programming/2022-hackernoon-contributor-of-the-year-natural-language-processing

What Kind of Scientist Are You?
