12 Mistakes that Data Scientists Make and How to Avoid Them

Data analytics can transform how businesses operate. With companies collecting huge amounts of data today, analytics can help them deliver valuable products and services to customers.

Becoming a data scientist isn’t an easy task. To be truly successful you need a mix of problem solving, structured thinking, coding and various other technical skills. If you come from a non-technical and non-mathematical background, there’s a good chance much of your learning happens through books and video courses, and most of these resources don’t teach you what the industry is looking for in a data scientist.

In this article I discuss some of the top mistakes amateur data scientists make (I have made some of them myself), and we will also look at the steps you can take to avoid these pitfalls in your journey.

1. Spending Too Much Time on Theory without Practical Application

Many beginners fall into the trap of spending too much time on theory, whether it is math-related (linear algebra, statistics, etc.) or machine-learning-related (algorithms, derivations, etc.). It is good to grasp the theory behind machine learning techniques, but if you never apply it, it remains a set of theoretical concepts. Beginners study lots of books and go through a number of online courses, yet rarely get the chance to apply that theory to practical problems. In the beginning, whenever I faced a problem and had the chance to apply what I had learnt, I couldn’t remember half of it. This was because I gave so much time to theory and so little to solving practical problems.

How to avoid this mistake?

It is not a new idea that, to really understand what you are learning, you need a proper balance between theory and practice. Google is a good place to start looking for datasets to practice on.
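To make “apply the theory” concrete: the closed-form least-squares line from a statistics course takes only a few lines of code. The sketch below fits weight against height using a handful of made-up values (hypothetical, not taken from any real dataset); with a real dataset you would load the two columns from a file instead.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum of co-deviations / sum of squared x-deviations
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    slope = num / den
    # The fitted line always passes through the point of means
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data: heights in cm, weights in kg (illustrative only)
heights = [150, 160, 170, 180, 190]
weights = [50, 56, 64, 72, 78]

slope, intercept = fit_line(heights, weights)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=0.72, intercept=-58.40
```

On Python 3.10 or later, `statistics.linear_regression(heights, weights)` should return the same slope and intercept, which is a handy way to check a from-scratch version against a library one.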
Below are some data problems you can practice on (ordered from easiest to hardest):

Wine Quality Dataset
Heights and Weights Dataset
Human Activity Dataset
Song Year Prediction Dataset
Movie Lens Dataset
VoxCeleb Dataset
Chicago Crime Dataset

2. Coding Too Many Algorithms without Learning the Prerequisites

Diving straight into the deep areas of data science is a common mistake among aspiring data scientists. It leaves gaps in the basics, and those gaps eventually show up when you try to solve practical problems. At the start, you really don’t need to code every algorithm from scratch; if you do, do it with the intention of learning rather than perfecting your implementation.

How to avoid this mistake?

Get clear on four basic subjects before diving deep: linear algebra, statistics, probability and calculus. Data science is the sum of all these individual parts, and until you have a clear picture of these four, don’t even think of diving deep into the core of data science. You can find tons of courses online; below are some of the resources I recommend:

Statistics & R
Probability & Statistics using Python
Calculus Course
Introduction to Data
Statistical Thinking for Data Science

3. Jumping into the Deep End

As they say, “Rome wasn’t built in a day.” The same goes for data science. I understand that you want to build the technology of the future: self-driving cars, robots and what not! Things like these require techniques such as deep learning and natural language processing, but before getting into those advanced topics you must first master the fundamentals of machine learning.

How to avoid this mistake?

First, master the techniques and algorithms of “classical” machine learning, which serve as building blocks for advanced topics.
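As an example of a classical building block, and of coding an algorithm from scratch purely to learn how it works, here is a minimal k-nearest-neighbours classifier in plain Python. The points and labels are hypothetical; once the idea is clear, you would normally switch to a tested library implementation such as scikit-learn’s `KNeighborsClassifier`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and let their labels vote
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D data with two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.8, 1.2)))  # A
print(knn_predict(points, labels, (6.2, 6.8)))  # B
```

Writing it yourself makes the key design choices visible (the distance metric, the value of k, how votes are counted), which is exactly what you need to understand before relying on a library.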
A common trap is to solve 2 or 3 problems and then assume you have mastered the concepts. That isn’t true: the more you practice, the better you become. The link below lists 20 good machine learning problems you can try your hand at, and a quick Google search will turn up many more like them.

20 Machine Learning Problems

4. Focusing on Accuracy over Understanding How the Model Works

How a predictive model arrives at its predictions is a very commonly overlooked part of the data science workflow. Accuracy isn’t everything. A model that predicts with 95% accuracy looks good, but if you can’t explain to others how the model got there, which features drove it, and what your thinking was when building it, your client will likely reject it.

How to avoid this mistake?

The best way to avoid this mistake is to talk to people working in the industry; there is no better teacher than experience. Practice building simpler models and explaining them to non-technical people. Then slowly add complexity to your model, and keep going until even you no longer understand what’s going on inside it. This will teach you when to stop, and why simple models are often preferred in real-world applications.

Additional reading: Click me

5. Giving Preference to Tools over the Business Problem

This is an arguable point, so let’s take an example. Imagine you’ve been given a dataset on house prices and you need to predict the value of future real estate. There are over 200 variables, including number of buildings, rooms, number of tenants, family size, size of the courtyard, whether faucets are available, etc. There’s a good chance you won’t know what some of these variables mean.
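One cheap habit helps here: before dropping a variable you don’t understand, check how strongly it relates to the target. The sketch below computes the Pearson correlation between each candidate feature and the price, and flags the strongly correlated ones for a conversation with a domain expert rather than silent removal. The feature names, values and the 0.5 threshold are all hypothetical, and correlation only captures linear relationships, so treat this as a first screen, not a verdict.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Note: a constant column has zero variance and would divide by zero;
    this sketch does not guard against that.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
    return cov / (sx * sy)


def flag_before_dropping(features, target, threshold=0.5):
    """Return names of features whose |correlation| with the target exceeds threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, target)) > threshold
    ]


# Hypothetical house data (prices in thousands); column names are made up
price = [200, 250, 300, 350, 400]
features = {
    "rooms": [3, 4, 4, 5, 6],
    "courtyard_size": [30, 10, 50, 20, 40],
    "faucets_available": [0, 0, 1, 1, 1],
}

print(flag_before_dropping(features, price))  # ['rooms', 'faucets_available']
```

In this toy data the innocent-looking `faucets_available` column gets flagged, which is the point: a variable that seems irrelevant may carry real signal, and only domain knowledge can tell you why.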
You could still build a model with good accuracy without ever learning why a certain variable was dropped, and it may turn out that this variable was crucial in the real-world scenario. That is a calamitous mistake. Knowing tools and libraries is a very good thing, but combining that knowledge with the business problem posed by the domain is where a true data scientist steps in.

How to avoid this mistake?

When you are applying for a data scientist role in a particular industry, read up on how companies in that domain are using data science. Search for datasets and problems relating to that industry and try solving them; this will give you a massive boost. Read this excellent write-up on Forbes here.

6. Overestimating the Value of Academic Degrees

Ever since data science became ultra popular, certifications and degrees have cropped up just about everywhere. A strong degree in a related field can definitely boost your chances, but it’s neither sufficient nor usually the most important factor. I am not saying that earning a degree or certification is an easy feat, but you should not rely on them alone. In most cases, what’s taught in an academic setting is simply too different from the machine learning applied in businesses. In a real work environment you have to cope with deadlines, technical roadblocks, clients and more; these are just some of the things you will need to overcome to become a good data scientist, and a certification or degree alone will not prepare you for them.

How to avoid this mistake?

Certifications are valuable, but only when you apply that knowledge outside the classroom and put it out in the open. Take relevant internships, even part-time ones. Reach out to local data scientists on LinkedIn for coffee chats. Always be open to learning. Go out into the real world and learn how the industry works.
For additional learning, try out a few fun machine learning projects from the link below:

8 fun machine learning projects for beginners

7. Thinking That If You Don’t Code Well, You Can’t Be a Data Scientist

Everyone has their own skills and special talents. When I started learning Python I struggled a lot, because I had never written code in my entire life. Back then I didn’t know about the various tools available on the market, so I spent a lot of time improving my coding skills.

How to avoid this mistake?

Gone are the days when coding was a compulsory part of data analysis; you no longer have to spend hours learning to code before you can analyze data. Learning to code will certainly be a star on your skill set, but if you are struggling with it, there are now a number of exploratory analysis tools on the market. The tools below don’t require you to code explicitly; simple drag-and-drop clicks do the job (many of them offer free or community editions):

Trifacta
Rapid Miner
Qlikview
Knime
Tableau
Open Refine
Talend
H2O
Excel/spreadsheet

8. Using Too Many Technical Terms in Your Resume

The biggest mistake many applicants make when writing their resume is stuffing it with technical terms. If your resume currently has this problem, fix it immediately! Your resume should tell the story of what you can bring to the organisation. A recruiter wants to understand your background and what you have accomplished, in a neat and summarized manner. If half the page is filled with vague, heavy data science terms without any explanation, your resume might not clear the screening round.

How to avoid this mistake?

Do not simply list the programming languages or libraries you’ve used. Describe where you used them to solve practical problems.
Only list the techniques you have used to accomplish something, and remove other distractions. The simplest way to eliminate resume clutter is to use bullet points. Your resume needs to reflect the impact you can have on the business. Creating a master resume template is worthwhile: you can spin it off into different versions for the different roles you apply for, changing the story each time to reflect your interest in that particular industry. Read the article below on Forbes for further help:

How to write a Data Science Resume

9. Learning Multiple Tools at Once

Because each tool offers different features and uses, people tend to try learning all of them at once. This is a very bad idea: you will end up mastering none of them. Chasing multiple tools creates a lot of confusion and can severely hurt your problem-solving skills at the beginning stage.

How to avoid this mistake?

If you are learning a tool, stick to it and master every aspect of it. If you are learning R, don’t be tempted by Python in the middle. Stick with R, learn it end-to-end, and only then incorporate another tool into your skill set. You will learn more with this approach. Read the following articles for a better understanding:

Python & R vs SPSS & SAS
SAS vs R vs Python

10. Not Having Structured Thinking while Solving Problems

Structured thinking is the process of putting a framework around an unstructured problem. Having a structure not only helps an analyst understand the problem at a macro level, it also helps identify areas that require deeper understanding. Without structure, an analyst is like a tourist without a map. He might know where he wants to go, but he doesn’t know how to get there, nor which tools and vehicles he would need to reach his destination.
When you go for a data science interview, you will almost certainly be given a case study, a prediction problem or something similar. Given the pressure of the interview room and the time constraint, the interviewer looks at how well you structure your thoughts to arrive at a final result. This can be a deal breaker or a deal sealer for getting the job.

How to avoid this mistake?

You can acquire a structured thinking mindset through simple training and a disciplined approach. To improve your structured thinking, follow these four steps:

Start small, aim big
Tackle the techniques from the top down
Use one-pagers early to boost productivity
Avoid getting sloppy with your logic

You can go through the following courses for further knowledge:

Data Visualization and Communication
Managing Data Analysis

11. Not Working Consistently

Committing to continuous learning and growth, at both a personal and a professional level, means placing strong importance on learning new skills and sharing your knowledge. That habit helps in all areas of life, from building more meaningful relationships to better organisation and time management.

Remember that practicing for 2 hours daily is much better than practicing for 14 hours on weekends. All too often we study for a while, then take a break for the next 2 months. Getting back into the groove after that is a nightmare: most of the earlier concepts are forgotten, notes are lost, and it feels like we just wasted the last few months.

How to avoid this mistake?

Plan how and what you want to study, and set deadlines for yourself. For example, if you want to learn a particular concept, make a plan, give yourself a set number of days or weeks to learn the topic, and then practice it by competing in hackathons, workshops and the like. You have decided to become a data scientist, so you should be ready to put in the hours.
If you keep finding excuses not to study, this might not be the field for you. Practicing continuously will give you clarity in decision making, a sense of control over your future, and personal satisfaction.

12. Not Giving Importance to Communication Skills & Running Away from Discussions

Some data scientists call this “storytelling”. The important thing here is to communicate insights in a clear, concise and valid way, so that others in the company can act on them effectively. Companies searching for a strong data scientist want someone who can clearly and fluently translate technical findings for non-technical teams, such as the Marketing or Sales departments. I have yet to come across a course that places a solid emphasis on this. You can learn all the latest techniques, master multiple tools and make the best graphs, but if you cannot explain your analysis to your client, you will fail as a data scientist.

Data science is a field where discussions, ideas and brainstorming are of utmost importance. You cannot work in a silo; you need to collaborate and understand other data scientists’ perspectives. Working as a data scientist almost always means working in a team, alongside engineers, designers, product managers, operations and more.

How to avoid this mistake?

Many data scientists come from a computer science background, so I understand that this can be a tough skill to acquire. But you need to polish your communication skills if you want to succeed as a data scientist. If you’re only marginally familiar with PowerPoint, take a little time to learn Microsoft’s presentation software. By learning to communicate your concepts visually, you take an important first step toward presenting them to an audience. Putting together a slideshow will also help you organize your thoughts and better understand the work you’re doing.
Another useful exercise is to try explaining data science terms to a non-technical person. It will help you analyze the problem better and find out how well you have worked on it. If you work in a small or medium-sized company, find someone in the marketing or sales department and do this exercise with them. It will help you immensely in the long term.

Remember: practice is the key.

Conclusion

These are some of the important lessons I have learned in my data science career. I didn’t know most of them at the beginning, and I hope they help you in yours. Data science isn’t easy to learn, but if you learn it with a proper plan, you will come out a champion.

If you need any guidance or assistance regarding data science, connect with me on LinkedIn: https://www.linkedin.com/in/gauravneuer/