For those looking to build a career in Data Science from scratch, here is a guide for you! This article explains the steps you can take to advance your Data Science career and includes a scattering of links to useful resources.

1. Decide who you want to become 💭

The field of data science is developing vigorously. But data science is not only neural networks, but also classical statistics and machine learning algorithms (which are more understandable for business processes), and overall everything related to the analysis, processing, and presentation of information in digital form.

It cannot yet be said that there is a clear division of labor in Data Science: this is a non-specialized profession. A rough analogy: just as there were pure Computer Scientists (computer scientists and programmers) who understood everything related to computers, so now there are Data Scientists who are engaged in everything related to data. The marker of the first movement towards specialization of labor is the sphere of online education.

One way or another, a data scientist works at the intersection of several areas:

▶️ Mathematics (including linear algebra and machine learning algorithms)
▶️ Programming (e.g. Python and R; SQL is usually a minimum requirement)
▶️ Business problems (yes, apart from Computer Science, you should understand what business processes are and how you can improve them)

Depending on your role in the team, you will have to do some of these things more than others. When choosing a direction for development, start from your own interests: learning will require significant resources, and without love for your work you will quickly burn out. A mathematical base is necessary, but it is likely that your personal circle of tasks will come down to using existing tools and knowledge rather than inventing something new. As K. V. Vorontsov said in one interview:

“People who know how to use ready-made algorithms are needed 50–100–500 times more.”
“It seems that the problem of how to teach Computer Science and the problem of ‘more math or more engineering’ have the following answer: you need both, but mathematics should be taught to a carefully selected group of people who see themselves as creators, designers of new methods.”

2. Pull up the Math base ➕

If you want to truly understand machine learning algorithms, you first need to understand linear algebra, multivariable calculus, probability theory, and mathematical statistics:

Linear Algebra for Data Science in R (4 hours of lessons)
Introduction to Calculus (48 hours)
Foundations of Probability in Python (5 hours)
Basics of Statistics, part 2, part 3 (total 43 hours)

If you are missing illustrations and visualization, I highly recommend taking a look at the wonderful channel 3Blue1Brown. Here are some YouTube playlists for linear algebra, analysis, and differential equations.

By the way, there is a detailed course of 175 YouTube videos on multivariate mathematical analysis on the Khan Academy YouTube channel. When watching video lectures, do not forget about the possibility of fast-forwarding. To engage motor memory and work through the material more deeply, take notes.

3. Learn to program 👨‍💻

Besides mathematics, you need to be able to program. Usually, Python or R is chosen as the main language for data analysts. There are many good courses in both languages, including some with an emphasis on data analysis:

Datacamp - Python Programming Track
Datacamp - R Programming Track
Stepik - Analyzing Data in R, part 2

Newcomers to Data Science often ask which language to choose as their main one: R, created specifically for data processing, or universal Python. Although this is a hot topic, I personally started with R (people in computational biology like it more). However, I now know both languages and highly recommend starting with Python, since the transition from Python to R is smoother than the reverse.
In short: if you are planning a career in Data Science, I recommend you master both languages. Knowing R concepts and libraries will keep you one step ahead of Python-only users, and vice versa. Here’s how data analyst Irina Goloshchapova writes about it:

"By combining the most powerful and stable R and Python libraries, in some cases you can improve the efficiency of calculations or avoid reinventing the wheel when implementing statistical models. Secondly, this is an increase in the speed and convenience of project execution, if different people in your team (or you yourself) have good knowledge of different languages. A reasonable combination of existing R and Python programming skills can help."

But if you want to take an easier (though still not simple) path, then Python alone is enough: you will find more courses and answers to all sorts of questions on it on Stack Overflow.

4. Learn to use the tools 🛠️

One of the most popular tools for sharing data analysis results is Jupyter notebooks: Jupyter Notebook and the JupyterLab platform allow you to combine code, text in Markdown, formulas in LaTeX, testing, and profiling in a single document. Alternatively, you can collaborate on notebooks using Google Colab or JupyterHub.

Learn to use Git as soon as possible. In the process, you will have to choose between a variety of models and architectural solutions, and version control is very useful here.

Plus, there are many great Data Science projects on GitHub. Remember that open source is one of the easiest ways to gain the necessary teamwork experience and contribute to a common cause.

You will naturally come across other popular tools as you progress through the courses.
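
To make the idea of such tools concrete, here is a minimal sketch, not a recommended workflow, of two libraries that Python courses tend to introduce first: NumPy for fast array math and pandas for tabular data. It assumes both packages are installed; the data is synthetic, and the names `orders`, `region`, `amount`, and `discounted` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): 1,000 made-up orders.
# A fixed seed keeps the run reproducible.
rng = np.random.default_rng(seed=42)
amounts = rng.uniform(5.0, 100.0, size=1_000)                 # NumPy array of floats
regions = rng.choice(["north", "south", "east"], size=1_000)  # NumPy array of labels

# Vectorized NumPy arithmetic: discount every order at once, without a Python loop.
discounted = amounts * 0.9

# A pandas DataFrame is a labeled table built on top of such arrays.
orders = pd.DataFrame(
    {"region": regions, "amount": amounts, "discounted": discounted}
)

# A typical tabular operation: aggregate a column per group.
summary = orders.groupby("region")["discounted"].agg(["count", "mean"])
print(summary)
```

The same pattern (arrays for computation, data frames for labeled tables and grouping) carries over to most introductory data analysis exercises.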
In Python, for example, high-speed processing of data arrays requires knowledge of NumPy, tabular data is usually represented with Pandas data frames, visualization is done with Matplotlib or Plotly, and ready-made classes of popular machine learning models are imported from Scikit-learn.

Few courses focus on this, but in practice, data is usually stored in databases, SQL or NoSQL. For further work, you will need to learn how to communicate with them:

Datacamp - Introduction to Databases in Python
Datacamp - Introduction to Relational Databases in SQL
Stepik - Hadoop. A system for processing large amounts of data

For deep learning, you need to get familiar with frameworks: TensorFlow or PyTorch. There are others; we compared them in the article “Write your first Generative Adversarial Network Model on PyTorch”.

5. Take as many Data Science courses as you can 🎓

Courses:

Andrew Ng’s Machine Learning Course on Coursera is one of the most popular MOOCs out there. It is worth taking if only because other advanced courses often refer to it. However, it uses Octave / Matlab instead of the standard Python and R.
Leskovec et al. Mining of Massive Datasets. There is a breakdown by chapters: pdf, exercises, presentations, videos.
Courses on DataCamp
Harvard Data Science Course (edX)
Probabilistic Programming and Bayesian Methods for Hackers
Dive into Deep Learning: Free Interactive Book with Code, Math and Discussion http://d2l.ai

Textbooks:

Hastie et al. The Elements of Statistical Learning
Hal Daumé III. Academic Machine Learning Course
Shalev-Shwartz and Ben-David. Understanding Machine Learning: From Theory to Algorithms
David Barber. Bayesian Reasoning and Machine Learning
Tom Mitchell. Machine Learning
Devroye et al. A Probabilistic Theory of Pattern Recognition
Neatly designed, easy-to-copy editions of R in Action: Data Analysis and Graphics with R and Machine Learning in Action
Cheat Sheet on Key Concepts and Machine Learning Algorithms

6.
Join the Open Data Science community 👥

A lot of interesting things can be learned from the English-language news aggregators from the world of data science:

r/datascience
Towards Data Science
KDnuggets
DataTau
Data Science Weekly

7. Take part in competitions 🤼

Register on Kaggle. Not only is it the most famous machine learning competition platform with cash prizes, but it is also a large community with a registry of datasets, Jupyter notebooks, mini-courses, and discussions. A Kaggle ranking on your resume can give you extra credit in an interview.

8. Explore specific Data Science questions 👁️‍🗨️

Data science is an incredibly broad interdisciplinary field, and special skills are required to solve specific problems. After familiarizing yourself with Kaggle, it will become clearer where you have gaps in in-demand knowledge. Also, pay attention to the following courses:

Introduction to Deep Learning in Python
Deep Learning for NLP in Python
Introduction to Natural Language Processing
Probabilistic Graphical Models Specialization
Data Structures course
Computer graphics: the basics (useful for working with models that process images)

YouTube channels also come in handy. On the YouTube channel of the Computer Science Center, the courses in special sections are conveniently organized into playlists:

Machine Learning (second part)
Image and Video Analysis (second part)
Introduction to Natural Language Processing
Data analysis in Python in examples and tasks (continued)
Data analysis in R
Technologies for storing and processing large amounts of data
Mathematical statistics

Don’t stop learning.
Browse the top and the sidebar of subreddits on topics related to machine learning:

r/analyzit, r/bigdata, r/computervision, r/datacleaning, r/datagangsta, r/dataisbeautiful, r/dataisugly, r/datascience, r/datasets, r/dataviz, r/JupyterNotebooks, r/LanguageTechnology, r/learnmachinelearning, r/MachineLearning, r/opendata, r/rstats, r/probabilitytheory, r/pystats, r/SampleSize, r/semanticweb, r/statistics, r/textdatamining

9. At the end of each course, do a project 🏗️

Use your new knowledge in the field of Data Science to benefit yourself and others. Create something that will make others say “wow”! Lots of project ideas are listed in:

Awesome-ai-usecases
51 toy data proble
Practical-pandas-projects

You can also start not from a project, but from an interesting dataset. List of popular registries:

Open data registry at data.gov.ru
Google public datasets
Kaggle datasets (40 thousand)
Reddit r/datasets subreddit
UCI Machine Learning Repository
Aggregate list of open-source datasets awesome-public-datasets
List of large public datasets
List of quality Webhose.io datasets
Datasets of the IEEE Society
Wolfram Data Drop Accumulator
Statistics database on finance, sports, geography, industry

Lots of discussions with project ideas can be found on Quora:

I am studying Machine Learning and Statistics and am looking for something socially significant using public datasets and APIs
What Data Science Problems Can Be Solved Over the Weekend by One Programmer?
What tools/technologies/algorithms are best used to build an engine? How to check the effectiveness of the recommendations?
How do I start building a recommendation system?

Create a public repository on GitHub for each project. Polish the results and share them on your blog and in the community. Contribute to side projects, and post your ideas and thoughts. All this will help you build a portfolio and get to know people working on related tasks.

10.
Read scientific articles 🔬

The main languages of data science are not Python or R, but English and the language of mathematics. Preprints of articles are published on the arXiv website. The most useful sections for data scientists:

stat.ML
stat
cs.LG

It is simply impossible to keep track of all publications. The subreddits listed above and Arxiv Sanity Preserver will help to isolate the most important texts (since the author became the head of the AI department at Tesla, the site began to break more often, but it’s still the best tool). There is also a list of articles with comments, as well as recordings of webinars from the Kaggle YouTube channel that break down scientific articles related to data science algorithms.

11. Take a Data Science Internship / Job 🕴

Data Science is a highly competitive, in-demand profession. But even the results of interviews are turned into data by community members. There are many lists of questions to prepare for a data scientist interview:

Data Science Interview Questions
How Do I Prepare for a Data Science Interview
How to Prepare for Statistics Questions
What Types of A/B Testing Questions to Expect in an Interview

This year it is more difficult, but we hope that summer schools and internships will return soon:

Which companies offer Data Science internships for students
What tips to follow if I want to apply for an internship in Data Science or Software Engineering
When is the Best Time to Apply for Summer Data Science Internships

Be sure to use your data mining skills to analyze the job market: find out which skills appear most often in job postings and hone them as much as possible. Estimate how much income you can expect, taking into account local living expenses, rental housing, and moving to another city.

12. Share your experience with the community 📢

Share your project with the Data Science community, or find one together with it. Prepare a talk and speak at a local meetup. Start a blog where you will share your finds, your own ideas, and repositories.
Last but not least, enjoy how your skills help make the world a better place!

13. Read More

If you found this article helpful, share it on Facebook so your friends can benefit from it too.

Also published at https://dev.to/mikhailraevskiy/data-scientist-12-steps-from-beginner-to-pro-3fh6