Feature image was generated with Midjourney Diffusion with the prompt “A python prevails, digital fantasy art”.
Data science is where statistics, programming, and communication intersect. A data scientist asks a question and uses data to answer it, using methods of varying complexity. They have the knowledge and toolkit to choose the right tests and methods for each type of data. They can also extract answers from data and relay those answers in plain, everyday language.
Data can range from simple to wildly complex. It can be “clean” and it can be “messy”. Sometimes we have a question, but we don’t have the data. A data scientist and/or analyst must transform messy data into clean data by using specialized tools. They can also develop ‘
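To make “turning messy data into clean data” a little more concrete, here is a minimal sketch in plain Python using only the standard library. The CSV content, the `clean_rows` helper, and the cleaning rules (trimming whitespace, title-casing text, dropping rows with a missing city or a non-numeric score, removing duplicates) are all hypothetical choices for illustration. Real projects usually reach for a dedicated library such as Pandas.

```python
import csv
import io

# Hypothetical messy input: stray whitespace, inconsistent casing,
# a duplicate row, a missing value, and a non-numeric score.
MESSY_CSV = """name, city ,score
 Alice ,new york, 91
BOB,Boston,78
alice,New York,91
Carol,,85
Dave,Chicago,n/a
"""


def clean_rows(raw: str) -> list[dict]:
    """Return normalized, de-duplicated records from a messy CSV string."""
    reader = csv.DictReader(io.StringIO(raw))
    # Normalize the header names so " city " becomes "city".
    reader.fieldnames = [f.strip().lower() for f in reader.fieldnames]

    seen = set()
    cleaned = []
    for row in reader:
        name = row["name"].strip().title()
        city = row["city"].strip().title()
        score_text = row["score"].strip()

        # Simplifying assumption: drop rows with no city, and keep only
        # whole-number scores (anything else, e.g. "n/a", is invalid).
        if not city or not score_text.isdigit():
            continue

        record = (name, city, int(score_text))
        # After normalization, "alice,New York,91" duplicates " Alice ,new york, 91".
        if record in seen:
            continue
        seen.add(record)
        cleaned.append({"name": name, "city": city, "score": int(score_text)})
    return cleaned
```

Calling `clean_rows(MESSY_CSV)` leaves two records, Alice from New York and Bob from Boston. The duplicate Alice row, Carol’s row with no city, and Dave’s non-numeric score are all dropped.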
HackerNoon’s weekly polls (10/4/2023 to 16/4/2023) were used to assess where our readers fall on this topic. The HackerNoon community was asked to pick their workhorse data science tool from some of the most popular options, and 374 people responded. The results can be seen in the image below:
Why are there so many tools to choose from?
Let’s see some highlights of each tool from the poll. Of course, there are more tools not discussed here 😆
❌ Open-source.
✅ Is user-friendly in the Microsoft way!
❌ Is not advanced enough for complex data science projects.
✅ Generates stylish charts and graphs that can be easily exported.
We’re all familiar with Excel. Sure, it’s great for everyday tasks like data manipulation, cleaning, and visualizations, but it doesn’t cut it for more advanced projects. You can get crafty with creating dashboards and reports, and you can even set up specialized APIs within Excel.
✅ Open-source.
❌ Has a significant learning curve.
✅ Can generate tidy and customizable graphs, tables, and outputs.
❌ Can be limited in some of the more advanced machine learning tools.
✅ Specialized for statistics-based problems.
A versatile open-source program that’s excellent for data analytics and data science is
“Our mission is to create open-source software for data science, scientific research, and technical communication. We do this to enhance the production and consumption of knowledge by everyone, regardless of economic means.” -- Posit
Similar to Python, the R programming language is remarkably versatile, allowing data scientists to perform complex tasks using multiple approaches. Libraries and packages are constantly being developed for specialized tasks, and programmers can take advantage of them. And if the package you’re looking for doesn’t exist yet, develop one yourself!
You can also use R and Python in tandem. Look into this if you’re working on a collaborative project with both R and Python programmers.
❌ Open-source.
✅ Creates beautiful reports.
❌ Can appear to be easy to use, but has hidden complexity.
✅ Great for data wrangling and manipulation.
❌ Limited abilities for complex data science projects.
✅ Can scrape data from various sources.
Power BI really shines as a
✅ Open-source.
❌ Has a significant learning curve.
✅ Can create tidy graphs, tables, and outputs.
✅ Has numerous data science libraries like TensorFlow, Scikit-learn, NumPy, Pandas, PyTorch, etc.
✅ Is a multi-purpose programming language making your learning efforts reach further.
To work with Python, you’ll want to learn how to set up a virtual environment, and you’ll likely want to choose a working environment such as Jupyter Notebook to do your work in.
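Virtual environments are most often created from the command line with `python -m venv <directory>`. The same standard-library `venv` module can also be called from Python, which makes for a compact illustration of what it does. This is a minimal sketch: the helper names `create_env` and `env_python` are hypothetical, and the `bin` vs. `Scripts` split reflects the standard venv layout on POSIX systems vs. Windows.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create an isolated Python environment in env_dir.

    with_pip=True bootstraps pip inside the environment, so packages such as
    NumPy or Pandas can be installed there instead of system-wide.
    """
    path = Path(env_dir)
    # clear=True wipes any existing environment at this path first.
    venv.create(path, with_pip=with_pip, clear=True)
    return path


def env_python(env_dir: Path) -> Path:
    """Return the path to the environment's own interpreter."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"
```

For example, `create_env(".venv-ds")` would build an environment in a hypothetical `.venv-ds` folder. Once it is activated (or its interpreter from `env_python` is called directly), anything you install, Jupyter included, stays inside that folder and doesn’t clash with other projects.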
❌ Open-source.
✅ Creates beautiful dashboards.
❌ Limited data pre-processing abilities such as cleaning and wrangling.
✅ Great for data analytics.
❌ Limited abilities for complex data science projects.
✅ Reports and dashboards are easily shareable with others.
Our poll showed that Python rose to the top of the given choices for data science tools. Given its versatility, both in and out of the data science field, this was no surprise. Python is touted as an easy-to-learn programming language. Let’s be honest: if you’re a complete beginner to coding, it won’t feel “easy” at first, but with practice, it will eventually become second nature.
Please share your thoughts in the comments and keep an eye out for other HackerNoon Polls to participate in.