
Python Prevails: 57% Choose Python As Their Go-to Data Science Tool

by Jessica Blaquiere, April 19th, 2023

Too Long; Didn't Read

The HackerNoon community was asked to pick their workhorse data science tool from some of the most popular options, and 374 people responded. Python was chosen as the go-to tool for data science by over 50% of readers. RStudio was selected by only 9% of respondents.


Feature image was generated with Midjourney Diffusion with the prompt “A python prevails, digital fantasy art”.


About Data Science

Data science is where statistics, programming, and communication intersect. A data scientist asks a question and uses data to answer it, through methods of varying complexity. They have the knowledge and toolkit to know which tests and methods suit each type of data, and the ability to extract answers from data and relay those answers in plain, everyday language.


Data can range from simple to wildly complex. It can be “clean” or “messy”. Sometimes we have a question but don’t have the data. A data scientist and/or analyst must transform messy data into clean data using specialized tools. They can also develop ‘scraping’ programs that go out and fetch the data they need to answer their question. Once the data is obtained and in usable form, it is fed into statistical tests and models using tools such as Python, RStudio, etc. But which tools are the best?
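To make the “messy to clean” step concrete, here is a minimal Python sketch that uses only the standard library. The records and field names are hypothetical: it trims stray whitespace, normalizes capitalization, converts ages to integers (treating blanks and “n/a” as missing), and drops rows that turn out to be duplicates once normalized. Real projects usually reach for a library such as pandas, but the steps are similar in spirit.

```python
# Hypothetical messy records: inconsistent casing, stray whitespace,
# missing or non-numeric ages, and a row that duplicates another one.
raw_rows = [
    {"name": "  Alice ", "age": "34", "city": "new york"},
    {"name": "BOB", "age": " 29 ", "city": "Boston"},
    {"name": "alice", "age": "34", "city": "New York "},
    {"name": "Carol", "age": "", "city": "chicago"},
    {"name": "Dave", "age": "n/a", "city": "Denver"},
]


def parse_age(value):
    """Return the age as an int, or None if it is blank or not a whole number."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def clean(rows):
    """Normalize text fields, coerce ages to int, and drop duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (
            row["name"].strip().title(),
            parse_age(row["age"]),
            row["city"].strip().title(),
        )
        if record in seen:  # identical to an earlier row after normalization
            continue
        seen.add(record)
        cleaned.append({"name": record[0], "age": record[1], "city": record[2]})
    return cleaned


for row in clean(raw_rows):
    print(row)
```

The third raw row (“alice”, “New York ”) disappears because, once cleaned, it is identical to the first.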


Readers of HackerNoon




HackerNoon’s weekly poll (April 10 to April 16, 2023) was used to assess where our readers fall on this topic. The HackerNoon community was asked to pick their workhorse data science tool from some of the most popular options, and 374 people responded. The results were as follows:



  • 57% of HackerNoon readers, who largely come from the technology community, chose Python as their go-to data science tool. This isn’t all that surprising: Python is open-source, which makes it accessible to all 🙂🙃🙂🙃🙂🙃🙂🙃🙂🙃🙂🙃
  • 18% selected Excel as their top data science tool.
  • Power BI was selected by only 9% of poll respondents.
  • RStudio, also an open-source tool, took only 9% of the votes.
  • Finally, only 5% chose Tableau as their go-to tool for data science.
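Because the poll reports rounded percentages rather than raw counts, approximate vote totals can be back-calculated from the 374 responses. This short Python sketch does that arithmetic; the outputs are estimates derived from the rounded shares, not the poll’s actual tallies.

```python
TOTAL_RESPONDENTS = 374

# Rounded percentages as reported in the poll results.
reported_share = {
    "Python": 57,
    "Excel": 18,
    "Power BI": 9,
    "RStudio": 9,
    "Tableau": 5,
}


def estimate_votes(total, shares):
    """Convert rounded percentage shares into approximate vote counts."""
    return {tool: round(total * pct / 100) for tool, pct in shares.items()}


estimates = estimate_votes(TOTAL_RESPONDENTS, reported_share)
for tool, votes in estimates.items():
    print(f"{tool:<9} ~{votes} votes")

# The rounded shares do not add up to 100%, so some responses are unaccounted for.
print(f"Sum of reported shares: {sum(reported_share.values())}%")
print("Unaccounted responses:", TOTAL_RESPONDENTS - sum(estimates.values()))
```

By this estimate, Python received roughly 213 votes. The reported shares add up to only 98%, which is why the estimates leave about 7 of the 374 responses unaccounted for.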



Tools

Why are there so many tools to choose from? This field has grown more complex over time, so naturally the range of tools has grown as well. There are so many streams of data science that each individual must decide for themselves which tool is right for them. And realistically, you’ll use multiple tools in tandem.


Let’s see some highlights of each tool from the poll. Of course, there are more tools not discussed here 😆


Excel

❌ Not open-source.

✅ Is user-friendly in the Microsoft way!

❌ Is not advanced enough for complex data science projects.

✅ Generates stylish charts and graphs that can be easily exported.


Except this poll, apparently! Source: Giphy


We’re all familiar with Excel. Sure, it’s great for everyday tasks like data manipulation, cleaning, and visualizations, but it doesn’t cut it for more advanced projects. You can get crafty with creating dashboards and reports, and you can even set up specialized APIs within Excel.


RStudio

✅ Open-source.

❌ Has a significant learning curve.

✅ Can generate tidy and customizable graphs, tables, and outputs.

❌ Can be limited for some of the more advanced machine learning work.

✅ Specialized for statistics-based problems.


RStudio is a versatile open-source program that’s excellent for data analytics and data science. The company behind it, formerly also named RStudio, now goes by the shiny new name Posit.


“Our mission is to create open-source software for data science, scientific research, and technical communication. We do this to enhance the production and consumption of knowledge by everyone, regardless of economic means.” -- Posit


Similar to Python, the R programming language is vastly versatile, allowing data scientists to perform complex tasks using multiple approaches. Libraries and packages are constantly being developed for specialized tasks, and programmers can take advantage of them. And if the package you’re looking for doesn’t exist yet, develop one yourself!


You can use R and Python in tandem with one another. Look into this if you’re working on a collaborative project with R and Python programmers.
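Dedicated bridges exist for this, such as the reticulate package on the R side and rpy2 on the Python side. The lowest-common-denominator approach, though, is simply to exchange plain CSV files. Below is a minimal sketch of the Python half, assuming the R collaborator loads the file with read.csv() and writes results back with write.csv(..., row.names = FALSE); the file name, helper names, and data are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical results produced on the Python side of a shared project.
results = [
    {"tool": "Python", "share": 57},
    {"tool": "Excel", "share": 18},
]


def write_for_r(rows, path):
    """Write rows as a CSV file with a header row, loadable in R with read.csv()."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def read_from_r(path):
    """Read a CSV file, e.g. one saved in R with write.csv(df, path, row.names = FALSE)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


shared_file = Path(tempfile.gettempdir()) / "shared_results.csv"
write_for_r(results, shared_file)
print(read_from_r(shared_file))
```

Note that csv.DictReader returns every value as a string, so numeric columns need converting (for example with int()) after reading.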


Power BI

❌ Not open-source.

✅ Creates beautiful reports.

❌ Can appear to be easy to use, but has hidden complexity.

✅ Great for data wrangling and manipulation.

❌ Limited abilities for complex data science projects.

✅ Can scrape data from various sources.


Power BI really shines as a data visualization and reporting tool rather than a workhorse tool for data science. It can perform specialized data manipulations through custom code, such as regular expressions run in Python or R scripts. But chances are, if you’re working on a complex data science project, you would use Power BI in the final stage of the project, mostly as a presentation tool.
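As an illustration of the kind of regular-expression cleanup mentioned above, here is a minimal Python sketch that pulls phone numbers typed in inconsistent formats into one standard form. The data, pattern, and function name are hypothetical; inside Power BI, logic like this would run within a Python script step rather than as a standalone program.

```python
import re

# Hypothetical free-text phone numbers entered in inconsistent formats.
raw_phones = ["(555) 123-4567", "555.123.4567", "555 123 4567", "call me!"]

# Three digit groups (3-3-4), optionally wrapped in parentheses and separated
# by spaces, dots, or dashes.
PHONE_PATTERN = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")


def normalize_phone(text):
    """Return the number as XXX-XXX-XXXX, or None if no phone number is found."""
    match = PHONE_PATTERN.search(text)
    if match is None:
        return None
    return "-".join(match.groups())


for phone in raw_phones:
    print(f"{phone!r:>18} -> {normalize_phone(phone)}")
```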


Python

✅ Open-source.

❌ Has a significant learning curve.

✅ Can create tidy graphs, tables, and outputs.

✅ Has numerous data science libraries like TensorFlow, scikit-learn, NumPy, pandas, PyTorch, etc.

✅ Is a multi-purpose programming language making your learning efforts reach further.


Python is an object-oriented, multi-purpose programming language. It’s known for being easy to learn and versatile. Because of its versatility, it has a massive community of programmers, so educational resources are practically endless. There is also a plethora of data science libraries that are ready to use.


To work with Python, you’ll want to learn how to set up a virtual environment and you’ll likely want to choose a computing platform such as Jupyter Notebook to perform your work in.
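Virtual environments are usually created from the command line with "python -m venv .venv" and then activated ("source .venv/bin/activate" on macOS/Linux, ".venv\Scripts\activate" on Windows) before installing packages such as Jupyter with pip. The same thing can be done from Python itself through the standard-library venv module, as in this minimal sketch; the helper name and directory are assumptions for illustration.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create an isolated virtual environment, similar to "python -m venv <env_dir>"."""
    builder = venv.EnvBuilder(with_pip=with_pip, clear=True)
    builder.create(env_dir)
    return Path(env_dir)


# Demo in a throwaway temp directory; a real project would typically use ./.venv.
# with_pip=False keeps the demo fast and offline; keep the default (True) for real work.
demo_env = create_project_env(Path(tempfile.mkdtemp()) / ".venv", with_pip=False)

# Every virtual environment contains a pyvenv.cfg file describing its base interpreter.
print("pyvenv.cfg present:", (demo_env / "pyvenv.cfg").exists())
```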


Tableau

❌ Not open-source.

✅ Creates beautiful dashboards.

❌ Limited data pre-processing abilities such as cleaning and wrangling.

✅ Great for data analytics.

❌ Limited abilities for complex data science projects.

✅ Reports and dashboards are easily shareable with others.


Tableau is excellent data analytics and visualization software that is often used by larger teams, given its cost. It can create beautiful, intuitive, presentation-style dashboards that highlight various aspects of your data. It is certainly not a workhorse tool, however, as it specializes in the reporting stage rather than the beginning and middle stages of a data project.



Final Thoughts

Our poll showed that Python rose to the top of the given choices for data science tools. Given its versatility, both in and out of the data science field, this was no surprise. Python is touted as an easy-to-learn programming language. Let’s be honest: if you’re a complete beginner to coding, it won’t be “easy” at first, but with practice, it will eventually become second nature.



Please share your thoughts in the comments, and keep an eye out for other HackerNoon Polls to participate in.