The question of from-scratch implementation vs. Python library comes up once in a while, no matter the goal of your project. When it comes up again, I've got you covered.
I'm working on a data science project to develop a forecasting model in a public health context. It's a proof of concept, so it's not clear whether my prototype will be implemented in practice, but it is part of my PhD thesis. This brought me to ask the from-scratch vs. Python library question quite a few times.
So I searched the 'depths' of the internet to combine the collective knowledge of the community.
I found one discussion on Hacker News and one on StackOverflow, but both provide a developer perspective. Based on Python Developer Survey from 2020, Python is increasingly being used for web development, but the proportion of users using it for data analysis is still considerable.
Additionally, project management offers some valuable tools for decision-making and frameworks, so I will first go through what a project management triangle is. If you already know about it, feel free to skip the next paragraph.
The project management triangle is a conceptual model of 3 project management constraints - scope, cost, and time. Together, they lead to a certain level of quality. For example, if a project manager suddenly finds out they need to deliver the project earlier, they need to adjust the scope or the cost. Otherwise, this deadline change will negatively impact the quality. You can read more on Wikipedia.
How is this relevant to deciding on whether to implement something from scratch?
Well, if you have a specific goal and you have a deadline, then what you have is a project, and so you can map the concept of the project triangle to your decision-making. The time simply represents the time you or someone else allocated to solving the problem or a challenge. The scope describes what you are trying to do, e.g., find out whether a password was leaked or apply a machine-learning algorithm to classify cats and dogs. The cost or a resource is you and your capacity and experience.
This is a nice start, but it doesn't fit our purpose of deciding whether to use a Python library or implement a solution from scratch completely.
So, I tried to combine the developer perspective with the data science point of view and project management principles.
Disclaimer from Ms. Obvious: The assumption is that you found a Python library with the potential to solve your problem, and the library is active (i.e. the library is regularly updated). If you can't find a good library, you don't have the option of using it.
In terms of the project triangle, the time corresponds to time. Then, the following three aspects would fall under the scope and the last three under costs or resources.
In my example at the beginning, I mentioned I was working on a proof of concept data science project. This is part of my PhD, so I'm supposed to learn and push myself outside my comfort zone. My time is limited but still somewhat flexible. I found a few libraries, but they were large, and I only needed one function. Also, I had no experience with these libraries, so I wasn't sure how long it would take to get it all working with my analytic pipeline.
Additionally, I didn't want to introduce more libraries. I don't know whether my code will eventually end up in production. Still, I tried to limit dependencies just in case. The task had medium complexity, and I knew I had the required level of experience. I decided to attempt the implementation from scratch.
As you can see from this example, the 7 aspects are not equally important. In my case, the learning objective combined with my level of experience and not having a hard deadline were crucial in my decision-making. The rest was secondary. Different aspects can be important in other contexts, but you can always go back and map everything to the 7 factors and the project triangle.
Consider using a library when
If you have enough time and want to learn about a topic, you should lean towards implementing your solution from scratch. Python offers thousands of libraries, and they are generally easy to use, so it can be tempting to just go for the library without considering the other option. Rember, when there is no deadline, it's not a project, so you don't have to optimize for time, instead focus on achieving the goal of understanding.
On the other hand, if you need to use a complex neural network, it's probably better to use an existing Python library.
Consider implementing from scratch when
If you've only started learning Python and you've never coded or programmed before, then attempting to implement an algorithm can be too challenging. So much so that it can kill your enthusiasm. However, if you are a seasoned developer and write code for production, you probably want to avoid being dependent on 3rd party implementations. You also know all this already.
Additionally, if the library contains many functionalities that you don't need, consider taking out only the bits you need from the library source code.
You might be interested in reading these too:
Implementation of Gaussian Naive Bayes in Python from Scratch