Python is a common language used by both data engineers and data scientists. That is because it can automate the operational work data engineers need to do, and it has the algorithms, analytics, and data visualization libraries that data scientists require.
In both roles, managing, automating, and analyzing data often takes only a few lines of code. So much so that one of the books we have read, and have seen in many data-focused practitioners' libraries, is Automate The Boring Stuff With Python.
The book covers Python basics and some simple automation tips. It is especially good for business analysts who work heavily in Excel.
O’Reilly also publishes books that give a great overview of the basics.
You can start your list of books with the Python Cookbook. It covers very important topics like file I/O, data structures, networking, and algorithms, all of which are a great base for any tech-driven career. It is broad and will give you a good understanding of what you can do with Python, while also teaching common programming principles like objects, classes, data structures, and algorithms. If you choose this book, there is no need to buy the automation book, since it covers most of the same topics except Excel. If you really want to learn more about Python and Excel, you can always read up on openpyxl. Honestly, if you are a person who prefers reading technical documentation to reading books, that is probably the way to go (we tend to be book people).
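If you do go the openpyxl route, here is a minimal sketch of the kind of Excel chore it handles: writing a small sheet and then totaling a column by group. It assumes openpyxl is installed (it is a third-party package, not part of the standard library), and the file name, sheet name, and columns are all hypothetical.

```python
from openpyxl import Workbook, load_workbook


def write_sales(path):
    """Create a hypothetical 'Sales' sheet with a header row and three data rows."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Sales"
    ws.append(["region", "amount"])  # header row
    for row in [("East", 120), ("West", 80), ("East", 45)]:
        ws.append(row)
    wb.save(path)


def total_by_region(path):
    """Read the sheet back and sum the amount column per region."""
    wb = load_workbook(path)
    ws = wb["Sales"]
    totals = {}
    # Skip the header (row 1) and get plain cell values instead of Cell objects.
    for region, amount in ws.iter_rows(min_row=2, values_only=True):
        totals[region] = totals.get(region, 0) + amount
    return totals


write_sales("sales.xlsx")
print(total_by_region("sales.xlsx"))
```

The same read/aggregate/write loop is most of what "automating Excel" means day to day; the openpyxl documentation covers formatting, formulas, and charts.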
If you’re not into reading books and like free YouTube tutorials, here is one of our favorites.
The creator, Corey Schafer, is a YouTuber with tons of full tutorials on Python, Django, GitHub, Linux, and more. All of these are very practical skills for anyone interested in going into the tech field, and his videos are really easy to follow along with. The video below is on objects and classes, which might be a little more advanced, but it is worth checking out even if you are just starting.
Now, some Python tutorials are just funny. OK, actually this next one is created by the TechLead. For those of you who don’t know the TechLead and aren’t programmers, it can be hard to tell, but he is hilarious. Most of his videos sit right on the edge between joke and seriousness, and telling the difference often requires having worked in the tech industry. For instance, in this video he is sort of mocking Python tutorials. Part of being in the tech industry is learning a whole other side of humor, one that tends to be very niche and meme-based.
Once you are comfortable with Python in general, it becomes much easier to learn more and more libraries.
Pandas is a data manipulation library that lets you run transformations and basic analytics on data sets. Our one piece of advice: as a user, think about where Pandas provides value versus SQL. Using Pandas over SQL is not always beneficial, for several reasons. For one, if you are running Pandas in a Jupyter notebook on your own computer, all of your data processing happens in your computer's limited RAM.
Most larger companies, or at least tech companies, will run Jupyter notebooks on some form of cloud computing. Still, Python isn’t always the best tool for fast data transformations. We have seen someone write a date_diff function in Python that took 5 minutes to run on 1 million rows, whereas in SQL it probably would have taken about 1 second. This matters because if, instead of 1 million rows, it were 1 billion, that is 1000x the data and roughly 1000x the run time (OK, it’s not that simple when it comes to computing, but the point is that it would take much longer).
Pandas still has its place and is very useful for preparing and analyzing your data. Here are some great resources for Pandas:
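To make the point concrete, here is a minimal sketch of pushing a date difference down to the database instead of looping over rows in Python. It uses SQLite through Python's built-in sqlite3 module purely for illustration; the orders table and its columns are hypothetical, and other databases use their own date function instead of SQLite's julianday (SQL Server, for example, has DATEDIFF).

```python
import sqlite3


def days_between_in_sql(rows):
    """Compute end_date - start_date in days inside SQLite, not in a Python loop.

    rows: iterable of (order_id, start_date, end_date) with ISO 'YYYY-MM-DD' strings.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE orders (order_id INTEGER, start_date TEXT, end_date TEXT)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        # julianday() turns an ISO date into a day number, so subtracting
        # two of them yields the difference in days for every row at once.
        cur = conn.execute(
            "SELECT order_id, "
            "CAST(julianday(end_date) - julianday(start_date) AS INTEGER) AS days "
            "FROM orders ORDER BY order_id"
        )
        return cur.fetchall()
    finally:
        conn.close()


orders = [(1, "2020-01-01", "2020-01-31"), (2, "2020-02-01", "2020-03-01")]
print(days_between_in_sql(orders))  # [(1, 30), (2, 29)]
```

The database engine does the per-row work in compiled code, which is the main reason a set-based SQL query tends to beat a row-by-row Python function.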
Data Analysis With Python And Pandas
Sentdex is a great YouTuber who really makes Python in general simple. He creates Python tutorials on many topics besides Pandas, but what we really appreciate is his down-to-earth style. He assumes you are starting at ground zero and builds from there.
In our opinion, that is what makes his videos some of the most effective tutorials on YouTube. Plus, his style is easy to follow.
Python Data Analysis with Pandas in 10 Minutes | Udemy Instructor, Frank
We wanted to share a free Udemy video created by Frank Kane. You will see us reference this instructor several more times; he is one of the most professional course creators around. This video is free, but Frank Kane also produces a lot of high-quality paid courses that we have bought and enjoyed. He has courses ranging from beginner to advanced in Python and many other modern tech concepts. He has also created a book on Python and machine learning.
Finally, there aren’t a lot of long-form videos we enjoy, since it can be hard to follow a video for an hour at a time. There is one we enjoyed and wanted to share: if you are just getting started and want a crash course on Pandas, check out the video below. Our one comment is to make sure you change the video quality. For us, it starts out terrible until you switch it to 720p.
There aren’t a lot of free Pandas books online that we like. So, in this case, we recommend some books you can pick up on Amazon or read on O’Reilly with its 10-day free trial. At the very least, the trial is a great way to find books you might enjoy buying!
Here, you can’t go wrong with the classic Python For Data Analysis.
This book covers all the basics, like data aggregation and time series, while also taking you through basic Python exercises that help you learn how to apply Pandas to actual problems. This is one of the traits we look for in books and courses: we need actual problems to apply our skills to, because that makes it easier to frame and approach them.
This book, as it states, really will take you through the “nuts and bolts of manipulating, processing, cleaning, and crunching data in Python”. These skills are crucial as a data scientist because most of your time will be focused on cleaning and processing data.
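As a small taste of that cleaning and processing work, here is a minimal pandas sketch on a hypothetical, deliberately messy orders dataset. It also shows the vectorized way to compute a date difference, subtracting two datetime columns in one operation instead of calling a Python function row by row, which is usually the first fix for slow code like the date_diff example above. The column names and values are made up for illustration.

```python
import pandas as pd


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Tidy a hypothetical orders table and add a days_to_ship column."""
    out = df.copy()
    # Normalize inconsistent text: strip whitespace and lower-case region names.
    out["region"] = out["region"].str.strip().str.lower()
    # Parse date strings; values that cannot be parsed become NaT instead of raising.
    out["order_date"] = pd.to_datetime(out["order_date"], errors="coerce")
    out["ship_date"] = pd.to_datetime(out["ship_date"], errors="coerce")
    # Drop rows where either date failed to parse.
    out = out.dropna(subset=["order_date", "ship_date"])
    # Vectorized date difference over the whole column, no Python-level loop.
    out["days_to_ship"] = (out["ship_date"] - out["order_date"]).dt.days
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "region": [" East", "west ", "EAST", "West"],
    "order_date": ["2020-01-01", "2020-01-05", "not a date", "2020-02-01"],
    "ship_date": ["2020-01-04", "2020-01-06", "2020-01-10", "2020-02-08"],
})
clean = clean_orders(raw)
print(clean.groupby("region")["days_to_ship"].mean())
```

The unparseable row is dropped and the differently capitalized regions collapse into two groups, which is typical of the unglamorous work the book focuses on.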
This is probably one of the few Pandas books we would recommend. There are many other data science and machine learning books we will also add to this list shortly. But there aren’t that many more books that are purely focused on Pandas that we would recommend.
Python has several other libraries that we have skipped over so far. If you are looking to get into machine learning and deep learning, these are the core libraries that make programming complex models, algorithms, and neural networks much easier.
Pandas handles a lot of basic analytical functions: it aggregates data and runs basic descriptive statistics. To apply more advanced models easily, you will need to learn scikit-learn. Now, we say this with a caveat: simply learning how the library works does not make you a machine learning engineer. Still, it is one of the important libraries to know, because it contains most of the models data scientists will use in Python.
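As a rough illustration of what scikit-learn adds on top of Pandas, here is a minimal sketch using the iris dataset that ships with the library. The choice of k-nearest neighbors, the scaler, and the parameter values are arbitrary picks for the example, not recommendations; the point is the fit/predict pattern most scikit-learn models share.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_iris_model(random_state=42):
    """Fit a scaled k-NN classifier on iris and return (model, test accuracy)."""
    X, y = load_iris(return_X_y=True)
    # Hold out 25% of rows so the model is scored on data it has not seen.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    # A pipeline chains preprocessing and the model behind one fit/predict API.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)  # score() is mean accuracy here


model, accuracy = train_iris_model()
print(f"test accuracy: {accuracy:.2f}")
```

Swapping KNeighborsClassifier for another estimator, such as LogisticRegression or SVC, usually changes only that one line, which is a big part of the library's appeal.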
Again, Sentdex has a great set of tutorials that are really down to earth.
Scikit Learn Machine Learning SVM Tutorial
Sentdex has been around for a while, so for newer content you can check out Simplilearn. Its videos are newer and do a great job of going much more in depth. We do wish the audio quality were better, but other than that it is a great set of videos for learning scikit-learn.
TensorFlow And Deep Learning
Another Python library is TensorFlow, which lets you set up neural networks fairly easily. There is no need to create perceptron classes or any of the other objects/classes that would be required if you were developing a network from scratch.
In fact, TensorFlow can represent your computation as a graph (in TensorFlow 2, via tf.function) that is executed by its optimized C++ runtime rather than as plain Python code.
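For a sense of what TensorFlow saves you from, here is a minimal from-scratch perceptron in plain Python, the kind of class you would otherwise write yourself. It is an illustrative sketch of the classic perceptron learning rule, trained on the logical AND function as a toy dataset; it says nothing about how TensorFlow works internally.

```python
class Perceptron:
    """A single-neuron classifier trained with the perceptron learning rule."""

    def __init__(self, n_inputs, learning_rate=1.0):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        # Weighted sum plus bias, then a step activation: 1 if positive, else 0.
        total = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if total > 0 else 0

    def fit(self, samples, labels, epochs=20):
        for _ in range(epochs):
            for x, target in zip(samples, labels):
                error = target - self.predict(x)  # -1, 0, or 1
                # On a mistake, nudge each weight and the bias toward the target.
                for i, xi in enumerate(x):
                    self.weights[i] += self.learning_rate * error * xi
                self.bias += self.learning_rate * error
        return self


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]
p = Perceptron(n_inputs=2).fit(inputs, and_labels)
print([p.predict(x) for x in inputs])  # [0, 0, 0, 1]
```

A real network stacks many of these units, swaps the step function for differentiable activations, and trains with gradients; writing all of that by hand is exactly the work a library like TensorFlow takes off your plate.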
One great, slightly hidden video series was created by Hvass Laboratories. What is great about this series is that it not only walks you through using TensorFlow, it also references a GitHub repository you can use to follow along, with all of the code already written out.
Again, we have to give another shout-out to Sentdex, who has the easiest-to-follow TensorFlow introduction.
Python Machine Learning Book Recommendations
For books, we would recommend Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning.
This book starts light, with topics like linear regression and KNN (k-nearest neighbors), and then moves into deep learning concepts like neural networks. Also, like many other O’Reilly books, it has a lot of great practical examples that are well explained.
If you would prefer to learn about TensorFlow, skip the book above and use this one instead. It is slightly lighter on machine learning (but still very thorough), and its second half is 100% dedicated to neural networks. It covers topics like convolutional neural networks, autoencoders, dropout, and other topics that are very important to deep learning with TensorFlow.
We also have a few courses we would love to recommend on this topic.
We have already referenced Frank, but we also think Kirill Eremenko is a great instructor. Not only are they both great teachers, they clearly have a deep understanding of these topics.
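Since the book opens with linear regression, here is a small worked sketch of simple (one-variable) least-squares regression in plain Python. It uses the closed-form estimates slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x); the data points are made up so the answer is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of co-deviations and squared deviations; the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

Here the points lie exactly on y = 2x + 1, so the fit recovers that line; libraries like scikit-learn generalize the same idea to many features.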
Frank Kane has a great course on machine learning that walks you from linear regression to support vector machines. He also discusses ensemble learning, the bias-variance trade-off, and much more. Plus, if you are a visual learner, this course will probably benefit you more. There’s also an entire section on machine learning with Apache Spark, which lets you scale these techniques up to “big data” analyzed on a computing cluster.
Frank Kane has a list of other courses, and we have yet to buy one we didn’t enjoy. We hope you enjoy them as well!
Another great course is Machine Learning A-Z™: Hands-On Python & R In Data Science. This course is comprehensive and covers both Python and R. It isn’t just focused on scikit-learn but on machine learning in general. In addition, the creator of this course is the owner of SuperDataScience.com, a great site with a podcast, lessons, and more. So if you don’t want to pay for the course, you can always listen to the podcast for free!
Python, of course, is not the only language for data science. Another popular language is R. (These aren’t the only two languages either; there are others people like to use… except MATLAB. We don’t talk about MATLAB.)
At the moment, Python is the everyman’s language. It is easy to write, implement, and use. Are there trade-offs? Yes. Does it do almost everything you need well enough? Also yes.
We hope this list was helpful. Please comment below with some of your favorite books or free resources about Python!
For further reading and videos on data science, SQL and Python:
How To Develop Robust Algorithms
Dynamically Bulk Inserting CSV Data Into A SQL Server
SQL Best Practices — Designing An ETL — Part 1
How Algorithms Can Become Unethical and Biased
4 Must Have Skills For Data Scientists
What is A Decision Tree