10 Amazing Articles On Python Programming And Machine Learning [Week 3]

This week's post on our top 10 Python articles covers streaming data pipelines, fklearn, cross-validation and more! We focused on articles covering topics like machine learning and data science, and we hope you enjoy them as much as we did!

Let’s Build a Streaming Data Pipeline

Apache Beam and DataFlow for real-time data pipelines

By Daniel Foley

Today’s post is based on a project I recently did at work. I was really excited to implement it and to write it up as a blog post, as it gave me a chance to do some data engineering and also to do something that was quite valuable for my team. Not too long ago, I discovered that we had a relatively large amount of user log data relating to one of our data products stored on our systems.

Read More Here

Learn Blockchains by Building One

The fastest way to learn how Blockchains work is to build one

By Daniel van Flymen

Before you get started…

Remember that a blockchain is an immutable, sequential chain of records called Blocks. They can contain transactions, files or any data you like, really. But the important thing is that they’re chained together using hashes.
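To make "chained together using hashes" concrete, here is a minimal plain-Python sketch. It is not the article's code: the block fields (`index`, `timestamp`, `data`, `previous_hash`) and the helper names are our own illustrative choices. Each block stores the SHA-256 hash of the block before it, so editing any earlier block breaks every link that follows.

```python
import hashlib
import json
import time


def hash_block(block: dict) -> str:
    """Return the SHA-256 hex digest of a block's canonical JSON encoding."""
    # sort_keys makes the encoding deterministic, so identical blocks hash identically
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def new_block(chain: list, data) -> dict:
    """Append a block whose previous_hash points at the current last block."""
    # The first (genesis) block has no predecessor, so it gets a placeholder hash
    previous_hash = hash_block(chain[-1]) if chain else "0" * 64
    block = {
        "index": len(chain),
        "timestamp": time.time(),
        "data": data,
        "previous_hash": previous_hash,
    }
    chain.append(block)
    return block


def is_valid(chain: list) -> bool:
    """Check that every block stores the hash of the block before it."""
    return all(
        chain[i]["previous_hash"] == hash_block(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
new_block(chain, "genesis")
new_block(chain, {"sender": "alice", "recipient": "bob", "amount": 5})
new_block(chain, {"sender": "bob", "recipient": "carol", "amount": 2})
print(is_valid(chain))  # True

# Tampering with an earlier block invalidates the link stored in the next block
chain[1]["data"]["amount"] = 500
print(is_valid(chain))  # False
```

A real blockchain adds consensus (for example proof of work) on top of this; the sketch only shows the hash-linking that makes the records tamper-evident.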

Read More Here

Introducing fklearn: Nubank’s machine learning library (Part I)

By Lucas Estevam

Nubank has just open-sourced fklearn, our machine learning Python library!

At Nubank we rely heavily on machine learning to make scalable data-driven decisions. While there are many other ML libraries out there (we use XGBoost, LightGBM, and scikit-learn extensively, for example), we felt the need for a higher-level abstraction that would help us more easily apply these libraries to the problems we face. fklearn effectively wraps these libraries into a format that makes their use in production more effective.
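As we understand it, fklearn's learners follow a functional style: training returns a prediction function along with the scored training data and a log, rather than a mutable model object. The sketch below imitates that idea in plain Python only; `mean_regressor_learner` is a hypothetical learner (a constant mean-prediction baseline), not fklearn's actual API, and it uses lists of dicts instead of DataFrames.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def mean_regressor_learner(
    rows: List[Row], target: str, prediction_column: str = "prediction"
) -> Tuple[Callable[[List[Row]], List[Row]], List[Row], dict]:
    """Hypothetical learner: 'fit' a constant model that predicts the training mean.

    Returns (predict_fn, scored_training_rows, log), imitating the
    fit-returns-a-prediction-function style; not fklearn's real signature.
    """
    target_mean = mean(row[target] for row in rows)

    def predict_fn(new_rows: List[Row]) -> List[Row]:
        # Return new dicts so the caller's data is never mutated
        return [{**row, prediction_column: target_mean} for row in new_rows]

    log = {
        "mean_regressor_learner": {
            "target": target,
            "training_samples": len(rows),
            "mean": target_mean,
        }
    }
    return predict_fn, predict_fn(rows), log


train = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}, {"x": 3.0, "y": 6.0}]
predict, scored_train, log = mean_regressor_learner(train, target="y")
print(predict([{"x": 10.0}]))  # [{'x': 10.0, 'prediction': 4.0}]
```

Because the returned `predict` is a plain function closed over the fitted parameters, it can be shipped to production or composed with other steps without carrying any training state beyond what it needs.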

Read More Here

Cross-Validation strategies for Time Series forecasting

Time series modeling and forecasting are tricky and challenging. The i.i.d. (independent and identically distributed) assumption does not hold well for time series data. There is an implicit dependence on previous observations, and data leakage from response variables to lag variables is more likely to occur, in addition to the inherent non-stationarity of the data. By non-stationarity, we mean that observed statistics such as the mean and variance change over time. It gets even trickier once inherent nonlinearity is taken into account.
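Because shuffled k-fold splits let a model train on the future and test on the past, time series are usually validated with forward-chaining (expanding-window) splits. Here is a minimal sketch; the function name and the optional `gap` parameter (which drops observations between train and test to reduce leakage through lag features) are our own illustration, not code from the article. scikit-learn's `TimeSeriesSplit` offers a similar scheme.

```python
def expanding_window_splits(n_samples: int, n_splits: int, test_size: int, gap: int = 0):
    """Yield (train_indices, test_indices) pairs in time order.

    Each fold trains only on observations before its test window, so the
    model never sees data from the period it is asked to forecast. `gap`
    skips that many observations just before each test window.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start - gap <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(0, test_start - gap))
        test = list(range(test_start, test_start + test_size))
        yield train, test


for train, test in expanding_window_splits(n_samples=10, n_splits=3, test_size=2):
    print(f"train={train[0]}..{train[-1]}  test={test}")
# train=0..3  test=[4, 5]
# train=0..5  test=[6, 7]
# train=0..7  test=[8, 9]
```

The training window grows with each fold while the test window always lies strictly after it, which mirrors how the model would actually be used to forecast.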

Read More Here

An Essential Guide to Numpy for Machine Learning in Python

The Quintessential Library for ML!

By Siddharth Dikshit

Why would this be useful to you?

Well, most of us tend to forget the various library functions (even those of us who have already implemented ML algorithms) and end up rewriting existing functions from scratch, which wastes both time and energy. That is why it becomes essential to understand the nuances of the library you are using. NumPy, being one of the essential libraries for machine learning, deserves an article of its own.

Read More Here

If you like to travel, let Python help you scrape the best cheap flights!

By Fábio Neves

Simply put

The goal of this project is to build a web scraper that runs searches on flight prices for a particular destination with flexible dates (up to 3 days before and after the dates you first select). It saves an Excel file with the results and sends an email with quick stats. Obviously, the objective is to help us find the best deals!
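The "up to 3 days before and after" window boils down to simple date arithmetic. The sketch below is a hypothetical helper, not the article's scraper code: it lists every candidate date around each chosen date and pairs departures with returns, dropping pairs where the return would not come after departure. Whether the scraper enumerates these pairs itself or relies on the flight site's own flexible-date option is not something we are assuming here.

```python
from datetime import date, timedelta
from itertools import product


def flexible_dates(center: date, days: int = 3) -> list:
    """Return every date from `days` before to `days` after `center`, inclusive."""
    return [center + timedelta(days=offset) for offset in range(-days, days + 1)]


def search_windows(depart: date, return_: date, days: int = 3) -> list:
    """Pair every flexible departure date with every flexible return date,
    skipping combinations where the return is not after the departure."""
    return [
        (d, r)
        for d, r in product(flexible_dates(depart, days), flexible_dates(return_, days))
        if r > d
    ]


pairs = search_windows(date(2019, 6, 10), date(2019, 6, 17))
print(len(pairs))  # 49
```

With a week between the chosen dates, all 7 x 7 combinations are valid; closer dates produce fewer pairs because of the ordering filter.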

Read More Here

Speed Up Your Exploratory Data Analysis With Pandas-Profiling

Get an intuition of your data’s structure with just one line of code

By Lukas Frei


When importing a new data set for the very first time, the first thing to do is to get an understanding of the data. This includes steps like determining the range of specific predictors, identifying each predictor’s data type, as well as computing the number or percentage of missing values for each predictor.
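The checks listed above (data type, range, share of missing values per column) are what pandas-profiling automates. As a rough illustration of what is being computed, here is a plain-Python sketch over a list of records; the `summarize` helper and the sample rows are our own, and `None` is assumed to mark a missing value. In pandas the same checks are roughly `df.dtypes`, `df.describe()` and `df.isna().mean() * 100`.

```python
def summarize(rows: list) -> dict:
    """Per-column summary: observed types, min/max of numeric values, % missing.

    `rows` is a list of dicts (one per record); a missing key or a None
    value counts as missing.
    """
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        # bool is a subclass of int, so exclude it from numeric ranges
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        summary[col] = {
            "types": sorted({type(v).__name__ for v in present}),
            "range": (min(numeric), max(numeric)) if numeric else None,
            "missing_pct": 100.0 * (len(values) - len(present)) / len(values),
        }
    return summary


rows = [
    {"age": 34, "city": "Lisbon", "income": 52000.0},
    {"age": None, "city": "Porto", "income": 48000.0},
    {"age": 29, "city": None, "income": None},
    {"age": 41, "city": "Lisbon"},
]
for col, stats in summarize(rows).items():
    print(col, stats)
```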

Read More Here

The Next Level of Data Visualization in Python

How to make great-looking, fully-interactive plots with a single line of Python

By Will Koehrsen

The sunk-cost fallacy is one of many harmful cognitive biases to which humans fall prey. It refers to our tendency to continue to devote time and resources to a lost cause because we have already spent — sunk — so much time in the pursuit.

Read More Here

Distributed systems with RabbitMQ

In this article we’re going to talk about the benefits of distributed systems and how to move to distributed systems using RabbitMQ. Then we will learn the fundamentals of RabbitMQ and how to interact with it using Python.

Read More Here

Format Python however you like with Black

Learn more about solving common Python problems in our series covering seven PyPI libraries.

By Moshe Zadka (Community Moderator)

Python is one of the most popular programming languages in use today — and for good reasons: it’s open source, it has a wide range of uses (such as web programming, business applications, games, scientific programming, and much more), and it has a vibrant and dedicated community supporting it. This community is the reason we have such a large, diverse range of software packages available in the Python Package Index (PyPI) to extend and improve Python and solve the inevitable glitches that crop up.

Read More Here

Other Great Data Science And Data Engineering Resources

The Interview Study Guide For Software Engineers

Learning Data Science: Our Top 25 Data Science Courses

The Best And Only Python Tutorial You Will Ever Need To Watch

Dynamically Bulk Inserting CSV Data Into A SQL Server

4 Must Have Skills For Data Scientists

Engineering Dashboards, Metrics And Algorithms Part 2

Read Last Week's Top Ten Article For Python Libraries

How Algorithms Can Become Unethical and Biased