10 Amazing Articles On Python Programming And Machine Learning [Week 3]
This week's post on our top 10 Python articles covers streaming data pipelines, fklearn, cross-validation and more! We focused on articles that cover topics like machine learning and data science, and we hope you enjoy them as much as we did!
Let’s Build a Streaming Data Pipeline
Apache Beam and DataFlow for real-time data pipelines
By Daniel Foley
Today’s post is based on a project I recently did in work. I was really excited to implement it and to write it up as a blog post as it gave me a chance to do some data engineering and also do something that was quite valuable for my team. Not too long ago, I discovered that we had a relatively large amount of user log data relating to one of our data products stored on our systems.
Learn Blockchains by Building One
The fastest way to learn how Blockchains work is to build one
Before you get started…
Remember that a blockchain is an immutable, sequential chain of records called Blocks. They can contain transactions, files or any data you like, really. But the important thing is that they’re chained together using hashes.
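To make the "chained together using hashes" idea concrete, here is a minimal sketch using only the Python standard library. It is not the article's implementation; the function names (hash_block, new_block, is_valid) and the block fields are assumptions chosen for illustration.

```python
import hashlib
import json
import time


def hash_block(block):
    """Return the SHA-256 hex digest of a block's sorted-key JSON form."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def new_block(chain, data):
    """Append a block that stores the hash of the block before it."""
    # The first block has no predecessor, so it gets a placeholder hash.
    previous_hash = hash_block(chain[-1]) if chain else "0" * 64
    block = {
        "index": len(chain),
        "timestamp": time.time(),
        "data": data,  # transactions, files, or any JSON-serializable data
        "previous_hash": previous_hash,
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Check that every block still points at the hash of its predecessor."""
    return all(
        chain[i]["previous_hash"] == hash_block(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
new_block(chain, "genesis")
new_block(chain, {"sender": "alice", "recipient": "bob", "amount": 5})
new_block(chain, "any data you like")
```

Editing an earlier block changes its hash, so the next block's stored previous_hash no longer matches and is_valid(chain) returns False. That mismatch is what makes tampering detectable.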
Introducing fklearn: Nubank’s machine learning library (Part I)
Nubank has just open-sourced fklearn, our machine learning Python library!
At Nubank we rely heavily on machine learning to make scalable data-driven decisions. While there are many other ML libraries out there (we use XGBoost, LightGBM, and scikit-learn extensively, for example), we felt the need for a higher-level abstraction that would help us more easily apply these libraries to the problems we face. Fklearn effectively wraps these libraries into a format that makes their use in production more effective.
Cross-Validation strategies for Time Series forecasting
Time series modeling and forecasting are tricky and challenging. The i.i.d. (independent and identically distributed) assumption does not hold well for time series data. There is an implicit dependence on previous observations and, at the same time, data leakage from response variables to lag variables is more likely to occur, in addition to the inherent non-stationarity in the data space. By non-stationarity, we mean changes over time in observed statistics such as mean and variance. It gets even trickier when taking inherent nonlinearity into consideration.
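One common way to respect that dependence on previous observations is to always train on the past and test on the future. The sketch below shows an expanding-window split in plain Python. It is one possible strategy, not necessarily the one the article recommends, and the function name and parameters are assumptions for illustration.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in time order.

    Each training window starts at index 0 and ends right before its
    test window, so no future observation is used for training.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train, test in splits:
    print("train:", train, "test:", test)
```

Unlike shuffled k-fold, every test index here is later than every training index, which avoids training on observations that come after the test period.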
An Essential Guide to Numpy for Machine Learning in Python
The Quintessential Library for ML!
Why would this be useful to you?
Well, most of us (including those who have already implemented ML algorithms) tend to forget the various library functions and end up writing code from scratch for functionality that already exists, which wastes both time and energy. At such times, it helps to understand the nuances of the library so that it can be used efficiently. NumPy, being one of the essential libraries for machine learning, deserves an article of its own.
If you like to travel, let Python help you scrape the best cheap flights!
By Fábio Neves
The goal of this project is to build a web scraper that will run and perform searches on flight prices with flexible dates (up to 3 days before and after the dates you select first), for a particular destination. It saves an Excel file with the results and sends an email with the quick stats. Obviously, the objective is to help us find the best deals!
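The flexible-date part of the search can be sketched with the standard library alone. The snippet below only builds the grid of candidate date pairs. It does not scrape anything, and the function names and example dates are hypothetical, not taken from the article.

```python
from datetime import date, timedelta
from itertools import product


def flexible_dates(selected, flex_days=3):
    """Return the selected date plus every date within +/- flex_days of it."""
    return [
        selected + timedelta(days=offset)
        for offset in range(-flex_days, flex_days + 1)
    ]


def candidate_trips(departure, return_date, flex_days=3):
    """Return every (outbound, inbound) pair where the return is after departure."""
    return [
        (out, back)
        for out, back in product(
            flexible_dates(departure, flex_days),
            flexible_dates(return_date, flex_days),
        )
        if out < back
    ]


# Example dates are arbitrary and only serve the illustration.
trips = candidate_trips(date(2019, 7, 10), date(2019, 7, 20))
print(len(trips), "date combinations to search")
```

A scraper would then run one price search per pair and keep the cheapest results.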
Speed Up Your Exploratory Data Analysis With Pandas-Profiling
Get an intuition of your data’s structure with just one line of code
By Lukas Frei
When importing a new data set for the very first time, the first thing to do is to get an understanding of the data. This includes steps like determining the range of specific predictors, identifying each predictor’s data type, as well as computing the number or percentage of missing values for each predictor.
The Next Level of Data Visualization in Python
How to make great-looking, fully-interactive plots with a single line of Python
The sunk-cost fallacy is one of many harmful cognitive biases to which humans fall prey. It refers to our tendency to continue to devote time and resources to a lost cause because we have already spent — sunk — so much time in the pursuit.
Distributed systems with RabbitMQ
In this article we’re going to talk about the benefits of distributed systems and how to move to distributed systems using RabbitMQ. Then we will learn the fundamentals of RabbitMQ and how to interact with it using Python.
Format Python however you like with Black
Learn more about solving common Python problems in our series covering seven PyPI libraries.
Python is one of the most popular programming languages in use today — and for good reasons: it’s open source, it has a wide range of uses (such as web programming, business applications, games, scientific programming, and much more), and it has a vibrant and dedicated community supporting it. This community is the reason we have such a large, diverse range of software packages available in the Python Package Index (PyPI) to extend and improve Python and solve the inevitable glitches that crop up.
Other Great Data Science And Data Engineering Resources