SeattleDataGuy

10 Great Articles On Data Science And Data Engineering

April 9th 2019

Data science and programming are expanding so rapidly that it is hard to keep up with all the articles coming out of Google, Uber, Netflix, and individual engineers. We have been reading quite a few over the past few weeks and wanted to share some of our top picks for April 2019.

We hope you enjoy these articles.

Building and Scaling Data Lineage at Netflix

By: Di Lin, Girish Lingappa, Jitender Aswani

Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard about to make a critical business decision but pausing to ask a question — “Can I run a check myself to understand what data is behind this metric?”
Now, imagine yourself in the role of a software engineer responsible for a micro-service which publishes data consumed by a few critical customer-facing services (e.g. billing). You are about to make structural changes to the data and want to know who and what downstream of your service will be impacted.

Read More Here

DeepMind and Google: the battle to control artificial intelligence

By Hal Hodson

One afternoon in August 2010, in a conference hall perched on the edge of San Francisco Bay, a 34-year-old Londoner called Demis Hassabis took to the stage. Walking to the podium with the deliberate gait of a man trying to control his nerves, he pursed his lips into a brief smile and began to speak: “So today I’m going to be talking about different approaches to building…” He stalled, as though just realizing that he was stating his momentous ambition out loud. And then he said it: “AGI”.

Read More Here

Learning Data Science: Our Favorite Resources From Free To Not

Today we wanted to cover some of our favorite resources for data science. As the title suggests, these resources will be from free to not. Some people like buying books and other people prefer online courses. So we have created this list of data resources that range from books to courses, from free to not.

Data science has many facets. Statistics, data cleansing, programming, system design and really…almost anything else data related depending on how large the company is.

This post will discuss our favorite resources for these topics. Most of these courses and books are primers for topics like statistics, Python and data science in general, so they will only provide the base knowledge. At the end of the day, real practical experience is one of the few things that will really build your data science skills. You should learn as much as you can from these resources, then apply for as many internships and entry-level positions as possible and study for interviews.

Read More Here

Object Detection with 10 lines of code

By Moses Olafenwa

One of the important fields of Artificial Intelligence is Computer Vision. Computer Vision is the science of computers and software systems that can recognize and understand images and scenes. Computer Vision is also composed of various aspects such as image recognition, object detection, image generation, image super-resolution and more. Object detection is probably the most profound aspect of computer vision due to the number of practical use cases. In this tutorial, I will briefly introduce the concept of modern object detection, challenges faced by software developers, the solution my team has provided as well as code tutorials to perform high performance object detection.

Read More Here

How Apache Airflow Distributes Jobs on Celery workers

By Hugo Lime

Discover what happens when Apache Airflow performs task distribution on Celery workers through RabbitMQ queues.

Apache Airflow is a tool to create workflows such as an extract-load-transform pipeline on AWS. A workflow is a directed acyclic graph (DAG) of tasks and Airflow has the ability to distribute tasks on a cluster of nodes. Let’s see how it does that.
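To make the idea of a workflow as a directed acyclic graph a little more concrete, here is a minimal sketch in plain Python. It does not use Airflow's API or Celery; the task names and the `run_order` helper are hypothetical and exist only for illustration. It shows how tasks whose dependencies are all finished can be grouped into batches, which is the kind of information a scheduler needs before handing tasks out to workers.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical extract-load-transform workflow (not real Airflow code).
# Each task maps to the set of tasks that must finish before it can start.
dag = {
    "extract_users": set(),
    "extract_orders": set(),
    "load": {"extract_users", "extract_orders"},
    "transform": {"load"},
}


def run_order(dependencies):
    """Return batches of tasks in dependency order.

    Tasks in the same batch have no unmet dependencies, so a scheduler
    could, in principle, send them to different workers at the same time.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock successors
    return batches


if __name__ == "__main__":
    for step, batch in enumerate(run_order(dag), start=1):
        print(step, batch)
```

In this sketch the two extract tasks land in the same batch because neither depends on the other, while `load` and `transform` each wait for the previous step. How Airflow actually queues and distributes tasks is covered in the linked article.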

Read More Here

Capturing Special Video Moments with Google Photos

Recording video of memorable moments to share with friends and loved ones has become commonplace. But as anyone with a sizable video library can tell you, it’s a time consuming task to go through all that raw footage searching for the perfect clips to relive or share with family and friends. Google Photos makes this easier by automatically finding magical moments in your videos (like when your child blows out the candle or when your friend jumps into a pool) and creating animations from them that you can easily share with friends and family.

Read More Here

Uber Case Study: Choosing the Right HDFS File Format for Your Apache Spark Jobs

By Scott Short

As part of our effort to create better user experiences on our platform, members of our Maps Data Collection team use a dedicated mobile application to collect imagery and its associated metadata to enhance our maps. For example, our team captures images of street signs to improve the efficiency and quality of our maps data in order to facilitate a more seamless trip experience…

Read More Here

You created a machine learning application. Now make sure it’s secure.

By Ben Lorica and Mike Loukides

In a recent post, we described what it would take to build a sustainable machine learning practice. By “sustainable,” we mean projects that aren’t just proofs of concepts or experiments. A sustainable practice means projects that are integral to an organization’s mission: projects by which an organization lives or dies. These projects are built and supported by a stable team of engineers, and supported by a management team that understands what machine learning is, why it’s important, and what it’s capable of accomplishing.

Read More Here

Developing A Data Science Career Framework

By Adam McElhinney

At Uptake, Data Scientists are at the core of what we do. To that end, it’s very important that we have a good definition of the following: what does a Data Scientist do; how is a Data Scientist’s performance evaluated; and how does a Data Scientist progress in their career. Once you have these definitions, they can be used as the basis for all of your hiring, development, compensation, exit and promotion decisions.

Read More Here

Diagnosing Heart Disease Using ML Explainability Tools and Techniques

By Rob Harrand

Introduction

Of all the applications of machine-learning, diagnosing any serious disease using a black box is always going to be a hard sell. If the output from a model is the particular course of treatment (potentially with side-effects), or surgery, or the absence of treatment, people are going to want to know why.

This dataset gives a number of variables along with a target condition of having or not having heart disease. Below, the data is first used in a simple random forest model, and then the model is investigated using ML explainability tools and techniques.

Read More Here

Thank you so much for reading. If you would like updates on our favorite articles, sign up here for our weekly newsletter.