Lambda Architecture Batch Layer: Visualizing All Time Taxi Data [Part 3]

TLDR

In this part I will talk about the batch layer of the Lambda Architecture. The batch layer is computed by applying a function to the whole historical dataset, in order to answer high-level questions that neither the speed layer nor the serving layer can answer on its own. These computations typically take hours or days to run, and the results are usually stored in a distributed file system (although this is not a requirement). The queries in question span the dataset from its beginning up to now. In our case, for example: how many cabs have served how many passengers to date, or what is the total distance driven by all the cabs? In this article I will try to answer questions like these based on the dataset that I have. The code for the article can be found here.
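To make the idea of "applying a function to the whole historical dataset" concrete, here is a minimal sketch in plain Python. It is not the code from the article: the column names (`cab_id`, `passenger_count`, `trip_distance`), the sample rows, and the `compute_batch_view` helper are all assumptions made up for illustration, and a real batch job would read the full dataset from distributed storage rather than from an in-memory string.

```python
import csv
import io

# Hypothetical sample standing in for the full historical dataset.
# Column names are assumptions, not the article's actual schema.
SAMPLE_CSV = """cab_id,passenger_count,trip_distance
A1,2,3.5
B2,1,1.2
A1,3,7.0
"""


def compute_batch_view(rows):
    """Fold every historical record into one set of all-time totals."""
    cabs = set()
    total_passengers = 0
    total_distance = 0.0
    for row in rows:
        cabs.add(row["cab_id"])
        total_passengers += int(row["passenger_count"])
        total_distance += float(row["trip_distance"])
    return {
        "cabs": len(cabs),               # distinct cabs seen so far
        "passengers": total_passengers,  # passengers served to date
        "distance": total_distance,      # total distance driven
    }


# Recompute the view from scratch over all records, as a batch job would.
batch_view = compute_batch_view(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(batch_view)
```

The point of the sketch is that the view is rebuilt from every record each time it runs, which is why batch jobs over large datasets can take hours and why their results are stored for later querying rather than computed on demand.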


Written by srivassid | Data Engineer

Published by HackerNoon on 2020/12/20