Performance Benchmark: Apache Spark on Dataproc Vs. Google BigQuery

Written by Raghavendra_Singh | Published 2020/06/30
Tech Story Tags: data-engineering | big-data-processing | google-bigquery | apache-spark | gcp | cloud-computing | apache | big-data

TLDR: This research was undertaken to provide interactive business intelligence reports and visualisations for thousands of end users, which required designing a system that can analyse billions of data points in real time. The solution took into consideration the following main characteristics of the desired system: analysing and classifying expected user queries and their frequency; developing various pre-aggregations and projections to reduce data churn while serving various classes of user queries; serving up to 60 concurrent queries to the platform users with a combination of aggregated datasets; and developing a state-of-the-art ‘Query Rewrite Algorithm’ to serve the user queries using a combination.
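
The pre-aggregation and query rewrite idea in the summary can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the author's actual algorithm: the `Aggregate` class, the `pick_aggregate` and `rewrite` functions, the table names, and the "smallest covering aggregate" rule are all hypothetical, and the sketch only handles additive measures such as `SUM`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical pre-aggregated table: name, grouping columns, row count."""
    name: str
    dimensions: FrozenSet[str]
    row_count: int


def pick_aggregate(query_dims: Iterable[str],
                   aggregates: Iterable[Aggregate]) -> Optional[Aggregate]:
    """Return the smallest aggregate whose dimensions cover the query, or None."""
    needed = frozenset(query_dims)
    candidates = [a for a in aggregates if needed <= a.dimensions]
    return min(candidates, key=lambda a: a.row_count, default=None)


def rewrite(query_dims: Iterable[str], measure: str,
            aggregates: List[Aggregate], base_table: str = "events") -> str:
    """Build SQL against the chosen aggregate, falling back to the base table.

    Assumes the measure is additive, so re-summing a pre-aggregated
    column gives the same answer as summing the raw rows.
    """
    dims = sorted(query_dims)
    target = pick_aggregate(dims, aggregates)
    table = target.name if target else base_table
    select_list = dims + [f"SUM({measure}) AS {measure}"]
    sql = f"SELECT {', '.join(select_list)} FROM {table}"
    if dims:
        sql += f" GROUP BY {', '.join(dims)}"
    return sql


if __name__ == "__main__":
    # Hypothetical aggregates; real row counts would come from table metadata.
    aggs = [
        Aggregate("agg_country_day", frozenset({"country", "day"}), 1000),
        Aggregate("agg_country", frozenset({"country"}), 50),
    ]
    print(rewrite(["country"], "revenue", aggs))
    print(rewrite(["country", "device"], "revenue", aggs))
```

In this sketch a query grouped by `country` is routed to the smaller `agg_country` table, while a query that needs `device`, which no aggregate carries, falls back to the base table. Choosing by row count is only one possible cost rule; a production rewrite step would also need to account for filters and non-additive measures.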

Written by Raghavendra_Singh | Raghavendra works for Sigmoid.
Published by HackerNoon on 2020/06/30