Building a Large-Scale Interactive SQL Query Engine with Open Source Software

Written by bin-fan | Published 2020/03/07
Tech Story Tags: big-data | ecommerce-platform | open-source | scaling | enterprise-software | sql | data-engineering | performance

TLDR: JD.com runs a data platform of more than 40,000 servers that executes more than 1 million jobs per day and manages over 650PB of data. This article describes how JD built an interactive OLAP platform by combining two open-source technologies: Presto and Alluxio. At this scale, good data locality is hard to achieve, and poor locality significantly hurts the performance of Presto jobs that read from HDFS. The platform is so over-utilized that YARN is unable to schedule Presto jobs on the nodes hosting their local HDFS datanodes.


Written by bin-fan | VP of Open Source and Founding Member @Alluxio
Published by HackerNoon on 2020/03/07