Too Long; Didn't Read
JD.com runs a data platform of more than 40,000 servers that executes over 1 million jobs per day and manages over 650PB of data. This article describes how JD built an interactive OLAP platform by combining two open-source technologies: Presto and Alluxio. At this scale, good data locality is hard to achieve, and poor locality significantly degrades the performance of Presto jobs that read from HDFS. The platform is so over-utilized that YARN cannot schedule Presto jobs on the nodes that host their local HDFS DataNodes.