Too Long; Didn't Read
Vipshop is the third-largest e-commerce site in China and processes large volumes of data collected daily to generate targeted advertisements for its consumers. The site runs tens of thousands of queries against a dozen Hive tables stored in HDFS to derive insights for targeted ads. The major challenge when running jobs in the architecture shown in Figure 1 is inconsistent performance, which has several causes. With a large number of nodes in the cluster, the data needed by a computation process is unlikely to be served by the storage process on the same node. Remote requests from other storage processes create bottlenecks on certain data nodes. With Alluxio, we separate storage and compute by moving HDFS to an isolated cluster, so resources on the compute cluster can be scaled independently of storage capacity.
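To make the locality argument concrete, here is a rough back-of-the-envelope sketch. It is not taken from Vipshop's setup: it assumes each block's replicas sit on distinct nodes chosen uniformly at random, that a task lands on a node without any locality-aware scheduling, and it uses HDFS's default replication factor of 3. The function name `local_read_probability` is hypothetical.

```python
def local_read_probability(num_nodes: int, replication: int = 3) -> float:
    """Chance that a task finds a replica of its block on its own node.

    Assumes replicas are spread uniformly at random over distinct nodes and
    that the task's node is chosen independently of block placement.
    """
    if num_nodes <= 0 or replication <= 0:
        raise ValueError("num_nodes and replication must be positive")
    # A block cannot have more replicas than there are nodes.
    return min(replication, num_nodes) / num_nodes


if __name__ == "__main__":
    # The local-read chance shrinks in proportion to cluster size.
    for n in (10, 100, 1000):
        print(f"{n:>5} nodes: {local_read_probability(n):.3%} local")
```

Real schedulers do try to place tasks near their data, so actual locality is usually better than this baseline; the sketch only illustrates why locality becomes harder to rely on as the cluster grows.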