In this blog, guest writer Derek Tan, Executive Director of Infra & Simulation at WeRide, describes how WeRide engineers use Alluxio as a hybrid cloud data gateway that lets on-premises applications access public cloud storage such as AWS S3.
The new data access architecture provides a localized cache at each location to eliminate redundant requests to S3. In addition to removing the complexity of manual data synchronization, Alluxio serves data directly to engineers in the same office who work with the same data, avoiding the transfer costs associated with S3. The original content was published on Alluxio's Engineering Blog (Disclaimer: The author is a Founding Member @Alluxio).
WeRide is a company in the smart mobility industry that builds L4 autonomous driving algorithms. Like all self-driving car companies, we continuously collect data from live road tests for model training, algorithm testing, and simulation.
So far, WeRide has accumulated two million kilometers of autonomous driving mileage, and the rate of data collection will only increase as more test vehicles enter service. In 2020, terabytes of data are generated daily, and we expect this to grow tenfold in the following year.
In addition to data collected from test drives, applications such as simulation, software-in-the-loop (SIL) tests, and model benchmarking also produce terabytes of data daily. As our technology advances, the output of these applications will keep growing as they cover larger datasets with more corner cases.
WeRide is a globally distributed company with offices in multiple cities, including San Jose in the US and Guangzhou, Beijing, Shanghai, and Anqing in China. Different teams across these offices generate and consume data in parallel. We use AWS S3 as the data lake shared across offices.
When designing a new algorithm for our self-driving cars or fixing a bug in an existing one, our engineers need to test the algorithm against existing data. With our previous data architecture, this caused bottlenecks such as:
The previous architecture is shown below.
After some investigation, we realized that the following architecture would provide significant benefits:
However, building an in-house caching system from scratch would be expensive and unnecessary for WeRide's business needs. We decided instead to evaluate existing technologies against the following requirements:
With these criteria in mind, Alluxio became our top choice for accelerating data access. In addition to being compatible with S3, it provides easy access through its POSIX and HTTP interfaces. Because Alluxio is open source, we can incorporate it into our system without additional business costs.
The new architecture with Alluxio is shown below.
In each office, we deployed Alluxio as a small on-premises cluster, with S3 as the source of truth. Road test data is uploaded directly to the local Alluxio cluster, where engineers in the same office can use it immediately. Meanwhile, Alluxio automatically uploads the road test data to S3 in the background. When engineers in other offices want to use road test data, they request it through their local Alluxio cluster. The data is returned immediately if Alluxio has it cached; otherwise it is fetched from S3. To further reduce the time needed to fetch new data from S3, we worked with the Alluxio team to implement a distributed load command that opens multiple simultaneous connections to download data. This feature was added in the Alluxio 2.1.0 release.
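The benefit of opening multiple simultaneous connections can be illustrated with a small, single-machine sketch. It is not Alluxio's implementation: Alluxio's own `distributedLoad` shell command spreads the work across cluster workers, while this example only shows the underlying idea of fetching uncached objects concurrently. The names `parallel_load` and `fake_s3_get`, and the in-memory dict standing in for S3, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def parallel_load(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],
    cache: Dict[str, bytes],
    max_connections: int = 8,
) -> int:
    """Fetch every key that is not yet in `cache` using up to `max_connections`
    concurrent fetches, store the results in `cache`, and return how many were fetched."""
    # Skip keys that are already cached and drop duplicates, keeping the original order.
    missing = list(dict.fromkeys(k for k in keys if k not in cache))
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        # pool.map yields results in input order, so zip pairs each key with its own data.
        for key, data in zip(missing, pool.map(fetch, missing)):
            cache[key] = data
    return len(missing)


# Demo: an in-memory dict stands in for the S3 bucket (hypothetical data).
remote = {f"drive/{i:03d}.bag": bytes([i]) * 4 for i in range(20)}
fetched = []

def fake_s3_get(key: str) -> bytes:
    fetched.append(key)  # record which keys actually crossed the "network"
    return remote[key]

local = {"drive/000.bag": remote["drive/000.bag"]}  # one file is already cached
count = parallel_load(remote.keys(), fake_s3_get, local, max_connections=4)
print(count)  # 19
```

In a real deployment `fetch` would wrap an S3 GET, and the useful number of connections depends on link bandwidth and latency between the office and the S3 region.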
With Alluxio, application data fetched from the cloud is also cached locally. Previously, this was not possible if the data had not been uploaded from the same office. In the common scenario where an engineer wants to review a simulation result produced by another engineer in the same office, the data is available immediately.
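The read and write paths described above can be modeled as a toy per-office cache in front of a shared store: writes land locally and are uploaded in the background, and reads are served locally when possible and otherwise fetched once from S3 and cached. This is a hypothetical illustration, not Alluxio code; `OfficeCache` and the dict standing in for S3 are assumptions. The background-upload behavior resembles Alluxio's `ASYNC_THROUGH` write type, although the post does not say which setting WeRide uses.

```python
import queue
import threading
from typing import Dict


class OfficeCache:
    """Toy model of one office's cache in front of a shared object store (hypothetical)."""

    def __init__(self, s3: Dict[str, bytes]):
        self.s3 = s3                       # stand-in for the shared S3 bucket
        self.local: Dict[str, bytes] = {}  # data cached in this office
        self.s3_reads = 0                  # reads that had to cross to S3
        self._pending = queue.Queue()      # keys waiting to be uploaded
        self._uploader = threading.Thread(target=self._upload_loop, daemon=True)
        self._uploader.start()

    def write(self, key: str, data: bytes) -> None:
        # New data is usable locally right away; the S3 upload happens in the background.
        self.local[key] = data
        self._pending.put(key)

    def read(self, key: str) -> bytes:
        if key in self.local:
            return self.local[key]         # cache hit: no S3 transfer
        self.s3_reads += 1
        data = self.s3[key]                # cache miss: fetch from the source of truth
        self.local[key] = data             # later readers in this office hit the cache
        return data

    def flush(self) -> None:
        self._pending.join()               # wait until every queued upload has finished

    def close(self) -> None:
        self._pending.put(None)            # sentinel tells the uploader thread to exit
        self._uploader.join()

    def _upload_loop(self) -> None:
        while True:
            key = self._pending.get()
            try:
                if key is None:
                    return
                self.s3[key] = self.local[key]
            finally:
                self._pending.task_done()


s3: Dict[str, bytes] = {}
san_jose, guangzhou = OfficeCache(s3), OfficeCache(s3)

san_jose.write("road/2020-06-01.bag", b"sensor data")
san_jose.read("road/2020-06-01.bag")    # served locally, before the upload finishes
san_jose.flush()                        # background upload to S3 is now complete

guangzhou.read("road/2020-06-01.bag")   # miss: fetched from S3
guangzhou.read("road/2020-06-01.bag")   # hit: served from the Guangzhou cache
print(san_jose.s3_reads, guangzhou.s3_reads)  # 0 1

san_jose.close()
guangzhou.close()
```

Only the first read in each remote office crosses to S3; every later read of the same object in that office is a local hit, which is where the transfer savings come from. The sketch ignores eviction, consistency, and failure handling.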
With the new Alluxio-based architecture, we observe the following improvements:
Data transfer via Alluxio is now a critical component connecting in-office local data with data in the cloud. To further improve the system, we are working with the Alluxio team to add features for data transfer policies. Capabilities such as throttling upload bandwidth during working hours or prioritizing certain file types would help our engineers.
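To make the two policy ideas concrete, here is a hypothetical scheduler that combines them: a lower bandwidth cap during working hours and a priority order by file extension. Everything in it (the hours, the bandwidth limits, the extension priorities, and the names `plan_uploads` and `PendingUpload`) is an assumption for illustration only; it does not describe an Alluxio feature or WeRide's actual policy.

```python
import heapq
import os
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical policy values, chosen only for illustration.
WORK_HOURS = range(9, 18)           # 09:00 to 17:59 local time
WORK_HOURS_LIMIT_MBPS = 100         # throttled upload bandwidth (megabits/s)
OFF_HOURS_LIMIT_MBPS = 1000         # unthrottled upload bandwidth (megabits/s)
PRIORITY = {".bag": 0, ".log": 1}   # lower uploads first; other types default to 2


@dataclass(order=True)
class PendingUpload:
    priority: int
    seq: int                        # tie-breaker: FIFO order within the same priority
    name: str = field(compare=False)
    size_mb: float = field(compare=False)


def bandwidth_limit_mbps(hour: int) -> int:
    return WORK_HOURS_LIMIT_MBPS if hour in WORK_HOURS else OFF_HOURS_LIMIT_MBPS


def plan_uploads(files: Dict[str, float], hour: int, window_s: float) -> List[str]:
    """Pick the files (name -> size in MB) to upload in the next `window_s` seconds."""
    heap: List[PendingUpload] = []
    for seq, (name, size_mb) in enumerate(files.items()):
        prio = PRIORITY.get(os.path.splitext(name)[1], 2)
        heapq.heappush(heap, PendingUpload(prio, seq, name, size_mb))
    budget_mb = bandwidth_limit_mbps(hour) / 8 * window_s  # megabits/s -> megabytes
    plan = []
    while heap:
        item = heapq.heappop(heap)
        if item.size_mb > budget_mb:
            break  # strict priority: nothing lower-priority jumps ahead of this file
        budget_mb -= item.size_mb
        plan.append(item.name)
    return plan


pending = {"sim/result.csv": 300, "road/a.bag": 500, "road/a.log": 100, "road/b.bag": 800}
print(plan_uploads(pending, hour=10, window_s=60))  # ['road/a.bag']
print(plan_uploads(pending, hour=22, window_s=60))
# ['road/a.bag', 'road/b.bag', 'road/a.log', 'sim/result.csv']
```

The strict-priority `break` is one design choice among several; a scheduler could instead let smaller, lower-priority files fill the remaining budget.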
WeRide aims to deliver L4 autonomous driving technology for the future. Data access is a critical part of developing smart mobility. Adopting Alluxio as a localized cache layer eliminates redundant requests to S3, removes the complexity of data synchronization, and saves $5 per issue per engineer in data transfer costs. We look forward to further collaboration with our friends at Alluxio to achieve our data access goals economically.