Authors: (1) Tobias Betz, Technical University of Munich, Germany; (2) Long Wen, Technical University of Munich, Germany; (3) Fengjunjie Pan, Technical University of Munich, Germany; (4) Gemb Kaljavesi, Technical University of Munich, Germany; (5) Alexander Zuepke, Technical University of Munich, Germany; (6) Andrea Bastoni, Technical University of Munich, Germany; (7) Marco Caccamo, Technical University of Munich, Germany; (8) Alois Knoll, Technical University of Munich, Germany; (9) Johannes Betz, Technical University of Munich, Germany.

Table of Links

Abstract and I. Introduction
II. Related Work
III. Microservice Architecture for an Autonomous Driving Software
IV. Experiments
V. Results
VI. Discussion
VII. Conclusion, Acknowledgments, and References

VI. DISCUSSION

The results of our research uncover unexpected insights into the performance of containerized applications, particularly with respect to end-to-end latency and system utilization. Our results suggest that applications deployed in a container environment achieve lower latency than applications running directly on bare metal. In the real-world application, end-to-end latency improvements of up to 5.2% were achieved.
The developed microservice architecture showed an improvement of 5-8% in the mean. For the maximum values, the single-container configuration showed significantly reduced maxima. The DDS communication benchmark showed that, for smaller message sizes, containerization produced better results. This margin was considerably lower (almost absent) in the ros2_benchmark. However, in this benchmark, jitter was lower with containerization than on bare metal.

To better understand the root causes of such behaviors, we made several attempts to optimize the bare-metal Linux system to achieve better results than containers. However, the complexity of the applications considered and of their internal interactions is so high that it was not possible to keep all parameters under control. We tried to improve the latency with different real-time scheduling algorithms and patches. Nevertheless, the Autoware software starved when we used all cores for the software. Reserving resources for the Linux processes led to a higher latency compared to the presented results.

At their core, containers leverage kernel parameters and settings to isolate processes using namespaces and cgroups. Achieving better performance on bare metal typically involves tuning the same kind of kernel parameters and settings by hand. Isolation is likely a primary factor contributing to the improved performance of containers in our complex setup. By isolating processes, containers ensure that standard Linux processes do not interfere with those inside the container, thereby facilitating an optimized execution environment.

Interestingly, our results showed that deploying applications in multiple containers extends the improvements of single-container configurations, particularly for average end-to-end latencies. This implies that distributing workloads across multiple containers can optimize the overall system performance, particularly in terms of latency.
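The cgroup-based isolation mentioned above can be pictured with the proportional-share rule that the Linux CPU controller applies under contention: each runnable group receives CPU time in proportion to its weight (exposed as `cpu.shares` in cgroup v1, with a default of 1024). The following is a minimal sketch of that rule only, not the configuration used in our experiments. The group names and share values are hypothetical, and the model assumes that all groups are permanently runnable and compete for the same CPUs; it ignores quotas, idle groups, and per-core effects.

```python
from typing import Dict


def cpu_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Return the fraction of contended CPU time each group would receive.

    Simplified model: every group is runnable all the time and competes
    for the same CPUs, so time is split in proportion to the share values.
    """
    if not shares:
        raise ValueError("at least one group is required")
    if any(value <= 0 for value in shares.values()):
        raise ValueError("share values must be positive")
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Hypothetical groups and share values, chosen only for illustration.
groups = {
    "autoware_container": 2048,
    "host_processes": 1024,
}

fractions = cpu_fractions(groups)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of contended CPU time")
```

With these made-up values, the container group would receive two thirds of the contended CPU time. When a group is idle, its portion becomes available to the others, which this simplified model does not capture.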
However, a multi-container deployment can also result in significantly higher maximum execution times. Such trade-offs must be carefully considered when designing and optimizing a system, especially in real-time environments where a small maximum execution time is critical.

A critical factor in this discussion is the role of cgroup scheduling and task assignment to cores within the Linux CFS. Platforms like Kubernetes and Docker use this mechanism to schedule container workloads. CPU shares (Cshares), a core component of this system, are influenced by various parameters such as predefined CPU limits and the number of processes or threads within a container. The CFS then allocates resources according to the Cshares, determining how resources are distributed among containers.

One of the key advantages of this system is the relative isolation it offers. In a native system, without the protective containerization layer, processes could inadvertently impact each other. For instance, native processes could affect the performance of the Autoware software within the CFS. Containerization segregates processes so that each operates within its own domain and remains largely unaffected by external entities. The inherent mechanisms of cgroup scheduling and their relation to the Linux CFS could play a crucial role in these results.

Another positive effect is that containerization improves latencies through second-order effects such as better data locality: grouping related tasks on a smaller set of cores prevents unregulated migrations to distant cores (our systems have 16 and 32 cores, respectively). Unregulated migration could be the cause of the bimodal distribution observed for end-to-end latency in Fig. 4(a). A number of experiments and testbeds were implemented to verify this, but unregulated migration could not be confirmed as the sole cause of the performance behavior.

Our study also showed another interesting trend.
As the complexity of the test cases increased, the number of processes or threads working within the container also increased significantly. This suggests that as the complexity increases, the container environment becomes more densely populated with processes and threads to handle the increased requirements.

In summary, our exploration of containerized applications has confirmed the potential benefits of such an approach, not only in terms of isolation but also in terms of performance optimization. A similar effect is reported in [24], where the measured time for nodes to complete launch is consistent with our runtime observations.

VII. CONCLUSION

We presented a microservice architecture tailored to an open-source software for autonomous driving. Our study provided a comprehensive overview of the continuous integration and development process associated with this architecture. We analyzed multiple metrics for a real-world ROS 2 autonomous driving application based on Autoware and deployed on increasingly isolated container environments. To determine the impact of containerization on communication and on simple ROS 2 examples, the analysis was complemented with dedicated benchmarks for DDS and ROS 2. Our findings indicate that the effect of containerization on runtime varies depending on the complexity of the scenario. In simpler scenarios, the impact of containerization was relatively minor, but in more complex scenarios, such as that of Autoware, the influence, especially on end-to-end latency, was significant. Moreover, both CPU and memory usage were reduced, leading to improved software stability. These effects were observed and validated on two distinct systems: x86 and aarch64 compute platforms. This cross-system analysis enhances the generalizability of our results.
While our study shows the positive impact of containerization, the complex interactions between containers, the Linux CFS, cgroups, and the Autoware framework require more detailed investigation to determine the exact contribution of each of the mechanisms involved. However, to our knowledge, this work is the first to provide such in-depth insights into complex real-world autonomous driving setups, and it highlights the need for more detailed studies in the future.

Looking ahead, there are several directions for further research. One is to explore strategies to optimize node assignment to containers, as well as the impact of statically allocating containers to CPUs and of setting bounds on CPU shares. Another interesting topic is hierarchical scheduling, which should be explored in depth to improve the performance of containerized ROS 2 applications. Furthermore, it is worth considering the generalizability of these results beyond ROS 2 applications.

ACKNOWLEDGEMENTS

T. Betz, as the first author, was the initiator of the research idea and is responsible for the presented concept and implementation. L. Wen, F. Pan, and A. Knoll contributed to the implementation and design of the benchmarks. G. Kaljavesi contributed to the implementation of the microservice architecture. A. Zuepke, A. Bastoni, and M. Caccamo contributed to the evaluation of the performance impacts and the design of experiments. J. Betz contributed to the conception of the research project and revised the paper critically for important intellectual content. He gave final approval of the version to be published and agrees to all aspects of the work. As a guarantor, he accepts responsibility for the overall integrity of the paper. M. Caccamo was supported by an Alexander von Humboldt Professorship endowed by the German Federal Ministry of Education and Research.

REFERENCES

[1] SOAFEE, “SOAFEE: Scalable open architecture for embedded edge,” 2022. [Online]. Available: https://www.soafee.io
[2] M.
Spencer, “How the SOAFEE architecture brings a cloud-native approach to mixed critical automotive systems,” white paper, Sept. 2021.
[3] S.-C. Lin, Y. Zhang, C.-H. Hsu, M. Skach, M. E. Haque, L. Tang, and J. Mars, “The architectural implications of autonomous driving,” in Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2018, pp. 751–766.
[4] T. Betz, P. Karle, F. Werner, and J. Betz, “An analysis of software latency for a high-speed autonomous race car—a case study in the indy autonomous challenge,” SAE Int. Journal of Connected and Automated Vehicles, vol. 6, no. 12-06-03-0018, 2023.
[5] The Autoware Foundation, “Autoware - the world’s leading open-source software project for autonomous driving.” [Online]. Available: https://github.com/autowarefoundation/autoware
[6] S. Macenski, T. Foote, B. Gerkey, C. Lalancette, and W. Woodall, “Robot operating system 2: Design, architecture, and uses in the wild,” Science Robotics, vol. 7, no. 66, p. eabm6074, 2022.
[7] P. Karle, T. Betz, M. Bosk, F. Fent, N. Gehrke, M. Geisslinger, L. Gressenbuch, P. Hafemann, S. Huber, M. Hübner et al., “Edgar: An autonomous driving research platform–from feature development to real-world application,” arXiv preprint arXiv:2309.15492, 2023.
[8] J. Arundel and J. Domingus, Cloud Native DevOps with Kubernetes: building, deploying, and scaling modern applications in the Cloud. O’Reilly Media, 2019.
[9] Rancher Labs, “K3s - lightweight kubernetes.” [Online]. Available: https://github.com/k3s-io/k3s/
[10] D. Merkel, “Docker: lightweight linux containers for consistent development and deployment,” Linux journal, vol. 2014, no. 239, p. 2, 2014.
[11] E. Sax, R. Reussner, H. Guissouma, and H. Klare, A survey on the state and future of automotive software release and configuration management. KIT Amsterdam, The Netherlands, 2017.
[12] W. Haas and P. Langjahr, “Cross-domain vehicle control units in modern e/e architectures,” in Int. Stuttgarter Symposium: Automobil- und Motorentechnik, 2016, pp. 1619–1627.
[13] S. Kugele, D. Hettler, and S. M. Shafaei, “Elastic service provision for intelligent vehicle functions,” Int. Conf. on Intelligent Transportation Systems (ITSC), pp. 3183–3190, 2018.
[14] J. Lotz, A. Vogelsang, O. Benderius, and C. Berger, “Microservice architectures for advanced driver assistance systems: A case-study,” in IEEE Int. Conf. on Softw. Archit. Companion (ICSA-C), 2019, pp. 45–52.
[15] G. T. B. Tamanaka, R. V. Aroca, and G. A. de Paula Caurin, “Fault-tolerant architecture and implementation of a distributed control system using containers,” in LARS/SBR/WRE, 2022, pp. 1–6.
[16] A. Brogi, D. Neri, J. Soldani, and O. Zimmermann, “Design principles, architectural smells and refactorings for microservices: a multivocal review,” SICS Softw.-Intensive Cyber-Physical Systems, pp. 3–15, 2019.
[17] N. Kukulicic, D. Samardzic, A. Bucaioni, and S. Mubeen, “Automotive service-oriented architectures: a systematic mapping study,” in Euromicro Conf. on Softw. Engin. and Adv. Applications (SEAA), 2022, pp. 459–466.
[18] V. Velepucha and P. Flores, “Monoliths to microservices - Migration Problems and Challenges: A SMS,” in Int. Conf. on Information Systems and Softw. Technologies (ICI2ST), 2021, pp. 135–142.
[19] S. Giallorenzo, J. Mauro, M. G. Poulsen, and F. Siroky, “Virtualization costs: benchmarking containers and virtual machines against bare-metal,” SN Computer Science, vol. 2, no. 5, p. 404, 2021.
[20] R. Morabito, “Virtualization on internet of things edge devices with container technologies: A performance evaluation,” IEEE Access, vol. 5, pp. 8835–8850, 2017.
[21] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and linux containers,” in IEEE Int. Symp. on Performance Analysis of Systems and Software (ISPASS), 2015, pp. 171–172.
[22] M. G. Xavier, M. V. Neves, F. D. Rossi, T. C. Ferreto, T. Lange, and C. A. De Rose, “Performance evaluation of container-based virtualization for high performance computing environments,” in Euromicro Int. Conf. on Parallel, Distributed, and Network-Based Processing (PDP), 2013, pp. 233–240.
[23] A. K. S. Rajan, A. Feucht, L. Gamer, I. Smaili et al., “Hypervisor for consolidating real-time automotive control units: Its procedure, implications and hidden pitfalls,” J. Syst. Archit., vol. 82, pp. 37–48, 2018.
[24] L. Wen, M. Rickert, F. Pan, J. Lin, and A. Knoll, “Bare-metal vs. hypervisors and containers: Performance evaluation of virtualization technologies for software-defined vehicles,” in IEEE Intelligent Vehicles Symp. (IEEE IV), Jun 2023.
[25] M. Reke, D. Peter, J. Schulte-Tigges, S. Schiffer, A. Ferrein, T. Walter, and D. Matheis, “A self-driving car architecture in ros2,” in Int. SAUPEC/RobMech/PRASA Conf., 2020, pp. 1–6.
[26] J. Betz, T. Betz, F. Fent, M. Geisslinger, A. Heilmeier, L. Hermansdorfer, T. Herrmann, S. Huch, P. Karle, M. Lienkamp et al., “TUM autonomous motorsport: An autonomous racing software for the indy autonomous challenge,” Journal of Field Robotics, vol. 40, no. 4, pp. 783–809, 2023.
[27] Z. Li, A. Hasegawa, and T. Azumi, “Autoware Perf: A tracing and performance analysis framework for ROS 2 applications,” J. Syst. Archit., vol. 123, p. 102341, 2022.
[28] T. Kuboichi, A. Hasegawa, B. Peng, K. Miura, K. Funaoka, S. Kato, and T. Azumi, “CARET: Chain-Aware ROS 2 Evaluation Tool,” in IEEE Int. Conf. on Embedded and Ubiquitous Computing (EUC), 2022.
[29] T. Betz, M. Schmeller, A. Korb, and J. Betz, “Latency measurement for autonomous driving software using data flow extraction,” in IEEE Intelligent Vehicles Symp. (IEEE IV), 2023.
[30] T. Blaß, A. Hamann, R. Lange, D. Ziegenbein, and B. B. Brandenburg, “Automatic Latency Management for ROS 2: Benefits, Challenges, and Open Problems,” in IEEE Real-Time and Embedded Technology and Applications Symp. (RTAS), 2021, pp. 264–277.
[31] C. Bédard, I. Lütkebohle, and M. Dagenais, “ros2_tracing: Multipurpose Low-Overhead Framework for Real-Time Tracing of ROS 2,” IEEE Robot. Autom. Lett., vol. 7, no. 3, pp. 6511–6518, 2022.
[32] Apex.AI, “performance_test.” [Online]. Available: https://gitlab.com/ApexAI/performance_test
[33] “iRobot: ROS2 performance,” 2021. [Online]. Available: https://github.com/irobot-ros/ros2-performance
[34] “Nvidia-isaac-ros ros2_benchmark,” 2023. [Online]. Available: https://github.com/NVIDIA-ISAAC-ROS/ros2_benchmark
[35] H. Teper, M. Günzel, N. Ueter, G. von der Brüggen, and J.-J. Chen, “End-to-end timing analysis in ros2,” in IEEE Real-Time Systems Symp. (RTSS), 2022, pp. 53–65.
[36] S. Kato, S. Tokunaga, Y. Maruyama, S. Maeda, M. Hirabayashi, Y. Kitsukawa, A. Monrroy, T. Ando, Y. Fujii, and T. Azumi, “Autoware on board: Enabling autonomous vehicles with embedded systems,” in ACM/IEEE Int. Conf. on Cyber-Physical Systems (ICCPS), 2018.
[37] T. Betz, M. Schmeller, H. Teper, and J. Betz, “How Fast is My Software? Latency Evaluation for a ROS 2 Autonomous Driving Software,” in IEEE Intelligent Vehicles Symp. (IEEE IV), 2023.
[38] T. Kronauer, J. Pohlmann, M. Matthé, T. Smejkal, and G. Fettweis, “Latency analysis of ros2 multi-node systems,” in IEEE Int. Conf. on Multisensor Fusion and Integration for Intell. Syst. (MFI), 2021, pp. 1–7.
[39] T. Wu, B. Wu, S. Wang, L. Liu, S. Liu, Y. Bao, and W. Shi, “Oops! It’s Too Late. Your Autonomous Driving System Needs a Faster Middleware,” IEEE Robot. Autom. Lett., vol. 6, no. 4, pp. 7301–7308, 2021.
[40] ADLINK, “SOAFEE for software defined vehicles,” 2022. [Online]. Available: https://www.adlinktech.com/en/soafee
[41] Eclipse Foundation, “Eclipse Cyclone DDS,” 2022. [Online]. Available: https://cyclonedds.io
[42] Indy Autonomous Challenge, 2021. [Online]. Available: https://www.indyautonomouschallenge.com/
[43] J. Wang and E. Olson, “AprilTag 2: Efficient and robust fiducial detection,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2016, pp. 4193–4198.
[44] M. Günzel, H. Teper, K.-H. Chen, G. von der Brüggen, and J.-J. Chen, “On the equivalence of maximum reaction time and maximum data age for cause-effect chains,” in Euromicro Conf. on Real-Time Systems (ECRTS), 2023.

This paper is available on arXiv under the CC BY 4.0 Deed (Attribution 4.0 International) license.