Automatic Configuration Tuning on Cloud Database: A Survey

by Configuring, November 26th, 2024

Too Long; Didn't Read

This survey examines automatic parameter tuning techniques for modern cloud DBMSs, highlighting methods like Bayesian optimization, neural networks, and reinforcement learning. It discusses tuning objectives, workload characterization, and experimental settings, offering insights for researchers and practitioners in the field.

Authors:

(1) Limeng Zhang, Centre for Research on Engineering Software Technologies (CREST), The University of Adelaide, Australia;

(2) M. Ali Babar, Centre for Research on Engineering Software Technologies (CREST), The University of Adelaide, Australia.

Abstract and 1 Introduction

1.1 Configuration Parameter Tuning Challenges and 1.2 Contributions

2 Tuning Objectives

3 Overview of Tuning Framework

4 Workload Characterization and 4.1 Query-level Characterization

4.2 Runtime-based Characterization

5 Feature Pruning and 5.1 Workload-level Pruning

5.2 Configuration-level Pruning

5.3 Summary

6 Knowledge from Experience

7 Configuration Recommendation and 7.1 Bayesian Optimization

7.2 Neural Network

7.3 Reinforcement Learning

7.4 Search-based Solutions

8 Experimental Setting

9 Related Work

10 Discussion and Conclusion, and References


Abstract: Faced with the challenges of big data, modern cloud database management systems (DBMS) are designed to efficiently store, organize, and retrieve data, supporting optimal performance, scalability, and reliability for complex data processing and analysis. However, achieving good performance in modern databases is non-trivial: they are notorious for having dozens of configurable knobs, covering hardware setup, software setup, and database physical and logical design, that control runtime behavior and impact database performance. Extensive research has therefore been conducted on automatic parameter tuning in DBMS, with the goal of finding the configuration that achieves optimal performance. This paper provides a comprehensive survey of the predominant configuration tuning techniques, including Bayesian optimization-based, neural network-based, reinforcement learning-based, and search-based solutions. Moreover, it investigates the fundamental aspects of the parameter tuning pipeline, including tuning objective, workload characterization, feature pruning, knowledge from experience, configuration recommendation, and experimental settings. We compare the techniques used in each component, discuss the corresponding solutions, and introduce the experimental settings used for performance evaluation. Finally, we conclude this paper and present future research opportunities. This paper aims to help future researchers and practitioners better understand automatic parameter tuning in cloud databases by presenting state-of-the-art solutions, research directions, and evaluation benchmarks.

1 INTRODUCTION

In an increasingly digitized age, vast and diverse volumes of data are generated from various sources, including mobile devices, social media platforms, sensors, and more. Faced with this data explosion, cloud database management systems (DBMS) for data storage, coupled with big data analytics frameworks (BDAF), have emerged as powerful solutions for handling and processing massive and intricate data sets in a scalable and flexible manner. This makes them invaluable tools for organizations grappling with the challenges of big data and digital transformation [1], [2].


However, achieving good performance in modern DBMSs is non-trivial. Modern DBMSs have hundreds of configurable knobs, covering hardware setup, software configuration, and database physical and logical design, that affect their performance [3]–[6]. An efficient parameter configuration can strike a balance between resource utilization, query responsiveness, and cost-effectiveness, while an inappropriate configuration can lead to significant performance degradation and inefficient use of system resources [3], [6]–[13].


This paper is available on arXiv under CC BY 4.0 DEED.