Authors:
(1) Limeng Zhang, Centre for Research on Engineering Software Technologies (CREST), The University of Adelaide, Australia;
(2) M. Ali Babar, Centre for Research on Engineering Software Technologies (CREST), The University of Adelaide, Australia.
1.1 Configuration Parameter Tuning Challenges and 1.2 Contributions
3 Overview of Tuning Framework
4 Workload Characterization and 4.1 Query-level Characterization
4.2 Runtime-based Characterization
5 Feature Pruning and 5.1 Workload-level Pruning
5.2 Configuration-level Pruning
7 Configuration Recommendation and 7.1 Bayesian Optimization
10 Discussion and Conclusion, and References
Recently, the authors in [84] provided an experimental evaluation of several BO-based and RL-based solutions and demonstrated how hyper-parameter optimization algorithms can be borrowed to further enhance database configuration tuning. Subsequently, the authors in [85] surveyed state-of-the-art DBMS tuning methods, including heuristic, Bayesian optimization, deep learning, and reinforcement learning methods, and illustrated the automatic tuning pipeline from data preparation to configuration tuning.
In this study, in addition to elucidating the automatic tuning framework for DBMSs, we provide an in-depth analysis of each of its components. Specifically, we outline the primary tuning objectives and summarize three main constraints on automatic DBMS configuration tuning: overhead, adaptivity, and safety. For workload characterization, we present methods for modeling a workload in terms of its queries and of DBMS runtime metrics. In the feature pruning section, we offer insights into pruning strategies at both the workload and configuration levels. For knowledge transfer, we consolidate existing techniques and present two directions for adopting this knowledge. In the subsequent configuration recommendation section, we provide an overview of existing methods and highlight the design considerations available for each category. We also include several relevant automatic tuning methods for big data analytics frameworks, which face similar challenges in identifying the optimal configuration within a complex and interdependent configuration space. Finally, in the experimental settings section, we summarize popular benchmarks for evaluating DBMS performance.
This paper is available on arXiv under CC BY 4.0 DEED.