Authors:
(1) Limeng Zhang, Centre for Research on Engineering Software Technologies (CREST), The University of Adelaide, Australia;
(2) M. Ali Babar, Centre for Research on Engineering Software Technologies (CREST), The University of Adelaide, Australia.
Table of Links:
1.1 Configuration Parameter Tuning Challenges and 1.2 Contributions
3 Overview of Tuning Framework
4 Workload Characterization and 4.1 Query-level Characterization
4.2 Runtime-based Characterization
5 Feature Pruning and 5.1 Workload-level Pruning
5.2 Configuration-level Pruning
7 Configuration Recommendation and 7.1 Bayesian Optimization
10 Discussion and Conclusion, and References
This paper presents a comprehensive overview of the predominant methodologies used for automatic parameter tuning in database management systems. The study surveys a diverse array of configuration tuning techniques, including Bayesian optimization, neural-network-based approaches, reinforcement learning methods, and search-based strategies. By systematically dissecting the tuning process into discrete components (tuning objectives, workload characterization, feature pruning, knowledge from experience, configuration recommendation, and experimental settings), this research provides nuanced insights into the design choices made in each phase.
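To make this decomposition concrete, the following is a minimal, self-contained Python sketch of how those phases could fit together in a single tuning loop. Every class, method, knob, and feature name below is an illustrative placeholder, not the API of any system reviewed in this survey.

```python
# A minimal sketch of the tuning pipeline dissected in this survey:
# workload characterization, feature pruning, knowledge from experience,
# and configuration recommendation, driven by a tuning objective.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]

@dataclass
class TuningSession:
    objective: Callable[[Config], float]          # tuning objective, e.g. measured throughput
    knob_ranges: Dict[str, Tuple[float, float]]   # full configuration space
    history: List[Tuple[Config, float]] = field(default_factory=list)  # knowledge from experience

    def characterize_workload(self) -> Dict[str, float]:
        # Placeholder workload features; real systems derive these from
        # query text or runtime metrics (Section 4).
        return {"read_ratio": 0.8, "scan_heavy": 0.2}

    def prune_knobs(self, top_k: int = 2) -> Dict[str, Tuple[float, float]]:
        # Placeholder pruning: keep the first top_k knobs; real systems rank
        # knob importance, e.g. with Lasso-style regression (Section 5).
        return dict(list(self.knob_ranges.items())[:top_k])

    def recommend(self, knobs: Dict[str, Tuple[float, float]]) -> Config:
        # Placeholder recommender: midpoint of each pruned range; real systems
        # use Bayesian optimization, RL, or search strategies (Section 7).
        return {name: (lo + hi) / 2 for name, (lo, hi) in knobs.items()}

    def step(self) -> Config:
        self.characterize_workload()
        pruned = self.prune_knobs()
        config = self.recommend(pruned)
        self.history.append((config, self.objective(config)))
        return config
```

For instance, constructing `TuningSession(objective=measure_throughput, knob_ranges={"buffer_pool_mb": (64, 4096), "worker_threads": (1, 64)})` with a hypothetical `measure_throughput` callback and calling `step()` repeatedly would iterate the characterize-prune-recommend cycle while accumulating tuning history.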
Existing tuning methodologies have extensively investigated parameter optimization for DBMS performance while accounting for overhead, adaptivity, and safety concerns. One essential aspect of this tuning task is workload characterization. The dynamic nature of on-demand cloud applications often imposes more intricate and varied requirements on the cloud database; these requirements can enrich the application profiling process within the tuning framework and facilitate the optimization of DBMS parameters. Another essential aspect is data collection and search space reduction. Currently, ML-based solutions, especially BO- and NN-based ones, typically require a sufficient number of samples to bootstrap the tuning framework, which can be quite time-intensive. Regarding search space reduction, automatic DBMS tuning can benefit from recent research on the hyperparameter optimization problem, such as handling distributional variance between source and target datasets [86], as well as search space reduction techniques [87], [88]. Finally, other DBMS characteristics can also be considered in the tuning framework, such as database scalability, which describes performance fluctuations in response to changes in resource capacity, and database elasticity, which denotes the speed and precision with which a system adapts its allocated resources to varying load intensities; both have emerged as critical considerations in contemporary cloud computing environments [89]–[93].
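As a concrete illustration of the bootstrapping cost and the value of a pruned search space discussed above, the following is a minimal Bayesian-optimization sketch in Python, assuming NumPy, SciPy, and scikit-learn are available. The knob names, value ranges, and the benchmark callback are hypothetical placeholders, and the sketch is not the recommendation procedure of any specific system surveyed here; each call to benchmark stands for a full workload replay against the DBMS, which is exactly where the sample cost arises.

```python
# A minimal sketch of Bayesian optimization over a pruned knob subspace.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Pruned search space: only knobs ranked as important survive feature pruning.
knob_bounds = np.array([[64.0, 4096.0],   # e.g. buffer pool size (MB), hypothetical
                        [1.0, 64.0],      # e.g. worker threads, hypothetical
                        [0.0, 1.0]])      # e.g. a normalized flush policy, hypothetical

def expected_improvement(X, gp, y_best, xi=0.01):
    # Expected improvement acquisition for a maximization objective.
    mu, sigma = gp.predict(X, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_tune(benchmark, n_init=10, n_iter=30, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(knob_bounds)
    # Bootstrapping: each initial random sample is a full benchmark run,
    # which is the time-intensive part highlighted above.
    X = rng.uniform(knob_bounds[:, 0], knob_bounds[:, 1], size=(n_init, dim))
    y = np.array([benchmark(x) for x in X])       # higher is better, e.g. throughput
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        # Pick the next configuration by maximizing EI over random candidates.
        cand = rng.uniform(knob_bounds[:, 0], knob_bounds[:, 1], size=(2000, dim))
        x_next = cand[np.argmax(expected_improvement(cand, gp, y.max()))]
        y_next = benchmark(x_next)                # apply config, replay workload, measure
        X = np.vstack([X, x_next])
        y = np.append(y, y_next)
    return X[np.argmax(y)], y.max()
```

Shrinking knob_bounds to a handful of important knobs, or warm-starting the initial samples from past tuning experience, directly reduces the number of benchmark calls required before the surrogate model becomes useful, which is the motivation behind the search space reduction and transfer learning techniques cited above.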
[1] Y.-L. Choi, W.-S. Jeon, and S.-H. Yoon, “Improving database system performance by applying nosql,” Journal Of Information Processing Systems, vol. 10, no. 3, pp. 355–364, 2014.
[2] K. Sahatqija, J. Ajdari, X. Zenuni, B. Raufi, and F. Ismaili, “Comparison between relational and nosql databases,” in 2018 41st international convention on information and communication technology, electronics and microelectronics (MIPRO). IEEE, 2018, pp. 0216–0221.
[3] X. Zhang, H. Wu, Y. Li, J. Tan, F. Li, and B. Cui, “Towards dynamic and safe configuration tuning for cloud databases,” in Proceedings of the 2022 International Conference on Management of Data, 2022, pp. 631–645.
[4] Z. Yan, J. Lu, N. Chainani, and C. Lin, “Workload-aware performance tuning for autonomous dbmss,” in 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 2021, pp. 2365–2368.
[5] S. Duan, V. Thummala, and S. Babu, “Tuning database configuration parameters with ituned,” Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 1246–1257, 2009.
[6] X. Zhang, H. Wu, Z. Chang, S. Jin, J. Tan, F. Li, T. Zhang, and B. Cui, “Restune: Resource oriented tuning boosted by meta-learning for cloud databases,” in Proceedings of the 2021 international conference on management of data, 2021, pp. 2102–2114.
[7] J. Xin, K. Hwang, and Z. Yu, “Locat: Low-overhead online configuration auto-tuning of spark sql applications,” in Proceedings of the 2022 International Conference on Management of Data, 2022, pp. 674–684.
[8] D. Van Aken, A. Pavlo, G. J. Gordon, and B. Zhang, “Automatic database management system tuning through large-scale machine learning,” in Proceedings of the 2017 ACM international conference on management of data, 2017, pp. 1009–1024.
[9] G. Li, X. Zhou, S. Li, and B. Gao, “Qtune: A query-aware database tuning system with deep reinforcement learning,” Proceedings of the VLDB Endowment, vol. 12, no. 12, pp. 2118–2130, 2019.
[10] Y. Zhu, J. Liu, M. Guo, Y. Bao, W. Ma, Z. Liu, K. Song, and Y. Yang, “Bestconfig: tapping the performance potential of systems via automatic configuration tuning,” in Proceedings of the 2017 Symposium on Cloud Computing, Santa Clara, CA, USA, 2017, pp. 338–350.
[11] L. Bao, X. Liu, and W. Chen, “Learning-based automatic parameter tuning for big data analytics frameworks,” in 2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018, pp. 181–190.
[12] J. Zhang, Y. Liu, K. Zhou, G. Li, Z. Xiao, B. Cheng, J. Xing, Y. Wang, T. Cheng, L. Liu et al., “An end-to-end automatic cloud database tuning system using deep reinforcement learning,” in Proceedings of the 2019 International Conference on Management of Data, 2019, pp. 415–432.
[13] B. Cai, Y. Liu, C. Zhang, G. Zhang, K. Zhou, L. Liu, C. Li, B. Cheng, J. Yang, and J. Xing, “Hunter: an online cloud database hybrid tuning system for personalized requirements,” in Proceedings of the 2022 International Conference on Management of Data, 2022, pp. 646–659.
[14] D. Van Aken, D. Yang, S. Brillard, A. Fiorino, B. Zhang, C. Bilien, and A. Pavlo, “An inquiry into machine learning-based automatic configuration tuning services on real-world database management systems,” Proceedings of the VLDB Endowment, vol. 14, no. 7, pp. 1241–1253, 2021.
[15] J. Tan, T. Zhang, F. Li, J. Chen, Q. Zheng, P. Zhang, H. Qiao, Y. Shi, W. Cao, and R. Zhang, “ibtune: Individualized buffer tuning for large-scale cloud databases,” Proceedings of the VLDB Endowment, vol. 12, no. 10, pp. 1221–1234, 2019.
[16] K. Kanellis, C. Ding, B. Kroth, A. Müller, C. Curino, and S. Venkataraman, “Llamatune: sample-efficient dbms configuration tuning,” arXiv preprint arXiv:2203.05128, 2022.
[17] S. Cereda, S. Valladares, P. Cremonesi, and S. Doni, “Cgptuner: a contextual gaussian process bandit approach for the automatic tuning of it configurations under varying workload conditions,” Proceedings of the VLDB Endowment, vol. 14, no. 8, pp. 1401–1413, 2021.
[18] I. Trummer, “Db-bert: a database tuning tool that ‘reads the manual’,” in Proceedings of the 2022 international conference on management of data, 2022, pp. 190–203.
[19] M. Kunjir and S. Babu, “Black or white? how to develop an autotuner for memory-based analytics,” in Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2020, pp. 1667–1683.
[20] F. Song, K. Zaouk, C. Lyu, A. Sinha, Q. Fan, Y. Diao, and P. Shenoy, “Spark-based cloud data analytics using multi-objective optimization,” in 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 2021, pp. 396–407.
[21] C. Lin, J. Zhuang, J. Feng, H. Li, X. Zhou, and G. Li, “Adaptive code learning for spark configuration tuning,” in 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2022, pp. 1995–2007.
[22] Y. Gur, D. Yang, F. Stalschus, and B. Reinwald, “Adaptive multimodel reinforcement learning for online database tuning.” in EDBT, 2021, pp. 439–444.
[23] J.-K. Ge, Y.-F. Chai, and Y.-P. Chai, “Watuning: a workload-aware tuning system with attention-based deep reinforcement learning,” Journal of Computer Science and Technology, vol. 36, no. 4, pp. 741–761, 2021.
[24] MySQL 8.0 Reference Manual, “Innodb information schema metrics table,” https://dev.mysql.com/doc/refman/8.0/en/innodb-information-schema-metrics-table.html, 2023.
[25] T. Hastie, R. Tibshirani, and J. H. Friedman, The elements of statistical learning: data mining, inference, and prediction. Springer, 2009, vol. 2.
[26] J. H. Zar, “Spearman rank correlation,” Encyclopedia of biostatistics, vol. 7, 2005.
[27] K. Kanellis, R. Alagappan, and S. Venkataraman, “Too many knobs to tune? towards faster database tuning by pre-selecting important knobs,” in 12th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 20), 2020.
[28] A. Nayebi, A. Munteanu, and M. Poloczek, “A framework for bayesian optimization in embedded subspaces,” in International Conference on Machine Learning. PMLR, 2019, pp. 4752–4761.
[29] Z. Cao, G. Kuenning, and E. Zadok, “Carver: Finding important parameters for storage system tuning,” in 18th USENIX Conference on File and Storage Technologies (FAST 20), 2020, pp. 43–57.
[30] X. Zhang, H. Wu, Y. Li, Z. Tang, J. Tan, F. Li, and B. Cui, “An efficient transfer learning based configuration adviser for database tuning,” Proceedings of the VLDB Endowment, vol. 17, no. 3, pp. 539–552, 2023.
[31] C. K. J. Hou and K. Behdinan, “Dimensionality reduction in surrogate modeling: A review of combined methods,” Data Science and Engineering, vol. 7, no. 4, pp. 402–427, 2022.
[32] S. Yang, J. Wen, X. Zhan, and D. Kifer, “Et-lasso: a new efficient tuning of lasso-type regularization for high-dimensional data,” in Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, 2019, pp. 607–616.
[33] T. Bai, Y. Li, Y. Shen, X. Zhang, W. Zhang, and B. Cui, “Transfer learning for bayesian optimization: A survey,” arXiv preprint arXiv:2302.05927, 2023.
[34] F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Sequential model-based optimization for general algorithm configuration,” in Learning and Intelligent Optimization: 5th International Conference, LION 5, Rome, Italy, January 17-21, 2011. Selected Papers 5. Springer, 2011, pp. 507–523.
[35] M. Seeger, “Gaussian processes for machine learning,” International journal of neural systems, vol. 14, no. 02, pp. 69–106, 2004.
[36] J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish, N. Sundaram, M. Patwary, M. Prabhat, and R. Adams, “Scalable bayesian optimization using deep neural networks,” in International conference on machine learning. PMLR, 2015, pp. 2171–2180.
[37] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for hyper-parameter optimization,” Advances in neural information processing systems, vol. 24, 2011.
[38] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive black-box functions,” Journal of Global optimization, vol. 13, pp. 455–492, 1998.
[39] M. Hoffman, E. Brochu, N. De Freitas et al., “Portfolio allocation for bayesian optimization.” in UAI, 2011, pp. 327–336.
[40] C. E. Rasmussen, “Gaussian processes in machine learning,” in Summer school on machine learning. Springer, 2003, pp. 63–71.
[41] P. Hennig and C. J. Schuler, “Entropy search for information-efficient global optimization,” Journal of Machine Learning Research, vol. 13, no. 6, 2012.
[42] J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimization of machine learning algorithms,” Advances in neural information processing systems, vol. 25, 2012.
[43] J. Mockus, Bayesian Approach to Global Optimization: Theory and Applications. Kluwer Academic Publishers, 1989.
[44] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, “Gaussian process optimization in the bandit setting: No regret and experimental design,” arXiv preprint arXiv:0912.3995, 2009.
[45] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602, 2013.
[46] M. Ester, H.-P. Kriegel, J. Sander, X. Xu et al., “A density-based algorithm for discovering clusters in large spatial databases with noise,” in KDD, vol. 96, no. 34, 1996, pp. 226–231.
[47] M. Lindauer, K. Eggensperger, M. Feurer, A. Biedenkapp, D. Deng, C. Benjamins, T. Ruhkopf, R. Sass, and F. Hutter, “Smac3: A versatile bayesian optimization package for hyperparameter optimization,” The Journal of Machine Learning Research, vol. 23, no. 1, pp. 2475–2483, 2022.
[48] Y. Li, Y. Shen, W. Zhang, Y. Chen, H. Jiang, M. Liu, J. Jiang, J. Gao, W. Wu, Z. Yang et al., “Openbox: A generalized black-box optimization service,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 3209–3219.
[49] A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. C. Mowry, M. Perron, I. Quah et al., “Self-driving database management systems.” in CIDR, vol. 4, 2017, p. 1.
[50] S. Cereda, G. Palermo, P. Cremonesi, and S. Doni, “A collaborative filtering approach for the automatic tuning of compiler optimisations,” in The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, 2020, pp. 15–25.
[51] N. Schilling, M. Wistuba, and L. Schmidt-Thieme, “Scalable hyperparameter optimization with products of gaussian process experts,” in Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I 16. Springer, 2016, pp. 33–48.
[52] M. Wistuba, N. Schilling, and L. Schmidt-Thieme, “Two-stage transfer surrogate model for automatic hyperparameter optimization,” in Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I 16. Springer, 2016, pp. 199–214.
[53] M. Wistuba, N. Schilling, and L. Schmidt Thieme, “Scalable gaussian process-based transfer surrogates for hyperparameter optimization,” Machine Learning, vol. 107, no. 1, pp. 43–78, 2018.
[54] M. Feurer, B. Letham, and E. Bakshy, “Scalable meta-learning for bayesian optimization using ranking-weighted gaussian process ensembles,” in AutoML Workshop at ICML, vol. 7, 2018, p. 5.
[55] Y. Li, Y. Shen, H. Jiang, W. Zhang, Z. Yang, C. Zhang, and B. Cui, “Transbo: Hyperparameter optimization via two-phase transfer learning,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 956–966.
[56] D. Golovin, B. Solnik, S. Moitra, G. Kochanski, J. Karro, and D. Sculley, “Google vizier: A service for black-box optimization,” in Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, 2017, pp. 1487–1495.
[57] S. Gelly and D. Silver, “Combining online and offline knowledge in uct,” in Proceedings of the 24th international conference on Machine learning, 2007, pp. 273–280.
[58] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” nature, vol. 518, no. 7540, pp. 529–533, 2015.
[59] M. P. Deisenroth, C. E. Rasmussen, and D. Fox, “Learning to control a low-cost manipulator using data-efficient reinforcement learning,” Robotics: Science and Systems VII, vol. 7, pp. 57–64, 2011.
[60] J. Peters and S. Schaal, “Learning to control in operational space,” The International Journal of Robotics Research, vol. 27, no. 2, pp. 197–212, 2008.
[61] X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, “End-to-end task-completion neural dialogue systems,” arXiv preprint arXiv:1703.01008, 2017.
[62] M. Johnson, M. Schuster, Q. V. Le, M. Krikun, Y. Wu, Z. Chen, N. Thorat, F. Viégas, M. Wattenberg, G. Corrado et al., “Google’s multilingual neural machine translation system: Enabling zero-shot translation,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 339–351, 2017.
[63] D. Bahdanau, P. Brakel, K. Xu, A. Goyal, R. Lowe, J. Pineau, A. Courville, and Y. Bengio, “An actor-critic algorithm for sequence prediction,” arXiv preprint arXiv:1607.07086, 2016.
[64] Y. Ling, S. A. Hasan, V. Datla, A. Qadir, K. Lee, J. Liu, and O. Farri, “Diagnostic inferencing via improving clinical concept extraction with deep reinforcement learning: A preliminary study,” in Machine Learning for Healthcare Conference. PMLR, 2017, pp. 271–285.
[65] C. Yu, J. Liu, S. Nemati, and G. Yin, “Reinforcement learning in healthcare: A survey,” ACM Computing Surveys (CSUR), vol. 55, no. 1, pp. 1–36, 2021.
[66] M. W. Brandt, A. Goyal, P. Santa-Clara, and J. R. Stroud, “A simulation approach to dynamic portfolio choice with an application to learning about return predictability,” The Review of Financial Studies, vol. 18, no. 3, pp. 831–873, 2005.
[67] J. Moody and M. Saffell, “Learning to trade via direct reinforcement,” IEEE Transactions on Neural Networks, vol. 12, no. 4, pp. 875–889, 2001.
[68] G. Theocharous, P. S. Thomas, and M. Ghavamzadeh, “Ad recommendation systems for life-time value optimization,” in Proceedings of the 24th international conference on world wide web, 2015, pp. 1305–1310.
[69] B. Rolf, I. Jackson, M. Müller, S. Lang, T. Reggelin, and D. Ivanov, “A review on reinforcement learning algorithms and applications in supply chain management,” International Journal of Production Research, vol. 61, no. 20, pp. 7151–7179, 2023.
[70] A. Haydari and Y. Yılmaz, “Deep reinforcement learning for intelligent transportation systems: A survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 1, pp. 11–32, 2020.
[71] T. Qian, C. Shao, X. Wang, and M. Shahidehpour, “Deep reinforcement learning for ev charging navigation by coordinating smart grid and intelligent transportation system,” IEEE transactions on smart grid, vol. 11, no. 2, pp. 1714–1723, 2019.
[72] C. J. Watkins and P. Dayan, “Q-learning,” Machine learning, vol. 8, pp. 279–292, 1992.
[73] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26–38, 2017.
[74] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, “Deterministic policy gradient algorithms,” in International conference on machine learning. PMLR, 2014, pp. 387–395.
[75] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” arXiv preprint arXiv:1509.02971, 2015.
[76] M. Stonebraker and A. Pavlo, “The seats airline ticketing systems benchmark,” 2012.
[77] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, “Benchmarking cloud serving systems with ycsb,” in Proceedings of the 1st ACM symposium on Cloud computing, 2010, pp. 143–154.
[78] D. E. Difallah, A. Pavlo, C. Curino, and P. Cudre-Mauroux, “Oltp-bench: An extensible testbed for benchmarking relational databases,” Proceedings of the VLDB Endowment, vol. 7, no. 4, pp. 277–288, 2013.
[79] V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann, “How good are query optimizers, really?” Proceedings of the VLDB Endowment, vol. 9, no. 3, pp. 204–215, 2015.
[80] T. Chiba, T. Yoshimura, M. Horie, and H. Horii, “Towards selecting best combination of sql-on-hadoop systems and jvms,” in 2018 IEEE 11th International Conference on Cloud Computing (CLOUD). IEEE, 2018, pp. 245–252.
[81] T. Ivanov and M.-G. Beer, “Evaluating hive and spark sql with bigbench,” arXiv preprint arXiv:1512.08417, 2015.
[82] Y. Ramdane, O. Boussaid, N. Kabachi, and F. Bentayeb, “Partitioning and bucketing techniques to speed up query processing in spark-sql,” in 2018 IEEE 24th international conference on parallel and distributed systems (ICPADS). IEEE, 2018, pp. 142–151.
[83] S. Huang, J. Huang, J. Dai, T. Xie, and B. Huang, “The hibench benchmark suite: Characterization of the mapreduce-based data analysis,” in 2010 IEEE 26th International conference on data engineering workshops (ICDEW 2010). IEEE, 2010, pp. 41–51.
[84] X. Zhang, Z. Chang, Y. Li, H. Wu, J. Tan, F. Li, and B. Cui, “Facilitating database tuning with hyper-parameter optimization: a comprehensive experimental evaluation,” Proceedings of the VLDB Endowment, vol. 15, no. 9, pp. 1808–1821, 2022.
[85] X. Zhao, X. Zhou, and G. Li, “Automatic database knob tuning: A survey,” IEEE Transactions on Knowledge and Data Engineering, 2023.
[86] M. Nomura and Y. Saito, “Efficient hyperparameter optimization under multi-source covariate shift,” in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 1376–1385.
[87] Y. Liu, X. Wang, X. Xu, J. Yang, and W. Zhu, “Meta hyperparameter optimization with adversarial proxy subsets sampling,” in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 1109–1118.
[88] Y. Li, Y. Shen, H. Jiang, T. Bai, W. Zhang, C. Zhang, and B. Cui, “Transfer learning based search space design for hyperparameter tuning,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 967–977.
[89] D. Agrawal, A. El Abbadi, S. Das, and A. J. Elmore, “Database scalability, elasticity, and autonomy in the cloud,” in International conference on database systems for advanced applications. Springer, 2011, pp. 2–15.
[90] S. Loesing, M. Pilman, T. Etter, and D. Kossmann, “On the design and scalability of distributed shared-data databases,” in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 2015, pp. 663–676.
[91] N. R. Herbst, S. Kounev, and R. Reussner, “Elasticity in cloud computing: What it is, and what it is not,” in 10th international conference on autonomic computing (ICAC 13), 2013, pp. 23–27.
[92] A. Papaioannou and K. Magoutis, “Incremental elasticity for nosql data stores,” in 2017 IEEE 36th Symposium on Reliable Distributed Systems (SRDS). IEEE, 2017, pp. 174–183.
[93] D. Seybold, S. Volpert, S. Wesner, A. Bauer, N. Herbst, and J. Domaschka, “Kaa: Evaluating elasticity of cloud-hosted dbms,” in 2019 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 2019, pp. 54–61.
This paper is available on arXiv under a CC BY 4.0 license.