Adaptive Action Masking: Accelerating Decision-Making in Database Tuning

Written by instancing | Published 2025/12/24
Tech Story Tags: deep-reinforcement-learning | td3-td-swar-model | adaptive-action-masking | index-selection-problem | action-space-pruning | database-optimization-strategy | continuous-control-drl | workload-execution-efficiency

TL;DR: The TD3-TD-SWAR model advances database optimization by framing index selection as a DRL problem with adaptive action masking for faster training.

Abstract and 1. Introduction

  2. Related Works

    2.1 Traditional Index Selection Approaches

    2.2 RL-based Index Selection Approaches

  3. Index Selection Problem

  4. Methodology

    4.1 Formulation of the DRL Problem

    4.2 Instance-Aware Deep Reinforcement Learning for Efficient Index Selection

  5. System Framework of IA2

    5.1 Preprocessing Phase

    5.2 RL Training and Application Phase

  6. Experiments

    6.1 Experimental Setting

    6.2 Experimental Results

    6.3 End-to-End Performance Comparison

    6.4 Key Insights

  7. Conclusion and Future Work, and References

4 Methodology

In this section, we outline our novel approach to the Index Selection Problem (ISP), significantly advancing the application of Deep Reinforcement Learning (DRL). Our methodology not only frames the ISP within a DRL context but also introduces a groundbreaking RL model, the Twin Delayed Deep Deterministic Policy Gradients-Temporal Difference-State-Wise Action Refinery (TD3-TD-SWAR). Drawing on the innovative work of [12] and [15], TD3-TD-SWAR is specifically designed to efficiently manage the ISP’s complex solution space. A key novelty of our approach is the adaptive action masking mechanism, which accelerates training and sharpens decision-making by filtering out less beneficial actions based on current conditions, achieving optimal index selection with minimal decision steps.

4.1 Formulation of the DRL Problem

Deep Reinforcement Learning (DRL) offers a method for sequential decision-making to optimize database index configurations, addressing the Index Selection Problem (ISP) [7, 13]. The goal in ISP is for an agent to find an optimal index set 𝐼∗ from candidates to reduce workload execution costs. Given budget constraints, the agent strategically sequences new index additions, navigating a dynamic environment to find configurations that improve database performance over time. This framing casts the ISP as a fundamental DRL challenge of strategic decision-making under constraints.
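To make the formulation concrete, the sketch below models the ISP as a minimal episodic environment: the state is a binary vector of built candidate indexes, an action builds one candidate, and the reward is the resulting cost saving under a storage budget. The per-index `savings` values and the budget accounting are illustrative assumptions for this sketch, not the paper's cost model; a real system would query the optimizer's what-if estimates instead.

```python
import numpy as np

class IndexSelectionEnv:
    """Minimal MDP sketch of the Index Selection Problem.

    State:  binary vector marking which candidate indexes are built.
    Action: integer picking one candidate index to build.
    Reward: simulated cost saving for the chosen index (hypothetical
            values; a real agent would use optimizer what-if costs).
    """

    def __init__(self, n_candidates=8, budget=3, seed=0):
        rng = np.random.default_rng(seed)
        self.n = n_candidates
        self.budget = budget          # max number of indexes to build
        self.savings = rng.uniform(0.0, 10.0, size=self.n)
        self.reset()

    def reset(self):
        self.built = np.zeros(self.n, dtype=bool)
        return self.built.astype(np.float32)

    def step(self, action):
        assert 0 <= action < self.n
        reward = 0.0
        if not self.built[action]:
            # Building a new index yields its (simulated) cost saving.
            self.built[action] = True
            reward = float(self.savings[action])
        # Episode ends once the storage budget is exhausted.
        done = bool(self.built.sum() >= self.budget)
        return self.built.astype(np.float32), reward, done
```

An agent interacts with this environment in the usual loop: observe the state, pick an index, receive the cost saving as reward, and stop once the budget is spent.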

While basic Deep Reinforcement Learning (DRL) techniques are applicable, the challenge of a large action space, i.e., too many index candidates, necessitates an advanced solution. Our Twin Delayed Deep Deterministic Policy Gradients-Temporal Difference-State-Wise Action Refinery (TD3-TD-SWAR) model addresses this by refining action space pruning, ensuring efficiency and precision in tackling the complexities of index selection under storage and training limitations.
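The core pruning idea can be sketched as state-dependent action masking: before the agent commits to an action, candidates that are infeasible in the current state (already built, or too large for the remaining storage budget) are invalidated, and the best surviving action is chosen. The masking criteria below are assumed for illustration and are simpler than the paper's TD-SWAR module.

```python
import numpy as np

def masked_action(logits, built, remaining_budget, index_sizes):
    """Pick the best action after adaptive masking (illustrative).

    logits:           per-candidate scores from the policy/critic.
    built:            boolean vector of already-built indexes.
    remaining_budget: storage budget still available.
    index_sizes:      estimated storage size of each candidate.
    """
    # Feasible = not yet built AND fits in the remaining budget.
    mask = (~built) & (index_sizes <= remaining_budget)
    if not mask.any():
        return None  # no feasible index left; stop selecting
    # Infeasible actions are driven to -inf so argmax ignores them.
    masked = np.where(mask, logits, -np.inf)
    return int(np.argmax(masked))
```

Because the mask depends on the current state (built set and remaining budget), it adapts at every decision step, shrinking the effective action space the agent must explore.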

Authors:

(1) Taiyi Wang, University of Cambridge, Cambridge, United Kingdom ([email protected]);

(2) Eiko Yoneki, University of Cambridge, Cambridge, United Kingdom ([email protected]).


This paper is available on arXiv under the CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International) license.

