Problem Description DolphinScheduler version 1.3.9 When querying task instance lists and other interfaces, responses are sometimes slow, taking over 20 seconds to return results. Problem Diagnosis (1) Enable MySQL Slow Query Log Add the following configuration parameters to /etc/mysql.cnf: /etc/mysql.cnf slow_query_log = ON slow_query_log_file = /data/log/mysql/mysql-slow.log long_query_time = 2 slow_query_log = ON slow_query_log_file = /data/log/mysql/mysql-slow.log long_query_time = 2 (2) Slow Query Log Analysis The following types of SQL queries frequently time out in the logs: select * from t_ds_process_instance where 1=1 and state in (0, 1, 2, 4) and process_definition_id = 71 and (schedule_time >= '2022-07-20 00:00:00' and schedule_time <= '2022-07-20 23:59:59.999' or start_time >= '2022-07-20 00:00:00' and start_time <= '2022-07-20 23:59:59.999') order by start_time desc limit 1; select * from t_ds_process_instance where 1=1 and state in (0, 1, 2, 4) and process_definition_id = 71 and (schedule_time >= '2022-07-20 00:00:00' and schedule_time <= '2022-07-20 23:59:59.999' or start_time >= '2022-07-20 00:00:00' and start_time <= '2022-07-20 23:59:59.999') order by start_time desc limit 1; select instance.*, process.name as process_instance_name from t_ds_task_instance instance left join t_ds_process_definition define on instance.process_definition_id = define.id left join t_ds_process_instance process on process.id = instance.process_instance_id where define.project_id = 6 order by instance.start_time desc LIMIT 0,10; select instance.*, process.name as process_instance_name from t_ds_task_instance instance left join t_ds_process_definition define on instance.process_definition_id = define.id left join t_ds_process_instance process on process.id = instance.process_instance_id where define.project_id = 6 order by instance.start_time desc LIMIT 0,10; The main tables involved are t_ds_process_instance and t_ds_task_instance. t_ds_process_instance t_ds_task_instance Using the explain analyze command, we found that the first type of SQL query consumes time due to using the start_time_index index, which has poor selectivity. A more selective index is needed to improve query performance: explain analyze start_time_index -> Limit: 1 row(s) (cost=0.20 rows=0) (actual time=103.391..103.391 rows=1 loops=1) -> Filter: ((t_ds_process_instance.process_definition_id = 18) and (((t_ds_process_instance.schedule_time >= TIMESTAMP'2022-07-18 00:00:00') and (t_ds_process_instance.schedule_time <= TIMESTAMP'2022-07-18 23:59:59.999')) or ((t_ds_process_instance.start_time >= TIMESTAMP'2022-07-18 00:00:00') and (t_ds_process_instance.start_time <= TIMESTAMP'2022-07-18 23:59:59.999')))) (cost=0.20 rows=0) (actual time=103.391..103.391 rows=1 loops=1) -> Index scan on t_ds_process_instance using start_time_index (reverse) (cost=0.20 rows=2) (actual time=0.030..103.165 rows=1239 loops=1) -> Limit: 1 row(s) (cost=0.20 rows=0) (actual time=103.391..103.391 rows=1 loops=1) -> Filter: ((t_ds_process_instance.process_definition_id = 18) and (((t_ds_process_instance.schedule_time >= TIMESTAMP'2022-07-18 00:00:00') and (t_ds_process_instance.schedule_time <= TIMESTAMP'2022-07-18 23:59:59.999')) or ((t_ds_process_instance.start_time >= TIMESTAMP'2022-07-18 00:00:00') and (t_ds_process_instance.start_time <= TIMESTAMP'2022-07-18 23:59:59.999')))) (cost=0.20 rows=0) (actual time=103.391..103.391 rows=1 loops=1) -> Index scan on t_ds_process_instance using start_time_index (reverse) (cost=0.20 rows=2) (actual time=0.030..103.165 rows=1239 loops=1) The second type of SQL query mainly consumes time during the left join phase, which requires updating the indexed fields: -> Limit: 10 row(s) (actual time=3601.141..3601.147 rows=10 loops=1) -> Sort row IDs: `instance`.start_time DESC, limit input to 10 row(s) per chunk (actual time=3601.140..3601.145 rows=10 loops=1) -> Table scan on <temporary> (cost=0.01..3774.21 rows=301738) (actual time=0.002..190.179 rows=722743 loops=1) -> Temporary table (cost=469419.96..473194.16 rows=301738) (actual time=3332.896..3551.716 rows=722743 loops=1) -> Nested loop left join (cost=439246.15 rows=301738) (actual time=0.051..1431.254 rows=722743 loops=1) -> Nested loop inner join (cost=107334.40 rows=301738) (actual time=0.045..1239.699 rows=722743 loops=1) -> Filter: (define.id is not null) (cost=12.40 rows=121) (actual time=0.025..0.162 rows=121 loops=1) -> Index lookup on define using process_definition_index (project_id=6) (cost=12.40 rows=121) (actual time=0.025..0.110 rows=121 loops=1) -> Index lookup on instance using task_instance_index (process_definition_id=define.id) (cost=639.65 rows=2494) (actual time=0.006..9.972 rows=5973 loops=121) -> Single-row index lookup on process using PRIMARY (id = `instance`.process_instance_id) (cost=1.00 rows=1) (actual time=0.000..0.000 rows=1 loops=722743) -> Limit: 10 row(s) (actual time=3601.141..3601.147 rows=10 loops=1) -> Sort row IDs: `instance`.start_time DESC, limit input to 10 row(s) per chunk (actual time=3601.140..3601.145 rows=10 loops=1) -> Table scan on <temporary> (cost=0.01..3774.21 rows=301738) (actual time=0.002..190.179 rows=722743 loops=1) -> Temporary table (cost=469419.96..473194.16 rows=301738) (actual time=3332.896..3551.716 rows=722743 loops=1) -> Nested loop left join (cost=439246.15 rows=301738) (actual time=0.051..1431.254 rows=722743 loops=1) -> Nested loop inner join (cost=107334.40 rows=301738) (actual time=0.045..1239.699 rows=722743 loops=1) -> Filter: (define.id is not null) (cost=12.40 rows=121) (actual time=0.025..0.162 rows=121 loops=1) -> Index lookup on define using process_definition_index (project_id=6) (cost=12.40 rows=121) (actual time=0.025..0.110 rows=121 loops=1) -> Index lookup on instance using task_instance_index (process_definition_id=define.id) (cost=639.65 rows=2494) (actual time=0.006..9.972 rows=5973 loops=121) -> Single-row index lookup on process using PRIMARY (id = `instance`.process_instance_id) (cost=1.00 rows=1) (actual time=0.000..0.000 rows=1 loops=722743) (3) Problem Identification MySQL Configuration Not Optimized The current MySQL setup uses default parameters. The following parameters significantly impact query performance: innodb_buffer_pool_size: The buffer pool used by InnoDB to cache index and raw data. innodb_thread_concurrency: Set to 0 to indicate no thread limit, allowing better use of CPU multi-core performance and improving concurrency. innodb_write_io_threads & innodb_read_io_threads: InnoDB uses background threads to process read/write I/O requests on data pages. These parameters need tuning to improve MySQL performance. Unoptimized DolphinScheduler Indexes The indexes involved in the slow queries need optimization. Large Tables in DolphinScheduler The SQL queries involve large tables (t_ds_task_instance), with data size over 4GB and more than 2 million records. These should be split and archived. MySQL Configuration Not Optimized The current MySQL setup uses default parameters. The following parameters significantly impact query performance: innodb_buffer_pool_size: The buffer pool used by InnoDB to cache index and raw data. innodb_thread_concurrency: Set to 0 to indicate no thread limit, allowing better use of CPU multi-core performance and improving concurrency. innodb_write_io_threads & innodb_read_io_threads: InnoDB uses background threads to process read/write I/O requests on data pages. These parameters need tuning to improve MySQL performance. MySQL Configuration Not Optimized The current MySQL setup uses default parameters. The following parameters significantly impact query performance: MySQL Configuration Not Optimized innodb_buffer_pool_size: The buffer pool used by InnoDB to cache index and raw data. innodb_thread_concurrency: Set to 0 to indicate no thread limit, allowing better use of CPU multi-core performance and improving concurrency. innodb_write_io_threads & innodb_read_io_threads: InnoDB uses background threads to process read/write I/O requests on data pages. innodb_buffer_pool_size: The buffer pool used by InnoDB to cache index and raw data. innodb_buffer_pool_size innodb_thread_concurrency: Set to 0 to indicate no thread limit, allowing better use of CPU multi-core performance and improving concurrency. innodb_thread_concurrency 0 innodb_write_io_threads & innodb_read_io_threads: InnoDB uses background threads to process read/write I/O requests on data pages. innodb_write_io_threads innodb_read_io_threads These parameters need tuning to improve MySQL performance. Unoptimized DolphinScheduler Indexes The indexes involved in the slow queries need optimization. Unoptimized DolphinScheduler Indexes The indexes involved in the slow queries need optimization. Unoptimized DolphinScheduler Indexes Large Tables in DolphinScheduler The SQL queries involve large tables (t_ds_task_instance), with data size over 4GB and more than 2 million records. These should be split and archived. Large Tables in DolphinScheduler The SQL queries involve large tables (t_ds_task_instance), with data size over 4GB and more than 2 million records. These should be split and archived. Large Tables in DolphinScheduler t_ds_task_instance Optimization Plan (1) Large Table Archiving Create Backup Tables Create t_ds_process_instance_bak using the DDL of t_ds_process_instance. Create Backup Tables Create t_ds_process_instance_bak using the DDL of t_ds_process_instance. Create Backup Tables t_ds_process_instance_bak t_ds_process_instance Create t_ds_task_instance_bak using the DDL of t_ds_task_instance. t_ds_task_instance_bak t_ds_task_instance Note: Remove foreign key constraints: Note CONSTRAINT `foreign_key_instance_id` FOREIGN KEY (`process_instance_id`) REFERENCES `t_ds_process_instance` (`id`) ON DELETE CASCADE ON UPDATE RESTRICT CONSTRAINT `foreign_key_instance_id` FOREIGN KEY (`process_instance_id`) REFERENCES `t_ds_process_instance` (`id`) ON DELETE CASCADE ON UPDATE RESTRICT Move Historical Data to Backup Tables Move data before the archive date (e.g., 2022-01-01 00:00:00) to the backup tables: Move Historical Data to Backup Tables Move data before the archive date (e.g., 2022-01-01 00:00:00) to the backup tables: Move Historical Data to Backup Tables 2022-01-01 00:00:00 INSERT INTO t_ds_process_instance_copy SELECT * FROM t_ds_process_instance WHERE start_time < '2022-01-01 00:00:00'; INSERT INTO t_ds_process_instance_copy SELECT * FROM t_ds_process_instance WHERE start_time < '2022-01-01 00:00:00'; INSERT INTO t_ds_task_instance_copy SELECT * FROM t_ds_task_instance WHERE start_time < '2022-01-01 00:00:00'; INSERT INTO t_ds_task_instance_copy SELECT * FROM t_ds_task_instance WHERE start_time < '2022-01-01 00:00:00'; Verify Backup Success Verify Backup Success Verify Backup Success SELECT COUNT(*) FROM t_ds_process_instance_copy; SELECT COUNT(*) FROM t_ds_process_instance WHERE start_time < '2022-01-01 00:00:00'; SELECT COUNT(*) FROM t_ds_task_instance_copy; SELECT COUNT(*) FROM t_ds_task_instance WHERE start_time < '2022-01-01 00:00:00'; SELECT COUNT(*) FROM t_ds_process_instance_copy; SELECT COUNT(*) FROM t_ds_process_instance WHERE start_time < '2022-01-01 00:00:00'; SELECT COUNT(*) FROM t_ds_task_instance_copy; SELECT COUNT(*) FROM t_ds_task_instance WHERE start_time < '2022-01-01 00:00:00'; Delete Historical Data Delete Historical Data Delete Historical Data DELETE FROM t_ds_process_instance WHERE start_time < '2022-01-01 00:00:00'; DELETE FROM t_ds_task_instance WHERE start_time < '2022-01-01 00:00:00'; DELETE FROM t_ds_process_instance WHERE start_time < '2022-01-01 00:00:00'; DELETE FROM t_ds_task_instance WHERE start_time < '2022-01-01 00:00:00'; (2) Parameter Optimization Modify the following parameters in the /etc/my.cnf MySQL configuration file and restart MySQL: /etc/my.cnf innodb_log_buffer_size = 64M innodb_buffer_pool_size = 20G // Adjust based on server memory. If MySQL is on a dedicated machine, set to 70% of physical memory. innodb_log_file_size = 1G innodb_thread_concurrency = 0 join_buffer_size = 64M sort_buffer_size = 64M innodb_read_io_threads = 16 // Adjust based on server CPU innodb_write_io_threads = 16 // Adjust based on server CPU innodb_log_buffer_size = 64M innodb_buffer_pool_size = 20G // Adjust based on server memory. If MySQL is on a dedicated machine, set to 70% of physical memory. innodb_log_file_size = 1G innodb_thread_concurrency = 0 join_buffer_size = 64M sort_buffer_size = 64M innodb_read_io_threads = 16 // Adjust based on server CPU innodb_write_io_threads = 16 // Adjust based on server CPU (3) Index Optimization CREATE INDEX state_index ON t_ds_process_instance(state, process_definition_id); CREATE INDEX start_time_process_definition_index ON t_ds_task_instance(start_time, process_definition_id); ALTER TABLE t_ds_task_instance DROP INDEX task_instance_index; CREATE INDEX state_index ON t_ds_process_instance(state, process_definition_id); CREATE INDEX start_time_process_definition_index ON t_ds_task_instance(start_time, process_definition_id); ALTER TABLE t_ds_task_instance DROP INDEX task_instance_index;