1. Preface

This document summarizes Apache DolphinScheduler cluster deployment so that it can be followed directly from start to finish. It also covers the operations that come afterward: upgrading, adding nodes, and removing nodes.

2. Preparations

2.1. Basic Components

- JDK: Download JDK (1.8+), install it, and configure the JAVA_HOME environment variable. Append its bin directory to the PATH environment variable. Skip this step if JDK is already installed.
- Binary Package: Download the DolphinScheduler binary package.
- Database: PostgreSQL (8.2.15+) or MySQL (5.7+). Choose either. MySQL requires JDBC Driver version 8, which can be downloaded from the central repository.
- Registry Center: ZooKeeper (3.4.6+).
- Process Tree Analysis:
  - macOS: Install pstree.
  - Fedora/Red Hat/CentOS/Ubuntu/Debian: Install psmisc.

Note: DolphinScheduler does not depend on Hadoop, Hive, Spark, etc., but if your tasks require them, the corresponding environments must be available.

3. Upload

Upload the binary package and extract it to a directory of your choice. Pay attention to directory names: it's advisable to add a suffix that distinguishes the directory where the binary package is extracted from the installation directory. For example:

    tar -xvf apache-dolphinscheduler-3.1.7-bin.tar.gz
    mv apache-dolphinscheduler-3.1.7-bin dolphinscheduler-3.1.7-origin

The '-origin' suffix marks the original extracted binary package. When the configuration changes later, you can modify the files in this directory and then re-execute the installation script.

4. User configurations

4.1. Configure User Permissions and Passwordless Access

Create a deployment user and configure passwordless sudo access for it.
For example:

    # Create user (requires root login)
    useradd dolphinscheduler

    # Set password
    echo "dolphinscheduler" | passwd --stdin dolphinscheduler

    # Configure sudo passwordless access
    sed -i '$a dolphinscheduler ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
    sed -i 's/^Defaults[[:space:]]\+requiretty/#&/' /etc/sudoers

    # Modify directory permissions to grant the deployment user access to the extracted
    # apache-dolphinscheduler-*-bin directory (use the new name if you renamed it in step 3)
    chown -R dolphinscheduler:dolphinscheduler apache-dolphinscheduler-*-bin

Note:
- The task execution service uses sudo -u {linux-user} to switch between Linux users when running multi-tenant jobs, so the deployment user needs sudo privileges, and they must be passwordless. Beginners can ignore this for now.
- If /etc/sudoers contains a "Defaults requiretty" line, comment it out.

4.2. Configure SSH Passwordless Login for Machines

Installation transfers resources between machines, so the deployment user needs passwordless SSH login. Follow these steps to configure it:

    su dolphinscheduler
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    # Execute the following command; otherwise, passwordless login will fail
    chmod 600 ~/.ssh/authorized_keys

Note: After configuration, run ssh localhost to check that login succeeds without a password. To reach the other machines in the cluster, the same public key also has to be present in ~/.ssh/authorized_keys of the deployment user on each of them (for example, by using ssh-copy-id).

5. Start ZooKeeper

Start ZooKeeper on the cluster before continuing.

6. Modify Configuration

All the following operations should be executed as the dolphinscheduler user.

After preparing the basic environment, modify the configuration files to match your machines. The configuration files are in the bin/env directory: install_env.sh and dolphinscheduler_env.sh.

6.1. install_env.sh

The install_env.sh file defines which machines DolphinScheduler will be installed on and which services will run on each machine. You can find this file in the bin/env/ directory; modify it following the instructions below.
    # ---------------------------------------------------------
    # INSTALL MACHINE
    # ---------------------------------------------------------
    # A comma separated list of machine hostname or IP would be installed DolphinScheduler,
    # including master, worker, api, alert. If you want to deploy in pseudo-distributed
    # mode, just write a pseudo-distributed hostname
    # Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
    # Configure the machines where DolphinScheduler will be installed.
    ips=${ips:-"ds01,ds02,ds03,hadoop02,hadoop03,hadoop04,hadoop05,hadoop06,hadoop07,hadoop08"}

    # Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
    # modify it if you use different ssh port
    sshPort=${sshPort:-"22"}

    # A comma separated list of machine hostname or IP would be installed Master server, it
    # must be a subset of configuration `ips`.
    # Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
    # Configure the machines where the Master server will be installed.
    masters=${masters:-"ds01,ds02,ds03,hadoop04,hadoop05,hadoop06,hadoop07,hadoop08"}

    # A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>. All hostname or IP must be a
    # subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
    # Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
    # Configure the machines where the Worker role will be installed: a comma-separated list of
    # hostnames or IP addresses, each followed by its worker group. By default, all workers are
    # placed in the `default` worker group. Additional worker groups can be configured
    # individually through the DolphinScheduler interface.
    workers=${workers:-"ds01:default,ds02:default,ds03:default,hadoop02:default,hadoop03:default,hadoop04:default,hadoop05:default,hadoop06:default,hadoop07:default,hadoop08:default"}

    # A comma separated list of machine hostname or IP would be installed Alert server, it
    # must be a subset of configuration `ips`.
    # Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
    # Configure the machine where the Alert role will be installed; specify a single machine.
    alertServer=${alertServer:-"hadoop03"}

    # A comma separated list of machine hostname or IP would be installed API server, it
    # must be a subset of configuration `ips`.
    # Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
    # Configure the machine where the API role will be installed; specify a single machine.
    apiServers=${apiServers:-"hadoop04"}

    # The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
    # Do not set this configuration same as the current path (pwd). Do not add quotes to it if you are using a relative path.
    # Installation path: DolphinScheduler will be installed here on all machines in the cluster.
    # Keep it different from the directory where the binary package is extracted. Including the
    # version number makes later upgrades easier.
    installPath=${installPath:-"/opt/dolphinscheduler-3.1.5"}

    # The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
    # script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled then the root directory needs
    # to be created by this user
    # Deployment user: Use the user created above for deployment.
    deployUser=${deployUser:-"dolphinscheduler"}

    # The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
    # Configure the name registered to the ZooKeeper znode.
    # If multiple DolphinScheduler clusters are configured, each needs a different name.
    zkRoot=${zkRoot:-"/dolphinscheduler"}

6.2. dolphinscheduler_env.sh

You can find this file in bin/env/. It configures environment settings. Modify it following the instructions below:

    # JDK path, must be modified
    export JAVA_HOME=${JAVA_HOME:-/usr/java/jdk1.8.0_202}

    # Database type, supports mysql, postgresql
    export DATABASE=${DATABASE:-mysql}
    export SPRING_PROFILES_ACTIVE=${DATABASE}
    # Connection URL: mainly modify the hostname below; the trailing serverTimezone
    # parameter sets the East Eight Zone (UTC+8)
    export SPRING_DATASOURCE_URL="jdbc:mysql://hostname:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai"
    export SPRING_DATASOURCE_USERNAME=dolphinscheduler
    # If the password contains special characters, enclose it in single quotes
    export SPRING_DATASOURCE_PASSWORD='xxxxxxxxxxxxx'

    export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}

    # Configure the time zone used when JVM starts for each role.
    # Default is UTC; to fully support the East Eight Zone, set it to GMT+8
    export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-GMT+8}

    export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}

    export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
    # Configure the zookeeper address used
    export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-hadoop01:2181,hadoop02:2181,hadoop03:2181}

    # Configure the environment variables you need; install all required components yourself
    export HADOOP_HOME=${HADOOP_HOME:-/opt/cloudera/parcels/CDH/lib/hadoop}
    export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
    export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
    export SPARK_HOME2=${SPARK_HOME2:-/opt/spark-3.3.2}
    export PYTHON_HOME=${PYTHON_HOME:-/opt/python-3.9.16}
    export HIVE_HOME=${HIVE_HOME:-/opt/cloudera/parcels/CDH/lib/hive}
    export FLINK_HOME=${FLINK_HOME:-/opt/flink-1.15.3}
    export DATAX_HOME=${DATAX_HOME:-/opt/datax}
    export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/seatunnel-2.1.3}
    export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

    export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PY

6.3. common.properties

Download the hdfs-site.xml and core-site.xml files from your Hadoop cluster and place them in the api-server/conf/ and worker-server/conf/ directories. If you have set up an Apache native cluster, retrieve these files from the respective component's conf directory. For CDH, you can download them directly from the CDH interface.

Then modify the common.properties file in both the api-server/conf/ and worker-server/conf/ directories. This file mainly configures parameters related to resource uploads, such as uploading DolphinScheduler's resources to HDFS. Follow the instructions below to make the necessary modifications:

    # Local path, mainly used to store temporary files during task execution. Ensure that the
    # user has read and write permissions for this directory. Generally, keep the default.
    # If you encounter permission errors during task execution indicating insufficient
    # permissions for files in this directory, simply change the directory permissions to 777.
    data.basedir.path=/tmp/dolphinscheduler

    # Resource view suffixes
    #resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js

    # Location to save resources, possible values: HDFS, S3, OSS, NONE
    resource.storage.type=HDFS
    # Base path for resource uploads, must start with /dolphinscheduler, ensure that the user has read and write permissions for this directory
    resource.storage.upload.base.path=/dolphinscheduler

    # The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
    resource.aws.access.key.id=minioadmin
    # The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
    resource.aws.secret.access.key=minioadmin
    # The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
    resource.aws.region=cn-north-1
    # The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name.
    resource.aws.s3.bucket.name=dolphinscheduler
    # You need to set this parameter when private cloud s3.
    # If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn
    resource.aws.s3.endpoint=http://localhost:9000

    # alibaba cloud access key id, required if you set resource.storage.type=OSS
    resource.alibaba.cloud.access.key.id=<your-access-key-id>
    # alibaba cloud access key secret, required if you set resource.storage.type=OSS
    resource.alibaba.cloud.access.key.secret=<your-access-key-secret>
    # alibaba cloud region, required if you set resource.storage.type=OSS
    resource.alibaba.cloud.region=cn-hangzhou
    # oss bucket name, required if you set resource.storage.type=OSS
    resource.alibaba.cloud.oss.bucket.name=dolphinscheduler
    # oss bucket endpoint, required if you set resource.storage.type=OSS
    resource.alibaba.cloud.oss.endpoint=https://oss-cn-hangzhou.aliyuncs.com

    # if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
    resource.hdfs.root.user=hdfs
    # if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
    # resource.hdfs.fs.defaultFS=hdfs://bigdata:8020

    # whether to startup kerberos
    hadoop.security.authentication.startup.state=false
    # java.security.krb5.conf path
    java.security.krb5.conf.path=/opt/krb5.conf
    # login user from keytab username
    login.user.keytab.username=hdfs-mycluster@ESZ.COM
    # login user from keytab path
    login.user.keytab.path=/opt/hdfs.headless.keytab
    # kerberos expire time, the unit is hour
    kerberos.expire.time=2

    # resourcemanager port, the default value is 8088 if not specified
    resource.manager.httpaddress.port=8088
    # if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
    yarn.resourcemanager.ha.rm.ids=hadoop02,hadoop03
    # if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; if resourcemanager is
    # single, you only need to replace ds1 with the actual resourcemanager hostname
    yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
    # job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
    yarn.job.history.status.address=http://hadoop02:19888/ws/v1/history/mapreduce/jobs/%s

    # datasource encryption enable
    datasource.encryption.enable=false
    # datasource encryption salt
    datasource.encryption.salt=!@#$%^&*

    # data quality option
    data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
    #data-quality.error.output.path=/tmp/data-quality-error-data

    # Network IP gets priority, default inner outer

    # Whether hive SQL is executed in the same session
    support.hive.oneSession=false

    # use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
    sudo.enable=true
    setTaskDirToTenant.enable=false

    # network interface preferred like eth0, default: empty
    #dolphin.scheduler.network.interface.preferred=
    # network IP gets priority, default: inner outer
    #dolphin.scheduler.network.priority.strategy=default

    # system env path
    #dolphinscheduler.env.path=dolphinscheduler_env.sh

    # development state
    development.state=false

    # rpc port
    alert.rpc.port=50052

    # set path of conda.sh
    conda.path=/opt/anaconda3/etc/profile.d/conda.sh

    # Task resource limit state
    task.resource.limit.state=false

    # mlflow task plugin preset repository
    ml.mlflow.preset_repository=https://github.com/apache/dolphinscheduler-mlflow
    # mlflow task plugin preset repository version
    ml.mlflow.preset_repository_version="main"

6.4. application.yaml

You need to modify the conf/application.yaml file for all roles:

- master-server/conf/application.yaml
- worker-server/conf/application.yaml
- api-server/conf/application.yaml
- alert-server/conf/application.yaml

The main modification is to set the time zone.
Here's the specific modification:

    spring:
      banner:
        charset: UTF-8
      jackson:
        # Set the time zone to GMT+8, modify only this section
        time-zone: GMT+8
        date-format: "yyyy-MM-dd HH:mm:ss"

6.5. service.57a50399.js and service.57a50399.js.gz

The two files service.57a50399.js and service.57a50399.js.gz are located in the api-server/ui/assets/ and ui/assets/ directories. In each of these directories, open the files with vim, search for 15e3, and change it to 15e5.

This modification adjusts the timeout for page responses. The default value 15e3 is in milliseconds, that is, 15 seconds; 15e5 raises it to 1500 seconds. This avoids page timeout errors when uploading large files.

7. Initialize the database

To initialize the database, follow these steps:

Driver Configuration: Copy the MySQL driver (8.x) to the libs directory of each DolphinScheduler role:

- api-server/libs
- alert-server/libs
- master-server/libs
- worker-server/libs
- tools/libs

Database User: Log in to MySQL as the root user and execute the following SQL commands (both MySQL 5 and MySQL 8 are supported):

    create database `dolphinscheduler` character set utf8mb4 collate utf8mb4_general_ci;
    create user 'dolphinscheduler'@'%' IDENTIFIED WITH mysql_native_password by 'your_password';
    grant ALL PRIVILEGES ON dolphinscheduler.* to 'dolphinscheduler'@'%';
    flush privileges;

Execute Database Upgrade Script: Run the following command to execute the database upgrade script:

    bash tools/bin/upgrade-schema.sh

8. Installation

Run the installation script:

    bash ./bin/install.sh

This script uses scp to transfer all local files to the machines listed in the configuration files above. It then stops the corresponding roles on each machine and starts them again.

After the first installation, all roles are started automatically. There's no need to start any roles separately.
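Because install.sh copies files to every machine listed in install_env.sh, a mistyped hostname in one of the role lists may only surface partway through the run. The sketch below is one way to check beforehand that masters, workers, alertServer, and apiServers are all subsets of ips, as the comments in install_env.sh require. The function names list_hosts, missing_hosts, and check_install_env are hypothetical helpers introduced here for illustration; they are not part of DolphinScheduler.

```shell
#!/usr/bin/env bash
# Sketch: check that every role host in install_env.sh also appears in `ips`.
# All function names below are hypothetical helpers, not DolphinScheduler scripts.

# Print one host per line from a comma-separated list, dropping any ":group" suffix.
list_hosts() {
  printf '%s\n' "$1" | tr ',' '\n' | sed -e 's/:.*$//' -e '/^$/d'
}

# Print hosts from list $2 that are missing from list $1; return 1 if any are missing.
missing_hosts() {
  local all="$1" subset="$2" host rc=0
  for host in $(list_hosts "$subset"); do
    if ! list_hosts "$all" | grep -qxF -- "$host"; then
      echo "$host"
      rc=1
    fi
  done
  return "$rc"
}

# Validate the four role variables against `ips`.
# Assumes install_env.sh has been sourced so the variables are set.
check_install_env() {
  local rc=0 role
  for role in masters workers alertServer apiServers; do
    if ! missing_hosts "$ips" "${!role}" >/dev/null; then
      echo "$role has hosts not listed in ips: $(missing_hosts "$ips" "${!role}" | tr '\n' ' ')"
      rc=1
    fi
  done
  return "$rc"
}
```

A possible way to use it, assuming the functions are defined in the current shell and the working directory is the extracted package: run source bin/env/install_env.sh, then check_install_env, and start bash ./bin/install.sh only if the check returns successfully.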
If any roles are not started, check the corresponding logs on the respective machines to identify the issue.

9. Start and stop the services

    # Stop all services
    bash ./bin/stop-all.sh

    # Start all services
    bash ./bin/start-all.sh

    # Start/Stop Master
    bash ./bin/dolphinscheduler-daemon.sh start master-server
    bash ./bin/dolphinscheduler-daemon.sh stop master-server

    # Start/Stop Worker
    bash ./bin/dolphinscheduler-daemon.sh start worker-server
    bash ./bin/dolphinscheduler-daemon.sh stop worker-server

    # Start/Stop Api
    bash ./bin/dolphinscheduler-daemon.sh start api-server
    bash ./bin/dolphinscheduler-daemon.sh stop api-server

    # Start/Stop Alert
    bash ./bin/dolphinscheduler-daemon.sh start alert-server
    bash ./bin/dolphinscheduler-daemon.sh stop alert-server

You must execute these scripts as the user who installed DolphinScheduler; otherwise you will run into permission issues.

Each service has a dolphinscheduler_env.sh file in its <service>/conf/ directory, which supports microservice-style deployments: you can configure <service>/conf/dolphinscheduler_env.sh for a given service and then start it with its own environment variables using the <service>/bin/start.sh command. However, if you start the service with bin/dolphinscheduler-daemon.sh start <service>, the script overwrites <service>/conf/dolphinscheduler_env.sh with bin/env/dolphinscheduler_env.sh before starting the service. This is done so that users only have to modify the configuration in one place.

10. Scaling Out

10.1. Standard Method

Refer to the steps above and follow these operations:

New Node
- Install and configure JDK.
- Create a new user for DolphinScheduler (Linux user) and configure passwordless login and permissions.

On the machine where DolphinScheduler was previously installed and the binary package was extracted
- Log in as the user who installed DolphinScheduler.
- In the directory configured previously, modify bin/env/install_env.sh and specify which roles need to be deployed on the new node.
- Execute the bin/install.sh script. The script retransmits the entire directory to all machines configured in bin/env/install_env.sh, then stops all roles on all machines, and finally restarts all roles.

Disadvantage of this method: if DolphinScheduler has many minute-level tasks or real-time tasks such as Flink or Spark running, stopping and restarting all roles takes some time. During this period, tasks may stop abnormally because the whole cluster restarts, or may not be scheduled normally. DolphinScheduler does implement automatic fault tolerance and recovery, so the operation is feasible, but afterward you should check that all tasks are executing normally.

10.2. Simple Method

Refer to the steps above and follow these operations:

New Node
- Install and configure JDK.
- Create a new user for DolphinScheduler (Linux user) and configure passwordless login and permissions.

On the machine where DolphinScheduler was previously installed and the binary package was extracted
- Log in as the user who installed DolphinScheduler.
- Compress the entire directory configured previously, then transfer it to the new node.

New Node
- Extract the archive on the new node and rename the directory to the installation directory configured in bin/env/install_env.sh.
- Log in as the user who installed DolphinScheduler.
- Start the roles that need to be deployed on the new node. The script is located at bin/dolphinscheduler-daemon.sh, and the start commands are:

    ./dolphinscheduler-daemon.sh start master-server
    ./dolphinscheduler-daemon.sh start worker-server

- Log in to the DolphinScheduler interface and check in the "Monitor Center" that the corresponding roles have started on the new node.

11. Scaling In

- Stop all roles on the machine to be removed using the bin/dolphinscheduler-daemon.sh script. The stop command is:

    ./dolphinscheduler-daemon.sh stop worker-server

- Log in to the DolphinScheduler interface and check in the "Monitor Center" that the roles stopped on that machine have disappeared.
- On the machine where you previously installed DolphinScheduler by extracting the binary package, log in as the user who installed DolphinScheduler and modify bin/env/install_env.sh: remove the machines whose roles were taken offline.

12. Upgrade

Follow the steps above one by one. Operations that have already been performed do not need to be repeated. The specific steps are:

- Upload the new version's binary package.
- Extract it to a directory different from the old version's installation directory, or rename it.
- Modify the configuration files. A simpler way is to copy all the configuration files involved from the previous installation directory into the new version's directory, replacing the ones there.
- Package all the components deployed on other nodes, then unpack them and place them in the corresponding locations on the new node. To find out which components need to be copied, refer to the configuration in the dolphinscheduler_env.sh file.
- Configure the drivers, following the steps in "Initialize the database".
- Stop the previous cluster.
- Back up the entire database.
- Execute the database upgrade script, following the steps in "Initialize the database".
- Execute the installation script, following "Installation".
- After the upgrade is complete, log in to the interface and check the "Monitor Center" to see whether all roles have started successfully.
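The order of the final upgrade steps (stop, back up, upgrade the schema, install) can be sketched as a small script. This is only an outline under stated assumptions: the database is MySQL and is backed up with mysqldump, the OLD_HOME and NEW_SRC paths and the DB_* values are placeholders to adjust, and the function names backup_file and upgrade are hypothetical. It also assumes the configuration files and drivers have already been prepared in the new directory as described in the earlier steps.

```shell
#!/usr/bin/env bash
# Sketch of the upgrade order; every path and DB setting below is an assumption.

OLD_HOME=${OLD_HOME:-/opt/dolphinscheduler-3.1.5}         # assumed current installation path
NEW_SRC=${NEW_SRC:-$HOME/dolphinscheduler-3.1.7-origin}   # assumed extracted new package
DB_HOST=${DB_HOST:-hostname}
DB_USER=${DB_USER:-dolphinscheduler}
DB_NAME=${DB_NAME:-dolphinscheduler}

# Build a backup file name from a database name and a timestamp.
backup_file() {
  printf '%s-%s.sql\n' "$1" "$2"
}

# Run the upgrade steps in order, stopping at the first failure.
upgrade() {
  local stamp
  stamp=$(date +%Y%m%d-%H%M%S)

  # Stop the previous cluster.
  bash "$OLD_HOME/bin/stop-all.sh" || return 1

  # Back up the entire database (MySQL assumed; mysqldump prompts for the password).
  mysqldump -h "$DB_HOST" -u "$DB_USER" -p --single-transaction "$DB_NAME" \
    > "$(backup_file "$DB_NAME" "$stamp")" || return 1

  # Upgrade the schema, then install from the new package directory.
  cd "$NEW_SRC" || return 1
  bash tools/bin/upgrade-schema.sh || return 1
  bash ./bin/install.sh
}
```

Calling upgrade as the deployment user would run the steps in sequence. Keeping the backup before the schema upgrade means a failed upgrade can be rolled back by restoring the dump into the old database.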
deployUser=${deployUser:-"dolphinscheduler"} # The root of zookeeper, for now DolphinScheduler default registry server is zookeeper. # Configure the name registered to the ZooKeeper znode. If multiple DolphinScheduler clusters are configured, different names need to be configured. zkRoot=${zkRoot:-"/dolphinscheduler"} 6.2. dolphinscheduler_env.sh You can find this file at the path bin/env/ . It is used to configure some environment settings. Modify the corresponding configurations according to the following instructions: bin/env/ # JDK path, must be modified export JAVA_HOME=${JAVA_HOME:-/usr/java/jdk1.8.0_202} # Database type, supports mysql, postgresql export DATABASE=${DATABASE:-mysql} export SPRING_PROFILES_ACTIVE=${DATABASE} # Connection URL, mainly modify the hostname below, and the last configuration is for the East Eight Zone export SPRING_DATASOURCE_URL="jdbc:mysql://hostname:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai" export SPRING_DATASOURCE_USERNAME=dolphinscheduler # If the password is complex, it needs to be enclosed in single quotes before and after export SPRING_DATASOURCE_PASSWORD='xxxxxxxxxxxxx' export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none} # Configure the time zone used when JVM starts for each role. 
# Default is UTC; to fully support the East Eight Zone, set it to GMT+8
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-GMT+8}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}

export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
# Configure the zookeeper address used
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-hadoop01:2181,hadoop02:2181,hadoop03:2181}

# Configure some environment variables used according to your needs, install all required components by yourself
export HADOOP_HOME=${HADOOP_HOME:-/opt/cloudera/parcels/CDH/lib/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/spark-3.3.2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/python-3.9.16}
export HIVE_HOME=${HIVE_HOME:-/opt/cloudera/parcels/CDH/lib/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/flink-1.15.3}
export DATAX_HOME=${DATAX_HOME:-/opt/datax}
export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/seatunnel-2.1.3}
export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PY

6.3. common.properties

Download the hdfs-site.xml and core-site.xml files from your Hadoop cluster and place them in the api-server/conf/ and worker-server/conf/ directories. If you run a native Apache cluster, retrieve these files from the conf directory of the respective component. For CDH, you can download them directly from the CDH interface.

Then modify the common.properties file in each of the api-server/conf/ and worker-server/conf/ directories. This file mainly configures parameters related to resource uploads, such as uploading DolphinScheduler's resources to HDFS. Follow the instructions below to make the necessary modifications:

# Local path, mainly used to store temporary files during task execution.
# Ensure that the user has read and write permissions for this directory. Generally, keep the default.
# If you encounter permission errors during task execution indicating insufficient permissions for files in this directory, simply change the directory permissions to 777.
data.basedir.path=/tmp/dolphinscheduler

# Resource view suffixes
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js

# Location to save resources, possible values: HDFS, S3, OSS, NONE
resource.storage.type=HDFS
# Base path for resource uploads, must start with /dolphinscheduler, ensure that the user has read and write permissions for this directory
resource.storage.upload.base.path=/dolphinscheduler

# The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.access.key.id=minioadmin
# The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.secret.access.key=minioadmin
# The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.region=cn-north-1
# The name of the bucket. You need to create them by yourself. Otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace; ensure the bucket is given a unique name.
resource.aws.s3.bucket.name=dolphinscheduler
# You need to set this parameter when private cloud s3. If S3 uses public cloud, you only need to set resource.aws.region or set to the endpoint of a public cloud such as S3.cn-north-1.amazonaws.com.cn
resource.aws.s3.endpoint=http://localhost:9000

# alibaba cloud access key id, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.id=<your-access-key-id>
# alibaba cloud access key secret, required if you set resource.storage.type=OSS
resource.alibaba.cloud.access.key.secret=<your-access-key-secret>
# alibaba cloud region, required if you set resource.storage.type=OSS
resource.alibaba.cloud.region=cn-hangzhou
# oss bucket name, required if you set resource.storage.type=OSS
resource.alibaba.cloud.oss.bucket.name=dolphinscheduler
# oss bucket endpoint, required if you set resource.storage.type=OSS
resource.alibaba.cloud.oss.endpoint=https://oss-cn-hangzhou.aliyuncs.com

# if resource.storage.type=HDFS, the user must have the permission to create directories under the HDFS root path
resource.hdfs.root.user=hdfs
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
# resource.hdfs.fs.defaultFS=hdfs://bigdata:8020

# whether to startup kerberos
hadoop.security.authentication.startup.state=false
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# login user from keytab username
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
# kerberos expire time, the unit is hour
kerberos.expire.time=2

# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=hadoop02,hadoop03
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://hadoop02:19888/ws/v1/history/mapreduce/jobs/%s

# datasource encryption enable
datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*

# data quality option
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
#data-quality.error.output.path=/tmp/data-quality-error-data

# Network IP gets priority, default inner outer
# Whether hive SQL is executed in the same session
support.hive.oneSession=false

# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true
setTaskDirToTenant.enable=false

# network interface preferred like eth0, default: empty
#dolphin.scheduler.network.interface.preferred=
# network IP gets priority, default: inner outer
#dolphin.scheduler.network.priority.strategy=default

# system env path
#dolphinscheduler.env.path=dolphinscheduler_env.sh
# development state
development.state=false
# rpc port
alert.rpc.port=50052
# set path of conda.sh
conda.path=/opt/anaconda3/etc/profile.d/conda.sh
# Task resource limit state
task.resource.limit.state=false

# mlflow task plugin preset repository
ml.mlflow.preset_repository=https://github.com/apache/dolphinscheduler-mlflow
# mlflow task plugin preset repository version
ml.mlflow.preset_repository_version="main"

6.4 application.yaml

Modify the conf/application.yaml file for all roles: master-server/conf/application.yaml, worker-server/conf/application.yaml, api-server/conf/application.yaml, and alert-server/conf/application.yaml. The main modification is to set the time zone.
Here's the specific modification:

spring:
  banner:
    charset: UTF-8
  jackson:
    # Set the time zone to GMT+8, modify only this section
    time-zone: GMT+8
    date-format: "yyyy-MM-dd HH:mm:ss"

6.5. service.57a50399.js and service.57a50399.js.gz

You'll find these two files, service.57a50399.js and service.57a50399.js.gz, in the api-server/ui/assets/ and ui/assets/ directories.

Navigate to each of these directories and locate the files. Open them with the vim command, search for 15e3, and change it to 15e5. This adjusts the timeout for page responses. The value is in milliseconds: the default 15e3 is 15 seconds, and 15e5 is 1500 seconds. The change prevents page timeout errors when uploading large files.

7 Initialize the database

To initialize the database, follow these steps:

Driver Configuration: Copy the MySQL driver (8.x) to the libs directory of each DolphinScheduler role, including:
- api-server/libs
- alert-server/libs
- master-server/libs
- worker-server/libs
- tools/libs

Database User:
- Log in to MySQL with the root user.
- Execute the following SQL commands (both MySQL 5 and MySQL 8 are supported):

create database `dolphinscheduler` character set utf8mb4 collate utf8mb4_general_ci;
create user 'dolphinscheduler'@'%' IDENTIFIED WITH mysql_native_password by 'your_password';
grant ALL PRIVILEGES ON dolphinscheduler.* to 'dolphinscheduler'@'%';
flush privileges;

Execute Database Upgrade Script: Run the following command to execute the database upgrade script:

bash tools/bin/upgrade-schema.sh

8. Installation

Run the installation script:

bash ./bin/install.sh

This script uses scp to transfer all local files to the machines configured in the configuration files above. It then stops the corresponding roles on each machine and starts them again.

After the first installation, all roles are started automatically.
There's no need to start any roles separately. If a role does not start, check the corresponding logs on the respective machine to identify the issue.

9. Start and stop the services

Stop all services:

bash ./bin/stop-all.sh

Start all services:

bash ./bin/start-all.sh

Start/Stop Master:

bash ./bin/dolphinscheduler-daemon.sh stop master-server
bash ./bin/dolphinscheduler-daemon.sh start master-server

Start/Stop Worker:

bash ./bin/dolphinscheduler-daemon.sh start worker-server
bash ./bin/dolphinscheduler-daemon.sh stop worker-server

Start/Stop Api:

bash ./bin/dolphinscheduler-daemon.sh start api-server
bash ./bin/dolphinscheduler-daemon.sh stop api-server

Start/Stop Alert:

bash ./bin/dolphinscheduler-daemon.sh start alert-server
bash ./bin/dolphinscheduler-daemon.sh stop alert-server

Note: You must run these scripts as the user who installed DolphinScheduler to avoid permission issues.

Each service has a dolphinscheduler_env.sh file in its <service>/conf/ directory, which supports microservice-style deployments. You can configure <service>/conf/dolphinscheduler_env.sh for a service and then start it with its own environment variables using the <service>/bin/start.sh command. However, if you start the service with bin/dolphinscheduler-daemon.sh start <service>, the script first overwrites <service>/conf/dolphinscheduler_env.sh with bin/env/dolphinscheduler_env.sh and then starts the service. This is done to reduce the cost of modifying configurations.

10. Scaling Out

10.1. Standard Method

Refer to the steps above and follow these operations:

New Node
- Install and configure JDK.
- Create a new user for DolphinScheduler (Linux user) and configure passwordless login and permissions.

On the machine where DolphinScheduler was previously installed and the binary package was extracted
- Log in as the user who installed DolphinScheduler.
- Edit bin/env/install_env.sh in the previously configured directory to add the new node and specify which roles need to be deployed on it.
- Execute the bin/install.sh script. This script retransmits the entire directory to all machines configured in bin/env/install_env.sh, then stops all roles on all machines, and finally restarts all roles.
Disadvantages of this method: If DolphinScheduler runs many minute-level tasks or real-time tasks such as Flink or Spark, stopping and restarting all roles takes some time. During this period, tasks may stop abnormally or may not be scheduled normally because the entire cluster restarts. However, DolphinScheduler implements automatic fault tolerance and disaster recovery, so this operation is feasible. Afterwards, observe whether all tasks are executed normally.

10.2. Simple Method

Refer to the steps above and follow these operations:

New Node
- Install and configure JDK.
- Create a new user for DolphinScheduler (Linux user) and configure passwordless login and permissions.

On the machine where DolphinScheduler was previously installed and the binary package was extracted
- Log in as the user who installed DolphinScheduler.
- Compress the entire directory previously configured, then transfer it to the new node.

New Node
- Extract the files on the new node and rename the directory to the installation directory configured in bin/env/install_env.sh.
- Log in as the user who installed DolphinScheduler.
- Start the roles that need to be deployed on the new node. The script is bin/dolphinscheduler-daemon.sh, and the start commands are:

./dolphinscheduler-daemon.sh start master-server
./dolphinscheduler-daemon.sh start worker-server

- Log in to the DolphinScheduler interface and check in the "Monitor Center" whether the corresponding roles have started on the new node.

11. Scaling In

- Stop all roles on the machine to be removed using the bin/dolphinscheduler-daemon.sh script. The stop command is:

./dolphinscheduler-daemon.sh stop worker-server

- Log in to the DolphinScheduler interface and check in the "Monitor Center" whether the roles stopped on the machine have disappeared.
- On the machine where you previously installed DolphinScheduler by extracting the binary package, log in as the user who installed DolphinScheduler and modify bin/env/install_env.sh, removing the machines corresponding to the offline roles.
12. Upgrade

Follow the steps above in order; operations that have already been performed do not need to be repeated. The upgrade-specific steps are:

1. Upload the new version's binary package and uncompress it to a directory different from the old version's installation directory, or rename it.
2. Modify the configuration files. A simpler way is to copy all the configuration files involved from the previous installation directory into the new version's directory, replacing the ones there.
3. Package all the components deployed on other nodes, then unpack and place them in the corresponding locations on the new node. To find out which components need to be copied, refer to the configuration in the dolphinscheduler_env.sh file.
4. Configure the drivers, referring to the steps in "Initializing the Database".
5. Stop the previous cluster.
6. Back up the entire database.
7. Execute the database upgrade script, referring to the steps in "Initializing the Database".
8. Execute the installation script, referring to "Installation".
9. After the upgrade is complete, log in to the interface and check the "Monitor Center" to see whether all roles have started successfully.
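The configuration-copy step of the upgrade can be sketched as a small shell function. This is an illustrative sketch under stated assumptions: the function name copy_env_configs and the .dist suffix are hypothetical, and only the two files named earlier in this document (install_env.sh and dolphinscheduler_env.sh under bin/env) are handled. Any other configuration files you changed would need the same treatment.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with DolphinScheduler.
# copy_env_configs OLD_DIR NEW_DIR
# Copies bin/env/install_env.sh and bin/env/dolphinscheduler_env.sh from the
# old extracted directory into the new one. The file shipped with the new
# version is kept next to it with a ".dist" suffix for later comparison.
copy_env_configs() {
  old_dir="$1"
  new_dir="$2"
  for f in install_env.sh dolphinscheduler_env.sh; do
    src="$old_dir/bin/env/$f"
    dst="$new_dir/bin/env/$f"
    if [ ! -f "$src" ]; then
      echo "missing: $src" >&2
      return 1
    fi
    # Preserve the new version's default file before overwriting it.
    [ -f "$dst" ] && cp "$dst" "$dst.dist"
    cp "$src" "$dst"
  done
}
```

For example, copy_env_configs dolphinscheduler-3.1.7-origin followed by the new version's extracted directory would carry the old settings over. Because a new release may introduce or rename settings, it is worth comparing each copied file against its .dist counterpart before running the installation script.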