A Comprehensive Guide to Deploying a Production-Grade DolphinScheduler 3.2.0 Cluster

Introduction: DolphinScheduler gives data engineers workflow management and scheduling capabilities that simplify complex task dependencies. Version 3.2.0 introduces a series of new features and improvements that enhance its stability and availability in production environments. This guide walks through setting up a highly available DolphinScheduler 3.2.0 cluster in a production environment, including environment preparation, database configuration, user permission settings, passwordless SSH login, ZooKeeper startup, and service startup and shutdown procedures.

1. Environment Preparation

1.1 Cluster Planning

The installation environment is CentOS 7.9.

1.2 Component Download Links

DolphinScheduler-3.2.0 Official Website: https://dolphinscheduler.apache.org/zh-cn/download/3.2.0
Official Installation Documentation: https://dolphinscheduler.apache.org/zh-cn/docs/3.2.0/guide/installation/cluster

1.3 Preparatory Work

JDK: Download JDK (1.8+), install it, set the JAVA_HOME environment variable, and append its bin directory to the PATH environment variable. Skip this step if a JDK is already installed in your environment.
Binary Package: Download the DolphinScheduler binary package from the download page.
Database: PostgreSQL (8.2.15+) or MySQL (5.7+); choose one of them. MySQL requires JDBC driver 8.0.16 or later.
Process Tree Analysis: on macOS, install pstree; on Fedora/Red Hat/CentOS/Ubuntu/Debian, install psmisc.

    [hadoop@hadoop1 ~]$ sudo yum install -y psmisc

Note: DolphinScheduler itself does not depend on Hadoop, Hive, or Spark, but tasks that rely on them need the corresponding environment to be available.

2. DolphinScheduler Cluster Installation

2.1 Extract Installation Package

Upload the DolphinScheduler installation package to the /data/software directory of the hadoop1 node, then extract it into the current directory.

    [hadoop@hadoop1 software]$ tar -zxvf apache-dolphinscheduler-3.2.0-bin.tar.gz

2.2 Database Configuration

DolphinScheduler stores its metadata in a relational database, so create the corresponding database and user.

    mysql -uroot -p

    -- Create the database
    mysql> CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
    -- Create the user; replace {user} and {password} with the name and password you want
    mysql> CREATE USER '{user}'@'%' IDENTIFIED BY '{password}';
    -- Grant privileges
    mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%';
    mysql> FLUSH PRIVILEGES;

Note: If you see the error "ERROR 1819 (HY000): Your password does not satisfy the current policy requirements", the password does not meet the server's policy. Either choose a more complex password or lower the MySQL password strength level with the commands below. These variable names apply to MySQL 5.7; with the MySQL 8.0 validate_password component they are validate_password.policy and validate_password.length.

    mysql> set global validate_password_policy=0;
    mysql> set global validate_password_length=4;

Granting user permissions (here for a user named dolphinscheduler):

    mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';
    mysql> FLUSH PRIVILEGES;

If you use MySQL, you need to manually download the mysql-connector-java driver (version 8.0.31) and place it in the libs directory of each DolphinScheduler module: api-server/libs, alert-server/libs, master-server/libs, worker-server/libs, and tools/libs.

Note: If you only use MySQL as a data source in the data source center, there is no requirement on the MySQL JDBC driver version. If you use MySQL as DolphinScheduler's metadata database, only version 8.0.16 and above are supported.
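If the driver JAR is not yet on the machine, one way to fetch it is from Maven Central. The sketch below is an illustration under stated assumptions rather than part of the original procedure: the repository URL layout (the com.mysql:mysql-connector-j coordinates), the helper names `connector_url` and `download_connector`, and the use of `curl` are all assumptions. The target directory in the usage comment mirrors the /data/software/mysql-8.0.31 path that this guide's copy command reads from.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the Maven Central URL for a Connector/J version.
# Assumes the com.mysql:mysql-connector-j coordinates, which is an assumption
# of this sketch and not something stated in the guide.
connector_url() {
  local version="$1"
  echo "https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/${version}/mysql-connector-j-${version}.jar"
}

# Hypothetical helper: download the JAR into a target directory.
download_connector() {
  local version="$1"
  local dest_dir="$2"
  mkdir -p "${dest_dir}" || return 1
  # -f: fail on HTTP errors, -L: follow redirects, -o: write to this file
  curl -fL -o "${dest_dir}/mysql-connector-j-${version}.jar" "$(connector_url "${version}")"
}

# Example (not run automatically):
#   download_connector 8.0.31 /data/software/mysql-8.0.31
```

Keeping the URL construction in its own function makes it easy to point the download at an internal mirror instead, which is common on production hosts without direct internet access.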
The following command copies the driver JAR into every libs directory (adjust the paths if your extraction directory is named differently):

    echo /data/software/dolphinscheduler-3.2.0/master-server/libs/ \
         /data/software/dolphinscheduler-3.2.0/alert-server/libs/ \
         /data/software/dolphinscheduler-3.2.0/api-server/libs/ \
         /data/software/dolphinscheduler-3.2.0/worker-server/libs/ \
         /data/software/dolphinscheduler-3.2.0/tools/libs/ \
      | xargs -n 1 cp -v /data/software/mysql-8.0.31/mysql-connector-j-8.0.31.jar

2.2 Preparing DolphinScheduler Startup Environment

Configure User SSH Access and Permissions

If you already have a Hadoop cluster account, it is recommended to use it directly; no further user configuration is needed.

Otherwise, create a deployment user and be sure to configure passwordless sudo for it. The example below creates a user named hadoop:

    # Creating a user requires being logged in as root
    useradd hadoop

    # Set a password
    echo "hadoop" | passwd --stdin hadoop

    # Configure passwordless sudo
    sed -i '$ahadoop  ALL=(ALL)  NOPASSWD: ALL' /etc/sudoers
    sed -i 's/^Defaults[[:space:]]\+requiretty/#&/' /etc/sudoers

    # Give the deployment user ownership of the extracted apache-dolphinscheduler-*-bin directory
    chown -R hadoop:hadoop apache-dolphinscheduler-*-bin
    chmod -R 755 apache-dolphinscheduler-*-bin

Note:
1. The task execution service implements multi-tenant job execution by switching between Linux users with sudo -u {linux-user}, so the deployment user needs passwordless sudo permission. If this is unfamiliar, you can set the detail aside for now.
2. If /etc/sudoers contains a line "Defaults requiretty", comment it out as well.

Configure SSH password-free login on the machine

Since resources need to be sent to different machines during installation, passwordless SSH login is required between all machines.
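To see which nodes still ask for a password, a small check like the one below can be run both before and after the key setup. It is only a sketch: the function name `check_passwordless_ssh` is hypothetical, and the host names in the usage comment are taken from the `ips` value used in this guide's install_env.sh. `BatchMode=yes` is a standard OpenSSH option that makes ssh fail instead of prompting.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report whether each host accepts key-based SSH login.
# Returns non-zero if at least one host fails.
check_passwordless_ssh() {
  local failed=0
  local host
  for host in "$@"; do
    # BatchMode=yes disables password prompts, so a missing key fails fast.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "${host}" true 2>/dev/null; then
      echo "OK   ${host}"
    else
      echo "FAIL ${host}"
      failed=1
    fi
  done
  return "${failed}"
}

# Example (not run automatically), as the deployment user:
#   check_passwordless_ssh hadoop1 hadoop2 hadoop3 hadoop4 hadoop5
```

Running the check as the deployment user matters, because the installation script connects to the other nodes as that user.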
The following steps configure passwordless login:

    su hadoop
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys

Note: After the configuration is complete, run ssh localhost to test it. If you can log in without entering a password, the setup succeeded.

2.3 Start ZooKeeper (no need to configure a Hadoop cluster)

Go to the ZooKeeper installation directory, copy the zoo_sample.cfg configuration file to conf/zoo.cfg, and change the dataDir value in conf/zoo.cfg to dataDir=./tmp/zookeeper.

    # Start ZooKeeper
    ./bin/zkServer.sh start

2.4 Revise install_env.sh file

The file install_env.sh describes which machines DolphinScheduler will be installed on and which services will be deployed on each machine. It is located at bin/env/install_env.sh, and you can modify the environment variables using the format export <ENV_NAME>=<VALUE>. The configuration details are below:

    ips=${ips:-"hadoop1,hadoop2,hadoop3,hadoop4,hadoop5"}

    # Modify it if you use a different SSH port
    sshPort=${sshPort:-"xxx"}

    # A comma-separated list of machine hostname or IP addresses that will host the Master server. It must be a subset of the configuration `ips`.
    # Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
    masters=${masters:-"hadoop1,hadoop2"}

    # A comma-separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>. All hostnames or IPs must be a subset of the configuration `ips`, and the workerGroup has a default value of `default`, but we recommend you declare it after the hosts.
    # Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
    workers=${workers:-"hadoop3:default,hadoop4:default,hadoop5:default"}

    # A comma-separated list of machine hostname or IP addresses that will host the Alert server. It must be a subset of the configuration `ips`.
    # Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
    alertServer=${alertServer:-"hadoop3"}

    # A comma-separated list of machine hostname or IP addresses that will host the API server. It must be a subset of the configuration `ips`.
    # Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
    apiServers=${apiServers:-"hadoop2"}

    # The directory to install DolphinScheduler on all machines defined above. It will automatically be created by the `install.sh` script if it doesn't exist.
    # Do not set this configuration to be the same as the current path (pwd). Do not enclose it in quotes if you are using a relative path.
    installPath=${installPath:-"/data/module/dolphinscheduler-3.2.0"}

    # The user to deploy DolphinScheduler on all machines defined above. This user must be created manually before running the `install.sh` script.
    # The user needs sudo privileges and permissions to operate HDFS. If HDFS is enabled, the root directory must be created by this user.
    deployUser=${deployUser:-"hadoop"}

    # The root directory of ZooKeeper. For now, DolphinScheduler's default registry server is ZooKeeper.
    # It will delete ${zkRoot} in ZooKeeper when you run install.sh, so please keep it consistent with registry.zookeeper.namespace in yml files.
    # Similarly, if you want to modify the value, please modify registry.zookeeper.namespace in yml files as well.
    zkRoot=${zkRoot:-"/dolphinscheduler"}

2.5 Modify the dolphinscheduler_env.sh file

The file ./bin/env/dolphinscheduler_env.sh describes the following configurations: the database configuration of DolphinScheduler (the detailed method is covered in [Initializing the Database]), and the external dependency paths or library files needed by some task types, such as JAVA_HOME and SPARK_HOME.
If you do not use certain task types, you can ignore the external dependencies of those tasks, but you must change the JAVA_HOME, registry, and database-related configurations according to your environment.

    export JAVA_HOME=${JAVA_HOME:-/data/module/jdk1.8.0_212}

    # Database related configuration, set database type, username and password
    export DATABASE=${DATABASE:-mysql}
    export SPRING_PROFILES_ACTIVE=${DATABASE}
    export SPRING_DATASOURCE_URL="jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8"
    export SPRING_DATASOURCE_USERNAME=xxx
    export SPRING_DATASOURCE_PASSWORD=xxx

    # Registry center configuration, determines the type and link of the registry center
    export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
    export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-xxxx:2181,xxx:2181,xxx:2181}

    export HADOOP_HOME=${HADOOP_HOME:-/data/module/hadoop-3.3.4}
    export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/data/module/hadoop-3.3.4/etc/hadoop}
    export SPARK_HOME1=${SPARK_HOME1:-/data/module/spark-3.3.1}
    #export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
    #export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
    export HIVE_HOME=${HIVE_HOME:-/data/module/hive-3.1.3}
    export FLINK_HOME=${FLINK_HOME:-/data/module/flink-1.16.2}
    export DATAX_HOME=${DATAX_HOME:-/data/module/datax}
    #export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
    #export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

    export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH

After configuring the environment variables, you can proceed with the DolphinScheduler installation process.

2.6 Initialize the database

After completing the above steps, you have created a new database for DolphinScheduler and configured it in DolphinScheduler. Now, you can initialize the database through a quick Shell script.
    bash tools/bin/upgrade-schema.sh

2.7 Modify the application.yaml file

There are 5 files. The parts to modify are the same in each, but the rest of each file differs, so edit each file separately rather than copying one over the others. They are:

master-server/conf/application.yaml
api-server/conf/application.yaml
worker-server/conf/application.yaml
alert-server/conf/application.yaml
tools/conf/application.yaml

The fragments to modify are:

    datasource:
      driver-class-name: com.mysql.cj.jdbc.Driver
      url: jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
      username: xxx
      password: xxx

    registry:
      type: zookeeper
      zookeeper:
        namespace: dolphinscheduler
        connect-string: xxxx
        retry-policy:
          base-sleep-time: 60ms
          max-sleep: 300ms
          max-retries: 5
        session-timeout: 30s
        connection-timeout: 9s
        block-until-connected: 600ms
        digest: ~

    spring:
      config:
        activate:
          on-profile: mysql
      datasource:
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
        username: xxxx
        password: xxxx
      quartz:
        properties:
          org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobs

2.8 Modify common.properties File

Similarly, there are 5 files to modify. The sections to change are the same in each, but the rest of each file differs, so make the modifications separately for:

master-server/conf/common.properties
api-server/conf/common.properties
worker-server/conf/common.properties
alert-server/conf/common.properties
tools/conf/common.properties

    data.basedir.path=<your local file storage location>
    resource.storage.type=HDFS
    # resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
    resource.storage.upload.base.path=<your HDFS location>
    resource.hdfs.root.user=<your user name, consistent with the user configured earlier in this guide>
    # if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
    resource.hdfs.fs.defaultFS=hdfs://xxx:8020
    # Highly available (HA) IP addresses
    yarn.resourcemanager.ha.rm.ids=xxxx,xxx
    # if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
    yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
    # job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
    yarn.job.history.status.address=http://xxx:19888/jobhistory/logs/%s

Note: In this setup, DolphinScheduler uses HDFS as its distributed storage. For other storage options, follow the instructions on the official website.

2.9 Distributed Storage HDFS Dependency Distribution

    echo /data/software/dolphinscheduler-3.2.0/master-server/conf/ \
         /data/software/dolphinscheduler-3.2.0/alert-server/conf/ \
         /data/software/dolphinscheduler-3.2.0/api-server/conf/ \
         /data/software/dolphinscheduler-3.2.0/worker-server/conf/ \
      | xargs -n 1 cp -v /data/module/hadoop-3.3.4/etc/hadoop/core-site.xml /data/module/hadoop-3.3.4/etc/hadoop/hdfs-site.xml

2.10 Start DolphinScheduler

As the deployment user created above, run the following command to complete the deployment. The run logs are stored in the logs folder afterwards.

    bash ./bin/install.sh

Note: On the first deployment, the message "sh: bin/dolphinscheduler-daemon.sh: No such file or directory" may appear up to 5 times. It is not significant and can be ignored.

2.11 Log in to DolphinScheduler

Open http://localhost:12345/dolphinscheduler/ui in a browser to reach the system UI. When browsing from another machine, replace localhost with the host running the API server (hadoop2 in this setup). The default username and password are admin/dolphinscheduler123.

3. Start and Stop Services

    # Stop all cluster services
    bash ./bin/stop-all.sh

    # Start all cluster services
    bash ./bin/start-all.sh

    # Start/stop Master
    bash ./bin/dolphinscheduler-daemon.sh stop master-server
    bash ./bin/dolphinscheduler-daemon.sh start master-server

    # Start/stop Worker
    bash ./bin/dolphinscheduler-daemon.sh start worker-server
    bash ./bin/dolphinscheduler-daemon.sh stop worker-server

    # Start/stop API
    bash ./bin/dolphinscheduler-daemon.sh start api-server
    bash ./bin/dolphinscheduler-daemon.sh stop api-server

    # Start/stop Alert
    bash ./bin/dolphinscheduler-daemon.sh start alert-server
    bash ./bin/dolphinscheduler-daemon.sh stop alert-server
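After starting or stopping services, it can help to confirm what is actually running on a node. The sketch below assumes that dolphinscheduler-daemon.sh in your installation also accepts a status sub-command; that is an assumption of this example, so check the script's usage message first. The `show_service_status` function and the `DS_DAEMON` variable are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash

# Path to the daemon script; override DS_DAEMON if you run this from
# somewhere other than the DolphinScheduler installation directory.
DS_DAEMON="${DS_DAEMON:-./bin/dolphinscheduler-daemon.sh}"

# Hypothetical helper: print the status reported for each named service.
# Assumes the daemon script supports "status <service>" (unverified here).
show_service_status() {
  local service
  for service in "$@"; do
    echo "== ${service} =="
    bash "${DS_DAEMON}" status "${service}"
  done
}

# Example (not run automatically), on a node that hosts these services:
#   show_service_status master-server worker-server api-server alert-server
```

Because each node only hosts the services assigned to it in install_env.sh, pass only the services expected on the node where you run the check.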
1.2 Component Download Links DolphinScheduler-3.2.0 Official Website: https://dolphinscheduler.apache.org/zh-cn/download/3.2.0 https://dolphinscheduler.apache.org/zh-cn/download/3.2.0 Official Installation Documentation: https://dolphinscheduler.apache.org/zh-cn/docs/3.2.0/guide/installation/cluster https://dolphinscheduler.apache.org/zh-cn/docs/3.2.0/guide/installation/cluster 1.3 Preparatory Work JDK: Download JDK (1.8+), install and configure the JAVA_HOME environment variable, and append its bin directory to the PATH environment variable. Skip this step if JDK is already installed in your environment. Binary Package: Download the DolphinScheduler binary package from the download page. Database: PostgreSQL (8.2.15+) or MySQL (5.7+), choose either of them, e.g., MySQL requires JDBC Driver 8.0.16. Process Tree Analysis: macOS: Install pstree. Fedora/Red/Hat/CentOS/Ubuntu/Debian: Install psmisc. JDK: Download JDK (1.8+), install and configure the JAVA_HOME environment variable, and append its bin directory to the PATH environment variable. Skip this step if JDK is already installed in your environment. Binary Package: Download the DolphinScheduler binary package from the download page. Database: PostgreSQL (8.2.15+) or MySQL (5.7+), choose either of them, e.g., MySQL requires JDBC Driver 8.0.16. Process Tree Analysis: macOS: Install pstree. Fedora/Red/Hat/CentOS/Ubuntu/Debian: Install psmisc. [hadoop@hadoop1 ~]$ sudo yum install -y psmisc Note: While DolphinScheduler itself does not depend on Hadoop, Hive, or Spark, corresponding environment support is needed if your tasks rely on them. Note: While DolphinScheduler itself does not depend on Hadoop, Hive, or Spark, corresponding environment support is needed if your tasks rely on them. Note: While DolphinScheduler itself does not depend on Hadoop, Hive, or Spark, corresponding environment support is needed if your tasks rely on them. 2. 
DolphinScheduler Cluster Installation 2.1 Extract Installation Package Upload the DolphinScheduler installation package to the /data/software directory of the hadoop1 node. Extract the package to the current directory. Upload the DolphinScheduler installation package to the /data/software directory of the hadoop1 node. Extract the package to the current directory. hadoop@hadoop1 software]$ tar -zxvf apache-dolphinscheduler-3.2.0-bin hadoop@hadoop1 software]$ tar -zxvf apache-dolphinscheduler-3.2.0-bin 2.2 Database Configuration DolphinScheduler metadata is stored in a relational database, so create the corresponding database and user. mysql -uroot -p Create the database: Mysql>CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci; //Create the user: //revise {user} and {password} name and paaaword as you wish mysql>CREATE USER '{user}'@'%' IDENTIFIED BY '{password}'; mysql>Grant privileges: mysql>GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%'; mysql>FLUSH PRIVILEGES; mysql -uroot -p Create the database: Mysql>CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci; //Create the user: //revise {user} and {password} name and paaaword as you wish mysql>CREATE USER '{user}'@'%' IDENTIFIED BY '{password}'; mysql>Grant privileges: mysql>GRANT ALL PRIVILEGES ON dolphinscheduler.* TO '{user}'@'%'; mysql>FLUSH PRIVILEGES; Note: If you encounter an error message _ERROR 1819 (HY000): Your password does not satisfy the current policy requirements_indicating the password does not meet policy requirements, you can either increase password complexity or execute commands to lower the MySQL password strength level. 
Note: If you encounter an error message _ERROR 1819 (HY000): Your password does not satisfy the current policy requirements_ indicating the password does not meet policy requirements, you can either increase password complexity or execute commands to lower the MySQL password strength level. Note: If you encounter an error message _ERROR 1819 (HY000): Your password does not satisfy the current policy requirements_ indicating the password does not meet policy requirements, you can either increase password complexity or execute commands to lower the MySQL password strength level. mysql> set global validate_password_policy=0; mysql> set global validate_password_length=4; mysql> set global validate_password_policy=0; mysql> set global validate_password_length=4; Granting User Permissions mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%'; mysql> FLUSH PRIVILEGES; mysql> GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%'; mysql> FLUSH PRIVILEGES; If using MySQL, you need to manually download the mysql-connector-java driver (version 8.0.31) and move it to the libs directory of each DolphinScheduler module, including api-server/libs, alert-server/libs, master-server/libs, worker-server/libs, and tools/libs. Note: If you only want to use MySQL in the data source center, there is no requirement for the version of the MySQL JDBC driver. However, if you want to use MySQL as DolphinScheduler’s metadata database, only version 8.0.16 and above are supported. If using MySQL, you need to manually download the mysql-connector-java driver (version 8.0.31) and move it to the libs directory of each DolphinScheduler module, including api-server/libs, alert-server/libs, master-server/libs, worker-server/libs, and tools/libs. 
If using MySQL, you need to manually download the mysql-connector-java driver (version 8.0.31) and move it to the libs directory of each DolphinScheduler module, including api-server/libs, alert-server/libs, master-server/libs, worker-server/libs, and tools/libs. Note: If you only want to use MySQL in the data source center, there is no requirement for the version of the MySQL JDBC driver. However, if you want to use MySQL as DolphinScheduler’s metadata database, only version 8.0.16 and above are supported. Note: If you only want to use MySQL in the data source center, there is no requirement for the version of the MySQL JDBC driver. However, if you want to use MySQL as DolphinScheduler’s metadata database, only version 8.0.16 and above are supported. echo /data/software/dolphinscheduler-3.2.0/master-server/libs/ /data/software/dolphinscheduler-3.2.0/alert-server/libs/ /data/software/dolphinscheduler-3.2.0/api-server/libs/ /data/software/dolphinscheduler-3.2.0/worker-server/libs/ /data/software/dolphinscheduler-3.2.0/tools/libs/ | xargs -n 1 cp -v /data/software/mysql-8.0.31/mysql-connector-j-8.0.31.jar echo /data/software/dolphinscheduler-3.2.0/master-server/libs/ /data/software/dolphinscheduler-3.2.0/alert-server/libs/ /data/software/dolphinscheduler-3.2.0/api-server/libs/ /data/software/dolphinscheduler-3.2.0/worker-server/libs/ /data/software/dolphinscheduler-3.2.0/tools/libs/ | xargs -n 1 cp -v /data/software/mysql-8.0.31/mysql-connector-j-8.0.31.jar 2.2 Preparing DolphinScheduler Startup Environment Configure User SSH Access and Permissions Configure User SSH Access and Permissions If you already have an existing Hadoop cluster account, it is recommended to use it directly without configuration If you already have an existing Hadoop cluster account, it is recommended to use it directly without configuration If you already have an existing Hadoop cluster account, it is recommended to use it directly without configuration Create a deployment user and be sure to 
configure sudo password-free. Take, for example, creating a Hadoop user # To create a user, you need to log in as rootuseradd hadoop # Add a passwordecho "hadoop" | passwd --stdin hadoop# Configure sudo password-freesed -i '$ahadoop ALL=(ALL) NOPASSWD: NOPASSWD: ALL' /etc/sudoerssed -i 's/Defaults requirett/#Defaults requirett/g' /etc/sudoers# Modify the directory permissions so that the deployment user has the operation permission to the apache-dolphinscheduler-*-bin directory after the binary package is decompressedchown -R hadoop:hadoop apache-dolphinscheduler-*-binchmod -R 755 apache-dolphinscheduler-*-bin # To create a user, you need to log in as rootuseradd hadoop # Add a passwordecho "hadoop" | passwd --stdin hadoop# Configure sudo password-freesed -i '$ahadoop ALL=(ALL) NOPASSWD: NOPASSWD: ALL' /etc/sudoerssed -i 's/Defaults requirett/#Defaults requirett/g' /etc/sudoers# Modify the directory permissions so that the deployment user has the operation permission to the apache-dolphinscheduler-*-bin directory after the binary package is decompressedchown -R hadoop:hadoop apache-dolphinscheduler-*-binchmod -R 755 apache-dolphinscheduler-*-bin Note: 1. Because the task execution service implements multi-tenant running jobs by switching between different Linux users with sudo -u {linux-user}, the deployment user needs to have sudo permissions, and it is password-free. If a beginner doesn’t understand, he or she can ignore this for a while 2. If you find a line “Defaults requirett” in the /etc/sudoers file, please comment it out as well Note: Note: 1. Because the task execution service implements multi-tenant running jobs by switching between different Linux users with sudo -u {linux-user}, the deployment user needs to have sudo permissions, and it is password-free. If a beginner doesn’t understand, he or she can ignore this for a while 2. 
If you find a line “Defaults requirett” in the /etc/sudoers file, please comment it out as well Configure SSH password-free login on the machineSince resources need to be sent to different machines during installation, SSH password-free login is required between each machine. The following steps are performed to configure passwordless login Configure SSH password-free login on the machineSince resources need to be sent to different machines during installation, SSH password-free login is required between each machine. The following steps are performed to configure passwordless login su hadoop ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys chmod 600 ~/.ssh/authorized_keys su hadoop ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys chmod 600 ~/.ssh/authorized_keys Note: After the configuration is complete, you can run the command ssh localhost to determine whether it is successful, if you can ssh login without entering a password, it will prove that it is successful Note: After the configuration is complete, you can run the command ssh localhost to determine whether it is successful, if you can ssh login without entering a password, it will prove that it is successful Note: After the configuration is complete, you can run the command ssh localhost to determine whether it is successful, if you can ssh login without entering a password, it will prove that it is successful 2.3 Start ZooKeeper (no need to configure a Hadoop cluster) Go to the zookeeper installation directory and copy the zoo_sample.cfg configuration file to conf/zoo.cfg, and change the value in dataDir in conf/zoo.cfg to dataDir=./tmp/zookeeper # Start ZooKeeper ./bin/zkServer.sh start # Start ZooKeeper ./bin/zkServer.sh start 2.4 Revise install_env.sh file The file install_env.sh outlines the machines on which DolphinScheduler will be installed and the services that will be deployed on each machine. 
Located at bin/env/install_env.sh , you can modify the environment variables using the format: export = . Below are the configuration details: install_env.sh bin/env/install_env.sh export = ips=${ips:-"hadoop1,hadoop2,hadoop3,hadoop4,hadoop5"} # Modify it if you use a different SSH port sshPort=${sshPort:-"xxx"} # A comma-separated list of machine hostname or IP addresses that will host the Master server. It must be a subset of the configuration `ips`. # Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2" masters=${masters:-"hadoop1,hadoop2"} # A comma-separated list of machine : or : . All hostnames or IPs must be a subset of the configuration `ips`, and the workerGroup has a default value of `default`, but we recommend you declare it after the hosts. # Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default" workers=${workers:-"hadoop3:default,hadoop4:default,hadoop5:default"} # A comma-separated list of machine hostname or IP addresses that will host the Alert server. It must be a subset of the configuration `ips`. # Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3" alertServer=${alertServer:-"hadoop3"} # A comma-separated list of machine hostname or IP addresses that will host the API server. It must be a subset of the configuration `ips`. # Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1" apiServers=${apiServers:-"hadoop2"} # The directory to install DolphinScheduler on all machines defined above. It will automatically be created by the `install.sh` script if it doesn't exist. # Do not set this configuration to be the same as the current path (pwd). Do not enclose it in quotes if you are using a relative path. installPath=${installPath:-"/data/module/dolphinscheduler-3.2.0"} # The user to deploy DolphinScheduler on all machines defined above. 
This user must be created manually before running the `install.sh` script. The user needs sudo privileges and permissions to operate HDFS. If HDFS is enabled, the root directory must be created by this user. deployUser=${deployUser:-"hadoop"} # The root directory of ZooKeeper. For now, DolphinScheduler's default registry server is ZooKeeper. # It will delete ${zkRoot} in ZooKeeper when you run install.sh, so please keep it consistent with registry.zookeeper.namespace in yml files. # Similarly, if you want to modify the value, please modify registry.zookeeper.namespace in yml files as well. zkRoot=${zkRoot:-"/dolphinscheduler"} ips=${ips:-"hadoop1,hadoop2,hadoop3,hadoop4,hadoop5"} # Modify it if you use a different SSH port sshPort=${sshPort:-"xxx"} # A comma-separated list of machine hostname or IP addresses that will host the Master server. It must be a subset of the configuration `ips`. # Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2" masters=${masters:-"hadoop1,hadoop2"} # A comma-separated list of machine : or : . All hostnames or IPs must be a subset of the configuration `ips`, and the workerGroup has a default value of `default`, but we recommend you declare it after the hosts. # Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default" workers=${workers:-"hadoop3:default,hadoop4:default,hadoop5:default"} # A comma-separated list of machine hostname or IP addresses that will host the Alert server. It must be a subset of the configuration `ips`. # Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3" alertServer=${alertServer:-"hadoop3"} # A comma-separated list of machine hostname or IP addresses that will host the API server. It must be a subset of the configuration `ips`. 
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1" apiServers=${apiServers:-"hadoop2"} # The directory to install DolphinScheduler on all machines defined above. It will automatically be created by the `install.sh` script if it doesn't exist. # Do not set this configuration to be the same as the current path (pwd). Do not enclose it in quotes if you are using a relative path. installPath=${installPath:-"/data/module/dolphinscheduler-3.2.0"} # The user to deploy DolphinScheduler on all machines defined above. This user must be created manually before running the `install.sh` script. The user needs sudo privileges and permissions to operate HDFS. If HDFS is enabled, the root directory must be created by this user. deployUser=${deployUser:-"hadoop"} # The root directory of ZooKeeper. For now, DolphinScheduler's default registry server is ZooKeeper. # It will delete ${zkRoot} in ZooKeeper when you run install.sh, so please keep it consistent with registry.zookeeper.namespace in yml files. # Similarly, if you want to modify the value, please modify registry.zookeeper.namespace in yml files as well. zkRoot=${zkRoot:-"/dolphinscheduler"} 2.5 Modify the dolphinscheduler_env.sh file The file ./bin/env/dolphinscheduler_env.sh describes the following configurations: The database configuration of DolphinScheduler, the detailed configuration method is in [Initializing the Database], some task type external dependency paths or library files, such as JAVA_HOME and SPARK_HOME , are defined here. ./bin/env/dolphinscheduler_env.sh JAVA_HOME SPARK_HOME If you do not use certain task types, you can ignore the external dependencies of the tasks, but you must change the JAVA_HOME, registry, and database-related configurations according to your environment. 
export JAVA_HOME=${JAVA_HOME:-/data/module/jdk1.8.0_212}

# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8"
export SPRING_DATASOURCE_USERNAME=xxx
export SPRING_DATASOURCE_PASSWORD=xxx

# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-xxxx:2181,xxx:2181,xxx:2181}

export HADOOP_HOME=${HADOOP_HOME:-/data/module/hadoop-3.3.4}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/data/module/hadoop-3.3.4/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/data/module/spark-3.3.1}
#export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
#export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/data/module/hive-3.1.3}
export FLINK_HOME=${FLINK_HOME:-/data/module/flink-1.16.2}
export DATAX_HOME=${DATAX_HOME:-/data/module/datax}
#export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
#export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH

After configuring the environment variables, you can proceed with the DolphinScheduler installation process.

2.6 Initialize the database

After completing the steps above, you have created a new database for DolphinScheduler and configured DolphinScheduler to use it. You can now initialize the database with a single shell script:

bash tools/bin/upgrade-schema.sh

2.7 Modify the application.yaml file

There are 5 files to modify. The parts that need to be changed are the same in each, but the rest of their contents differ, so each file must be edited separately.
They are:

master-server/conf/application.yaml
api-server/conf/application.yaml
worker-server/conf/application.yaml
alert-server/conf/application.yaml
tools/conf/application.yaml

The fragments below show the parts to change in each file:

datasource:
  driver-class-name: com.mysql.cj.jdbc.Driver
  url: jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
  username: xxx
  password: xxx

registry:
  type: zookeeper
  zookeeper:
    namespace: dolphinscheduler
    connect-string: xxxx
    retry-policy:
      base-sleep-time: 60ms
      max-sleep: 300ms
      max-retries: 5
    session-timeout: 30s
    connection-timeout: 9s
    block-until-connected: 600ms
    digest: ~

spring:
  config:
    activate:
      on-profile: mysql
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://xxxx:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
    username: xxxx
    password: xxxx
  quartz:
    properties:
      org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobs

2.8 Modify the common.properties file

Similarly, there are 5 files to modify. The sections that need to be changed are the same in each, but the rest of their contents differ.
The modifications need to be made separately for:

master-server/conf/common.properties
api-server/conf/common.properties
worker-server/conf/common.properties
alert-server/conf/common.properties
tools/conf/common.properties

In the listing below, values written as descriptions are placeholders to replace with your own settings.

data.basedir.path=Customize the local file storage location
resource.storage.type=HDFS
# resource store on HDFS/S3 path, resource file will store to this base path, self configuration, please make sure the directory exists on hdfs and have read write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path=Customize the HDFS location
resource.hdfs.root.user=Customize the username and keep it consistent with the previous configuration in the document.
# if resource.storage.type=S3, the value like: s3a://dolphinscheduler; if resource.storage.type=HDFS and namenode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to conf dir
resource.hdfs.fs.defaultFS=hdfs://xxx:8020
# A highly available IP address
yarn.resourcemanager.ha.rm.ids=xxxx,xxx
# if resourcemanager HA is enabled or not use resourcemanager, please keep the default value; If resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when application number threshold is reached(default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://xxx:19888/jobhistory/logs/%s

Note: In this case, DolphinScheduler's distributed storage uses HDFS. If you need a different storage configuration, follow the instructions on the official website.

2.9 Distributed Storage HDFS Dependency Distribution

Because HDFS is used for resource storage, copy core-site.xml and hdfs-site.xml into the conf directory of each service:

echo /data/software/dolphinscheduler-3.2.0/master-server/conf/ \
     /data/software/dolphinscheduler-3.2.0/alert-server/conf/ \
     /data/software/dolphinscheduler-3.2.0/api-server/conf/ \
     /data/software/dolphinscheduler-3.2.0/worker-server/conf/ \
  | xargs -n 1 cp -v /data/module/hadoop-3.3.4/etc/hadoop/core-site.xml /data/module/hadoop-3.3.4/etc/hadoop/hdfs-site.xml

2.10 Start DolphinScheduler

As the deployment user created above, run the following command to complete the deployment. After deployment, the run logs are stored in the logs folder.

bash ./bin/install.sh

Note: On the first deployment, the message sh: bin/dolphinscheduler-daemon.sh: No such file or directory may appear five times. It can be ignored.

2.11 Log in to DolphinScheduler

Open http://localhost:12345/dolphinscheduler/ui in a browser to log in to the system UI. When connecting from another machine, replace localhost with the API server host (hadoop2 in this deployment). The default username and password are admin/dolphinscheduler123.

3. Start and Stop Services

# Stop all cluster services
bash ./bin/stop-all.sh

# Start all cluster services
bash ./bin/start-all.sh

# Start/stop Master
bash ./bin/dolphinscheduler-daemon.sh start master-server
bash ./bin/dolphinscheduler-daemon.sh stop master-server

# Start/stop Worker
bash ./bin/dolphinscheduler-daemon.sh start worker-server
bash ./bin/dolphinscheduler-daemon.sh stop worker-server

# Start/stop API
bash ./bin/dolphinscheduler-daemon.sh start api-server
bash ./bin/dolphinscheduler-daemon.sh stop api-server

# Start/stop Alert
bash ./bin/dolphinscheduler-daemon.sh start alert-server
bash ./bin/dolphinscheduler-daemon.sh stop alert-server
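
The per-service commands above can be combined into a small restart helper for the local node. The following is only a sketch under stated assumptions: restart_service, DS_HOME, and DRY_RUN are hypothetical names introduced here and are not part of DolphinScheduler, and DS_HOME defaults to the installPath used in this guide. The helper only calls bin/dolphinscheduler-daemon.sh with the stop and start actions shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with DolphinScheduler): restart one service
# on the local node by calling the daemon script with "stop" and then "start".

# Assumed install location; matches installPath in this guide, adjust as needed.
DS_HOME="${DS_HOME:-/data/module/dolphinscheduler-3.2.0}"

restart_service() {
  local service="$1"

  # Accept only the four service names used in this guide.
  case "$service" in
    master-server|worker-server|api-server|alert-server) ;;
    *) echo "unknown service: $service" >&2; return 1 ;;
  esac

  local action
  for action in stop start; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: print the command instead of executing it.
      echo "bash ${DS_HOME}/bin/dolphinscheduler-daemon.sh ${action} ${service}"
    else
      bash "${DS_HOME}/bin/dolphinscheduler-daemon.sh" "${action}" "${service}" || return 1
    fi
  done
}
```

For example, after sourcing the file, DRY_RUN=1 restart_service worker-server prints the two commands without running them, which makes it easy to review the sequence before restarting a production node.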