Output variables are a core mechanism in DolphinScheduler task scheduling for achieving data flow and task collaboration. By explicitly defining and passing parameters, they solve problems such as cross-node data sharing and priority conflicts while supporting complex process orchestration (such as sub-processes and conditional branches). Proper use of output variables can significantly improve workflow flexibility and maintainability. This article introduces the main ways output variables are defined and consumed in DolphinScheduler.
Before turning to output variables themselves, it helps to review Shell quoting, because the setValue syntax used later depends on it. In Shell scripts, single quotes ('), double quotes ("), and backticks (`) serve different purposes, and understanding their differences is crucial for writing and debugging Shell scripts.
VAR="world"
echo 'Hello, $VAR' # Output: Hello, $VAR
VAR="world"
echo "Hello, $VAR" # Output: Hello, world
Backticks (`) also perform command substitution, but they are an older syntax; it is recommended to use $() instead.
DATE=`date`
echo "Current date and time: $DATE" # Outputs the current date and time
Using $() for command substitution:
DATE=$(date)
echo "Current date and time: $DATE" # Outputs the current date and time
# Example of nested command substitution
OUTER=$(echo "Outer $(echo "Inner")")
echo $OUTER # Output: Outer Inner
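Command substitution is typically used to capture a computed value into a variable and then echo it inside double quotes. A minimal sketch (data.txt is a hypothetical file used only for illustration):
# count the lines of a hypothetical file and report the result
lines_num=$(wc -l < data.txt)
echo "data.txt has ${lines_num} lines"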
In summary: use single quotes for literal text, double quotes when variables or command substitutions need to be expanded, and $() when executing commands and using their output.
With quoting behavior clear, the first mode passes output variables from an upstream Shell task, taskA, to a downstream task, taskB. taskA's script:
echo 'taskA'
echo "#{setValue(linesNum=${lines_num})}"
echo '${setValue(words=20)}'
Note: ${lines_num} here is directly replaced by the Worker before the script runs, while the single-quoted ${setValue(words=20)} is echoed as-is and captured from the task output as the variable words.
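If the value is not known ahead of time, it can also be computed inside the script and emitted with the same setValue syntax. A sketch under two assumptions: data.txt is a hypothetical file on the Worker, and lines_num is not also defined as a task parameter, so the placeholder is left for the shell to expand:
echo 'taskA'
# compute the value at runtime, then emit it for the Worker to capture
lines_num=$(wc -l < data.txt)
echo "#{setValue(linesNum=${lines_num})}"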
taskB's script simply references both variables set by taskA:
echo 'taskB'
echo ${linesNum}
echo ${words}
Focus on taskB's output log:
[INFO] 2024-07-05 10:09:54.539 +0800 - Success initialized task plugin instance successfully
[INFO] 2024-07-05 10:09:54.539 +0800 - Set taskVarPool: [{"prop":"linesNum","direct":"IN","type":"VARCHAR","value":"100"},{"prop":"words","direct":"IN","type":"VARCHAR","value":"20"}] successfully
[INFO] 2024-07-05 10:09:54.539 +0800 - ***********************************************************************************************
[INFO] 2024-07-05 10:09:54.539 +0800 - ********************************* Execute task instance *************************************
[INFO] 2024-07-05 10:09:54.539 +0800 - ***********************************************************************************************
[INFO] 2024-07-05 10:09:54.540 +0800 - Final Shell file is:
[INFO] 2024-07-05 10:09:54.540 +0800 - ****************************** Script Content *****************************************************************
[INFO] 2024-07-05 10:09:54.540 +0800 - #!/bin/bash
echo 'taskB'
echo 100
echo 20
[INFO] 2024-07-05 10:09:56.544 +0800 - ->
taskB
100
20
[INFO] 2024-07-05 10:09:56.546 +0800 - process has exited. Execute path:/tmp/dolphinscheduler/... , processId:588336 , exitStatusCode:0
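Because the Worker substitutes the upstream values into the script as plain text, they can be used anywhere a literal would be, not only in echo. A brief sketch (input.txt is a hypothetical file on the Worker):
echo 'taskB'
# ${linesNum} is rendered to 100 by the Worker before the script runs
head -n ${linesNum} input.txt
echo "expected word count: ${words}"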
Output variables can also carry files. In the following example, fileUploadTask produces a directory of files and exposes it as a FILE-type output parameter, which fileDownloadTask receives under the name input_dir. fileUploadTask's script:
echo 'fileUploadTask'
mkdir -p data/test1 data/test2
echo "test1 message" >> data/test1/text.txt
echo "test2 message" >> data/test2/text.txt
tree .
fileDownloadTask reads the transferred files through its input_dir file parameter:
echo 'fileDownloadTask'
cat input_dir/test1/text.txt
cat input_dir/test2/text.txt
[INFO] 2024-07-05 11:11:08.160 +0800 - Success initialized task plugin instance successfully
[INFO] 2024-07-05 11:11:08.160 +0800 - Set taskVarPool: [{"prop":"fileUploadTask.file-text","direct":"IN","type":"FILE","value":"DATA_TRANSFER/..."}] successfully
[INFO] 2024-07-05 11:11:08.164 +0800 - process start, process id is: 590323
[INFO] 2024-07-05 11:11:10.164 +0800 - ->
fileDownloadTask
test1 message
test2 message
[INFO] 2024-07-05 11:11:10.166 +0800 - process has exited. Execute path:/tmp/dolphinscheduler/... , processId:590323 , exitStatusCode:0
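A downstream script can treat the received directory like any local path. A minimal sketch of a consumer that checks the expected structure before processing (assuming it is configured with the same input_dir file parameter as above):
echo 'fileCheckTask'
# fail fast if the expected files were not transferred
for f in input_dir/test1/text.txt input_dir/test2/text.txt; do
    [ -f "$f" ] || { echo "missing $f"; exit 1; }
done
# count the lines across all transferred files
cat input_dir/*/text.txt | wc -l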
Another common mode is to output the result set of an SQL task and use it in a downstream Shell task.
The SQL task sqlOutVarTask selects the user names and exposes them through the output parameter userNameList:
SELECT user_name AS userNameList FROM t_ds_user;
The downstream Shell task readOutVarTask echoes the list:
echo 'readOutVarTask'
echo ${userNameList}
[INFO] 2024-07-05 11:19:00.294 +0800 - Success initialized task plugin instance successfully
[INFO] 2024-07-05 11:19:00.294 +0800 - Set taskVarPool: [{"prop":"userNameList","direct":"IN","type":"LIST","value":"[\"admin\",\"qiaozhanwei\",\"test\"]"}] successfully
[INFO] 2024-07-05 11:19:00.294 +0800 - ***********************************************************************************************
[INFO] 2024-07-05 11:19:00.294 +0800 - ********************************* Execute task instance *************************************
[INFO] 2024-07-05 11:19:00.294 +0800 - ***********************************************************************************************
[INFO] 2024-07-05 11:19:00.295 +0800 - Final Shell file is:
[INFO] 2024-07-05 11:19:00.295 +0800 - ****************************** Script Content *****************************************************************
[INFO] 2024-07-05 11:19:00.295 +0800 - #!/bin/bash
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
source /etc/profile
export HADOOP_HOME=${HADOOP_HOME:-/home/hadoop-3.3.1}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME=${SPARK_HOME:-/home/spark-3.2.1-bin-hadoop3.2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/home/hive-3.1.2}
export FLINK_HOME=/home/flink-1.18.1
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export SEATUNNEL_HOME=/opt/software/seatunnel
export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH
echo 'readOutVarTask'
echo ["admin","qiaozhanwei","test"]
[INFO] 2024-07-05 11:19:00.295 +0800 - ****************************** Script Content *****************************************************************
[INFO] 2024-07-05 11:19:00.295 +0800 - Executing shell command : sudo -u root -i /tmp/dolphinscheduler/exec/process/root/13850571680800/14172048617888_1/1963/1458/1963_1458.sh
[INFO] 2024-07-05 11:19:00.299 +0800 - process start, process id is: 590781
[INFO] 2024-07-05 11:19:02.299 +0800 - ->
readOutVarTask
[admin,qiaozhanwei,test]
[INFO] 2024-07-05 11:19:02.301 +0800 - process has exited. execute path:/tmp/dolphinscheduler/exec/process/root/13850571680800/14172048617888_1/1963/1458, processId:590781 ,exitStatusCode:0 ,processWaitForStatus:true ,processExitValue:0
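The list value is substituted into the script as plain text, so it can be parsed like any other shell string. A minimal sketch for iterating over the user names, assuming the placeholder is rendered exactly as in the log above (["admin","qiaozhanwei","test"]):
echo 'loopOverUsersTask'
# quote the placeholder so the rendered brackets and commas stay in one string
raw='${userNameList}'
# strip the JSON-style brackets and quotes, then split on commas
names=$(echo "$raw" | tr -d '[]"')
IFS=',' read -ra users <<< "$names"
for user in "${users[@]}"; do
    echo "user: ${user}"
done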
Output variables in DolphinScheduler are a powerful mechanism for improving workflow flexibility and maintainability. By mastering their usage, users can create more dynamic, efficient, and scalable workflows.
By applying these techniques, you can maximize the potential of DolphinScheduler in handling complex data integration and process automation tasks.