Stop Moving Data Manually—Let DolphinScheduler’s Output Variables Do the Heavy Lifting For You

by William Guo, March 20th, 2025

Too Long; Didn't Read

Master output variables in Apache DolphinScheduler to boost workflow flexibility! Learn how to use task-specific & workflow-level variables, SQL/Python task outputs, and conditional branching for dynamic execution.


Output variables are a core mechanism in DolphinScheduler for moving data between tasks and coordinating them. By explicitly defining and passing parameters, they address problems such as cross-node data sharing and parameter priority conflicts, and they support complex orchestration such as sub-processes and conditional branches. Used well, output variables make workflows noticeably more flexible and easier to maintain. This article introduces the main kinds of output variables in DolphinScheduler and shows how to use them.

1. Use of Single Quotes ('), Double Quotes ("), and Backticks (`) in Shell Scripts

In Shell scripts, single quotes ('), double quotes ("), and backticks (`) serve different purposes. Understanding their differences and applications is crucial for writing and debugging Shell scripts.

1.1 Single Quotes (')

  • Purpose: Full quoting, used to protect all characters within the string, preventing variable substitution or command substitution.
  • Characteristics: All characters inside the quotes are output as-is without interpretation.
VAR="world"
echo 'Hello, $VAR'  # Output: Hello, $VAR

1.2 Double Quotes (")

  • Purpose: Partial quoting, used to protect most characters within the string while allowing variable and command substitution.
  • Characteristics: Variables and commands inside the quotes are interpreted, while other characters remain unchanged.
VAR="world"
echo "Hello, $VAR"  # Output: Hello, world

1.3 Backticks (`)

  • Purpose: Command substitution, used to execute the command inside the quotes and return its output as the result.
  • Characteristics: The command inside the quotes is executed, and its result replaces the backtick content.
  • Note: Backticks are an older method of command substitution. It is recommended to use $() instead.
DATE=`date`
echo "Current date and time: $DATE"  # Outputs the current date and time

1.4 $() Command Substitution

  • More modern and readable than backticks
  • Supports nesting, whereas backticks make nesting difficult
DATE=$(date)
echo "Current date and time: $DATE"  # Outputs the current date and time  

# Example of nested command substitution  
OUTER=$(echo "Outer $(echo "Inner")")
echo "$OUTER"  # Output: Outer Inner

1.5 Use Cases

  • Single Quotes: Use when you do not want any part of the string to be interpreted, such as for regular expressions or special characters.
  • Double Quotes: Use when the string contains variables or command substitutions.
  • Backticks / $(): Use when executing commands and using their output.
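
The three use cases can be combined in one short script. This is only an illustrative sketch; the variable names price_note, file_count, and summary are made up for the example.

```shell
#!/bin/bash
# Single quotes: the dollar sign is kept literally, nothing is expanded.
price_note='Cost: $5 (no expansion)'

# $(): run a pipeline and capture its output (tr strips padding some wc builds add).
file_count=$(printf 'a\nb\n' | wc -l | tr -d ' ')

# Double quotes: the variable is expanded inside the string.
summary="files=$file_count"

echo "$price_note"
echo "$summary"
```

Had price_note been written with double quotes, the shell would have tried to expand $5 as a positional parameter.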

2. Output Variables and File Transfer in DolphinScheduler

2.1 Shell Output Variables

2.1.1 Process Overview

2.1.2 Task Configuration


taskA

echo 'taskA'
echo "#{setValue(linesNum=${lines_num})}"
echo '${setValue(words=20)}'

Note: ${lines_num} is not a shell variable here. The Worker replaces it with the parameter's value (100 in this run) before the script executes.
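
A minimal sketch of the same idea with a value computed at run time: the script counts lines and emits the result through setValue. The sample input and the variable names n and out_line are illustrative. $n is written without braces on the assumption that the Worker only rewrites ${...} placeholders, which leaves this variable to the shell.

```shell
#!/bin/bash
# Illustrative input; a real task would read a file or a command's output.
n=$(printf 'a\nb\nc\n' | wc -l | tr -d ' ')

# Double quotes let the shell expand $n; the rest of the line stays literal.
out_line="#{setValue(linesNum=$n)}"
echo "$out_line"
```

Run locally, this prints the setValue line with the computed count filled in, which is the form taskA writes to its output.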


taskB

echo 'taskB'
echo ${linesNum}
echo ${words}

2.1.3 Output Results

The log below is from taskB. Note the taskVarPool line and the substituted values in the final script.

[INFO] 2024-07-05 10:09:54.539 +0800 - Success initialized task plugin instance successfully  
[INFO] 2024-07-05 10:09:54.539 +0800 - Set taskVarPool: [{"prop":"linesNum","direct":"IN","type":"VARCHAR","value":"100"},{"prop":"words","direct":"IN","type":"VARCHAR","value":"20"}] successfully  
[INFO] 2024-07-05 10:09:54.539 +0800 - ***********************************************************************************************  
[INFO] 2024-07-05 10:09:54.539 +0800 - *********************************  Execute task instance  *************************************  
[INFO] 2024-07-05 10:09:54.539 +0800 - ***********************************************************************************************  
[INFO] 2024-07-05 10:09:54.540 +0800 - Final Shell file is:  
[INFO] 2024-07-05 10:09:54.540 +0800 - ****************************** Script Content *****************************************************************  
[INFO] 2024-07-05 10:09:54.540 +0800 - #!/bin/bash  
echo 'taskB'  
echo 100  
echo 20  
[INFO] 2024-07-05 10:09:56.544 +0800 -  ->   
    taskB  
    100  
    20  
[INFO] 2024-07-05 10:09:56.546 +0800 - process has exited. Execute path:/tmp/dolphinscheduler/... , processId:588336 , exitStatusCode:0  
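
To make the hand-off concrete, here is a rough local simulation of how setValue lines could be picked out of a task's output. It is not DolphinScheduler's actual implementation; the sed pattern and the variable names task_output and pairs are assumptions made for illustration.

```shell
#!/bin/bash
# Stand-in for taskA's captured output, with ${lines_num} already replaced by 100.
task_output='taskA
#{setValue(linesNum=100)}
${setValue(words=20)}'

# Keep only lines of the form #{setValue(k=v)} or ${setValue(k=v)}
# and strip the wrapper, leaving k=v.
pairs=$(printf '%s\n' "$task_output" | sed -n 's/^[#$]{setValue(\(.*\))}$/\1/p')

printf '%s\n' "$pairs"
```

The two extracted pairs correspond to the linesNum and words entries in the taskVarPool line of the taskB log.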

2.2 Shell File Transfer

2.2.1 Process Overview

2.2.2 Task Configuration


fileUploadTask

echo 'fileUploadTask'
mkdir -p data/test1 data/test2
echo "test1 message" >> data/test1/text.txt
echo "test2 message" >> data/test2/text.txt
tree .


fileDownloadTask

echo 'fileDownloadTask'
cat input_dir/test1/text.txt
cat input_dir/test2/text.txt

2.2.3 Output Results

[INFO] 2024-07-05 11:11:08.160 +0800 - Success initialized task plugin instance successfully  
[INFO] 2024-07-05 11:11:08.160 +0800 - Set taskVarPool: [{"prop":"fileUploadTask.file-text","direct":"IN","type":"FILE","value":"DATA_TRANSFER/..."}] successfully  
[INFO] 2024-07-05 11:11:08.164 +0800 - process start, process id is: 590323  
[INFO] 2024-07-05 11:11:10.164 +0800 -  ->   
    fileDownloadTask  
    test1 message  
    test2 message  
[INFO] 2024-07-05 11:11:10.166 +0800 - process has exited. Execute path:/tmp/dolphinscheduler/... , processId:590323 , exitStatusCode:0  
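
The two scripts imply that the upstream data directory appears downstream as input_dir. The sketch below imitates that locally, with a plain cp standing in for DolphinScheduler's file transfer. The temporary directory layout and the variable names are assumptions for illustration only.

```shell
#!/bin/bash
workdir=$(mktemp -d)
upstream="$workdir/upstream"
downstream="$workdir/downstream"

# What fileUploadTask does in its own working directory.
mkdir -p "$upstream/data/test1" "$upstream/data/test2" "$downstream"
echo "test1 message" > "$upstream/data/test1/text.txt"
echo "test2 message" > "$upstream/data/test2/text.txt"

# Stand-in for the transfer step: the data directory arrives as input_dir.
cp -r "$upstream/data" "$downstream/input_dir"

# What fileDownloadTask does in its own working directory.
result=$(cat "$downstream/input_dir/test1/text.txt" "$downstream/input_dir/test2/text.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```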

2.3 SQL Task Output Variables

This pattern passes the result of an SQL task to a downstream Shell task.

2.3.1 Process Overview

2.3.2 Task Configuration


sqlOutVarTask


SELECT user_name AS userNameList FROM t_ds_user;


readOutVarTask

echo 'readOutVarTask'
echo ${userNameList}

2.3.3 Output Results

[INFO] 2024-07-05 11:19:00.294 +0800 - Success initialized task plugin instance successfully
[INFO] 2024-07-05 11:19:00.294 +0800 - Set taskVarPool: [{"prop":"userNameList","direct":"IN","type":"LIST","value":"[\"admin\",\"qiaozhanwei\",\"test\"]"}] successfully
[INFO] 2024-07-05 11:19:00.294 +0800 - ***********************************************************************************************
[INFO] 2024-07-05 11:19:00.294 +0800 - *********************************  Execute task instance  *************************************
[INFO] 2024-07-05 11:19:00.294 +0800 - ***********************************************************************************************
[INFO] 2024-07-05 11:19:00.295 +0800 - Final Shell file is: 
[INFO] 2024-07-05 11:19:00.295 +0800 - ****************************** Script Content *****************************************************************
[INFO] 2024-07-05 11:19:00.295 +0800 - #!/bin/bash
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
source /etc/profile
export HADOOP_HOME=${HADOOP_HOME:-/home/hadoop-3.3.1}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME=${SPARK_HOME:-/home/spark-3.2.1-bin-hadoop3.2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/home/hive-3.1.2}
export FLINK_HOME=/home/flink-1.18.1
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export SEATUNNEL_HOME=/opt/software/seatunnel
export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}

export PATH=$HADOOP_HOME/bin:$SPARK_HOME/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH
echo 'readOutVarTask'
echo ["admin","qiaozhanwei","test"]
[INFO] 2024-07-05 11:19:00.295 +0800 - ****************************** Script Content *****************************************************************
[INFO] 2024-07-05 11:19:00.295 +0800 - Executing shell command : sudo -u root -i /tmp/dolphinscheduler/exec/process/root/13850571680800/14172048617888_1/1963/1458/1963_1458.sh
[INFO] 2024-07-05 11:19:00.299 +0800 - process start, process id is: 590781
[INFO] 2024-07-05 11:19:02.299 +0800 -  -> 
    readOutVarTask
    [admin,qiaozhanwei,test]
[INFO] 2024-07-05 11:19:02.301 +0800 - process has exited. execute path:/tmp/dolphinscheduler/exec/process/root/13850571680800/14172048617888_1/1963/1458, processId:590781 ,exitStatusCode:0 ,processWaitForStatus:true ,processExitValue:0  
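
The log shows a side effect of plain text substitution: the script contains echo ["admin","qiaozhanwei","test"], yet the output is [admin,qiaozhanwei,test], because the shell removes the unquoted double quotes (see section 1). Here is a hedged sketch of one way to keep the value intact and split it into names. The variable names raw and names are illustrative, and in a real task the first assignment would presumably be written as raw='${userNameList}' so that the substituted text lands inside single quotes.

```shell
#!/bin/bash
# Literal copy of the value from the taskVarPool log line.
raw='["admin","qiaozhanwei","test"]'

# Quoted, the value is printed exactly as stored.
echo "$raw"

# Delete brackets and double quotes, then put each name on its own line.
names=$(printf '%s\n' "$raw" | tr -d '[]"' | tr ',' '\n')

printf '%s\n' "$names" | while read -r user; do
  echo "user: $user"
done
```

This simple split assumes names contain no commas, brackets, or quote characters; anything more complex would call for a proper JSON parser.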

Conclusion

Output variables in DolphinScheduler are a powerful mechanism for improving workflow flexibility and maintainability. By mastering their usage, users can create more dynamic, efficient, and scalable workflows.


By applying these techniques, you can maximize the potential of DolphinScheduler in handling complex data integration and process automation tasks.