Apache SeaTunnel is a new-generation, high-performance, distributed data integration and synchronization tool that has been widely recognized and adopted in the industry. SeaTunnel supports three deployment modes: Local mode, Hybrid Cluster mode, and Separated Cluster mode.

This article introduces how to deploy SeaTunnel in Separated Cluster mode on Kubernetes, providing a complete deployment process and configuration examples for those with similar needs.

## 1. Preparation

Before starting the deployment, the following environments and components must be ready:

- Kubernetes cluster environment
- kubectl command-line tool
- docker
- helm (optional)

If you are familiar with Helm, you can refer directly to the official Helm deployment tutorial:

- https://seatunnel.apache.org/docs/2.3.10/start-v2/kubernetes/helm
- https://github.com/apache/seatunnel/tree/dev/deploy/kubernetes/seatunnel

This article focuses on deployment based on Kubernetes and the kubectl tool.

## 2. Build the SeaTunnel Docker Image

Official images are already provided for each recent version and can be pulled directly. For details, refer to the official documentation: Set Up With Docker.

```bash
docker pull apache/seatunnel:<version_tag>
```

Since we are deploying in cluster mode, the next step is to configure cluster network communication. The network layer of a SeaTunnel cluster is implemented with Hazelcast, so we configure that part next.

## 3. Hazelcast Cluster Configuration

### Headless Service Configuration

A Hazelcast cluster is a network of cluster members running Hazelcast, which automatically join together to form a cluster. This automatic joining works through the various discovery mechanisms that cluster members use to find each other. Hazelcast supports the following discovery mechanisms:

- Auto Discovery, supporting environments such as:
  - AWS
  - Azure
  - GCP
  - Kubernetes
- TCP
- Multicast
- Eureka
- Zookeeper

In this article's cluster deployment, we configure Hazelcast with the Kubernetes auto-discovery mechanism. The underlying principles are described in the official document: Kubernetes Auto Discovery.

Hazelcast's Kubernetes auto discovery (DNS Lookup mode) requires a Kubernetes Headless Service to work. A Headless Service resolves the service domain name to a list of IP addresses of all matching Pods, which enables Hazelcast cluster members to discover each other.
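Before creating any resources, it helps to have the target namespace in place. The configurations later in this article reference a `bigdata` namespace (it appears in the Hazelcast `service-dns` value); the namespace name and the 2.3.10 tag below are assumptions taken from those examples, so adjust them to your environment. A minimal sketch:

```bash
# The "bigdata" namespace is an assumption taken from the service-dns value used later; adjust as needed.
kubectl create namespace bigdata

# Pull the official image for the version used throughout this article.
docker pull apache/seatunnel:2.3.10
```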
First, we create a Kubernetes Headless Service:

```yaml
# used for hazelcast cluster join
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
  ports:
  - port: 5801
    name: hazelcast
```

Key parts of this configuration:

- `metadata.name: seatunnel-cluster`: the service name; Hazelcast nodes and clients discover cluster members through this name.
- `spec.clusterIP: None`: the critical setting that declares this a Headless Service with no virtual IP.
- `spec.selector`: selects, by label, the Pods covered by this Service.
- `spec.ports`: the port exposed for Hazelcast.

Meanwhile, to access the cluster externally via the REST API, we define another Service for the master Pods:

```yaml
# used to access seatunnel from outside systems via the rest api
apiVersion: v1
kind: Service
metadata:
  name: seatunnel-cluster-master
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/instance: seatunnel-cluster-app
    app.kubernetes.io/version: 2.3.10
    app.kubernetes.io/name: seatunnel-cluster-master
    app.kubernetes.io/component: master
  ports:
  - port: 8080
    name: "master-port"
    targetPort: 8080
    protocol: TCP
```

After defining the Kubernetes Services above, the next step is to configure the hazelcast-master.yaml and hazelcast-worker.yaml files according to Hazelcast's Kubernetes discovery mechanism.
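Before moving on, it can help to confirm that the Headless Service resolves to Pod IPs, since this is exactly what Hazelcast's DNS lookup relies on. A quick sketch using a throwaway busybox Pod; the `bigdata` namespace is an assumption carried over from the `service-dns` value used below, and records will only appear once the Pods from section 5 are running:

```bash
# Launch a temporary pod and resolve the headless service name;
# the answer should contain one IP per Pod matched by the selector.
kubectl -n bigdata run dns-test --rm -it --restart=Never \
  --image=busybox:1.36 -- nslookup seatunnel-cluster.bigdata.svc.cluster.local
```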
### Hazelcast master and worker YAML configurations

In SeaTunnel's Separated Cluster mode, all network-related configuration lives in hazelcast-master.yaml and hazelcast-worker.yaml.

Example hazelcast-master.yaml:

```yaml
hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
```

Key configuration items:

- cluster-name: identifies whether nodes belong to the same cluster; only nodes with the same cluster-name join the same Hazelcast cluster, and nodes with different cluster-name values reject each other's requests.
- Network configuration:
  - rest-api.enabled: the Hazelcast REST service is disabled by default in SeaTunnel 2.3.10 and must be explicitly enabled here.
  - service-dns (required): the full domain name of the Headless Service, generally ${SERVICE-NAME}.${NAMESPACE}.svc.cluster.local.
  - service-port (optional): the Hazelcast port; if specified and greater than 0, it overrides the default port (5701).

With this Kubernetes join mechanism, when a Hazelcast Pod starts it resolves service-dns through the Headless Service to obtain the IP list of all member Pods, and the members then attempt TCP connections to each other on port 5801.
Similarly, the hazelcast-worker.yaml configuration is:

```yaml
hazelcast:
  cluster-name: seatunnel-cluster
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      kubernetes:
        enabled: true
        service-dns: seatunnel-cluster.bigdata.svc.cluster.local
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
    hazelcast.heartbeat.failuredetector.type: phi-accrual
    hazelcast.heartbeat.interval.seconds: 30
    hazelcast.max.no.heartbeat.seconds: 300
    hazelcast.heartbeat.phiaccrual.failuredetector.threshold: 15
    hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
    hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 200
  member-attributes:
    rule:
      type: string
      value: worker
```

With the above, the Kubernetes-based member discovery configuration for the Hazelcast cluster is complete.
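Once the master and worker Pods are running (they are created by the Deployments in section 5), you can verify that the members actually formed one cluster by looking for Hazelcast's membership log line. A sketch, assuming the `bigdata` namespace and the Deployment name used later in this article:

```bash
# Hazelcast logs a "Members {N}" block whenever the member list changes;
# with 2 masters and 3 workers you should eventually see Members {5}.
kubectl -n bigdata logs deploy/seatunnel-cluster-master | grep -A 10 "Members {"
```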
## 4. Configure the SeaTunnel Engine

Next, we configure the SeaTunnel engine. All engine-related configuration lives in the seatunnel.yaml file. Below is a sample seatunnel.yaml configuration for reference:

```yaml
seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    classloader-cache-mode: true
    http:
      enable-http: true
      port: 8080
      enable-dynamic-port: false
      port-range: 100
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 300000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: hdfs://xxx:8020 # Ensure the directory has write permission
    telemetry:
      metric:
        enabled: true
```

This sample covers the following settings:

- history-job-expire-minutes: job history records are retained for 24 hours (1440 minutes) and then cleaned up automatically.
- backup-count: 1: the number of backup replicas for job state.
- queue-type: blockingqueue: use a blocking queue to manage jobs and avoid resource exhaustion.
- print-execution-info-interval: 60: print job execution status every 60 seconds.
- print-job-metrics-info-interval: 60: output job metrics (such as throughput and latency) every 60 seconds.
- classloader-cache-mode: true: enable class loader caching to reduce repeated class-loading overhead and improve performance.
- dynamic-slot: true: allow the number of task slots to be adjusted dynamically based on load to optimize resource utilization.
- checkpoint.interval: 300000: trigger a checkpoint every 5 minutes.
- checkpoint.timeout: 60000: checkpoint timeout of 1 minute.
- telemetry.metric.enabled: true: enable collection of runtime job metrics (e.g., latency, throughput) for monitoring.
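The sample above stores checkpoints in HDFS. If you do not have an HDFS cluster available, the same hdfs storage plugin can also point at a local path; this is a hedged sketch based on the checkpoint-storage options in the SeaTunnel documentation (verify against the docs for your version), and it is only meaningful when every engine node sees the same filesystem, for example a shared volume mounted at that path:

```yaml
seatunnel:
  engine:
    checkpoint:
      interval: 300000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: hdfs
          # A local path instead of HDFS; suitable mainly for single-node trials
          # unless the path is backed by shared storage visible to all Pods.
          fs.defaultFS: file:///tmp/seatunnel/checkpoint_snapshot/
```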
## 5. Create Kubernetes YAML Files to Deploy the Application

After completing the steps above, the final step is to create Kubernetes YAML files for the Master and Worker nodes, defining the deployment-related configuration.

To decouple the configuration files from the application, the configuration files mentioned above are merged into a single ConfigMap and mounted under the container's configuration path for unified management and easier updates. Below are sample configurations for seatunnel-cluster-master.yaml and seatunnel-cluster-worker.yaml, covering ConfigMap mounting, container startup commands, and Deployment resource definitions.

seatunnel-cluster-master.yaml:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-master
spec:
  replicas: 2  # modify replicas according to your scenario
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-master
      app.kubernetes.io/component: master
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-master"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-master
        app.kubernetes.io/component: master
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeAffinity-key
                operator: Exists
      containers:
      - name: seatunnel-master
        image: seatunnel:2.3.10
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5801
          name: hazelcast
        - containerPort: 8080
          name: "master-port"
        command:
        - /opt/seatunnel/bin/seatunnel-cluster.sh
        - -r
        - master
        resources:
          requests:
            cpu: "1"
            memory: 4G
        volumeMounts:
        - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
          name: seatunnel-configs
          subPath: hazelcast-master.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
          name: seatunnel-configs
          subPath: hazelcast-worker.yaml
        - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
          name: seatunnel-configs
          subPath: seatunnel.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
          name: seatunnel-configs
          subPath: hazelcast-client.yaml
        - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
          name: seatunnel-configs
          subPath: log4j2_client.properties
        - mountPath: "/opt/seatunnel/config/log4j2.properties"
          name: seatunnel-configs
          subPath: log4j2.properties
      volumes:
      - name: seatunnel-configs
        configMap:
          name: seatunnel-cluster-configs
```
**Deployment Strategy**

- Use multiple replicas (replicas: 2) to ensure high availability of the service.
- Use a rolling update strategy for zero-downtime deployment:
  - maxUnavailable: 25%: at least 75% of the Pods keep running during updates.
  - maxSurge: 50%: up to 50% additional Pods are temporarily allowed during the transition for a smooth upgrade.
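Once the Deployments are applied later in this section, this rolling-update behaviour can be observed with standard kubectl commands; a small sketch, with the `bigdata` namespace again assumed:

```bash
# Watch the rolling update of the master Deployment until it completes.
kubectl -n bigdata rollout status deployment/seatunnel-cluster-master

# Inspect the strategy, replica counts, and events if a rollout stalls.
kubectl -n bigdata describe deployment seatunnel-cluster-master
```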
**Label Selectors**

- Use the Kubernetes recommended standard label system.
- spec.selector.matchLabels: defines, by label, the scope of Pods managed by the Deployment.
- spec.template.metadata.labels: the labels assigned to newly created Pods, identifying their metadata.

**Node Affinity**

- Configure affinity to control which nodes the Pods are scheduled on.
- Replace nodeAffinity-key with a label that actually exists on the target nodes in your Kubernetes environment.

**Config File Mounting**

- Centralize the core configuration files in a ConfigMap to decouple configuration management from the application.
- Use subPath to mount individual files from the ConfigMap.
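The Deployments reference a ConfigMap named seatunnel-cluster-configs, which must exist before they are applied. One way to create it from the configuration files prepared in the previous sections is sketched below, assuming the files sit in the current directory and the `bigdata` namespace is used; the two log4j2 files can be taken unchanged from the config directory of the SeaTunnel binary package:

```bash
# Bundle the configuration files into the ConfigMap mounted by the master and worker Pods.
kubectl -n bigdata create configmap seatunnel-cluster-configs \
  --from-file=hazelcast-master.yaml \
  --from-file=hazelcast-worker.yaml \
  --from-file=seatunnel.yaml \
  --from-file=hazelcast-client.yaml \
  --from-file=log4j2.properties \
  --from-file=log4j2_client.properties
```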
The seatunnel-cluster-worker.yaml configuration is:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: seatunnel-cluster-worker
spec:
  replicas: 3  # modify replicas according to your scenario
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 50%
  selector:
    matchLabels:
      app.kubernetes.io/instance: seatunnel-cluster-app
      app.kubernetes.io/version: 2.3.10
      app.kubernetes.io/name: seatunnel-cluster-worker
      app.kubernetes.io/component: worker
  template:
    metadata:
      annotations:
        prometheus.io/path: /hazelcast/rest/instance/metrics
        prometheus.io/port: "5801"
        prometheus.io/scrape: "true"
        prometheus.io/role: "seatunnel-worker"
      labels:
        app.kubernetes.io/instance: seatunnel-cluster-app
        app.kubernetes.io/version: 2.3.10
        app.kubernetes.io/name: seatunnel-cluster-worker
        app.kubernetes.io/component: worker
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeAffinity-key
                operator: Exists
      containers:
      - name: seatunnel-worker
        image: seatunnel:2.3.10
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5801
          name: hazelcast
        command:
        - /opt/seatunnel/bin/seatunnel-cluster.sh
        - -r
        - worker
        resources:
          requests:
            cpu: "1"
            memory: 10G
        volumeMounts:
        - mountPath: "/opt/seatunnel/config/hazelcast-master.yaml"
          name: seatunnel-configs
          subPath: hazelcast-master.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-worker.yaml"
          name: seatunnel-configs
          subPath: hazelcast-worker.yaml
        - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
          name: seatunnel-configs
          subPath: seatunnel.yaml
        - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
          name: seatunnel-configs
          subPath: hazelcast-client.yaml
        - mountPath: "/opt/seatunnel/config/log4j2_client.properties"
          name: seatunnel-configs
          subPath: log4j2_client.properties
        - mountPath: "/opt/seatunnel/config/log4j2.properties"
          name: seatunnel-configs
          subPath: log4j2.properties
      volumes:
      - name: seatunnel-configs
        configMap:
          name: seatunnel-cluster-configs
```

After defining the master and worker YAML files above, deploy them to the Kubernetes cluster by running:

```bash
kubectl apply -f seatunnel-cluster-master.yaml
kubectl apply -f seatunnel-cluster-worker.yaml
```

Under normal circumstances, you will see the SeaTunnel cluster running with 2 master nodes and 3 worker nodes:

```bash
$ kubectl get pods | grep seatunnel-cluster
seatunnel-cluster-master-6989898f66-6fjz8   1/1   Running   0   156m
seatunnel-cluster-master-6989898f66-hbtdn   1/1   Running   0   155m
seatunnel-cluster-worker-87fb469f7-5c96x    1/1   Running   0   156m
seatunnel-cluster-worker-87fb469f7-7kt2h    1/1   Running   0   155m
seatunnel-cluster-worker-87fb469f7-drm9r    1/1   Running   0   156m
```

At this point, we have successfully deployed a SeaTunnel cluster in Kubernetes in Separated Cluster mode. Now that the cluster is ready, how do clients submit jobs to it?

## 6. Client Submits Jobs to the Cluster

### Submit Jobs Using the Command-Line Tool

All client-side configuration for SeaTunnel lives in the hazelcast-client.yaml file.

First, download the binary installation package on the client machine (it contains the bin and config directories), and make sure the SeaTunnel installation path is consistent with the server. This is what the official documentation means by setting SEATUNNEL_HOME the same as the server; otherwise, errors such as "cannot find connector plugin path on the server" may occur, because the plugin path on the server differs from the path on the client.
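For reference, a minimal sketch of preparing the client follows; the download URL pattern is assumed from the Apache archive layout, and the /opt/seatunnel path simply mirrors the server image used above, so adjust both to your environment:

```bash
# Download and unpack the binary package (URL pattern assumed from the Apache archive layout).
export SEATUNNEL_VERSION=2.3.10
wget "https://archive.apache.org/dist/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz"
tar -xzvf "apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz"

# Keep the same installation path as the server image (/opt/seatunnel in the Deployments above).
sudo mv "apache-seatunnel-${SEATUNNEL_VERSION}" /opt/seatunnel
export SEATUNNEL_HOME=/opt/seatunnel
cd "$SEATUNNEL_HOME"
```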
Enter the installation directory and modify the config/hazelcast-client.yaml file so that it points to the Headless Service address created earlier:

```yaml
hazelcast-client:
  cluster-name: seatunnel-cluster
  properties:
    hazelcast.logging.type: log4j2
  connection-strategy:
    connection-retry:
      cluster-connect-timeout-millis: 3000
  network:
    cluster-members:
    - seatunnel-cluster.bigdata.svc.cluster.local:5801
```

Once the client is configured, you can submit jobs to the cluster. There are two main ways to configure JVM options for job submission:

- Configure JVM options in the config/jvm_client_options file. Options configured here apply to all jobs submitted via seatunnel.sh, whether they run in local or cluster mode; every submitted job shares the same JVM configuration. See the sketch after this list.
- Specify JVM options on the command line when submitting a job, e.g. sh bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template -DJvmOption="-Xms2G -Xmx2G". This allows JVM options to be set individually for each job submission.
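For the first approach, the config/jvm_client_options file in the SeaTunnel distribution takes one JVM option per line; a hedged sketch of raising the client heap (the exact defaults shipped with your version may differ):

```
# config/jvm_client_options -- one JVM option per line (sketch; actual defaults may differ)
# JVM heap for every job submitted through seatunnel.sh
-Xms2g
-Xmx2g
```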
Next, here is a sample job configuration to demonstrate submitting a job to the cluster:

```hocon
env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  FakeSource {
    parallelism = 2
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
  }
}
```

Use the following command on the client to submit the job:

```bash
sh bin/seatunnel.sh --config config/v2.streaming.example.template -m cluster -n st.example.template -DJvmOption="-Xms2G -Xmx2G"
```

On the Master node, list the running jobs with:

```bash
$ sh bin/seatunnel.sh -l
Job ID             Job Name            Job Status  Submit Time              Finished Time
------------------ ------------------- ----------  -----------------------  -----------------------
964354250769432580 st.example.template RUNNING     2025-04-15 10:39:30.588
```

You can see that the job named st.example.template is currently in the RUNNING state. In the Worker node logs, you should observe entries like:

```
2025-04-15 10:34:41,998 INFO  [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0  rowIndex=1:  SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bdaUB, 110348049
2025-04-15 10:34:41,998 INFO  [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1  rowIndex=1:  SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : mOifY, 1974539087
2025-04-15 10:34:41,999 INFO  [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0  rowIndex=2:  SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : jKFrR, 1828047742
2025-04-15 10:34:41,999 INFO  [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=1  rowIndex=2:  SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : gDiqR, 1177544796
2025-04-15 10:34:41,999 INFO  [.a.s.c.s.c.s.ConsoleSinkWriter] [st-multi-table-sink-writer-1] - subtaskIndex=0  rowIndex=3:  SeaTunnelRow#tableId=fake SeaTunnelRow#kind=INSERT : bCVxc, 909343602
...
```
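Since the workers run in Kubernetes, the same output can also be followed from outside the Pods with kubectl; a small sketch, again assuming the `bigdata` namespace and the Deployment name defined above:

```bash
# Follow the worker logs and filter for the Console sink output of the demo job.
kubectl -n bigdata logs -f deploy/seatunnel-cluster-worker | grep ConsoleSinkWriter
```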
This confirms that the job has been successfully submitted to the SeaTunnel cluster and is running normally.

### Submit Jobs Using the REST API

SeaTunnel also provides a REST API for querying job status and statistics, and for submitting and stopping jobs. We configured a Headless Service for the master nodes with port 8080 exposed, which allows clients to submit jobs via the REST API.

You can submit a job by uploading the configuration file with curl:

```bash
curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/submit-job/upload' \
  --form 'config_file=@"/opt/seatunnel/config/v2.streaming.example.template"' \
  --form 'jobName=st.example.template'

{"jobId":"964553575034257409","jobName":"st.example.template"}
```

If the submission succeeds, the API returns the job ID and job name, as shown above.

To list the running jobs, query:

```bash
curl 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/running-jobs'

[{"jobId":"964553575034257409","jobName":"st.example.template","jobStatus":"RUNNING","envOptions":{"job.mode":"STREAMING","checkpoint.interval":"2000","parallelism":"2"}, ...}]
```

The response shows the job status and additional metadata, confirming that job submission via the REST API works correctly.

More details on the REST API can be found in the official documentation: RESTful API V2.
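The V2 REST API also documents endpoints for stopping jobs and inspecting a single job. The sketch below assumes the /stop-job endpoint and its JSON payload as described in the RESTful API V2 reference, so check that document for the exact fields in your version:

```bash
# Stop the job submitted above (jobId taken from the earlier response);
# the endpoint and payload are assumed from the RESTful API V2 documentation.
curl -X POST 'http://seatunnel-cluster-master.bigdata.svc.cluster.local:8080/stop-job' \
  -H 'Content-Type: application/json' \
  -d '{"jobId": 964553575034257409, "isStopWithSavePoint": false}'
```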
## 7. Summary

This article focused on how to deploy SeaTunnel on Kubernetes using the recommended Separated Cluster mode. To summarize, the main deployment steps are:

1. Prepare the Kubernetes environment: ensure a running Kubernetes cluster and install the necessary tools.
2. Build the SeaTunnel Docker image: use the official image if no custom development is needed; otherwise, build locally and create your own image.
3. Configure the Headless Service and the Hazelcast cluster: Hazelcast's Kubernetes auto-discovery (DNS Lookup mode) requires a Kubernetes Headless Service, so create one and point Hazelcast's service-dns at it. The Headless Service resolves to the IPs of all matching Pods, enabling Hazelcast cluster member discovery.
4. Configure the SeaTunnel engine: modify seatunnel.yaml to set the engine parameters.
5. Create the Kubernetes deployment YAML files: define the Master and Worker Deployments with node affinity, startup commands, resources, and volume mounts, then deploy them to Kubernetes.
6. Configure the SeaTunnel client: install SeaTunnel on the client, make sure SEATUNNEL_HOME matches the server, and configure hazelcast-client.yaml to connect to the cluster.
7. Submit and run jobs: submit jobs from the client to the SeaTunnel cluster for execution.

The configurations and examples presented here are intended as references; there are many other configuration options and details not covered. Feedback and discussion are welcome. I hope this is helpful for everyone!