The Storage System in Apache DolphinScheduler provides a unified interface for storing and retrieving files across various storage backends. It enables resource management for workflows and tasks: users can upload scripts, JAR files, configuration files, and other artifacts for use in task execution. The system abstracts the underlying storage technology, so the storage provider can be switched without changing application code.

Architecture Overview

The storage system is designed as a pluggable component with a consistent API across storage implementations, so DolphinScheduler can work with multiple storage backends while exposing a single interface for resource operations.

Sources: dolphinscheduler-storage-plugin/dolphinscheduler-storage-api/src/main/java/org/apache/dolphinscheduler/plugin/storage/api/StorageType.java 22-62; dolphinscheduler-common/src/main/resources/common.properties 24-33

Supported Storage Types

DolphinScheduler supports the following storage backends, selected by the value of resource.storage.type:

- LOCAL: local file system
- HDFS: Hadoop Distributed File System
- S3: Amazon S3 or S3-compatible storage
- OSS: Alibaba Cloud OSS
- GCS: Google Cloud Storage
- ABS: Azure Blob Storage
- OBS: Huawei Cloud OBS
- COS: Tencent Cloud COS

Sources: dolphinscheduler-storage-plugin/dolphinscheduler-storage-api/src/main/java/org/apache/dolphinscheduler/plugin/storage/api/StorageType.java 22-36; docs/docs/en/guide/resource/configuration.md 1-7

Plugin Architecture

The storage functionality is implemented as a set of plugins, which makes it easier to extend and maintain.
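To make the plugin idea concrete, here is a minimal sketch of how a unified storage interface with pluggable backends might be structured. It is not DolphinScheduler's actual API: `ResourceStore`, `InMemoryStore`, `Registry`, and `roundTrip` are hypothetical names introduced only for illustration, and the in-memory backend stands in for a real plugin. Only the backend names come from the configuration comment in common.properties.

```java
import java.nio.charset.StandardCharsets;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class StoragePluginSketch {

    // Backend names as listed in the common.properties comment.
    public enum Backend { LOCAL, HDFS, S3, OSS, GCS, ABS, OBS, COS }

    // Hypothetical unified interface; NOT the real DolphinScheduler API.
    public interface ResourceStore {
        void write(String path, byte[] content);
        byte[] read(String path);
        boolean exists(String path);
    }

    // Toy in-memory backend standing in for a real storage plugin.
    public static class InMemoryStore implements ResourceStore {
        private final Map<String, byte[]> files = new HashMap<>();

        @Override
        public void write(String path, byte[] content) {
            files.put(path, content.clone());
        }

        @Override
        public byte[] read(String path) {
            byte[] data = files.get(path);
            if (data == null) {
                throw new IllegalArgumentException("No such resource: " + path);
            }
            return data.clone();
        }

        @Override
        public boolean exists(String path) {
            return files.containsKey(path);
        }
    }

    // Hypothetical registry: each plugin contributes a factory for its backend.
    public static class Registry {
        private final Map<Backend, Supplier<ResourceStore>> factories =
                new EnumMap<>(Backend.class);

        public void register(Backend backend, Supplier<ResourceStore> factory) {
            factories.put(backend, factory);
        }

        public ResourceStore create(Backend backend) {
            Supplier<ResourceStore> factory = factories.get(backend);
            if (factory == null) {
                throw new IllegalStateException("No plugin registered for " + backend);
            }
            return factory.get();
        }
    }

    // Caller code depends only on ResourceStore, never on a concrete backend.
    public static String roundTrip(Backend backend, String path, String text) {
        Registry registry = new Registry();
        registry.register(Backend.LOCAL, InMemoryStore::new); // only LOCAL is wired up here
        ResourceStore store = registry.create(backend);
        store.write(path, text.getBytes(StandardCharsets.UTF_8));
        return new String(store.read(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Backend.LOCAL, "/scripts/hello.sh", "echo hello"));
    }
}
```

The point of the design is that `roundTrip` never names a concrete backend class: swapping storage providers would mean registering a different factory, not changing the calling code.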
Sources: dolphinscheduler-storage-plugin/pom.xml 30-39; dolphinscheduler-storage-plugin/dolphinscheduler-storage-all/pom.xml 29-54

Configuration

The storage system is configured through the common.properties file. Different storage backends require different configuration parameters.

Basic Configuration

    # Storage type: LOCAL, HDFS, S3, OSS, GCS, ABS, OBS, COS
    resource.storage.type=LOCAL
    # Base path for resource storage
    resource.storage.upload.base.path=/tmp/dolphinscheduler

Sources: dolphinscheduler-common/src/main/resources/common.properties 24-27

Configuration Flow

When DolphinScheduler starts, it loads the storage configuration and initializes the appropriate storage operator.

Sources: dolphinscheduler-common/src/main/java/org/apache/dolphinscheduler/common/utils/PropertyUtils.java 49-60

Storage Type-Specific Configuration

Local Storage

The Local Storage option stores files on the local file system of the machine where DolphinScheduler is running. This is the default configuration.

    resource.storage.type=LOCAL
    resource.storage.upload.base.path=/tmp/dolphinscheduler

Note: When using the LOCAL storage type with multiple DolphinScheduler nodes, each node has its own local file system. Resources uploaded on one node are therefore not available on other nodes unless the base path is on a shared file system.
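The configuration flow described above (load the properties, then choose a storage backend from resource.storage.type) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's PropertyUtils code: the class and method names are hypothetical, and both the fallback to LOCAL with /tmp/dolphinscheduler and the case-insensitive matching are assumptions based on LOCAL being described as the default.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

public class StorageConfigSketch {

    // Backend names as listed in the common.properties comment.
    public enum Backend { LOCAL, HDFS, S3, OSS, GCS, ABS, OBS, COS }

    // Parse properties text; a real deployment would read common.properties from disk.
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse storage configuration", e);
        }
        return props;
    }

    // Resolve the backend; falling back to LOCAL is an assumption of this sketch.
    public static Backend resolveBackend(Properties props) {
        String raw = props.getProperty("resource.storage.type", "LOCAL").trim();
        try {
            return Backend.valueOf(raw.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported resource.storage.type: " + raw, e);
        }
    }

    // Resolve the upload base path; the default mirrors the sample configuration.
    public static String basePath(Properties props) {
        return props.getProperty("resource.storage.upload.base.path", "/tmp/dolphinscheduler").trim();
    }

    public static void main(String[] args) {
        Properties props = parse(
                "resource.storage.type=HDFS\n"
                + "resource.storage.upload.base.path=/dolphinscheduler\n");
        System.out.println(resolveBackend(props) + " at " + basePath(props));
    }
}
```

Failing fast on an unrecognized storage type is a deliberate choice in this sketch: a typo in resource.storage.type surfaces at startup rather than at the first upload.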
Sources: dolphinscheduler-common/src/main/resources/common.properties 24-27; docs/docs/en/guide/resource/configuration.md 10-28

HDFS Storage

HDFS storage requires additional configuration:

    resource.storage.type=HDFS
    resource.hdfs.fs.defaultFS=hdfs://namenode:8020
    resource.hdfs.root.user=hdfs

If HDFS is secured with Kerberos authentication, additional Kerberos configuration is required.

Sources: dolphinscheduler-common/src/main/resources/common.properties 97-115

S3 Storage

For Amazon S3 or S3-compatible storage:

    resource.storage.type=S3

AWS connection parameters are specified in the aws.yaml file:

    aws:
      s3:
        credentials.provider.type: AWSStaticCredentialsProvider
        access.key.id: <access.key.id>
        access.key.secret: <access.key.secret>
        region: <region>
        bucket.name: <bucket.name>
        endpoint: <endpoint>

Sources: docs/docs/en/guide/resource/configuration.md 29-53

Other Cloud Storage

DolphinScheduler also supports storage on Alibaba Cloud OSS, Huawei Cloud OBS, Tencent Cloud COS, Google Cloud Storage, and Azure Blob Storage, each with its own configuration parameters.

Sources: docs/docs/en/guide/resource/configuration.md 54-127

Database Schema for Resources

In addition to the stored files themselves, DolphinScheduler maintains metadata about resources in its database.
The relevant tables include:

- Resource metadata tables (storing information about resources such as name, path, and owner)
- Resource-user relation tables (defining access permissions)

Sources: dolphinscheduler-dao/src/main/resources/sql/dolphinscheduler_mysql.sql 729-748

Inter-Component Integration

The Storage System integrates with other DolphinScheduler components.

Usage Considerations

When selecting a storage type, consider the following:

- Single-node vs. Multi-node: For a single-node deployment, LOCAL storage is sufficient. For multi-node deployments, consider HDFS or cloud storage.
- Performance: Local storage typically offers the best performance but lacks distributed capabilities. HDFS provides good performance for on-premises deployments, while cloud storage options are suitable for cloud-based deployments.
- Reliability: Cloud storage providers typically offer high durability and availability. For on-premises deployments, HDFS with proper replication provides reliable storage.
- Integration: If you are already using a particular cloud provider or have an existing Hadoop cluster, it may be simplest to use the corresponding storage option.
- Cost: Different storage options have different cost structures. Cloud storage typically charges for storage volume, requests, and data transfer.
Sources: docs/docs/en/guide/resource/configuration.md 13-26; docs/docs/zh/guide/resource/configuration.md 12-25

Configuration Best Practices

- Consistent Configuration: Ensure the storage configuration is identical across all DolphinScheduler nodes (API server and Worker server).
- Permissions: The user running DolphinScheduler must have appropriate permissions to access the configured storage.
- Shared Storage: In a distributed deployment, use a shared storage solution (HDFS, S3, etc.) rather than LOCAL storage so that all nodes can access the same resources.
- Security: For cloud storage, use appropriate security measures such as IAM roles or access keys with the minimum required permissions.
- Backup: Implement a backup strategy for your resource storage, especially for critical resources.
Sources: docs/docs/en/guide/resource/configuration.md 95-97; docs/docs/zh/guide/resource/configuration.md 87-91
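As an illustration of the "Consistent Configuration" practice, the following sketch compares the storage-related settings of two nodes and reports any keys whose values differ. It is a hypothetical helper, not part of DolphinScheduler: the class and method names are invented here, and treating every key that starts with `resource.` as storage-related is an assumption based on the keys shown in this document.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class StorageConfigDiff {

    // Assumed prefix for storage-related keys (e.g. resource.storage.type).
    private static final String PREFIX = "resource.";

    // Returns key -> "valueOnA vs valueOnB" for every storage key that differs.
    public static Map<String, String> diff(Properties nodeA, Properties nodeB) {
        Set<String> keys = new TreeSet<>();
        for (String key : nodeA.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                keys.add(key);
            }
        }
        for (String key : nodeB.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                keys.add(key);
            }
        }

        Map<String, String> mismatches = new TreeMap<>();
        for (String key : keys) {
            String a = nodeA.getProperty(key); // null when the key is missing on node A
            String b = nodeB.getProperty(key); // null when the key is missing on node B
            if (!Objects.equals(a, b)) {
                mismatches.put(key, a + " vs " + b);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        Properties api = new Properties();
        api.setProperty("resource.storage.type", "HDFS");
        api.setProperty("resource.hdfs.fs.defaultFS", "hdfs://namenode:8020");

        Properties worker = new Properties();
        worker.setProperty("resource.storage.type", "LOCAL");
        worker.setProperty("resource.hdfs.fs.defaultFS", "hdfs://namenode:8020");

        // Print every storage key whose value differs between the two nodes.
        diff(api, worker).forEach((key, values) -> System.out.println(key + ": " + values));
    }
}
```

In practice the two `Properties` objects would be loaded from each node's common.properties; a check like this could run as part of deployment validation before workers start picking up tasks.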