QuickStart in Kubernetes

Kubernetes deployment runs DolphinScheduler in a Kubernetes cluster. It can schedule a large number of tasks and is suitable for production use.

If you are new to DolphinScheduler and want to try it out, we recommend the Standalone deployment. If you want to try more complete functionality or schedule a larger number of tasks, we recommend the pseudo-cluster deployment. If you want to use DolphinScheduler in production, we recommend the cluster deployment or the Kubernetes deployment.

Prerequisites

  • Helm 3.1.0+
  • Kubernetes 1.12+
  • PV provisioner support in the underlying infrastructure

Installing the Chart

Please download the source code package apache-dolphinscheduler-2.0.0-src.tar.gz from the download page.

To install the chart with the release name dolphinscheduler, please execute the following commands:

    $ tar -zxvf apache-dolphinscheduler-2.0.0-src.tar.gz
    $ cd apache-dolphinscheduler-2.0.0-src/docker/kubernetes/dolphinscheduler
    $ helm repo add bitnami https://charts.bitnami.com/bitnami
    $ helm dependency update .
    $ helm install dolphinscheduler . --set image.tag=2.0.0

To install the chart with a namespace named test:

    $ helm install dolphinscheduler . -n test

Tip: If the namespace test is used, add the option -n test to every helm and kubectl command.

These commands deploy DolphinScheduler on the Kubernetes cluster in the default configuration. The Appendix-Configuration section lists the parameters that can be configured during installation.

Tip: List all releases using helm list

By default, the PostgreSQL service (with username root, password root and database dolphinscheduler) and the ZooKeeper service are started.
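As a rough sketch of overriding the default configuration at install time, the snippet below writes a small override file and prints the matching install command instead of running it. The file name my-values.yaml and the replica counts are assumptions chosen for illustration; the keys image.tag, master.replicas and worker.replicas come from the Appendix-Configuration section.

```shell
# Hypothetical override file; the file name and the replica counts are illustrative only.
VALUES_FILE="my-values.yaml"

cat > "$VALUES_FILE" <<'EOF'
image:
  tag: "2.0.0"
master:
  replicas: 2
worker:
  replicas: 2
EOF

# Compose the install command. It is printed rather than executed so it can be reviewed first.
INSTALL_CMD="helm install dolphinscheduler . -f $VALUES_FILE"
echo "$INSTALL_CMD"
```

Keeping overrides in a separate file leaves the chart's own values.yaml untouched, which tends to make later chart updates easier to compare.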

Access DolphinScheduler UI

If ingress.enabled in values.yaml is set to true, open http://${ingress.host}/dolphinscheduler in a browser.

Tip: If there is a problem with ingress access, please contact the Kubernetes administrator and refer to the Kubernetes Ingress documentation.

Otherwise, when api.service.type=ClusterIP, you need to run a port-forward command such as:

    $ kubectl port-forward --address 0.0.0.0 svc/dolphinscheduler-api 12345:12345
    $ kubectl port-forward --address 0.0.0.0 -n test svc/dolphinscheduler-api 12345:12345 # with test namespace

Tip: If the error unable to do port forwarding: socat not found appears, install socat first.

Then open http://192.168.xx.xx:12345/dolphinscheduler in a browser (the local address is http://127.0.0.1:12345/dolphinscheduler).

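Or, when api.service.type=NodePort, run the following commands, replacing the Helm template expressions {{ .Release.Namespace }} and {{ template "dolphinscheduler.fullname" . }} with the actual namespace and the chart's full name (dolphinscheduler for the release installed above):

    NODE_IP=$(kubectl get no -n {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
    NODE_PORT=$(kubectl get svc {{ template "dolphinscheduler.fullname" . }}-api -n {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}")
    echo http://$NODE_IP:$NODE_PORT/dolphinscheduler

Then open http://$NODE_IP:$NODE_PORT/dolphinscheduler in a browser, using the values printed by the last command.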
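The NodePort lookup can be wrapped in two small shell functions, sketched below under the assumption that the release is named dolphinscheduler so that the API service is dolphinscheduler-api. The function names ds_url and ds_node_url, and the sample address and port in the final call, are hypothetical; the jsonpath expressions are the ones used in the commands above.

```shell
# Build the UI URL from a node address and a node port.
ds_url() {
  echo "http://$1:$2/dolphinscheduler"
}

# Look up the first node address and the service's NodePort, then print the URL.
# $1 = namespace, $2 = API service name (both supplied by the caller).
ds_node_url() {
  node_ip=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}') || return 1
  node_port=$(kubectl get svc "$2" -n "$1" -o jsonpath='{.spec.ports[0].nodePort}') || return 1
  ds_url "$node_ip" "$node_port"
}

# Example call against a live cluster (not executed here):
#   ds_node_url test dolphinscheduler-api
# Offline illustration with a made-up address and port:
ds_url 192.168.1.10 30123
```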

The default username is admin and the default password is dolphinscheduler123

Please refer to the Quick Start in the chapter User Manual to explore how to use DolphinScheduler

Uninstalling the Chart

To uninstall/delete the dolphinscheduler deployment:

    $ helm uninstall dolphinscheduler

The command removes all the Kubernetes components associated with the chart except the PVCs, and deletes the release.

To delete the PVCs associated with dolphinscheduler:

    $ kubectl delete pvc -l app.kubernetes.io/instance=dolphinscheduler

Note: Deleting the PVCs also deletes all data. Please be cautious before doing it.

Configuration

The configuration file is values.yaml. The Appendix-Configuration table lists the configurable parameters of DolphinScheduler and their default values.

Support Matrix

| Type | Support | Notes |
| --- | --- | --- |
| Shell | Yes | |
| Python2 | Yes | |
| Python3 | Indirect Yes | Refer to FAQ |
| Hadoop2 | Indirect Yes | Refer to FAQ |
| Hadoop3 | Not Sure | Not tested |
| Spark-Local(client) | Indirect Yes | Refer to FAQ |
| Spark-YARN(cluster) | Indirect Yes | Refer to FAQ |
| Spark-Standalone(cluster) | Not Yet | |
| Spark-Kubernetes(cluster) | Not Yet | |
| Flink-Local(local>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-YARN(yarn-cluster) | Indirect Yes | Refer to FAQ |
| Flink-YARN(yarn-session/yarn-per-job/yarn-application>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Standalone(default) | Not Yet | |
| Flink-Standalone(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-Kubernetes(default) | Not Yet | |
| Flink-Kubernetes(remote>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| Flink-NativeKubernetes(kubernetes-session/application>=1.11) | Not Yet | Generic CLI mode is not yet supported |
| MapReduce | Indirect Yes | Refer to FAQ |
| Kerberos | Indirect Yes | Refer to FAQ |
| HTTP | Yes | |
| DataX | Indirect Yes | Refer to FAQ |
| Sqoop | Indirect Yes | Refer to FAQ |
| SQL-MySQL | Indirect Yes | Refer to FAQ |
| SQL-PostgreSQL | Yes | |
| SQL-Hive | Indirect Yes | Refer to FAQ |
| SQL-Spark | Indirect Yes | Refer to FAQ |
| SQL-ClickHouse | Indirect Yes | Refer to FAQ |
| SQL-Oracle | Indirect Yes | Refer to FAQ |
| SQL-SQLServer | Indirect Yes | Refer to FAQ |
| SQL-DB2 | Indirect Yes | Refer to FAQ |

FAQ

How to view the logs of a pod container?

List all pods (aka po):

    kubectl get po
    kubectl get po -n test # with test namespace

View the logs of a pod container named dolphinscheduler-master-0:

    kubectl logs dolphinscheduler-master-0
    kubectl logs -f dolphinscheduler-master-0 # follow log output
    kubectl logs --tail 10 dolphinscheduler-master-0 -n test # show the last 10 lines of the logs

How to scale api, master and worker on Kubernetes?

List all deployments (aka deploy):

    kubectl get deploy
    kubectl get deploy -n test # with test namespace

Scale api to 3 replicas:

    kubectl scale --replicas=3 deploy dolphinscheduler-api
    kubectl scale --replicas=3 deploy dolphinscheduler-api -n test # with test namespace

List all stateful sets (aka sts):

    kubectl get sts
    kubectl get sts -n test # with test namespace

Scale master to 2 replicas:

    kubectl scale --replicas=2 sts dolphinscheduler-master
    kubectl scale --replicas=2 sts dolphinscheduler-master -n test # with test namespace

Scale worker to 6 replicas:

    kubectl scale --replicas=6 sts dolphinscheduler-worker
    kubectl scale --replicas=6 sts dolphinscheduler-worker -n test # with test namespace
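An alternative worth considering is to record the replica counts in the release values with helm upgrade, so that they live alongside the rest of the chart configuration. The sketch below only composes and prints such a command; the helper name scale_cmd is hypothetical, the keys api.replicas, master.replicas and worker.replicas come from the Appendix-Configuration section, and the command is assumed to be run from the chart directory.

```shell
# Compose a helm upgrade command that sets replica counts through chart values.
# $1 = api replicas, $2 = master replicas, $3 = worker replicas
scale_cmd() {
  echo "helm upgrade dolphinscheduler . --reuse-values --set api.replicas=$1,master.replicas=$2,worker.replicas=$3"
}

# Print the command for the replica counts used in the kubectl examples.
scale_cmd 3 2 6
```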

How to use MySQL as the DolphinScheduler’s database instead of PostgreSQL?

Because of the commercial license, we cannot directly use the driver of MySQL.

If you want to use MySQL, you can build a new image based on the apache/dolphinscheduler image as follows.

  1. Download the MySQL driver mysql-connector-java-8.0.16.jar

  2. Create a new Dockerfile to add the MySQL driver:

        FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:2.0.0
        COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib

  3. Build a new Docker image including the MySQL driver:

        docker build -t apache/dolphinscheduler:mysql-driver .

  4. Push the Docker image apache/dolphinscheduler:mysql-driver to a Docker registry

  5. Modify image repository and update tag to mysql-driver in values.yaml

  6. Set postgresql.enabled to false in values.yaml

  7. Modify externalDatabase (especially host, username and password) in values.yaml:

        externalDatabase:
          type: "mysql"
          driver: "com.mysql.jdbc.Driver"
          host: "localhost"
          port: "3306"
          username: "root"
          password: "root"
          database: "dolphinscheduler"
          params: "useUnicode=true&characterEncoding=UTF-8"

  8. Run a DolphinScheduler release in Kubernetes (See Installing the Chart)

How to support MySQL datasource in Datasource manage?

Because of the commercial license, we cannot directly use the driver of MySQL.

If you want to add MySQL datasource, you can build a new image based on the apache/dolphinscheduler image as follows.

  1. Download the MySQL driver mysql-connector-java-8.0.16.jar

  2. Create a new Dockerfile to add the MySQL driver:

        FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:2.0.0
        COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib

  3. Build a new Docker image including the MySQL driver:

        docker build -t apache/dolphinscheduler:mysql-driver .

  4. Push the Docker image apache/dolphinscheduler:mysql-driver to a Docker registry

  5. Modify image repository and update tag to mysql-driver in values.yaml

  6. Run a DolphinScheduler release in Kubernetes (See Installing the Chart)

  7. Add a MySQL datasource in Datasource manage

How to support Oracle datasource in Datasource manage?

Because of the commercial license, we cannot directly use the driver of Oracle.

If you want to add Oracle datasource, you can build a new image based on the apache/dolphinscheduler image as follows.

  1. Download the Oracle driver ojdbc8.jar (such as ojdbc8-19.9.0.0.jar)

  2. Create a new Dockerfile to add the Oracle driver:

        FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:2.0.0
        COPY ojdbc8-19.9.0.0.jar /opt/dolphinscheduler/lib

  3. Build a new Docker image including the Oracle driver:

        docker build -t apache/dolphinscheduler:oracle-driver .

  4. Push the Docker image apache/dolphinscheduler:oracle-driver to a Docker registry

  5. Modify image repository and update tag to oracle-driver in values.yaml

  6. Run a DolphinScheduler release in Kubernetes (See Installing the Chart)

  7. Add an Oracle datasource in Datasource manage
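Since the MySQL and Oracle procedures differ only in the driver jar and the image tag, the Dockerfile step could be generated by a small helper, sketched here. The function name write_driver_dockerfile and the directory ./oracle-image are hypothetical; the base image and the target path /opt/dolphinscheduler/lib are the ones used in the steps above.

```shell
# Write a Dockerfile that layers one JDBC driver jar onto the 2.0.0 image.
# $1 = driver jar file name, $2 = output directory (both supplied by the caller)
write_driver_dockerfile() {
  jar="$1"
  dir="$2"
  mkdir -p "$dir"
  cat > "$dir/Dockerfile" <<EOF
FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:2.0.0
COPY $jar /opt/dolphinscheduler/lib
EOF
}

write_driver_dockerfile ojdbc8-19.9.0.0.jar ./oracle-image
# The jar must be copied next to the Dockerfile before building, for example:
#   docker build -t apache/dolphinscheduler:oracle-driver ./oracle-image
cat ./oracle-image/Dockerfile
```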

How to support Python 2 pip and custom requirements.txt?

  1. Create a new Dockerfile to install pip:

        FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:2.0.0
        COPY requirements.txt /tmp
        RUN apt-get update && \
            apt-get install -y --no-install-recommends python-pip && \
            pip install --no-cache-dir -r /tmp/requirements.txt && \
            rm -rf /var/lib/apt/lists/*

     The command will install the default pip 18.1. To upgrade pip as well, add one line:

            pip install --no-cache-dir -U pip && \

  2. Build a new Docker image including pip:

        docker build -t apache/dolphinscheduler:pip .

  3. Push the Docker image apache/dolphinscheduler:pip to a Docker registry

  4. Modify image repository and update tag to pip in values.yaml

  5. Run a DolphinScheduler release in Kubernetes (See Installing the Chart)

  6. Verify pip under a new Python task

How to support Python 3?

  1. Create a new Dockerfile to install Python 3:

        FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:2.0.0
        RUN apt-get update && \
            apt-get install -y --no-install-recommends python3 && \
            rm -rf /var/lib/apt/lists/*

     The command will install the default Python 3.7.3. If you also want to install pip3, replace python3 with python3-pip:

            apt-get install -y --no-install-recommends python3-pip && \

  2. Build a new Docker image including Python 3:

        docker build -t apache/dolphinscheduler:python3 .

  3. Push the Docker image apache/dolphinscheduler:python3 to a Docker registry

  4. Modify image repository and update tag to python3 in values.yaml

  5. Modify PYTHON_HOME to /usr/bin/python3 in values.yaml

  6. Run a DolphinScheduler release in Kubernetes (See Installing the Chart)

  7. Verify Python 3 under a new Python task

Take Spark 2.4.7 as an example:

  1. Download the Spark 2.4.7 release binary spark-2.4.7-bin-hadoop2.7.tgz

  2. Ensure that common.sharedStoragePersistence.enabled is turned on

  3. Run a DolphinScheduler release in Kubernetes (See Installing the Chart)

  4. Copy the Spark 2.4.7 release binary into the Docker container

        kubectl cp spark-2.4.7-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft
        kubectl cp -n test spark-2.4.7-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft # with test namespace

     Because the volume sharedStoragePersistence is mounted on /opt/soft, files in /opt/soft will not be lost

  5. Attach the container and ensure that SPARK_HOME2 exists

        kubectl exec -it dolphinscheduler-worker-0 -- bash
        kubectl exec -n test -it dolphinscheduler-worker-0 -- bash # with test namespace
        cd /opt/soft
        tar zxf spark-2.4.7-bin-hadoop2.7.tgz
        rm -f spark-2.4.7-bin-hadoop2.7.tgz
        ln -s spark-2.4.7-bin-hadoop2.7 spark2 # or just mv
        $SPARK_HOME2/bin/spark-submit --version

     The last command will print the Spark version if everything goes well

  6. Verify Spark under a Shell task

        $SPARK_HOME2/bin/spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME2/examples/jars/spark-examples_2.11-2.4.7.jar

     Check whether the task log contains output like Pi is roughly 3.146015

  7. Verify Spark under a Spark task

     The file spark-examples_2.11-2.4.7.jar must first be uploaded to the resources; then create a Spark task with:

       • Spark Version: SPARK2
       • Main Class: org.apache.spark.examples.SparkPi
       • Main Package: spark-examples_2.11-2.4.7.jar
       • Deploy Mode: local

     Similarly, check whether the task log contains output like Pi is roughly 3.146015

  8. Verify Spark on YARN

     Spark on YARN (Deploy Mode is cluster or client) requires Hadoop support. Adding Hadoop support follows almost the same steps as adding Spark support above

     Ensure that $HADOOP_HOME and $HADOOP_CONF_DIR exist
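One way to check these variables from inside the worker container, or from a Shell task, is a short script along the following lines. The helper name check_home is hypothetical, and the check only confirms that each variable points at an existing directory; it says nothing about whether the installation underneath is complete.

```shell
# Report whether a directory-valued environment variable is usable.
# $1 = variable name (for the message), $2 = its current value.
check_home() {
  if [ -n "$2" ] && [ -d "$2" ]; then
    echo "$1 ok: $2"
  else
    echo "$1 missing or not a directory: $2"
    return 1
  fi
}

status=0
check_home HADOOP_HOME "${HADOOP_HOME:-}" || status=1
check_home HADOOP_CONF_DIR "${HADOOP_CONF_DIR:-}" || status=1
check_home SPARK_HOME2 "${SPARK_HOME2:-}" || status=1

if [ "$status" -eq 0 ]; then
  echo "environment looks complete"
else
  echo "environment incomplete"
fi
```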

How to support Spark 3?

In fact, applications are submitted with spark-submit in the same way regardless of whether Spark 1, 2 or 3 is used. In other words, SPARK_HOME2 means the second SPARK_HOME rather than the home of Spark 2, so just set SPARK_HOME2=/path/to/spark3

Take Spark 3.1.1 as an example:

  1. Download the Spark 3.1.1 release binary spark-3.1.1-bin-hadoop2.7.tgz

  2. Ensure that common.sharedStoragePersistence.enabled is turned on

  3. Run a DolphinScheduler release in Kubernetes (See Installing the Chart)

  4. Copy the Spark 3.1.1 release binary into the Docker container

        kubectl cp spark-3.1.1-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft
        kubectl cp -n test spark-3.1.1-bin-hadoop2.7.tgz dolphinscheduler-worker-0:/opt/soft # with test namespace

  5. Attach the container and ensure that SPARK_HOME2 exists

        kubectl exec -it dolphinscheduler-worker-0 -- bash
        kubectl exec -n test -it dolphinscheduler-worker-0 -- bash # with test namespace
        cd /opt/soft
        tar zxf spark-3.1.1-bin-hadoop2.7.tgz
        rm -f spark-3.1.1-bin-hadoop2.7.tgz
        ln -s spark-3.1.1-bin-hadoop2.7 spark2 # or just mv
        $SPARK_HOME2/bin/spark-submit --version

     The last command will print the Spark version if everything goes well

  6. Verify Spark under a Shell task

        $SPARK_HOME2/bin/spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME2/examples/jars/spark-examples_2.12-3.1.1.jar

     Check whether the task log contains output like Pi is roughly 3.146015

How to support shared storage between Master, Worker and Api server?

For example, Master, Worker and API server may use Hadoop at the same time

  1. Modify the following configurations in values.yaml

        common:
          sharedStoragePersistence:
            enabled: true
            mountPath: "/opt/soft"
            accessModes:
            - "ReadWriteMany"
            storageClassName: "-"
            storage: "20Gi"

     storageClassName and storage need to be modified to actual values

     Note: storageClassName must support the access mode: ReadWriteMany

  2. Copy Hadoop into the directory /opt/soft

  3. Ensure that $HADOOP_HOME and $HADOOP_CONF_DIR are correct

How to support local file resource storage instead of HDFS and S3?

Modify the following configurations in values.yaml

    common:
      configmap:
        RESOURCE_STORAGE_TYPE: "HDFS"
        RESOURCE_UPLOAD_PATH: "/dolphinscheduler"
        FS_DEFAULT_FS: "file:///"
      fsFileResourcePersistence:
        enabled: true
        accessModes:
        - "ReadWriteMany"
        storageClassName: "-"
        storage: "20Gi"

storageClassName and storage need to be modified to actual values

Note: storageClassName must support the access mode: ReadWriteMany

How to support S3 resource storage like MinIO?

Take MinIO as an example: Modify the following configurations in values.yaml

    common:
      configmap:
        RESOURCE_STORAGE_TYPE: "S3"
        RESOURCE_UPLOAD_PATH: "/dolphinscheduler"
        FS_DEFAULT_FS: "s3a://BUCKET_NAME"
        FS_S3A_ENDPOINT: "http://MINIO_IP:9000"
        FS_S3A_ACCESS_KEY: "MINIO_ACCESS_KEY"
        FS_S3A_SECRET_KEY: "MINIO_SECRET_KEY"

BUCKET_NAME, MINIO_IP, MINIO_ACCESS_KEY and MINIO_SECRET_KEY need to be modified to actual values

Note: MINIO_IP must be an IP address rather than a domain name, because DolphinScheduler currently doesn’t support S3 path style access
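Before placing a value in FS_S3A_ENDPOINT, a quick format check can catch a domain name slipping in. The sketch below is one possible check; the function name is_ip_endpoint and both sample endpoints are hypothetical, and it tests only the shape of the string (dotted IPv4 with an optional port), not octet ranges or reachability.

```shell
# Return success when the endpoint is http(s)://<dotted IPv4>[:port], failure otherwise.
# This is a format check only; it does not validate octet ranges or reachability.
is_ip_endpoint() {
  printf '%s\n' "$1" | grep -Eq '^https?://([0-9]{1,3}\.){3}[0-9]{1,3}(:[0-9]+)?$'
}

for endpoint in "http://10.0.0.8:9000" "http://minio.example.com:9000"; do
  if is_ip_endpoint "$endpoint"; then
    echo "$endpoint: ok"
  else
    echo "$endpoint: use an IP address instead of a domain name"
  fi
done
```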

How to configure SkyWalking?

Modify SKYWALKING configurations in values.yaml:

    common:
      configmap:
        SKYWALKING_ENABLE: "true"
        SW_AGENT_COLLECTOR_BACKEND_SERVICES: "127.0.0.1:11800"
        SW_GRPC_LOG_SERVER_HOST: "127.0.0.1"
        SW_GRPC_LOG_SERVER_PORT: "11800"

Appendix-Configuration

ParameterDescriptionDefault
timezoneWorld time and date for cities in all time zonesAsia/Shanghai
image.repositoryDocker image repository for the DolphinSchedulerapache/dolphinscheduler
image.tagDocker image version for the DolphinSchedulerlatest
image.pullPolicyImage pull policy. One of Always, Never, IfNotPresentIfNotPresent
image.pullSecretImage pull secret. An optional reference to secret in the same namespace to use for pulling any of the imagesnil
postgresql.enabledIf not exists external PostgreSQL, by default, the DolphinScheduler will use a internal PostgreSQLtrue
postgresql.postgresqlUsernameThe username for internal PostgreSQLroot
postgresql.postgresqlPasswordThe password for internal PostgreSQLroot
postgresql.postgresqlDatabaseThe database for internal PostgreSQLdolphinscheduler
postgresql.persistence.enabledSet postgresql.persistence.enabled to true to mount a new volume for internal PostgreSQLfalse
postgresql.persistence.sizePersistentVolumeClaim size20Gi
postgresql.persistence.storageClassPostgreSQL data persistent volume storage class. If set to “-“, storageClassName: “”, which disables dynamic provisioning-
externalDatabase.typeIf exists external PostgreSQL, and set postgresql.enabled value to false. DolphinScheduler’s database type will use itpostgresql
externalDatabase.driverIf exists external PostgreSQL, and set postgresql.enabled value to false. DolphinScheduler’s database driver will use itorg.postgresql.Driver
externalDatabase.hostIf exists external PostgreSQL, and set postgresql.enabled value to false. DolphinScheduler’s database host will use itlocalhost
externalDatabase.portIf exists external PostgreSQL, and set postgresql.enabled value to false. DolphinScheduler’s database port will use it5432
externalDatabase.usernameIf exists external PostgreSQL, and set postgresql.enabled value to false. DolphinScheduler’s database username will use itroot
externalDatabase.passwordIf exists external PostgreSQL, and set postgresql.enabled value to false. DolphinScheduler’s database password will use itroot
externalDatabase.databaseIf exists external PostgreSQL, and set postgresql.enabled value to false. DolphinScheduler’s database database will use itdolphinscheduler
externalDatabase.paramsIf exists external PostgreSQL, and set postgresql.enabled value to false. DolphinScheduler’s database params will use itcharacterEncoding=utf8
zookeeper.enabledIf not exists external Zookeeper, by default, the DolphinScheduler will use a internal Zookeepertrue
zookeeper.fourlwCommandsWhitelistA list of comma separated Four Letter Words commands to usesrvr,ruok,wchs,cons
zookeeper.persistence.enabledSet zookeeper.persistence.enabled to true to mount a new volume for internal Zookeeperfalse
zookeeper.persistence.sizePersistentVolumeClaim size20Gi
zookeeper.persistence.storageClassZookeeper data persistent volume storage class. If set to “-“, storageClassName: “”, which disables dynamic provisioning-
zookeeper.zookeeperRootSpecify dolphinscheduler root directory in Zookeeper/dolphinscheduler
externalZookeeper.zookeeperQuorumIf exists external Zookeeper, and set zookeeper.enabled value to false. Specify Zookeeper quorum127.0.0.1:2181
externalZookeeper.zookeeperRootIf exists external Zookeeper, and set zookeeper.enabled value to false. Specify dolphinscheduler root directory in Zookeeper/dolphinscheduler
common.configmap.DOLPHINSCHEDULER_OPTSThe jvm options for dolphinscheduler, suitable for all servers“”
common.configmap.DATA_BASEDIR_PATHUser data directory path, self configuration, please make sure the directory exists and have read write permissions/tmp/dolphinscheduler
common.configmap.RESOURCE_STORAGE_TYPEResource storage type: HDFS, S3, NONEHDFS
common.configmap.RESOURCE_UPLOAD_PATHResource store on HDFS/S3 path, please make sure the directory exists on hdfs and have read write permissions/dolphinscheduler
common.configmap.FS_DEFAULT_FSResource storage file system like file:///, hdfs://mycluster:8020 or s3a://dolphinschedulerfile:///
common.configmap.FS_S3A_ENDPOINTS3 endpoint when common.configmap.RESOURCE_STORAGE_TYPE is set to S3s3.xxx.amazonaws.com
common.configmap.FS_S3A_ACCESS_KEYS3 access key when common.configmap.RESOURCE_STORAGE_TYPE is set to S3xxxxxxx
common.configmap.FS_S3A_SECRET_KEYS3 secret key when common.configmap.RESOURCE_STORAGE_TYPE is set to S3xxxxxxx
common.configmap.HADOOP_SECURITY_AUTHENTICATION_STARTUP_STATEWhether to startup kerberosfalse
common.configmap.JAVA_SECURITY_KRB5_CONF_PATHThe java.security.krb5.conf path/opt/krb5.conf
common.configmap.LOGIN_USER_KEYTAB_USERNAMEThe login user from keytab usernamehdfs@HADOOP.COM
common.configmap.LOGIN_USER_KEYTAB_PATHThe login user from keytab path/opt/hdfs.keytab
common.configmap.KERBEROS_EXPIRE_TIMEThe kerberos expire time, the unit is hour2
common.configmap.HDFS_ROOT_USERThe HDFS root user who must have the permission to create directories under the HDFS root pathhdfs
common.configmap.RESOURCE_MANAGER_HTTPADDRESS_PORTSet resource manager httpaddress port for yarn8088
common.configmap.YARN_RESOURCEMANAGER_HA_RM_IDSIf resourcemanager HA is enabled, please set the HA IPsnil
common.configmap.YARN_APPLICATION_STATUS_ADDRESSIf resourcemanager is single, you only need to replace ds1 to actual resourcemanager hostname, otherwise keep defaulthttp://ds1:%s/ws/v1/cluster/apps/%s
common.configmap.SKYWALKING_ENABLESet whether to enable skywalkingfalse
common.configmap.SW_AGENT_COLLECTOR_BACKEND_SERVICESSet agent collector backend services for skywalking127.0.0.1:11800
common.configmap.SW_GRPC_LOG_SERVER_HOSTSet grpc log server host for skywalking127.0.0.1
common.configmap.SW_GRPC_LOG_SERVER_PORTSet grpc log server port for skywalking11800
common.configmap.HADOOP_HOMESet HADOOP_HOME for DolphinScheduler’s task environment/opt/soft/hadoop
common.configmap.HADOOP_CONF_DIRSet HADOOP_CONF_DIR for DolphinScheduler’s task environment/opt/soft/hadoop/etc/hadoop
common.configmap.SPARK_HOME1Set SPARK_HOME1 for DolphinScheduler’s task environment/opt/soft/spark1
common.configmap.SPARK_HOME2Set SPARK_HOME2 for DolphinScheduler’s task environment/opt/soft/spark2
common.configmap.PYTHON_HOMESet PYTHON_HOME for DolphinScheduler’s task environment/usr/bin/python
common.configmap.JAVA_HOMESet JAVA_HOME for DolphinScheduler’s task environment/usr/local/openjdk-8
common.configmap.HIVE_HOMESet HIVE_HOME for DolphinScheduler’s task environment/opt/soft/hive
common.configmap.FLINK_HOMESet FLINK_HOME for DolphinScheduler’s task environment/opt/soft/flink
common.configmap.DATAX_HOMESet DATAX_HOME for DolphinScheduler’s task environment/opt/soft/datax
common.sharedStoragePersistence.enabledSet common.sharedStoragePersistence.enabled to true to mount a shared storage volume for Hadoop, Spark binary and etcfalse
common.sharedStoragePersistence.mountPathThe mount path for the shared storage volume/opt/soft
common.sharedStoragePersistence.accessModesPersistentVolumeClaim access modes, must be ReadWriteMany[ReadWriteMany]
common.sharedStoragePersistence.storageClassNameShared Storage persistent volume storage class, must support the access mode: ReadWriteMany-
common.sharedStoragePersistence.storagePersistentVolumeClaim size20Gi
common.fsFileResourcePersistence.enabledSet common.fsFileResourcePersistence.enabled to true to mount a new file resource volume for api and workerfalse
common.fsFileResourcePersistence.accessModesPersistentVolumeClaim access modes, must be ReadWriteMany[ReadWriteMany]
common.fsFileResourcePersistence.storageClassNameResource persistent volume storage class, must support the access mode: ReadWriteMany-
common.fsFileResourcePersistence.storagePersistentVolumeClaim size20Gi
master.podManagementPolicyPodManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling downParallel
master.replicasReplicas is the desired number of replicas of the given Template3
master.annotationsThe annotations for master server{}
master.affinityIf specified, the pod’s scheduling constraints{}
master.nodeSelectorNodeSelector is a selector which must be true for the pod to fit on a node{}
master.tolerationsIf specified, the pod’s tolerations{}
master.resourcesThe resource limit and request config for master server{}
master.configmap.MASTER_SERVER_OPTSThe jvm options for master server-Xms1g -Xmx1g -Xmn512m
master.configmap.MASTER_EXEC_THREADSMaster execute thread number to limit process instances100
master.configmap.MASTER_EXEC_TASK_NUMMaster execute task number in parallel per process instance20
master.configmap.MASTER_DISPATCH_TASK_NUMMaster dispatch task number per batch3
master.configmap.MASTER_HOST_SELECTORMaster host selector to select a suitable worker, optional values include Random, RoundRobin, LowerWeightLowerWeight
master.configmap.MASTER_HEARTBEAT_INTERVALMaster heartbeat interval, the unit is second10
master.configmap.MASTER_TASK_COMMIT_RETRYTIMESMaster commit task retry times5
master.configmap.MASTER_TASK_COMMIT_INTERVALmaster commit task interval, the unit is second1
master.configmap.MASTER_MAX_CPULOAD_AVGMaster max cpuload avg, only higher than the system cpu load average, master server can schedule-1 (the number of cpu cores 2)
master.configmap.MASTER_RESERVED_MEMORYMaster reserved memory, only lower than system available memory, master server can schedule, the unit is G0.3
master.livenessProbe.enabledTurn on and off liveness probetrue
master.livenessProbe.initialDelaySecondsDelay before liveness probe is initiated30
master.livenessProbe.periodSecondsHow often to perform the probe30
master.livenessProbe.timeoutSecondsWhen the probe times out5
master.livenessProbe.failureThresholdMinimum consecutive successes for the probe3
master.livenessProbe.successThresholdMinimum consecutive failures for the probe1
master.readinessProbe.enabledTurn on and off readiness probetrue
master.readinessProbe.initialDelaySecondsDelay before readiness probe is initiated30
master.readinessProbe.periodSecondsHow often to perform the probe30
master.readinessProbe.timeoutSecondsWhen the probe times out5
master.readinessProbe.failureThresholdMinimum consecutive successes for the probe3
master.readinessProbe.successThresholdMinimum consecutive failures for the probe1
master.persistentVolumeClaim.enabledSet master.persistentVolumeClaim.enabled to true to mount a new volume for masterfalse
master.persistentVolumeClaim.accessModesPersistentVolumeClaim access modes[ReadWriteOnce]
master.persistentVolumeClaim.storageClassNameMaster logs data persistent volume storage class. If set to “-“, storageClassName: “”, which disables dynamic provisioning-
master.persistentVolumeClaim.storagePersistentVolumeClaim size20Gi
worker.podManagementPolicyPodManagementPolicy controls how pods are created during initial scale up, when replacing pods on nodes, or when scaling downParallel
worker.replicasReplicas is the desired number of replicas of the given Template3
worker.annotationsThe annotations for worker server{}
worker.affinityIf specified, the pod’s scheduling constraints{}
worker.nodeSelectorNodeSelector is a selector which must be true for the pod to fit on a node{}
worker.tolerationsIf specified, the pod’s tolerations{}
worker.resourcesThe resource limit and request config for worker server{}
worker.configmap.LOGGER_SERVER_OPTSThe jvm options for logger server-Xms512m -Xmx512m -Xmn256m
worker.configmap.WORKER_SERVER_OPTSThe jvm options for worker server-Xms1g -Xmx1g -Xmn512m
worker.configmap.WORKER_EXEC_THREADSWorker execute thread number to limit task instances100
worker.configmap.WORKER_HEARTBEAT_INTERVALWorker heartbeat interval, the unit is second10
worker.configmap.WORKER_MAX_CPULOAD_AVGWorker max cpuload avg, only higher than the system cpu load average, worker server can be dispatched tasks-1 (the number of cpu cores 2)
worker.configmap.WORKER_RESERVED_MEMORYWorker reserved memory, only lower than system available memory, worker server can be dispatched tasks, the unit is G0.3
worker.configmap.WORKER_GROUPSWorker groupsdefault
worker.livenessProbe.enabledTurn on and off liveness probetrue
worker.livenessProbe.initialDelaySecondsDelay before liveness probe is initiated30
worker.livenessProbe.periodSecondsHow often to perform the probe30
worker.livenessProbe.timeoutSecondsWhen the probe times out5
worker.livenessProbe.failureThresholdMinimum consecutive successes for the probe3
worker.livenessProbe.successThresholdMinimum consecutive failures for the probe1
worker.readinessProbe.enabledTurn on and off readiness probetrue
worker.readinessProbe.initialDelaySecondsDelay before readiness probe is initiated30
worker.readinessProbe.periodSecondsHow often to perform the probe30
worker.readinessProbe.timeoutSecondsWhen the probe times out5
worker.readinessProbe.failureThresholdMinimum consecutive successes for the probe3
worker.readinessProbe.successThresholdMinimum consecutive failures for the probe1
worker.persistentVolumeClaim.enabledSet worker.persistentVolumeClaim.enabled to true to enable persistentVolumeClaim for workerfalse
worker.persistentVolumeClaim.dataPersistentVolume.enabledSet worker.persistentVolumeClaim.dataPersistentVolume.enabled to true to mount a data volume for workerfalse
worker.persistentVolumeClaim.dataPersistentVolume.accessModesPersistentVolumeClaim access modes[ReadWriteOnce]
worker.persistentVolumeClaim.dataPersistentVolume.storageClassNameWorker data persistent volume storage class. If set to “-“, storageClassName: “”, which disables dynamic provisioning-
worker.persistentVolumeClaim.dataPersistentVolume.storagePersistentVolumeClaim size20Gi
worker.persistentVolumeClaim.logsPersistentVolume.enabledSet worker.persistentVolumeClaim.logsPersistentVolume.enabled to true to mount a logs volume for workerfalse
worker.persistentVolumeClaim.logsPersistentVolume.accessModesPersistentVolumeClaim access modes[ReadWriteOnce]
worker.persistentVolumeClaim.logsPersistentVolume.storageClassNameWorker logs data persistent volume storage class. If set to “-“, storageClassName: “”, which disables dynamic provisioning-
worker.persistentVolumeClaim.logsPersistentVolume.storagePersistentVolumeClaim size20Gi
alert.replicasReplicas is the desired number of replicas of the given Template1
alert.strategy.typeType of deployment. Can be “Recreate” or “RollingUpdate”RollingUpdate
alert.strategy.rollingUpdate.maxSurgeThe maximum number of pods that can be scheduled above the desired number of pods25%
alert.strategy.rollingUpdate.maxUnavailableThe maximum number of pods that can be unavailable during the update25%
alert.annotationsThe annotations for alert server{}
alert.affinityIf specified, the pod’s scheduling constraints{}
alert.nodeSelectorNodeSelector is a selector which must be true for the pod to fit on a node{}
alert.tolerationsIf specified, the pod’s tolerations{}
alert.resourcesThe resource limit and request config for alert server{}
| alert.configmap.ALERT_SERVER_OPTS | The JVM options for alert server | -Xms512m -Xmx512m -Xmn256m |
| alert.configmap.XLS_FILE_PATH | XLS file path | /tmp/xls |
| alert.configmap.MAIL_SERVER_HOST | Mail server host | nil |
| alert.configmap.MAIL_SERVER_PORT | Mail server port | nil |
| alert.configmap.MAIL_SENDER | Mail sender | nil |
| alert.configmap.MAIL_USER | Mail user | nil |
| alert.configmap.MAIL_PASSWD | Mail password | nil |
| alert.configmap.MAIL_SMTP_STARTTLS_ENABLE | Enable SMTP STARTTLS for mail | false |
| alert.configmap.MAIL_SMTP_SSL_ENABLE | Enable SMTP SSL for mail | false |
| alert.configmap.MAIL_SMTP_SSL_TRUST | Mail SMTP SSL trust | nil |
| alert.configmap.ENTERPRISE_WECHAT_ENABLE | Enable Enterprise WeChat | false |
| alert.configmap.ENTERPRISE_WECHAT_CORP_ID | Enterprise WeChat corp ID | nil |
| alert.configmap.ENTERPRISE_WECHAT_SECRET | Enterprise WeChat secret | nil |
| alert.configmap.ENTERPRISE_WECHAT_AGENT_ID | Enterprise WeChat agent ID | nil |
| alert.configmap.ENTERPRISE_WECHAT_USERS | Enterprise WeChat users | nil |
| alert.livenessProbe.enabled | Turn on and off liveness probe | true |
| alert.livenessProbe.initialDelaySeconds | Delay in seconds before liveness probe is initiated | 30 |
| alert.livenessProbe.periodSeconds | How often, in seconds, to perform the probe | 30 |
| alert.livenessProbe.timeoutSeconds | Number of seconds after which the probe times out | 5 |
| alert.livenessProbe.failureThreshold | Minimum consecutive failures for the probe to be considered failed | 3 |
| alert.livenessProbe.successThreshold | Minimum consecutive successes for the probe to be considered successful | 1 |
| alert.readinessProbe.enabled | Turn on and off readiness probe | true |
| alert.readinessProbe.initialDelaySeconds | Delay in seconds before readiness probe is initiated | 30 |
| alert.readinessProbe.periodSeconds | How often, in seconds, to perform the probe | 30 |
| alert.readinessProbe.timeoutSeconds | Number of seconds after which the probe times out | 5 |
| alert.readinessProbe.failureThreshold | Minimum consecutive failures for the probe to be considered failed | 3 |
| alert.readinessProbe.successThreshold | Minimum consecutive successes for the probe to be considered successful | 1 |
| alert.persistentVolumeClaim.enabled | Set alert.persistentVolumeClaim.enabled to true to mount a new volume for alert | false |
| alert.persistentVolumeClaim.accessModes | PersistentVolumeClaim access modes | [ReadWriteOnce] |
| alert.persistentVolumeClaim.storageClassName | Alert logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | - |
| alert.persistentVolumeClaim.storage | PersistentVolumeClaim size | 20Gi |
| api.replicas | Replicas is the desired number of replicas of the given Template | 1 |
| api.strategy.type | Type of deployment. Can be "Recreate" or "RollingUpdate" | RollingUpdate |
| api.strategy.rollingUpdate.maxSurge | The maximum number of pods that can be scheduled above the desired number of pods | 25% |
| api.strategy.rollingUpdate.maxUnavailable | The maximum number of pods that can be unavailable during the update | 25% |
| api.annotations | The annotations for api server | {} |
| api.affinity | If specified, the pod's scheduling constraints | {} |
| api.nodeSelector | NodeSelector is a selector which must be true for the pod to fit on a node | {} |
| api.tolerations | If specified, the pod's tolerations | {} |
| api.resources | The resource limit and request config for api server | {} |
| api.configmap.API_SERVER_OPTS | The JVM options for api server | -Xms512m -Xmx512m -Xmn256m |
| api.livenessProbe.enabled | Turn on and off liveness probe | true |
| api.livenessProbe.initialDelaySeconds | Delay in seconds before liveness probe is initiated | 30 |
| api.livenessProbe.periodSeconds | How often, in seconds, to perform the probe | 30 |
| api.livenessProbe.timeoutSeconds | Number of seconds after which the probe times out | 5 |
| api.livenessProbe.failureThreshold | Minimum consecutive failures for the probe to be considered failed | 3 |
| api.livenessProbe.successThreshold | Minimum consecutive successes for the probe to be considered successful | 1 |
| api.readinessProbe.enabled | Turn on and off readiness probe | true |
| api.readinessProbe.initialDelaySeconds | Delay in seconds before readiness probe is initiated | 30 |
| api.readinessProbe.periodSeconds | How often, in seconds, to perform the probe | 30 |
| api.readinessProbe.timeoutSeconds | Number of seconds after which the probe times out | 5 |
| api.readinessProbe.failureThreshold | Minimum consecutive failures for the probe to be considered failed | 3 |
| api.readinessProbe.successThreshold | Minimum consecutive successes for the probe to be considered successful | 1 |
| api.persistentVolumeClaim.enabled | Set api.persistentVolumeClaim.enabled to true to mount a new volume for api | false |
| api.persistentVolumeClaim.accessModes | PersistentVolumeClaim access modes | [ReadWriteOnce] |
| api.persistentVolumeClaim.storageClassName | API logs data persistent volume storage class. If set to "-", storageClassName: "", which disables dynamic provisioning | - |
| api.persistentVolumeClaim.storage | PersistentVolumeClaim size | 20Gi |
| api.service.type | Determines how the Service is exposed. Valid options are ExternalName, ClusterIP, NodePort, and LoadBalancer | ClusterIP |
| api.service.clusterIP | The IP address of the service; usually assigned randomly by the master | nil |
| api.service.nodePort | The port on each node on which this service is exposed when type=NodePort | nil |
| api.service.externalIPs | A list of IP addresses for which nodes in the cluster will also accept traffic for this service | [] |
| api.service.externalName | The external reference that kubedns or equivalent will return as a CNAME record for this service | nil |
| api.service.loadBalancerIP | Used when service.type is LoadBalancer; the load balancer is created with the IP specified in this field | nil |
| api.service.annotations | Annotations that may need to be set when service.type is LoadBalancer | {} |
| ingress.enabled | Enable ingress | false |
| ingress.host | Ingress host | dolphinscheduler.org |
| ingress.path | Ingress path | /dolphinscheduler |
| ingress.tls.enabled | Enable ingress tls | false |
| ingress.tls.secretName | Ingress tls secret name | dolphinscheduler-tls |
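
The dotted parameter names in the table correspond to nested keys in values.yaml. As a minimal sketch, the fragment below overrides a few of the parameters listed above. The file name values-override.yaml, the mail host, sender and user, and the node port 30001 are illustrative assumptions and not chart defaults; replace them with values for your environment.

```yaml
# values-override.yaml (hypothetical file name)

worker:
  persistentVolumeClaim:
    # Assumption: the chart may also need a parent "enabled" flag for the
    # worker PVC (documented earlier in the table) set to true.
    dataPersistentVolume:
      enabled: true
      accessModes:
        - ReadWriteOnce
      storageClassName: "-"   # "-" disables dynamic provisioning
      storage: 20Gi

alert:
  configmap:
    # Placeholder mail settings, not real endpoints
    MAIL_SERVER_HOST: "smtp.example.com"
    MAIL_SERVER_PORT: "587"
    MAIL_SENDER: "alerts@example.com"
    MAIL_USER: "alerts@example.com"
    MAIL_SMTP_STARTTLS_ENABLE: "true"

api:
  service:
    # Expose the API server on every node instead of the default ClusterIP
    type: NodePort
    nodePort: 30001   # example value; must lie in the cluster's NodePort range

ingress:
  enabled: true
  host: dolphinscheduler.org
  path: /dolphinscheduler
  tls:
    enabled: true
    secretName: dolphinscheduler-tls   # the TLS secret must already exist
```

Such a file can be passed at install time with helm's -f option, for example: helm install dolphinscheduler . -f values-override.yaml. Exposing the API through both NodePort and ingress is shown only for illustration; one of the two is usually enough.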