Configuration

Preface

This document explains the DolphinScheduler application configurations.

Directory Structure

The directory structure of DolphinScheduler is as follows:

```
├── LICENSE
├── NOTICE
├── licenses                         directory of licenses
├── bin                              directory of DolphinScheduler application commands and configuration scripts
│   ├── dolphinscheduler-daemon.sh   script to start or shut down a DolphinScheduler service
│   ├── env                          directory of scripts to load environment variables
│   │   ├── dolphinscheduler_env.sh  script to export environment variables [e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME ...] when you start or stop a service using `dolphinscheduler-daemon.sh`
│   │   └── install_env.sh           script to export environment variables for DolphinScheduler installation when you use `install.sh`, `start-all.sh`, `stop-all.sh`, or `status-all.sh`
│   ├── install.sh                   script to auto-setup services when you deploy DolphinScheduler in `pseudo-cluster` or `cluster` mode
│   ├── remove-zk-node.sh            script to clean up ZooKeeper caches
│   ├── scp-hosts.sh                 script to copy installation files to target hosts
│   ├── start-all.sh                 script to start all services in `pseudo-cluster` or `cluster` mode
│   ├── status-all.sh                script to check the status of all services in `pseudo-cluster` or `cluster` mode
│   └── stop-all.sh                  script to shut down all services in `pseudo-cluster` or `cluster` mode
├── alert-server                     directory of DolphinScheduler alert-server commands, configuration scripts and libs
│   ├── bin
│   │   └── start.sh                 script to start DolphinScheduler alert-server
│   ├── conf
│   │   ├── application.yaml         configurations of alert-server
│   │   ├── common.properties        configurations of common services such as storage and credentials
│   │   ├── dolphinscheduler_env.sh  script to load environment variables for alert-server
│   │   └── logback-spring.xml       configurations of alert-server log
│   └── libs                         directory of alert-server libs
├── api-server                       directory of DolphinScheduler api-server commands, configuration scripts and libs
│   ├── bin
│   │   └── start.sh                 script to start DolphinScheduler api-server
│   ├── conf
│   │   ├── application.yaml         configurations of api-server
│   │   ├── common.properties        configurations of common services such as storage and credentials
│   │   ├── dolphinscheduler_env.sh  script to load environment variables for api-server
│   │   └── logback-spring.xml       configurations of api-server log
│   ├── libs                         directory of api-server libs
│   └── ui                           directory of api-server related front-end web resources
├── master-server                    directory of DolphinScheduler master-server commands, configuration scripts and libs
│   ├── bin
│   │   └── start.sh                 script to start DolphinScheduler master-server
│   ├── conf
│   │   ├── application.yaml         configurations of master-server
│   │   ├── common.properties        configurations of common services such as storage and credentials
│   │   ├── dolphinscheduler_env.sh  script to load environment variables for master-server
│   │   └── logback-spring.xml       configurations of master-server log
│   └── libs                         directory of master-server libs
├── standalone-server                directory of DolphinScheduler standalone-server commands, configuration scripts and libs
│   ├── bin
│   │   └── start.sh                 script to start DolphinScheduler standalone-server
│   ├── conf
│   │   ├── application.yaml         configurations of standalone-server
│   │   ├── common.properties        configurations of common services such as storage and credentials
│   │   ├── dolphinscheduler_env.sh  script to load environment variables for standalone-server
│   │   ├── logback-spring.xml       configurations of standalone-server log
│   │   └── sql                      .sql files to create or upgrade DolphinScheduler metadata
│   ├── libs                         directory of standalone-server libs
│   └── ui                           directory of standalone-server related front-end web resources
├── tools                            directory of DolphinScheduler metadata tools commands, configuration scripts and libs
│   ├── bin
│   │   └── upgrade-schema.sh        script to initialize or upgrade DolphinScheduler metadata
│   ├── conf
│   │   ├── application.yaml         configurations of tools
│   │   └── common.properties        configurations of common services such as storage and credentials
│   ├── libs                         directory of tools libs
│   └── sql                          .sql files to create or upgrade DolphinScheduler metadata
├── worker-server                    directory of DolphinScheduler worker-server commands, configuration scripts and libs
│   ├── bin
│   │   └── start.sh                 script to start DolphinScheduler worker-server
│   ├── conf
│   │   ├── application.yaml         configurations of worker-server
│   │   ├── common.properties        configurations of common services such as storage and credentials
│   │   ├── dolphinscheduler_env.sh  script to load environment variables for worker-server
│   │   └── logback-spring.xml       configurations of worker-server log
│   └── libs                         directory of worker-server libs
└── ui                               directory of front-end web resources
```

Configurations in Detail

dolphinscheduler-daemon.sh [start or shut down DolphinScheduler application]

dolphinscheduler-daemon.sh is responsible for DolphinScheduler startup and shutdown; start-all.sh and stop-all.sh ultimately start and stop the cluster through it. DolphinScheduler ships with only a basic JVM configuration, so remember to tune the JVM options further based on the resources actually available in your environment.

Default simplified parameters are:

```shell
export DOLPHINSCHEDULER_OPTS="
-server
-Xmx16g
-Xms1g
-Xss512k
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+UseFastAccessorMethods
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
"
```

Setting `-XX:+DisableExplicitGC` is not recommended: DolphinScheduler depends on Netty for communication, and disabling explicit GC may lead to memory leaks.
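For example, on a host with more headroom you might override the default options before starting a service. The heap sizes below are illustrative placeholders, not recommendations:

```shell
# Illustrative JVM override; adapt heap sizes (-Xms/-Xmx) to your hardware.
# Export this before running dolphinscheduler-daemon.sh so it replaces the default.
export DOLPHINSCHEDULER_OPTS="
-server
-Xms4g
-Xmx4g
-Xss512k
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
"
```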

DolphinScheduler uses Spring HikariCP to manage database connections. Configuration file locations:

| Service | Configuration file |
|---|---|
| Master Server | `master-server/conf/application.yaml` |
| Api Server | `api-server/conf/application.yaml` |
| Worker Server | `worker-server/conf/application.yaml` |
| Alert Server | `alert-server/conf/application.yaml` |

The default configuration is as follows:

| Parameters | Default value | Description |
|---|---|---|
| spring.datasource.driver-class-name | org.postgresql.Driver | datasource driver |
| spring.datasource.url | jdbc:postgresql://127.0.0.1:5432/dolphinscheduler | datasource connection url |
| spring.datasource.username | root | datasource username |
| spring.datasource.password | root | datasource password |
| spring.datasource.hikari.connection-test-query | select 1 | validate connections by running this SQL |
| spring.datasource.hikari.minimum-idle | 5 | minimum connection pool size |
| spring.datasource.hikari.auto-commit | true | whether to auto-commit |
| spring.datasource.hikari.pool-name | DolphinScheduler | name of the connection pool |
| spring.datasource.hikari.maximum-pool-size | 50 | maximum connection pool size |
| spring.datasource.hikari.connection-timeout | 30000 | connection timeout (ms) |
| spring.datasource.hikari.idle-timeout | 600000 | maximum idle connection survival time (ms) |
| spring.datasource.hikari.leak-detection-threshold | 0 | connection leak detection threshold |
| spring.datasource.hikari.initialization-fail-timeout | 1 | connection pool initialization fail timeout |

Note that DolphinScheduler also supports database configuration through bin/env/dolphinscheduler_env.sh.
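As a minimal sketch, the same datasource settings can be exported in `bin/env/dolphinscheduler_env.sh`; the host, database name, and credentials below are placeholders for your own environment:

```shell
# Placeholder metadata-database settings for bin/env/dolphinscheduler_env.sh.
# Replace host, port, username and password with your own PostgreSQL instance.
export DATABASE=postgresql
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:postgresql://db-host:5432/dolphinscheduler"
export SPRING_DATASOURCE_USERNAME="dolphinscheduler"
export SPRING_DATASOURCE_PASSWORD="dolphinscheduler"
```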

DolphinScheduler uses ZooKeeper for cluster management, fault tolerance, event monitoring and other functions. Configuration file locations:

| Service | Configuration file |
|---|---|
| Master Server | `master-server/conf/application.yaml` |
| Api Server | `api-server/conf/application.yaml` |
| Worker Server | `worker-server/conf/application.yaml` |

The default configuration is as follows:

| Parameters | Default value | Description |
|---|---|---|
| registry.zookeeper.namespace | dolphinscheduler | namespace of zookeeper |
| registry.zookeeper.connect-string | localhost:2181 | the connection string of zookeeper |
| registry.zookeeper.retry-policy.base-sleep-time | 60ms | time to wait between subsequent retries |
| registry.zookeeper.retry-policy.max-sleep | 300ms | maximum time to wait between subsequent retries |
| registry.zookeeper.retry-policy.max-retries | 5 | maximum retry times |
| registry.zookeeper.session-timeout | 30s | session timeout |
| registry.zookeeper.connection-timeout | 30s | connection timeout |
| registry.zookeeper.block-until-connected | 600ms | waiting time to block until the connection succeeds |
| registry.zookeeper.digest | {username}:{password} | digest of zookeeper to access znodes; works only when ACL is enabled. For details see https://zookeeper.apache.org/doc/r3.4.14/zookeeperAdmin.html |

Note that DolphinScheduler also supports zookeeper related configuration through bin/env/dolphinscheduler_env.sh.
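The defaults above map onto the `registry` section of `application.yaml`. As a sketch, pointing a server at a three-node ensemble might look like the following (the `zk1`/`zk2`/`zk3` hostnames are placeholders):

```yaml
# Sketch of the registry section in e.g. master-server/conf/application.yaml,
# assuming a three-node ZooKeeper ensemble; hostnames are placeholders.
registry:
  type: zookeeper
  zookeeper:
    namespace: dolphinscheduler
    connect-string: zk1:2181,zk2:2181,zk3:2181
    retry-policy:
      base-sleep-time: 60ms
      max-sleep: 300ms
      max-retries: 5
    session-timeout: 30s
    connection-timeout: 30s
    block-until-connected: 600ms
```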

common.properties [hadoop, s3, yarn config properties]

Currently, common.properties mainly holds Hadoop-, S3-, and YARN-related configurations.

| Parameters | Default value | Description |
|---|---|---|
| data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files |
| resource.storage.type | NONE | type of resource files: HDFS, S3, NONE |
| resource.storage.upload.base.path | /dolphinscheduler | storage path of resource files |
| resource.aws.access.key.id | minioadmin | access key id of S3 |
| resource.aws.secret.access.key | minioadmin | secret access key of S3 |
| resource.aws.region | us-east-1 | region of S3 |
| resource.aws.s3.bucket.name | dolphinscheduler | bucket name of S3 |
| resource.aws.s3.endpoint | http://minio:9000 | endpoint of S3 |
| resource.hdfs.root.user | hdfs | configure users with corresponding permissions if storage type is HDFS |
| resource.hdfs.fs.defaultFS | hdfs://mycluster:8020 | if resource.storage.type=S3, the request url would be similar to 's3a://dolphinscheduler'; if resource.storage.type=HDFS and Hadoop supports HA, copy core-site.xml and hdfs-site.xml into the 'conf' directory |
| hadoop.security.authentication.startup.state | false | whether Hadoop grants Kerberos permission |
| java.security.krb5.conf.path | /opt/krb5.conf | kerberos config directory |
| login.user.keytab.username | hdfs-mycluster@ESZ.COM | kerberos username |
| login.user.keytab.path | /opt/hdfs.headless.keytab | kerberos user keytab |
| kerberos.expire.time | 2 | kerberos expire time; integer, in hours |
| yarn.resourcemanager.ha.rm.ids | | the yarn resourcemanager url; if resourcemanager supports HA, input the HA IP addresses (separated by commas), or leave empty for standalone |
| yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep the default if ResourceManager supports HA or is not used; otherwise replace ds1 with the corresponding ResourceManager hostname in standalone mode |
| dolphinscheduler.env.path | env/dolphinscheduler_env.sh | script to load environment variables [e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME ...] |
| development.state | false | specify whether in development state |
| task.resource.limit.state | false | specify whether resource limiting is enabled |
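Putting the S3 keys above together, a sketch of common.properties for S3-backed resource storage might look like this; the endpoint and credentials shown are the MinIO-style placeholders from the defaults, not real values:

```properties
# Sketch: S3-backed resource storage in common.properties.
# Endpoint, bucket, and credentials are placeholders (e.g. a local MinIO).
resource.storage.type=S3
resource.storage.upload.base.path=/dolphinscheduler
resource.aws.access.key.id=minioadmin
resource.aws.secret.access.key=minioadmin
resource.aws.region=us-east-1
resource.aws.s3.bucket.name=dolphinscheduler
resource.aws.s3.endpoint=http://minio:9000
```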

application-api.properties [API-service log config]

| Service | Configuration file |
|---|---|
| Master Server | `master-server/conf/common.properties` |
| Api Server | `api-server/conf/common.properties` |
| Worker Server | `worker-server/conf/common.properties` |
| Alert Server | `alert-server/conf/common.properties` |

The default configuration is as follows:

| Parameters | Default value | Description |
|---|---|---|
| data.basedir.path | /tmp/dolphinscheduler | local directory used to store temp files |
| resource.storage.type | NONE | type of resource files: HDFS, S3, NONE |
| resource.upload.path | /dolphinscheduler | storage path of resource files |
| aws.access.key.id | minioadmin | access key id of S3 |
| aws.secret.access.key | minioadmin | secret access key of S3 |
| aws.region | us-east-1 | region of S3 |
| aws.s3.endpoint | http://minio:9000 | endpoint of S3 |
| hdfs.root.user | hdfs | configure users with corresponding permissions if storage type is HDFS |
| fs.defaultFS | hdfs://mycluster:8020 | if resource.storage.type=S3, the request url would be similar to 's3a://dolphinscheduler'; if resource.storage.type=HDFS and Hadoop supports HA, copy core-site.xml and hdfs-site.xml into the 'conf' directory |
| hadoop.security.authentication.startup.state | false | whether Hadoop grants Kerberos permission |
| java.security.krb5.conf.path | /opt/krb5.conf | kerberos config directory |
| login.user.keytab.username | hdfs-mycluster@ESZ.COM | kerberos username |
| login.user.keytab.path | /opt/hdfs.headless.keytab | kerberos user keytab |
| kerberos.expire.time | 2 | kerberos expire time; integer, in hours |
| yarn.resourcemanager.ha.rm.ids | 192.168.xx.xx,192.168.xx.xx | the yarn resourcemanager url; if resourcemanager supports HA, input the HA IP addresses (separated by commas), or leave empty for standalone |
| yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep the default if ResourceManager supports HA or is not used; otherwise replace ds1 with the corresponding ResourceManager hostname in standalone mode |
| development.state | false | specify whether in development state |
| dolphin.scheduler.network.interface.preferred | NONE | display name of the preferred network card |
| dolphin.scheduler.network.priority.strategy | default | IP acquisition strategy; prefer the internal or the external network |
| resource.manager.httpaddress.port | 8088 | the port of resource manager |
| yarn.job.history.status.address | http://ds1:19888/ws/v1/history/mapreduce/jobs/%s | job history status url of yarn |
| datasource.encryption.enable | false | whether to enable datasource encryption |
| datasource.encryption.salt | !@#$%^&* | the salt of the datasource encryption |
| data-quality.jar.name | dolphinscheduler-data-quality-dev-SNAPSHOT.jar | the jar of data quality |
| support.hive.oneSession | false | specify whether hive SQL is executed in the same session |
| sudo.enable | true | whether to enable sudo |
| alert.rpc.port | 50052 | the RPC port of Alert Server |
| zeppelin.rest.url | http://localhost:8080 | the RESTful API url of zeppelin |

Location: api-server/conf/application.yaml

| Parameters | Default value | Description |
|---|---|---|
| server.port | 12345 | api service communication port |
| server.servlet.session.timeout | 120m | session timeout |
| server.servlet.context-path | /dolphinscheduler/ | request path |
| spring.servlet.multipart.max-file-size | 1024MB | maximum file size |
| spring.servlet.multipart.max-request-size | 1024MB | maximum request size |
| server.jetty.max-http-post-size | 5000000 | jetty maximum post size |
| spring.banner.charset | UTF-8 | message encoding |
| spring.jackson.time-zone | UTC | time zone |
| spring.jackson.date-format | "yyyy-MM-dd HH:mm:ss" | time format |
| spring.messages.basename | i18n/messages | i18n config |
| security.authentication.type | PASSWORD | authentication type |
| security.authentication.ldap.user.admin | read-only-admin | admin user account when you log in with LDAP |
| security.authentication.ldap.urls | ldap://ldap.forumsys.com:389/ | LDAP urls |
| security.authentication.ldap.base.dn | dc=example,dc=com | LDAP base dn |
| security.authentication.ldap.username | cn=read-only-admin,dc=example,dc=com | LDAP username |
| security.authentication.ldap.password | password | LDAP password |
| security.authentication.ldap.user.identity.attribute | uid | LDAP user identity attribute |
| security.authentication.ldap.user.email.attribute | mail | LDAP user email attribute |
| traffic.control.global.switch | false | traffic control global switch |
| traffic.control.max-global-qps-rate | 300 | global max request number per second |
| traffic.control.tenant-switch | false | traffic control tenant switch |
| traffic.control.default-tenant-qps-rate | 10 | default tenant max request number per second |
| traffic.control.customize-tenant-qps-rate | | customized tenant max request number per second |

Location: master-server/conf/application.yaml

| Parameters | Default value | Description |
|---|---|---|
| master.listen-port | 5678 | master listen port |
| master.fetch-command-num | 10 | the number of commands fetched by master |
| master.pre-exec-threads | 10 | number of master prepare-execute threads, limits how many commands are handled in parallel |
| master.exec-threads | 100 | number of master execute threads, limits how many process instances run in parallel |
| master.dispatch-task-number | 3 | master dispatch task number per batch |
| master.host-selector | lower_weight | host selector used to pick a suitable worker; default value: lower_weight; optional values: random, round_robin, lower_weight |
| master.heartbeat-interval | 10 | master heartbeat interval, in seconds |
| master.task-commit-retry-times | 5 | master commit task retry times |
| master.task-commit-interval | 1000 | master commit task interval, in milliseconds |
| master.state-wheel-interval | 5 | time to check status |
| master.max-cpu-load-avg | -1 | master max CPU load average; the master schedules only when the system CPU load average is below this value. Default -1 means the number of CPU cores * 2 |
| master.reserved-memory | 0.3 | master reserved memory, in GB; the master schedules only when available system memory is above this value |
| master.failover-interval | 10 | failover interval, in minutes |
| master.kill-yarn-job-when-task-failover | true | whether to kill the yarn job when a task instance fails over |
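These keys nest under a `master` section in the YAML file. As a sketch, throttling a resource-constrained master might look like this; the values are illustrative, not recommendations:

```yaml
# Sketch: reducing master concurrency in master-server/conf/application.yaml.
# Values are placeholders to tune against your hardware.
master:
  exec-threads: 50
  pre-exec-threads: 5
  heartbeat-interval: 10
  max-cpu-load-avg: -1
  reserved-memory: 0.3
```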

Location: worker-server/conf/application.yaml

| Parameters | Default value | Description |
|---|---|---|
| worker.listen-port | 1234 | worker listen port |
| worker.exec-threads | 100 | number of worker execute threads, limits how many task instances run in parallel |
| worker.heartbeat-interval | 10 | worker heartbeat interval, in seconds |
| worker.host-weight | 100 | worker host weight used when dispatching tasks |
| worker.tenant-auto-create | true | a tenant corresponds to a system user that the worker uses to submit jobs; if the system does not have this user, it is created automatically when this parameter is true |
| worker.max-cpu-load-avg | -1 | worker max CPU load average; tasks are dispatched to the worker only when the system CPU load average is below this value. Default -1 means the number of CPU cores * 2 |
| worker.reserved-memory | 0.3 | worker reserved memory, in GB; tasks are dispatched only when available system memory is above this value |
| worker.groups | default | worker groups separated by commas, e.g. 'worker.groups=default,test'; the worker joins the corresponding groups on startup |
| worker.alert-listen-host | localhost | the alert listen host of worker |
| worker.alert-listen-port | 50052 | the alert listen port of worker |
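For instance, a sketch of placing a worker into two groups via its YAML file; the `test` group name is an example, not a built-in group:

```yaml
# Sketch: worker joins both the default and an example 'test' group
# in worker-server/conf/application.yaml.
worker:
  exec-threads: 100
  host-weight: 100
  groups: default,test
```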

Location: alert-server/conf/application.yaml

| Parameters | Default value | Description |
|---|---|---|
| server.port | 50053 | the port of Alert Server |
| alert.port | 50052 | the port of alert |

This part describes the Quartz configs; tune them based on your actual situation and resources.

| Service | Configuration file |
|---|---|
| Master Server | `master-server/conf/application.yaml` |
| Api Server | `api-server/conf/application.yaml` |

The default configuration is as follows:

| Parameters | Default value |
|---|---|
| spring.quartz.properties.org.quartz.threadPool.threadPriority | 5 |
| spring.quartz.properties.org.quartz.jobStore.isClustered | true |
| spring.quartz.properties.org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX |
| spring.quartz.properties.org.quartz.scheduler.instanceId | AUTO |
| spring.quartz.properties.org.quartz.jobStore.tablePrefix | QRTZ_ |
| spring.quartz.properties.org.quartz.jobStore.acquireTriggersWithinLock | true |
| spring.quartz.properties.org.quartz.scheduler.instanceName | DolphinScheduler |
| spring.quartz.properties.org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool |
| spring.quartz.properties.org.quartz.jobStore.useProperties | false |
| spring.quartz.properties.org.quartz.threadPool.makeThreadsDaemons | true |
| spring.quartz.properties.org.quartz.threadPool.threadCount | 25 |
| spring.quartz.properties.org.quartz.jobStore.misfireThreshold | 60000 |
| spring.quartz.properties.org.quartz.scheduler.makeSchedulerThreadDaemon | true |
| spring.quartz.properties.org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate |
| spring.quartz.properties.org.quartz.jobStore.clusterCheckinInterval | 5000 |
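As a sketch, a single Quartz property can be overridden in the YAML file without restating the rest; the thread count below is an illustrative value, not a recommendation:

```yaml
# Sketch: raising the Quartz scheduler thread pool in application.yaml
# (value is illustrative; other defaults remain in effect).
spring:
  quartz:
    properties:
      org.quartz.threadPool.threadCount: 50
```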

dolphinscheduler_env.sh [load environment variables configs]

When using shell to commit tasks, DolphinScheduler exports environment variables from bin/env/dolphinscheduler_env.sh. The main configurations include JAVA_HOME, the metadata database, the registry center, and task-related settings.

```shell
# JAVA_HOME, will use it to start DolphinScheduler server
export JAVA_HOME=${JAVA_HOME:-/opt/soft/java}

# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-postgresql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL
export SPRING_DATASOURCE_USERNAME
export SPRING_DATASOURCE_PASSWORD

# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}

# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}

# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$PATH
```

Log-related configuration file locations:

| Service | Configuration file |
|---|---|
| Master Server | `master-server/conf/logback-spring.xml` |
| Api Server | `api-server/conf/logback-spring.xml` |
| Worker Server | `worker-server/conf/logback-spring.xml` |
| Alert Server | `alert-server/conf/logback-spring.xml` |
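As a sketch, a per-package log level can be raised in any of these files with a standard logback `<logger>` element; the package name shown is an example:

```xml
<!-- Example logback override: verbose logging for one package.
     Place inside the <configuration> element of logback-spring.xml. -->
<logger name="org.apache.dolphinscheduler" level="DEBUG"/>
```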