Preface

This document explains the DolphinScheduler application configuration for DolphinScheduler 1.3.x.

Directory Structure

All configuration files are located in the [conf] directory. The simplified DolphinScheduler installation tree below shows where the [conf] directory sits and which configuration files it contains. This document covers only DolphinScheduler configuration; other modules are out of scope.

[Note: DolphinScheduler is hereinafter referred to as ‘DS’.]

    ├─bin                               DS application commands directory
    │  ├─dolphinscheduler-daemon.sh     start up/shut down a DS application
    │  ├─start-all.sh                   start up all DS services with configurations
    │  ├─stop-all.sh                    shut down all DS services with configurations
    ├─conf                              configurations directory
    │  ├─application-api.properties     API-service config properties
    │  ├─datasource.properties          datasource config properties
    │  ├─registry.properties            registry config properties
    │  ├─master.properties              master config properties
    │  ├─worker.properties              worker config properties
    │  ├─quartz.properties              quartz config properties
    │  ├─common.properties              common-service [storage] config properties
    │  ├─alert.properties               alert-service config properties
    │  ├─config                         environment variables config directory
    │  │  ├─install_config.conf         DS environment variables configuration script [install/start DS]
    │  ├─env                            environment variables loading script directory
    │  │  ├─dolphinscheduler_env.sh     loads environment variables [e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME ...]
    │  ├─org                            MyBatis mapper files directory
    │  ├─i18n                           i18n configs directory
    │  ├─logback-api.xml                API-service log config
    │  ├─logback-master.xml             master-service log config
    │  ├─logback-worker.xml             worker-service log config
    │  ├─logback-alert.xml              alert-service log config
    ├─sql                               SQL scripts to create/upgrade DS metadata
    │  ├─create                         create SQL scripts directory
    │  ├─upgrade                        upgrade SQL scripts directory
    │  ├─dolphinscheduler_postgre.sql   PostgreSQL database init script
    │  ├─dolphinscheduler_mysql.sql     MySQL database init script
    │  ├─soft_version                   current DS version ID file
    ├─script                            DS service deployment and database create/upgrade scripts directory
    │  ├─create-dolphinscheduler.sh     DS database init script
    │  ├─upgrade-dolphinscheduler.sh    DS database upgrade script
    │  ├─monitor-server.sh              DS monitor-server start script
    │  ├─scp-hosts.sh                   script to transfer installation files
    │  ├─remove-zk-node.sh              script to clean up ZooKeeper caches
    ├─ui                                front-end web resources directory
    ├─lib                               DS .jar dependencies directory
    ├─install.sh                        auto-setup DS services script

Configurations in Detail

| Serial number | Service classification | Config file |
|---|---|---|
| 1 | startup/shutdown DS application | dolphinscheduler-daemon.sh |
| 2 | datasource config properties | datasource.properties |
| 3 | registry config properties | registry.properties |
| 4 | common-service [storage] config properties | common.properties |
| 5 | API-service config properties | application-api.properties |
| 6 | master config properties | master.properties |
| 7 | worker config properties | worker.properties |
| 8 | alert-service config properties | alert.properties |
| 9 | quartz config properties | quartz.properties |
| 10 | DS environment variables configuration script [install/start DS] | install_config.conf |
| 11 | load environment variables configs [e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME …] | dolphinscheduler_env.sh |
| 12 | services log config files | API-service log config: logback-api.xml; master-service log config: logback-master.xml; worker-service log config: logback-worker.xml; alert-service log config: logback-alert.xml |

1.dolphinscheduler-daemon.sh [startup/shutdown DS application]

dolphinscheduler-daemon.sh is responsible for starting up and shutting down DS; start-all.sh and stop-all.sh start up and shut down the cluster by calling it. DS ships only a basic JVM configuration, so please tune the JVM options further according to the resources actually available.

Default simplified parameters are:

    export DOLPHINSCHEDULER_OPTS="
    -server
    -Xmx16g
    -Xms1g
    -Xss512k
    -XX:+UseConcMarkSweepGC
    -XX:+CMSParallelRemarkEnabled
    -XX:+UseFastAccessorMethods
    -XX:+UseCMSInitiatingOccupancyOnly
    -XX:CMSInitiatingOccupancyFraction=70
    "

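Adding “-XX:+DisableExplicitGC” is not recommended: DS depends on Netty for communication, and disabling explicit GC may prevent Netty's off-heap (direct) memory from being reclaimed in time, which shows up as a memory leak.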
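The default -Xmx16g assumes a large host. As a minimal sketch, assuming a Linux host and a rule of thumb of giving the heap half of physical memory (a hypothetical ratio, not a DS recommendation), the script below computes -Xms/-Xmx values that could replace the defaults inside DOLPHINSCHEDULER_OPTS in dolphinscheduler-daemon.sh. The helper name heap_opts is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical sizing rule: heap = half of physical memory, and -Xms is never
# set above -Xmx (the JVM refuses to start if it is).
heap_opts() {
  local total_kb=$1                                   # total physical memory in kB
  local xmx_mb=$(( total_kb / 1024 / 2 ))             # half of RAM, in MB
  local xms_mb=$(( xmx_mb < 1024 ? xmx_mb : 1024 ))   # keep the 1g default when possible
  echo "-Xms${xms_mb}m -Xmx${xmx_mb}m"
}

# Linux only: MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
  mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  echo "-server $(heap_opts "$mem_kb") -Xss512k"
fi
```

On a 16 GB host this prints -Xms1024m -Xmx8192m for the heap part; the GC flags from the default list would still be kept as they are.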

2.datasource.properties [datasource config properties]

DS uses Druid to manage database connections and default simplified configs are:

| Parameter | Default value | Description |
|---|---|---|
| spring.datasource.driver-class-name | | datasource driver |
| spring.datasource.url | | datasource connection URL |
| spring.datasource.username | | datasource username |
| spring.datasource.password | | datasource password |
| spring.datasource.initialSize | 5 | initial connection pool size |
| spring.datasource.minIdle | 5 | minimum number of idle connections in the pool |
| spring.datasource.maxActive | 5 | maximum connection pool size |
| spring.datasource.maxWait | 60000 | maximum wait time for a connection, in milliseconds |
| spring.datasource.timeBetweenEvictionRunsMillis | 60000 | interval between idle connection checks, in milliseconds |
| spring.datasource.timeBetweenConnectErrorMillis | 60000 | retry interval after a connection error, in milliseconds |
| spring.datasource.minEvictableIdleTimeMillis | 300000 | connections idle longer than this (in milliseconds) are evicted during the idle check |
| spring.datasource.validationQuery | SELECT 1 | SQL used to validate a connection |
| spring.datasource.validationQueryTimeout | 3 | connection validation timeout, in seconds |
| spring.datasource.testWhileIdle | true | when a connection is requested, validate it if it has been idle longer than timeBetweenEvictionRunsMillis |
| spring.datasource.testOnBorrow | true | validate the connection when the application borrows it from the pool |
| spring.datasource.testOnReturn | false | validate the connection when the application returns it to the pool |
| spring.datasource.defaultAutoCommit | true | whether auto-commit is enabled |
| spring.datasource.keepAlive | true | run validationQuery on idle connections so the pool does not close them after they idle longer than minEvictableIdleTimeMillis |
| spring.datasource.poolPreparedStatements | true | enable the prepared statement cache (PSCache) |
| spring.datasource.maxPoolPreparedStatementPerConnectionSize | 20 | PSCache size per connection |
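Because initialSize, minIdle and maxActive all default to 5, raising one of them without the others is an easy mistake. The sketch below, built around hypothetical helpers get_prop and check_pool, reads simple key=value lines from datasource.properties and flags a pool whose initialSize or minIdle exceeds maxActive. It assumes the script is run from the DS installation directory so that conf/datasource.properties resolves.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a simple key=value
# .properties file. Comment lines are skipped; spaces around '=', ':'
# separators and line continuations are NOT handled.
get_prop() {
  local file=$1 key=$2
  awk -F= -v k="$key" '
    /^[[:space:]]*#/ { next }
    $1 == k { sub(/^[^=]*=/, ""); print; exit }
  ' "$file"
}

# Flag pool sizes where initialSize or minIdle exceeds maxActive.
check_pool() {
  local file=$1 init min max
  init=$(get_prop "$file" spring.datasource.initialSize)
  min=$(get_prop "$file" spring.datasource.minIdle)
  max=$(get_prop "$file" spring.datasource.maxActive)
  if (( init > max || min > max )); then
    echo "inconsistent pool: initialSize=$init minIdle=$min maxActive=$max" >&2
    return 1
  fi
  echo "ok: initialSize=$init minIdle=$min maxActive=$max"
}

conf=conf/datasource.properties   # assumes the current directory is the DS installation path
if [ -f "$conf" ]; then
  check_pool "$conf"
fi
```

Only the text before the first '=' is treated as the key, so values that themselves contain '=' (such as JDBC URLs with query parameters) are returned intact.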

3.registry.properties [registry config properties, default is zookeeper]

| Parameter | Default value | Description |
|---|---|---|
| registry.plugin.name | zookeeper | registry plugin name |
| registry.servers | localhost:2181 | ZooKeeper cluster connection string |
| registry.namespace | dolphinscheduler | root node under which DS stores its data in ZooKeeper (do not start it with /) |
| registry.base.sleep.time.ms | 60 | base wait time between retries, in milliseconds |
| registry.max.sleep.ms | 300 | maximum wait time between retries, in milliseconds |
| registry.max.retries | 5 | maximum number of retries |
| registry.session.timeout.ms | 30000 | session timeout, in milliseconds |
| registry.connection.timeout.ms | 7500 | connection timeout, in milliseconds |
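The three retry parameters interact. As a sketch, assuming DS passes them to Apache Curator's ExponentialBackoffRetry (which sleeps base × a random integer between 1 and 2^(n+1) - 1 before retry n, capped at the maximum sleep), the script below prints the worst-case wait for each retry with the default values. The helper name worst_case_sleep is hypothetical.

```bash
#!/usr/bin/env bash
# Assumption: Curator-style exponential back-off. Retry n (starting at 0)
# sleeps base * random(1 .. 2^(n+1) - 1) ms, capped at max. This prints the
# upper bound of that sleep.
worst_case_sleep() {
  local base=$1 max=$2 n=$3
  local sleep_ms=$(( base * ((1 << (n + 1)) - 1) ))
  if (( sleep_ms > max )); then
    sleep_ms=$max
  fi
  echo "$sleep_ms"
}

base=60      # registry.base.sleep.time.ms
max=300      # registry.max.sleep.ms
retries=5    # registry.max.retries
for (( n = 0; n < retries; n++ )); do
  echo "retry $n: sleeps at most $(worst_case_sleep "$base" "$max" "$n") ms"
done
```

With the defaults the upper bounds are 60, 180, 300, 300 and 300 ms, so a client spends at most about 1.1 seconds sleeping between attempts before giving up; raising registry.max.sleep.ms or registry.max.retries lengthens that window.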

4.common.properties [hadoop, s3, yarn config properties]

common.properties mainly contains Hadoop and S3A related settings.

| Parameter | Default value | Description |
|---|---|---|
| data.basedir.path | /tmp/dolphinscheduler | local directory for temporary files |
| resource.storage.type | NONE | resource file storage type: HDFS, S3 or NONE |
| resource.upload.path | /dolphinscheduler | storage path of resource files |
| hadoop.security.authentication.startup.state | false | whether Hadoop uses Kerberos authentication |
| java.security.krb5.conf.path | /opt/krb5.conf | Kerberos config file path |
| login.user.keytab.username | hdfs-mycluster@ESZ.COM | Kerberos username |
| login.user.keytab.path | /opt/hdfs.headless.keytab | Kerberos user keytab path |
| kerberos.expire.time | 2 | Kerberos expiration time, integer, in hours |
| resource.view.suffixs | txt,log,sh,conf,cfg,py,java,sql,hql,xml,properties | file types supported by the resource center |
| hdfs.root.user | hdfs | user with the required permissions when the storage type is HDFS |
| fs.defaultFS | hdfs://mycluster:8020 | if resource.storage.type=S3, set an address such as ‘s3a://dolphinscheduler’; if resource.storage.type=HDFS and Hadoop HA is enabled, copy core-site.xml and hdfs-site.xml into the ‘conf’ directory |
| fs.s3a.endpoint | | S3 endpoint URL |
| fs.s3a.access.key | | S3 access key |
| fs.s3a.secret.key | | S3 secret key |
| yarn.resourcemanager.ha.rm.ids | | YARN ResourceManager addresses: if ResourceManager HA is enabled, enter the HA IP addresses separated by commas; leave empty for standalone mode |
| yarn.application.status.address | http://ds1:8088/ws/v1/cluster/apps/%s | keep the default if ResourceManager HA is enabled or ResourceManager is not used; in standalone mode, replace ds1 with the ResourceManager hostname |
| dolphinscheduler.env.path | env/dolphinscheduler_env.sh | script that loads environment variables [e.g. JAVA_HOME, HADOOP_HOME, HIVE_HOME …] |
| development.state | false | whether DS runs in development mode |
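The %s in yarn.application.status.address is a placeholder. A minimal sketch, assuming it is filled with a YARN application ID (matching YARN's REST path /ws/v1/cluster/apps/{appid}); the helper name status_url and the application ID are hypothetical:

```bash
#!/usr/bin/env bash
# Fill the %s placeholder of yarn.application.status.address with an
# application ID. The template is used as a printf format string, so it
# must not contain any other '%' characters.
status_url() {
  local template=$1 app_id=$2
  # shellcheck disable=SC2059
  printf "${template}\n" "$app_id"
}

template="http://ds1:8088/ws/v1/cluster/apps/%s"          # default value from the table
status_url "$template" application_1600000000000_0001     # hypothetical application ID
# With a reachable ResourceManager the URL can be queried, e.g.:
#   curl -s "$(status_url "$template" <application-id>)"
```

In standalone mode only the host part (ds1) changes; the placeholder stays at the end of the path.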

5.application-api.properties [API-service config properties]

| Parameter | Default value | Description |
|---|---|---|
| server.port | 12345 | API service port |
| server.servlet.session.timeout | 7200 | session timeout, in seconds |
| server.servlet.context-path | /dolphinscheduler | context path of the API service |
| spring.servlet.multipart.max-file-size | 1024MB | maximum upload file size |
| spring.servlet.multipart.max-request-size | 1024MB | maximum request size |
| server.jetty.max-http-post-size | 5000000 | maximum Jetty POST size, in bytes |
| spring.messages.encoding | UTF-8 | message encoding |
| spring.jackson.time-zone | GMT+8 | time zone |
| spring.messages.basename | i18n/messages | i18n message bundle base name |
| security.authentication.type | PASSWORD | authentication type |
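server.port and server.servlet.context-path together determine the base URL of the API service. A minimal sketch, assuming plain HTTP and the example host name ds1 (both assumptions); api_base_url is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Combine host, server.port and server.servlet.context-path into a base URL.
api_base_url() {
  local host=$1 port=$2 context_path=$3
  echo "http://${host}:${port}${context_path}"
}

api_base_url ds1 12345 /dolphinscheduler   # prints http://ds1:12345/dolphinscheduler

# server.servlet.session.timeout=7200 is expressed in seconds:
echo "session timeout: $(( 7200 / 3600 )) h"   # prints "session timeout: 2 h"
```

If server.servlet.context-path is changed, every client URL (including reverse-proxy rules) must change with it.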

6.master.properties [master config properties]

| Parameter | Default value | Description |
|---|---|---|
| master.listen.port | 5678 | master listen port |
| master.exec.threads | 100 | number of master execution threads, which limits how many process instances run in parallel |
| master.exec.task.num | 20 | number of tasks each process instance executes in parallel |
| master.dispatch.task.num | 3 | number of tasks the master dispatches per batch |
| master.host.selector | LowerWeight | how the master selects a suitable worker; options are Random, RoundRobin and LowerWeight (default) |
| master.heartbeat.interval | 10 | master heartbeat interval, in seconds |
| master.task.commit.retryTimes | 5 | number of retries when the master commits a task |
| master.task.commit.interval | 1000 | task commit interval, in milliseconds |
| master.max.cpuload.avg | -1 | the master schedules only while the system CPU load average is below this value; -1 means number of CPU cores * 2 |
| master.reserved.memory | 0.3 | the master schedules only while the system's available memory is above this value, in GB |
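The max.cpuload.avg and reserved.memory limits can be expressed as one small check. This is a hypothetical re-implementation for illustration only, assuming the 1-minute load average and the available memory in GB are compared against the configured limits; the exact boundary handling (< versus <=) inside DS may differ. The helper name can_schedule is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical check mirroring the table: scheduling is allowed only while the
# load average stays within max.cpuload.avg and available memory stays at or
# above reserved.memory (GB). -1 for max.cpuload.avg means "CPU cores * 2".
can_schedule() {
  local load_avg=$1 cores=$2 max_load=$3 avail_gb=$4 reserved_gb=$5
  if [ "$max_load" = "-1" ]; then
    max_load=$(( cores * 2 ))
  fi
  # awk does the floating-point comparison; exit status 0 means "allowed".
  awk -v l="$load_avg" -v m="$max_load" -v a="$avail_gb" -v r="$reserved_gb" \
    'BEGIN { exit !(l <= m && a >= r) }'
}

# Live values on Linux: 1-minute load average and MemAvailable (kB -> GB).
if [ -r /proc/loadavg ] && [ -r /proc/meminfo ]; then
  load=$(cut -d' ' -f1 /proc/loadavg)
  avail_gb=$(awk '/^MemAvailable:/ { printf "%.2f", $2 / 1024 / 1024 }' /proc/meminfo)
  if can_schedule "$load" "$(nproc)" -1 "$avail_gb" 0.3; then
    echo "within limits (load=$load, available=${avail_gb}G)"
  else
    echo "over limits (load=$load, available=${avail_gb}G)"
  fi
fi
```

The same comparison describes worker.max.cpuload.avg and worker.reserved.memory, except that the outcome decides whether the worker can receive dispatched tasks.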

7.worker.properties [worker config properties]

| Parameter | Default value | Description |
|---|---|---|
| worker.listen.port | 1234 | worker listen port |
| worker.exec.threads | 100 | number of worker execution threads, which limits how many task instances run in parallel |
| worker.heartbeat.interval | 10 | worker heartbeat interval, in seconds |
| worker.max.cpuload.avg | -1 | the worker accepts dispatched tasks only while the system CPU load average is below this value; -1 means number of CPU cores * 2 |
| worker.reserved.memory | 0.3 | the worker accepts dispatched tasks only while the system's available memory is above this value, in GB |
| worker.groups | default | comma-separated worker groups, e.g. ‘worker.groups=default,test’; the worker joins these groups at startup |
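worker.groups is a plain comma-separated list, so checking whether a worker serves a given group is a simple membership test. A minimal sketch with a hypothetical helper in_group, assuming the list is written without spaces (surrounding spaces are not trimmed):

```bash
#!/usr/bin/env bash
# Return 0 if WANTED is one of the comma-separated groups, 1 otherwise.
in_group() {
  local groups=$1 wanted=$2 g
  local IFS=','               # split only on commas inside this function
  for g in $groups; do
    if [ "$g" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

worker_groups="default,test"   # value of worker.groups
if in_group "$worker_groups" test; then
  echo "this worker serves group 'test'"
fi
```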

8.alert.properties [alert-service config properties]

| Parameter | Default value | Description |
|---|---|---|
| alert.type | EMAIL | alert type |
| mail.protocol | SMTP | mail server protocol |
| mail.server.host | xxx.xxx.com | mail server host |
| mail.server.port | 25 | mail server port |
| mail.sender | xxx@xxx.com | sender email address |
| mail.user | xxx@xxx.com | sender account name |
| mail.passwd | 111111 | sender account password |
| mail.smtp.starttls.enable | true | whether to enable TLS |
| mail.smtp.ssl.enable | false | whether to enable SSL |
| mail.smtp.ssl.trust | xxx.xxx.com | SSL trust list for the mail server |
| xls.file.path | /tmp/xls | temporary storage directory for mail attachments |
| following parameters configure WeCom [optional] | | |
| enterprise.wechat.enable | false | whether to enable WeCom |
| enterprise.wechat.corp.id | xxxxxxx | WeCom corp ID |
| enterprise.wechat.secret | xxxxxxx | WeCom secret |
| enterprise.wechat.agent.id | xxxxxxx | WeCom agent ID |
| enterprise.wechat.users | xxxxxxx | WeCom users |
| enterprise.wechat.token.url | https://qyapi.weixin.qq.com/cgi-bin/gettoken?corpid=corpId&corpsecret=secret | WeCom token URL |
| enterprise.wechat.push.url | https://qyapi.weixin.qq.com/cgi-bin/message/send?access_token=$token | WeCom push URL |
| enterprise.wechat.user.send.msg | | message format for sending to users |
| enterprise.wechat.team.send.msg | | message format for sending to groups |
| plugin.dir | /Users/xx/your/path/to/plugin/dir | plugin directory |
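STARTTLS and SSL should not be enabled at the same time (install_config.conf states the same rule for starttlsEnable and sslEnable). A minimal pre-start check, assuming it runs from the DS installation directory; check_mail_security is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Reject a configuration where STARTTLS and SSL are both switched on.
check_mail_security() {
  local starttls=$1 ssl=$2
  if [ "$starttls" = true ] && [ "$ssl" = true ]; then
    echo "mail.smtp.starttls.enable and mail.smtp.ssl.enable must not both be true" >&2
    return 1
  fi
  echo "ok: starttls=$starttls ssl=$ssl"
}

conf=conf/alert.properties   # assumes the current directory is the DS installation path
if [ -f "$conf" ]; then
  starttls=$(grep -m1 '^mail\.smtp\.starttls\.enable=' "$conf" | cut -d= -f2)
  ssl=$(grep -m1 '^mail\.smtp\.ssl\.enable=' "$conf" | cut -d= -f2)
  check_mail_security "$starttls" "$ssl"
fi
```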

9.quartz.properties [quartz config properties]

This section describes the Quartz configuration; please adjust it according to your actual workload and resources.

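| Parameter | Default value | Description |
|---|---|---|
| org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.StdJDBCDelegate | JDBC delegate for MySQL |
| org.quartz.jobStore.driverDelegateClass | org.quartz.impl.jdbcjobstore.PostgreSQLDelegate | JDBC delegate for PostgreSQL (set only one of these two lines) |
| org.quartz.scheduler.instanceName | DolphinScheduler | scheduler instance name |
| org.quartz.scheduler.instanceId | AUTO | scheduler instance ID, generated automatically |
| org.quartz.scheduler.makeSchedulerThreadDaemon | true | run the main scheduler thread as a daemon thread |
| org.quartz.jobStore.useProperties | false | allow non-String JobDataMap values (stored serialized) |
| org.quartz.threadPool.class | org.quartz.simpl.SimpleThreadPool | thread pool implementation |
| org.quartz.threadPool.makeThreadsDaemons | true | run pool threads as daemon threads |
| org.quartz.threadPool.threadCount | 25 | number of threads available for running jobs |
| org.quartz.threadPool.threadPriority | 5 | priority of the pool threads |
| org.quartz.jobStore.class | org.quartz.impl.jdbcjobstore.JobStoreTX | JDBC job store that manages its own transactions |
| org.quartz.jobStore.tablePrefix | QRTZ_ | prefix of the Quartz tables |
| org.quartz.jobStore.isClustered | true | enable clustering |
| org.quartz.jobStore.misfireThreshold | 60000 | how late a trigger may fire before it counts as misfired, in milliseconds |
| org.quartz.jobStore.clusterCheckinInterval | 5000 | interval between cluster check-ins, in milliseconds |
| org.quartz.jobStore.acquireTriggersWithinLock | true | acquire the next triggers while holding a database lock |
| org.quartz.jobStore.dataSource | myDs | name of the data source used by the job store |
| org.quartz.dataSource.myDs.connectionProvider.class | org.apache.dolphinscheduler.service.quartz.DruidConnectionProvider | connection provider backed by Druid |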
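Only one driverDelegateClass line may be active, and it must match the metadata database. A minimal sketch that maps the database type to the delegate listed in the table, assuming the dbtype values mysql and postgresql as used in install_config.conf; quartz_delegate is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Map a database type to the matching Quartz JDBC delegate class.
quartz_delegate() {
  case "$1" in
    mysql)      echo org.quartz.impl.jdbcjobstore.StdJDBCDelegate ;;
    postgresql) echo org.quartz.impl.jdbcjobstore.PostgreSQLDelegate ;;
    *)          echo "unsupported dbtype: $1" >&2; return 1 ;;
  esac
}

dbtype="mysql"
echo "org.quartz.jobStore.driverDelegateClass=$(quartz_delegate "$dbtype")"
```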

10.install_config.conf [DS environment variables configuration script[install/start DS]]

install_config.conf is somewhat complex and is mainly used in the following two places:

  • 1. DS cluster auto-installation

When ‘install.sh’ is executed, the system loads the settings in install_config.conf and uses them to auto-configure the following files: dolphinscheduler-daemon.sh, datasource.properties, registry.properties, common.properties, application-api.properties, master.properties, worker.properties, alert.properties, quartz.properties, etc.

  • 2. DS cluster startup/shutdown

The system loads the masters, workers, alertServer, apiServers and other parameters in this file to start up or shut down the DS cluster.
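Auto-configuration essentially rewrites key=value lines in the properties files with values from install_config.conf. The following is a simplified sketch of such a substitution, not the actual install.sh; it assumes GNU sed and uses a hypothetical helper set_prop. Because the value is spliced into a sed expression, a raw # (the delimiter chosen here) would break the command and a raw & would be replaced by the matched text, which is why install_config.conf asks for special characters to be escaped.

```bash
#!/usr/bin/env bash
# Simplified sketch: replace the value of KEY in a .properties file.
# Requires GNU sed ("-i" without a backup suffix).
set_prop() {
  local file=$1 key=$2 value=$3
  sed -i "s#^${key}=.*#${key}=${value}#" "$file"
}

# Demo on a throwaway file containing a few alert.properties lines.
mailServerHost="smtp.exmail.qq.com"   # as it would come from install_config.conf
props=$(mktemp)
printf 'mail.server.host=xxx.xxx.com\nmail.server.port=25\n' > "$props"
set_prop "$props" mail.server.host "$mailServerHost"
cat "$props"    # mail.server.host=smtp.exmail.qq.com, then mail.server.port=25
rm -f "$props"
```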

The file content is as follows:

    # Note: please escape the character if the file contains special characters such as `.*[]^${}\+?|()@#&`.
    # e.g.: `[` escapes to `\[`
    # Database type (DS currently only supports PostgreSQL and MySQL)
    dbtype="mysql"
    # Database host & port
    dbhost="192.168.xx.xx:3306"
    # Database name
    dbname="dolphinscheduler"
    # Database username
    username="xx"
    # Database password
    password="xx"
    # ZooKeeper quorum
    zkQuorum="192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181"
    # DS installation path, such as '/data1_1T/dolphinscheduler'
    installPath="/data1_1T/dolphinscheduler"
    # Deployment user
    # Note: the deployment user needs 'sudo' privilege and rights to operate HDFS.
    # If HDFS is used, the root directory must be created by the same user; otherwise permission-related issues will occur.
    deployUser="dolphinscheduler"
    # The following are alert-service configs
    # Mail server host
    mailServerHost="smtp.exmail.qq.com"
    # Mail server port
    mailServerPort="25"
    # Mail sender
    mailSender="xxxxxxxxxx"
    # Mail user
    mailUser="xxxxxxxxxx"
    # Mail password
    mailPassword="xxxxxxxxxx"
    # Set true if the mail server supports TLS, otherwise false
    starttlsEnable="true"
    # Set true if the mail server supports SSL, otherwise false. Note: starttlsEnable and sslEnable cannot both be true
    sslEnable="false"
    # SSL trust host, same as mailServerHost
    sslTrust="smtp.exmail.qq.com"
    # Storage used by the resource upload function (e.g. for SQL files). Supported options are HDFS, S3 and NONE: HDFS uploads to HDFS, NONE disables the function.
    resourceStorageType="NONE"
    # If S3, set the S3 address, for example: s3a://dolphinscheduler
    # Note: for S3, make sure to create the root directory /dolphinscheduler
    defaultFS="hdfs://mycluster:8020"
    # If resourceStorageType is S3, the following configs are needed:
    s3Endpoint="http://192.168.xx.xx:9010"
    s3AccessKey="xxxxxxxxxx"
    s3SecretKey="xxxxxxxxxx"
    # If ResourceManager HA is enabled, enter the active and standby node IPs or hostnames, e.g. '192.168.xx.xx,192.168.xx.xx'. If ResourceManager runs in standalone mode or YARN is not used, set yarnHaIps=""
    yarnHaIps="192.168.xx.xx,192.168.xx.xx"
    # If ResourceManager runs in standalone mode, set its IP or hostname; otherwise keep the default
    singleYarnIp="yarnIp1"
    # Storage path when using HDFS/S3
    resourceUploadPath="/dolphinscheduler"
    # HDFS/S3 root user
    hdfsRootUser="hdfs"
    # The following are Kerberos configs
    # Specify whether Kerberos is enabled
    kerberosStartUp="false"
    # KDC krb5 config file path
    krb5ConfPath="$installPath/conf/krb5.conf"
    # Keytab username
    keytabUserName="hdfs-mycluster@ESZ.COM"
    # Keytab path of the user
    keytabPath="$installPath/conf/hdfs.headless.keytab"
    # API-service port
    apiServerPort="12345"
    # All hosts on which DS is deployed
    ips="ds1,ds2,ds3,ds4,ds5"
    # SSH port, default 22
    sshPort="22"
    # Hosts on which the master service is deployed
    masters="ds1,ds2"
    # Hosts on which the worker service is deployed
    # Note: each worker needs a worker group name; the default name is "default"
    workers="ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"
    # Host on which alert-service is deployed
    alertServer="ds3"
    # Host on which API-service is deployed
    apiServers="ds1"
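The note at the top of the file asks for special characters in values (passwords, for instance) to be escaped. A small helper, hypothetical and not part of DS, that applies that rule literally by prefixing each listed character with a backslash, assuming bash:

```bash
#!/usr/bin/env bash
# Prefix every character listed in the install_config.conf note
# (. * [ ] ^ $ { } \ + ? | ( ) @ # &) with a backslash.
escape_special() {
  local s=$1 out='' c i
  local specials='.*[]^${}\+?|()@#&'
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    if [[ $specials == *"$c"* ]]; then   # quoted "$c" is matched literally
      out+="\\$c"
    else
      out+=$c
    fi
  done
  printf '%s\n' "$out"
}

escape_special 'p@ss[word]'   # prints p\@ss\[word\]
```

The escaped output can then be pasted between the double quotes of the corresponding entry, e.g. password="...".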

11.dolphinscheduler_env.sh [load environment variables configs]

When DS submits tasks through the shell, it loads the environment variables defined in dolphinscheduler_env.sh on the host. Affected task types include Shell, Python, Spark, Flink, DataX and others.

    export HADOOP_HOME=/opt/soft/hadoop
    export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
    export SPARK_HOME1=/opt/soft/spark1
    export SPARK_HOME2=/opt/soft/spark2
    export PYTHON_HOME=/opt/soft/python
    export JAVA_HOME=/opt/soft/java
    export HIVE_HOME=/opt/soft/hive
    export FLINK_HOME=/opt/soft/flink
    export DATAX_HOME=/opt/soft/datax/bin/datax.py
    export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME:$PATH
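A wrong path in this file usually surfaces only when a task fails. A minimal sketch of a pre-flight check, assuming it runs from the DS installation directory (so conf/env/dolphinscheduler_env.sh resolves); check_homes is a hypothetical helper. It uses -e rather than -d because DATAX_HOME points to a file (datax.py).

```bash
#!/usr/bin/env bash
# Report which of the named variables point to paths missing on this host.
check_homes() {
  local name missing=0
  for name in "$@"; do
    if [ ! -e "${!name}" ]; then          # ${!name}: value of the variable called $name
      echo "missing: $name=${!name}"
      missing=1
    fi
  done
  return "$missing"
}

env_file=conf/env/dolphinscheduler_env.sh   # assumes the DS installation directory is the cwd
if [ -f "$env_file" ]; then
  . "$env_file"   # load the variables into this shell
  check_homes HADOOP_HOME HADOOP_CONF_DIR SPARK_HOME1 SPARK_HOME2 PYTHON_HOME \
    JAVA_HOME HIVE_HOME FLINK_HOME DATAX_HOME
fi
```

Entries for components that are not installed can simply be dropped from the argument list.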

12. Services logback configs

| Service | Logback config file |
|---|---|
| API-service logback config | logback-api.xml |
| master-service logback config | logback-master.xml |
| worker-service logback config | logback-worker.xml |
| alert-service logback config | logback-alert.xml |