Resource Center Configuration

  • You can use the Resource Center to upload text files, UDFs, and other task-related files.
  • You can configure the Resource Center to use a distributed file system such as Hadoop (2.6+) or a MinIO cluster, or a remote storage product such as AWS S3 or Alibaba Cloud OSS.
  • You can configure the Resource Center to use the local file system. If you deploy DolphinScheduler in Standalone mode, you can use the local file system for the Resource Center without an external HDFS system or S3.
  • Furthermore, if you deploy DolphinScheduler in Cluster mode, you can use S3FS-FUSE to mount S3 or JINDO-FUSE to mount OSS on your machines and then use the local file system for the Resource Center. This way, you can operate on remote files as if they were on your local machines.

Use Local File System

Configure common.properties

If you deploy DolphinScheduler in Cluster or Pseudo-Cluster mode, you need to configure both api-server/conf/common.properties and worker-server/conf/common.properties. If you deploy it in Standalone mode, you only need to configure standalone-server/conf/common.properties. In either case, make the following changes:

  • Change resource.storage.upload.base.path to your local directory path, e.g. /tmp/dolphinscheduler. Make sure the user configured in resource.hdfs.root.user has read and write permissions on this directory. DolphinScheduler creates the directory if it does not exist.
  • Set resource.storage.type=HDFS and resource.hdfs.fs.defaultFS=file:///.

NOTE: Please modify the value of resource.storage.upload.base.path if you do not want to use the default value as the base path.
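The two changes above can be applied with a small script. The following is a minimal sketch, not a DolphinScheduler tool: it assumes common.properties contains only simple `key=value` lines and comments, uses /tmp/dolphinscheduler purely as the example path from the steps above, and the helper names `apply_overrides` and `ensure_base_path` are hypothetical.

```python
import os
import re
import tempfile
from typing import Dict

# Values from the steps above; /tmp/dolphinscheduler is only the example base path.
LOCAL_FS_OVERRIDES: Dict[str, str] = {
    "resource.storage.type": "HDFS",
    "resource.hdfs.fs.defaultFS": "file:///",
    "resource.storage.upload.base.path": "/tmp/dolphinscheduler",
}

# Captures the key of a plain `key=value` line; lines starting with '#' or '!' never match.
KEY_RE = re.compile(r"\s*([^#!=\s][^=]*?)\s*=")


def apply_overrides(text: str, overrides: Dict[str, str]) -> str:
    """Set the given keys in .properties text and append keys that are not present yet.

    Simplified parser (assumption): only `key=value` lines are recognized; comments,
    blank lines and anything else are copied unchanged. Every occurrence of a
    duplicated key is rewritten.
    """
    missing = dict(overrides)
    lines = []
    for line in text.splitlines():
        match = KEY_RE.match(line)
        key = match.group(1) if match else None
        if key in overrides:
            lines.append(f"{key}={overrides[key]}")
            missing.pop(key, None)
        else:
            lines.append(line)
    lines.extend(f"{key}={value}" for key, value in missing.items())
    return "\n".join(lines) + "\n"


def ensure_base_path(path: str) -> None:
    """Create the directory if needed and check that the *current* user can read and write it.

    Run this as the user set in resource.hdfs.root.user, because that is the
    user who needs access to the base path.
    """
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.R_OK | os.W_OK):
        raise PermissionError(f"{path} is not readable and writable by the current user")


if __name__ == "__main__":
    sample = "resource.storage.type=NONE\nresource.hdfs.fs.defaultFS=hdfs://localhost:8020\n"
    print(apply_overrides(sample, LOCAL_FS_OVERRIDES), end="")
    # Demo on a throwaway directory instead of the real base path.
    ensure_base_path(os.path.join(tempfile.mkdtemp(), "dolphinscheduler"))
```

Rewriting keys in place, rather than appending new lines at the end, keeps the file readable and avoids relying on "last value wins" when a key appears twice.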

HDFS Resource Configuration

When you use the Resource Center to create or upload files, all files and resources are stored on HDFS, so the following configuration is required.

Configure common.properties

Since version 3.0.0-alpha, if you want to upload resources to a Resource Center backed by HDFS or S3, you need to configure both api-server/conf/common.properties and worker-server/conf/common.properties. The file contents are as follows:

```properties
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# user data local directory path, please make sure the directory exists and has read and write permissions
data.basedir.path=/tmp/dolphinscheduler
# resource view suffixes
#resource.view.suffixs=txt,log,sh,bat,conf,cfg,py,java,sql,xml,hql,properties,json,yml,yaml,ini,js
# resource storage type: HDFS, S3, NONE
resource.storage.type=NONE
# resource store on HDFS/S3 path, resource files are stored under this base path (self-configured); please make sure the directory exists on HDFS and has read and write permissions. "/dolphinscheduler" is recommended
resource.storage.upload.base.path=/tmp/dolphinscheduler
# The AWS access key. Required if resource.storage.type=S3 or if you use EMR-Task
resource.aws.access.key.id=minioadmin
# The AWS secret access key. Required if resource.storage.type=S3 or if you use EMR-Task
resource.aws.secret.access.key=minioadmin
# The AWS Region to use. Required if resource.storage.type=S3 or if you use EMR-Task
resource.aws.region=cn-north-1
# The name of the bucket. You need to create it yourself; otherwise, the system cannot start. All buckets in Amazon S3 share a single namespace, so make sure the bucket name is unique.
resource.aws.s3.bucket.name=dolphinscheduler
# Set this parameter for a private-cloud S3 service. For public-cloud S3, you only need to set resource.aws.region, or set this to a public endpoint such as S3.cn-north-1.amazonaws.com.cn
resource.aws.s3.endpoint=http://localhost:9000
# if resource.storage.type=HDFS, the user must have permission to create directories under the HDFS root path
resource.hdfs.root.user=root
# if resource.storage.type=S3, the value looks like: s3a://dolphinscheduler;
# if resource.storage.type=HDFS and NameNode HA is enabled, you need to copy core-site.xml and hdfs-site.xml to the conf dir
resource.hdfs.fs.defaultFS=hdfs://localhost:8020
# whether to enable Kerberos
hadoop.security.authentication.startup.state=false
# java.security.krb5.conf path
java.security.krb5.conf.path=/opt/krb5.conf
# login user from keytab username
login.user.keytab.username=hdfs-mycluster@ESZ.COM
# login user from keytab path
login.user.keytab.path=/opt/hdfs.headless.keytab
# kerberos expire time, the unit is hour
kerberos.expire.time=2
# resourcemanager port, the default value is 8088 if not specified
resource.manager.httpaddress.port=8088
# if resourcemanager HA is enabled, please set the HA IPs; if resourcemanager is single, keep this value empty
yarn.resourcemanager.ha.rm.ids=192.168.xx.xx,192.168.xx.xx
# if resourcemanager HA is enabled or resourcemanager is not used, keep the default value; if resourcemanager is single, you only need to replace ds1 with the actual resourcemanager hostname
yarn.application.status.address=http://ds1:%s/ws/v1/cluster/apps/%s
# job history status url when the application number threshold is reached (default 10000, maybe it was set to 1000)
yarn.job.history.status.address=http://ds1:19888/ws/v1/history/mapreduce/jobs/%s
# datasource encryption enable
datasource.encryption.enable=false
# datasource encryption salt
datasource.encryption.salt=!@#$%^&*
# data quality option
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
#data-quality.error.output.path=/tmp/data-quality-error-data
# Whether hive SQL is executed in the same session
support.hive.oneSession=false
# use sudo or not, if set true, executing user is tenant user and deploy user needs sudo permissions; if set false, executing user is the deploy user and doesn't need sudo permissions
sudo.enable=true
# network interface preferred like eth0, default: empty
#dolphin.scheduler.network.interface.preferred=
# network IP gets priority, default: inner outer
#dolphin.scheduler.network.priority.strategy=default
# system env path
#dolphinscheduler.env.path=dolphinscheduler_env.sh
# development state
development.state=false
# rpc port
alert.rpc.port=50052
# Url endpoint for zeppelin RESTful API
zeppelin.rest.url=http://localhost:8080
# set path of conda.sh
conda.path=/opt/anaconda3/etc/profile.d/conda.sh
# Task resource limit state
task.resource.limit.state=false
```
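Which keys matter depends on resource.storage.type, and the comments in the listing above say which ones are required. The following is a minimal sketch that checks a common.properties text against those comments. The mapping of storage type to required keys is derived from the comments only (an assumption, not DolphinScheduler's own validation logic), and `parse_properties` and `missing_keys` are hypothetical helper names.

```python
from typing import Dict, List

# Required keys per resource.storage.type, read off the comments in the listing
# (assumption: not taken from DolphinScheduler's source code).
REQUIRED_KEYS: Dict[str, List[str]] = {
    "NONE": [],
    "HDFS": ["resource.storage.upload.base.path", "resource.hdfs.root.user",
             "resource.hdfs.fs.defaultFS"],
    "S3": ["resource.storage.upload.base.path", "resource.aws.access.key.id",
           "resource.aws.secret.access.key", "resource.aws.region",
           "resource.aws.s3.bucket.name"],
}
# Needed only when hadoop.security.authentication.startup.state=true.
KERBEROS_KEYS = ["java.security.krb5.conf.path", "login.user.keytab.username",
                 "login.user.keytab.path"]


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified .properties parser: `key=value` lines only, '#'/'!' comment lines skipped.

    As with java.util.Properties, a key that appears twice keeps its last value.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_keys(props: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or empty for the configured storage type."""
    storage = props.get("resource.storage.type", "NONE").upper()
    if storage not in REQUIRED_KEYS:
        raise ValueError(f"unsupported resource.storage.type: {storage}")
    required = list(REQUIRED_KEYS[storage])
    if props.get("hadoop.security.authentication.startup.state", "false").lower() == "true":
        required += KERBEROS_KEYS
    return [key for key in required if not props.get(key)]


if __name__ == "__main__":
    sample = """
resource.storage.type=S3
resource.storage.upload.base.path=/dolphinscheduler
resource.aws.region=cn-north-1
"""
    print(missing_keys(parse_properties(sample)))
```

To check a real deployment, pass the contents of api-server/conf/common.properties and worker-server/conf/common.properties to `parse_properties` in turn. The check does not verify values, such as whether the S3 bucket actually exists.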

Note:

  • If only api-server/conf/common.properties is configured, resource uploading is enabled, but you cannot use the resources in tasks. To use or execute the files in a workflow, you also need to configure worker-server/conf/common.properties.
  • To use the resource upload function, the user who installs and deploys DolphinScheduler must have the corresponding operation permissions.
  • If you are using a Hadoop cluster with HA and want to enable HDFS resource upload, copy core-site.xml and hdfs-site.xml from the Hadoop cluster to worker-server/conf and api-server/conf; otherwise, skip this step.
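The notes above can be checked before starting the servers. The following is a minimal sketch under several assumptions: api-server/conf and worker-server/conf sit under one installation root (hypothetical layout), every `resource.*` key should have the same value on both servers so that workers read from the storage the API server writes to (an assumption), and whether HDFS HA is in use is passed in as a flag rather than detected. `check_deployment` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

SERVERS = ("api-server", "worker-server")
SITE_FILES = ("core-site.xml", "hdfs-site.xml")


def read_props(path: Path) -> Dict[str, str]:
    """Simplified .properties reader: `key=value` lines only, comments skipped."""
    props: Dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if line and line[0] not in "#!" and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def check_deployment(root: Path, hdfs_ha: bool) -> List[str]:
    """List problems: missing config files, missing HA site files, mismatched resource.* keys."""
    problems: List[str] = []
    props: Dict[str, Dict[str, str]] = {}
    for server in SERVERS:
        conf = root / server / "conf"
        prop_file = conf / "common.properties"
        if prop_file.is_file():
            props[server] = read_props(prop_file)
        else:
            problems.append(f"missing {prop_file}")
        if hdfs_ha:
            # With HDFS HA, both servers need the cluster's site files.
            for name in SITE_FILES:
                if not (conf / name).is_file():
                    problems.append(f"missing {conf / name}")
    if len(props) == len(SERVERS):
        api, worker = props["api-server"], props["worker-server"]
        for key in sorted(set(api) | set(worker)):
            if key.startswith("resource.") and api.get(key) != worker.get(key):
                problems.append(f"{key} differs between api-server and worker-server")
    return problems


if __name__ == "__main__":
    # Demo on a throwaway layout; point `root` at a real installation to use it.
    root = Path(tempfile.mkdtemp())
    for server in SERVERS:
        (root / server / "conf").mkdir(parents=True)
    (root / "api-server/conf/common.properties").write_text("resource.storage.type=HDFS\n")
    (root / "worker-server/conf/common.properties").write_text("resource.storage.type=NONE\n")
    for problem in check_deployment(root, hdfs_ha=True):
        print(problem)
```

In Standalone mode only standalone-server/conf/common.properties is used, so this check applies to Cluster and Pseudo-Cluster deployments.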