DolphinScheduler Expansion and Reduction

1. Expansion

This article describes how to add a new master service or worker service to an existing DolphinScheduler cluster.

  Attention:
  1. There cannot be more than one master service process or worker service process on a single physical machine.
  2. If the physical machine hosting the new master or worker node already has the DolphinScheduler service installed, skip ahead to [1.4 Modify configuration]: edit the configuration file `conf/config/install_config.conf` on **all** nodes, add the new hosts to the masters or workers parameters, and restart the scheduling cluster.

1.1 Basic software installation (please install the mandatory items yourself)

  • [required] JDK (1.8+): must be installed; please install it and configure the JAVA_HOME and PATH variables under /etc/profile
  • [optional] If the expansion adds a worker node, consider whether the external clients it needs are installed, such as the Hadoop, Hive, or Spark clients.
  Attention: DolphinScheduler itself does not depend on Hadoop, Hive, or Spark; it only calls their clients to submit the corresponding tasks.
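
On a new worker node, a quick sanity check is to confirm that the clients your tasks actually need are on the PATH. A minimal sketch (only run the checks relevant to your workloads):

```shell
# verify the external clients required by your tasks are installed and reachable
hadoop version
hive --version
spark-submit --version
```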

1.2 Get installation package

  • Check which version of DolphinScheduler is used in your existing environment and get the installation package of the same version; if the versions differ, there may be compatibility problems.
  • Confirm the unified installation directory of the other nodes. This article assumes that DolphinScheduler is installed in the /opt/ directory and that the full path is /opt/dolphinscheduler.
  • Download the corresponding version of the installation package to the server installation directory, uncompress it, rename it to dolphinscheduler, and store it in the /opt directory.
  • Add the database dependency package. This article uses a MySQL database, so add the mysql-connector-java driver package to the /opt/dolphinscheduler/lib directory (see the sketch after the commands below).

```shell
# create the installation directory; please do not create it under /root, /home, or other high-privilege directories
mkdir -p /opt
cd /opt
# decompress
tar -zxvf apache-dolphinscheduler-2.0.6-bin.tar.gz -C /opt
cd /opt
mv apache-dolphinscheduler-2.0.6-bin dolphinscheduler
```

  Attention: The installation package can be copied directly from an existing environment to the new physical machine for use.
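
If MySQL is used as the metadata database, copying the driver mentioned above might look like the following sketch; the jar file name and version are illustrative, so use the driver that matches your MySQL server.

```shell
# copy the MySQL JDBC driver into DolphinScheduler's lib directory
# (the exact jar name/version depends on the driver you downloaded)
cp mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib/
```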

1.3 Create Deployment Users

  • Create a deployment user on every expansion machine, and be sure to configure passwordless sudo. If we plan to deploy scheduling on four expansion machines ds1, ds2, ds3, and ds4, we first need to create the deployment user on each machine.

```shell
# create the user (you need to log in as root); set the deployment user name yourself, dolphinscheduler is used as an example below
useradd dolphinscheduler;
# set the user password; please change it yourself, dolphinscheduler123 is used as an example below
echo "dolphinscheduler123" | passwd --stdin dolphinscheduler
# configure passwordless sudo
echo 'dolphinscheduler ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers
```

  Attention:
  - Because sudo -u {linux-user} is used to switch between different Linux users to run multi-tenant jobs, the deployment user needs passwordless sudo privileges.
  - If you find the line "Defaults requiretty" in the /etc/sudoers file, please also comment it out.
  - If resource uploads are used, you also need to give the deployment user read and write permissions on `HDFS or MinIO` (see the HDFS sketch below).
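
For the HDFS case, granting those permissions might look like the sketch below; it assumes the resource upload path configured in common.properties is /dolphinscheduler, so adjust the path to your own setting and run the commands as a user with HDFS admin rights.

```shell
# create the resource upload root (if it does not exist) and hand it over to the deployment user
hdfs dfs -mkdir -p /dolphinscheduler
hdfs dfs -chown -R dolphinscheduler:dolphinscheduler /dolphinscheduler
```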

1.4 Modify configuration

  • From an existing node such as a Master/Worker node, copy the conf directory to the new node to replace its conf directory. After copying, check that the configuration items are correct.

    Highlights:
    - datasource.properties: database connection information
    - zookeeper.properties: information for connecting to ZooKeeper
    - common.properties: configuration of the resource store (if Hadoop is set up, check that the core-site.xml and hdfs-site.xml configuration files exist)
    - env/dolphinscheduler_env.sh: environment variables
  • Modify the dolphinscheduler_env.sh environment variable file in the conf/env directory according to the machine configuration (the example below assumes the software is installed in /opt/soft)

```shell
export HADOOP_HOME=/opt/soft/hadoop
export HADOOP_CONF_DIR=/opt/soft/hadoop/etc/hadoop
# export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/soft/spark2
export PYTHON_HOME=/opt/soft/python
export JAVA_HOME=/opt/soft/java
export HIVE_HOME=/opt/soft/hive
export FLINK_HOME=/opt/soft/flink
export DATAX_HOME=/opt/soft/datax/bin/datax.py
export PATH=$HADOOP_HOME/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$DATAX_HOME:$PATH
```

    Attention: This step is very important. JAVA_HOME and PATH must be configured; variables for software that is not used can be ignored or commented out.

  • Softlink the JDK to /usr/bin/java (still using JAVA_HOME=/opt/soft/java as an example)

```shell
sudo ln -s /opt/soft/java/bin/java /usr/bin/java
```
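
    An optional quick check that the link resolves to a working JDK:

```shell
# confirm the soft link and the Java version it points to
ls -l /usr/bin/java
/usr/bin/java -version
```
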
  • Modify the configuration file conf/config/install_config.conf on all nodes, synchronizing the following configuration.

    • To add a new master node, you need to modify the ips and masters parameters.
    • To add a new worker node, modify the ips and workers parameters.

```shell
# which machines DS services are deployed on; separate multiple physical machines with commas
ips="ds1,ds2,ds3,ds4"
# ssh port, default 22
sshPort="22"
# which machines the master service is deployed on
masters="existing master01,existing master02,ds1,ds2"
# which machines the worker service is deployed on, and which worker group each worker belongs to; "default" in the example below is the group name
workers="existing worker01:default,existing worker02:default,ds3:default,ds4:default"
```

  • If the expansion adds worker nodes, you need to set the worker group. Please refer to Worker Grouping in the Security section.

  • On all new nodes, change the directory permissions so that the deployment user has access to the dolphinscheduler directory

```shell
sudo chown -R dolphinscheduler:dolphinscheduler dolphinscheduler
```
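
An optional check that the deployment user can actually read the directory:

```shell
# confirm ownership and access for the deployment user
ls -ld /opt/dolphinscheduler
sudo -u dolphinscheduler ls /opt/dolphinscheduler > /dev/null && echo "access OK"
```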

1.5 Restart the cluster & verify

  • restart the cluster

```shell
# stop commands:
bin/stop-all.sh                                        # stop all services
sh bin/dolphinscheduler-daemon.sh stop master-server   # stop master service
sh bin/dolphinscheduler-daemon.sh stop worker-server   # stop worker service
sh bin/dolphinscheduler-daemon.sh stop logger-server   # stop logger service
sh bin/dolphinscheduler-daemon.sh stop api-server      # stop api service
sh bin/dolphinscheduler-daemon.sh stop alert-server    # stop alert service

# start commands:
bin/start-all.sh                                       # start all services
sh bin/dolphinscheduler-daemon.sh start master-server  # start master service
sh bin/dolphinscheduler-daemon.sh start worker-server  # start worker service
sh bin/dolphinscheduler-daemon.sh start logger-server  # start logger service
sh bin/dolphinscheduler-daemon.sh start api-server     # start api service
sh bin/dolphinscheduler-daemon.sh start alert-server   # start alert service
```

  Attention: When using stop-all.sh or start-all.sh, if the machine executing the command does not have passwordless SSH configured to all machines, it will prompt for passwords.
  • After the script completes, use the jps command to check whether the services on each node have started (jps ships with the Java JDK)

```
MasterServer         ----- master service
WorkerServer         ----- worker service
LoggerServer         ----- logger service
ApiApplicationServer ----- api service
AlertServer          ----- alert service
```
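
For example, on a node that runs both master and worker services, the jps output might look roughly like this (process IDs are illustrative):

```shell
jps
# 12001 MasterServer
# 12002 WorkerServer
# 12003 LoggerServer
# 12100 Jps
```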

After successful startup, you can view the logs, which are stored in the logs folder.

```
logs/
├── dolphinscheduler-alert-server.log
├── dolphinscheduler-master-server.log
├── dolphinscheduler-worker-server.log
├── dolphinscheduler-api-server.log
└── dolphinscheduler-logger-server.log
```
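
To follow a specific service log while verifying the new node, for example:

```shell
# tail the worker log on the newly added node
tail -f /opt/dolphinscheduler/logs/dolphinscheduler-worker-server.log
```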

If the above services start normally and the scheduling system pages work, check whether the newly added Master or Worker service appears in the [Monitor] section of the web UI. If it does, the expansion is complete.


2. Reduction

Reduction means removing master or worker services from an existing DolphinScheduler cluster. After performing the following two steps, the reduction is complete.

2.1 Stop the service on the scaled-down node

  • If you are scaling down a master node, identify the physical machine where the master service is located and stop the master service on that machine.
  • If you are scaling down a worker node, identify the physical machine where the worker service to be removed is located and stop the worker and logger services on that machine.

```shell
# stop commands:
bin/stop-all.sh                                        # stop all services
sh bin/dolphinscheduler-daemon.sh stop master-server   # stop master service
sh bin/dolphinscheduler-daemon.sh stop worker-server   # stop worker service
sh bin/dolphinscheduler-daemon.sh stop logger-server   # stop logger service
sh bin/dolphinscheduler-daemon.sh stop api-server      # stop api service
sh bin/dolphinscheduler-daemon.sh stop alert-server    # stop alert service

# start commands:
bin/start-all.sh                                       # start all services
sh bin/dolphinscheduler-daemon.sh start master-server  # start master service
sh bin/dolphinscheduler-daemon.sh start worker-server  # start worker service
sh bin/dolphinscheduler-daemon.sh start logger-server  # start logger service
sh bin/dolphinscheduler-daemon.sh start api-server     # start api service
sh bin/dolphinscheduler-daemon.sh start alert-server   # start alert service
```

  Attention: When using stop-all.sh or start-all.sh, if the machine executing the command does not have passwordless SSH configured to all machines, it will prompt for passwords.
  • After the script completes, use the jps command to check whether the services on each node have been shut down successfully (jps ships with the Java JDK)

```
MasterServer         ----- master service
WorkerServer         ----- worker service
LoggerServer         ----- logger service
ApiApplicationServer ----- api service
AlertServer          ----- alert service
```

If the corresponding master or worker service no longer appears, the master/worker service has been shut down successfully.
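
A quick way to confirm this on the scaled-down node (a sketch; adjust the grep pattern to whichever services you removed):

```shell
# no output means the worker and logger services are no longer running
jps | grep -E 'WorkerServer|LoggerServer'
```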

2.2 Modify the configuration file

  • Modify the configuration file conf/config/install_config.conf on all nodes, synchronizing the following configuration.

    • To scale down master nodes, modify the ips and masters parameters.
    • To scale down worker nodes, modify the ips and workers parameters.

```shell
# which machines DS services are deployed on; use "localhost" for this machine only
ips="ds1,ds2,ds3,ds4"
# ssh port, default: 22
sshPort="22"
# which machines the master service is deployed on
masters="existing master01,existing master02,ds1,ds2"
# which machines the worker service is deployed on, and which worker group each worker belongs to; "default" in the example below is the group name
workers="existing worker01:default,existing worker02:default,ds3:default,ds4:default"
```