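Automatic Service Restart

This document describes how to configure automatic restart for a Doris cluster, so that a service that crashes unexpectedly in production is brought back up promptly and does not disrupt normal business operation.

Configure the automatic restart services for FE and BE only after the Doris cluster has been fully deployed.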

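Configuring Doris Services with systemd

For systemd usage and parameter details, see the systemd documentation.

sudo Permission Control

Controlling the Doris services through systemd requires sudo privileges. To keep the granted sudo privileges as narrow as possible, assign systemd control of only the doris-fe and doris-be services to a designated non-root user. Use visudo to configure the systemctl permissions for doris-fe and doris-be.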

  Cmnd_Alias DORISCTL=/usr/bin/systemctl start doris-fe,/usr/bin/systemctl stop doris-fe,/usr/bin/systemctl start doris-be,/usr/bin/systemctl stop doris-be
  ## Allow root to run any commands anywhere
  root ALL=(ALL) ALL
  doris ALL=(ALL) NOPASSWD:DORISCTL
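
Because a sudoers command alias with arguments matches only those exact command lines, the doris user can run start and stop on the two services without a password, but not, for example, `systemctl restart doris-fe`. The sketch below is a hypothetical wrapper (the name `doris_ctl` and the `DRY_RUN` switch are illustrative assumptions, not part of Doris) that limits calls to the four permitted commands and always uses the full /usr/bin/systemctl path from the alias.

```shell
#!/bin/sh
# Hypothetical helper: run only the systemctl commands that the DORISCTL
# alias permits (start/stop on doris-fe and doris-be).
# Set DRY_RUN=1 to print the command instead of executing it.
doris_ctl() {
    action=$1
    service=$2
    case "$action" in
        start|stop) ;;
        *) echo "unsupported action: $action" >&2; return 1 ;;
    esac
    case "$service" in
        doris-fe|doris-be) ;;
        *) echo "unsupported service: $service" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: show what would be executed.
        echo "sudo /usr/bin/systemctl $action $service"
    else
        sudo /usr/bin/systemctl "$action" "$service"
    fi
}
```

For example, `DRY_RUN=1 doris_ctl stop doris-be` prints the command without running it, which is a convenient way to confirm the wrapper before relying on it.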

Configuration Steps

  1. Add the JAVA_HOME variable to both fe.conf and be.conf; otherwise systemctl start will fail to start the services.

    echo "JAVA_HOME=your_java_home" >> /home/doris/fe/conf/fe.conf
    echo "JAVA_HOME=your_java_home" >> /home/doris/be/conf/be.conf
  2. Download the doris-fe.service file: doris-fe.service

  3. The contents of doris-fe.service are as follows:

    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements. See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership. The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License. You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing,
    # software distributed under the License is distributed on an
    # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    # KIND, either express or implied. See the License for the
    # specific language governing permissions and limitations
    # under the License.

    [Unit]
    Description=Doris FE
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=forking
    User=root
    Group=root
    LimitCORE=infinity
    LimitNOFILE=200000
    Restart=on-failure
    RestartSec=30
    StartLimitInterval=120
    StartLimitBurst=3
    KillMode=none
    ExecStart=/home/doris/fe/bin/start_fe.sh --daemon
    ExecStop=/home/doris/fe/bin/stop_fe.sh

    [Install]
    WantedBy=multi-user.target

Notes

  • Set ExecStart and ExecStop according to the actual FE deployment path.
  4. Download the doris-be.service file: doris-be.service

  5. The contents of doris-be.service are as follows:

    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements. See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership. The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License. You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing,
    # software distributed under the License is distributed on an
    # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    # KIND, either express or implied. See the License for the
    # specific language governing permissions and limitations
    # under the License.

    [Unit]
    Description=Doris BE
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=forking
    User=root
    Group=root
    LimitCORE=infinity
    LimitNOFILE=200000
    Restart=on-failure
    RestartSec=30
    StartLimitInterval=120
    StartLimitBurst=3
    KillMode=none
    ExecStart=/home/doris/be/bin/start_be.sh --daemon
    ExecStop=/home/doris/be/bin/stop_be.sh

    [Install]
    WantedBy=multi-user.target

Notes

  • Set ExecStart and ExecStop according to the actual BE deployment path.
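
Since ExecStart and ExecStop must match the real deployment path, it can help to check a unit file before installing it. The following is a minimal sketch, not part of Doris: the function name `check_unit_exec` is hypothetical, and it assumes the script path is the first word after `ExecStart=` or `ExecStop=` and contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: report whether the scripts named by ExecStart/ExecStop
# in a unit file exist and are executable. Returns non-zero if any is missing.
check_unit_exec() {
    unit_file=$1
    rc=0
    # Extract the first word after "ExecStart=" / "ExecStop=" (the script path).
    for script in $(sed -n -E 's/^Exec(Start|Stop)=([^[:space:]]+).*/\2/p' "$unit_file"); do
        if [ -x "$script" ]; then
            echo "ok: $script"
        else
            echo "missing or not executable: $script"
            rc=1
        fi
    done
    return $rc
}

# Usage, with the file names from this document:
# check_unit_exec doris-fe.service
# check_unit_exec doris-be.service
```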
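  6. Service configuration

    Place the doris-fe.service and doris-be.service files in the /usr/lib/systemd/system directory.

  7. Enable start on boot

    After adding or modifying a unit file, reload the systemd configuration:

    systemctl daemon-reload

    Enabling start on boot essentially adds a symlink to the service file under /etc/systemd/system/multi-user.target.wants/.

    systemctl enable doris-fe
    systemctl enable doris-be
  8. Start the services

    systemctl start doris-fe
    systemctl start doris-be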
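
After starting the services, it is worth confirming that systemd actually reports them as active. The sketch below is one possible way to do that: `wait_until` is a hypothetical polling helper, while `systemctl is-active --quiet` and `journalctl -u` are standard systemd commands. The 60-second limit is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: poll a command once per second until it succeeds
# or the limit (in seconds) is reached. Returns 0 on success, 1 on timeout.
wait_until() {
    limit=$1
    shift
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Usage (requires systemd, so shown as comments):
# wait_until 60 systemctl is-active --quiet doris-fe || journalctl -u doris-fe
# wait_until 60 systemctl is-active --quiet doris-be || journalctl -u doris-be
```

If a service does not become active in time, the `journalctl -u` output for that unit is usually the first place to look.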

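Configuring Doris Services with Supervisor

For Supervisor usage and parameter details, see the Supervisor documentation.

Supervisor can be installed directly with yum or manually with pip. Because the manual pip installation is more involved, only the yum-based deployment is shown here; for manual deployment, see the Supervisor installation documentation.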

Configuration Steps

  1. Install supervisor with yum

    yum install epel-release
    yum install -y supervisor
  2. Start the service and check its status

    systemctl enable supervisord   # start on boot
    systemctl start supervisord    # start the supervisord service
    systemctl status supervisord   # check the supervisord service status
    ps -ef | grep supervisord      # check that a supervisord process exists
  3. Configure BE process management

    # Edit the start_be.sh script and remove the trailing & so that the
    # process runs in the foreground, which Supervisor requires
    vim /path/doris/be/bin/start_be.sh
    # Original line:
    nohup $LIMIT ${DORIS_HOME}/lib/palo_be "$@" >> $LOG_DIR/be.out 2>&1 </dev/null &
    # Change it to:
    nohup $LIMIT ${DORIS_HOME}/lib/palo_be "$@" >> $LOG_DIR/be.out 2>&1 </dev/null

    Create the supervisor process management configuration file for BE

    vim /etc/supervisord.d/doris-be.ini

    [program:doris_be]
    process_name=%(program_name)s
    directory=/path/doris/be
    command=sh /path/doris/be/bin/start_be.sh
    autostart=true
    autorestart=true
    user=root
    numprocs=1
    startretries=3
    stopasgroup=true
    killasgroup=true
    startsecs=5
    #redirect_stderr = true
    #stdout_logfile_maxbytes = 20MB
    #stdout_logfile_backups = 10
    #stdout_logfile=/var/log/supervisor-palo_be.log
  4. Configure FE process management

    # Edit the start_fe.sh script and remove the trailing &
    vim /path/doris/fe/bin/start_fe.sh
    # Original line:
    nohup $LIMIT $JAVA $final_java_opt org.apache.doris.PaloFe ${HELPER} "$@" >> $LOG_DIR/fe.out 2>&1 </dev/null &
    # Change it to:
    nohup $LIMIT $JAVA $final_java_opt org.apache.doris.PaloFe ${HELPER} "$@" >> $LOG_DIR/fe.out 2>&1 </dev/null

    Create the supervisor process management configuration file for FE

    vim /etc/supervisord.d/doris-fe.ini

    [program:PaloFe]
    environment = JAVA_HOME="/path/jdk8"
    process_name=PaloFe
    directory=/path/doris/fe
    command=sh /path/doris/fe/bin/start_fe.sh
    autostart=true
    autorestart=true
    user=root
    numprocs=1
    startretries=3
    stopasgroup=true
    killasgroup=true
    startsecs=10
    #redirect_stderr=true
    #stdout_logfile_maxbytes=20MB
    #stdout_logfile_backups=10
    #stdout_logfile=/var/log/supervisor-PaloFe.log
  5. Configure Broker process management

    # Edit the start_broker.sh script and remove the trailing &
    vim /path/apache_hdfs_broker/bin/start_broker.sh
    # Original line:
    nohup $LIMIT $JAVA $JAVA_OPTS org.apache.doris.broker.hdfs.BrokerBootstrap "$@" >> $BROKER_LOG_DIR/apache_hdfs_broker.out 2>&1 </dev/null &
    # Change it to:
    nohup $LIMIT $JAVA $JAVA_OPTS org.apache.doris.broker.hdfs.BrokerBootstrap "$@" >> $BROKER_LOG_DIR/apache_hdfs_broker.out 2>&1 </dev/null

    Create the supervisor process management configuration file for Broker

    vim /etc/supervisord.d/doris-broker.ini

    [program:BrokerBootstrap]
    environment = JAVA_HOME="/usr/local/java"
    process_name=%(program_name)s
    directory=/path/apache_hdfs_broker
    command=sh /path/apache_hdfs_broker/bin/start_broker.sh
    autostart=true
    autorestart=true
    user=root
    numprocs=1
    startretries=3
    stopasgroup=true
    killasgroup=true
    startsecs=5
    #redirect_stderr=true
    #stdout_logfile_maxbytes=20MB
    #stdout_logfile_backups=10
    #stdout_logfile=/var/log/supervisor-BrokerBootstrap.log
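  6. First make sure the Doris services are stopped, then let supervisor bring Doris up, and finally confirm that the processes started normally

    supervisorctl reload          # restart supervisord and reload all Supervisor configuration files
    supervisorctl status          # check supervisor status and verify that the Doris processes started normally
    # Other commands:
    supervisorctl start all       # supervisorctl start starts processes
    supervisorctl stop doris_be   # supervisorctl stop stops a process; use the name from its [program:...] section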
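
Steps 3 to 5 each remove the trailing & from a start script by hand. As a rough sketch of how that edit could be scripted, the hypothetical helper below (`strip_trailing_amp` is not part of Doris) removes a trailing & only from lines that begin with nohup and keeps a backup of the original script. It assumes the launch command sits on a single line, as in the excerpts above; check the result before using the modified script.

```shell
#!/bin/sh
# Hypothetical helper: remove a trailing "&" from nohup launch lines so the
# process stays in the foreground. The original is kept as <script>.bak.
strip_trailing_amp() {
    script=$1
    # Only lines starting with "nohup" are touched; "2>&1" is not affected
    # because the pattern requires the "&" to be the last character.
    sed -i.bak -E '/^[[:space:]]*nohup[[:space:]]/ s/[[:space:]]*&[[:space:]]*$//' "$script"
}

# Usage, with the placeholder paths from this document:
# strip_trailing_amp /path/doris/be/bin/start_be.sh
# strip_trailing_amp /path/doris/fe/bin/start_fe.sh
# strip_trailing_amp /path/apache_hdfs_broker/bin/start_broker.sh
```

Comparing the script with its .bak copy, for example with diff, shows exactly which line changed.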

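Notes:

  • If supervisor installed with yum fails to start with the error: pkg_resources.DistributionNotFound: The 'supervisor==3.4.0' distribution was not found

    This is a Python version incompatibility. The supervisor package installed directly by yum supports only Python 2, so change the first line of /usr/bin/supervisord and /usr/bin/supervisorctl from #!/usr/bin/python to #!/usr/bin/python2. This requires Python 2 to be installed.
  • If supervisor is configured to restart the Doris processes automatically and a BE node then goes down abnormally, the error stack trace that would normally be written to be.out is captured by supervisor instead. Look for it in the supervisor log for further analysis.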