Check Method

Use the gs_check tool provided by openGauss to check the health status of openGauss.

Precautions

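  • Checks on new scale-out nodes can be run only as the root user. All other scenarios must be run as the omm user.
  • You must specify either -i or -e. -i checks the specified individual items, and -e checks the multiple items configured for the specified scenario.
  • If the items given to -i, or the item list of the scenario given to -e, contain no root-class check items, you are not prompted for a user with root privileges and its password.
  • Use --skip-root-items to skip the root-class items in a check, so that no root user name and password need to be entered.
  • To check consistency between new scale-out nodes and existing nodes, run gs_check on an existing node with the --hosts parameter. The hosts file must contain the IP addresses of the new nodes.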
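
The root-prompt rules in the notes above can be written as a small decision function. This is a minimal sketch and not part of gs_check: the function name needs_root_prompt is hypothetical, and the set of root-class items has to be supplied by the caller (for example, taken from the list gs_check prints when it asks for root privileges).

```python
def needs_root_prompt(selected_items, root_items, skip_root_items=False):
    """Return True if the run is expected to prompt for a root user.

    selected_items: names passed to -i, or the items of the -e scenario.
    root_items: names known to require root privileges (caller-supplied).
    skip_root_items: True when --skip-root-items is given.
    """
    if skip_root_items:
        # Root-class items are skipped, so no credentials are needed.
        return False
    # Item names are case-sensitive, so compare them exactly.
    return any(item in root_items for item in selected_items)


# Example root-class items; a real list comes from the gs_check prompt.
root_items = {"CheckBlockdev", "CheckFirewall", "CheckSshdService"}
print(needs_root_prompt(["CheckCPU"], root_items))
print(needs_root_prompt(["CheckCPU", "CheckFirewall"], root_items))
print(needs_root_prompt(["CheckFirewall"], root_items, skip_root_items=True))
```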

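Procedure

Method 1:

  1. Log in to the primary database node as the OS user omm.
  2. Run the following command to check the openGauss database status:

    gs_check -i CheckClusterState

    -i specifies the check items and is case-sensitive. Format: -i CheckClusterState, -i CheckCPU, or -i CheckClusterState,CheckCPU.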

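    The value can be the name of any supported check item. For the full list, see "Server Tools > gs_check > openGauss status check table" in the openGauss Tool Reference. You can also write new check items as needed.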
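
As a sketch of the -i format described above, the helper below joins item names into the comma-separated value and builds the argument list. The function name build_item_check_command is hypothetical; only the gs_check -i form comes from the text, and the sketch does not verify that an item actually exists.

```python
def build_item_check_command(items):
    """Build the argument list for `gs_check -i` from check item names."""
    names = [name.strip() for name in items]
    if not names or any(not name for name in names):
        raise ValueError("at least one non-empty check item name is required")
    for name in names:
        # The value is a comma-separated list, so a name cannot contain
        # a comma or a space itself.
        if "," in name or " " in name:
            raise ValueError(f"invalid check item name: {name!r}")
    # Names are case-sensitive, so they are passed through unchanged.
    return ["gs_check", "-i", ",".join(names)]


print(build_item_check_command(["CheckClusterState", "CheckCPU"]))
```

The returned list can be handed to subprocess.run from a script that runs as omm on the primary node.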

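Method 2:

  1. Log in to the primary database node as the OS user omm.
  2. Run the following command to perform a health check on the openGauss database:

    gs_check -e inspect

    -e specifies the scenario name and is case-sensitive. Format: -e inspect or -e upgrade.

    The value can be the name of any supported inspection scenario. The default list includes inspect (routine inspection), upgrade (pre-upgrade inspection), expand (pre-scale-out inspection), binary_upgrade (inspection before in-place upgrade), and health (health check inspection). You can also write your own scenarios as needed.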

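Method 3:

  1. Log in to the primary database node as the OS user omm.
  2. Copy the inspection tool gs_check and the related inspection directory to all new scale-out nodes.
  3. Write the IP addresses of the new scale-out nodes to the file ipListFile, one address per line.
  4. Run the following command to perform the pre-scale-out check on the new nodes:

    gs_check -e expand_new_node --hosts ipListFile

    -e must be expand_new_node, which is the pre-scale-out check for new nodes.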
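
Step 3 above asks for a file with one IP address per line. A minimal sketch of producing such a file follows; the function name write_ip_list_file and the sample addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, and the address validation is an extra safeguard that the text does not require.

```python
import ipaddress
import tempfile
from pathlib import Path


def write_ip_list_file(path, addresses):
    """Write one IP address per line, rejecting malformed addresses."""
    lines = []
    for addr in addresses:
        # ip_address raises ValueError for anything that is not a valid IP.
        lines.append(str(ipaddress.ip_address(addr.strip())))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "ipListFile"
    write_ip_list_file(target, ["192.0.2.11", "192.0.2.12"])
    print(target.read_text(encoding="utf-8"), end="")
```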

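openGauss inspection checks whether the overall openGauss status is normal while openGauss is running, and verifies before major operations (such as upgrade and scale-out) that openGauss meets the environment and status conditions those operations require. For details about the inspection items and scenarios, see "Server Tools > gs_check > openGauss status check table" in the openGauss Tool Reference.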

Examples

Result of a single-item check:

    perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU
    Parsing the check items config file successfully
    Distribute the context file to remote hosts successfully
    Start to health check for the cluster. Total Items:1 Nodes:3
    Checking... [=========================] 1/1
    Start to analysis the check result
    CheckCPU....................................OK
    The item run on 3 nodes. success: 3
    Analysis the check result successfully
    Success. All check items run completed. Total:1 Success:1 Failed:0
    For more information please refer to /opt/huawei/wisequery/script/gspylib/inspection/output/CheckReport_201902193704661604.tar.gz
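
The final summary line of a run carries the overall result and the per-status counts. The sketch below extracts them with a regular expression; it assumes the line format shown in the sample outputs in this section, and the function name parse_summary is hypothetical.

```python
import re

# Matches the closing summary line as it appears in the sample outputs.
SUMMARY_RE = re.compile(
    r"(Success|Failed)\. All check items run completed\.\s*(.*)$"
)


def parse_summary(line):
    """Return the overall result and the counters, or None if no match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    # The tail is a list of Name:Number pairs such as "Total:1 Success:1".
    counts = {
        key: int(value)
        for key, value in re.findall(r"(\w+):(\d+)", match.group(2))
    }
    return {"result": match.group(1), "counts": counts}


line = "Success. All check items run completed. Total:1 Success:1 Failed:0"
print(parse_summary(line))
```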

Result of a local run:

    perfadm@lfgp000700749:/opt/huawei/perfadm/tool/script> gs_check -i CheckCPU -L
    2017-12-29 17:09:29 [NAM] CheckCPU
    2017-12-29 17:09:29 [STD] Check the host CPU usage. The check item passes if idle is greater than 30% and iowait is less than 30%; otherwise it fails.
    2017-12-29 17:09:29 [RST] OK
    2017-12-29 17:09:29 [RAW]
    Linux 4.4.21-69-default (lfgp000700749) 12/29/17 _x86_64_
    17:09:24 CPU %user %nice %system %iowait %steal %idle
    17:09:25 all 0.25 0.00 0.25 0.00 0.00 99.50
    17:09:26 all 0.25 0.00 0.13 0.00 0.00 99.62
    17:09:27 all 0.25 0.00 0.25 0.13 0.00 99.37
    17:09:28 all 0.38 0.00 0.25 0.00 0.13 99.25
    17:09:29 all 1.00 0.00 0.88 0.00 0.00 98.12
    Average: all 0.43 0.00 0.35 0.03 0.03 99.17
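
The pass criterion stated in the [STD] line (idle above 30% and iowait below 30%) can be re-checked against the raw CPU table. The sketch below is an illustration, not the tool's own logic: it assumes the criterion is applied to the Average row, and the function name cpu_check_passes is hypothetical.

```python
def cpu_check_passes(sar_text, min_idle=30.0, max_iowait=30.0):
    """Apply the idle/iowait thresholds to the Average row of the table."""
    header = None
    average = None
    for line in sar_text.splitlines():
        tokens = line.split()
        if "%idle" in tokens and "%iowait" in tokens:
            header = tokens
        elif "Average:" in tokens:
            average = tokens
    if header is None or average is None:
        raise ValueError("no CPU header or Average line found")
    # Index columns from the right so any leading prefix is ignored.
    idle = float(average[header.index("%idle") - len(header)])
    iowait = float(average[header.index("%iowait") - len(header)])
    return idle > min_idle and iowait < max_iowait


sample = """\
17:09:24 CPU %user %nice %system %iowait %steal %idle
17:09:25 all 0.25 0.00 0.25 0.00 0.00 99.50
Average: all 0.43 0.00 0.35 0.03 0.03 99.17
"""
print(cpu_check_passes(sample))
```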

Result of a scenario check:

    [perfadm@SIA1000131072 Check]$ gs_check -e inspect
    Parsing the check items config file successfully
    The below items require root privileges to execute:[CheckBlockdev CheckIOrequestqueue CheckIOConfigure CheckCheckMultiQueue CheckFirewall CheckSshdService CheckSshdConfig CheckCrondService CheckNoCheckSum CheckSctpSeProcMemory CheckBootItems CheckFilehandle CheckNICModel CheckDropCache]
    Please enter root privileges user[root]:root
    Please enter password for user[root]:
    Please enter password for user[root] on the node[10.244.57.240]:
    Check root password connection successfully
    Distribute the context file to remote hosts successfully
    Start to health check for the cluster. Total Items:59 Nodes:2
    Checking... [ ] 21/59
    Checking... [=========================] 59/59
    Start to analysis the check result
    CheckClusterState...........................OK
    The item run on 2 nodes. success: 2
    CheckDBParams...............................OK
    The item run on 1 nodes. success: 1
    CheckDebugSwitch............................OK
    The item run on 2 nodes. success: 2
    CheckDirPermissions.........................OK
    The item run on 2 nodes. success: 2
    CheckReadonlyMode...........................OK
    The item run on 1 nodes. success: 1
    CheckEnvProfile.............................OK
    The item run on 2 nodes. success: 2 (consistent)
    The success on all nodes value:
    GAUSSHOME /usr1/gaussdb/app
    LD_LIBRARY_PATH /usr1/gaussdb/app/lib
    PATH /usr1/gaussdb/app/bin
    CheckBlockdev...............................OK
    The item run on 2 nodes. success: 2
    CheckCurConnCount...........................OK
    The item run on 1 nodes. success: 1
    CheckCursorNum..............................OK
    The item run on 1 nodes. success: 1
    CheckPgxcgroup..............................OK
    The item run on 1 nodes. success: 1
    CheckDiskFormat.............................OK
    The item run on 2 nodes. success: 2
    CheckSpaceUsage.............................OK
    The item run on 2 nodes. success: 2
    CheckInodeUsage.............................OK
    The item run on 2 nodes. success: 2
    CheckSwapMemory.............................OK
    The item run on 2 nodes. success: 2
    CheckLogicalBlock...........................OK
    The item run on 2 nodes. success: 2
    CheckIOrequestqueue.....................WARNING
    The item run on 2 nodes. warning: 2
    The warning[host240,host157] value:
    On device (vdb) 'IO Request' RealValue '256' ExpectedValue '32768'
    On device (vda) 'IO Request' RealValue '256' ExpectedValue '32768'
    CheckMaxAsyIOrequests.......................OK
    The item run on 2 nodes. success: 2
    CheckIOConfigure............................OK
    The item run on 2 nodes. success: 2
    CheckMTU....................................OK
    The item run on 2 nodes. success: 2 (consistent)
    The success on all nodes value:
    1500
    CheckPing...................................OK
    The item run on 2 nodes. success: 2
    CheckRXTX...................................NG
    The item run on 2 nodes. ng: 2
    The ng[host240,host157] value:
    NetWork[eth0]
    RX: 256
    TX: 256
    CheckNetWorkDrop............................OK
    The item run on 2 nodes. success: 2
    CheckMultiQueue.............................OK
    The item run on 2 nodes. success: 2
    CheckEncoding...............................OK
    The item run on 2 nodes. success: 2 (consistent)
    The success on all nodes value:
    LANG=en_US.UTF-8
    CheckFirewall...............................OK
    The item run on 2 nodes. success: 2
    CheckKernelVer..............................OK
    The item run on 2 nodes. success: 2 (consistent)
    The success on all nodes value:
    3.10.0-957.el7.x86_64
    CheckMaxHandle..............................OK
    The item run on 2 nodes. success: 2
    CheckNTPD...................................OK
    host240: NTPD service is running, 2020-06-02 17:00:28
    host157: NTPD service is running, 2020-06-02 17:00:06
    CheckOSVer..................................OK
    host240: The current OS is centos 7.6 64bit.
    host157: The current OS is centos 7.6 64bit.
    CheckSysParams..........................WARNING
    The item run on 2 nodes. warning: 2
    The warning[host240,host157] value:
    Warning reason: variable 'net.ipv4.tcp_retries1' RealValue '3' ExpectedValue '5'.
    Warning reason: variable 'net.ipv4.tcp_syn_retries' RealValue '6' ExpectedValue '5'.
    Warning reason: variable 'net.sctp.path_max_retrans' RealValue '5' ExpectedValue '10'.
    Warning reason: variable 'net.sctp.max_init_retransmits' RealValue '8' ExpectedValue '10'.
    CheckTHP....................................OK
    The item run on 2 nodes. success: 2
    CheckTimeZone...............................OK
    The item run on 2 nodes. success: 2 (consistent)
    The success on all nodes value:
    +0800
    CheckCPU....................................OK
    The item run on 2 nodes. success: 2
    CheckSshdService............................OK
    The item run on 2 nodes. success: 2
    CheckSshdConfig.........................WARNING
    The item run on 2 nodes. warning: 2
    The warning[host240,host157] value:
    Warning reason: UseDNS parameter is not set; expected: no
    CheckCrondService...........................OK
    The item run on 2 nodes. success: 2
    CheckStack..................................OK
    The item run on 2 nodes. success: 2 (consistent)
    The success on all nodes value:
    8192
    CheckNoCheckSum.............................OK
    The item run on 2 nodes. success: 2 (consistent)
    The success on all nodes value:
    Nochecksum value is N,Check items pass.
    CheckSysPortRange...........................OK
    The item run on 2 nodes. success: 2
    CheckMemInfo................................OK
    The item run on 2 nodes. success: 2 (consistent)
    The success on all nodes value:
    totalMem: 31.260929107666016G
    CheckHyperThread............................OK
    The item run on 2 nodes. success: 2
    CheckTableSpace.............................OK
    The item run on 1 nodes. success: 1
    CheckSctpService............................OK
    The item run on 2 nodes. success: 2
    CheckSysadminUser...........................OK
    The item run on 1 nodes. success: 1
    CheckGUCConsistent..........................OK
    All DN instance guc value is consistent.
    CheckMaxProcMemory..........................OK
    The item run on 1 nodes. success: 1
    CheckBootItems..............................OK
    The item run on 2 nodes. success: 2
    CheckHashIndex..............................OK
    The item run on 1 nodes. success: 1
    CheckPgxcRedistb............................OK
    The item run on 1 nodes. success: 1
    CheckNodeGroupName..........................OK
    The item run on 1 nodes. success: 1
    CheckTDDate.................................OK
    The item run on 1 nodes. success: 1
    CheckDilateSysTab...........................OK
    The item run on 1 nodes. success: 1
    CheckKeyProAdj..............................OK
    The item run on 2 nodes. success: 2
    CheckProStartTime.......................WARNING
    host157:
    STARTED COMMAND
    Tue Jun 2 16:57:18 2020 /usr1/dmuser/dmserver/metricdb1/server/bin/gaussdb --single_node -D /usr1/dmuser/dmb1/data -p 22204
    Mon Jun 1 16:15:15 2020 /usr1/gaussdb/app/bin/gaussdb -D /usr1/gaussdb/data/dn1 -M standby
    CheckFilehandle.............................OK
    The item run on 2 nodes. success: 2
    CheckRouting................................OK
    The item run on 2 nodes. success: 2
    CheckNICModel...............................OK
    The item run on 2 nodes. success: 2 (consistent)
    The success on all nodes value:
    version: 1.0.0
    model: Red Hat, Inc. Virtio network device
    CheckDropCache..........................WARNING
    The item run on 2 nodes. warning: 2
    The warning[host240,host157] value:
    No DropCache process is running
    CheckMpprcFile..............................NG
    The item run on 2 nodes. ng: 2
    The ng[host240,host157] value:
    There is no mpprc file
    Analysis the check result successfully
    Failed. All check items run completed. Total:59 Success:52 Warning:5 NG:2
    For more information please refer to /usr1/gaussdb/tool/script/gspylib/inspection/output/CheckReport_inspect611.tar.gz
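
A long scenario report is easier to act on once the items that did not return OK are pulled out. The sketch below collects the per-item status lines with a regular expression. It assumes the "Name....STATUS" line shape and the three statuses (OK, WARNING, NG) seen in the sample above; the function names are hypothetical.

```python
import re
from collections import Counter

# An item line is the item name, a run of dots, then the status.
ITEM_RE = re.compile(r"(Check\w+)\.{2,}(OK|WARNING|NG)\s*$")


def collect_item_status(report_text):
    """Map each check item name to the status printed in the report."""
    status = {}
    for line in report_text.splitlines():
        match = ITEM_RE.search(line)
        if match:
            status[match.group(1)] = match.group(2)
    return status


def items_needing_attention(status):
    """Return the names of all items whose status is not OK, sorted."""
    return sorted(name for name, result in status.items() if result != "OK")


sample = """\
Checking... [=========================] 59/59
CheckClusterState...........................OK
The item run on 2 nodes. success: 2
CheckIOrequestqueue.....................WARNING
The item run on 2 nodes. warning: 2
CheckRXTX...................................NG
The item run on 2 nodes. ng: 2
"""
status = collect_item_status(sample)
print(Counter(status.values()))
print(items_needing_attention(status))
```

Comparing the tallies with the counters on the final summary line is a quick way to confirm that no item line was missed.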