Starting and Stopping a MatrixOne Distributed Cluster

This document introduces how to start and stop a MatrixOne distributed cluster.

The operations described in this document are based on the environment from MatrixOne Distributed Cluster Deployment.

Cluster Shutdown

To shut down a MatrixOne cluster, stop all read and write operations first, then shut down the servers directly. The shutdown sequence is as follows: first shut down the node0 node, then the master0 node, and finally the Kuboard-Spray node.
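
As a concrete illustration, here is a minimal sketch of this sequence, assuming you can SSH to each machine as a privileged user and that the hostnames (node0, master0, and a hypothetical kuboard-spray host) match your deployment; adjust the names to your environment.

    # Shut down the worker node first, then the master, then the Kuboard-Spray host.
    ssh root@node0 'shutdown -h now'
    ssh root@master0 'shutdown -h now'
    ssh root@kuboard-spray 'shutdown -h now'   # hypothetical hostname for the Kuboard-Spray machine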

Cluster Restart

To restart a MatrixOne cluster, the following hardware startup sequence is recommended: start the Kuboard-Spray node first, then the master0 node, and finally the node0 node.

After the hardware startup is complete, k8s will recover automatically, and the MatrixOne and MinIO-related services will also be restored without manual intervention. Note, however, that Docker on the Kuboard-Spray node will not restart automatically, so the Kuboard-Spray service needs to be started manually.
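
A minimal sketch of bringing Kuboard-Spray back up, assuming Docker is managed by systemd and that Kuboard-Spray was deployed as a Docker container named kuboard-spray (as in the deployment document); adjust the container name if yours differs.

    # Start the Docker daemon on the Kuboard-Spray node, then restart the container.
    sudo systemctl start docker
    sudo docker start kuboard-spray
    sudo docker ps | grep kuboard-spray   # verify the container is running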

Check K8s Status

On the master0 node, where k8s operations are performed, you can check the status of the k8s cluster nodes.

Under normal circumstances, the status of all nodes should be Ready. If any node's status is abnormal, further investigation is required to determine the cause.

    kubectl get node
    # If the status is not "Ready", further investigation into the node's situation is necessary.
    # kubectl describe node ${NODE_NAME}

(Figure 1: example command output)
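
Right after a restart it can take a while for all nodes to become Ready. As an optional check, the sketch below blocks until every node reports Ready; the timeout value is illustrative.

    # Wait until all nodes report the Ready condition (up to 10 minutes).
    kubectl wait --for=condition=Ready node --all --timeout=600s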

Check MinIO status

On the master0 node, where k8s operations are performed, you can check the status of MinIO.

After the hardware starts, MinIO resumes automatically. You can use the following command to check its status.

    NS="mostorage"
    kubectl get pod -n ${NS}

(Figure 2: example command output)
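
To confirm that every MinIO Pod is back to Ready after a restart, a sketch such as the following can be used; the mostorage namespace comes from the deployment document and the timeout is illustrative.

    NS="mostorage"
    # Wait until all pods in the MinIO namespace report Ready (up to 5 minutes).
    kubectl wait --for=condition=Ready pod --all -n ${NS} --timeout=300s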

Check the status of the MatrixOne cluster and components

Check MatrixOneCluster status

First, check whether the MatrixOne cluster is in a normal state. The MatrixOne cluster corresponds to the custom resource type MatrixOneCluster. You can use the following command to check the status of MatrixOneCluster:

    MO_NAME="mo"
    NS="mo-hn"
    kubectl get matrixonecluster -n ${NS} ${MO_NAME}

Under normal circumstances, the status should be Ready. If the status is NotReady, further troubleshooting is required.

(Figure 3: example command output)
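
If the cluster is still starting up, you can watch the status change instead of polling by hand; the sketch below simply adds -w (watch) to the same get command.

    MO_NAME="mo"
    NS="mo-hn"
    # Stream status updates until the STATUS column shows Ready, then press Ctrl-C.
    kubectl get matrixonecluster -n ${NS} ${MO_NAME} -w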

View MatrixOne cluster status details

If the MatrixOne cluster status is not normal, you can use the following command to view the details:

    kubectl describe matrixonecluster -n ${NS} ${MO_NAME}

(Figures 4 and 5: example command output)
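
When the describe output alone is not conclusive, the recent events in the namespace often point at the failing component; a sketch:

    NS="mo-hn"
    # List the most recent events in the MatrixOne namespace, newest last.
    kubectl get events -n ${NS} --sort-by='.lastTimestamp'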

Check the status of DNSet/CNSet/LogSet

The main components of the current MatrixOne cluster are DN, CN, and Log Service, with the corresponding custom resource types DNSet, CNSet, and LogSet. These objects are generated by the MatrixOneCluster controller.

You can use the following command to check the status of each component; take DN as an example:

    SET_TYPE="dnset"
    NS="mo-hn"
    kubectl get ${SET_TYPE} -n ${NS}

(Figure 6: example command output)
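
To check all three component types in one pass, a small loop over the set types works as well; a sketch using the same namespace as above, assuming the cnset and logset resource names follow the same pattern as dnset.

    NS="mo-hn"
    # Check DNSet, CNSet, and LogSet in turn.
    for SET_TYPE in dnset cnset logset; do
        kubectl get ${SET_TYPE} -n ${NS}
    done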

Check Pod status

You can also directly check the native k8s objects generated for the MO cluster to confirm the cluster's health. Under normal circumstances, checking the Pod status is sufficient:

    NS="mo-hn"
    kubectl get pod -n ${NS}

(Figure 7: example command output)
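
To surface only problem Pods in a large namespace, you can filter out the Running ones with a field selector; a sketch:

    NS="mo-hn"
    # Show only pods whose phase is not Running (e.g. Pending or Failed).
    kubectl get pod -n ${NS} --field-selector=status.phase!=Running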

Generally speaking, Running is the normal state. There are a few exceptions, however: the status may be Running while the MO cluster is still abnormal, for example when you cannot connect to the MO cluster through the MySQL client. In that case, check whether the Pod's logs contain any error output:

    NS="mo-hn"
    POD_NAME="[The name of the pod returned above]" # For example, mo-tp-cn-3
    kubectl logs ${POD_NAME} -n ${NS}
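
If the log is long, narrowing it to recent error-level lines is often enough for a first pass; a sketch in which the pod name and grep pattern are illustrative:

    NS="mo-hn"
    POD_NAME="mo-tp-cn-3"   # example pod name from above
    # Show the last 200 lines and keep only lines that mention errors.
    kubectl logs ${POD_NAME} -n ${NS} --tail=200 | grep -i error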

If the status is not Running, for example Pending, you can find the cause of the exception by looking at the events in the Pod's status. For example, if the cluster's resources cannot satisfy the requests of mo-tp-cn-3, the Pod cannot be scheduled and stays Pending; in that case, the problem can be solved by expanding the node's resources.

    kubectl describe pod ${POD_NAME} -n ${NS}

(Figure 8: example command output)
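
The same events can also be listed directly, which is convenient when you only care about scheduling messages for a single Pod; a sketch in which the pod name is illustrative:

    NS="mo-hn"
    POD_NAME="mo-tp-cn-3"   # example pod name from above
    # List events that reference this pod, newest last.
    kubectl get events -n ${NS} --field-selector involvedObject.name=${POD_NAME} --sort-by='.lastTimestamp'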