Replication cluster status

Node status outputs, among other information, cluster status variables.

The output format is cluster_name_variable_name variable_value. Most of the variables are described in the Galera Documentation Status Variables. Additionally, the following variables are displayed:

  • cluster_name - name of the cluster
  • node_state - current state of the node: closed, destroyed, joining, donor, synced
  • indexes_count - number of tables managed by the cluster
  • indexes - list of table names managed by the cluster
  • nodes_set - list of nodes in the cluster defined with cluster CREATE, JOIN or ALTER UPDATE commands
  • nodes_view - actual list of nodes in cluster which this node sees

SQL:

  SHOW STATUS

JSON:

  POST /cli -d "
  SHOW STATUS
  "

PHP:

  $params = [
      'body' => []
  ];
  $response = $client->nodes()->status($params);

Python:

  utilsApi.sql('SHOW STATUS')

javascript:

  res = await utilsApi.sql('SHOW STATUS');

Java:

  utilsApi.sql("SHOW STATUS");

Response

SQL:

  +----------------------------+-------------------------------------------------------------------------------------+
  | Counter                    | Value                                                                               |
  +----------------------------+-------------------------------------------------------------------------------------+
  | cluster_name               | post                                                                                |
  | cluster_post_state_uuid    | fba97c45-36df-11e9-a84e-eb09d14b8ea7                                                |
  | cluster_post_conf_id       | 1                                                                                   |
  | cluster_post_status        | primary                                                                             |
  | cluster_post_size          | 5                                                                                   |
  | cluster_post_local_index   | 0                                                                                   |
  | cluster_post_node_state    | synced                                                                              |
  | cluster_post_indexes_count | 2                                                                                   |
  | cluster_post_indexes       | pq1,pq_posts                                                                        |
  | cluster_post_nodes_set     | 10.10.0.1:9312                                                                      |
  | cluster_post_nodes_view    | 10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication |
  +----------------------------+-------------------------------------------------------------------------------------+

JSON:

  {"columns":[{"Counter":{"type":"string"}},{"Value":{"type":"string"}}],
  "data":[
  {"Counter":"cluster_name", "Value":"post"},
  {"Counter":"cluster_post_state_uuid", "Value":"fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
  {"Counter":"cluster_post_conf_id", "Value":"1"},
  {"Counter":"cluster_post_status", "Value":"primary"},
  {"Counter":"cluster_post_size", "Value":"5"},
  {"Counter":"cluster_post_local_index", "Value":"0"},
  {"Counter":"cluster_post_node_state", "Value":"synced"},
  {"Counter":"cluster_post_indexes_count", "Value":"2"},
  {"Counter":"cluster_post_indexes", "Value":"pq1,pq_posts"},
  {"Counter":"cluster_post_nodes_set", "Value":"10.10.0.1:9312"},
  {"Counter":"cluster_post_nodes_view", "Value":"10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"}
  ],
  "total":0,
  "error":"",
  "warning":""
  }

PHP:

  (
  "cluster_name" => "post",
  "cluster_post_state_uuid" => "fba97c45-36df-11e9-a84e-eb09d14b8ea7",
  "cluster_post_conf_id" => 1,
  "cluster_post_status" => "primary",
  "cluster_post_size" => 5,
  "cluster_post_local_index" => 0,
  "cluster_post_node_state" => "synced",
  "cluster_post_indexes_count" => 2,
  "cluster_post_indexes" => "pq1,pq_posts",
  "cluster_post_nodes_set" => "10.10.0.1:9312",
  "cluster_post_nodes_view" => "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"
  )

Python:

  {u'columns': [{u'Key': {u'type': u'string'}},
  {u'Value': {u'type': u'string'}}],
  u'data': [
  {u'Key': u'cluster_name', u'Value': u'post'},
  {u'Key': u'cluster_post_state_uuid', u'Value': u'fba97c45-36df-11e9-a84e-eb09d14b8ea7'},
  {u'Key': u'cluster_post_conf_id', u'Value': u'1'},
  {u'Key': u'cluster_post_status', u'Value': u'primary'},
  {u'Key': u'cluster_post_size', u'Value': u'5'},
  {u'Key': u'cluster_post_local_index', u'Value': u'0'},
  {u'Key': u'cluster_post_node_state', u'Value': u'synced'},
  {u'Key': u'cluster_post_indexes_count', u'Value': u'2'},
  {u'Key': u'cluster_post_indexes', u'Value': u'pq1,pq_posts'},
  {u'Key': u'cluster_post_nodes_set', u'Value': u'10.10.0.1:9312'},
  {u'Key': u'cluster_post_nodes_view', u'Value': u'10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication'}],
  u'error': u'',
  u'total': 0,
  u'warning': u''}

javascript:

  {"columns": [{"Key": {"type": "string"}},
  {"Value": {"type": "string"}}],
  "data": [
  {"Key": "cluster_name", "Value": "post"},
  {"Key": "cluster_post_state_uuid", "Value": "fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
  {"Key": "cluster_post_conf_id", "Value": "1"},
  {"Key": "cluster_post_status", "Value": "primary"},
  {"Key": "cluster_post_size", "Value": "5"},
  {"Key": "cluster_post_local_index", "Value": "0"},
  {"Key": "cluster_post_node_state", "Value": "synced"},
  {"Key": "cluster_post_indexes_count", "Value": "2"},
  {"Key": "cluster_post_indexes", "Value": "pq1,pq_posts"},
  {"Key": "cluster_post_nodes_set", "Value": "10.10.0.1:9312"},
  {"Key": "cluster_post_nodes_view", "Value": "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"}],
  "error": "",
  "total": 0,
  "warning": ""}

Java:

  {columns=[{ Key : { type=string }},
  { Value : { type=string }}],
  data : [
  { Key=cluster_name, Value=post},
  { Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
  { Key=cluster_post_conf_id, Value=1},
  { Key=cluster_post_status, Value=primary},
  { Key=cluster_post_size, Value=5},
  { Key=cluster_post_local_index, Value=0},
  { Key=cluster_post_node_state, Value=synced},
  { Key=cluster_post_indexes_count, Value=2},
  { Key=cluster_post_indexes, Value=pq1,pq_posts},
  { Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
  { Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication}],
  error= ,
  total=0,
  warning= }
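
To display only the replication counters, the output of SHOW STATUS can be narrowed down with a LIKE pattern. A minimal SQL sketch; the pattern assumes the cluster is named post, as in the example above:

  SHOW STATUS LIKE 'cluster_post_%'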

Restarting a cluster

A multi-master replication cluster requires a single node to be started first as a reference point, before all the other nodes join it and form the cluster. This is called cluster bootstrapping: it introduces a primary component which the other nodes use as a reference point to sync their data from. The restart of a single node, or the reconnection of a node after a shutdown, can be done as usual.

After a shutdown of the whole cluster, the server that was stopped last should be started first with the --new-cluster command line option (or by running manticore_new_cluster, which starts it via systemd). To make sure that the server is able to start as a reference point, the grastate.dat file located at the cluster path should contain the value 1 for the safe_to_bootstrap option; i.e., both conditions, --new-cluster and safe_to_bootstrap=1, must be satisfied. An attempt to start any other node without these options set will trigger an error. To override this protection and forcibly start the cluster from another server, the --new-cluster-force command line option may be used. Alternatively, you can run manticore_new_cluster --force to do the same via systemd.
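
For reference, grastate.dat is a small text file maintained by the Galera library in the cluster path. On a node that was shut down cleanly and is safe to bootstrap from, it looks roughly like the following sketch (the version, uuid and seqno values are purely illustrative):

  # GALERA saved state
  version: 2.1
  uuid:    fba97c45-36df-11e9-a84e-eb09d14b8ea7
  seqno:   1234
  safe_to_bootstrap: 1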

In case of a hard crash or an unclean shutdown of all the servers in the cluster, you need to identify the most advanced node (the one with the largest seqno in the grastate.dat file located at the cluster path) and start that server with the --new-cluster-force command line option.
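
For example, after an unclean shutdown of a three-node cluster the grastate.dat files might contain the following (illustrative values only):

  node 1:  seqno: 2201   safe_to_bootstrap: 0
  node 2:  seqno: 2207   safe_to_bootstrap: 0
  node 3:  seqno: 2213   safe_to_bootstrap: 0

Here node 3 has the largest seqno, so it is the one to start with --new-cluster-force.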

Cluster recovery

There may be cases when the Manticore search daemon gets stopped with no node left in the cluster to serve requests. In these cases someone needs to recover the cluster or a part of it. Due to the multi-master nature of the Galera library used for replication, a Manticore replication cluster constitutes one logical entity: it takes care of each of its nodes and of each node's data consistency, and maintains the status of the cluster as a whole. This enables safe writes on multiple nodes at the same time and preserves cluster integrity, unlike traditional asynchronous replication.

However, this comes with its own challenges. Let's take a cluster of nodes A, B and C and look at the scenarios in which some or all of the nodes go out of service, and what has to be done to bring them back.

Case 1

Node A is stopped as usual. The other nodes receive a "normal shutdown" message from node A. The cluster size is reduced and a quorum re-calculation is performed.

After node A is started as usual, it joins the cluster. Node A will not serve any write transactions until the join is complete and it is fully synchronized with the cluster. If the writeset cache on donor node B or C (which can be controlled with the Galera cluster option gcache.size) still contains all the transactions missed by node A, node A will receive a fast incremental state transfer (IST), that is, a transfer of only the missed transactions. Otherwise, a snapshot state transfer (SST) will start, that is, a transfer of table files.
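
Whether node A has caught up can be checked with the status counters described above, for example (a minimal SQL check, assuming the cluster is named post):

  SHOW STATUS LIKE 'cluster_post_node_state'

Once the value reported on node A changes from joining to synced, the node is fully synchronized and serves write transactions again.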

Case 2

Nodes A and B are stopped as usual. This is the same situation as in the previous case, but the cluster's size is reduced to 1 and node C by itself forms the primary component, which allows it to handle write transactions.

Nodes A and B may be started as usual and will join the cluster after starting. Node C becomes the "donor" and provides the state transfer to nodes A and B.

Case 3

All nodes are stopped as usual and the cluster is off.

The problem now is how to initialize the cluster. It's important that on a clean shutdown of searchd the nodes write the number of the last executed transaction into the grastate.dat file in the cluster directory, along with the safe_to_bootstrap flag. The node that was stopped last will have safe_to_bootstrap: 1 and the most advanced seqno.

It is important that this node starts first to form the cluster. To bootstrap the cluster, the server on this node should be started with the --new-cluster flag. On Linux you can also run manticore_new_cluster, which will start Manticore in --new-cluster mode via systemd.
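
For example (the config file path is illustrative and may differ on your system):

  searchd --config /etc/manticoresearch/manticore.conf --new-cluster

or, via systemd:

  manticore_new_cluster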

If another node starts first and bootstraps the cluster, the most advanced node then joins that cluster, performs a full SST and receives table files in which some transactions are missing compared to the table files it had before. That is why it is important to start first the node which was shut down last: it should have the flag safe_to_bootstrap: 1 in grastate.dat.

Case 4

Node A disappears from the cluster due to a crash or a network failure.

Nodes B and C try to reconnect to the missing node A and, after the attempts fail, remove node A from the cluster. The cluster quorum is still valid, as 2 out of 3 nodes are running, and the cluster keeps working as usual.

After node A is restarted it will join the cluster automatically the same way as in Case 1.

Case 5

Nodes A and B disappear. Node C is not able to form the quorum alone, as 1 node is less than 1.5 (half of 3). So the cluster on node C switches to a non-primary state and node C rejects any write transactions with an error message.

Meanwhile, the single node C waits for the other nodes to connect and also tries to connect to them itself. If the network is restored and nodes A and B are running again, the cluster will be re-formed automatically. If nodes A and B are just cut off from node C but can still reach each other, they keep working as usual, because they still form the quorum.

However, if both nodes A and B crashed or were restarted due to a power outage, someone should turn on the primary component on node C with the following statement:

SQL:

  SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1

JSON:

  POST /cli -d "
  SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
  "

But before doing that, you need to make sure that the other nodes are really unreachable; otherwise split-brain happens and separate clusters get formed.
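
The current state can also be double-checked on node C before forcing the bootstrap: in this situation its status counter reports a non-primary value and its nodes_view no longer lists the other nodes. A sketch, assuming the cluster is named posts as in the statement above:

  SHOW STATUS LIKE 'cluster_posts_status'
  SHOW STATUS LIKE 'cluster_posts_nodes_view'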

Case 6

All nodes crashed. In this case the grastate.dat file in the cluster directory is not updated and does not contain a valid sequence number (seqno).

If this happens, someone should find the most advanced node and start the server on it with the --new-cluster-force command line option. All the other nodes will start as usual, as in Case 3. On Linux you can also run manticore_new_cluster --force, which will start Manticore in --new-cluster-force mode via systemd.

Case 7

A split-brain causes the cluster to go into a non-primary state. For example, the cluster may consist of an even number of nodes (four), e.g. two pairs of nodes located in different datacenters, and a network failure interrupts the connection between the datacenters. Split-brain happens because each group of nodes holds exactly half of the quorum. Both groups stop handling write transactions, as the Galera replication model cares about data consistency, and the cluster cannot accept write transactions without a quorum. Meanwhile, the nodes of both groups keep trying to reconnect to the nodes from the other group to restore the cluster.

If someone wants to restore the cluster without waiting for the network to be restored, the same steps as in Case 5 should be done, but only at one group of the nodes.

After that, the group containing the node on which this statement is run can successfully handle write transactions again.

SQL:

  SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1

JSON:

  POST /cli -d "
  SET CLUSTER posts GLOBAL 'pc.bootstrap' = 1
  "

However, note that if the statement is issued at both groups, it will result in two separate clusters, and the subsequent network recovery will not make the groups rejoin.