Managing replication nodes

The ALTER CLUSTER <cluster_name> UPDATE nodes statement updates the node list on each node of the cluster so that it includes every active node in the cluster. See Joining a cluster for more information on node lists.

SQL:

  ALTER CLUSTER posts UPDATE nodes

JSON:

  POST /cli -d "
  ALTER CLUSTER posts UPDATE nodes
  "

PHP:

  $params = [
      'cluster' => 'posts',
      'body' => [
          'operation' => 'update',
      ]
  ];
  $response = $client->cluster()->alter($params);

Python:

  utilsApi.sql('ALTER CLUSTER posts UPDATE nodes')

javascript:

  res = await utilsApi.sql('ALTER CLUSTER posts UPDATE nodes');

Java:

  utilsApi.sql("ALTER CLUSTER posts UPDATE nodes");

Response

Python:

  {u'error': u'', u'total': 0, u'warning': u''}

JSON:

  {"total":0,"error":"","warning":""}

For example, when the cluster was initially created, the list of nodes used for rejoining the cluster was 10.10.0.1:9312,10.10.1.1:9312. Since then, other nodes have joined the cluster, and now the active nodes are 10.10.0.1:9312,10.10.1.1:9312,10.15.0.1:9312,10.15.0.3:9312.

However, the list of nodes used for rejoining the cluster is still the same. Running ALTER CLUSTER ... UPDATE nodes copies the list of active nodes to the list of nodes used for rejoining on restart. After this, the list of nodes used on restart includes all the active nodes in the cluster.

Both lists of nodes can be viewed using the Cluster status statement (cluster_post_nodes_set and cluster_post_nodes_view), as in the sketch below.
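
For instance, here is a minimal sketch of comparing the two lists over the MySQL protocol. It assumes the node listens on the default SQL port 9306 and that the cluster is named post (the counter prefix follows the cluster name):

  # Compare the saved rejoin list with the nodes currently visible to this node
  mysql -h 127.0.0.1 -P 9306 -e "SHOW STATUS LIKE 'cluster_post_nodes_%'"
  # cluster_post_nodes_set  - nodes saved for rejoining on restart
  # cluster_post_nodes_view - nodes this node currently sees

If nodes_view lists more nodes than nodes_set, running ALTER CLUSTER post UPDATE nodes brings the saved list up to date.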

Removing a node from a cluster

To remove a node from a replication cluster, you need to:

  1. stop the node
  2. remove the information about the cluster from <data_dir>/manticore.json (/var/lib/manticore/manticore.json in most cases) on the node you've stopped
  3. run ALTER CLUSTER cluster_name UPDATE nodes on any other node

After this, the other nodes will forget about the detached node and the detached node will forget about the cluster. It won't affect the tables either in the cluster or on the detached node.
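
Below is a minimal sketch of the whole procedure as shell commands. It assumes a systemd installation, the default data directory, and a cluster named posts; adjust paths and names to your setup:

  # 1. On the node being removed: stop the daemon
  systemctl stop manticore

  # 2. Still on that node: remove the cluster's entry from manticore.json
  #    (open the file in an editor and delete the cluster definition)
  $EDITOR /var/lib/manticore/manticore.json

  # 3. On any other node: refresh the node lists
  mysql -h 127.0.0.1 -P 9306 -e "ALTER CLUSTER posts UPDATE nodes"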

Replication cluster status

Node status outputs, among other information, cluster status variables.

The output format is cluster_name_variable_name variable_value. Most of the variables are described in the Galera Documentation: Status Variables. Additionally, the following are displayed:

  • cluster_name - name of the cluster
  • node_state - current state of the node: closed, destroyed, joining, donor, synced
  • indexes_count - number of tables managed by the cluster
  • indexes - list of table names managed by the cluster
  • nodes_set - list of nodes in the cluster as defined with the CREATE, JOIN or ALTER UPDATE commands
  • nodes_view - actual list of nodes in the cluster which this node sees

SQL:

  SHOW STATUS

JSON:

  POST /cli -d "
  SHOW STATUS
  "

PHP:

  $params = [
      'body' => []
  ];
  $response = $client->nodes()->status($params);

Python:

  utilsApi.sql('SHOW STATUS')

javascript:

  res = await utilsApi.sql('SHOW STATUS');

Java:

  utilsApi.sql("SHOW STATUS");

Response

SQL:

  +----------------------------+-------------------------------------------------------------------------------------+
  | Counter                    | Value                                                                               |
  +----------------------------+-------------------------------------------------------------------------------------+
  | cluster_name               | post                                                                                |
  | cluster_post_state_uuid    | fba97c45-36df-11e9-a84e-eb09d14b8ea7                                                |
  | cluster_post_conf_id       | 1                                                                                   |
  | cluster_post_status        | primary                                                                             |
  | cluster_post_size          | 5                                                                                   |
  | cluster_post_local_index   | 0                                                                                   |
  | cluster_post_node_state    | synced                                                                              |
  | cluster_post_indexes_count | 2                                                                                   |
  | cluster_post_indexes       | pq1,pq_posts                                                                        |
  | cluster_post_nodes_set     | 10.10.0.1:9312                                                                      |
  | cluster_post_nodes_view    | 10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication |
  +----------------------------+-------------------------------------------------------------------------------------+

JSON:

  {"columns":[{"Counter":{"type":"string"}},{"Value":{"type":"string"}}],
  "data":[
  {"Counter":"cluster_name", "Value":"post"},
  {"Counter":"cluster_post_state_uuid", "Value":"fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
  {"Counter":"cluster_post_conf_id", "Value":"1"},
  {"Counter":"cluster_post_status", "Value":"primary"},
  {"Counter":"cluster_post_size", "Value":"5"},
  {"Counter":"cluster_post_local_index", "Value":"0"},
  {"Counter":"cluster_post_node_state", "Value":"synced"},
  {"Counter":"cluster_post_indexes_count", "Value":"2"},
  {"Counter":"cluster_post_indexes", "Value":"pq1,pq_posts"},
  {"Counter":"cluster_post_nodes_set", "Value":"10.10.0.1:9312"},
  {"Counter":"cluster_post_nodes_view", "Value":"10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"}
  ],
  "total":0,
  "error":"",
  "warning":""
  }

PHP:

  (
  "cluster_name" => "post",
  "cluster_post_state_uuid" => "fba97c45-36df-11e9-a84e-eb09d14b8ea7",
  "cluster_post_conf_id" => 1,
  "cluster_post_status" => "primary",
  "cluster_post_size" => 5,
  "cluster_post_local_index" => 0,
  "cluster_post_node_state" => "synced",
  "cluster_post_indexes_count" => 2,
  "cluster_post_indexes" => "pq1,pq_posts",
  "cluster_post_nodes_set" => "10.10.0.1:9312",
  "cluster_post_nodes_view" => "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"
  )

Python:

  {u'columns': [{u'Key': {u'type': u'string'}},
  {u'Value': {u'type': u'string'}}],
  u'data': [
  {u'Key': u'cluster_name', u'Value': u'post'},
  {u'Key': u'cluster_post_state_uuid', u'Value': u'fba97c45-36df-11e9-a84e-eb09d14b8ea7'},
  {u'Key': u'cluster_post_conf_id', u'Value': u'1'},
  {u'Key': u'cluster_post_status', u'Value': u'primary'},
  {u'Key': u'cluster_post_size', u'Value': u'5'},
  {u'Key': u'cluster_post_local_index', u'Value': u'0'},
  {u'Key': u'cluster_post_node_state', u'Value': u'synced'},
  {u'Key': u'cluster_post_indexes_count', u'Value': u'2'},
  {u'Key': u'cluster_post_indexes', u'Value': u'pq1,pq_posts'},
  {u'Key': u'cluster_post_nodes_set', u'Value': u'10.10.0.1:9312'},
  {u'Key': u'cluster_post_nodes_view', u'Value': u'10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication'}],
  u'error': u'',
  u'total': 0,
  u'warning': u''}

javascript:

  {"columns": [{"Key": {"type": "string"}},
  {"Value": {"type": "string"}}],
  "data": [
  {"Key": "cluster_name", "Value": "post"},
  {"Key": "cluster_post_state_uuid", "Value": "fba97c45-36df-11e9-a84e-eb09d14b8ea7"},
  {"Key": "cluster_post_conf_id", "Value": "1"},
  {"Key": "cluster_post_status", "Value": "primary"},
  {"Key": "cluster_post_size", "Value": "5"},
  {"Key": "cluster_post_local_index", "Value": "0"},
  {"Key": "cluster_post_node_state", "Value": "synced"},
  {"Key": "cluster_post_indexes_count", "Value": "2"},
  {"Key": "cluster_post_indexes", "Value": "pq1,pq_posts"},
  {"Key": "cluster_post_nodes_set", "Value": "10.10.0.1:9312"},
  {"Key": "cluster_post_nodes_view", "Value": "10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication"}],
  "error": "",
  "total": 0,
  "warning": ""}

Java:

  {columns=[{ Key : { type=string }},
  { Value : { type=string }}],
  data : [
  { Key=cluster_name, Value=post},
  { Key=cluster_post_state_uuid, Value=fba97c45-36df-11e9-a84e-eb09d14b8ea7},
  { Key=cluster_post_conf_id, Value=1},
  { Key=cluster_post_status, Value=primary},
  { Key=cluster_post_size, Value=5},
  { Key=cluster_post_local_index, Value=0},
  { Key=cluster_post_node_state, Value=synced},
  { Key=cluster_post_indexes_count, Value=2},
  { Key=cluster_post_indexes, Value=pq1,pq_posts},
  { Key=cluster_post_nodes_set, Value=10.10.0.1:9312},
  { Key=cluster_post_nodes_view, Value=10.10.0.1:9312,10.10.0.1:9320:replication,10.10.1.1:9312,10.10.1.1:9320:replication}],
  error= ,
  total=0,
  warning= }

Restarting a cluster

A multi-master replication cluster requires a single node to be started first as a reference point; only after that can the other nodes join it and form the cluster. This is called cluster bootstrapping: it introduces a primary component that the other nodes use as a reference point to sync their data with. The restart of a single node, or its reconnection after a shutdown, can be done as usual.

After a shutdown of the whole cluster, the server that was stopped last should be started first with the --new-cluster command line option (or by running manticore_new_cluster, which starts it via systemd). To make sure the server is able to start as the reference point, the grastate.dat file located at the cluster path should contain the value 1 for the safe_to_bootstrap option; i.e., both conditions, --new-cluster and safe_to_bootstrap=1, must be satisfied. An attempt to start any other node without these conditions met will trigger an error. To override this protection and forcibly start the cluster from another server, the --new-cluster-force command line option may be used. Alternatively, you can run manticore_new_cluster --force to do the same via systemd.
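
For instance, here is a minimal sketch of restarting a fully stopped cluster. It assumes a systemd installation and that the cluster path is /var/lib/manticore; adjust to your setup:

  # On the node that was stopped last: confirm it may bootstrap the cluster
  grep safe_to_bootstrap /var/lib/manticore/grastate.dat
  # expected: safe_to_bootstrap: 1

  # Start it as the reference point (starts searchd with --new-cluster via systemd)
  manticore_new_cluster

  # Then start the remaining nodes as usual
  systemctl start manticore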

In case of a hard crash or an unclean shutdown of all the servers in the cluster, you need to identify the most advanced node, i.e. the one with the largest seqno in the grastate.dat file located at the cluster path, and start that server with the --new-cluster-force command line option.
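
As a sketch of that recovery, again assuming the cluster path is /var/lib/manticore, you could compare the sequence numbers across the nodes and force-bootstrap the most advanced one:

  # Run on every node and compare the values
  grep seqno /var/lib/manticore/grastate.dat
  # e.g. seqno: 123

  # On the node with the largest seqno only:
  manticore_new_cluster --force
  # or, without systemd:
  searchd --new-cluster-force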