How Traffic Is Switched

How Vitess signals traffic cutover for Reshard and MoveTables

Related Persistent Vitess Objects

As the objects or keys noted below are stored in the topo server and cached locally, the processes involved will refresh their topo data throughout the cutover process. For example, each tablet on the source and target shards that are involved in a VReplication workflow will refresh their topo data multiple times as the state of things transition during the cutover. If we are not able to confirm that all tablets involved in a VReplication worfklow are able to refresh their topo data then the cutover command — e.g. vtctlclient SwitchTraffic — will cancel the operation and return an error indicating which tablet(s) are unhealthy (including for --dry_run executions).

VSchema

A VSchema allows you to describe how data is organized within keyspaces and shards.

Shard Info

The global topo contains one Shard key per keyspace which then contains one key per shard that has been created within the keyspace. For each shard that is healthy there is an attribute is_primary_serving which is set to true. The other shards which have been created but are still not healthy and serving within the keyspace will not have this attribute set. Here is an example shard info record from an unsharded keyspace named commerce (without the --cell flag being passed the global topo base path is used):

  1. $ vtctlclient --server=localhost:15999 TopoCat -- --decode_proto '/keyspaces/commerce/shards/0/Shard'
  2. primary_alias:{cell:"zone1" uid:100} primary_term_start_time:{seconds:1650341417 nanoseconds:374817485} is_primary_serving:true

SrvKeyspace

Each cell has a SrvKeyspace key in the local topo (per cell info) for each keyspace. For each tablet type (e.g. PRIMARY or REPLICA) there is one partitions object. The partitions objects contain all of the current shards in the keyspace. For sharded keyspaces, the tablets which are healthy and serving have a key range specified for that shard.

Also the primary can contain a query_service_disabled attribute which is set to true during resharding cutovers. This tells the primary in that shard to reject any queries made to it, as a signal to vtgate in case vtgate routes queries to this primary during the cutover or before it discovers the new serving graph. Here is an example using the same unsharded commerce keyspace and here we specify the --cell flag so that cell’s topo base path — stored in its CellInfo record in the global topo — is used:

  1. $ vtctlclient --server=localhost:15999 TopoCat -- --decode_proto '/cells/zone1/CellInfo'
  2. server_address:"localhost:2379" root:"/vitess/zone1"
  3. $ vtctlclient --server=localhost:15999 TopoCat -- --decode_proto --cell=zone1 '/keyspaces/commerce/SrvKeyspace'
  4. partitions:{served_type:PRIMARY shard_references:{name:"0"}} partitions:{served_type:REPLICA shard_references:{name:"0"}} partitions:{served_type:RDONLY shard_references:{name:"0"}}

Routing Rules

Routing Rules are stored in the RoutingRules key within the global topo. Routing Rules contain a list of table-specific routes. You can route a table for all or specific tablet types to another table in the same or different keyspace. Here is an example using the same commerce keyspace where we have an active MoveTables workflow to move tables to the customer keyspace but we have not switched any traffic yet:

  1. $ vtctlclient --server=localhost:15999 TopoCat -- --decode_proto '/RoutingRules'
  2. rules:{from_table:"corder@rdonly" to_tables:"commerce.corder"} rules:{from_table:"customer.corder" to_tables:"commerce.corder"} rules:{from_table:"customer.corder@replica" to_tables:"commerce.corder"} rules:{from_table:"customer@rdonly" to_tables:"commerce.customer"} rules:{from_table:"customer.customer@rdonly" to_tables:"commerce.customer"} rules:{from_table:"customer.corder@rdonly" to_tables:"commerce.corder"} rules:{from_table:"customer@replica" to_tables:"commerce.customer"} rules:{from_table:"corder@replica" to_tables:"commerce.corder"} rules:{from_table:"commerce.corder@replica" to_tables:"commerce.corder"} rules:{from_table:"commerce.corder@rdonly" to_tables:"commerce.corder"} rules:{from_table:"commerce.customer@rdonly" to_tables:"commerce.customer"} rules:{from_table:"corder" to_tables:"commerce.corder"} rules:{from_table:"customer.customer@replica" to_tables:"commerce.customer"} rules:{from_table:"commerce.customer@replica" to_tables:"commerce.customer"} rules:{from_table:"customer" to_tables:"commerce.customer"} rules:{from_table:"customer.customer" to_tables:"commerce.customer"}

In practice you would instead typically view the routing rules via the dedicated GetRoutingRules vtctl client command which will return the rules for all keyspaces in the topo.

How VTGate Routes a Query

This section walks through a simplified version of the logic used to determine which keyspace and table vtgate will route a simple query of the form select * from t1 where id = 1 (a read query) or insert into t1 (id, val) values (1,'abc') (a write query).

  1. Check to see if t1 has an appropriate routing rule defined. If so, use the specified target table as an alias for t1.
  2. Locate the keyspace for t1 using the VSchema.
  3. For a non-sharded keyspace locate the appropriate tablet (PRIMARY, by default) from the (cached) SrvKeyspace local (per cell) topo record.
  4. For a sharded keyspace the SrvKeyspace record is used to find the currently active shards. This is done by checking the list of partitions for the specific tablet type selected for the query (PRIMARY, by default, for both reads and writes) and selecting the ones whose query_service_disabled field is not set and whose is_primary_serving value is true.
  5. Finally, based on the VIndex defined for the table from the cached VSchema (stored in the global topo), the shard for the relevant row is computed based on the keyrange to which the id is mapped to using the declared VIndex function/type.

Changes Made to the Topo When Traffic Is Switched

This document outlines the steps involved in the cutover process of MoveTables and Reshard workflows when traffic is switched from the source tables/shards to the target tables/shards. We use the resharding flow provided in the local examples and show the relevant snippets from the topo for each step in the workflow.

Items in italics are topo keys and the following snippet the value of the key

What Happens When a Reshard Is Cutover

For brevity we only show the records for the 80- shard. There will be similar records for the -80 shard.

Before Resharding, After -80/80- Shards Are Created

Only shard 0 has is_primary_serving set to true. The SrvKeyspace record only has references to 0 for both PRIMARY and REPLICA tablet types.

global/keyspaces/customer/shards/0/Shard

  1. primary_alias:{cell:"zone1" uid:200}
  2. primary_term_start_time:{seconds:1627465761 nanoseconds:600070156}
  3. is_primary_serving:true

global/keyspaces/customer/shards/80-/Shard

  1. primary_alias:{cell:"zone1" uid:400}
  2. primary_term_start_time:{seconds:1627465833 nanoseconds:536524508}
  3. key_range:{start:"\x80"}

zone1/keyspaces/customer/SrvKeyspace

  1. partitions:{served_type:PRIMARY shard_references:{name:"0"}}
  2. partitions:{served_type:REPLICA shard_references:{name:"0"}}

After Replica Traffic Is Switched Using SwitchTraffic (Previously Known as SwitchReads)

Shard 0 still has the is_primary_serving set as true. The primary partition is still the same.

The replica partition has the following changes:

  • Two more shard_references for -80 and 80-
  • Key ranges are specified for these shards
  • The key range for shard 0 has been removed
  • query_service_disabled is set to true for shard 0

global/keyspaces/customer/shards/0/Shard

  1. primary_alias:{cell:"zone1" uid:200}
  2. primary_term_start_time:{seconds:1627466189 nanoseconds:587021377}
  3. is_primary_serving:true

global/keyspaces/customer/shards/80-/Shard

  1. primary_alias:{cell:"zone1" uid:400}
  2. primary_term_start_time:{seconds:1627466263 nanoseconds:16201490}
  3. key_range:{start:"\x80"}``

zone1/keyspaces/customer/SrvKeyspace

  1. partitions:{served_type:PRIMARY shard_references:{name:"0"}}
  2. partitions:{served_type:REPLICA
  3. shard_references:{name:"-80" key_range:{end:"\x80"}}
  4. shard_references:{name:"80-" key_range:{start:"\x80"}}
  5. shard_tablet_controls:{name:"0" query_service_disabled:true}
  6. shard_tablet_controls:{name:"-80" key_range:{end:"\x80"}}
  7. shard_tablet_controls:{name:"80-" key_range:{start:"\x80"}}
  8. }

After Primary Traffic Is Switched Using SwitchTraffic (Previously Known as SwitchWrites)

  • is_primary_serving is removed from shard 0
  • is_primary_serving is added to shards -80 and 80-
  • In the primary partition the shards -80 and 80- are added with their associated key ranges
  • In the primary partition the key range for shard 0 is removed
  • The replica partition remains the same as in the previous step

global/keyspaces/customer/shards/0/Shard

  1. primary_alias:{cell:"zone1" uid:200}
  2. primary_term_start_time:{seconds:1627466636 nanoseconds:405646818}

global/keyspaces/customer/shards/80-/Shard

  1. primary_alias:{cell:"zone1" uid:400}
  2. primary_term_start_time:{seconds:1627466710 nanoseconds:579634511}
  3. key_range:{start:"\x80"}
  4. is_primary_serving:true

zone1/keyspaces/customer/SrvKeyspace

  1. partitions:{served_type:PRIMARY
  2. shard_references:{name:"-80" key_range:{end:"\x80"}}
  3. shard_references:{name:"80-"
  4. key_range:{start:"\x80"}}
  5. } {name:"0"}
  6. partitions:{served_type:REPLICA
  7. shard_references:{name:"-80" key_range:{end:"\x80"}}
  8. shard_references:{name:"80-" key_range:{start:"\x80"}}}
  9. shard_tablet_controls:{name:"0" query_service_disabled:true}
  10. shard_tablet_controls:{name:"-80" key_range:{end:"\x80"}}
  11. shard_tablet_controls:{name:"80-" key_range:{start:"\x80"}}
  12. }

What Happens When a MoveTables Workflow Is Cutover

Before MoveTables Is Initiated

The VSchema for the source keyspace contains the table name, so vtgate routes queries to that keyspace.

During MoveTables

Both the source and target now contain the tables and both VSchemas refer to them. However we have routing rules that map the tables for each tablet type from the target keyspace to the source keyspace.

global/RoutingRules

  1. rules:{from_table:"customer" to_tables:"commerce.customer"}
  2. rules:{from_table:"customer.customer" to_tables:"commerce.customer"}
  3. rules:{from_table:"customer@replica" to_tables:"commerce.customer"}
  4. rules:{from_table:"customer.customer@replica" to_tables:"commerce.customer"}

On Switching Replica Traffic to Target

The routing rules for replica targeted reads are updated to map the table on the source to the target.

global/RoutingRules

  1. rules:{from_table:"customer.customer" to_tables:"commerce.customer"} rules:{from_table:"commerce.customer@replica" to_tables:"customer.customer"}
  2. rules:{from_table:"customer" to_tables:"commerce.customer"}
  3. rules:{from_table:"customer@replica" to_tables:"customer.customer"}

On Switching Primary Traffic

The routing rules for default read-write traffic are updated to map the table on the source to the target. In addition the tables are added to the “denylist” on the source keyspace which vttablet uses to reject queries for these tables on the old/inactive shards.

global/RoutingRules

  1. rules:{from_table:"commerce.customer@replica" to_tables:"customer.customer"}
  2. rules:{from_table:"customer.customer@replica" to_tables:"customer.customer"}
  3. rules:{from_table:"commerce.customer" to_tables:"customer.customer"}
  4. rules:{from_table:"customer" to_tables:"customer.customer"}

global/keyspaces/commerce/shards/0/Shard

  1. primary_alias:{cell:"zone1" uid:100}
  2. primary_term_start_time:{seconds:1627477340 nanoseconds:740407602}
  3. tablet_controls:{tablet_type:PRIMARY denylisted_tables:"customer"}
  4. is_primary_serving:true

Miscellaneous Notes

  • In VReplication workflows, cutovers are performed manually by the user executing the SwitchTraffic and ReverseTraffic actions e.g. for a MoveTables or Reshard vtctl client command.
  • When traffic for REPLICA and RDONLY tablets is switched not all read traffic is switched: primary/default reads will still be served from the source shards, until PRIMARY tablet traffic is also switched.