Vitess Sequences

This document describes the Vitess Sequences feature, and how to use it.

Motivation

MySQL provides the auto-increment feature to assign monotonically incrementingIDs to a column in a table. However, when a table is sharded across multipleinstances, maintaining the same feature is a lot more tricky.

Vitess Sequences fill that gap:

  • Inspired from the usual SQL sequences (implemented in different ways byOracle, SQL Server and PostgreSQL).

  • Very high throughput for ID creation, using a configurable in-memory block allocation.

  • Transparent use, similar to MySQL auto-increment: when the field is omitted inan insert statement, the next sequence value is used.

When not to Use Auto-Increment

Before we go any further, an auto-increment column has limitations anddrawbacks. let’s explore this topic a bit here.

Security Considerations

Using auto-increment can leak confidential information about a service. Let’stake the example of a web site that store user information, and assign user IDsto its users as they sign in. The user ID is then passed in a cookie for allsubsequent requests.

The client then knows their own user ID. It is now possible to:

  • Try other user IDs and expose potential system vulnerabilities.

  • Get an approximate number of users of the system (using the user ID).

  • Get an approximate number of sign-ins during a week (creating two accounts aweek apart, and diffing the two IDs).

Auto-incrementing IDs should be reserved for either internal applications, orexposed to the clients only when safe.

Alternatives

Alternative to auto-incrementing IDs are:

  • use a 64 bits random generator number. Try to insert a new row with thatID. If taken (because the statement returns an integrity error), try anotherID.

  • use a UUID scheme, and generate truely unique IDs.

Now that this is out of the way, let’s get to MySQL auto-increment.

MySQL Auto-increment Feature

Let’s start by looking at the MySQL auto-increment feature:

  • A row that has no value for the auto-increment value will be given the next ID.

  • The current value is stored in the table metadata.

  • Values may be ‘burned’ (by rolled back transactions).

  • Inserting a row with a given value that is higher than the current value willset the current value.

  • The value used by the master in a statement is sent in the replication stream,so replicas will have the same value when re-playing the stream.

  • There is no strict guarantee about ordering: two concurrent statements mayhave their commit time in one order, but their auto-incrementing ID in theopposite order (as the value for the ID is reserved when the statement isissued, not when the transaction is committed).

  • MySQL has multiple options for auto-increment, like only using every N number(for multi-master configurations), or performance related features (lockingthat table’s current ID may have concurrency implications).

  • When inserting a row in a table with an auto-increment column, if the valuefor the auto-increment row is not set, the value for the column is returned tothe client alongside the statement result.

Vitess Sequences

An early design was to use a single unsharded database and a table with anauto-increment value to generate new values. However, this has seriouslimitations, in particular throughtput, and storing one entry for each value inthat table, for no reason.

So we decided instead to base sequences on a MySQL table, and use a single valuein that table to describe which values the sequence should have next. Toincrease performance, we also support block allocation of IDs: each update tothe MySQL table is only done every N IDs (N being configurable), and in betweenonly memory structures in vttablet are updated, making the QPS only limited byRPC latency.

The sequence table then is an unsharded single row table that Vitess can use to generate monotonically increasing ids. The VSchema allows you to associate a column of a table with the sequence table. Once they are associated, an insert on that table will transparently fetch an id from the sequence table, fill in the value, and route the row to the appropriate shard.

Since sequences are unsharded tables, they will be stored in the database (in our tutorial example, this is the commerce database).

The final goal is to have Sequences supported with SQL statements, like:

  1. /* DDL support */
  2. CREATE SEQUENCE my_sequence;
  3. SELECT NEXT VALUE FROM my_sequence;
  4. ALTER SEQUENCE my_sequence ...;
  5. DROP SEQUENCE my_sequence;
  6. SHOW CREATE SEQUENCE my_sequence;

In the current implementation, we support the query access to Sequences, but notthe administration commands yet.

Creating a Sequence

Note: The names in this section are extracted from the examples/demo sampleapplication.

To create a Sequence, a backing table must first be created and initialized with a single row. The columns for that table have to be respected.

This is an example:

  1. create table user_seq(id int, next_id bigint, cache bigint, primary key(id)) comment 'vitess_sequence';
  2. insert into user_seq(id, next_id, cache) values(0, 1, 100);

Then, the Sequence has to be defined in the VSchema for that keyspace:

  1. {
  2. "sharded": false,
  3. "tables": {
  4. "user_seq": {
  5. "type": "sequence"
  6. },
  7. ...
  8. }
  9. }

And the table it is going to be using it can also reference the Sequence in its VSchema:

  1. {
  2. ...
  3. "tables" : {
  4. "user": {
  5. "column_vindexes": [
  6. ...
  7. ],
  8. "auto_increment": {
  9. "column": "user_id",
  10. "sequence": "user_seq"
  11. }
  12. },

After this done (and the Schema has been reloaded on master tablet, and theVSchema has been pushed), the sequence can be used.

Accessing a Sequence

If a Sequence is used to fill in a column for a table, nothing further needs tobe done. Just sending no value for the column will make vtgate insert the nextSequence value in its place.

It is also possible to access the Sequence directly with the following SQL constructs:

  1. /* Returns the next value for the sequence */
  2. select next value from my_sequence;
  3. /* Returns the next value for the sequence, and also reserve 4 values after that. */
  4. select next 5 values from my_sequence;

TO-DO List

DDL Support

We want to add DDL support for sequences, as previously mentioned:

  1. CREATE SEQUENCE my_sequence;
  2. ALTER SEQUENCE my_sequence ...;
  3. DROP SEQUENCE my_sequence;
  4. SHOW CREATE SEQUENCE my_sequence;

But for now, the Sequence backing table has to be created and managed using theusual schema management features, with the right column definitions and table comment.