Configuring Components

Managed MySQL

The following describes the requirements for Vitess when fully managing MySQL with mysqlctl (see VTTablet Modes).

When using Unmanaged or Remote MySQL instead, the requirement is only that the server speak the MySQL protocol.

Version and Flavor

mysqlctl supports MySQL/Percona Server 5.6 to 8.0, and MariaDB 10.0 to 10.3. MariaDB 10.4 is currently known to have installation issues (#5362).

Base Configuration

Starting with Vitess 4.0, mysqlctl will auto-detect the version and flavor of MySQL you are using, and automatically include a base configuration file from config/mycnf/*.

Auto-detection works by searching for mysqld in the $PATH, as well as in the directory specified by the $VT_MYSQL_ROOT environment variable. If auto-detection fails, mysqlctl falls back to version detection based on the $MYSQL_FLAVOR environment variable; successful auto-detection always takes precedence over $MYSQL_FLAVOR.
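
For illustration only, here is a minimal Python sketch of that detection order. This is not the mysqlctl implementation; the bin/ layout under $VT_MYSQL_ROOT and the helper name are assumptions:

    import os
    import shutil

    def detect_mysql_flavor() -> str:
        """Mirror the detection order described above (illustrative only)."""
        # 1. Prefer auto-detection: look for mysqld on $PATH or under $VT_MYSQL_ROOT.
        mysqld = shutil.which("mysqld")
        vt_mysql_root = os.environ.get("VT_MYSQL_ROOT")
        if not mysqld and vt_mysql_root:
            candidate = os.path.join(vt_mysql_root, "bin", "mysqld")
            if os.access(candidate, os.X_OK):
                mysqld = candidate
        if mysqld:
            # mysqlctl derives version and flavor from the mysqld binary itself;
            # the details are omitted here.
            return f"auto-detected from {mysqld}"
        # 2. Only if auto-detection fails, fall back to $MYSQL_FLAVOR.
        flavor = os.environ.get("MYSQL_FLAVOR")
        if flavor:
            return flavor
        raise RuntimeError("could not determine MySQL flavor")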

Specifying Additional Configuration

The automatically included base configuration makes only the settings changes required for Vitess to operate correctly. It is recommended to configure InnoDB settings such as innodb_buffer_pool_size and innodb_log_file_size according to your available system resources.

mysqlctl will not read configuration files from common locations such as /etc/my.cnf or /etc/mysql/my.cnf. To include a custom my.cnf file as part of the initialization of tablets, set the $EXTRA_MY_CNF environment variable to a colon-separated list of files. Each file must be specified as an absolute path.
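
For example, a small Python sketch (the .cnf paths are placeholders) that validates and exports the colon-separated list before tablet initialization:

    import os

    extra_cnf_files = [
        "/etc/vitess-extra/innodb.cnf",   # placeholder path
        "/etc/vitess-extra/charset.cnf",  # placeholder path
    ]

    # Every entry must be an absolute path to an existing file.
    for path in extra_cnf_files:
        if not os.path.isabs(path):
            raise ValueError(f"{path} is not an absolute path")
        if not os.path.isfile(path):
            raise FileNotFoundError(path)

    # mysqlctl picks up the colon-separated list from $EXTRA_MY_CNF.
    os.environ["EXTRA_MY_CNF"] = ":".join(extra_cnf_files)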

In Kubernetes, you can use a ConfigMap to overwrite the entire $VTROOT/config/mycnf directory with your custom versions, rather than baking them into a custom container image.

Unsupported Configuration Changes

When specifying additional configuration changes to Vitess, please keep in mind that changing the following settings is unsupported:

  • auto_commit: MySQL autocommit needs to be turned on. VTTablet uses connection pools to MySQL. If autocommit is turned off, MySQL will start an implicit transaction (with a point in time snapshot) for each connection and will work very hard at keeping the current view unchanged, which would be counter-productive.
  • log-bin: Several Vitess features rely on the binary log being enabled.
  • binlog-format: Vitess only supports row-based replication. Do not change this setting from the included configuration files.
  • binlog-row-image: Vitess only supports the default value (FULL).
  • log-slave-updates: Vitess requires this setting enabled, as it is in the included configuration files.
  • character-set*: Vitess only supports utf8 (and variants such as utf8mb4).
  • gtid-mode: Vitess relies on GTIDs to track changes to topology.
  • gtid-strict-mode / enforce-gtid-consistency: Vitess requires this setting to be unchanged.
  • sql-mode: Vitess can operate with non-default SQL modes, but VTGate will not allow you to change the sql-mode on a per-session basis. This can create compatibility issues for applications that require changes to this setting.

init_db.sql

When a new instance is initialized with mysqlctl init (as opposed to restarting in a previously initialized data dir with mysqlctl start), the init_db.sql file is applied to the server immediately after running the bootstrap procedure (either mysqld --initialize-insecure or mysql_install_db, depending on the MySQL version). This file is also responsible for removing unprivileged users, as well as adding the necessary tables and grants for Vitess.

Note that changes to this file will not be reflected in shards that have already been initialized and had at least one backup taken. New instances in such shards will automatically restore the latest backup upon vttablet startup, overwriting the data dir created by mysqlctl.

Vitess Servers

Logging

Vitess servers write to log files, and the files are rotated when they reach a maximum size. It’s recommended that you run at INFO level logging. The information printed in the log files comes in handy for troubleshooting. You can limit disk usage by running cron jobs that periodically purge or archive them.

Vitess supports both MySQL protocol and gRPC for communication between client and Vitess and uses gRPC for communication between Vitess servers. By default, Vitess does not use SSL.

Also, even without using SSL, we allow the use of an application-provided CallerID object. It allows insecure but easy-to-use authorization using Table ACLs.

See the Transport Security Model document for more information on how to set up both of these features, and what command line parameters exist.

Topology Service configuration

Vttablet, vtgate and vtctld need the right command line parameters to find the topology service. First, the topo_implementation flag needs to be set to one of zk2, etcd2, or consul. Then they’re all configured as follows (an example flag set is sketched below):

  • The topo_global_server_address contains the server address / addresses of the global topology service.
  • The topo_global_root contains the directory / path to use.

Note that the local cell for the tablet must exist and be configured properly in the Topology Service for vttablet to start. Local cells are configured inside the topology service, by using the vtctl AddCellInfo command. See the Topology Service documentation for more information.
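
As an illustration, a minimal Python sketch of how these flags could be assembled for an etcd-backed global topology; the server address and root path are placeholders, not required values:

    # Flags shared by vttablet, vtgate and vtctld to locate the topology service.
    topo_flags = [
        "-topo_implementation", "etcd2",                    # one of zk2, etcd2, consul
        "-topo_global_server_address", "etcd-global:2379",  # placeholder address(es)
        "-topo_global_root", "/vitess/global",              # placeholder root path
    ]
    print(" ".join(topo_flags))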

VTTablet

VTTablet has a large number of command line options. Some important ones will be covered here. In terms of provisioning, these are the recommended values:

  • 2-4 cores (in proportion to MySQL cores)
  • 2-4 GB RAM

Directory Configuration

vttablet supports a number of command line options and environment variables to facilitate its setup.

The VTDATAROOT environment variable specifies the top-level directory for all data files. If not set, it defaults to /vt.

By default, a vttablet will use a subdirectory in VTDATAROOT named vt_NNNNNNNNNN where NNNNNNNNNN is the tablet id. The tablet_dir command-line parameter allows overriding this relative path. This is useful in containers where the filesystem only contains one vttablet, in order to have a fixed root directory.

When starting up and using mysqlctl to manage MySQL, the MySQL files will be in subdirectories of the tablet root. For instance, bin-logs for the binary logs, data for the data files, and relay-logs for the relay logs.

It is possible to host different parts of a MySQL server’s files on different partitions. For instance, the data files may reside on flash, while the bin logs and relay logs are on spindle. To achieve this, create a symlink from $VTDATAROOT/<dir name> to the proper location on disk. When MySQL is configured by mysqlctl, it will notice that this directory exists and use it for the files it would otherwise have put in the tablet directory. For instance, to host the binlogs in /mnt/bin-logs (see the sketch after this list):

  • Create a symlink from $VTDATAROOT/bin-logs to /mnt/bin-logs.
  • When starting up a tablet:
    • /mnt/bin-logs/vt_NNNNNNNNNN will be created.
    • $VTDATAROOT/vt_NNNNNNNNNN/bin-logs will be a symlink to /mnt/bin-logs/vt_NNNNNNNNNN
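
A minimal Python sketch of the bin-logs example (assuming /mnt/bin-logs already exists and $VTDATAROOT is set; run this before starting the tablet):

    import os

    vtdataroot = os.environ.get("VTDATAROOT", "/vt")
    link = os.path.join(vtdataroot, "bin-logs")

    # Point $VTDATAROOT/bin-logs at the spindle-backed location. When the tablet
    # starts, mysqlctl will create /mnt/bin-logs/vt_NNNNNNNNNN and symlink the
    # tablet's own bin-logs directory to it, as described above.
    if not os.path.islink(link):
        os.symlink("/mnt/bin-logs", link)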

Initialization

  • init_keyspace, init_shard, init_tablet_type: These parameters should be set at startup to the keyspace / shard / tablet type to start the tablet as. Note that ‘master’ is not allowed here; use ‘replica’ instead, as a starting tablet will figure out on its own whether it is the master (this way, all replica tablets start with the same command line parameters, independently of which one is the master).

Query server parameters

  • queryserver-config-pool-size: This value should be set to the maximum number of simultaneous queries you want MySQL to run. This is typically around 2-3x the number of allocated CPUs, i.e. roughly 4-16 (see the sketch after this list). There is not much harm in going higher with this value, but you may see no additional benefits.
  • queryserver-config-stream-pool-size: This value is relevant only if you plan to run streaming queries against the database. It’s recommended that you use rdonly instances for such streaming queries. This value depends on how many simultaneous streaming queries you plan to run. Typical values are in the low 100s.
  • queryserver-config-transaction-cap: This value should be set to how many concurrent transactions you wish to allow. This should be a function of transaction QPS and transaction length. Typical values are in the low 100s.
  • queryserver-config-query-timeout: This value should be set to the upper limit you’re willing to allow a query to run before it’s deemed too expensive or detrimental to the rest of the system. VTTablet will kill any query that exceeds this timeout. This value is usually around 15-30s.
  • queryserver-config-transaction-timeout: This value is meant to protect the situation where a client has crashed without completing a transaction. Typical value for this timeout is 30s.
  • queryserver-config-max-result-size: This parameter prevents the OLTP application from accidentally requesting too many rows. If the result exceeds the specified number of rows, VTTablet returns an error. The default value is 10,000.
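
As a worked example of the pool sizing guidance above, a short Python sketch; the CPU count and the 2x multiplier are illustrative assumptions:

    import os

    # Use the number of cores actually allocated to vttablet/MySQL, not
    # necessarily the whole host.
    allocated_cpus = os.cpu_count() or 4
    pool_size = min(max(2 * allocated_cpus, 4), 16)  # 2-3x allocated CPUs, typically 4-16

    print(f"-queryserver-config-pool-size {pool_size}")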

DB config parameters

VTTablet requires multiple user credentials to perform its tasks. Since it’s required to run on the same machine as MySQL, it’s most beneficial to use the more efficient unix socket connections.

Connection parameters

  • db_socket: The unix socket to connect on. If this is specified, host and port will not be used.
  • db_host: The host name for the tcp connection.
  • db_port: The tcp port to be used with the db_host.
  • db_charset: Character set. Only utf8 or latin1 based character sets are supported.
  • db_flags: Flag values as defined by MySQL.
  • db_ssl_ca, db_ssl_ca_path, db_ssl_cert, db_ssl_key: SSL flags.

app credentials are for serving app queries:

  • db_app_user: App username.
  • db_app_password: Password for the app username. If you need a more secure way of managing and supplying passwords, VTTablet does allow you to plug into a “password server” that can securely supply and refresh usernames and passwords. Please contact the Vitess team for help if you’d like to write such a custom plugin.
  • db_app_use_ssl: Set this flag to false if you don’t want to use SSL for this connection. This will allow you to turn off SSL for all users except for repl, which may have to be turned on for replication that goes over open networks.

appdebug credentials are for the appdebug user:

  • db_appdebug_user
  • db_appdebug_password
  • db_appdebug_use_ssl

dba credentials will be used for housekeeping work like loading the schema or killing runaway queries:

  • db_dba_user
  • db_dba_password
  • db_dba_use_ssl

repl credentials are for managing replication:

  • db_repl_user
  • db_repl_password
  • db_repl_use_ssl

filtered credentials are for performing resharding:

  • db_filtered_user
  • db_filtered_password
  • db_filtered_use_ssl

Monitoring

VTTablet exports a wealth of real-time information about itself. This section will explain the essential ones:

/debug/status

This page has a variety of human-readable information about the current VTTablet. You can look at this page to get a general overview of what’s going on. It also has links to various other diagnostic URLs below.

/debug/vars

This is the most important source of information for monitoring. There are other URLs below that can be used to further drill down.

Queries (as described in /debug/vars section)

Vitess has a structured way of exporting certain performance stats. The most common one is the Histogram structure, which is used by Queries:

  1. "Queries": {
  2. "Histograms": {
  3. "PASS\_SELECT": {
  4. "1000000": 1138196,
  5. "10000000": 1138313,
  6. "100000000": 1138342,
  7. "1000000000": 1138342,
  8. "10000000000": 1138342,
  9. "500000": 1133195,
  10. "5000000": 1138277,
  11. "50000000": 1138342,
  12. "500000000": 1138342,
  13. "5000000000": 1138342,
  14. "Count": 1138342,
  15. "Time": 387710449887,
  16. "inf": 1138342
  17. }
  18. },
  19. "TotalCount": 1138342,
  20. "TotalTime": 387710449887
  21. },

The histograms are broken out into query categories. In the above case, “PASS_SELECT” is the only category. An entry like "500000": 1133195 means that 1133195 queries took under 500000 nanoseconds to execute.

Queries.Histograms.PASS_SELECT.Count is the total count in the PASS_SELECT category.

Queries.Histograms.PASS_SELECT.Time is the total time in the PASS_SELECT category.

Queries.TotalCount is the total count across all categories.

Queries.TotalTime is the total time across all categories.
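
For example, a minimal Python monitoring sketch that samples /debug/vars twice and derives overall QPS and average latency from Queries. The vttablet address is a placeholder and will differ per deployment; TotalTime is in nanoseconds:

    import json
    import time
    import urllib.request

    VARS_URL = "http://localhost:15002/debug/vars"  # placeholder vttablet address

    def sample():
        with urllib.request.urlopen(VARS_URL) as resp:
            q = json.load(resp)["Queries"]
        return q["TotalCount"], q["TotalTime"]

    count1, time1 = sample()
    time.sleep(60)
    count2, time2 = sample()

    delta = count2 - count1
    qps = delta / 60.0
    avg_latency_ms = ((time2 - time1) / delta) / 1e6 if delta else 0.0
    print(f"QPS={qps:.1f} avg_latency={avg_latency_ms:.2f}ms")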

There are other Histogram variables described below, and they will always have the same structure.

Use this variable to track:

  • QPS
  • Latency
  • Per-category QPS. For replicas, the only category will be PASS_SELECT, but there will be more for masters.
  • Per-category latency
  • Per-category tail latency

Results

  1. "Results": {
  2. "0": 0,
  3. "1": 0,
  4. "10": 1138326,
  5. "100": 1138326,
  6. "1000": 1138342,
  7. "10000": 1138342,
  8. "5": 1138326,
  9. "50": 1138326,
  10. "500": 1138342,
  11. "5000": 1138342,
  12. "Count": 1138342,
  13. "Total": 1140438,
  14. "inf": 1138342
  15. }

Results is a simple histogram with no timing info. It gives you a histogram view of the number of rows returned per query.

Mysql

Mysql is a histogram variable like Queries, except that it reports MySQL execution times. The categories are “Exec” and “ExecStream”.

In the past, the exec time difference between VTTablet and MySQL used to be substantial. With newer versions of Go, the VTTablet exec time has predominantly been equal to the MySQL exec time plus connection pool wait time and consolidation waits. In other words, this variable has not shown much value recently. However, it’s good to track it initially, until you have determined that there are no other factors causing a big difference between MySQL performance and VTTablet performance.

Transactions

Transactions is a histogram variable that tracks transactions. The categories are “Completed” and “Aborted”.

Waits

Waits is a histogram variable that tracks various waits in the system. Right now, the only category is “Consolidations”. A consolidation happens when one query waits for the results of an identical query already executing, thereby saving the database from performing duplicate work.

This variable used to report connection pool waits, but a refactor moved those variables out into the pool related vars.

Errors

  1. "Errors": {
  2. "Deadlock": 0,
  3. "Fail": 1,
  4. "NotInTx": 0,
  5. "TxPoolFull": 0
  6. },

Errors are reported under different categories. It’s beneficial to track each category separately as it will be more helpful for troubleshooting. Right now, there are four categories. The category list may vary as Vitess evolves.

Plotting errors/query can sometimes be useful for troubleshooting.

VTTablet also exports an InfoErrors variable that tracks inconsequential errors that don’t signify any kind of problem with the system. For example, a dup key on insert is considered normal because apps tend to rely on that error to update an existing row instead. So, no monitoring is needed for that variable.

InternalErrors

  1. "InternalErrors": {
  2. "HungQuery": 0,
  3. "Invalidation": 0,
  4. "MemcacheStats": 0,
  5. "Mismatch": 0,
  6. "Panic": 0,
  7. "Schema": 0,
  8. "StrayTransactions": 0,
  9. "Task": 0
  10. },

An internal error is an unexpected situation in code that may possibly point to a bug. Such errors may not cause outages, but even a single error needs to be escalated for root cause analysis.

Kills

  1. "Kills": {
  2. "Queries": 2,
  3. "Transactions": 0
  4. },

Kills reports the queries and transactions killed by VTTablet due to timeout. It’s a very important variable to look at during outages.

TransactionPool*

There are a few variables with the above prefix:

  1. "TransactionPoolAvailable": 300,
  2. "TransactionPoolCapacity": 300,
  3. "TransactionPoolIdleTimeout": 600000000000,
  4. "TransactionPoolMaxCap": 300,
  5. "TransactionPoolTimeout": 30000000000,
  6. "TransactionPoolWaitCount": 0,
  7. "TransactionPoolWaitTime": 0,
  • WaitCount tells you how often the transaction pool got full, causing new transactions to wait.
  • WaitTime/WaitCount gives you the average wait time (see the sketch below).
  • Available is a gauge that tells you the number of available connections in the pool in real time. Capacity - Available is the number of connections in use. Note that this number could be misleading if the traffic is spiky.
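
A small Python sketch of these derived values, again reading /debug/vars from a placeholder vttablet address:

    import json
    import urllib.request

    VARS_URL = "http://localhost:15002/debug/vars"  # placeholder vttablet address

    with urllib.request.urlopen(VARS_URL) as resp:
        v = json.load(resp)

    in_use = v["TransactionPoolCapacity"] - v["TransactionPoolAvailable"]
    waits = v["TransactionPoolWaitCount"]
    # WaitTime is reported in nanoseconds.
    avg_wait_ms = (v["TransactionPoolWaitTime"] / waits) / 1e6 if waits else 0.0

    print(f"in_use={in_use} waits={waits} avg_wait={avg_wait_ms:.2f}ms")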

Other Pool variables

Just like TransactionPool, there are variables for other pools:

  • ConnPool: This is the pool used for read traffic.
  • StreamConnPool: This is the pool used for streaming queries.

There are other internal pools used by VTTablet that are not very consequential.

TableACLAllowed, TableACLDenied, TableACLPseudoDenied

The above three variables report table ACL stats, broken out by table, plan and user.

QueryPlanCacheSize

If the application does not make good use of bind variables, this value will approach QueryCacheCapacity. If so, inspecting the current query cache will give you a clue about where the misuse is happening.

QueryCounts, QueryErrorCounts, QueryRowCounts, QueryTimesNs

These variables are another multi-dimensional view of Queries. They have a lot more data than Queries because they’re broken out into tables as well as plan. This is a priceless source of information when it comes to troubleshooting. If an outage is related to rogue queries, the graphs plotted from these vars will immediately show the table on which such queries are run. After that, a quick look at the detailed query stats will most likely identify the culprit.

UserTableQueryCount, UserTableQueryTimesNs, UserTransactionCount, UserTransactionTimesNs

These variables are yet another view of Queries, but broken out by user, table and plan. If you have well-compartmentalized app users, this is another priceless way of identifying a rogue “user app” that could be misbehaving.

DataFree, DataLength, IndexLength, TableRows

These variables are updated periodically from information_schema.tables. They represent statistical information as reported by MySQL about each table. They can be used for planning purposes, or to track unusual changes in table stats.

  • DataFree represents data_free
  • DataLength represents data_length
  • IndexLength represents index_length
  • TableRows represents table_rows

/debug/health

This URL prints out a simple “ok” or “not ok” string that can be used to check if the server is healthy. The health check makes sure mysqld connections work, and that replication is configured (though not necessarily running) if the tablet is not a master.
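
For instance, a minimal Python probe (the address is a placeholder) that treats anything other than “ok” as unhealthy:

    import urllib.request

    HEALTH_URL = "http://localhost:15002/debug/health"  # placeholder vttablet address

    try:
        with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
            healthy = resp.read().decode().strip() == "ok"
    except OSError:
        healthy = False

    print("healthy" if healthy else "not healthy")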

/queryz, /debug/query_stats, /debug/query_plans, /streamqueryz

  • /debug/query_stats is a JSON view of the per-query stats. This information is pulled in real-time from the query cache. The per-table stats in /debug/vars are a roll-up of this information.
  • /queryz is a human-readable version of /debug/query_stats. If a graph shows a table as a possible source of problems, this is the next place to look at to see if a specific query is the root cause.
  • /debug/query_plans is a more static view of the query cache. It just shows how VTTablet will process or rewrite the input query.
  • /streamqueryz lists the currently running streaming queries. You have the option to kill any of them from this page.

/querylogz, /debug/querylog, /txlogz, /debug/txlog

  • /debug/querylog is a never-ending stream of currently executing queries with verbose information about each query. This URL can generate a lot of data because it streams every query processed by VTTablet. The details are as per this function: https://github.com/vitessio/vitess/blob/master/go/vt/tabletserver/logstats.go#L202
  • /querylogz is a limited human readable version of /debug/querylog. It prints the next 300 queries by default. The limit can be specified with a limit=N parameter on the URL.
  • /txlogz is like /querylogz, but for transactions.
  • /debug/txlog is the JSON counterpart to /txlogz.

/consolidations

This URL has an MRU list of consolidations. This is a way of identifying if multiple clients are spamming the same query to a server.

/schemaz, /debug/schema

  • /schemaz shows the schema info loaded by VTTablet.
  • /debug/schema is the JSON version of /schemaz.

/debug/query_rules

This URL displays the currently active query blacklist rules.

Alerting

Alerting is built on top of the variables you monitor. Before setting up alerts, you should get some baseline stats and variance, and then you can build meaningful alerting rules. You can use the following list as a guideline to build your own:

  • Query latency among all vttablets
  • Per keyspace latency
  • Errors/query
  • Memory usage
  • Unhealthy for too long
  • Too many vttablets down
  • Health has been flapping
  • Transaction pool full error rate
  • Any internal error
  • Traffic out of balance among replicas
  • Qps/core too high

VTGate

A typical VTGate should be provisioned as follows.

  • 2-4 cores
  • 2-4 GB RAM

Since VTGate is stateless, you can scale it linearly by just adding more servers as needed. Beyond the recommended values, it’s better to add more VTGates than to give more resources to existing servers, as recommended in the philosophy section.

Put a load balancer in front of vtgate to scale up (this is not covered by Vitess). Since vtgate is stateless, the load balancer can use the health URL for health checks.

Parameters

  • cells_to_watch: the cell(s) this vtgate is in and will monitor tablets from. Cross-cell master access needs multiple cells listed here.
  • tablet_types_to_wait: during startup, VTGate waits for at least one serving tablet per tablet type specified here before listening on the serving port, so that it does not serve errors right after starting. It should match the tablet types this VTGate connects to (master, replica, rdonly).
  • discovery_low_replication_lag: when the replication lag of every VTTablet in a particular shard and tablet type is less than or equal to this value (in seconds), VTGate does not filter them by replication lag and uses all of them to balance traffic.
  • degraded_threshold (30s): a tablet will publish itself as degraded if replication lag exceeds this threshold. This will cause VTGates to choose more up-to-date servers over this one. If all servers are degraded, VTGate resorts to serving from all of them.
  • unhealthy_threshold (2h): a tablet will publish itself as unhealthy if replication lag exceeds this threshold.
  • transaction_mode (multi): single: disallow multi-db transactions, multi: allow multi-db transactions with best effort commit, twopc: allow multi-db transactions with 2pc commit.
  • normalize_queries (false): Turning this flag on will cause vtgate to rewrite queries with bind vars. This is beneficial if the app doesn’t itself send normalized queries.
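
To illustrate, a hedged Python sketch of a flag set built from the parameters above. The cell name and threshold values are placeholders, not recommendations, and the exact value formats may vary by Vitess version:

    vtgate_flags = [
        "-cells_to_watch", "zone1",                 # placeholder cell name
        "-tablet_types_to_wait", "MASTER,REPLICA",  # wait for these before serving
        "-discovery_low_replication_lag", "30s",
        "-degraded_threshold", "30s",
        "-unhealthy_threshold", "2h",
        "-transaction_mode", "multi",
        "-normalize_queries",                       # have vtgate rewrite queries with bind vars
    ]
    print(" ".join(vtgate_flags))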

Monitoring

/debug/status

This is the landing page for a VTGate, which gives you a status on how a particular server is doing. Of particular interest there is the list of tablets this vtgate process is connected to, as this is the list of tablets that can potentially serve queries.

/debug/vars

VTGateApi

This is the main histogram variable to track for vtgates. It gives you a breakdown of all queries by command, keyspace, and type.

HealthcheckConnections

It shows the number of tablet connections for query/healthcheck per keyspace, shard, and tablet type.

/debug/query_plans

This URL gives you all the query plans for queries going through VTGate.

/debug/vschema

This URL shows the vschema as loaded by VTGate.

Alerting

For VTGate, here’s a list of possible variables to alert on:

  • Error rate
  • Error/query rate
  • Error/query/tablet-type rate
  • VTGate serving graph is stale by x minutes (topology service is down)
  • Qps/core
  • Latency

External processes

Things that need to be configured:

Periodic backup configuration

We recommend taking backups regularly, e.g. by setting up a cron job for it. See our recommendations at /user-guide/backup-and-restore/#backup-frequency.

Logs archiver/purger

You will need to run some cron jobs to archive or purge log files periodically.

Orchestrator

Orchestrator is a tool for managing MySQL replication topologies, including automated failover. It can detect master failure and initiate a recovery in a matter of seconds.

For the most part, Vitess is agnostic to the actions of Orchestrator, which operates below Vitess at the MySQL level. That means you can pretty much set up Orchestrator in the normal way (see the Orchestrator Manual at https://github.com/github/orchestrator/wiki/Orchestrator-Manual), with just a few additions as described below.

For the Kubernetes example, we provide a sample script to launch Orchestrator for you with these settings applied.

Orchestrator configuration

Orchestrator needs to know some things from the Vitess side, like the tablet aliases and whether semisync is enforced (with async fallback disabled). We pass this information by telling Orchestrator to execute certain queries that return local metadata from a non-replicated table, as seen in our sample orchestrator.conf.json:

  1. "DetectClusterAliasQuery": "SELECT value FROM _vt.local_metadata WHERE name='ClusterAlias'",
  2. "DetectInstanceAliasQuery": "SELECT value FROM _vt.local_metadata WHERE name='Alias'",
  3. "DetectPromotionRuleQuery": "SELECT value FROM _vt.local_metadata WHERE name='PromotionRule'",
  4. "DetectSemiSyncEnforcedQuery": "SELECT @@global.rpl_semi_sync_master_wait_no_slave AND @@global.rpl_semi_sync_master_timeout > 1000000",

There is also one thing that Vitess needs to know from Orchestrator, which is the identity of the master for each shard, if a failover occurs.

From our experience at YouTube, we believe that this signal is too critical for data integrity to rely on bottom-up detection such as asking each MySQL if it thinks it’s the master. Instead, we rely on Orchestrator to be the source of truth, and expect it to send a top-down signal to Vitess.

This signal is sent by ensuring the Orchestrator server has access to vtctlclient, which it then uses to send an RPC to vtctld, informing Vitess of the change in mastership via the TabletExternallyReparented command.

  1. "PostMasterFailoverProcesses": [
  2. "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log",
  3. "vtctlclient -server vtctld:15999 TabletExternallyReparented {successorAlias}"
  4. ],

VTTablet configuration

Normally, you need to seed Orchestrator by giving it the addresses of MySQL instances in each shard. If you have lots of shards, this could be tedious or error-prone.

Luckily, Vitess already knows everything about all the MySQL instances that comprise your cluster. So we provide a mechanism for tablets to self-register with the Orchestrator API, configured by the following vttablet parameters:

  • orc_api_url: Address of Orchestrator’s HTTP API (e.g. http://host:port/api/). Leave empty to disable Orchestrator integration.
  • orc_discover_interval: How often (e.g. 60s) to ping Orchestrator’s HTTP API endpoint to tell it we exist. 0 means never.

Not only does this relieve you from the initial seeding of addresses into Orchestrator, it also means new instances will be discovered immediately, and the topology will automatically repopulate even if Orchestrator’s backing store is wiped out. Note that Orchestrator will forget stale instances after a configurable timeout.