Materialize

Materialize the results of a query into a table

Description

Materialize is a lower level vreplication command that allows for generalized materialization of tables. The target tables can be copies, aggregations, or views. The target tables are kept in sync in near-realtime.

You can specify multiple tables to materialize using the create sub-command’s --table-settings flag.

Be careful to avoid using the INSTANT ADD COLUMN feature in MySQL 8.0+ with materialization source tables as this can cause the vreplication based materialization workflow to break.

The Basic Materialize Workflow Lifecycle

  1. Initiate the migration using Materialize
  2. Monitor the workflow using show or status
    Materialize --target-keyspace <target-keyspace> show --workflow <workflow>
    Materialize --target-keyspace <target-keyspace> status --workflow <workflow>

  3. Start accessing your views once the workflow has started Replicating

Command

Please see the Materialize command reference for a full list of sub-commands and their flags.

Example

  1. vtctldclient --server localhost:15999 Materialize --workflow product_sales --target-keyspace commerce create --source-keyspace commerce --table-settings '[{"target_table": "sales_by_sku", "create_ddl": "create table sales_by_sku (sku varbinary(128) not null primary key, orders bigint, revenue bigint)", "source_expression": "select sku, count(*) as orders, sum(price) as revenue from corder group by sku"}]' --cells zone1 --cells zone2 --tablet-types replica

Parameters

Action

Materialize is an “umbrella” command. The action or sub-command defines the operation on the workflow.

Options

Each action or sub-command has additional options/parameters that can be used to modify its behavior. Please see the command’s reference docs for the full list of command options or flags. Below we will add additional information for a subset of key options.

--cells

optional
default local cell

A comma-separated list of cell names or cell aliases. This list is used by VReplication to determine which cells should be used to pick a tablet for selecting data from the source keyspace.

Uses
  • Improve performance by using picking a tablet in cells in network proximity with the target
  • To reduce bandwidth costs by skipping cells that are in different availability zones
  • Select cells where replica lags are lower

--tablet-types

optional
default --vreplication_tablet_type parameter value for the tablet. --vreplication_tablet_type has the default value of “in_order:REPLICA,PRIMARY”.
string

Source tablet types to replicate from (e.g. PRIMARY, REPLICA, RDONLY). The value specified impacts tablet selection for the workflow.

Uses
  • To reduce the load on PRIMARY tablets by using REPLICAs or RDONLYs
  • Reducing lag by pointing to PRIMARY

--table-settings

required
JSON

This is a JSON array where each value must contain two key/value pairs. The first required key is ‘target_table’ and it is the name of the table in the target-keyspace to store the results in. The second required key is ‘source_expression’ and its value is the select query to run against the source table. An optional key/value pair can also be specified for ‘create_ddl’ which provides the DDL to create the target table if it does not exist – you can alternatively specify a value of ‘copy’ if the target table schema should be copied as-is from the source keyspace. Here’s an example value for table-settings:

  1. [
  2. {
  3. "target_table": "customer_one_email",
  4. "source_expression": "select email from customer where customer_id = 1"
  5. },
  6. {
  7. "target_table": "states",
  8. "source_expression": "select * from states",
  9. "create_ddl": "copy"
  10. },
  11. {
  12. "target_table": "sales_by_sku",
  13. "source_expression": "select sku, count(*) as orders, sum(price) as revenue from corder group by sku",
  14. "create_ddl": "create table sales_by_sku (sku varbinary(128) not null primary key, orders bigint, revenue bigint)"
  15. }
  16. ]

Notes

There are special commands to perform common materialization tasks and you should prefer them to using Materialize directly.

  • If you just want to copy tables to a different keyspace use MoveTables
  • If you want to change sharding strategies use Reshard instead