Detailed Stream Engine Architecture

MatrixOne incorporates a built-in stream engine designed for real-time querying, processing, and enriching data stored in a series of incoming data points, known as data streams. Developers can now use SQL to define and create stream processing pipelines as a real-time data backend service. Furthermore, developers can utilize SQL to query data within streams and establish connections with non-streaming datasets, thereby further simplifying the data stack.

Technical Architecture

The technical architecture of the MatrixOne stream engine is illustrated as follows:

Detailed Stream Engine Architecture - 图1

MatrixOne introduced the ability to create streaming tables and implemented a Kafka connector to fulfill the streaming data ingestion requirements of numerous time-series scenarios.

Connectors

Connectors facilitate connecting with external data sources, such as the Kafka connector introduced in MatrixOne 1.0.

MatrixOne supports the use of the following statement to establish a connection between connectors and external data sources:

  1. CREATE SOURCE | SINK CONNECTOR [IF NOT EXISTS] connector_name CONNECTOR_TYPE WITH (property_name = expression [, ...]);

Here, the parameter CONNECTOR_TYPE is used to specify the target.

Streams

A stream represents an append-only data flow akin to an unbounded table with infinite events. Each stream maps to an event group in the storage layer, such as Kafka topics or MatrixOne tables.

  • External stream: A stream using an external storage layer via connectors.
  • Internal stream: A stream that utilizes MatrixOne tables as the event storage.

MatrixOne supports the use of the following statement to create streams:

  1. CREATE [OR REPLACE] [EXTERNAL] STREAM [IF NOT EXISTS] stream_name
  2. ({ column_name data_type [KEY | HEADERS | HEADER(key)] } [, ...])
  3. WITH (property_name = expression [, ...]);

For example, you can refer to the following examples:

  1. CREATE EXTERNAL STREAM STUDENTS (ID STRING KEY, SCORE INT)
  2. WITH (kafka_topic = 'students_topic', value_format = 'JSON', partitions = 4);

Or:

  1. CREATE STREAM STUDENTS (ID STRING KEY, SCORE INT)

You can also query streams and connect them with other tables and materialized views, as shown below:

  1. SELECT * FROM STUDENTS WHERE rank > 5;

Additionally, you can insert new events, as demonstrated below:

  1. INSERT INTO foo (ROWTIME, KEY_COL, COL_A) VALUES (1510923225000, 'key', 'A');