Glossary

Terms

It’s helpful to understand a few terms before reading our architecture documentation.

TermDefinition
A
AST (Abstract syntax tree)Abstract Syntax Trees or ASTs are tree representations of code. They are a fundamental part of the way a compiler works.
C
ClusterA distributed MatrixOne deployment, which acts as a single logical application.
D
Data StorageA DataStorage is an interface for implementing distributed storage service. It must be defined in prior to using MatrixCube. DataStorage needs to be implemented based on the characteristics of storage engine.
E
Event NotifyThe machanism of synchronizing heartbeat information to all nodes is called an event notify.
F
FactorizationThe factorization method uses basic factorization formula to reduce any algebraic or quadratic equation into its simpler form. MatrixOne uses compact factorised representations at the physical layer to reduce data redundancy and boost query performance.
H
HeartbeatEvery node in a MatrixOne cluster will periodically sends its status information, this information is called a heartbeat.
M
MatrixCubeMatrixCube is a framework for building distributed systems, which offers guarantees about reliability, consistency, and scalability. It is designed to facilitate distributed, stateful application building to allow developers only need to focus on the business logic on a single node.
P
ProphetProphet is a scheduling module in MatrixCube. It takes charge of Auto-Rebalance, which keeps the system storage level and read/write throughput level balanced across Stores. The inital 3 Stores of a MatrixCube cluster are all Prophet Nodes.
Pure StorageIn contrast to Prophet, pure storage is another type of node, which doesn’t handle any scheduling job and works as simple storage.
R
ReplicaTo provide reliable service, each shard is stored not only once, it will have several copy stored in different stores. Every copy of a shard is called a Replica.
S
Snapshot Isolation (SI)Snapshot Isolation is a multi-version concurrency control approach that is widely used in practice. MatrixOne supports distributed transaction of snapshot isolation level.
StoreA MatrixCube distributed system consists of several physical servers, our data are stored across these physical server. We call each server inside this cluster a Store.
ShardIn MatrixOne, the data are split into different partitions to store in order to get better scalability. Each partition is called a Shard. In our design, a new created table is initially a Shard. When the size of the table exceeds the Shard size limit, the Shard will split.
Shard SplittingThere is a certain limit to a Shard size. Whenever a Shard exceeds its storage limit, MatrixCube splits a Shard into two Shards and keep each Shard with the same storage level.
Shard ProxyThe Shard Proxy is a central module to accept all user read/write requests and route requests to corresponding nodes.

Concepts

MatrixOne relies heavily on the following concepts. Being familiar with them will help you understand what our architecture achieves.

TermDefinition
A
Auto-RebalanceA modern distributed database should do more than just split data amongst a number of servers. The automatic process of storage and workload distribution among servers is called an Auto-Rebalance.
C
ConsistencyMatrixOne supports a strong consistency. It is guaranted that after any successful data write, the reading afterwards will get the latest value, no matter from which store.
E
Execution PlanAn execution plan in a database is a simple graphical representation of the operations that the query optimizer generates to calculate the most efficient way to return a set of results.
F
Fault-ToleranceFault-Tolerance simply means a system’s ability to continue operating uninterrupted despite the failure of one or more of its components.
J
JIT CompilationTurns SQL plan tree or Intermediate Representation code into a native program using LLVM at runtime.
M
Monolitic EngineA monolithic database engine is designed to support hybrid workloads: transactional, analytical, streaming, time-series, machine learning, etc.
Materialized ViewA materialized view is a pre-computed data set derived from a query specification (the SELECT in the view definition) and stored for later use. Materialized view is usually used for increasing performance and efficiency.
MetadataMetadata is the data that describes the structure and creation method of data in a database.
P
PaxosPaxos is an algorithm that is used to achieve consensus among a distributed set of computers that communicate via an asynchronous network.
R
RaftRaft is a consensus algorithm that is designed to be easy to understand. It’s equivalent to Paxos in fault-tolerance and performance. The difference is that it’s decomposed into relatively independent subproblems, and it cleanly addresses all major pieces needed for practical systems.
Raft Group and LeaderRaft defines a strong, single leader and number of followers in a group of peers. The group represents a replicated state machine. Only the leader may service client requests. The leader replicates actions to the followers.
S
SIMD instructionSIMD is short for Single Instruction/Multiple Data, while the term SIMD operations refers to a computing method that enables processing of multiple data with a single instruction.
T
TransactionA set of operations performed on your database that satisfy the requirements of ACID semantics.
V
Vectorized ExecutionVectorized data processing helps with developing faster analytical query engines by making efficient utilization of CPU cache. Arrow’s columnar format allows to use lightweight schemes like dictionary encoding, bit packing, and run length encoding, which favor query performance over compression ratio.