Storage Engines

At the very bottom of the ArangoDB database lies the storage engine. The storage engine is responsible for persisting the documents on disk, holding copies in memory, and providing indexes and caches to speed up queries.

Up to version 3.1, ArangoDB only supported memory-mapped files (MMFiles) as its sole storage engine. Beginning with 3.2, ArangoDB has support for pluggable storage engines. The second supported engine is RocksDB from Facebook.

MMFiles                                         | RocksDB
------------------------------------------------+-------------------------------------------------
default                                         | optional
dataset needs to fit into memory                | work with as much data as fits on disk
indexes in memory                               | hot set in memory, data and indexes on disk
slow restart due to index rebuilding            | fast startup (no rebuilding of indexes)
volatile collections (only in memory, optional) | collection data always persisted
collection level locking (writes block reads)   | concurrent reads and writes

Blog article: Comparing new RocksDB and MMFiles storage engines

RocksDB is an embeddable persistent key-value store. It is a log-structured database and is optimized for fast storage.

The MMFiles engine is optimized for the use case where the data fits into main memory. It allows for very fast concurrent reads. However, writes block reads, and locking is on collection level. Indexes are always in memory and are rebuilt on startup. This gives better performance but imposes a longer startup time.

The RocksDB engine is optimized for large data-sets and allows for a steady insert performance even if the data-set is much larger than the main memory. Indexes are always stored on disk, but caches are used to speed up performance. RocksDB uses document-level locks, allowing for concurrent writes. Writes do not block reads. Reads do not block writes.

The engine must be selected for the whole server / cluster. It is not possible to mix engines. The transaction handling and write-ahead-log format of the individual engines are very different and therefore cannot be mixed.

RocksDB

Advantages

RocksDB is a very flexible engine that can be configured for various use cases.

The main advantages of RocksDB are

  • document-level locks
  • support for large data-sets
  • persistent indexes

Caveats

RocksDB allows concurrent writes. However, when concurrent writes touch the same document, a write conflict is raised. This cannot happen with the MMFiles engine; therefore, applications that switch to RocksDB need to be prepared for such exceptions to arise. It is possible to exclusively lock collections when executing AQL. This will avoid write conflicts but also inhibits concurrent writes.
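One common way for an application to prepare for write conflicts is to retry the failed operation after a short pause. The following is only a minimal sketch of that pattern: `WriteConflictError` and `with_conflict_retry` are hypothetical names introduced for illustration, not part of any ArangoDB driver, and the conflict is simulated so the example runs without a server.

```python
import random
import time


class WriteConflictError(Exception):
    """Hypothetical stand-in for whatever conflict error a real driver raises."""


def with_conflict_retry(operation, max_attempts=5, base_delay=0.01):
    """Run operation(); on a write conflict, back off and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except WriteConflictError:
            if attempt == max_attempts:
                raise  # give up and let the caller handle the conflict
            # Jittered exponential backoff so competing writers spread out.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())


# Simulated update that conflicts twice before it succeeds.
calls = {"n": 0}


def flaky_update():
    calls["n"] += 1
    if calls["n"] < 3:
        raise WriteConflictError("simulated write conflict")
    return "updated"


result = with_conflict_retry(flaky_update)
```

In a real application the wrapped operation would have to be safe to repeat, since it may run several times before it succeeds.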

Currently, another restriction is due to the transaction handling in RocksDB. Transactions are limited in total size. If you have a statement modifying a lot of documents, it is necessary to commit data in between. This will be done automatically for AQL by default.
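Outside of AQL, the same idea of committing in between can be applied by the application itself: split one large modification into smaller batches, each small enough to stay under the size limit. The sketch below assumes a caller-supplied `apply_batch` callback that writes and commits one batch; both helper names and the batch size are illustrative assumptions.

```python
from itertools import islice


def in_batches(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def modify_in_batches(keys, apply_batch, batch_size=1000):
    """Call apply_batch once per batch and return how many keys were handled.

    apply_batch is assumed to commit its own work, so every batch becomes
    a separate, small transaction.
    """
    handled = 0
    for batch in in_batches(keys, batch_size):
        apply_batch(batch)
        handled += len(batch)
    return handled


# Record the batches instead of writing to a database.
seen = []
total = modify_in_batches(range(2500), seen.append, batch_size=1000)
batch_lengths = [len(batch) for batch in seen]
```

The trade-off is that the modification as a whole is no longer atomic: if a later batch fails, the earlier batches stay committed.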

Performance

RocksDB is based on a log-structured merge tree. A good introduction can be found in:

RocksDB itself provides a lot of different knobs to fine-tune the storage engine according to your use case. ArangoDB supports the most common ones using the options below.

Performance reports for the storage engine can be found here:

ArangoDB options

ArangoDB has a cache for the persistent indexes in RocksDB. The total size of this cache is controlled by the option

  --cache.size

RocksDB also has a cache for the blocks stored on disk. The size of this cache is controlled by the option

  --rocksdb.block-cache-size

ArangoDB distributes the available memory equally between the two caches by default.

ArangoDB chooses a size for the various levels in RocksDB that is suitable for general-purpose applications.

The log-structured data levels in RocksDB have increasing size:

  MEM: --
  L0:  --
  L1:  -- --
  L2:  -- -- -- --
  ...

New or updated documents are first stored in memory. If this memtable reaches the limit given by

  --rocksdb.write-buffer-size

it will be converted to an SST file and inserted at level 0.
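The flush step described above can be illustrated with a toy model. This is a deliberately simplified sketch, not how RocksDB is implemented: the class `ToyLSM`, its size accounting (key length plus value length), and the in-memory "files" are all assumptions made for illustration.

```python
class ToyLSM:
    """Toy model of a memtable that is flushed to level 0 when it is full."""

    def __init__(self, write_buffer_size):
        self.write_buffer_size = write_buffer_size  # flush limit, in bytes
        self.memtable = {}
        self.memtable_bytes = 0
        self.level0 = []  # list of "SST files", newest last

    def put(self, key, value):
        old = self.memtable.get(key)
        if old is not None:
            # Overwriting a key replaces its previous size contribution.
            self.memtable_bytes -= len(key) + len(old)
        self.memtable[key] = value
        self.memtable_bytes += len(key) + len(value)
        if self.memtable_bytes >= self.write_buffer_size:
            self.flush()

    def flush(self):
        """Turn the memtable into a sorted, immutable file at level 0."""
        if not self.memtable:
            return
        self.level0.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.memtable_bytes = 0

    def get(self, key):
        # Newest data wins: check the memtable first, then newer files first.
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.level0):
            for stored_key, stored_value in sst:
                if stored_key == key:
                    return stored_value
        return None


lsm = ToyLSM(write_buffer_size=16)
lsm.put("a", "1234567")  # 8 bytes, stays in the memtable
lsm.put("b", "1234567")  # reaches 16 bytes, triggers a flush to level 0
lsm.put("a", "x")        # newer value lives in the fresh memtable
```

Reads consult the memtable before the flushed files, which is why the newer value for a key shadows the older one that already sits in level 0.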

The following options control the size of each level and the depth.

  --rocksdb.num-levels N

Limits the number of levels to N. By default it is 7, and there is seldom a reason to change this. A new level is only opened if there is too much data in the previous one.

  --rocksdb.max-bytes-for-level-base B

L1 will hold at most B bytes.

  --rocksdb.max-bytes-for-level-multiplier M

Each level holds at most M times as many bytes as the previous one. Therefore, the maximum number of bytes for level L can be calculated as

  max-bytes-for-level-base * (max-bytes-for-level-multiplier ^ (L-1))
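As a worked example of the formula above, the following sketch computes the size limit of the first few levels. The base of 256 MiB and the multiplier of 10 are illustrative values chosen for the arithmetic, not a statement about ArangoDB's defaults, and the function name is hypothetical.

```python
MIB = 1024 * 1024


def max_bytes_for_level(level, base, multiplier):
    """Size limit of a level: base * multiplier ^ (level - 1), for level >= 1."""
    if level < 1:
        raise ValueError("the formula applies to level 1 and above")
    return base * multiplier ** (level - 1)


base = 256 * MIB   # assumed value for max-bytes-for-level-base
multiplier = 10    # assumed value for max-bytes-for-level-multiplier

limits = {level: max_bytes_for_level(level, base, multiplier)
          for level in range(1, 5)}

for level, limit in limits.items():
    print(f"L{level}: {limit // MIB} MiB")
```

With these values each level is ten times larger than the one before it, so a handful of levels is enough to cover data-sets far beyond the size of main memory.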

Future

RocksDB imposes a limit on the transaction size. It is optimized to handle small transactions very efficiently, but this effectively limits the total size of transactions.

ArangoDB currently uses RocksDB’s transactions to implement the ArangoDB transaction handling. Therefore, the same restrictions apply to ArangoDB transactions when using the RocksDB engine.

We will improve this by introducing distributed transactions in a future version of ArangoDB. This will allow handling large transactions as a series of small RocksDB transactions and hence remove the size restriction.