Backup and Restore

ArangoDB supports three backup methods:

  • Physical (raw or “cold”) backups
  • Logical backups
  • Hot backups

These backup methods save the data that is stored in the database system. In addition, make sure to back up things like configuration files, startup scripts, Foxx services, access tokens, secrets, certificates etc. and store them securely in a different location.

Performing frequent backups is important and a recommended best practice that can allow you to recover your data in case unexpected problems occur. Hardware failures, system crashes, or users mistakenly deleting data can always happen. Furthermore, while a big effort is put into the development and testing of ArangoDB (in all its deployment modes), ArangoDB, like any other software product, might include bugs or errors, and data loss could occur. It is therefore important to regularly back up your data to be able to recover and get up and running again in case of serious problems.

Creating backups of your data before an ArangoDB upgrade is also a best practice.

Making use of a high availability deployment mode of ArangoDB, like Active Failover, Cluster or data-center to data-center replication, does not remove the need to take frequent backups, which are also recommended when using such deployment modes.

Physical Backups

Physical (raw or “cold”) backups can be done when the ArangoDB Server is not running by making a raw copy of the ArangoDB data directory.

Such backups are extremely fast as they only involve file copying.

If ArangoDB is running in Active Failover or Cluster mode, it is necessary to copy the data directories of all the involved processes (Agents, Coordinators and DB-Servers).

It is extremely important that physical backups are taken only after all the ArangoDB processes have been shut down and are no longer running. Otherwise files might still be written to, likely resulting in a corrupt and incomplete backup.

It is not always possible to take a physical backup as this method requires a shutdown of the ArangoDB processes. However, on some occasions such backups are useful, often in conjunction with a backup created by another backup method.
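
As a minimal sketch of such a raw copy, the following Python snippet copies a data directory into a new timestamped directory. The function name cold_backup and all paths are hypothetical, and the snippet assumes that every ArangoDB process using the directory has already been stopped; it does not check this itself.

```python
import shutil
import time
from pathlib import Path


def cold_backup(data_dir: str, backup_root: str) -> Path:
    """Copy data_dir into a new timestamped directory under backup_root.

    Assumption: all ArangoDB processes using data_dir are shut down.
    """
    source = Path(data_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"data directory not found: {source}")
    stamp = time.strftime("%Y-%m-%dT%H.%M.%S")
    target = Path(backup_root) / f"{source.name}_{stamp}"
    # copytree refuses to write into an existing target directory,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target
```

In an Active Failover or Cluster deployment the same copy would have to be repeated for the data directory of every involved process.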

Logical Backups

Logical backups can be created and restored with the tools arangodump and arangorestore.

In order to speed up the arangorestore performance in a Cluster environment, the Fast Cluster Restore procedure is recommended.
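
The following sketch shows how a dump and a restore could be scripted from Python. It only assembles the command lines; the endpoint, database name and directory are placeholder values, and the helper names are hypothetical. The options used (--server.endpoint, --server.database, --output-directory, --input-directory) should be checked against the arangodump and arangorestore references for the version in use.

```python
import subprocess
from typing import List


def dump_command(endpoint: str, database: str, out_dir: str) -> List[str]:
    """Build an arangodump invocation for one database."""
    return [
        "arangodump",
        "--server.endpoint", endpoint,
        "--server.database", database,
        "--output-directory", out_dir,
    ]


def restore_command(endpoint: str, database: str, in_dir: str) -> List[str]:
    """Build the matching arangorestore invocation."""
    return [
        "arangorestore",
        "--server.endpoint", endpoint,
        "--server.database", database,
        "--input-directory", in_dir,
    ]


def run(command: List[str]) -> None:
    """Run a command and raise if the tool exits with an error."""
    subprocess.run(command, check=True)
```

A call such as run(dump_command("tcp://127.0.0.1:8529", "mydb", "dump")) would then write the dump into the directory dump, assuming the tools are installed and the server is reachable.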

Hot Backups

Introduced in: v3.5.1

Hot backup and restore associated operations can be performed with the arangobackup client tool and the Hot Backup HTTP API.

Arangobackup and the Hot Backup API are only available in the Enterprise Edition, which is also available as a managed service.

Many operations cannot afford downtimes and thus require administrators and operators to create consistent freezes of the data during normal operation. Such use cases imply that near instantaneous hot backups must be obtained in sync across, say, a cluster’s deployment. For this purpose the hot backup mechanism was created.

The process of creating hot backups is ideally an instantaneous event during normal operations that consists of a few subsequent steps behind the scenes:

  • Stop all write accesses to the entire installation using a write transaction lock.
  • Create a new local directory under <data-dir>/backups/<timestamp>_<backup-label>.
  • Create hard links to the active database files in <data-dir> in the newly created backup directory.
  • Release the write transaction lock to resume normal operation.
  • Report success of the operation.

The above quite precisely describes the tasks in a single instance installation and could technically finish in under a millisecond. The unknown factor is, of course, when the hot backup process is able to obtain the write transaction lock.

In an ArangoDB cluster, two more steps are added while others become slightly more involved. On the Coordinator tasked with the hot backup the following is done:
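
The single server steps can be imitated in a few lines of Python to show why they are so fast. This is only an illustration of the mechanism, not ArangoDB's implementation: a threading.Lock stands in for the write transaction lock, the function name hot_backup is hypothetical, and only regular files at the top level of the data directory are linked.

```python
import os
import threading
import time
from pathlib import Path

# Stand-in for the write transaction lock; writers would need to hold it too.
write_lock = threading.Lock()


def hot_backup(data_dir: str, label: str) -> Path:
    """Hard-link the files in data_dir into backups/<timestamp>_<label>."""
    data = Path(data_dir)
    stamp = time.strftime("%Y-%m-%dT%H.%M.%S")
    target = data / "backups" / f"{stamp}_{label}"
    with write_lock:  # stop writes for the duration of the snapshot
        target.mkdir(parents=True)
        for entry in data.iterdir():
            if entry.is_file():
                # A hard link adds a second name for the same data blocks,
                # so no file content is copied.
                os.link(entry, target / entry.name)
    # Leaving the with-block releases the lock and writes resume.
    return target
```

Because only directory entries are created, the time spent inside the lock does not depend on the size of the linked files.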

  • Using the Agency, make sure that no two hot backups collide.
  • Obtain a dump of the Agency’s Plan key.
  • Stop all write access to the entire cluster installation using a global write transaction lock. This amounts to getting the local write transaction lock on each DB-Server, all at the same time.
  • Getting all the locks on the DB-Servers is tried using successively growing time periods, and if not all local locks can be acquired during a period, all locks are released again to allow writes to continue. If it is not possible to acquire all local locks in the same period, and this continues for an extended, configurable amount of time, the Coordinator gives up. With the allowInconsistent option set to true, it proceeds instead to create a potentially non-consistent hot backup.
  • On each DB-Server create a new local directory under <data-dir>/backups/<timestamp>_<backup-label>.
  • On each DB-Server create hard links to the active database files in <data-dir> in the newly created backup directory.
  • On each DB-Server store a redundant copy of the above Agency dump.
  • Release the global write transaction lock to resume normal operation.
  • Report success of the operation.

Again, under good conditions, a complete hot backup could be obtained from a cluster with many DB-Servers within a very short time, in the range of that of the single server installation.
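
To make the allowInconsistent option more concrete, here is a hedged sketch of how a create request for the Hot Backup HTTP API could be prepared in Python. The route /_admin/backup/create and the body attributes label and timeout are assumptions based on our reading of the API and should be verified against the HTTP API reference; the function name and the Coordinator URL are hypothetical, and authentication is left out.

```python
import json
import urllib.request


def build_create_request(coordinator_url: str, label: str,
                         allow_inconsistent: bool = False,
                         timeout_s: int = 120) -> urllib.request.Request:
    """Prepare (but do not send) a hot backup create request."""
    body = {
        "label": label,
        # Assumed meaning: seconds to keep trying for the global lock.
        "timeout": timeout_s,
        # If true, a backup is taken even if not all locks were obtained.
        "allowInconsistent": allow_inconsistent,
    }
    return urllib.request.Request(
        url=coordinator_url.rstrip("/") + "/_admin/backup/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the request to the Coordinator.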

Technical Details

  • The Global Write Transaction Lock

The global write transaction lock mentioned above is such a determining factor that it deserves a more detailed look.

In order to create a consistent snapshot of the ArangoDB world on a specific single server or cluster deployment, one must stop all transactional write operations at the next possible time, or else consistency would no longer be given.

On the other hand, there is no way for ArangoDB to know when that time will come. It might be there with the next attempt a nanosecond away, but it could also not come for the next 2 minutes.

ArangoDB tries to obtain that lock over and over again. On single server instances these consecutive tries are not noticeable. At some point the lock is obtained and the hot backup is then created within a very short amount of time.

In clusters things are a little more complicated and noticeable. A Coordinator that is trying to obtain the global write transaction lock must try to get local locks on all DB-Servers simultaneously, potentially succeeding on some and not succeeding on others, leading to apparent dead times in the cluster’s write operations.

This process can happen multiple times until success is achieved. One has control over the length of time during which the lock acquisition is attempted. Each attempt prolongs the previous wait time by 10%.
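
The retry behavior can be sketched as a small simulation. This is not the Coordinator's actual code: threading.Lock objects stand in for the local locks on the DB-Servers, the locks are tried one after another rather than truly simultaneously, and the names acquire_all, first_wait and max_total are hypothetical.

```python
import threading
from typing import List


def acquire_all(locks: List[threading.Lock], first_wait: float,
                max_total: float, growth: float = 1.10) -> bool:
    """Try to hold every lock at once, retrying with growing wait periods.

    Returns True with all locks held, or False with none held.
    """
    wait, spent = first_wait, 0.0
    while spent < max_total:
        held = []
        for lock in locks:
            if lock.acquire(timeout=wait):
                held.append(lock)
            else:
                break  # one lock is busy, so this attempt has failed
        if len(held) == len(locks):
            return True
        # Release the partial set so that writes can continue meanwhile.
        for lock in held:
            lock.release()
        spent += wait
        wait *= growth  # the next attempt waits 10% longer
    return False
```

The window between a partial acquisition and its release corresponds to the apparent dead time described above.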

  • Agency Lock

Less of a variable, but equally important, is obtaining a freeze on the cluster’s structure itself. This is done through the creation of a simple key lock in the cluster’s configuration to stop all ongoing background tasks, which are there to handle failovers, shard movements, server removals etc. Its role is also to prevent multiple simultaneous hot backup operations. The acquisition of this key is predictably done within a matter of a few seconds.

  • Operation’s Time Scope

Once the global write transaction lock is obtained, everything goes very quickly. A new backup directory is created, the write-ahead log is flushed and hard links are made on file system level to all persistent files. The duration is not affected by the amount of data in ArangoDB and is near instantaneous.

  • Point in Time Recovery

One of the great advantages of the method is the consistent snapshot nature. It gives the operator of the database the ability to persist a true and complete time freeze at near zero impact on the ongoing operation. The recovery is easy and restores the entire ArangoDB installation to a desired snapshot.

Apart from the ability to create such snapshots, it offers a great and easy to use opportunity to experiment with ArangoDB with a means to protect against data loss or corruption.

  • Remote Upload and Download

We have fully integrated the Rclone sync for cloud storage. Rclone is a very versatile inter-site sync facility, which opens up a vast field of transport protocols and remote syncing APIs from Amazon’s S3 over Dropbox, WebDAV, all the way to the local file system and network storage.

One can use the upload and download functionalities to migrate entire cluster installations in this way, copy cluster and single server snapshots all over the world, and create an intuitive and easy to use quick access safety backbone of the data operation.

Rclone is open source and available under the MIT license, is battle tested and has garnered close to 15k stars on GitHub, attesting to the confidence of lots of users.
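
A transfer could be scripted along the following lines. The sketch only builds the argument list for the arangobackup tool; the options --rclone-config-file, --identifier and --remote-path are used as we understand them and should be verified against the arangobackup reference, while the function name and all values passed in are hypothetical placeholders.

```python
from typing import List


def transfer_command(action: str, endpoint: str, identifier: str,
                     remote_path: str, rclone_config: str) -> List[str]:
    """Build an arangobackup upload or download invocation."""
    if action not in ("upload", "download"):
        raise ValueError(f"unsupported action: {action}")
    return [
        "arangobackup", action,
        "--server.endpoint", endpoint,
        # File describing the remote storage for Rclone.
        "--rclone-config-file", rclone_config,
        # Identifier of the hot backup, i.e. <timestamp>_<backup-label>.
        "--identifier", identifier,
        "--remote-path", remote_path,
    ]
```

The resulting list can be passed to subprocess.run on a machine where arangobackup is installed.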

Hot Backup Limitations

ArangoDB hot backups impose limitations with respect to storage engine, storage usage, upgrades, deployment scheme, etc. Please review the list of limitations below closely to determine which operations they might or might not be suited for.

  • Global Scope

In order to be able to create hot backups instantaneously, they are created on the file system level and thus well below any structural entity related to databases, collections, indexes, users, etc.

As a consequence, a hot backup is a backup of the entire ArangoDB single server or cluster. In other words, one cannot restore to an older hot backup of a single collection or database. With every restore, one restores the entire deployment including of course the _system database.

Note that this applies in particular in the case that a certain user might have admin access for the _system database, but explicitly has no access to certain collections. The backup will still extend across all collections!

It cannot be stressed enough that a restore to an earlier hot backup snapshot will also revert users, graphs, Foxx apps and everything else back to their state at the time of the hot backup.

  • Cluster’s Special Limitations

Creating hot backups can only be done while the internal structure of the cluster remains unaltered. The background of this limitation lies in the distributed nature and the asynchronicity of creation, alteration and dropping of cluster databases, collections and indexes.

It must be ensured that no such changes are made to the cluster’s inventory during the hot backup, as this could lead to inconsistent hot backups.

  • Restoring from a Different Version

Hot backups share the same limitations with respect to different versions as ArangoDB itself. This means that a hot backup created with some version a.b.c can without any limitations be restored on any version a.b.d with d not equal to c, that is, the patch level can be changed arbitrarily. With respect to minor versions (second number, b), one can only upgrade and not downgrade. That is, a hot backup created with a version a.b.c can be restored on a version a.d.e for d greater than b but not for d less than b. At this stage, we do not guarantee any compatibility between versions with a different major version number (first number).
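
The version rule can be written down as a small check. The helper names are hypothetical, and the function only encodes the rule as stated in this section: same major version, equal or higher minor version, any patch level.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Split a version string of the form a.b.c into three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def can_restore(backup_version: str, target_version: str) -> bool:
    """Return True if the rule above permits restoring on target_version."""
    b_major, b_minor, _ = parse_version(backup_version)
    t_major, t_minor, _ = parse_version(target_version)
    if b_major != t_major:
        return False  # no compatibility guarantee across major versions
    # The patch level is ignored; the minor version may only stay or grow.
    return t_minor >= b_minor
```

For example, a backup from 3.5.4 passes the check for 3.5.1 and 3.6.0, whereas a backup from 3.6.0 does not pass it for 3.5.9.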

  • Identical Topology

Unlike dumps created with arangodump and restored with arangorestore, hot backups can only be restored to the same type and structure of deployment. This means that one cannot restore a 3-node ArangoDB cluster’s hot backup to any other deployment than another 3-node ArangoDB cluster of the same version.

  • RocksDB Storage Engine Only

Hot backups rely on the creation of hard links to actual RocksDB data files and directories. The same or equivalent file system level mechanisms are not available to MMFiles deployments.

  • Storage Space

Without the creation of hot backups, RocksDB keeps compacting the file system level files as the operation continues. Compacted files are subsequently deleted automatically. Every hot backup needs to hold on to the files as they were at the moment of the hot backup creation, thus preventing the deletions and consequently growing the storage space of the ArangoDB data directory. That growth of course depends on the amount of write operations per unit of time.

This is a crucial factor for sustained operation and might require significantly higher storage reservation for the ArangoDB instances involved and a much more fine grained monitoring of storage usage than before.

Also note that in a cluster each RocksDB instance will be backed up individually and hence the overall storage space will be the sum of all RocksDB instances (i.e., data which is replicated between instances will not be de-duplicated for performance reasons).
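
For monitoring, it can help to measure a data directory in a way that counts hard-linked files only once. The following sketch does this by remembering device and inode numbers; the function name is hypothetical, and it sums apparent file sizes rather than allocated blocks.

```python
import os


def unique_disk_usage(root: str) -> int:
    """Sum file sizes under root, counting each hard-linked file once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            key = (info.st_dev, info.st_ino)  # identifies the file itself
            if key not in seen:
                seen.add(key)
                total += info.st_size
    return total
```

Right after a hot backup the result is unchanged, because the backup only adds links. It grows once compaction replaces the live files while the backup still holds on to the old ones.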

  • Global Transaction Lock

In order to be able to create consistent hot backups, it is mandatory to get a very brief global transaction lock across the entire installation. In single server deployments constant invocation of very long running transactions could prevent that from ever happening during a timeout period. The same holds true for clusters, where this lock must now be obtained on all DB-Servers at the same time.

Especially in the cluster the result of these successively longer tries to obtain the global transaction lock might become visible in periods of apparent dead time. Locks might be obtained on some machines and not on others, so that the process has to be retried over and over. Every unsuccessful try would then lead to the release of all partial locks.

The arangobackup tool provides a --force option since ArangoDB v3.6.0 that can be used to abort ongoing write transactions and thus to more quickly obtain the global transaction lock.

At this stage, index creation constitutes a write transaction, which means that during index creation one cannot create a hot backup. We intend to lift this limitation in a future version.

  • Services on Single Server

On a single server the installed Foxx microservices are not backed up and are therefore also not restored. This is because in single server mode the service installation is done locally in the file system and does not track the information in the _apps collection.

In a cluster, the Coordinators will eventually restore the state of the services from the _apps and _appbundles collections after a backup is restored.

  • Encryption at Rest

Currently, the hot backup simply takes a snapshot of the database files. If one is using encryption at rest, then the backed up files will be encrypted with the encryption key that was used in the instance which created the backup.

Such an encrypted backup can only be restored to an instance using the same encryption key.

  • Replication and Hot Backup

Hot backups are not automatically replicated between instances. This is true for both the Active Failover setup with 2 (or more) single servers and for the Datacenter to Datacenter Replication between clusters. To cover all instances, take hot backups on each of them.

  • Known Issues

See the list of Known Issues in ArangoDB v3.6.