Arangorestore Examples

To restore data from a dump previously created with Arangodump, ArangoDB provides the arangorestore tool.

In versions older than 3.3, arangorestore must not be used to create several similar database instances in one installation.

This means that if you have an Arangodump output of database A, create a second database B on the same instance of ArangoDB, and restore the dump of A into B, data integrity cannot be guaranteed. This limitation was removed in ArangoDB v3.3.0.

Invoking Arangorestore

arangorestore can be invoked from the command-line as follows:

arangorestore --input-directory "dump"

This will connect to an ArangoDB server (tcp://127.0.0.1:8529 by default), then restore the collection structure and the documents from the files found in the input directory dump. Note that the input directory must have been created by running arangodump before.

By default, arangorestore will connect to the _system database using the default endpoint. To override the endpoint, or specify a different user, use one of the following startup options:

--server.endpoint <string>: endpoint to connect to
--server.username <string>: username
--server.password <string>: password to use (omit this and you'll be prompted for the password)
--server.authentication <bool>: whether or not to use authentication

If you want to connect to a different database or restore all databases, you can additionally use the following startup options:

--server.database <string>: name of the database to connect to. Defaults to the _system database.
--all-databases true: restore multiple databases from a dump which used the same option. Introduced in v3.5.0.

Note that the specified user must have access to the database(s).
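The connection options above can be combined into a single call. The following is a minimal sketch that assembles such a command line from environment variables and prints it for review instead of running it. The variable names ARANGO_ENDPOINT, ARANGO_USER and ARANGO_DB and the function name build_restore_cmd are assumptions made for this example; arangorestore itself does not read them.

```shell
#!/bin/sh
# Sketch: assemble an arangorestore command line from environment variables.
# ARANGO_ENDPOINT, ARANGO_USER and ARANGO_DB are hypothetical names used only
# in this example; arangorestore does not read them on its own.

build_restore_cmd() {
    # $1: input directory that was created earlier by arangodump
    cmd="arangorestore"
    cmd="$cmd --server.endpoint ${ARANGO_ENDPOINT:-tcp://127.0.0.1:8529}"
    cmd="$cmd --server.username ${ARANGO_USER:-root}"
    cmd="$cmd --server.database ${ARANGO_DB:-_system}"
    # --server.password is left out on purpose so that the tool prompts for it.
    cmd="$cmd --input-directory \"$1\""
    printf '%s\n' "$cmd"
}

# Print the command rather than executing it, so it can be checked first.
build_restore_cmd "dump"
```

Printing the command first is a deliberate choice in this sketch: a restore drops and re-creates collections by default, so it is worth reviewing the target endpoint and database before executing anything.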

Since version 2.6, arangorestore provides the option --create-database. Setting this option to true will create the target database if it does not exist. When creating the target database, the username and password passed to arangorestore (in options --server.username and --server.password) will be used to create an initial user for the new database.

The option --force-same-database allows restricting arangorestore operations to a database with the same name as in the source dump's dump.json file. It can thus be used to prevent restoring data into a "wrong" database by accident.

For example, if a dump was taken from database A, and the restore is attempted into database B, then with the --force-same-database option set to true, arangorestore will abort instantly.

The --force-same-database option is set to false by default to ensure backwards compatibility.

Here's an example of reloading data to a non-standard endpoint, using a dedicated database name:

arangorestore --server.endpoint tcp://192.168.173.13:8531 --server.username backup --server.database mydb --input-directory "dump"

To create the target database when restoring, use a command like this:

arangorestore --server.username backup --server.database newdb --create-database true --input-directory "dump"

In contrast to the above calls, when working with multiple databases using --all-databases true, the parameter --server.database mydb must not be specified:

arangorestore --server.username backup --all-databases true --create-database true --input-directory "dump-multiple"

arangorestore will print out its progress while running, and will end with a line showing some aggregate statistics:

Processed 2 collection(s), read 2256 byte(s) from datafiles, sent 2 batch(es)

By default, arangorestore will re-create all non-system collections found in the input directory and load data into them. If the target database already contains collections which are also present in the input directory, the existing collections in the database will be dropped and re-created with the data found in the input directory.

The following parameters are available to adjust this behavior:

--create-collection <bool>: set to true to create collections in the target database. If the target database already contains a collection with the same name, it will be dropped first and then re-created with the properties found in the input directory. Set to false to keep existing collections in the target database. If set to false and arangorestore encounters a collection that is present in the input directory but not in the target database, it will abort. The default value is true.
--import-data <bool>: set to true to load document data into the collections in the target database. Set to false to not load any document data. The default value is true.
--include-system-collections <bool>: whether or not to include system collections when re-creating collections or reloading data. The default value is false.

For example, to (re-)create all non-system collections and load document data into them, use:

arangorestore --create-collection true --import-data true --input-directory "dump"

This will drop potentially existing collections in the target database that are also present in the input directory.

To include system collections too, use --include-system-collections true:

arangorestore --create-collection true --import-data true --include-system-collections true --input-directory "dump"

To (re-)create all non-system collections without loading document data, use:

arangorestore --create-collection true --import-data false --input-directory "dump"

This will also drop existing collections in the target database that are also present in the input directory.

To just load document data into all non-system collections, use:

arangorestore --create-collection false --import-data true --input-directory "dump"
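The three combinations of --create-collection and --import-data shown above can be summarized in a small helper. This is only a sketch: the function name restore_flags and the mode names full, structure and data are made up for this example and are not part of arangorestore.

```shell
#!/bin/sh
# Sketch: map a restore mode to the flag combinations described above.
# The mode names (full, structure, data) are hypothetical labels.

restore_flags() {
    case "$1" in
        # Drop and re-create collections, then load the documents.
        full)      printf '%s\n' "--create-collection true --import-data true" ;;
        # Drop and re-create collections, but leave them empty.
        structure) printf '%s\n' "--create-collection true --import-data false" ;;
        # Keep existing collections and only load the documents.
        data)      printf '%s\n' "--create-collection false --import-data true" ;;
        *)         printf 'unknown mode: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Show the flags for each mode.
for mode in full structure data; do
    printf '%s: %s\n' "$mode" "$(restore_flags "$mode")"
done
```

The output of the helper could then be spliced into a call such as arangorestore $(restore_flags data) --input-directory "dump", relying on the shell's word splitting to separate the flags.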

To restrict reloading to just specific collections, there is the --collection option. It can be specified multiple times if required:

arangorestore --collection myusers --collection myvalues --input-directory "dump"

Collections will be processed in alphabetical order by arangorestore, with all document collections being processed before all edge collections. This remains valid also when multiple threads are in use (from v3.4.0 on).

Note however that when restoring an edge collection, no internal checks are made to validate that the documents that the edges connect exist. As a consequence, when restoring individual collections which are part of a graph, you are not required to restore in a specific order.

When restoring only a subset of collections of your database, and graphs are in use, you will need to make sure you are restoring all the needed collections (the ones that are part of the graph), as otherwise you might have edges pointing to non-existing documents.

To restrict reloading to specific views, there is the --view option. Should you specify the --collection parameter, views will not be restored unless you explicitly specify them via the --view option.

arangorestore --collection myusers --view myview --input-directory "dump"

In the case of an arangosearch view, you must make sure that the linked collections are either also restored or already present on the server.

Encryption

See Arangodump for details.

Reloading Data into a different Collection

arangorestore will restore document and edge data with the exact same _key, _rev, _from and _to values as found in the input directory.

With some creativity you can also use arangodump and arangorestore to transfer data from one collection into another (either on the same server or not). For example, to copy data from a collection myvalues in database mydb into a collection mycopyvalues in database mycopy, you can start with the following command:

arangodump --collection myvalues --server.database mydb --output-directory "dump"

This will create two files, myvalues.structure.json and myvalues.data.json, in the output directory. To load data from the datafile into an existing collection mycopyvalues in database mycopy, rename the files to mycopyvalues.structure.json and mycopyvalues.data.json.

After that, run the following command:

arangorestore --collection mycopyvalues --server.database mycopy --input-directory "dump"
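The rename step described above can be sketched as a small shell function. It assumes the dump files carry exactly the names given in the text (myvalues.structure.json and myvalues.data.json); the file names in your own dump directory may differ, so check them first. The function name rename_dump_files is hypothetical.

```shell
#!/bin/sh
# Sketch: rename the dump files of one collection so that they can be
# restored under another collection name.
# Assumes the files are named <collection>.structure.json and
# <collection>.data.json, as in the text above.

rename_dump_files() {
    # $1: dump directory, $2: source collection name, $3: target collection name
    # Check that both files exist before touching anything.
    for suffix in structure.json data.json; do
        if [ ! -f "$1/$2.$suffix" ]; then
            printf 'missing dump file: %s\n' "$1/$2.$suffix" >&2
            return 1
        fi
    done
    # Rename both files to the target collection name.
    for suffix in structure.json data.json; do
        mv -- "$1/$2.$suffix" "$1/$3.$suffix"
    done
}

# Usage, matching the example in the text:
#   rename_dump_files "dump" myvalues mycopyvalues
```

Checking for both files before renaming either one avoids leaving the dump directory in a half-renamed state if one of the files is missing.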

Restoring in a Cluster

From v2.1 on, the arangorestore tool supports sharding and can be used to restore data into a Cluster. Simply point it to one of the Coordinators in your Cluster and it will work as usual but on sharded collections in the Cluster.

If arangorestore is asked to restore a collection, it will use the same number of shards, replication factor and shard keys as when the collection was dumped. The distribution of the shards to the servers will also be the same as at the time of the dump, provided that the number of DBServers in the cluster dumped from is identical to the number of DBServers in the to-be-restored-to cluster.

To modify the number of shards or the replication factor for all or just some collections, arangorestore provides the options --number-of-shards and --replication-factor (starting from v3.3.22 and v3.4.2). These options can be specified multiple times as well, in order to override the settings for dedicated collections, e.g.

arangorestore --number-of-shards 2 --number-of-shards mycollection=3 --number-of-shards test=4

The above will restore all collections except “mycollection” and “test” with 2 shards. “mycollection” will have 3 shards when restored, and “test” will have 4. It is possible to omit the default value and only use collection-specific overrides. In this case, the number of shards for any collections not overridden will be determined by looking into the “numberOfShards” values contained in the dump.
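When there are many per-collection overrides, the repeated --number-of-shards options can be generated rather than typed by hand. The following is a minimal sketch; the function name shard_flags is hypothetical, and passing an empty first argument stands for omitting the default value as described above.

```shell
#!/bin/sh
# Sketch: build the repeated --number-of-shards options from a default value
# and a list of collection=shards overrides.

shard_flags() {
    # $1: default number of shards; pass "" to keep the values from the dump
    # remaining arguments: overrides of the form collection=shards
    default="$1"
    shift
    flags=""
    if [ -n "$default" ]; then
        flags="--number-of-shards $default"
    fi
    for override in "$@"; do
        # Add a separating space only if something was emitted already.
        flags="${flags:+$flags }--number-of-shards $override"
    done
    printf '%s\n' "$flags"
}

# Reproduces the options of the example above.
shard_flags 2 mycollection=3 test=4
```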

The --replication-factor option works in the same way, e.g.

arangorestore --replication-factor 2 --replication-factor mycollection=1

will set the replication factor to 2 for all collections but “mycollection”, which will get a replication factor of just 1.

The options --number-of-shards and --replication-factor, as well as the deprecated options --default-number-of-shards and --default-replication-factor, are not applicable to system collections. They are managed by the server.

If a collection was dumped from a single instance and is then restored into a cluster, the sharding will be done by the _key attribute by default. One can manually edit the structural description for the shard keys in the dump files if required (*.structure.json).

If you restore a collection that was dumped from a cluster into a single ArangoDB instance, the number of shards, replication factor and shard keys will silently be ignored.

Factors affecting speed of arangorestore in a Cluster

The following factors affect speed of arangorestore in a Cluster:

Replication Factor: the higher the replication factor, the more time the restore will take. To speed up the restore you can restore using a replication factor of 1 and then increase it again after the restore. This will reduce the number of network hops needed during the restore.
Restore Parallelization: if the collections are not restored in parallel, the restore speed is highly affected. A parallel restore can be done from v3.4.0 by using the --threads option of arangorestore. Before v3.4.0 it is possible to achieve parallelization by restoring on multiple Coordinators at the same time. Depending on your specific case, parallelizing on multiple Coordinators can still be useful even when the --threads option is in use (from v3.4.0).

Please refer to the Fast Cluster Restore page for further operational details on how to take the two factors described above into account when restoring with arangorestore.

Restoring collections with sharding prototypes

arangorestore will yield an error when trying to restore a collection whose shard distribution follows a collection which does not exist in the cluster and which was not dumped along with it:

arangorestore --collection clonedCollection --server.database mydb --input-directory "dump"
ERROR got error from server: HTTP 500 (Internal Server Error): ArangoError 1486: must not have a distributeShardsLike attribute pointing to an unknown collection
Processed 0 collection(s), read 0 byte(s) from datafiles, sent 0 batch(es)

The collection can be restored by overriding the error as follows:

arangorestore --collection clonedCollection --server.database mydb --input-directory "dump" --ignore-distribute-shards-like-errors

Restore into an authentication-enabled ArangoDB

Of course you can restore data into a password-protected ArangoDB as well. However, this requires certain user rights for the user used in the restore process. The rights are described in detail in the Managing Users chapter. For restore, this short overview is sufficient:

When importing into an existing database, the given user needs Administrate access on this database.
When creating a new database during restore, the given user needs Administrate access on _system. The user will be promoted with Administrate access on the newly created database.