Active Failover Architecture

An Active Failover is defined as:

One ArangoDB Single-Server instance, called the Leader, which clients can read from and write to
One or more ArangoDB Single-Server instances, called Followers, which are passive and not writable and which asynchronously replicate data from the Leader
At least one Agency acting as a “witness” to determine which server becomes the Leader in a failure situation

The advantage of the Active Failover setup compared to the traditional Master/Slave setup is that there is an active third party, the Agency, which observes and supervises all involved server processes. Follower instances can rely on the Agency to determine the correct Leader server. From an operational point of view, one advantage is that the failover is automatic if the Leader goes down. An additional operational advantage is that there is no need to start a replication applier manually.

The Active Failover setup is made resilient by the fact that all the official ArangoDB drivers can automatically determine the correct Leader server and redirect requests appropriately. Furthermore, Foxx services also fail over automatically: should the Leader instance (which is also the Foxxmaster) fail, the newly elected Leader will reinstall all Foxx services and resume executing queued Foxx tasks. Database users that were created on the Leader will also be valid on the newly elected Leader, provided that they had already been synced.

Consider the case of two arangod instances. The two servers are connected via server-wide (global) asynchronous replication. One of the servers is elected Leader, and the other one is made a Follower automatically. At startup, the two servers race for the leadership position. This happens through the Agency locking mechanism (which means that the Agency needs to be available at server start). You can control which server becomes Leader by starting it earlier than the other server instances.
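The startup race can be pictured as a "first claim wins" lock. The following is only a toy in-memory model written for illustration: the class ToyAgency and the method try_become_leader are hypothetical names and do not correspond to the real Agency API or its protocol.

```python
import threading
from typing import Optional


class ToyAgency:
    """In-memory stand-in for the Agency; NOT the real Agency API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._leader: Optional[str] = None

    def try_become_leader(self, server_id: str) -> bool:
        """Atomically claim leadership if nobody holds it yet.

        Returns True if server_id is the Leader after the call.
        """
        with self._lock:
            if self._leader is None:
                self._leader = server_id
            return self._leader == server_id

    def leader(self) -> Optional[str]:
        """Return the id of the current Leader, or None if unclaimed."""
        with self._lock:
            return self._leader
```

In this model, the server that calls try_become_leader first keeps the Leader role, and every later caller learns that it has to act as a Follower. This mirrors why starting one server earlier than the others lets you choose the initial Leader.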

The Follower will automatically start replication from the Leader for all available databases, using the server-level replication introduced in version 3.3.

When the Leader goes down, this is automatically detected by the Agency instance, which is also started in this mode. This instance will make the previous Follower stop its replication and make it the new Leader.

The different instances participating in an Active Failover setup are supposed to be run in the same Data Center (DC), with a reliable high-speed network connection between all the machines participating in the Active Failover setup.

Multi-datacenter Active Failover setups are currently not supported.

A multi-datacenter solution that is currently supported is Datacenter to Datacenter replication (DC2DC) among ArangoDB Clusters. See the DC2DC chapter for details.

Operative Behavior

In contrast to the normal behavior of a single-server instance, the Active Failover mode changes the behavior of ArangoDB in some situations.

The Follower will always deny write requests from client applications. Starting with ArangoDB 3.4, read requests are only permitted if the request is marked with the X-Arango-Allow-Dirty-Read: true header; otherwise they are denied too. Only the replication itself is allowed to access the Follower’s data until the Follower becomes the new Leader (should a failover happen).

When a request to read or write data is sent to a Follower, the Follower will respond with HTTP 503 (Service Unavailable) and provide the address of the current Leader. Client applications and drivers can use this information to then make a follow-up request to the proper Leader:

HTTP/1.1 503 Service Unavailable
X-Arango-Endpoint: http://[::1]:8531
....
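As a rough sketch of how a client might act on such a response, the following Python code follows the X-Arango-Endpoint hint a bounded number of times. The function names and the Transport callable are hypothetical stand-ins for a real HTTP client, and the sketch assumes the header value can be used directly as the next endpoint; it is not how any particular driver is implemented.

```python
from typing import Callable, Dict, Optional, Tuple

# A transport is any callable that sends one request to an endpoint and
# returns (status_code, response_headers). It is a hypothetical stand-in
# for a real HTTP client.
Transport = Callable[[str], Tuple[int, Dict[str, str]]]


def leader_hint(status: int, headers: Dict[str, str]) -> Optional[str]:
    """Return the Leader address from a 503 response, or None."""
    if status != 503:
        return None
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("x-arango-endpoint")


def send_with_failover(
    send: Transport, endpoint: str, max_redirects: int = 3
) -> Tuple[str, int]:
    """Send a request, following Leader hints at most max_redirects times.

    Returns the endpoint that produced the final response and its status.
    """
    status, headers = send(endpoint)
    for _ in range(max_redirects):
        hint = leader_hint(status, headers)
        if hint is None or hint == endpoint:
            break  # no redirect available, or the hint points back at us
        endpoint = hint
        status, headers = send(endpoint)
    return endpoint, status
```

Bounding the number of redirects keeps the client from looping if a failover is still in progress and the servers briefly disagree about who the Leader is.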

Client applications can also detect who the current Leader and the Followers are by calling the /_api/cluster/endpoints REST API. This API is accessible on Leader and Followers alike.
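A minimal sketch of interpreting the response of this API is shown below. It rests on two assumptions that are not stated in this section and should be checked against the HTTP API reference for your version: that the JSON body contains an "endpoints" array of objects with an "endpoint" attribute, and that the Leader is listed first. The function name split_endpoints is hypothetical.

```python
import json
from typing import List, Tuple


def split_endpoints(body: str) -> Tuple[str, List[str]]:
    """Split a /_api/cluster/endpoints response body into (leader, followers).

    Assumes a body of the form
    {"error": false, "endpoints": [{"endpoint": "..."}, ...]}.
    """
    doc = json.loads(body)
    if doc.get("error"):
        raise ValueError("server reported an error")
    endpoints = [entry["endpoint"] for entry in doc.get("endpoints", [])]
    if not endpoints:
        raise ValueError("no endpoints in response")
    # Assumption: the Leader is the first entry, Followers come after it.
    return endpoints[0], endpoints[1:]
```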

Reading from Followers

Followers in the Active Failover setup are in read-only mode. It is possible to read from these Followers by adding an X-Arango-Allow-Dirty-Read: true header to each request. Responses will then automatically contain the X-Arango-Potential-Dirty-Read: true header so that clients can reject accidental dirty reads.
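The two headers can be handled with a pair of small helpers, sketched here in Python. The helper names are hypothetical and the code only builds and inspects header mappings; sending the request is left to whichever HTTP client or driver you use.

```python
from typing import Dict, Mapping

ALLOW_DIRTY_READ = "X-Arango-Allow-Dirty-Read"
POTENTIAL_DIRTY_READ = "X-Arango-Potential-Dirty-Read"


def with_dirty_read(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the request headers that opts in to dirty reads."""
    merged = dict(headers)
    merged[ALLOW_DIRTY_READ] = "true"
    return merged


def is_potentially_dirty(response_headers: Mapping[str, str]) -> bool:
    """Report whether a response is flagged as a potential dirty read."""
    # Header names are case-insensitive, so compare in lower case.
    for name, value in response_headers.items():
        if name.lower() == POTENTIAL_DIRTY_READ.lower():
            return value.strip().lower() == "true"
    return False
```

A client that must never act on stale data can call is_potentially_dirty on every response and discard or retry any result for which it returns True.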

Depending on the driver support for your specific programming language, you should be able to enable this option.

Tooling Support

The tool ArangoDB Starter supports starting two servers with asynchronous replication and failover out of the box.

The arangojs driver for JavaScript, the Go driver, the Java driver, and the PHP driver support Active Failover in case the currently accessed server endpoint responds with HTTP 503.