Failover overview

Services in your mesh may become unhealthy or unreachable for many reasons, but you can mitigate some of the effects associated with infrastructure issues by configuring Consul to automatically route traffic to and from failover service instances. This topic provides an overview of the failover strategies you can implement with Consul.

Service failover strategies in Consul

There are several methods for implementing failover strategies between datacenters in Consul. You can adopt one of the following strategies based on your deployment configuration and network requirements:

  • Configure the Failover stanza in a service resolver configuration entry to explicitly define which services should fail over and the targeting logic they should follow.
  • Create a prepared query for each service to automate geo-failover.
  • Create a sameness group to identify partitions with identical namespaces and service names to establish default failover targets.

The following table compares these strategies in deployments with multiple datacenters to help you determine the best approach for your service:

Failover Strategy | Supports WAN Federation | Supports Cluster Peering | Multi-Datacenter Failover Strength | Multi-Datacenter Usage Scenario
Failover stanza | Yes | Yes | Enables more granular logic for failover targeting | Configuring failover for a single service or service subset, especially for testing or debugging purposes
Prepared query | Yes | No | Central policies that can automatically target the nearest datacenter | WAN-federated deployments where a primary datacenter is configured. Prepared queries are not replicated over peer connections.
Sameness groups | No | Yes | Group size changes without edits to existing member configurations | Cluster peering deployments with consistently named services and namespaces

Failover configurations for a service mesh with a single datacenter

You can implement a service resolver configuration entry and specify a pool of failover service instances that other services can exchange messages with when the primary service becomes unhealthy or unreachable. We recommend adopting this strategy as a minimum baseline when implementing Consul service mesh and layering additional failover strategies to build resilience into your application network.
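As a rough illustration of the service resolver approach, the following Python sketch builds a service-resolver configuration entry as JSON. The service names `api` and `api-backup` are hypothetical, and the helper function is only for illustration. The sketch assumes you save the printed JSON to a file and apply it with `consul config write`; it does not contact a Consul server.

```python
import json


def service_resolver_failover(service, failover_service):
    """Build a service-resolver entry that fails over to another service.

    Both arguments are hypothetical service names used for illustration.
    """
    return {
        "Kind": "service-resolver",
        "Name": service,
        # The "*" key applies the failover policy to every subset of the service.
        "Failover": {
            "*": {"Service": failover_service},
        },
    }


entry = service_resolver_failover("api", "api-backup")
print(json.dumps(entry, indent=2))
```

With this entry in place, requests for `api` would be routed to `api-backup` instances when no healthy `api` instances are available.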

Refer to the Failover configuration for examples of how to configure failover services in the service resolver configuration entry for both VM and Kubernetes deployments.

Failover configuration for WAN-federated datacenters

If your network has multiple Consul datacenters that are WAN-federated, you can configure your applications to look for failover services with prepared queries. Prepared queries are configurations that enable you to define complex service discovery lookups. This strategy depends on the secondary datacenter containing service instances that have the same name and reside in the same namespace as their counterparts in the primary datacenter.
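The following Python sketch shows what a prepared query definition with a failover policy might look like. The query name `api-failover`, the service name `api`, and the datacenter names `dc2` and `dc3` are hypothetical. The sketch only builds the JSON body, which you could then send to the `/v1/query` HTTP API endpoint; it does not make the request itself.

```python
import json


def geo_failover_query(name, service, nearest_n=2, datacenters=()):
    """Build a prepared query definition with a failover policy.

    All names passed in are hypothetical and used for illustration only.
    """
    return {
        "Name": name,
        "Service": {
            "Service": service,
            # Return only instances whose health checks are passing.
            "OnlyPassing": True,
            "Failover": {
                # Try the nearest N datacenters first, by network round trip time.
                "NearestN": nearest_n,
                # Then try these datacenters in the order listed.
                "Datacenters": list(datacenters),
            },
        },
    }


query = geo_failover_query("api-failover", "api", datacenters=["dc2", "dc3"])
print(json.dumps(query, indent=2))
```

Applications can then look up the query by name instead of the service name, for example through the DNS interface as `api-failover.query.consul`.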

Refer to the Automate geo-failover with prepared queries tutorial for additional information.

Failover configuration for peered clusters and partitions

In networks with multiple datacenters or partitions that share a peer connection, each datacenter or partition functions as an independent unit. As a result, Consul does not correlate services that have the same name, even if they are in the same namespace.

You can configure sameness groups for this type of network. A sameness group defines a set of admin partitions where identical services are deployed in identical namespaces. After you configure the sameness group, you can reference it with the SamenessGroup parameter in service resolver, exported service, and service intention configuration entries. You can then add or remove cluster peers from the group without updating the configuration on every cluster peer.
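As a minimal sketch of how these pieces might fit together, the following Python code builds a sameness-group configuration entry and a service-resolver entry that references it. The group name `example-sg`, the partition names, the peer name `dc2-partition-1`, and the service name `api` are all hypothetical. The code only prints JSON that you could apply with `consul config write`.

```python
import json


def sameness_group(name, partition, members, default_for_failover=False):
    """Build a sameness-group entry; all names are hypothetical."""
    return {
        "Kind": "sameness-group",
        "Name": name,
        "Partition": partition,
        "DefaultForFailover": default_for_failover,
        # Each member is a local partition or a cluster peer.
        "Members": members,
    }


def resolver_with_sameness_group(service, group_name):
    """Build a service-resolver entry that fails over within a sameness group."""
    return {
        "Kind": "service-resolver",
        "Name": service,
        "Failover": {"*": {"SamenessGroup": group_name}},
    }


group = sameness_group(
    "example-sg",
    "partition-1",
    members=[
        {"Partition": "partition-1"},
        {"Partition": "partition-2"},
        {"Peer": "dc2-partition-1"},
    ],
)
resolver = resolver_with_sameness_group("api", group["Name"])

print(json.dumps(group, indent=2))
print(json.dumps(resolver, indent=2))
```

Because the resolver refers to the group by name, changing the group's membership means editing only the `Members` list; the resolver entry stays the same.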

Refer to the Sameness groups usage page for more information.