Clustering / High Availability on Docker Swarm with Consul

This guide explains how to run Traefik in high availability mode on a Docker Swarm cluster, using Let's Encrypt for certificates.

Why do we need Traefik in cluster mode? Shouldn't running multiple instances work out of the box?

If you want to use Let's Encrypt with Traefik, and therefore share configuration and TLS certificates between many Traefik instances, you need Traefik in cluster/HA mode.

OK, could we mount a shared volume used by all our instances? We could, but it would not work. When you use Let's Encrypt, you need to store more than just certificates. When Traefik requests a new certificate, it sets up a challenge, and when Let's Encrypt verifies ownership of the domain, it calls back to fetch that challenge. That request can reach any of the Traefik instances, so if the challenge is not known by the other instances, the validation will fail.

For more information about the challenge, see Automatic Certificate Management Environment (ACME).
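The failure mode described above can be modelled in a few lines of Python. This is a toy model, not Traefik code: the Instance class, the in-memory dicts standing in for challenge storage, and the validate helper are all hypothetical. The only real detail is that HTTP-01 responses are served under /.well-known/acme-challenge/<token>. Validation is treated as successful only if every instance can answer, because the callback may be routed to any of them.

```python
# Toy model only: hypothetical classes, not Traefik's implementation.
CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


class Instance:
    """One Traefik replica holding HTTP-01 challenge responses in `store`."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # token -> key authorization

    def start_challenge(self, token, key_authorization):
        # The instance that requested the certificate records the response.
        self.store[token] = key_authorization

    def serve(self, path):
        # Answer a validation request, or None if the token is unknown here.
        if not path.startswith(CHALLENGE_PREFIX):
            return None
        return self.store.get(path[len(CHALLENGE_PREFIX):])


def validate(instances, token, expected):
    """True only if every replica can answer, since any one may be hit."""
    path = CHALLENGE_PREFIX + token
    return all(inst.serve(path) == expected for inst in instances)


# Separate dicts: only the instance that started the challenge knows it.
local = [Instance("a", {}), Instance("b", {})]
local[0].start_challenge("tok", "tok.keyauth")
print(validate(local, "tok", "tok.keyauth"))   # False

# One shared dict (standing in for a shared store such as Consul).
shared_store = {}
shared = [Instance("a", shared_store), Instance("b", shared_store)]
shared[0].start_challenge("tok", "tok.keyauth")
print(validate(shared, "tok", "tok.keyauth"))  # True
```

The shared dict plays the role that Consul plays in the rest of this guide: every instance reads challenge state from the same place.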

Prerequisites

You will need a working Docker Swarm cluster.

Traefik configuration

In this guide, we will not use a TOML configuration file, only command-line flags. That way, we can use the base image without mounting a configuration file or building a custom image.

What Traefik should do:

  • Listen on ports 80 and 443
  • Redirect HTTP traffic to HTTPS
  • Generate a TLS certificate when a domain is added
  • Listen to Docker Swarm events

EntryPoints configuration

TL;DR:

  $ traefik \
      --entrypoints='Name:http Address::80 Redirect.EntryPoint:https' \
      --entrypoints='Name:https Address::443 TLS' \
      --defaultentrypoints=http,https

To listen on several ports, we need to create one entry point per port.

The CLI syntax is --entrypoints='Name:a_name Address:an_ip_or_empty:a_port options'. To redirect traffic from one entry point to another, use the option Redirect.EntryPoint:entrypoint_name.

Because we don't want to configure every service to listen on both http and https, we add a default entry point configuration: --defaultentrypoints=http,https.
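To make the entry point syntax concrete, here is a small Python sketch that splits the two --entrypoints values used above into their parts and checks --defaultentrypoints against them. Both parse_entrypoint and parse_default_entrypoints are hypothetical helpers written for illustration; Traefik does its own parsing, and this sketch only understands the Name, Address, Redirect.EntryPoint and bare TLS options that appear in this guide.

```python
# Hypothetical parser for illustration only; handles just the options used here.
def parse_entrypoint(spec):
    """Parse 'Name:a_name Address:ip_or_empty:port [options]' into a dict."""
    ep = {"name": None, "address": None, "tls": False, "redirect": None}
    for field in spec.split():
        if field == "TLS":
            ep["tls"] = True
            continue
        key, _, value = field.partition(":")  # split on the first ':' only
        if key == "Name":
            ep["name"] = value
        elif key == "Address":
            ep["address"] = value  # ':80' = empty IP (all interfaces), port 80
        elif key == "Redirect.EntryPoint":
            ep["redirect"] = value
        else:
            raise ValueError("option not handled by this sketch: " + field)
    return ep


def parse_default_entrypoints(value, entrypoints):
    """Split a --defaultentrypoints value and check each name was declared."""
    declared = {ep["name"] for ep in entrypoints}
    names = value.split(",")
    unknown = [n for n in names if n not in declared]
    if unknown:
        raise ValueError("undeclared entry points: " + ", ".join(unknown))
    return names


http = parse_entrypoint("Name:http Address::80 Redirect.EntryPoint:https")
https = parse_entrypoint("Name:https Address::443 TLS")
print(http)  # {'name': 'http', 'address': ':80', 'tls': False, 'redirect': 'https'}
print(parse_default_entrypoints("http,https", [http, https]))  # ['http', 'https']
```

Splitting on the first colon only is what lets Address::80 keep its empty IP part intact.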

Let's Encrypt configuration

TL;DR:

  $ traefik \
      --acme \
      --acme.storage=/etc/traefik/acme/acme.json \
      --acme.entryPoint=https \
      --acme.httpChallenge.entryPoint=http \
      --acme.email=contact@mydomain.ca

Let's Encrypt needs four parameters: a TLS entry point to listen on, a non-TLS entry point for HTTP challenges, a storage location for certificates, and an email address for registration.

To enable Let's Encrypt support, add the --acme flag.

Next, Traefik needs to know where to store the certificates. We can choose between a key in a key-value store or a file path: --acme.storage=my/key or --acme.storage=/path/to/acme.json.

The --acme.httpChallenge.entryPoint flag enables the HTTP-01 challenge and specifies the entry point to use during challenges.

For the email and the TLS entry point, use the --acme.email and --acme.entryPoint flags.
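The four Let's Encrypt parameters above depend on each other: acme.entryPoint must name a TLS entry point and acme.httpChallenge.entryPoint a non-TLS one. The following Python sketch is a hypothetical pre-flight check, not something Traefik provides; check_acme, the flag dict keyed without the leading "--", and the email value are all assumptions made for illustration.

```python
# Hypothetical pre-flight check; Traefik does not ship this function.
REQUIRED = (
    "acme.entryPoint",                # TLS entry point
    "acme.httpChallenge.entryPoint",  # non-TLS entry point for HTTP-01
    "acme.storage",                   # key-value key or file path
    "acme.email",                     # registration email
)


def check_acme(flags, entrypoints):
    """flags: --acme.* values without '--'; entrypoints: name -> True if TLS."""
    missing = [k for k in REQUIRED if not flags.get(k)]
    if missing:
        raise ValueError("missing: " + ", ".join(missing))
    # .get() returns None for an undeclared entry point, which also fails here.
    if entrypoints.get(flags["acme.entryPoint"]) is not True:
        raise ValueError("acme.entryPoint must name a TLS entry point")
    if entrypoints.get(flags["acme.httpChallenge.entryPoint"]) is not False:
        raise ValueError("acme.httpChallenge.entryPoint must name a non-TLS entry point")
    return True


entrypoints = {"http": False, "https": True}
flags = {
    "acme.entryPoint": "https",
    "acme.httpChallenge.entryPoint": "http",
    "acme.storage": "/etc/traefik/acme/acme.json",
    "acme.email": "contact@mydomain.ca",  # placeholder address
}
print(check_acme(flags, entrypoints))  # True
```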

Docker configuration

TL;DR:

  $ traefik \
      --docker \
      --docker.swarmMode \
      --docker.domain=mydomain.ca \
      --docker.watch

To enable Docker and Swarm mode support, add the --docker and --docker.swarmMode flags. To watch Docker events, add --docker.watch.

Full docker-compose file

  version: "3"
  services:
    traefik:
      image: traefik:<stable v1.7 from https://hub.docker.com/_/traefik>
      command:
        - "--api"
        - "--entrypoints=Name:http Address::80 Redirect.EntryPoint:https"
        - "--entrypoints=Name:https Address::443 TLS"
        - "--defaultentrypoints=http,https"
        - "--acme"
        - "--acme.storage=/etc/traefik/acme/acme.json"
        - "--acme.entryPoint=https"
        - "--acme.httpChallenge.entryPoint=http"
        - "--acme.onHostRule=true"
        - "--acme.onDemand=false"
        - "--acme.email=contact@mydomain.ca"
        - "--docker"
        - "--docker.swarmMode"
        - "--docker.domain=mydomain.ca"
        - "--docker.watch"
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock
      networks:
        - webgateway
        - traefik
      ports:
        - target: 80
          published: 80
          mode: host
        - target: 443
          published: 443
          mode: host
        - target: 8080
          published: 8080
          mode: host
      deploy:
        mode: global
        placement:
          constraints:
            - node.role == manager
        update_config:
          parallelism: 1
          delay: 10s
        restart_policy:
          condition: on-failure
  networks:
    webgateway:
      driver: overlay
      external: true
    traefik:
      driver: overlay

Migrate configuration to Consul

We created a special Traefik command to help you configure your key-value store from a Traefik TOML configuration file and/or CLI flags.

Deploy a Traefik cluster

The best approach we found is to use an initializer service. This service pushes the configuration to Consul via the storeconfig sub-command.

This service is restarted until it finishes without error, because Consul may not be ready when the service first tries to push the configuration.
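The retry behaviour comes from the initializer's restart_policy with condition: on-failure: a failed run exits with an error and Swarm starts a new task, while a successful run exits cleanly and is not restarted. The Python sketch below simulates that loop. FakeConsul, run_initializer and the "traefik/example" key are hypothetical, and the max_attempts cap exists only so the sketch always terminates.

```python
# Toy simulation of restart_policy: condition: on-failure; names are hypothetical.
class FakeConsul:
    """Stands in for Consul; refuses the first `ready_after` pushes."""

    def __init__(self, ready_after):
        self.ready_after = ready_after
        self.calls = 0
        self.kv = {}

    def put_all(self, items):
        self.calls += 1
        if self.calls <= self.ready_after:
            raise ConnectionError("Consul is not ready yet")
        self.kv.update(items)


def run_initializer(consul, config, max_attempts=10):
    """Return the attempt number on which the configuration push succeeded."""
    for attempt in range(1, max_attempts + 1):
        try:
            consul.put_all(config)
            return attempt  # clean exit: on-failure policy stops restarting
        except ConnectionError:
            continue  # failed exit: Swarm schedules a new task
    raise RuntimeError("gave up after %d attempts" % max_attempts)


consul = FakeConsul(ready_after=3)
print(run_initializer(consul, {"traefik/example": "value"}))  # 4
```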

In a docker-compose file, the initializer looks like this:

  traefik_init:
    image: traefik:<stable v1.7 from https://hub.docker.com/_/traefik>
    command:
      - "storeconfig"
      - "--api"
      [...]
      - "--consul"
      - "--consul.endpoint=consul:8500"
      - "--consul.prefix=traefik"
    networks:
      - traefik
    deploy:
      restart_policy:
        condition: on-failure
    depends_on:
      - consul

The Traefik service itself then only needs the Consul configuration:

  traefik:
    image: traefik:<stable v1.7 from https://hub.docker.com/_/traefik>
    depends_on:
      - traefik_init
      - consul
    command:
      - "--consul"
      - "--consul.endpoint=consul:8500"
      - "--consul.prefix=traefik"
    [...]
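Put differently, the full flag list moves into the initializer's command, after storeconfig, and both services share the same three Consul flags. The Python sketch below expresses that split. split_commands is a hypothetical helper written for illustration, not part of Traefik; the Consul flag values are the ones used in this guide.

```python
# Hypothetical helper: the split is the guide's, the function is not part of Traefik.
CONSUL_FLAGS = [
    "--consul",
    "--consul.endpoint=consul:8500",
    "--consul.prefix=traefik",
]


def split_commands(config_flags, consul_flags=CONSUL_FLAGS):
    """Return (traefik_init command, traefik command) as new lists."""
    init = ["storeconfig"] + list(config_flags) + list(consul_flags)
    runtime = list(consul_flags)  # the rest is read back from Consul
    return init, runtime


init, runtime = split_commands(["--api", "--docker", "--docker.swarmMode"])
print(init)
print(runtime)
```

Keeping the Consul flags in one list makes it harder for the two services to drift apart on endpoint or prefix.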

Note

For Traefik versions earlier than 1.5.0, add acme.storage=traefik/acme/account, because those versions do not read it from Consul.

If you need to change the configuration, update the initializer service and redeploy it. The new configuration will be stored in Consul, and you then need to restart the Traefik service: docker service update --force traefik_traefik.

Full docker-compose file

  version: "3.4"
  services:
    traefik_init:
      image: traefik:<stable v1.7 from https://hub.docker.com/_/traefik>
      command:
        - "storeconfig"
        - "--api"
        - "--entrypoints=Name:http Address::80 Redirect.EntryPoint:https"
        - "--entrypoints=Name:https Address::443 TLS"
        - "--defaultentrypoints=http,https"
        - "--acme"
        - "--acme.storage=traefik/acme/account"
        - "--acme.entryPoint=https"
        - "--acme.httpChallenge.entryPoint=http"
        - "--acme.onHostRule=true"
        - "--acme.onDemand=false"
        - "--acme.email=contact@example.com"
        - "--docker"
        - "--docker.swarmMode"
        - "--docker.domain=example.com"
        - "--docker.watch"
        - "--consul"
        - "--consul.endpoint=consul:8500"
        - "--consul.prefix=traefik"
      networks:
        - traefik
      deploy:
        restart_policy:
          condition: on-failure
      depends_on:
        - consul
    traefik:
      image: traefik:<stable v1.7 from https://hub.docker.com/_/traefik>
      depends_on:
        - traefik_init
        - consul
      command:
        - "--consul"
        - "--consul.endpoint=consul:8500"
        - "--consul.prefix=traefik"
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock
      networks:
        - webgateway
        - traefik
      ports:
        - target: 80
          published: 80
          mode: host
        - target: 443
          published: 443
          mode: host
        - target: 8080
          published: 8080
          mode: host
      deploy:
        mode: global
        placement:
          constraints:
            - node.role == manager
        update_config:
          parallelism: 1
          delay: 10s
        restart_policy:
          condition: on-failure
    consul:
      image: consul
      command: agent -server -bootstrap-expect=1
      volumes:
        - consul-data:/consul/data
      environment:
        - CONSUL_LOCAL_CONFIG={"datacenter":"us_east2","server":true}
        - CONSUL_BIND_INTERFACE=eth0
        - CONSUL_CLIENT_INTERFACE=eth0
      deploy:
        replicas: 1
        placement:
          constraints:
            - node.role == manager
        restart_policy:
          condition: on-failure
      networks:
        - traefik
  networks:
    webgateway:
      driver: overlay
      external: true
    traefik:
      driver: overlay
  volumes:
    consul-data:
      driver: [not local]