11. Well-known traps to avoid

Once in a while, someone reports that after a system reboot, the haproxy
service wasn't started, and that once they start it by hand it works. Most
often, these people are running a clustered IP address mechanism such as
keepalived, to assign the service IP address to the master node only, and while
it worked when haproxy was bound to address 0.0.0.0, it stopped working after
they bound it to the virtual IP address. What happens here is that when the
service starts, the virtual IP address is not yet owned by the local node, so
when HAProxy wants to bind to it, the system rejects this because it is not a
local IP address. The fix is not to delay the haproxy service startup (since
that would not survive a restart), but to properly configure the system to
allow binding to non-local addresses. This is easily done on Linux by setting
the net.ipv4.ip_nonlocal_bind sysctl to 1. This is also needed in order to
transparently intercept the IP traffic that passes through HAProxy for a
specific target address.
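
As an illustration only, the following Python sketch checks whether the sysctl
is enabled and whether a given address can currently be bound. The helper names
and the default path argument are assumptions made for this example, they are
not part of HAProxy. On Linux the setting itself is usually changed with
"sysctl -w net.ipv4.ip_nonlocal_bind=1" and made persistent through the
system's sysctl configuration.

```python
import socket

# Linux exposes the sysctl as a file; the path is passed as a parameter so
# that the helper can also be pointed at a copy of it.
NONLOCAL_BIND_PATH = "/proc/sys/net/ipv4/ip_nonlocal_bind"


def nonlocal_bind_enabled(path=NONLOCAL_BIND_PATH):
    """Return True if the sysctl file at `path` contains the value 1."""
    with open(path) as handle:
        return handle.read().strip() == "1"


def try_bind(address, port=0):
    """Try to bind a TCP socket to `address`.

    Returns None on success, or the errno reported by the system on failure.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((address, port))
        except OSError as exc:
            return exc.errno
    return None
```

With a virtual IP address that the local node does not own yet, try_bind() is
expected to return errno.EADDRNOTAVAIL on Linux while the sysctl is 0, and None
once it is set to 1.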

Multi-process configurations involving source port ranges may appear to work,
but they will cause random failures under high load because more than one
process may try to use the same source port to connect to the same server,
which is not possible. The system will report an error and a retry will
happen, picking another port. A high value in the "retries" parameter may hide
the effect to a certain extent, but this also comes with increased CPU usage
and processing time. Logs will also report a certain number of retries. For
this reason, port ranges should be avoided in multi-process configurations.
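
The underlying system limitation can be reproduced outside of HAProxy. The
Python sketch below is only a demonstration under assumptions: it uses a
throwaway listener on the loopback address in place of a real server, and two
sockets in a single process in place of two HAProxy processes. The function
name is made up for this example.

```python
import socket


def reuse_source_port_error():
    """Connect twice to one server from the same source IP:port.

    Returns the errno of the second attempt, or None if it did not fail
    with a system error.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Throwaway listener standing in for the server.
        server.bind(("127.0.0.1", 0))
        server.listen(5)
        target = server.getsockname()

        # First connection: let the system pick a free source port.
        first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind(("127.0.0.1", 0))
        first.connect(target)
        source = first.getsockname()

        # Second connection: force the very same source address and port.
        second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        second.settimeout(2)
        try:
            second.bind(source)
            second.connect(target)
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()
        server.close()
```

Depending on the system, the second attempt is rejected either at bind time or
at connect time. This is the kind of error that triggers a retry when several
processes draw from the same source port range.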

Since HAProxy uses SO_REUSEPORT and supports having multiple independent
processes bound to the same IP:port, during troubleshooting it can happen that
an old process was not stopped before a new one was started. This provides
absurd test results which tend to indicate that any change to the configuration
is ignored. The reason is that even though the new process was started with
the new configuration, the old one also gets some incoming connections and
processes them, returning unexpected results. When in doubt, just stop the new
process and try again. If the service still responds, it very likely means
that an old process remains alive and has to be stopped. Linux's
"netstat -lntp" is of good help here.
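
To see why the system does not complain when a second process starts on a port
that is already in use, the following Python sketch binds two listening sockets
to the same address and port with SO_REUSEPORT. It is a minimal illustration
with a hypothetical function name, using two sockets in one process instead of
an old and a new HAProxy process.

```python
import socket


def bind_two_listeners(address="127.0.0.1"):
    """Bind two listeners to the same IP:port using SO_REUSEPORT.

    Returns True if both sockets end up listening on the same address.
    """
    if not hasattr(socket, "SO_REUSEPORT"):
        raise RuntimeError("SO_REUSEPORT is not available on this platform")

    old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    new = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # "old" plays the role of the process that was never stopped.
        old.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        old.bind((address, 0))
        old.listen(5)
        port = old.getsockname()[1]

        # "new" binds to the same port; no error is reported.
        new.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        new.bind((address, port))
        new.listen(5)
        return old.getsockname() == new.getsockname()
    finally:
        new.close()
        old.close()
```

Both listeners then share incoming connections, which is why part of the
traffic keeps being handled by the old configuration.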

When adding entries to an ACL from the command line (e.g. when blacklisting a
source address), it is important to keep in mind that these entries are not
synchronized to the file and that if someone reloads the configuration, these
updates will be lost. While this is often the desired effect (for blacklisting),
it may not necessarily match expectations when the change was made as a fix for
a problem. See the "add acl" command of the CLI.
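
As a rough sketch of such a runtime update, the Python code below builds an
"add acl" command and sends it over a UNIX stats socket. The function names,
the socket path and the ACL file name shown in the comments are assumptions
for illustration, and the target must match a stats socket and an ACL file
that are actually declared in the configuration.

```python
import socket


def build_add_acl_command(acl_ref, pattern):
    """Build the CLI line that adds `pattern` to the ACL named `acl_ref`."""
    for value in (acl_ref, pattern):
        # Reject characters that would start another CLI command.
        if "\n" in value or ";" in value:
            raise ValueError("unexpected separator in %r" % value)
    return "add acl %s %s\n" % (acl_ref, pattern)


def send_cli_command(socket_path, command, timeout=2.0):
    """Send one command to the CLI socket and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


# Hypothetical usage, assuming a stats socket at /var/run/haproxy.sock and an
# ACL loaded from /etc/haproxy/blacklist.lst:
#   cmd = build_add_acl_command("/etc/haproxy/blacklist.lst", "192.0.2.10")
#   send_cli_command("/var/run/haproxy.sock", cmd)
```

An entry added this way only lives in the running process. For it to survive a
reload, the same pattern would also have to be written to the file the ACL is
loaded from.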