12. Debugging and performance issues
When HAProxy is started with the "-d" option, it will stay in the foreground
and will print one line per event, such as an incoming connection, the end of a
connection, and each request or response header line seen. This debug output is
emitted before the contents are processed, so it does not reflect local
modifications. The main use is to show the request and response without having
to run a network sniffer. The output is less readable when multiple connections
are handled in parallel, though the "debug2ansi" and "debug2html" scripts found
in the examples/ directory definitely help here by coloring the output.
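As a hedged illustration, assuming the configuration file lives at
/etc/haproxy/haproxy.cfg and that the coloring scripts read the debug output on
their standard input, such a debugging session could be started as follows:
Example :
  # /etc/haproxy/haproxy.cfg is an assumed config path
  $ haproxy -d -f /etc/haproxy/haproxy.cfg 2>&1 | ./examples/debug2ansi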
If a request or response is rejected because HAProxy finds it is malformed, the
best thing to do is to connect to the CLI and issue "show errors", which will
report the last captured faulty request and response for each frontend and
backend, with all the necessary information to indicate precisely the first
character of the input stream that was rejected. This is sometimes needed to
prove to customers or to developers that a bug is present in their code. In
this case it is often possible to relax the checks (but still keep the
captures) using "option accept-invalid-http-request" or its equivalent for
responses coming from the server, "option accept-invalid-http-response". Please
see the configuration manual for more details.
Example :
> show errors
Total events captured on [13/Oct/2015:13:43:47.169] : 1
[13/Oct/2015:13:43:40.918] frontend HAProxyLocalStats (#2): invalid request
backend <NONE> (#-1), server <NONE> (#-1), event #0
src 127.0.0.1:51981, session #0, session flags 0x00000080
HTTP msg state 26, msg flags 0x00000000, tx flags 0x00000000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00808002, out 0 bytes, total 31 bytes
pending 31 bytes, wrapping at 8040, error at position 13:
00000 GET /invalid request HTTP/1.1\r\n
The output of "show info" on the CLI provides a number of useful metrics, such
as the maximum connection rate ever reached, the maximum SSL key rate ever
reached, and in general all the information which can help to explain temporary
issues regarding CPU or memory usage.
Example :
> show info
Name: HAProxy
Version: 1.6-dev7-e32d18-17
Release_date: 2015/10/12
Nbproc: 1
Process_num: 1
Pid: 7949
Uptime: 0d 0h02m39s
Uptime_sec: 159
Memmax_MB: 0
Ulimit-n: 120032
Maxsock: 120032
Maxconn: 60000
Hard_maxconn: 60000
CurrConns: 0
CumConns: 3
CumReq: 3
MaxSslConns: 0
CurrSslConns: 0
CumSslConns: 0
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 0
ConnRateLimit: 0
MaxConnRate: 1
SessRate: 0
SessRateLimit: 0
MaxSessRate: 1
SslRate: 0
SslRateLimit: 0
MaxSslRate: 0
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 0
SslFrontendSessionReuse_pct: 0
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 0
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 5
Run_queue: 1
Idle_pct: 100
node: wtap
description:
When an issue seems to randomly appear on a new version of HAProxy (eg: every
second request is aborted, occasional crash, etc), it is worth trying to enable
memory poisoning so that each call to malloc() is immediately followed by the
filling of the memory area with a configurable byte. By default this byte is
0x50 (ASCII for 'P'), but any other byte can be used, including zero (which
will have the same effect as a calloc() and which may make issues disappear).
Memory poisoning is enabled on the command line using the "-dM" option. It
slightly hurts performance and is not recommended for use in production. If an
issue happens all the time with it, or never happens when poisoning uses byte
zero, it clearly means you've found a bug and you definitely need to report it.
Otherwise, if enabling poisoning makes no clear difference, the problem is most
likely not related to memory corruption.
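For reference, a minimal sketch of such a run follows; the configuration path
is an assumption, only the "-dM" option itself comes from the description
above:
Example :
  # config path is an assumption; -dM poisons allocations with 0x50 ('P')
  # and should not be used in production
  $ haproxy -dM -f /etc/haproxy/haproxy.cfg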
When debugging some latency issues, it is important to use both strace and
tcpdump on the local machine, and another tcpdump on the remote system. The
reason for this is that there are delays everywhere in the processing chain and
it is important to know which one is causing latency, in order to know where to
act. In practice, the local tcpdump will indicate when the input data come in.
Strace will indicate when haproxy receives these data (using recv/recvfrom).
Be aware that openssl uses read()/write() syscalls instead of recv()/send().
Strace will also show when haproxy sends the data, and tcpdump will show when
the system sends these data to the interface. Then the external tcpdump will
show when the data sent are really received (since the local one only shows
when the packets are queued). The benefit of sniffing on the local system is
that strace and tcpdump will use the same reference clock. Strace should be
used with "-tts200" to get complete timestamps and to report data chunks large
enough to be readable. Tcpdump should be used with "-nvvttSs0" to report full
packets, real sequence numbers and complete timestamps.
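As a sketch of the local captures described above, assuming eth0 and port 80
match your setup and reusing the Pid reported by "show info" in the earlier
example:
Example :
  # 7949 is the Pid reported by "show info" above
  $ strace -tts200 -p 7949 -o haproxy.strace
  # eth0 and port 80 are assumptions, adjust to your interface and frontend
  $ tcpdump -nvvttSs0 -i eth0 port 80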
In practice, received data are almost always immediately picked up by haproxy
(unless the machine has a saturated CPU or these data are invalid and not
delivered). If these data are received but not sent, it generally is because
the output buffer is saturated (ie: the recipient doesn't consume the data fast
enough). This can be confirmed by seeing that the polling doesn't notify of
the ability to write on the output file descriptor for some time (it's often
easier to spot in the strace output when the data finally leave and then roll
back to see when the write event was notified). It generally matches an ACK
received from the recipient, and detected by tcpdump. Once the data are sent,
they may spend some time in the system doing nothing. Here again, the TCP
congestion window may be limited and not allow these data to leave, waiting for
an ACK to open the window. If the traffic is idle and the data take 40 ms or
200 ms to leave, it's a different phenomenon (and not actually a problem): the
Nagle algorithm prevents small, incomplete packets from leaving immediately, in
the hope that they will be merged with subsequent data. HAProxy automatically
disables Nagle in pure TCP mode and in tunnels. However it definitely remains
enabled when forwarding an HTTP body (and this contributes to the performance
improvement there by reducing the number of packets). Some non-compliant HTTP
applications may be sensitive to the latency when delivering incomplete HTTP
response messages. In this case you will have to enable "option http-no-delay"
to disable Nagle in order to work around their design (see the sketch after
this paragraph), keeping in mind that any other proxy in the chain may
similarly be impacted. If tcpdump reports that data leave immediately but the
other end doesn't see them quickly, it can mean there is a congested WAN link,
a congested LAN with flow control enabled and preventing the data from leaving,
or more commonly that HAProxy is in fact running in a virtual machine and that
for whatever reason the hypervisor has decided that the data didn't need to be
sent immediately. In virtualized environments, latency issues are almost always
caused by the virtualization layer, so in order to save time, it's worth first
comparing tcpdump in the VM and on the external components. Any difference has
to be credited to the hypervisor and its accompanying drivers.
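As a minimal configuration sketch of the "option http-no-delay" workaround
mentioned above (the backend name and server address are illustrative
assumptions):
Example :
  # backend name and server address are illustrative only
  backend streaming_app
      mode http
      option http-no-delay
      server app1 192.0.2.10:8080 check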
When some TCP SACK segments are seen in tcpdump traces (using -vv), it always
means that the side sending them has proof of a lost packet. While not seeing
them doesn't mean there are no losses, seeing them definitely means the network
is lossy. Losses are normal on a network, but only at a rate where SACKs remain
barely noticeable to the naked eye. If they appear a lot in the traces, it is
worth investigating exactly what happens and where the packets are lost. HTTP
doesn't cope well with TCP losses, which introduce huge latencies.
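A quick way to look for them, as a sketch (the capture file name is an
assumption), is to re-read a saved trace with the verbose flags and search for
SACK blocks:
Example :
  # trace.pcap is an assumed capture file produced by "tcpdump -w"; the
  # "sackOK" option seen in SYN packets is only capability negotiation
  $ tcpdump -nvv -r trace.pcap | grep -i sack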
- The "netstat -i" command will report statistics per interface. An interface
- where the Rx-Ovr counter grows indicates that the system doesn't have enough
- resources to receive all incoming packets and that they're lost before being
- processed by the network driver. Rx-Drp indicates that some received packets
- were lost in the network stack because the application doesn't process them
- fast enough. This can happen during some attacks as well. Tx-Drp means that
- the output queues were full and packets had to be dropped. When using TCP it
- should be very rare, but will possibly indicate a saturated outgoing link.
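To watch these counters evolve over time, a simple sketch (assuming the
"watch" utility is available):
Example :
  # "watch" is assumed available; look at the Rx-Ovr, Rx-Drp and Tx-Drp columns
  $ watch -n1 netstat -i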