Statistics

General

The cluster manager has a statistics tree rooted at cluster_manager. with the following statistics. Any : character in the stats name is replaced with _.

Name | Type | Description
cluster_added | Counter | Total clusters added (either via static config or CDS)
cluster_modified | Counter | Total clusters modified (via CDS)
cluster_removed | Counter | Total clusters removed (via CDS)
total_clusters | Gauge | Number of currently loaded clusters
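
The naming rule above can be illustrated with a small helper. This is only a sketch and not Envoy's own code: the function name flat_stat_name and the example segment backend:8080 are hypothetical, chosen to show how a : would be rewritten.

```python
def flat_stat_name(*segments: str) -> str:
    """Join tree segments with '.' and replace any ':' with '_'.

    Hypothetical helper that mirrors the replacement rule described above.
    """
    joined = ".".join(s for s in segments if s)
    return joined.replace(":", "_")


# A stat in the cluster manager tree.
print(flat_stat_name("cluster_manager", "cluster_added"))

# A made-up segment containing ':' to show the replacement.
print(flat_stat_name("cluster", "backend:8080", "upstream_cx_total"))
```

The helper only builds names. It does not check that the resulting stat actually exists.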

Every cluster has a statistics tree rooted at cluster.<name>. with the following statistics:

Name | Type | Description
upstream_cx_total | Counter | Total connections
upstream_cx_active | Gauge | Total active connections
upstream_cx_http1_total | Counter | Total HTTP/1.1 connections
upstream_cx_http2_total | Counter | Total HTTP/2 connections
upstream_cx_connect_fail | Counter | Total connection failures
upstream_cx_connect_timeout | Counter | Total connection timeouts
upstream_cx_connect_attempts_exceeded | Counter | Total consecutive connection failures exceeding configured connection attempts
upstream_cx_overflow | Counter | Total times that the cluster’s connection circuit breaker overflowed
upstream_cx_connect_ms | Histogram | Connection establishment milliseconds
upstream_cx_length_ms | Histogram | Connection length milliseconds
upstream_cx_destroy | Counter | Total destroyed connections
upstream_cx_destroy_local | Counter | Total connections destroyed locally
upstream_cx_destroy_remote | Counter | Total connections destroyed remotely
upstream_cx_destroy_with_active_rq | Counter | Total connections destroyed with 1+ active request
upstream_cx_destroy_local_with_active_rq | Counter | Total connections destroyed locally with 1+ active request
upstream_cx_destroy_remote_with_active_rq | Counter | Total connections destroyed remotely with 1+ active request
upstream_cx_close_notify | Counter | Total connections closed via HTTP/1.1 connection close header or HTTP/2 GOAWAY
upstream_cx_rx_bytes_total | Counter | Total received connection bytes
upstream_cx_rx_bytes_buffered | Gauge | Received connection bytes currently buffered
upstream_cx_tx_bytes_total | Counter | Total sent connection bytes
upstream_cx_tx_bytes_buffered | Gauge | Sent connection bytes currently buffered
upstream_cx_protocol_error | Counter | Total connection protocol errors
upstream_cx_max_requests | Counter | Total connections closed due to maximum requests
upstream_cx_none_healthy | Counter | Total times connection not established due to no healthy hosts
upstream_rq_total | Counter | Total requests
upstream_rq_active | Gauge | Total active requests
upstream_rq_pending_total | Counter | Total requests pending a connection pool connection
upstream_rq_pending_overflow | Counter | Total requests that overflowed connection pool circuit breaking and were failed
upstream_rq_pending_failure_eject | Counter | Total requests that were failed due to a connection pool connection failure
upstream_rq_pending_active | Gauge | Total active requests pending a connection pool connection
upstream_rq_cancelled | Counter | Total requests cancelled before obtaining a connection pool connection
upstream_rq_maintenance_mode | Counter | Total requests that resulted in an immediate 503 due to maintenance mode
upstream_rq_timeout | Counter | Total requests that timed out waiting for a response
upstream_rq_per_try_timeout | Counter | Total requests that hit the per try timeout
upstream_rq_rx_reset | Counter | Total requests that were reset remotely
upstream_rq_tx_reset | Counter | Total requests that were reset locally
upstream_rq_retry | Counter | Total request retries
upstream_rq_retry_success | Counter | Total request retry successes
upstream_rq_retry_overflow | Counter | Total requests not retried due to circuit breaking
upstream_flow_control_paused_reading_total | Counter | Total number of times flow control paused reading from upstream
upstream_flow_control_resumed_reading_total | Counter | Total number of times flow control resumed reading from upstream
upstream_flow_control_backed_up_total | Counter | Total number of times the upstream connection backed up and paused reads from downstream
upstream_flow_control_drained_total | Counter | Total number of times the upstream connection drained and resumed reads from downstream
membership_change | Counter | Total cluster membership changes
membership_healthy | Gauge | Current cluster healthy total (inclusive of both health checking and outlier detection)
membership_total | Gauge | Current cluster membership total
retry_or_shadow_abandoned | Counter | Total number of times shadowing or retry buffering was canceled due to buffer limits
config_reload | Counter | Total API fetches that resulted in a config reload due to a different config
update_attempt | Counter | Total cluster membership update attempts
update_success | Counter | Total cluster membership update successes
update_failure | Counter | Total cluster membership update failures
version | Gauge | Hash of the contents from the last successful API fetch
max_host_weight | Gauge | Maximum weight of any host in the cluster
bind_errors | Counter | Total errors binding the socket to the configured source address
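
As an illustration of how the two membership gauges relate, the sketch below computes the fraction of healthy hosts for one cluster. It assumes the stats arrive as plain "name: value" text lines (an assumption about the input format, not something this page specifies), and the cluster name backend, the sample values, and both function names are made up for the example.

```python
from typing import Dict, Optional


def parse_stats(text: str) -> Dict[str, int]:
    """Parse 'name: value' lines, keeping only integer-valued entries."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(": ")
        if not sep:
            continue  # not a 'name: value' line
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            continue  # e.g. histogram summaries, skipped in this sketch
    return stats


def healthy_fraction(stats: Dict[str, int], cluster: str) -> Optional[float]:
    """Return membership_healthy / membership_total, or None if no members."""
    prefix = f"cluster.{cluster}."
    total = stats.get(prefix + "membership_total", 0)
    healthy = stats.get(prefix + "membership_healthy", 0)
    return healthy / total if total else None


# Hypothetical sample input.
sample = """\
cluster.backend.membership_total: 4
cluster.backend.membership_healthy: 3
cluster.backend.upstream_cx_connect_ms: No recorded values
"""

print(healthy_fraction(parse_stats(sample), "backend"))
```

Returning None for an empty cluster avoids a division by zero and lets the caller decide how to treat a cluster with no members.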

Health check statistics

If health checking is configured, the cluster has an additional statistics tree rooted at cluster.<name>.health_check. with the following statistics:

Name | Type | Description
attempt | Counter | Number of health checks
success | Counter | Number of successful health checks
failure | Counter | Number of immediately failed health checks (e.g. HTTP 503) as well as network failures
passive_failure | Counter | Number of health check failures due to passive events (e.g. x-envoy-immediate-health-check-fail)
network_failure | Counter | Number of health check failures due to network error
verify_cluster | Counter | Number of health checks that attempted cluster name verification
healthy | Gauge | Number of healthy members
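
Because attempt and failure are counters, their raw values only grow, so a failure rate over a time window has to be computed from the difference between two samples. The following sketch shows that arithmetic. The function name and the sample values are hypothetical, and the dictionaries stand in for two readings of the health_check. tree taken some interval apart.

```python
from typing import Mapping, Optional


def health_check_failure_rate(
    prev: Mapping[str, int], curr: Mapping[str, int]
) -> Optional[float]:
    """Failure rate between two counter samples, or None if no new attempts."""
    attempts = curr["attempt"] - prev["attempt"]
    failures = curr["failure"] - prev["failure"]
    if attempts <= 0:
        return None  # no health checks ran, or the counters were reset
    return failures / attempts


# Hypothetical samples: 20 new attempts, 5 of which failed.
prev = {"attempt": 100, "failure": 4}
curr = {"attempt": 120, "failure": 9}
print(health_check_failure_rate(prev, curr))
```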

Outlier detection statistics

If outlier detection is configured for a cluster, statistics will be rooted at cluster.<name>.outlier_detection. and contain the following:

Name | Type | Description
ejections_enforced_total | Counter | Number of enforced ejections due to any outlier type
ejections_active | Gauge | Number of currently ejected hosts
ejections_overflow | Counter | Number of ejections aborted due to the max ejection %
ejections_enforced_consecutive_5xx | Counter | Number of enforced consecutive 5xx ejections
ejections_detected_consecutive_5xx | Counter | Number of detected consecutive 5xx ejections (even if unenforced)
ejections_enforced_success_rate | Counter | Number of enforced success rate outlier ejections
ejections_detected_success_rate | Counter | Number of detected success rate outlier ejections (even if unenforced)
ejections_enforced_consecutive_gateway_failure | Counter | Number of enforced consecutive gateway failure ejections
ejections_detected_consecutive_gateway_failure | Counter | Number of detected consecutive gateway failure ejections (even if unenforced)
ejections_total | Counter | Deprecated. Number of ejections due to any outlier type (even if unenforced)
ejections_consecutive_5xx | Counter | Deprecated. Number of consecutive 5xx ejections (even if unenforced)

Dynamic HTTP statistics

If HTTP is used, dynamic HTTP response code statistics are also available. These are emitted by various internal systems as well as some filters such as the router filter and rate limit filter. They are rooted at cluster.<name>. and contain the following statistics:

Name | Type | Description
upstream_rq_<*xx> | Counter | Aggregate HTTP response codes (e.g., 2xx, 3xx, etc.)
upstream_rq_<*> | Counter | Specific HTTP response codes (e.g., 201, 302, etc.)
upstream_rq_time | Histogram | Request time milliseconds
canary.upstream_rq_<*xx> | Counter | Upstream canary aggregate HTTP response codes
canary.upstream_rq_<*> | Counter | Upstream canary specific HTTP response codes
canary.upstream_rq_time | Histogram | Upstream canary request time milliseconds
internal.upstream_rq_<*xx> | Counter | Internal origin aggregate HTTP response codes
internal.upstream_rq_<*> | Counter | Internal origin specific HTTP response codes
internal.upstream_rq_time | Histogram | Internal origin request time milliseconds
external.upstream_rq_<*xx> | Counter | External origin aggregate HTTP response codes
external.upstream_rq_<*> | Counter | External origin specific HTTP response codes
external.upstream_rq_time | Histogram | External origin request time milliseconds
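
To make the relationship between the aggregate and specific counters concrete, the sketch below maps a single response code to the two stat names it would fall under. It assumes the names take the forms upstream_rq_2xx and upstream_rq_201 (with an optional canary., internal., or external. prefix). The function name is hypothetical and the code is not taken from Envoy.

```python
from typing import List


def response_code_stat_names(code: int, prefix: str = "") -> List[str]:
    """Return the aggregate and specific stat names for one HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    base = f"{prefix}." if prefix else ""
    return [
        f"{base}upstream_rq_{code // 100}xx",  # aggregate class, e.g. 2xx
        f"{base}upstream_rq_{code}",           # specific code, e.g. 201
    ]


print(response_code_stat_names(201))
print(response_code_stat_names(503, prefix="canary"))
```

These names are relative to the cluster's statistics tree, so the cluster root still has to be prepended to form a full stat name.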

Alternate tree dynamic HTTP statistics

If alternate tree statistics are configured, they will be present in the cluster.<name>.<alt name>. namespace. The statistics produced are the same as documented in the dynamic HTTP statistics section above.

Per service zone dynamic HTTP statistics

If the service zone is available for both the local service (via --service-zone) and the upstream cluster, Envoy will track the following statistics in the cluster.<name>.zone.<from_zone>.<to_zone>. namespace:

Name | Type | Description
upstream_rq_<*xx> | Counter | Aggregate HTTP response codes (e.g., 2xx, 3xx, etc.)
upstream_rq_<*> | Counter | Specific HTTP response codes (e.g., 201, 302, etc.)
upstream_rq_time | Histogram | Request time milliseconds

Load balancer statistics

Statistics for monitoring load balancer decisions. Stats are rooted at cluster.<name>. and contain the following statistics:

Name | Type | Description
lb_healthy_panic | Counter | Total requests load balanced with the load balancer in panic mode
lb_zone_cluster_too_small | Counter | Zone aware routing not performed because the upstream cluster is too small
lb_zone_routing_all_directly | Counter | Sending all requests directly to the same zone
lb_zone_routing_sampled | Counter | Sending some requests to the same zone
lb_zone_routing_cross_zone | Counter | In zone aware routing mode, but requests had to be sent cross zone
lb_local_cluster_not_ok | Counter | Local host set is not set, or the local cluster is in panic mode
lb_zone_number_differs | Counter | Number of zones in the local and upstream clusters differs

Load balancer subset statistics

Statistics for monitoring load balancer subset decisions. Stats are rooted at cluster.<name>. and contain the following statistics:

Name | Type | Description
lb_subsets_active | Gauge | Number of currently available subsets
lb_subsets_created | Counter | Number of subsets created
lb_subsets_removed | Counter | Number of subsets removed due to no hosts
lb_subsets_selected | Counter | Number of times any subset was selected for load balancing
lb_subsets_fallback | Counter | Number of times the fallback policy was invoked