Configuring Retries

For Linkerd to automatically retry failed requests, two questions need to be answered:

Which requests should be retried?
How many times should the requests be retried?

Both of these questions can be answered by specifying a bit of extra information in the service profile for the service you're sending requests to.

This configuration is required because retries can be dangerous. Automatically retrying a request that changes state (e.g. a request that submits a financial transaction) could negatively impact your users' experience. In addition, retries increase the load on your system: a set of services whose requests are constantly retried could be taken down by the retries instead of being given time to recover.
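To make the load concern concrete, here is a back-of-the-envelope sketch. It assumes a hypothetical policy in which every hop retries a failing request up to a fixed number of times (this is not how Linkerd limits retries), and compares it with a ratio-style cap like the retry budget described later in this page. The function names are illustrative only.

```python
def worst_case_attempts(max_retries_per_hop: int, hops: int) -> int:
    """Attempts that reach the deepest service for ONE original request when
    every attempt fails and each of `hops` layers independently retries up to
    `max_retries_per_hop` times (hypothetical fixed-count policy)."""
    # Each layer multiplies the attempts it receives by (retries + 1).
    return (max_retries_per_hop + 1) ** hops


def budgeted_attempts(original_rps: float, retry_ratio: float,
                      min_retries_per_second: float) -> float:
    """Upper bound on attempts per second from one client when retries are
    capped at a fraction of original traffic plus a small fixed allowance."""
    return original_rps * (1 + retry_ratio) + min_retries_per_second


# Three retries per hop across three hops: 4 ** 3 = 64 attempts per request.
print(worst_case_attempts(3, 3))
# 100 rps with a 20% ratio and 10 free retries/s stays near 130 attempts/s.
print(budgeted_attempts(100, 0.2, 10))
```

The fixed-count policy grows exponentially with call depth during an outage, while the ratio-style cap grows only linearly with original traffic.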

Check out the retries section of the books demo for a tutorial on how to configure retries.

Retries

For routes that are idempotent and don't have bodies, you can edit the service profile and add isRetryable to the retryable route:

spec:
  routes:
  - name: GET /api/annotations
    condition:
      method: GET
      pathRegex: /api/annotations
    isRetryable: true ### ADD THIS LINE ###
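The sketch below models how a route entry like the one above could decide whether a failed request may be retried. It is an illustration, not the proxy's implementation: the Route class and the match_route and should_retry helpers are hypothetical, and it assumes routes are checked in order (first match wins) with pathRegex matched against the whole path. A request that matches no route falls into [DEFAULT] and is never retried here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    """Hypothetical in-memory mirror of one entry under spec.routes."""
    name: str
    method: str                 # condition.method
    path_regex: str             # condition.pathRegex
    is_retryable: bool = False  # isRetryable (defaults to not retryable)


def match_route(routes: List[Route], method: str, path: str) -> Optional[Route]:
    """Return the first route whose condition matches, or None ([DEFAULT]).
    Assumes first-match-wins and a regex that must match the entire path."""
    for route in routes:
        if route.method == method and re.fullmatch(route.path_regex, path):
            return route
    return None


def should_retry(routes: List[Route], method: str, path: str, failed: bool) -> bool:
    """Retry only failed requests on routes explicitly marked retryable."""
    route = match_route(routes, method, path)
    return failed and route is not None and route.is_retryable


routes = [
    Route("GET /api/annotations", "GET", "/api/annotations", is_retryable=True),
    Route("POST /api/annotations", "POST", "/api/annotations"),
]
print(should_retry(routes, "GET", "/api/annotations", failed=True))   # True
print(should_retry(routes, "POST", "/api/annotations", failed=True))  # False
```

Leaving isRetryable off by default mirrors the caution above: a state-changing route such as the POST is never retried unless someone opts in.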

Retry Budgets

A retry budget is a mechanism that limits the number of retries that can be performed against a service as a percentage of original requests. This prevents retries from overwhelming your system. By default, retries may add at most an additional 20% to the request load (plus an additional 10 “free” retries per second). These settings can be adjusted by setting a retryBudget on your service profile.

spec:
  retryBudget:
    retryRatio: 0.2
    minRetriesPerSecond: 10
    ttl: 10s
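A simplified model of a budget with these three knobs follows. It is a sliding-window approximation written for illustration, not Linkerd's proxy code. The RetryBudget class and its methods are hypothetical. It also assumes that ttl is the window over which original requests and retries are counted, and that minRetriesPerSecond grants minRetriesPerSecond × ttl free retries per window; both are interpretive assumptions.

```python
from collections import deque


class RetryBudget:
    """Hypothetical sliding-window retry budget (illustrative only)."""

    def __init__(self, retry_ratio: float = 0.2,
                 min_retries_per_second: float = 10,
                 ttl_seconds: float = 10.0):
        self.retry_ratio = retry_ratio
        self.min_retries_per_second = min_retries_per_second
        self.ttl = ttl_seconds
        self.requests = deque()  # timestamps of original requests
        self.retries = deque()   # timestamps of retries

    def _expire(self, now: float) -> None:
        # Drop anything older than the ttl window.
        cutoff = now - self.ttl
        for q in (self.requests, self.retries):
            while q and q[0] <= cutoff:
                q.popleft()

    def record_request(self, now: float) -> None:
        """Count an original (non-retry) request."""
        self._expire(now)
        self.requests.append(now)

    def try_retry(self, now: float) -> bool:
        """Spend budget on one retry if any remains; return whether allowed."""
        self._expire(now)
        allowed = (self.retry_ratio * len(self.requests)
                   + self.min_retries_per_second * self.ttl)
        if len(self.retries) < allowed:
            self.retries.append(now)
            return True
        return False


budget = RetryBudget()  # defaults mirror the profile above
for _ in range(1000):
    budget.record_request(now=0.0)
granted = sum(budget.try_retry(now=1.0) for _ in range(500))
print(granted)  # 0.2 * 1000 + 10 * 10 = 300 retries fit in the window
```

Once the window slides past the recorded traffic, the allowance shrinks back to the fixed per-second floor, so a sustained outage cannot multiply load the way unbounded retries would.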

Monitoring Retries

Retries can be monitored by using the linkerd routes command with the --to flag and the -o wide flag. Since retries are performed on the client side, we need to use the --to flag to see metrics for requests that one resource is sending to another (from the server's point of view, retries are just regular requests). When both of these flags are specified, the linkerd routes command will differentiate between “effective” and “actual” traffic.

ROUTE                       SERVICE   EFFECTIVE_SUCCESS   EFFECTIVE_RPS   ACTUAL_SUCCESS   ACTUAL_RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
HEAD /authors/{id}.json     authors             100.00%          2.8rps           58.45%       4.7rps           7ms          25ms          37ms
[DEFAULT]                   authors               0.00%          0.0rps            0.00%       0.0rps           0ms           0ms           0ms

Actual requests represent all requests that the client actually sends, including original requests and retries. Effective requests only count the original requests. Since an original request may trigger one or more retries, the actual request volume is usually higher than the effective request volume when retries are enabled. Since an original request may fail the first time but a retry of that request might succeed, the effective success rate is usually (but not always) higher than the actual success rate.
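The relationship between the two views can be sketched from a log of attempts. In this illustration each original request is a list of attempt outcomes (first attempt, then any retries); the summarize function, the RouteStats fields, and the sample data are hypothetical and not derived from the CLI's output. It assumes an effective request succeeds if its last attempt succeeded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RouteStats:
    effective_success: float  # share of original requests that eventually succeeded
    effective_rps: float      # original requests per second
    actual_success: float     # share of all attempts (originals + retries) that succeeded
    actual_rps: float         # all attempts per second


def summarize(requests: List[List[bool]], window_seconds: float) -> RouteStats:
    """Each inner list holds one original request's attempt outcomes."""
    effective_total = len(requests)
    effective_ok = sum(1 for attempts in requests if attempts[-1])
    actual_total = sum(len(attempts) for attempts in requests)
    actual_ok = sum(sum(attempts) for attempts in requests)  # True counts as 1
    return RouteStats(
        effective_success=effective_ok / effective_total,
        effective_rps=effective_total / window_seconds,
        actual_success=actual_ok / actual_total,
        actual_rps=actual_total / window_seconds,
    )


# Four original requests over one second; retries rescue two of them.
log = [[True], [False, True], [False, False, True], [False, False]]
print(summarize(log, window_seconds=1.0))
```

Here three of four originals eventually succeed (75% effective), but only three of eight attempts do (37.5% actual), and actual traffic is twice the effective rate.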