Overview

Configure Dapr retries, timeouts, and circuit breakers

Note

Resiliency is currently a preview feature. Before you can use a resiliency spec, you must first enable the resiliency preview feature.
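As a rough sketch of what enabling the preview feature can look like, the following assumes that preview features are switched on through the `features` list of a Dapr Configuration spec. The configuration name `featureconfig` is a placeholder, and the feature name `Resiliency` is an assumption to check against the preview features documentation for your Dapr version.

```yaml
apiVersion: dapr.io/v1alpha1
kind: Configuration
metadata:
  name: featureconfig   # placeholder name, not taken from this page
spec:
  features:
    # assumed feature flag name for the resiliency preview
    - name: Resiliency
      enabled: true
```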

Distributed applications are commonly composed of many microservices, with dozens, even hundreds, of instances for any given application. With so many microservices, the likelihood of a system failure increases. For example, an instance can fail or become unresponsive due to hardware faults, an overwhelming number of requests, application restarts or scale-outs, or several other reasons. These events can cause a network call between services to fail. Designing and implementing your application with fault tolerance (the ability to detect, mitigate, and respond to failures) allows your application to recover to a functioning state and become self-healing.

Dapr provides a capability for defining and applying fault tolerance resiliency policies via a resiliency spec. Resiliency specs are saved in the same location as component specs and are applied when the Dapr sidecar starts. The sidecar determines how to apply resiliency policies to your Dapr API calls. In self-hosted mode, the resiliency spec must be named resiliency.yaml. In Kubernetes, Dapr finds the named resiliency specs used by your application. Within the resiliency spec, you can define policies for popular resiliency patterns, such as timeouts, retries, and circuit breakers.

Policies can then be applied to targets, which include apps, actors, and components.

Additionally, resiliency policies can be scoped to specific apps.
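To make the relationship between policies, targets, and scopes concrete, here is a minimal sketch of a spec that defines one timeout and one retry policy, applies both to a single app target, and scopes the spec to one calling app. The names `minimalresiliency`, `quickTimeout`, `fewRetries`, `appA`, and `appB` are hypothetical and chosen only for illustration; the field names mirror those used in the examples on this page.

```yaml
apiVersion: dapr.io/v1alpha1
kind: Resiliency
metadata:
  name: minimalresiliency   # hypothetical name
scopes:
  - appA                    # hypothetical app ID allowed to use this spec
spec:
  policies:
    timeouts:
      quickTimeout: 3s      # a named duration
    retries:
      fewRetries:
        policy: constant    # fixed delay between attempts
        duration: 2s
        maxRetries: 3
  targets:
    apps:
      appB:                 # hypothetical app ID that the policies apply to
        timeout: quickTimeout
        retry: fewRetries
```

Each target refers to a policy by the name it was given under `policies`, so the same policy can be reused across several targets.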

Below is the general structure of a resiliency policy:

    apiVersion: dapr.io/v1alpha1
    kind: Resiliency
    metadata:
      name: myresiliency
    scopes:
      # optionally scope the policy to specific apps
    spec:
      policies:
        timeouts:
          # timeout policy definitions
        retries:
          # retry policy definitions
        circuitBreakers:
          # circuit breaker policy definitions
      targets:
        apps:
          # apps and their applied policies here
        actors:
          # actor types and their applied policies here
        components:
          # components and their applied policies here

Complete example policy

    apiVersion: dapr.io/v1alpha1
    kind: Resiliency
    metadata:
      name: myresiliency
    # similar to subscription and configuration specs, scopes lists the Dapr App IDs that this
    # resiliency spec can be used by.
    scopes:
      - app1
      - app2
    spec:
      # policies is where timeouts, retries and circuit breaker policies are defined.
      # each is given a name so they can be referred to from the targets section in the resiliency spec.
      policies:
        # timeouts are simple named durations.
        timeouts:
          general: 5s
          important: 60s
          largeResponse: 10s
        # retries are named templates for retry configurations and are instantiated for the life of the operation.
        retries:
          pubsubRetry:
            policy: constant
            duration: 5s
            maxRetries: 10
          retryForever:
            policy: exponential
            maxInterval: 15s
            maxRetries: -1 # retry indefinitely
          important:
            policy: constant
            duration: 5s
            maxRetries: 30
          someOperation:
            policy: exponential
            maxInterval: 15s
          largeResponse:
            policy: constant
            duration: 5s
            maxRetries: 3
        # circuit breakers are automatically instantiated per component and app instance.
        # circuit breakers maintain counters that live as long as the Dapr sidecar is running. They are not persisted.
        circuitBreakers:
          simpleCB:
            maxRequests: 1
            timeout: 30s
            trip: consecutiveFailures >= 5
          pubsubCB:
            maxRequests: 1
            interval: 8s
            timeout: 45s
            trip: consecutiveFailures > 8
      # targets are what named policies are applied to. Dapr supports 3 target types - apps, components and actors
      targets:
        apps:
          appB:
            timeout: general
            retry: important
            # circuit breakers for services are scoped per app instance.
            # when a breaker is tripped, that route is removed from load balancing for the configured `timeout` duration.
            circuitBreaker: simpleCB
        actors:
          myActorType: # custom Actor Type Name
            timeout: general
            retry: important
            # circuit breakers for actors are scoped by type, id, or both.
            # when a breaker is tripped, that type or id is removed from the placement table for the configured `timeout` duration.
            circuitBreaker: simpleCB
            circuitBreakerScope: both
            circuitBreakerCacheSize: 5000
        components:
          # for state stores, policies apply to saving and retrieving state.
          statestore1: # any component name -- happens to be a state store here
            outbound:
              timeout: general
              retry: retryForever
              # circuit breakers for components are scoped per component configuration/instance. For example myRediscomponent.
              # when this breaker is tripped, all interaction to that component is prevented for the configured `timeout` duration.
              circuitBreaker: simpleCB
          pubsub1: # any component name -- happens to be a pubsub broker here
            outbound:
              retry: pubsubRetry
              circuitBreaker: pubsubCB
          pubsub2: # any component name -- happens to be another pubsub broker here
            outbound:
              retry: pubsubRetry
              circuitBreaker: pubsubCB
            inbound: # inbound only applies to delivery from sidecar to app
              timeout: general
              retry: important
              circuitBreaker: pubsubCB
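The complete example defines a `largeResponse` timeout and a `largeResponse` retry policy that no target refers to. As an illustrative sketch only, the fragment below shows how an additional app target could pick them up by name. The app ID `appC` is hypothetical and does not appear in the example above, and the fragment is meant to be merged into the existing `spec.targets.apps` section rather than used as a standalone file.

```yaml
spec:
  targets:
    apps:
      appC:                      # hypothetical app ID, for illustration only
        timeout: largeResponse   # refers to policies.timeouts.largeResponse (10s)
        retry: largeResponse     # refers to policies.retries.largeResponse (constant, 5s, 3 retries)
```

Timeouts and retries are declared in separate sections of `policies`, so the same name can be used for one of each, as `important` and `largeResponse` are in the example.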

Last modified April 6, 2022: Fix typo (c0569bb4)