Configuring concurrency

Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time.

For per-revision concurrency, you must configure both autoscaling.knative.dev/metric and autoscaling.knative.dev/target for a soft limit, or containerConcurrency for a hard limit.
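
As a minimal sketch of the per-revision soft limit, the following Service sets both annotations together. The metric value "concurrency" and the target of "150" are assumptions chosen for illustration, and the container spec is omitted for brevity:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Assumption: scale on in-flight requests (concurrency).
        autoscaling.knative.dev/metric: "concurrency"
        # Illustrative soft limit: target about 150 concurrent requests per replica.
        autoscaling.knative.dev/target: "150"
```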

For global concurrency, you can set the container-concurrency-target-default value in the config-autoscaler ConfigMap.

Soft versus hard concurrency limits

You can set a soft concurrency limit, a hard concurrency limit, or both.

NOTE: If both a soft and a hard limit are specified, the smaller of the two values will be used. This prevents the Autoscaler from having a target value that is not permitted by the hard limit value.
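
To illustrate the note above, this sketch specifies both kinds of limit on one Service. The values are illustrative assumptions, and the container spec is omitted for brevity. According to the rule stated in the note, the smaller value (here the hard limit of 50) would be the one used:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Soft limit (illustrative): larger than the hard limit below.
        autoscaling.knative.dev/target: "200"
    spec:
      # Hard limit (illustrative): the smaller of the two values, so it is the one used.
      containerConcurrency: 50
```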

The soft limit is a targeted limit rather than a strictly enforced bound. In some situations, particularly if there is a sudden burst of requests, this value can be exceeded.

The hard limit is an enforced upper bound. If concurrency reaches the hard limit, surplus requests are buffered until enough capacity is free to execute them.

IMPORTANT: A hard limit is recommended only if your application has a clear use case for it. A low hard limit may reduce throughput, increase latency, and cause additional cold starts.

Soft limit

  • Global key: container-concurrency-target-default
  • Per-revision annotation key: autoscaling.knative.dev/target
  • Possible values: An integer.
  • Default: "100"

Example:

Per revision:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: helloworld-go
      namespace: default
    spec:
      template:
        metadata:
          annotations:
            autoscaling.knative.dev/target: "200"

Global (ConfigMap):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config-autoscaler
      namespace: knative-serving
    data:
      container-concurrency-target-default: "200"

Global (Operator):

    apiVersion: operator.knative.dev/v1alpha1
    kind: KnativeServing
    metadata:
      name: knative-serving
    spec:
      config:
        autoscaler:
          container-concurrency-target-default: "200"

Hard limit

The hard limit is specified per Revision using the containerConcurrency field on the Revision spec. This setting is not an annotation.

There is no global setting for the hard limit in the autoscaling ConfigMap, because containerConcurrency has implications outside of autoscaling, such as on buffering and queuing of requests. However, a default value can be set for the Revision’s containerConcurrency field in config-defaults.yaml.

  • The default value is 0, meaning that there is no limit on the number of requests that are allowed to flow into each replica of the revision.

  • A value greater than 0 specifies the exact number of requests that are allowed to flow to each replica at any one time.

  • Global key: container-concurrency (in config-defaults.yaml)

  • Per-revision spec key: containerConcurrency

  • Possible values: An integer.

  • Default: 0, meaning no limit

Example:

Per revision:

    apiVersion: serving.knative.dev/v1
    kind: Service
    metadata:
      name: helloworld-go
      namespace: default
    spec:
      template:
        spec:
          containerConcurrency: 50

Global (ConfigMap):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config-defaults
      namespace: knative-serving
    data:
      container-concurrency: "50"

Global (Operator):

    apiVersion: operator.knative.dev/v1alpha1
    kind: KnativeServing
    metadata:
      name: knative-serving
    spec:
      config:
        defaults:
          container-concurrency: "50"