Intelligent Autoscaling on Custom Metrics with Effective HPA

A best-practice guide for Effective HPA.

Kubernetes HPA provides rich autoscaling capabilities: platform developers can deploy services that expose custom metrics, and users can configure multiple built-in resource metrics or custom metrics to drive horizontal scaling. Effective HPA is compatible with the community Kubernetes HPA and adds smarter scaling policies, such as prediction-based scaling and cron-based scaling. Prometheus is a popular open-source monitoring system, through which user-defined custom metrics can be collected.

This article walks through an example of implementing intelligent autoscaling on custom metrics with Effective HPA. Part of the configuration comes from the official documentation.

Environment Requirements

  • Kubernetes 1.18+
  • Helm 3.1.0
  • Crane v0.6.0+
  • Prometheus

Follow the installation documentation to install Crane in the cluster. Prometheus can be the one from the installation documentation or an already-deployed instance.

Environment Setup

Install PrometheusAdapter

Both the Crane component Metric-Adapter and PrometheusAdapter implement the Custom Metric and External Metric ApiServices on top of custom-metric-apiserver. Installing Crane registers those ApiServices to Crane's Metric-Adapter, so you must delete them before installing PrometheusAdapter to ensure the Helm install succeeds.

  # List the ApiServices in the current cluster
  kubectl get apiservice

Because Crane is installed, the result looks like this:

  NAME                                   SERVICE                       AVAILABLE   AGE
  v1beta1.batch                          Local                         True        35d
  v1beta1.custom.metrics.k8s.io          crane-system/metric-adapter   True        18d
  v1beta1.discovery.k8s.io               Local                         True        35d
  v1beta1.events.k8s.io                  Local                         True        35d
  v1beta1.external.metrics.k8s.io        crane-system/metric-adapter   True        18d
  v1beta1.flowcontrol.apiserver.k8s.io   Local                         True        35d
  v1beta1.metrics.k8s.io                 kube-system/metrics-service   True        35d

Delete the ApiServices installed by Crane:

  kubectl delete apiservice v1beta1.custom.metrics.k8s.io
  kubectl delete apiservice v1beta1.external.metrics.k8s.io

Install PrometheusAdapter via Helm:

  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  helm repo update
  helm install prometheus-adapter -n crane-system prometheus-community/prometheus-adapter

Then point the ApiServices back to Crane's Metric-Adapter:

  kubectl apply -f https://raw.githubusercontent.com/gocrane/crane/main/deploy/metric-adapter/apiservice.yaml
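
You can confirm that both metric ApiServices point back at crane-system/metric-adapter with a plain kubectl check:

  kubectl get apiservice v1beta1.custom.metrics.k8s.io v1beta1.external.metrics.k8s.io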

Configure Metric-Adapter to Enable RemoteAdapter

The PrometheusAdapter installation above did not point the ApiServices at PrometheusAdapter. So that PrometheusAdapter can also serve custom metrics, use the RemoteAdapter feature of Crane's Metric-Adapter to forward requests to PrometheusAdapter.

Edit the Metric-Adapter configuration and set PrometheusAdapter's Service as the RemoteAdapter of Crane's Metric-Adapter:

  # Edit the metric-adapter Deployment
  kubectl edit deploy metric-adapter -n crane-system

Make the following changes, matching your PrometheusAdapter configuration:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: metric-adapter
    namespace: crane-system
  spec:
    template:
      spec:
        containers:
        - args:
          # Add the remote adapter flags
          - --remote-adapter=true
          - --remote-adapter-service-namespace=crane-system
          - --remote-adapter-service-name=prometheus-adapter
          - --remote-adapter-service-port=443
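
The port 443 above matches the default service port of the prometheus-adapter Helm chart; if your release differs, check what the Service actually exposes and adjust the flag:

  kubectl get svc prometheus-adapter -n crane-system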

The RemoteAdapter Capability

(Figure 1)

Kubernetes allows an ApiService to be backed by only one service. To use both the metrics provided by Crane and those provided by PrometheusAdapter in a single cluster, Crane supports RemoteAdapter to solve this problem:

  • Crane's Metric-Adapter can be configured with a Kubernetes Service acting as a remote adapter
  • When handling a request, Crane's Metric-Adapter first checks whether it is a local metric served by Crane; if not, it forwards the request to the remote adapter (the listing below shows the merged view)
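
With forwarding in place, both metric API groups are answered through Crane's Metric-Adapter. You can list what each group serves (jq is only used for pretty-printing):

  kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .
  kubectl get --raw /apis/external.metrics.k8s.io/v1beta1 | jq .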

Running the Example

Prepare the Application

Deploy the following application to the cluster. It exposes an http_requests_total metric counting the HTTP requests it has served.

sample-app.deploy.yaml

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: sample-app
    labels:
      app: sample-app
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: sample-app
    template:
      metadata:
        labels:
          app: sample-app
      spec:
        containers:
        - image: luxas/autoscale-demo:v0.1.2
          name: metrics-provider
          resources:
            limits:
              cpu: 500m
            requests:
              cpu: 200m
          ports:
          - name: http
            containerPort: 8080

sample-app.service.yaml

  apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: sample-app
    name: sample-app
  spec:
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 8080
    selector:
      app: sample-app
    type: ClusterIP

  kubectl create -f sample-app.deploy.yaml
  kubectl create -f sample-app.service.yaml

Once the application is deployed, you can check the http_requests_total metric with:

  curl http://$(kubectl get service sample-app -o jsonpath='{ .spec.clusterIP }')/metrics
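
If the application is serving metrics correctly, the response contains the counter in Prometheus exposition format, roughly like the following (the help text and value will vary):

  # HELP http_requests_total The amount of requests served by the server in total
  # TYPE http_requests_total counter
  http_requests_total 1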

Configure the Scrape Rule

Configure Prometheus's scrape config to collect the application's http_requests_total metric:

  kubectl edit configmap -n crane-system prometheus-server

Add the following job:

  - job_name: sample-app
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - action: keep
      regex: default;sample-app-(.+)
      source_labels:
      - __meta_kubernetes_namespace
      - __meta_kubernetes_pod_name
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    - action: replace
      source_labels:
      - __meta_kubernetes_namespace
      target_label: namespace
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: pod

At this point, you can run the PromQL query in Prometheus: sum(rate(http_requests_total[5m])) by (pod)
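
You can also run the same query from the command line. This sketch assumes the prometheus-server Service installed per the Crane docs listens on port 80 in crane-system; adjust the names and port to your deployment:

  # Forward Prometheus locally, then call its HTTP query API
  kubectl -n crane-system port-forward svc/prometheus-server 9090:80 &
  curl -s http://127.0.0.1:9090/api/v1/query \
    --data-urlencode 'query=sum(rate(http_requests_total[5m])) by (pod)'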

Verify PrometheusAdapter

PrometheusAdapter's default rule configuration converts http_requests_total into a Pods-type custom metric. Verify it with:

  kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq .

The result should include pods/http_requests:

  {
    "name": "pods/http_requests",
    "singularName": "",
    "namespaced": true,
    "kind": "MetricValueList",
    "verbs": [
      "get"
    ]
  }

This shows that an HPA can now be configured with this Pod metric.
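
You can also fetch live values for the sample app's pods through the standard custom metrics API path (jq is optional):

  kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests" | jq .

Because PrometheusAdapter's default rules expose counters as rates, the values here are requests per second; accordingly, the target averageValue of 500m used below means 0.5 requests per second per Pod.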

Configure Autoscaling

Now we can create the Effective HPA, which can scale on the Pod metric http_requests.

How to define a custom metric with prediction enabled

Add an annotation to the Effective HPA following this rule:

  annotations:
    # "metric-query.autoscaling.crane.io" is a fixed prefix followed by the metric name,
    # which must match Metric.name in spec.metrics. Pods and External types are supported.
    metric-query.autoscaling.crane.io/http_requests: "sum(rate(http_requests_total[5m])) by (pod)"

sample-app-hpa.yaml

  apiVersion: autoscaling.crane.io/v1alpha1
  kind: EffectiveHorizontalPodAutoscaler
  metadata:
    name: php-apache
    annotations:
      # "metric-query.autoscaling.crane.io" is a fixed prefix followed by the metric name,
      # which must match Metric.name in spec.metrics. Pods and External types are supported.
      metric-query.autoscaling.crane.io/http_requests: "sum(rate(http_requests_total[5m])) by (pod)"
  spec:
    # ScaleTargetRef is the reference to the workload that should be scaled.
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: sample-app
    minReplicas: 1 # MinReplicas is the lower limit of replicas that the autoscaler can scale down to.
    maxReplicas: 10 # MaxReplicas is the upper limit of replicas that the autoscaler can scale up to.
    scaleStrategy: Auto # ScaleStrategy indicates the strategy for scaling the target; the value can be "Auto" or "Manual".
    # Metrics contains the specifications used to calculate the desired replica count.
    metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Pods
      pods:
        metric:
          name: http_requests
        target:
          type: AverageValue
          averageValue: 500m
    # Prediction defines configurations for predicting resources.
    # If unspecified, prediction is disabled by default.
    prediction:
      predictionWindowSeconds: 3600 # PredictionWindowSeconds is the time window in which to predict metrics.
      predictionAlgorithm:
        algorithmType: dsp
        dsp:
          sampleInterval: "60s"
          historyLength: "7d"

  kubectl create -f sample-app-hpa.yaml
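
You can confirm the object was created; this assumes `ehpa` is the short name Crane registers for EffectiveHorizontalPodAutoscaler (the full resource name works either way):

  kubectl get ehpa php-apache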

Check the status of the TimeSeriesPrediction; if the application has not been running for long, prediction may fail.
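
To inspect it, fetch the object created by the effective-hpa-controller (assuming `tsp` is the registered short name for TimeSeriesPrediction):

  kubectl get tsp ehpa-php-apache -n default -o yaml

The object will look similar to this: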

  apiVersion: prediction.crane.io/v1alpha1
  kind: TimeSeriesPrediction
  metadata:
    creationTimestamp: "2022-07-11T16:10:09Z"
    generation: 1
    labels:
      app.kubernetes.io/managed-by: effective-hpa-controller
      app.kubernetes.io/name: ehpa-php-apache
      app.kubernetes.io/part-of: php-apache
      autoscaling.crane.io/effective-hpa-uid: 1322c5ac-a1c6-4c71-98d6-e85d07b22da0
    name: ehpa-php-apache
    namespace: default
  spec:
    predictionMetrics:
    - algorithm:
        algorithmType: dsp
        dsp:
          estimators: {}
          historyLength: 7d
          sampleInterval: 60s
      resourceIdentifier: crane_pod_cpu_usage
      resourceQuery: cpu
      type: ResourceQuery
    - algorithm:
        algorithmType: dsp
        dsp:
          estimators: {}
          historyLength: 7d
          sampleInterval: 60s
      expressionQuery:
        expression: sum(rate(http_requests_total[5m])) by (pod)
      resourceIdentifier: crane_custom.pods_http_requests
      type: ExpressionQuery
    predictionWindowSeconds: 3600
    targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: sample-app
      namespace: default
  status:
    conditions:
    - lastTransitionTime: "2022-07-12T06:54:42Z"
      message: not all metric predicted
      reason: PredictPartial
      status: "False"
      type: Ready
    predictionMetrics:
    - ready: false
      resourceIdentifier: crane_pod_cpu_usage
    - prediction:
      - labels:
        - name: pod
          value: sample-app-7cfb596f98-8h5vv
        samples:
        - timestamp: 1657608900
          value: "0.01683"
        - timestamp: 1657608960
          value: "0.01683"
        ......
      ready: true
      resourceIdentifier: crane_custom.pods_http_requests

Check the HPA object created by Effective HPA, and you can see that a metric based on the predicted custom metric has been added: crane_custom.pods_http_requests
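
Since the underlying object is a standard autoscaling/v2beta2 HorizontalPodAutoscaler, you can fetch it the usual way:

  kubectl get hpa ehpa-php-apache -n default -o yaml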

  apiVersion: autoscaling/v2beta2
  kind: HorizontalPodAutoscaler
  metadata:
    creationTimestamp: "2022-07-11T16:10:10Z"
    labels:
      app.kubernetes.io/managed-by: effective-hpa-controller
      app.kubernetes.io/name: ehpa-php-apache
      app.kubernetes.io/part-of: php-apache
      autoscaling.crane.io/effective-hpa-uid: 1322c5ac-a1c6-4c71-98d6-e85d07b22da0
    name: ehpa-php-apache
    namespace: default
  spec:
    maxReplicas: 10
    metrics:
    - pods:
        metric:
          name: http_requests
        target:
          averageValue: 500m
          type: AverageValue
      type: Pods
    - pods:
        metric:
          name: crane_custom.pods_http_requests
          selector:
            matchLabels:
              autoscaling.crane.io/effective-hpa-uid: 1322c5ac-a1c6-4c71-98d6-e85d07b22da0
        target:
          averageValue: 500m
          type: AverageValue
      type: Pods
    - resource:
        name: cpu
        target:
          averageUtilization: 50
          type: Utilization
      type: Resource
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: sample-app
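
To see the autoscaler react end to end, you can generate HTTP load against the Service; this borrows the throwaway load-generator pattern from the upstream Kubernetes HPA walkthrough (the image and loop are illustrative, any HTTP load will do):

  # Run a temporary busybox pod that continuously hits sample-app
  kubectl run load-generator -i --tty --rm --image=busybox --restart=Never \
    -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://sample-app.default.svc; done"

  # In another terminal, watch the replica count change
  kubectl get hpa ehpa-php-apache -n default --watch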

Summary

Because production environments are complex, scaling on multiple metrics (CPU, memory, and custom metrics) is a common choice for production applications. Effective HPA therefore covers multi-metric scaling with its prediction algorithms, helping more workloads adopt horizontal autoscaling in production.