Special Topic: Affinity Scheduling (Author - XiaoYang)

Introduction

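Before I had analyzed and really understood the scheduler source code, I ran into some confusion when configuring affinity, partly because the official and third-party documentation does not explain things clearly. For example:

  1. What matching operation sits underneath the affinity operator "In"? Is it a regular-expression match? And how are "Gt"/"Lt" implemented?

  2. Every document I could find lists three topologyKey values for pod affinity: kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone, and failure-domain.beta.kubernetes.io/region. Why? Are only these three keys really supported? Can a custom key be used?

  3. What is the difference between Pod affinity and Node affinity? What does Pod affinity actually match against, and what is its internal logic?

Have you had similar questions? Once you clearly understand how the code implements this, everything becomes clear and no "tacit knowledge" remains. I hope this article helps clear up the confusion around Kubernetes affinity.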

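Constrained Scheduling

Before diving into the source, here is some background on Kubernetes scheduling that makes the affinity code easier to follow:

  1. The purpose of affinity is to let users place pods on specific Nodes as needed. I call this "constrained scheduling".
  2. Constrained scheduling is usually done in one of three ways:
  • NodeSelector / NodeName: node label selector and "nodeName" matching
  • Affinity (Node/Pod/Service): affinity
  • Taint / Toleration: taints and tolerations
  3. This article covers affinity, which comes in three types: Node, Pod, and Service affinity. The tables below map the predicate (filtering) and priority (scoring) policies implemented in code to their Pod.Spec configuration (analyzed in detail later):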
| Predicate policy | Pod.Spec configuration | Category | Order |
|---|---|---|---|
| MatchNodeSelectorPred | NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution | Node | 6 |
| MatchInterPodAffinityPred | PodAffinity.RequiredDuringSchedulingIgnoredDuringExecution, PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution | Pod | 22 |
| CheckServiceAffinityPred | | Service | 12 |

| Priority policy | Pod.Spec configuration | Default weight |
|---|---|---|
| InterPodAffinityPriority | PodAffinity.PreferredDuringSchedulingIgnoredDuringExecution | 1 |
| NodeAffinityPriority | NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution | 1 |

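Labels.selector: the Label Selector

The label selector is the most basic tool used underneath the affinity code; both nodeAffinity and podAffinity rely on it. When you define a pod in a YAML Deployment and configure its affinity, you must specify match expressions, and at bottom these expressions are conditions evaluated against the labels of a Node or a pod. Evaluating them requires labels.Selector, which is shared by all the affinity code paths. That is why this lowest-level matching logic is analyzed first, so the source analysis that follows is easier to understand.

The labels.Selector interface definition. The key method is Matches():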

!FILENAME vendor/k8s.io/apimachinery/pkg/labels/selector.go:36

    type Selector interface {
        Matches(Labels) bool
        Empty() bool
        String() string
        Add(r ...Requirement) Selector
        Requirements() (requirements Requirements, selectable bool)
        DeepCopySelector() Selector
    }

Now look at the callers. Each of the functions below calls labels.NewSelector() to instantiate a labels.Selector object, which it then populates and returns.

    func LabelSelectorAsSelector(ps *LabelSelector) (labels.Selector, error) {
        ...
        selector := labels.NewSelector()
        ...
    }

    func NodeSelectorRequirementsAsSelector(nsm []v1.NodeSelectorRequirement) (labels.Selector, error) {
        ...
        selector := labels.NewSelector()
        ...
    }

    func TopologySelectorRequirementsAsSelector(tsm []v1.TopologySelectorLabelRequirement) (labels.Selector, error) {
        ...
        selector := labels.NewSelector()
        ...
    }

NewSelector() returns an internalSelector, and internalSelector is a slice of Requirement values (the required conditions).

!FILENAME vendor/k8s.io/apimachinery/pkg/labels/selector.go:79

    func NewSelector() Selector {
        return internalSelector(nil)
    }

    type internalSelector []Requirement

internalSelector.Matches() is implemented by iterating over the requirements and calling Requirement.Matches() on each one; every requirement must match.

!FILENAME vendor/k8s.io/apimachinery/pkg/labels/selector.go:340

    func (lsel internalSelector) Matches(l Labels) bool {
        for ix := range lsel {
            // lsel[ix] is a Requirement; the first one that fails rejects the label set.
            if matches := lsel[ix].Matches(l); !matches {
                return false
            }
        }
        return true
    }

Next, the Requirement struct (key, operator, values). This is exactly the affinity match expression you write in the configuration.

!FILENAME vendor/k8s.io/apimachinery/pkg/labels/selector.go:114

    type Requirement struct {
        key      string
        operator selection.Operator
        // In huge majority of cases we have at most one value here.
        // It is generally faster to operate on a single-element slice
        // than on a single-element map, so we have a slice here.
        strValues []string
    }

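Requirement.Matches() is where the expression is actually evaluated: based on the operator, it compares the key and values against the labels and reports whether they match.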

!FILENAME vendor/k8s.io/apimachinery/pkg/labels/selector.go:192

    func (r *Requirement) Matches(ls Labels) bool {
        switch r.operator {
        case selection.In, selection.Equals, selection.DoubleEquals:
            // In, =, ==: the key must exist and its value must equal one of strValues.
            if !ls.Has(r.key) {
                return false
            }
            return r.hasValue(ls.Get(r.key))
        case selection.NotIn, selection.NotEquals:
            // NotIn, !=: a missing key counts as a match.
            if !ls.Has(r.key) {
                return true
            }
            return !r.hasValue(ls.Get(r.key))
        case selection.Exists:
            // Exists: only the key is checked.
            return ls.Has(r.key)
        case selection.DoesNotExist:
            // DoesNotExist: the key must be absent.
            return !ls.Has(r.key)
        case selection.GreaterThan, selection.LessThan:
            // Gt, Lt: integer comparison.
            if !ls.Has(r.key) {
                return false
            }
            // The label value must be a string that parses as a base-10 integer.
            lsValue, err := strconv.ParseInt(ls.Get(r.key), 10, 64)
            if err != nil {
                klog.V(10).Infof("ParseInt failed for value %+v in label %+v, %+v", ls.Get(r.key), ls, err)
                return false
            }
            // There should be only one strValue in r.strValues, and can be converted to a integer.
            if len(r.strValues) != 1 {
                klog.V(10).Infof("Invalid values count %+v of requirement %#v, for 'Gt', 'Lt' operators, exactly one value is required", len(r.strValues), r)
                return false
            }
            var rValue int64
            for i := range r.strValues {
                rValue, err = strconv.ParseInt(r.strValues[i], 10, 64)
                if err != nil {
                    klog.V(10).Infof("ParseInt failed for value %+v in requirement %#v, for 'Gt', 'Lt' operators, the value must be an integer", r.strValues[i], r)
                    return false
                }
            }
            return (r.operator == selection.GreaterThan && lsValue > rValue) || (r.operator == selection.LessThan && lsValue < rValue)
        default:
            return false
        }
    }
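
To make the answer to the first opening question concrete: judging from the switch above, "In" is a plain string-equality test against the value list (no regular expressions), and "Gt"/"Lt" parse both sides as base-10 integers. The following is a minimal, standard-library-only sketch of that behavior. The `requirement` type, the string operator names, and the `matches`/`contains` helpers are simplified stand-ins invented for illustration; they are not the apimachinery API.

```go
package main

import (
    "fmt"
    "strconv"
)

// requirement is a hypothetical, simplified stand-in for labels.Requirement.
type requirement struct {
    key      string
    operator string // "In", "NotIn", "Exists", "DoesNotExist", "Gt", "Lt"
    values   []string
}

// contains reports whether s is exactly equal to one of the list items.
func contains(list []string, s string) bool {
    for _, item := range list {
        if item == s {
            return true
        }
    }
    return false
}

// matches mirrors the structure of the switch in Requirement.Matches,
// using a plain map instead of the Labels interface.
func (r requirement) matches(labels map[string]string) bool {
    v, has := labels[r.key]
    switch r.operator {
    case "In":
        return has && contains(r.values, v)
    case "NotIn":
        return !has || !contains(r.values, v)
    case "Exists":
        return has
    case "DoesNotExist":
        return !has
    case "Gt", "Lt":
        if !has || len(r.values) != 1 {
            return false
        }
        lv, err1 := strconv.ParseInt(v, 10, 64)
        rv, err2 := strconv.ParseInt(r.values[0], 10, 64)
        if err1 != nil || err2 != nil {
            return false // non-numeric label or value never matches
        }
        if r.operator == "Gt" {
            return lv > rv
        }
        return lv < rv
    }
    return false
}

func main() {
    node := map[string]string{"zone": "prd-zone-A", "cpu-count": "16"}

    exact := requirement{"zone", "In", []string{"prd-zone-A", "prd-zone-B"}}
    pattern := requirement{"zone", "In", []string{"prd-zone-.*"}}
    numeric := requirement{"cpu-count", "Gt", []string{"8"}}
    notNumeric := requirement{"zone", "Gt", []string{"8"}}

    fmt.Println(exact.matches(node))      // true: exact string equality
    fmt.Println(pattern.matches(node))    // false: the pattern is compared literally
    fmt.Println(numeric.matches(node))    // true: 16 > 8
    fmt.Println(notNumeric.matches(node)) // false: "prd-zone-A" is not an integer
}
```

The label keys `zone` and `cpu-count` are made up for the example; the point is only that a regex-looking value is compared literally and that a non-numeric label silently fails a Gt/Lt check.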

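Note: besides the labels selector there are also NodeSelector, FieldsSelector, PropertySelector, and others. They are all essentially similar implementations of a Selector interface with largely the same logic; each is explained where it appears in the source analysis below.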

Node Affinity

Basic description of Node affinity:

[Figure: overview of Node affinity]

YAML configuration sample:

    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: with-node-affinity
    spec:
      affinity:
        nodeAffinity:  # the pod is deployed in prd-zone-A or prd-zone-B
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/prd-zone-name
                operator: In
                values:
                - prd-zone-A
                - prd-zone-B
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: securityZone
                operator: In
                values:
                - BussinssZone
      containers:
      - name: with-node-affinity
        image: gcr.io/google_containers/pause:2.0

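Node Affinity Predicate Policy: MatchNodeSelectorPred

Policy description:

Selects the Nodes that match the pod being scheduled, based on the pod's NodeSelector and NodeAffinity definitions (matched against Node labels).

Applicable NodeAffinity configuration item:

NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution

Predicate source analysis:

  1. Policy registration: defaults.init() registers a predicate under the constant predicates.MatchNodeSelectorPred. Its policy function is PodMatchNodeSelector().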

!FILENAME pkg/scheduler/algorithmprovider/defaults/defaults.go:78

    func init() {
        ...
        factory.RegisterFitPredicate(predicates.MatchNodeSelectorPred, predicates.PodMatchNodeSelector)
        ...
    }
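  2. Policy function: PodMatchNodeSelector()

It fetches the target Node and calls podMatchesNodeSelectorAndAffinityTerms() to match the pod being scheduled against that node. If the node qualifies it returns true; otherwise it returns false together with the failure reason.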

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:853

    func PodMatchNodeSelector(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
        // Fetch the node being checked.
        node := nodeInfo.Node()
        if node == nil {
            return false, nil, fmt.Errorf("node not found")
        }
        // Key sub-function.
        // Inputs: the pod being scheduled and the node fetched above (the node under test).
        if podMatchesNodeSelectorAndAffinityTerms(pod, node) {
            return true, nil, nil
        }
        return false, []algorithm.PredicateFailureReason{ErrNodeSelectorNotMatch}, nil
    }

podMatchesNodeSelectorAndAffinityTerms()

Checks the "required" conditions defined by NodeSelector and NodeAffinity.

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:807

    func podMatchesNodeSelectorAndAffinityTerms(pod *v1.Pod, node *v1.Node) bool {
        // If NodeSelector is set, check that the Node labels satisfy every entry it defines.
        if len(pod.Spec.NodeSelector) > 0 {
            selector := labels.SelectorFromSet(pod.Spec.NodeSelector)
            if !selector.Matches(labels.Set(node.Labels)) {
                return false
            }
        }
        // If NodeAffinity is set, match it via nodeMatchesNodeSelectorTerms() (analyzed in detail below).
        nodeAffinityMatches := true
        affinity := pod.Spec.Affinity
        if affinity != nil && affinity.NodeAffinity != nil {
            nodeAffinity := affinity.NodeAffinity
            // No required terms: nothing more to check.
            if nodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution == nil {
                return true
            }
            if nodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil {
                nodeSelectorTerms := nodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms
                klog.V(10).Infof("Match for RequiredDuringSchedulingIgnoredDuringExecution node selector terms %+v", nodeSelectorTerms)
                // Key function: nodeMatchesNodeSelectorTerms()
                nodeAffinityMatches = nodeAffinityMatches && nodeMatchesNodeSelectorTerms(node, nodeSelectorTerms)
            }
        }
        return nodeAffinityMatches
    }

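  • If both NodeSelector and NodeAffinity.Required… are configured, both must be satisfied for the result to be true;

  • If NodeSelector fails, the result is false immediately and NodeAffinity is not evaluated;

  • If multiple NodeSelectorTerms are specified, the node only needs to satisfy one of them;

  • If multiple MatchExpressions are specified within a term, all of them must be satisfied.

nodeMatchesNodeSelectorTerms() calls v1helper.MatchNodeSelectorTerms() to check whether the node satisfies the required conditions defined in the NodeSelectorTerms. Each term can carry two kinds of conditions (matchExpressions and matchFields):

  • matchExpressions under requiredDuringSchedulingIgnoredDuringExecution is evaluated against the node's labels (key and value, with any of the supported operators);

  • matchFields under requiredDuringSchedulingIgnoredDuringExecution is evaluated against the node's fields by exact value comparison (as the code below shows, only In/NotIn with exactly one value are accepted).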

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:797

    func nodeMatchesNodeSelectorTerms(node *v1.Node, nodeSelectorTerms []v1.NodeSelectorTerm) bool {
        nodeFields := map[string]string{}
        // Collect the fields of the node under test.
        for k, f := range algorithm.NodeFieldSelectorKeys {
            nodeFields[k] = f(node)
        }
        // Call v1helper.MatchNodeSelectorTerms()
        // Arguments: nodeSelectorTerms  the required terms from the affinity configuration
        //            labels             the labels of the node under test
        //            fields             the fields of the node under test
        return v1helper.MatchNodeSelectorTerms(nodeSelectorTerms, labels.Set(node.Labels), fields.Set(nodeFields))
    }

    // pkg/apis/core/v1/helper/helpers.go:302
    func MatchNodeSelectorTerms(
        nodeSelectorTerms []v1.NodeSelectorTerm,
        nodeLabels labels.Set,
        nodeFields fields.Set,
    ) bool {
        for _, req := range nodeSelectorTerms {
            // nil or empty term selects no objects
            if len(req.MatchExpressions) == 0 && len(req.MatchFields) == 0 {
                continue
            }
            // ① Evaluate the MatchExpressions conditions.
            if len(req.MatchExpressions) != 0 {
                labelSelector, err := NodeSelectorRequirementsAsSelector(req.MatchExpressions)
                if err != nil || !labelSelector.Matches(nodeLabels) {
                    continue
                }
            }
            // ② Evaluate the MatchFields conditions.
            if len(req.MatchFields) != 0 {
                fieldSelector, err := NodeSelectorRequirementsAsFieldSelector(req.MatchFields)
                if err != nil || !fieldSelector.Matches(nodeFields) {
                    continue
                }
            }
            // This term matched in full; one matching term is enough.
            return true
        }
        return false
    }
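
The "OR across terms, AND within a term" rule implemented by the loop above can be sketched without any Kubernetes dependency. This is only an illustration under simplifying assumptions: `expr` and `term` are hypothetical types that model matchExpressions with the In operator only, and matchFields is left out.

```go
package main

import "fmt"

// expr is a hypothetical, In-only stand-in for v1.NodeSelectorRequirement.
type expr struct {
    key    string
    values []string
}

// term stands in for v1.NodeSelectorTerm (matchExpressions only).
type term []expr

// termMatches requires every expression in the term to match (AND).
func termMatches(t term, labels map[string]string) bool {
    for _, e := range t {
        v, ok := labels[e.key]
        if !ok {
            return false
        }
        found := false
        for _, want := range e.values {
            if v == want {
                found = true
                break
            }
        }
        if !found {
            return false
        }
    }
    return true
}

// anyTermMatches skips empty terms and accepts the node as soon as
// one term matches in full (OR).
func anyTermMatches(terms []term, labels map[string]string) bool {
    for _, t := range terms {
        if len(t) == 0 {
            continue // an empty term selects nothing
        }
        if termMatches(t, labels) {
            return true
        }
    }
    return false
}

func main() {
    node := map[string]string{"zone": "prd-zone-A", "disk": "hdd"}

    strict := term{
        {"zone", []string{"prd-zone-A"}},
        {"disk", []string{"ssd"}}, // fails: the node has disk=hdd
    }
    loose := term{
        {"zone", []string{"prd-zone-A", "prd-zone-B"}},
    }

    fmt.Println(anyTermMatches([]term{strict}, node))        // false
    fmt.Println(anyTermMatches([]term{strict, loose}, node)) // true
    fmt.Println(anyTermMatches([]term{term{}}, node))        // false
}
```

In configuration terms: put alternatives in separate nodeSelectorTerms entries, and put conditions that must hold together in the same matchExpressions list.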

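NodeSelectorRequirementsAsSelector() converts the expressions configured in matchExpressions (under requiredDuringSchedulingIgnoredDuringExecution) into selector requirements and returns a labels.Selector instance (see the Labels.selector section at the beginning of this article).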

!FILENAME pkg/apis/core/v1/helper/helpers.go:222

    func NodeSelectorRequirementsAsSelector(nsm []v1.NodeSelectorRequirement) (labels.Selector, error) {
        if len(nsm) == 0 {
            return labels.Nothing(), nil
        }
        selector := labels.NewSelector()
        for _, expr := range nsm {
            var op selection.Operator
            switch expr.Operator {
            case v1.NodeSelectorOpIn:
                op = selection.In
            case v1.NodeSelectorOpNotIn:
                op = selection.NotIn
            case v1.NodeSelectorOpExists:
                op = selection.Exists
            case v1.NodeSelectorOpDoesNotExist:
                op = selection.DoesNotExist
            case v1.NodeSelectorOpGt:
                op = selection.GreaterThan
            case v1.NodeSelectorOpLt:
                op = selection.LessThan
            default:
                return nil, fmt.Errorf("%q is not a valid node selector operator", expr.Operator)
            }
            // The three key elements of an expression: expr.Key, op, expr.Values
            r, err := labels.NewRequirement(expr.Key, op, expr.Values)
            if err != nil {
                return nil, err
            }
            selector = selector.Add(*r)
        }
        return selector, nil
    }

NodeSelectorRequirementsAsFieldSelector() converts the expressions configured in matchFields (under requiredDuringSchedulingIgnoredDuringExecution) into selector terms and returns a fields.Selector instance.

!FILENAME pkg/apis/core/v1/helper/helpers.go:256

    func NodeSelectorRequirementsAsFieldSelector(nsm []v1.NodeSelectorRequirement) (fields.Selector, error) {
        if len(nsm) == 0 {
            return fields.Nothing(), nil
        }
        selectors := []fields.Selector{}
        for _, expr := range nsm {
            switch expr.Operator {
            case v1.NodeSelectorOpIn:
                if len(expr.Values) != 1 {
                    return nil, fmt.Errorf("unexpected number of value (%d) for node field selector operator %q",
                        len(expr.Values), expr.Operator)
                }
                selectors = append(selectors, fields.OneTermEqualSelector(expr.Key, expr.Values[0]))
            case v1.NodeSelectorOpNotIn:
                if len(expr.Values) != 1 {
                    return nil, fmt.Errorf("unexpected number of value (%d) for node field selector operator %q",
                        len(expr.Values), expr.Operator)
                }
                selectors = append(selectors, fields.OneTermNotEqualSelector(expr.Key, expr.Values[0]))
            default:
                return nil, fmt.Errorf("%q is not a valid node field selector operator", expr.Operator)
            }
        }
        return fields.AndSelectors(selectors...), nil
    }
  3. Key data structures: the definitions related to NodeSelector

!FILENAME vendor/k8s.io/api/core/v1/types.go:2436

    type NodeSelector struct {
        NodeSelectorTerms []NodeSelectorTerm `json:"nodeSelectorTerms" protobuf:"bytes,1,rep,name=nodeSelectorTerms"`
    }

    type NodeSelectorTerm struct {
        MatchExpressions []NodeSelectorRequirement `json:"matchExpressions,omitempty" protobuf:"bytes,1,rep,name=matchExpressions"`
        MatchFields      []NodeSelectorRequirement `json:"matchFields,omitempty" protobuf:"bytes,2,rep,name=matchFields"`
    }

    type NodeSelectorRequirement struct {
        Key      string               `json:"key" protobuf:"bytes,1,opt,name=key"`
        Operator NodeSelectorOperator `json:"operator" protobuf:"bytes,2,opt,name=operator,casttype=NodeSelectorOperator"`
        Values   []string             `json:"values,omitempty" protobuf:"bytes,3,rep,name=values"`
    }

    type NodeSelectorOperator string

    const (
        NodeSelectorOpIn           NodeSelectorOperator = "In"
        NodeSelectorOpNotIn        NodeSelectorOperator = "NotIn"
        NodeSelectorOpExists       NodeSelectorOperator = "Exists"
        NodeSelectorOpDoesNotExist NodeSelectorOperator = "DoesNotExist"
        NodeSelectorOpGt           NodeSelectorOperator = "Gt"
        NodeSelectorOpLt           NodeSelectorOperator = "Lt"
    )

Struct definitions of the fields.Selector implementations (exact value matching on a field)

!FILENAME vendor/k8s.io/apimachinery/pkg/fields/selector.go:78

    type hasTerm struct {
        field, value string
    }

    func (t *hasTerm) Matches(ls Fields) bool {
        return ls.Get(t.field) == t.value
    }

    type notHasTerm struct {
        field, value string
    }

    func (t *notHasTerm) Matches(ls Fields) bool {
        return ls.Get(t.field) != t.value
    }

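Node Affinity Priority Policy: NodeAffinityPriority

Policy description:

Uses the affinity conditions defined on the pod being scheduled to match the candidate Nodes and score them.

Applicable NodeAffinity configuration item:

NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution

Priority source analysis:

  1. Policy registration: defaultPriorities() registers a priority named "NodeAffinityPriority", together with its two functions, Map and Reduce:

    • CalculateNodeAffinityPriorityMap(): the map step. It matches each candidate Node against the preferred terms and accumulates a weighted score for it.
    • CalculateNodeAffinityPriorityReduce(): the reduce step. It rescales the scores into the range 0 to 10.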

!FILENAME pkg/scheduler/algorithmprovider/defaults/defaults.go:266

    // k8s.io/kubernetes/pkg/scheduler/algorithmprovider/defaults/defaults.go
    func defaultPriorities() sets.String {
        ...
        factory.RegisterPriorityFunction2("NodeAffinityPriority", priorities.CalculateNodeAffinityPriorityMap, priorities.CalculateNodeAffinityPriorityReduce, 1),
        ...
    }
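  2. Policy functions:

    Map step: CalculateNodeAffinityPriorityMap()

    It iterates over the terms defined in affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution, builds a labels.Selector from each term's match expressions, and matches it against the labels of the candidate Node. For every term that matches, the term's Weight is added to a running total. The total is returned as that Node's score.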

!FILENAME pkg/scheduler/algorithm/priorities/node_affinity.go:34

    func CalculateNodeAffinityPriorityMap(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (schedulerapi.HostPriority, error) {
        // Fetch the node being scored.
        node := nodeInfo.Node()
        if node == nil {
            return schedulerapi.HostPriority{}, fmt.Errorf("node not found")
        }
        // Default to the Affinity configured in the pod spec.
        affinity := pod.Spec.Affinity
        if priorityMeta, ok := meta.(*priorityMetadata); ok {
            // We were able to parse metadata, use affinity from there.
            affinity = priorityMeta.affinity
        }
        var count int32
        if affinity != nil && affinity.NodeAffinity != nil && affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution != nil {
            // Iterate over the preferred terms defined in PreferredDuringSchedulingIgnoredDuringExecution.
            for i := range affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution {
                preferredSchedulingTerm := &affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution[i]
                // Mind the configuration: a term with weight 0 is skipped entirely.
                if preferredSchedulingTerm.Weight == 0 {
                    continue
                }
                // TODO: Avoid computing it for all nodes if this becomes a performance problem.
                // Build a labels.Selector from the term's matchExpressions.
                nodeSelector, err := v1helper.NodeSelectorRequirementsAsSelector(preferredSchedulingTerm.Preference.MatchExpressions)
                if err != nil {
                    return schedulerapi.HostPriority{}, err
                }
                if nodeSelector.Matches(labels.Set(node.Labels)) {
                    count += preferredSchedulingTerm.Weight
                }
            }
        }
        // Return the node's score.
        return schedulerapi.HostPriority{
            Host:  node.Name,
            Score: int(count),
        }, nil
    }

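Here we again see NodeSelectorRequirementsAsSelector() (analyzed in the predicate section above) returning a labels.Selector, and selector.Matches() being used to test node.Labels against the conditions.

Reduce step: CalculateNodeAffinityPriorityReduce()

Rescales each node's final score into the range 0 to 10.

The code builds it with NormalizeReduce(), passing MaxPriority (10) and reverse set to false.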

!FILENAME pkg/scheduler/algorithm/priorities/node_affinity.go:77

    const MaxPriority = 10 // defined in the schedulerapi package

    var CalculateNodeAffinityPriorityReduce = NormalizeReduce(schedulerapi.MaxPriority, false)

NormalizeReduce()

  • The resulting score lies in the range 0 to MaxPriority.
  • When reverse is true, the final score = MaxPriority - normalized score.

!FILENAME pkg/scheduler/algorithm/priorities/reduce.go:29

    func NormalizeReduce(maxPriority int, reverse bool) algorithm.PriorityReduceFunction {
        return func(
            _ *v1.Pod,
            _ interface{},
            _ map[string]*schedulercache.NodeInfo,
            result schedulerapi.HostPriorityList) error {
            var maxCount int
            // Find the highest raw score.
            for i := range result {
                if result[i].Score > maxCount {
                    maxCount = result[i].Score
                }
            }
            // If the highest score is 0 and reverse is true, every node gets maxPriority.
            if maxCount == 0 {
                if reverse {
                    for i := range result {
                        result[i].Score = maxPriority
                    }
                }
                return nil
            }
            // normalized score = maxPriority * raw score / highest score
            // If reverse is true: final score = maxPriority - normalized score
            for i := range result {
                score := result[i].Score
                score = maxPriority * score / maxCount
                if reverse {
                    score = maxPriority - score
                }
                result[i].Score = score
            }
            return nil
        }
    }
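
A small worked example may help with the arithmetic. The sketch below reimplements the same normalization on a plain slice of ints; the `normalize` function is a hypothetical helper written for this illustration, and unlike the scheduler code it returns a new slice instead of mutating a HostPriorityList in place. The raw scores are assumed to be the weight sums produced by the map step.

```go
package main

import "fmt"

// normalize rescales raw scores into 0..maxPriority using integer division,
// following the same steps as NormalizeReduce.
func normalize(scores []int, maxPriority int, reverse bool) []int {
    out := make([]int, len(scores))

    // Find the highest raw score.
    maxCount := 0
    for _, s := range scores {
        if s > maxCount {
            maxCount = s
        }
    }

    for i, s := range scores {
        if maxCount == 0 {
            // Nothing matched anywhere: all zeros, or all max when reversed.
            if reverse {
                out[i] = maxPriority
            }
            continue
        }
        score := maxPriority * s / maxCount
        if reverse {
            score = maxPriority - score
        }
        out[i] = score
    }
    return out
}

func main() {
    // Three nodes whose matched preferred-term weights sum to 2, 5 and 0.
    raw := []int{2, 5, 0}

    fmt.Println(normalize(raw, 10, false))            // [4 10 0]
    fmt.Println(normalize(raw, 10, true))             // [6 0 10]
    fmt.Println(normalize([]int{0, 0, 0}, 10, false)) // [0 0 0]
}
```

Because the scores are relative to the best node, only the ratio between term weights matters for NodeAffinityPriority; doubling every weight in the pod spec leaves the normalized result unchanged.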

Pod Affinity

Basic description of Pod affinity:

[Figure: overview of Pod affinity]

YAML configuration sample:

    ---
    apiVersion: apps/v1beta1
    kind: Deployment
    metadata:
      name: affinity
      labels:
        app: affinity
    spec:
      replicas: 3
      template:
        metadata:
          labels:
            app: affinity
            role: lab-web
        spec:
          containers:
          - name: nginx
            image: nginx:1.9.0
            ports:
            - containerPort: 80
              name: nginx-web-lab
          affinity:  # for high availability, the three pods should run on different Nodes
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - affinity
                topologyKey: kubernetes.io/hostname

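Pod Affinity Predicate Policy: MatchInterPodAffinityPred

Policy description:

For the pod being scheduled, match its affinity/anti-affinity configuration against existing Pods. Then take the topology values (for the topologyKey defined in the affinity term) of the Nodes that run the matching Pods, and compare them one by one against the candidate Nodes.

Applicable PodAffinity configuration items:

  • PodAffinity.RequiredDuringSchedulingIgnoredDuringExecution
  • PodAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution

Predicate source analysis:

  1. Policy registration: defaultPredicates() registers a predicate named "MatchInterPodAffinity".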

!FILENAME pkg/scheduler/algorithmprovider/defaults/defaults.go:143

    func defaultPredicates() sets.String {
        ...
        factory.RegisterFitPredicateFactory(
            predicates.MatchInterPodAffinityPred,
            func(args factory.PluginFactoryArgs) algorithm.FitPredicate {
                return predicates.NewPodAffinityPredicate(args.NodeInfo, args.PodLister)
            },
        ),
        ...
    }
  2. Policy function: checker.InterPodAffinityMatches(). NewPodAffinityPredicate() instantiates a PodAffinityChecker and returns this method as the predicate.

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:1138

    type PodAffinityChecker struct {
        info      NodeInfo
        podLister algorithm.PodLister
    }

    func NewPodAffinityPredicate(info NodeInfo, podLister algorithm.PodLister) algorithm.FitPredicate {
        checker := &PodAffinityChecker{
            info:      info,
            podLister: podLister,
        }
        return checker.InterPodAffinityMatches // return the policy function
    }

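InterPodAffinityMatches() checks whether a pod may be scheduled onto a given Node while respecting pod affinity and anti-affinity configuration. It performs two checks:

  1. satisfiesExistingPodsAntiAffinity(): the anti-affinity rules of already existing Pods are satisfied.
  2. satisfiesPodsAffinityAntiAffinity(): the incoming pod's own affinity and anti-affinity rules are satisfied.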

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:1155

    func (c *PodAffinityChecker) InterPodAffinityMatches(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
        node := nodeInfo.Node()
        if node == nil {
            return false, nil, fmt.Errorf("node not found")
        }
        // ① Would placing the pod here violate anti-affinity rules of existing pods?
        if failedPredicates, error := c.satisfiesExistingPodsAntiAffinity(pod, meta, nodeInfo); failedPredicates != nil {
            failedPredicates := append([]algorithm.PredicateFailureReason{ErrPodAffinityNotMatch}, failedPredicates)
            return false, failedPredicates, error
        }
        // Now check whether the pod's own requirements will be satisfied on this node.
        affinity := pod.Spec.Affinity
        if affinity == nil || (affinity.PodAffinity == nil && affinity.PodAntiAffinity == nil) {
            return true, nil, nil
        }
        // ② Check the pod's own affinity and anti-affinity terms.
        if failedPredicates, error := c.satisfiesPodsAffinityAntiAffinity(pod, meta, nodeInfo, affinity); failedPredicates != nil {
            failedPredicates := append([]algorithm.PredicateFailureReason{ErrPodAffinityNotMatch}, failedPredicates)
            return false, failedPredicates, error
        }
        return true, nil, nil
    }

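① satisfiesExistingPodsAntiAffinity() checks whether scheduling the pod onto the target node would violate anti-affinity rules defined by other pods. In other words, if one or more existing Pods define anti-affinity terms that match the pod being scheduled, and the target Node shares the corresponding topology key and value with the Node running such a Pod, then the target Node must not be added to the list of candidate Nodes.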

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:1293

    func (c *PodAffinityChecker) satisfiesExistingPodsAntiAffinity(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (algorithm.PredicateFailureReason, error) {
        node := nodeInfo.Node()
        if node == nil {
            return ErrExistingPodsAntiAffinityRulesNotMatch, fmt.Errorf("Node is nil")
        }
        var topologyMaps *topologyPairsMaps
        // If precomputed metadata exists, take topologyPairsAntiAffinityPodsMap from it directly.
        if predicateMeta, ok := meta.(*predicateMetadata); ok {
            topologyMaps = predicateMeta.topologyPairsAntiAffinityPodsMap
        } else {
            // Fallback when no precomputed metadata is available.
            // Filter out pods whose nodeName equals NodeInfo.Node.Name but which are
            // not present in nodeInfo; what remains is essentially the pods running
            // on other Nodes.
            filteredPods, err := c.podLister.FilteredList(nodeInfo.Filter, labels.Everything())
            if err != nil {
                errMessage := fmt.Sprintf("Failed to get all pods, %+v", err)
                klog.Error(errMessage)
                return ErrExistingPodsAntiAffinityRulesNotMatch, errors.New(errMessage)
            }
            // Collect the topology pairs of existing pods whose anti-affinity terms match the incoming pod.
            if topologyMaps, err = c.getMatchingAntiAffinityTopologyPairsOfPods(pod, filteredPods); err != nil {
                errMessage := fmt.Sprintf("Failed to get all terms that pod %+v matches, err: %+v", podName(pod), err)
                klog.Error(errMessage)
                return ErrExistingPodsAntiAffinityRulesNotMatch, errors.New(errMessage)
            }
        }
        // Walk the node's labels and check each (key, value) against the collected
        // anti-affinity topology pairs; any hit rejects the node.
        for topologyKey, topologyValue := range node.Labels {
            if topologyMaps.topologyPairToPods[topologyPair{key: topologyKey, value: topologyValue}] != nil {
                klog.V(10).Infof("Cannot schedule pod %+v onto node %v", podName(pod), node.Name)
                return ErrExistingPodsAntiAffinityRulesNotMatch, nil
            }
        }
        return nil, nil
    }

getMatchingAntiAffinityTopologyPairsOfPods() collects the topology pairs of the existing Pods whose anti-affinity terms match the pod being scheduled.

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:1270

    func (c *PodAffinityChecker) getMatchingAntiAffinityTopologyPairsOfPods(pod *v1.Pod, existingPods []*v1.Pod) (*topologyPairsMaps, error) {
        topologyMaps := newTopologyPairsMaps()
        // Iterate over all existing pods and look up the Node each one runs on.
        for _, existingPod := range existingPods {
            existingPodNode, err := c.info.GetNodeInfo(existingPod.Spec.NodeName)
            if err != nil {
                if apierrors.IsNotFound(err) {
                    klog.Errorf("Node not found, %v", existingPod.Spec.NodeName)
                    continue
                }
                return nil, err
            }
            // Derive the topology pairs from the incoming pod, the existing pod,
            // and the existing pod's Node (fetched above).
            // getMatchingAntiAffinityTopologyPairsOfPod() is described below.
            existingPodTopologyMaps, err := getMatchingAntiAffinityTopologyPairsOfPod(pod, existingPod, existingPodNode)
            if err != nil {
                return nil, err
            }
            topologyMaps.appendMaps(existingPodTopologyMaps)
        }
        return topologyMaps, nil
    }

    // 1) If existingPod defines no anti-affinity, return immediately.
    // 2) Otherwise, check whether any of its anti-affinity terms matches the incoming pod.
    //    For each match, record the term's TopologyKey together with the Node's value for that key.
    func getMatchingAntiAffinityTopologyPairsOfPod(newPod *v1.Pod, existingPod *v1.Pod, node *v1.Node) (*topologyPairsMaps, error) {
        affinity := existingPod.Spec.Affinity
        if affinity == nil || affinity.PodAntiAffinity == nil {
            return nil, nil
        }
        topologyMaps := newTopologyPairsMaps()
        for _, term := range GetPodAntiAffinityTerms(affinity.PodAntiAffinity) {
            namespaces := priorityutil.GetNamespacesFromPodAffinityTerm(existingPod, &term)
            selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
            if err != nil {
                return nil, err
            }
            if priorityutil.PodMatchesTermsNamespaceAndSelector(newPod, namespaces, selector) {
                if topologyValue, ok := node.Labels[term.TopologyKey]; ok {
                    pair := topologyPair{key: term.TopologyKey, value: topologyValue}
                    topologyMaps.addTopologyPair(pair, existingPod)
                }
            }
        }
        return topologyMaps, nil
    }
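
The two functions above only ever look up node.Labels[term.TopologyKey], so within this code path the topologyKey is just a node label key. The sketch below illustrates that with a made-up custom key, `rack`. Everything here is a simplified assumption for illustration: `antiTerm`, `existingPod`, `forbiddenPairs`, and `nodeAllowed` are hypothetical, namespaces are ignored, and the label selector is reduced to a single key=value test. Validation or admission control elsewhere in Kubernetes may still restrict which keys are accepted.

```go
package main

import "fmt"

// antiTerm is a hypothetical, simplified anti-affinity term: it repels pods
// labelled matchKey=matchValue within the same topologyKey domain.
type antiTerm struct {
    matchKey, matchValue string
    topologyKey          string
}

// existingPod carries the labels of the node it runs on and its anti-affinity terms.
type existingPod struct {
    nodeLabels map[string]string
    terms      []antiTerm
}

// pair is a (topologyKey, value) combination, like topologyPair in the scheduler.
type pair struct{ key, value string }

// forbiddenPairs collects the topology pairs of existing pods whose
// anti-affinity terms match the incoming pod's labels.
func forbiddenPairs(incoming map[string]string, pods []existingPod) map[pair]bool {
    pairs := map[pair]bool{}
    for _, p := range pods {
        for _, t := range p.terms {
            v, ok := incoming[t.matchKey]
            if !ok || v != t.matchValue {
                continue // this term does not target the incoming pod
            }
            if topoValue, ok := p.nodeLabels[t.topologyKey]; ok {
                pairs[pair{t.topologyKey, topoValue}] = true
            }
        }
    }
    return pairs
}

// nodeAllowed reports whether none of the node's labels is a forbidden pair.
func nodeAllowed(nodeLabels map[string]string, pairs map[pair]bool) bool {
    for k, v := range nodeLabels {
        if pairs[pair{k, v}] {
            return false
        }
    }
    return true
}

func main() {
    // An existing pod on node n1 (rack r1) repels app=web pods within its rack.
    existing := []existingPod{{
        nodeLabels: map[string]string{"kubernetes.io/hostname": "n1", "rack": "r1"},
        terms:      []antiTerm{{"app", "web", "rack"}},
    }}
    incoming := map[string]string{"app": "web"}
    pairs := forbiddenPairs(incoming, existing)

    sameRack := map[string]string{"kubernetes.io/hostname": "n2", "rack": "r1"}
    otherRack := map[string]string{"kubernetes.io/hostname": "n3", "rack": "r2"}

    fmt.Println(nodeAllowed(sameRack, pairs))  // false: n2 shares rack r1
    fmt.Println(nodeAllowed(otherRack, pairs)) // true
}
```

Note that node n2 is rejected even though the repelling pod runs on n1: the topology domain, not the individual node, is what the anti-affinity term protects.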

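② satisfiesPodsAffinityAntiAffinity() checks the pod's own affinity and anti-affinity configuration. Look at the code structure first: I split it into two parts, the if{} part and the else{} part, depending on whether precomputed predicate metadata is supplied.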

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:1367

    func (c *PodAffinityChecker) satisfiesPodsAffinityAntiAffinity(pod *v1.Pod,
        meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo,
        affinity *v1.Affinity) (algorithm.PredicateFailureReason, error) {
        node := nodeInfo.Node()
        if node == nil {
            return ErrPodAffinityRulesNotMatch, fmt.Errorf("Node is nil")
        }
        if predicateMeta, ok := meta.(*predicateMetadata); ok {
            ... //partI
        } else {
            ... //partII
        }
        return nil, nil
    }

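partI if{…}

  • This branch is used when precomputed metadata is supplied; otherwise execution goes to else{…}.
  • Get the pod's required AffinityTerms. If any exist, use the metadata to check that, for every term, the target node's topologyKey and value are present in metadata.topologyPairsPotentialAffinityPods (the pods that potentially match the affinity definition).
  • Get the pod's required AntiAffinityTerms. If any exist, use the metadata to check whether, for any one term, the target node's topologyKey and value are present in metadata.topologyPairsPotentialAntiAffinityPods (the pods that potentially match the anti-affinity definition). If so, the node is rejected.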
    if predicateMeta, ok := meta.(*predicateMetadata); ok {
        // Check all affinity terms.
        topologyPairsPotentialAffinityPods := predicateMeta.topologyPairsPotentialAffinityPods
        if affinityTerms := GetPodAffinityTerms(affinity.PodAffinity); len(affinityTerms) > 0 {
            matchExists := c.nodeMatchesAllTopologyTerms(pod, topologyPairsPotentialAffinityPods, nodeInfo, affinityTerms)
            if !matchExists {
                // Exception: if no pod in the cluster matches the terms yet and the pod
                // matches its own affinity terms, let it through (it may be the first
                // pod of a group that has affinity to itself).
                if !(len(topologyPairsPotentialAffinityPods.topologyPairToPods) == 0 && targetPodMatchesAffinityOfPod(pod, pod)) {
                    klog.V(10).Infof("Cannot schedule pod %+v onto node %v, because of PodAffinity",
                        podName(pod), node.Name)
                    return ErrPodAffinityRulesNotMatch, nil
                }
            }
        }
        // Check all anti-affinity terms.
        topologyPairsPotentialAntiAffinityPods := predicateMeta.topologyPairsPotentialAntiAffinityPods
        if antiAffinityTerms := GetPodAntiAffinityTerms(affinity.PodAntiAffinity); len(antiAffinityTerms) > 0 {
            matchExists := c.nodeMatchesAnyTopologyTerm(pod, topologyPairsPotentialAntiAffinityPods, nodeInfo, antiAffinityTerms)
            if matchExists {
                klog.V(10).Infof("Cannot schedule pod %+v onto node %v, because of PodAntiAffinity",
                    podName(pod), node.Name)
                return ErrPodAntiAffinityRulesNotMatch, nil
            }
        }
    }

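The sub-functions used inside if{…} are analyzed below, in the order in which they appear in the code:

GetPodAffinityTerms(): if the pod has a hard (required) podAffinity configuration, returns all of its required terms.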

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:1217

    func GetPodAffinityTerms(podAffinity *v1.PodAffinity) (terms []v1.PodAffinityTerm) {
        if podAffinity != nil {
            if len(podAffinity.RequiredDuringSchedulingIgnoredDuringExecution) != 0 {
                terms = podAffinity.RequiredDuringSchedulingIgnoredDuringExecution
            }
        }
        return terms
    }

nodeMatchesAllTopologyTerms() determines whether the target Node matches the topology value of every term defined in the affinity configuration.

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:1336

    // The target Node must carry the TopologyKey of every affinity term, and its value
    // for that key must equal the value on a node that runs pods matched by the
    // affinity expressions.
    // Note: in this branch the topologyPairs were precomputed in the metadata.
    func (c *PodAffinityChecker) nodeMatchesAllTopologyTerms(pod *v1.Pod, topologyPairs *topologyPairsMaps, nodeInfo *schedulercache.NodeInfo, terms []v1.PodAffinityTerm) bool {
        node := nodeInfo.Node()
        for _, term := range terms {
            // Check whether the target node has a label whose key is the term's
            // TopologyKey and read its value; build a topologyPair from key and value;
            // then look the pair up in metadata.topologyPairsPotentialAffinityPods
            // (the topology pairs of the potentially matching pods).
            if topologyValue, ok := node.Labels[term.TopologyKey]; ok {
                pair := topologyPair{key: term.TopologyKey, value: topologyValue}
                if _, ok := topologyPairs.topologyPairToPods[pair]; !ok {
                    return false // a single unmatched term is enough to fail
                }
            } else {
                return false
            }
        }
        return true
    }

    // Definition of topologyPairsMaps
    type topologyPairsMaps struct {
        topologyPairToPods map[topologyPair]podSet
        podToTopologyPairs map[string]topologyPairSet
    }
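
The difference between the affinity check ("all terms must hit") and the anti-affinity check ("any term hitting rejects the node") can be sketched on plain maps. This is an illustration under assumptions: terms are reduced to their topologyKey, `matchesAll` follows the loop shown above, and `matchesAny` is my reading of how nodeMatchesAnyTopologyTerm would behave based on its name and its use in partI, since its body is not shown here.

```go
package main

import "fmt"

// pair is a (topologyKey, value) combination, like topologyPair in the scheduler.
type pair struct{ key, value string }

// matchesAll: the node must carry every topologyKey, and each (key, value)
// must be among the precomputed pairs. Used for required pod affinity.
func matchesAll(nodeLabels map[string]string, topologyKeys []string, pairs map[pair]bool) bool {
    for _, k := range topologyKeys {
        v, ok := nodeLabels[k]
        if !ok || !pairs[pair{k, v}] {
            return false
        }
    }
    return true
}

// matchesAny: a single (key, value) hit is enough. Assumed behavior of the
// anti-affinity counterpart, where a hit means the node is rejected.
func matchesAny(nodeLabels map[string]string, topologyKeys []string, pairs map[pair]bool) bool {
    for _, k := range topologyKeys {
        if v, ok := nodeLabels[k]; ok && pairs[pair{k, v}] {
            return true
        }
    }
    return false
}

func main() {
    // Pairs of the nodes that run pods matching the terms' selectors.
    pairs := map[pair]bool{
        {"zone", "z1"}:                   true,
        {"kubernetes.io/hostname", "n1"}: true,
    }
    // Candidate node: same zone as the matching pods, different host.
    node := map[string]string{"zone": "z1", "kubernetes.io/hostname": "n2"}
    keys := []string{"zone", "kubernetes.io/hostname"}

    fmt.Println(matchesAll(node, keys, pairs)) // false: the hostname term is not satisfied
    fmt.Println(matchesAny(node, keys, pairs)) // true: the zone term alone is a hit
}
```

So with two affinity terms the candidate must share both the zone and the host with matching pods, whereas with two anti-affinity terms sharing either one is already enough to exclude it.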

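targetPodMatchesAffinityOfPod() checks, according to the pod's affinity definition, whether the target pod's namespace qualifies and whether the label selector expressions match the target pod's labels.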

!FILENAME pkg/scheduler/algorithm/predicates/metadata.go:498

    func targetPodMatchesAffinityOfPod(pod, targetPod *v1.Pod) bool {
        affinity := pod.Spec.Affinity
        if affinity == nil || affinity.PodAffinity == nil {
            return false
        }
        // ① Build the (namespaces, selector) properties of every affinity term.
        affinityProperties, err := getAffinityTermProperties(pod, GetPodAffinityTerms(affinity.PodAffinity))
        if err != nil {
            klog.Errorf("error in getting affinity properties of Pod %v", pod.Name)
            return false
        }
        // ② Match the target pod against all of them.
        return podMatchesAllAffinityTermProperties(targetPod, affinityProperties)
    }

    // ① Collect the namespaces and selector defined by each affinity term.
    // Returns a slice of affinityTermProperties, each entry holding {namespaces, selector}.
    func getAffinityTermProperties(pod *v1.Pod, terms []v1.PodAffinityTerm) (properties []*affinityTermProperties, err error) {
        if terms == nil {
            return properties, nil
        }
        for _, term := range terms {
            namespaces := priorityutil.GetNamespacesFromPodAffinityTerm(pod, &term)
            // Build a labels.Selector from the affinity term.
            selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
            if err != nil {
                return nil, err
            }
            // Record namespaces and selector.
            properties = append(properties, &affinityTermProperties{namespaces: namespaces, selector: selector})
        }
        return properties, nil
    }

    // Returns the namespace set (if the term specifies no namespaces, the namespace
    // of the pod that owns the term is used).
    func GetNamespacesFromPodAffinityTerm(pod *v1.Pod, podAffinityTerm *v1.PodAffinityTerm) sets.String {
        names := sets.String{}
        if len(podAffinityTerm.Namespaces) == 0 {
            names.Insert(pod.Namespace)
        } else {
            names.Insert(podAffinityTerm.Namespaces...)
        }
        return names
    }

    // ② Iterate over all (namespaces, selector) properties and call
    // PodMatchesTermsNamespaceAndSelector() on each; all must match.
    func podMatchesAllAffinityTermProperties(pod *v1.Pod, properties []*affinityTermProperties) bool {
        if len(properties) == 0 {
            return false
        }
        for _, property := range properties {
            if !priorityutil.PodMatchesTermsNamespaceAndSelector(pod, property.namespaces, property.selector) {
                return false
            }
        }
        return true
    }

    // Checks the namespace and the label selector.
    // - If pod.Namespace is not in the given namespace set, return false; otherwise go on to the label match.
    // - If pod.Labels does not match the labels.Selector, return false; otherwise true.
    func PodMatchesTermsNamespaceAndSelector(pod *v1.Pod, namespaces sets.String, selector labels.Selector) bool {
        if !namespaces.Has(pod.Namespace) {
            return false
        }
        if !selector.Matches(labels.Set(pod.Labels)) {
            return false
        }
        return true
    }

GetPodAntiAffinityTerms() returns all required (hard) Terms of the pod's anti-affinity configuration.

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:1231

    func GetPodAntiAffinityTerms(podAntiAffinity *v1.PodAntiAffinity) (terms []v1.PodAffinityTerm) {
        if podAntiAffinity != nil {
            if len(podAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution) != 0 {
                terms = podAntiAffinity.RequiredDuringSchedulingIgnoredDuringExecution
            }
        }
        return terms
    }

nodeMatchesAnyTopologyTerm() checks whether the target node matches the topology value of any of the anti-affinity Terms.

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:1353

    // The node only needs to match the TopologyKey of any one anti-affinity term for the result to be true.
    // The logic mirrors nodeMatchesAllTopologyTerms(), except that a single match returns true.
    func (c *PodAffinityChecker) nodeMatchesAnyTopologyTerm(pod *v1.Pod, topologyPairs *topologyPairsMaps, nodeInfo *schedulercache.NodeInfo, terms []v1.PodAffinityTerm) bool {
        node := nodeInfo.Node()
        for _, term := range terms {
            if topologyValue, ok := node.Labels[term.TopologyKey]; ok {
                pair := topologyPair{key: term.TopologyKey, value: topologyValue}
                if _, ok := topologyPairs.topologyPairToPods[pair]; ok {
                    return true // one matching term is enough
                }
            }
        }
        return false
    }
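
The contrast between the "all terms" rule for affinity and the "any term" rule for anti-affinity can be sketched with plain maps. This is a stand-alone illustration, not the scheduler's code: `topologyPair` mirrors the struct used above, while `pairSet`, `matchesAll` and `matchesAny` are hypothetical names, and a node is reduced to its label map.

```go
package main

import "fmt"

// topologyPair mirrors the scheduler's key/value pair of a node label.
type topologyPair struct {
	key, value string
}

// pairSet is a hypothetical stand-in for topologyPairToPods: it only
// records which pairs have at least one matching pod.
type pairSet map[topologyPair]bool

// matchesAll: every topology key must resolve, on this node, to a pair
// that is present in the set (the affinity rule).
func matchesAll(nodeLabels map[string]string, keys []string, pairs pairSet) bool {
	for _, k := range keys {
		v, ok := nodeLabels[k]
		if !ok || !pairs[topologyPair{k, v}] {
			return false
		}
	}
	return true
}

// matchesAny: one topology key resolving to a present pair is enough
// (the anti-affinity rule).
func matchesAny(nodeLabels map[string]string, keys []string, pairs pairSet) bool {
	for _, k := range keys {
		if v, ok := nodeLabels[k]; ok && pairs[topologyPair{k, v}] {
			return true
		}
	}
	return false
}

func main() {
	node := map[string]string{"kubernetes.io/hostname": "node-1", "zone": "z1"}
	pairs := pairSet{topologyPair{"zone", "z1"}: true}
	keys := []string{"kubernetes.io/hostname", "zone"}
	fmt.Println(matchesAll(node, keys, pairs)) // the hostname pair is absent
	fmt.Println(matchesAny(node, keys, pairs)) // the zone pair is present
}
```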

Part II: else {…}

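  • If there is no precomputed metadata, the list of qualifying pods is obtained through the given podFilter.
  • Fetch all affinity terms. If any exist, match each "target pod" against the namespaces and label expressions defined by PodAffinity. On a full match, take the node that runs the target pod, read the value of its topologyKey label (the topologyKey named by the affinity term), and compare it with the value of the same key on the "candidate node".
  • Likewise, fetch all anti-affinity terms. If any exist, match each "target pod" against the namespaces and label expressions defined by PodAntiAffinity. On a full match, take the node that runs the target pod, read the value of its topologyKey label (the topologyKey named by the anti-affinity term), and compare it with the value of the same key on the "candidate node".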

    else {
        // We don't have precomputed metadata. We have to follow a slow path to check affinity terms.
        filteredPods, err := c.podLister.FilteredList(nodeInfo.Filter, labels.Everything())
        if err != nil {
            return ErrPodAffinityRulesNotMatch, err
        }
        // Fetch the Terms ("match conditions") of the affinity and anti-affinity configuration.
        affinityTerms := GetPodAffinityTerms(affinity.PodAffinity)
        antiAffinityTerms := GetPodAntiAffinityTerms(affinity.PodAntiAffinity)
        matchFound, termsSelectorMatchFound := false, false
        for _, targetPod := range filteredPods {
            // Walk every target pod and check all affinity Terms.
            if !matchFound && len(affinityTerms) > 0 {
                // podMatchesPodAffinityTerms() matches the target pod against the namespaces and label expressions (detailed later).
                affTermsMatch, termsSelectorMatch, err := c.podMatchesPodAffinityTerms(pod, targetPod, nodeInfo, affinityTerms)
                if err != nil {
                    errMessage := fmt.Sprintf("Cannot schedule pod %+v onto node %v, because of PodAffinity, err: %v", podName(pod), node.Name, err)
                    klog.Error(errMessage)
                    return ErrPodAffinityRulesNotMatch, errors.New(errMessage)
                }
                if termsSelectorMatch {
                    termsSelectorMatchFound = true
                }
                if affTermsMatch {
                    matchFound = true
                }
            }
            // As above: check all anti-affinity Terms against every target pod.
            if len(antiAffinityTerms) > 0 {
                antiAffTermsMatch, _, err := c.podMatchesPodAffinityTerms(pod, targetPod, nodeInfo, antiAffinityTerms)
                if err != nil || antiAffTermsMatch {
                    klog.V(10).Infof("Cannot schedule pod %+v onto node %v, because of PodAntiAffinityTerm, err: %v",
                        podName(pod), node.Name, err)
                    return ErrPodAntiAffinityRulesNotMatch, nil
                }
            }
        }
        if !matchFound && len(affinityTerms) > 0 {
            if termsSelectorMatchFound {
                klog.V(10).Infof("Cannot schedule pod %+v onto node %v, because of PodAffinity",
                    podName(pod), node.Name)
                return ErrPodAffinityRulesNotMatch, nil
            }
            // Check if pod matches its own affinity properties (namespace and label selector).
            if !targetPodMatchesAffinityOfPod(pod, pod) {
                klog.V(10).Infof("Cannot schedule pod %+v onto node %v, because of PodAffinity",
                    podName(pod), node.Name)
                return ErrPodAffinityRulesNotMatch, nil
            }
        }
    }
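
The tail of this slow path is easy to misread, so here is a condensed sketch of the affinity decision only (anti-affinity has already returned inside the loop). `affinityAllows` and its parameter names are hypothetical; the three booleans stand for matchFound, termsSelectorMatchFound and the result of targetPodMatchesAffinityOfPod(pod, pod), and the sketch assumes the pod has at least one affinity term.

```go
package main

import "fmt"

// affinityAllows condenses the final checks of the slow path.
func affinityAllows(matchFound, selectorMatchFound, matchesOwnTerms bool) bool {
	if matchFound {
		// Some existing pod matches the terms in the same topology domain.
		return true
	}
	if selectorMatchFound {
		// Matching pods exist, but only in other topology domains.
		return false
	}
	// No pod matches the selector anywhere: the pod is admitted only if it
	// matches its own terms, e.g. the first pod of a self-affine group.
	return matchesOwnTerms
}

func main() {
	fmt.Println(affinityAllows(true, true, false))   // co-located match found
	fmt.Println(affinityAllows(false, true, true))   // matches exist elsewhere
	fmt.Println(affinityAllows(false, false, true))  // first pod of its group
	fmt.Println(affinityAllows(false, false, false)) // nothing to attach to
}
```

The third case is the reason the code compares the pod with itself: without it, the first replica of a workload with self-affinity could never be scheduled.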

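The following continues with the helper functions used inside else {…}.

podMatchesPodAffinityTerms() matches the target pod against all namespaces and label expressions defined by the affinity terms. On a full match it takes the node that runs the target pod and compares the value of its topologyKey label (the topologyKey named by the affinity term) with the value of the same key on the candidate node.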

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:1189

    func (c *PodAffinityChecker) podMatchesPodAffinityTerms(pod, targetPod *v1.Pod, nodeInfo *schedulercache.NodeInfo, terms []v1.PodAffinityTerm) (bool, bool, error) {
        if len(terms) == 0 {
            return false, false, fmt.Errorf("terms array is empty")
        }
        // Build the list of {namespaces, selector} entries.
        props, err := getAffinityTermProperties(pod, terms)
        if err != nil {
            return false, false, err
        }
        // Check that the target pod satisfies every {namespaces, selector} entry derived from the terms; if not, return false.
        // If it does, fetch the node that runs the pod (the "target node") and compare the value of the
        // term's topologyKey label on the target node with the value on the candidate node.
        if !podMatchesAllAffinityTermProperties(targetPod, props) {
            return false, false, nil
        }
        // Namespace and selector of the terms have matched. Now we check topology of the terms.
        targetPodNode, err := c.info.GetNodeInfo(targetPod.Spec.NodeName)
        if err != nil {
            return false, false, err
        }
        for _, term := range terms {
            if len(term.TopologyKey) == 0 {
                return false, false, fmt.Errorf("empty topologyKey is not allowed except for PreferredDuringScheduling pod anti-affinity")
            }
            if !priorityutil.NodesHaveSameTopologyKey(nodeInfo.Node(), targetPodNode, term.TopologyKey) {
                return false, true, nil
            }
        }
        return true, true, nil
    }

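priorityutil.NodesHaveSameTopologyKey() is where the topologyKey comparison actually happens. The comparison is a plain lookup of a node label, so this code shows that the topologyKey set in a Deployment's YAML can be a custom key.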

!FILENAME pkg/scheduler/algorithm/priorities/util/topologies.go:53

    // Report whether both nodes carry the same value for the topologyKey label.
    func NodesHaveSameTopologyKey(nodeA, nodeB *v1.Node, topologyKey string) bool {
        if len(topologyKey) == 0 {
            return false
        }
        if nodeA.Labels == nil || nodeB.Labels == nil {
            return false
        }
        nodeALabel, okA := nodeA.Labels[topologyKey] // read the value of the node label that was given topology meaning
        nodeBLabel, okB := nodeB.Labels[topologyKey]
        // If found label in both nodes, check the label
        if okB && okA {
            return nodeALabel == nodeBLabel // compare the two values
        }
        return false
    }
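
To make the custom-key point concrete, here is a small stand-alone sketch that follows the same steps with nodes reduced to label maps. `sameTopology` is a hypothetical name, and `example.com/rack` is an invented label key used only for illustration.

```go
package main

import "fmt"

// sameTopology follows the steps of NodesHaveSameTopologyKey: both nodes
// must carry the label, and the two values must be equal.
func sameTopology(labelsA, labelsB map[string]string, topologyKey string) bool {
	if topologyKey == "" {
		return false
	}
	a, okA := labelsA[topologyKey]
	b, okB := labelsB[topologyKey]
	return okA && okB && a == b
}

func main() {
	n1 := map[string]string{"example.com/rack": "r1"}
	n2 := map[string]string{"example.com/rack": "r1"}
	n3 := map[string]string{"example.com/rack": "r2"}
	n4 := map[string]string{} // node without the label

	fmt.Println(sameTopology(n1, n2, "example.com/rack")) // same rack
	fmt.Println(sameTopology(n1, n3, "example.com/rack")) // different racks
	fmt.Println(sameTopology(n1, n4, "example.com/rack")) // label missing on one side
}
```

Nothing in the comparison is specific to hostname, zone or region; the only requirement is that the nodes are labeled with the chosen key.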

Pod affinity priority policy InterPodAffinityPriority

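Policy description: iterate over all candidate nodes concurrently and check affinity and anti-affinity between the existing pods and the pod being scheduled. An affinity match adds weight, an anti-affinity match subtracts it, and the totals are finally turned into a score for each node.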

Applicable affinity settings: PodAffinity.PreferredDuringSchedulingIgnoredDuringExecution and PodAntiAffinity.PreferredDuringSchedulingIgnoredDuringExecution

Priority policy source analysis:

  1. Policy registration: defaultPriorities() registers a priority policy named "InterPodAffinityPriority".

!FILENAME pkg/scheduler/algorithmprovider/defaults/defaults.go:145

    // k8s.io/kubernetes/pkg/scheduler/algorithmprovider/defaults/defaults.go
    func defaultPriorities() sets.String {
        ...
        factory.RegisterPriorityConfigFactory(
            "InterPodAffinityPriority",
            factory.PriorityConfigFactory{
                Function: func(args factory.PluginFactoryArgs) algorithm.PriorityFunction {
                    return priorities.NewInterPodAffinityPriority(args.NodeInfo, args.NodeLister, args.PodLister, args.HardPodAffinitySymmetricWeight)
                },
                Weight: 1,
            },
        ),
        ...
    }

  2. Policy func: NewInterPodAffinityPriority() instantiates an InterPodAffinity object and returns its CalculateInterPodAffinityPriority() method as the policy func.

!FILENAME pkg/scheduler/algorithm/priorities/interpod_affinity.go:45

    func NewInterPodAffinityPriority(
        info predicates.NodeInfo,
        nodeLister algorithm.NodeLister,
        podLister algorithm.PodLister,
        hardPodAffinityWeight int32) algorithm.PriorityFunction {
        interPodAffinity := &InterPodAffinity{
            info:                  info,
            nodeLister:            nodeLister,
            podLister:             podLister,
            hardPodAffinityWeight: hardPodAffinityWeight,
        }
        return interPodAffinity.CalculateInterPodAffinityPriority
    }

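CalculateInterPodAffinityPriority() matches the Terms of the pod affinity configuration, processes all candidate nodes concurrently, and accumulates an affinity weight score for each node. Let us first look at its code structure: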

  • processPod := func(existingPod *v1.Pod) error {… pm.processTerms()}
  • processNode := func(i int) {…}
  • workqueue.ParallelizeUntil(context.TODO(), 16, len(allNodeNames), processNode)
  • fScore = float64(schedulerapi.MaxPriority) * ((pm.counts[node.Name] - minCount) / (maxCount - minCount))

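To follow this logic, a few names need to be understood:

  • pod: the pod to be scheduled
  • hasAffinityConstraints: whether the pod being scheduled defines pod affinity
  • hasAntiAffinityConstraints: whether the pod being scheduled defines pod anti-affinity
  • existingPod: an existing pod being processed, the "affinity target pod"
  • existingPodNode: the node that runs this affinity target pod, the "target node"
  • existingHasAffinityConstraints: whether the affinity target pod has affinity constraints
  • existingHasAntiAffinityConstraints: whether the affinity target pod has anti-affinity constraints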

!FILENAME pkg/scheduler/algorithm/priorities/interpod_affinity.go:119

    func (ipa *InterPodAffinity) CalculateInterPodAffinityPriority(pod *v1.Pod, nodeNameToInfo map[string]*schedulercache.NodeInfo, nodes []*v1.Node) (schedulerapi.HostPriorityList, error) {
        affinity := pod.Spec.Affinity
        // Does the pod being scheduled define affinity / anti-affinity constraints?
        hasAffinityConstraints := affinity != nil && affinity.PodAffinity != nil
        hasAntiAffinityConstraints := affinity != nil && affinity.PodAntiAffinity != nil
        allNodeNames := make([]string, 0, len(nodeNameToInfo))
        for name := range nodeNameToInfo {
            allNodeNames = append(allNodeNames, name)
        }
        var maxCount float64
        var minCount float64
        pm := newPodAffinityPriorityMap(nodes)
        // processPod() holds the logic that accumulates affinity and anti-affinity weight. ②
        // It calls the term handler processTerms().
        processPod := func(existingPod *v1.Pod) error {
            ...
            // Affinity check logic ①
            pm.processTerms(terms, pod, existingPod, existingPodNode, 1)
            ...
        }
        // processNode() decides, based on whether affinity is configured, which pods are passed to processPod(). ③
        processNode := func(i int) {
            ...
            if err := processPod(existingPod); err != nil {
                pm.setError(err)
            }
            ...
        }
        // Run processNode() concurrently with 16 workers.
        workqueue.ParallelizeUntil(context.TODO(), 16, len(allNodeNames), processNode)
        ...
        for _, node := range nodes {
            if pm.counts[node.Name] > maxCount {
                maxCount = pm.counts[node.Name]
            }
            if pm.counts[node.Name] < minCount {
                minCount = pm.counts[node.Name]
            }
        }
        result := make(schedulerapi.HostPriorityList, 0, len(nodes))
        for _, node := range nodes {
            fScore := float64(0)
            if (maxCount - minCount) > 0 { // reduce step: compute fScore ④
                fScore = float64(schedulerapi.MaxPriority) * ((pm.counts[node.Name] - minCount) / (maxCount - minCount))
            }
            result = append(result, schedulerapi.HostPriority{
                Host:  node.Name,
                Score: int(fScore),
            })
        }
        return result, nil
    }

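processTerms() takes a pod and the affinity terms it defines (podAffinityTerm), a pod under test, and the node that runs the pod under test. It then checks every candidate node in turn and accumulates weight on the nodes according to the result. The flow is:

  1. Is the namespace of the pod under test among the namespaces of the term (or, when the term lists none, equal to the namespace of the given pod)?

  2. Do the labels of the pod under test match the selector defined by the given pod's podAffinityTerm?

  3. If both conditions hold, compare the TopologyKey value of the node that runs the pod under test with that of every candidate node; when the values are equal, add the weight to that candidate node.

    How to read this:

    Steps 1 and 2 find the pods, within the applicable namespaces, that satisfy the podAffinityTerm configured on the pod being scheduled;

    Step 3 takes the topologyKey value and matches it against the candidate nodes.

    This makes the inner meaning and logic of pod affinity matching clear.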

!FILENAME pkg/scheduler/algorithm/priorities/interpod_affinity.go:107

    func (p *podAffinityPriorityMap) processTerms(terms []v1.WeightedPodAffinityTerm, podDefiningAffinityTerm, podToCheck *v1.Pod, fixedNode *v1.Node, multiplier int) {
        for i := range terms {
            term := &terms[i]
            p.processTerm(&term.PodAffinityTerm, podDefiningAffinityTerm, podToCheck, fixedNode, float64(term.Weight*int32(multiplier)))
        }
    }

    func (p *podAffinityPriorityMap) processTerm(term *v1.PodAffinityTerm, podDefiningAffinityTerm, podToCheck *v1.Pod, fixedNode *v1.Node, weight float64) {
        // Resolve the namespaces (affinityTerm.Namespaces, or pod.Namespace as fallback).
        // Build a selector object from the podAffinityTerm (see the labels.Selector discussion at the start of this article).
        namespaces := priorityutil.GetNamespacesFromPodAffinityTerm(podDefiningAffinityTerm, term)
        selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector) // label selector
        if err != nil {
            p.setError(err)
            return
        }
        // Check whether the namespace and labels of the pod under test match.
        match := priorityutil.PodMatchesTermsNamespaceAndSelector(podToCheck, namespaces, selector)
        if match {
            func() {
                p.Lock()
                defer p.Unlock()
                for _, node := range p.nodes {
                    // Compare the TopologyKey value of the node running the pod under test with every
                    // candidate node; when they are equal, add the weight to that candidate.
                    if priorityutil.NodesHaveSameTopologyKey(node, fixedNode, term.TopologyKey) {
                        p.counts[node.Name] += weight
                    }
                }
            }()
        }
    }
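
Step 3 of the flow, the weight accumulation, can be sketched in isolation. The code below is a simplified illustration rather than the scheduler's implementation: `node` and `addWeight` are hypothetical names, the label keys `host` and `zone` are invented, locking is omitted, and the caller is assumed to have already confirmed that the existing pod matches the term's namespaces and selector.

```go
package main

import "fmt"

type node struct {
	name   string
	labels map[string]string
}

// addWeight adds weight to every candidate node that shares the topology
// value of fixedNode, the node running the matching pod.
func addWeight(counts map[string]float64, nodes []node, fixedNode node, topologyKey string, weight float64) {
	want, ok := fixedNode.labels[topologyKey]
	if !ok {
		return
	}
	for _, n := range nodes {
		if got, ok := n.labels[topologyKey]; ok && got == want {
			counts[n.name] += weight
		}
	}
}

func main() {
	nodes := []node{
		{"n1", map[string]string{"host": "n1", "zone": "a"}},
		{"n2", map[string]string{"host": "n2", "zone": "a"}},
		{"n3", map[string]string{"host": "n3", "zone": "b"}},
	}
	counts := map[string]float64{}
	// A matching pod runs on n1.
	addWeight(counts, nodes, nodes[0], "zone", 5)  // preferred affinity, weight 5, multiplier +1
	addWeight(counts, nodes, nodes[0], "host", -3) // preferred anti-affinity, weight 3, multiplier -1
	for _, n := range nodes {
		fmt.Printf("%s: %v\n", n.name, counts[n.name])
	}
}
```

In this example n2 ends up ahead of n1: both share the zone with the matching pod, but only n1 shares the host that the anti-affinity term penalizes.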

GetNamespacesFromPodAffinityTerm() returns the namespace set (when the term lists no namespace, the namespace of the pod that defines the term is used).

!FILENAME pkg/scheduler/algorithm/priorities/util/topologies.go:28

    func GetNamespacesFromPodAffinityTerm(pod *v1.Pod, podAffinityTerm *v1.PodAffinityTerm) sets.String {
        names := sets.String{}
        if len(podAffinityTerm.Namespaces) == 0 {
            names.Insert(pod.Namespace)
        } else {
            names.Insert(podAffinityTerm.Namespaces...)
        }
        return names
    }

PodMatchesTermsNamespaceAndSelector() checks that the pod's namespace is in the given set and that its labels match the labels.Selector.

!FILENAME pkg/scheduler/algorithm/priorities/util/topologies.go:40

    func PodMatchesTermsNamespaceAndSelector(pod *v1.Pod, namespaces sets.String, selector labels.Selector) bool {
        if !namespaces.Has(pod.Namespace) {
            return false
        }
        if !selector.Matches(labels.Set(pod.Labels)) {
            return false
        }
        return true
    }

processPod() is the layer that handles affinity and anti-affinity; it calls processTerms() to run the checks and accumulate the weights.

!FILENAME pkg/scheduler/algorithm/priorities/interpod_affinity.go:136

    processPod := func(existingPod *v1.Pod) error {
        existingPodNode, err := ipa.info.GetNodeInfo(existingPod.Spec.NodeName)
        if err != nil {
            if apierrors.IsNotFound(err) {
                klog.Errorf("Node not found, %v", existingPod.Spec.NodeName)
                return nil
            }
            return err
        }
        existingPodAffinity := existingPod.Spec.Affinity
        existingHasAffinityConstraints := existingPodAffinity != nil && existingPodAffinity.PodAffinity != nil
        existingHasAntiAffinityConstraints := existingPodAffinity != nil && existingPodAffinity.PodAntiAffinity != nil
        // If the pod being scheduled has affinity constraints, run processTerms() against the
        // affinity target pod and its node; a match adds the term weight (multiplier +1).
        if hasAffinityConstraints {
            terms := affinity.PodAffinity.PreferredDuringSchedulingIgnoredDuringExecution
            pm.processTerms(terms, pod, existingPod, existingPodNode, 1)
        }
        // If the pod being scheduled has anti-affinity constraints, run the same check;
        // a match subtracts the term weight (multiplier -1).
        if hasAntiAffinityConstraints {
            terms := affinity.PodAntiAffinity.PreferredDuringSchedulingIgnoredDuringExecution
            pm.processTerms(terms, pod, existingPod, existingPodNode, -1)
        }
        // If the affinity target pod has affinity constraints, run the check in the opposite direction,
        // with the pod being scheduled as the pod under test; a match adds weight (multiplier +1).
        // Its required (hard) terms are counted with hardPodAffinityWeight.
        if existingHasAffinityConstraints {
            if ipa.hardPodAffinityWeight > 0 {
                terms := existingPodAffinity.PodAffinity.RequiredDuringSchedulingIgnoredDuringExecution
                for _, term := range terms {
                    pm.processTerm(&term, existingPod, pod, existingPodNode, float64(ipa.hardPodAffinityWeight))
                }
            }
            terms := existingPodAffinity.PodAffinity.PreferredDuringSchedulingIgnoredDuringExecution
            pm.processTerms(terms, existingPod, pod, existingPodNode, 1)
        }
        // If the affinity target pod has anti-affinity constraints, run the check in the opposite
        // direction as well; a match subtracts weight (multiplier -1).
        if existingHasAntiAffinityConstraints {
            terms := existingPodAffinity.PodAntiAffinity.PreferredDuringSchedulingIgnoredDuringExecution
            pm.processTerms(terms, existingPod, pod, existingPodNode, -1)
        }
        return nil
    }

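processNode: if the pod being scheduled defines affinity or anti-affinity, every pod on the node is passed to processPod(); if it defines neither, only the pods on the node that carry affinity settings of their own (PodsWithAffinity()) are processed.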

!FILENAME pkg/scheduler/algorithm/priorities/interpod_affinity.go:193

    processNode := func(i int) {
        nodeInfo := nodeNameToInfo[allNodeNames[i]]
        if nodeInfo.Node() != nil {
            if hasAffinityConstraints || hasAntiAffinityConstraints {
                // We need to process all the nodes.
                for _, existingPod := range nodeInfo.Pods() {
                    if err := processPod(existingPod); err != nil {
                        pm.setError(err)
                    }
                }
            } else {
                for _, existingPod := range nodeInfo.PodsWithAffinity() {
                    if err := processPod(existingPod); err != nil {
                        pm.setError(err)
                    }
                }
            }
        }
    }

④ The final score fScore is computed as follows:

    // 10 * (node's accumulated weight - minimum accumulated weight) / (maximum accumulated weight - minimum accumulated weight)
    fScore = float64(schedulerapi.MaxPriority) * ((pm.counts[node.Name] - minCount) / (maxCount - minCount))

    const (
        // MaxPriority defines the max priority value.
        MaxPriority = 10
    )
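
A worked example helps to see what the normalization does with negative totals. The sketch below reuses the formula as quoted; `normalize` and `maxPriority` are hypothetical names, and the weight totals are made-up inputs. As in the scheduler code, minCount and maxCount start at zero, so zero always lies inside the range being normalized.

```go
package main

import "fmt"

const maxPriority = 10 // same value as MaxPriority shown above

// normalize maps the accumulated weights onto 0..maxPriority and
// truncates each result to an int, like the quoted code.
func normalize(counts map[string]float64, names []string) map[string]int {
	var maxCount, minCount float64
	for _, n := range names {
		if counts[n] > maxCount {
			maxCount = counts[n]
		}
		if counts[n] < minCount {
			minCount = counts[n]
		}
	}
	scores := make(map[string]int, len(names))
	for _, n := range names {
		fScore := float64(0)
		if maxCount-minCount > 0 {
			fScore = float64(maxPriority) * ((counts[n] - minCount) / (maxCount - minCount))
		}
		scores[n] = int(fScore)
	}
	return scores
}

func main() {
	names := []string{"n1", "n2", "n3"}
	counts := map[string]float64{"n1": 2, "n2": 5, "n3": -3}
	scores := normalize(counts, names)
	for _, n := range names {
		fmt.Printf("%s: %d\n", n, scores[n])
	}
}
```

With these inputs the range is 5 - (-3) = 8, so n1 gets 10 * 5 / 8 = 6.25, truncated to 6, n2 gets 10, and n3 gets 0.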

Service affinity

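The default scheduler code does not register this predicate; only its implementation exists. I could not find usage examples even through Google or Baidu, so configuration usage is not covered here and only the source is analyzed in detail below.

The use case, translated from the code comment: if the first pod of a service is scheduled onto nodes labeled "region=foo", all subsequent pods of that service are scheduled onto nodes labeled "region=foo" as well.

Service affinity predicate checkServiceAffinity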

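NewServiceAffinityPredicate() creates a ServiceAffinity object and returns the two funcs the predicate needs:

  • affinity.checkServiceAffinity: based on the predicate metadata, checks whether a node satisfies service affinity for the pod being scheduled.

  • affinity.serviceAffinityMetadataProducer: based on the pod in the predicate metadata, fetches the services and the list of pods in the same namespace, for use during the affinity check.

Both funcs are described in detail below.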

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:955

    func NewServiceAffinityPredicate(podLister algorithm.PodLister, serviceLister algorithm.ServiceLister, nodeInfo NodeInfo, labels []string) (algorithm.FitPredicate, PredicateMetadataProducer) {
        affinity := &ServiceAffinity{
            podLister:     podLister,
            serviceLister: serviceLister,
            nodeInfo:      nodeInfo,
            labels:        labels,
        }
        return affinity.checkServiceAffinity, affinity.serviceAffinityMetadataProducer
    }

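affinity.serviceAffinityMetadataProducer(). Input: predicateMetadata. Output: services and pods, stored back on the metadata.

  1. Look up the services based on the pod in the predicate metadata.
  2. Fetch all pods whose labels match the labels of that pod, then keep only the pods in the same namespace.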

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:934

    func (s *ServiceAffinity) serviceAffinityMetadataProducer(pm *predicateMetadata) {
        if pm.pod == nil {
            klog.Errorf("Cannot precompute service affinity, a pod is required to calculate service affinity.")
            return
        }
        pm.serviceAffinityInUse = true
        var errSvc, errList error
        // 1. Look up the services based on the pod in the predicate metadata.
        pm.serviceAffinityMatchingPodServices, errSvc = s.serviceLister.GetPodServices(pm.pod)
        // 2. Fetch all pods matching the labels of that pod.
        selector := CreateSelectorFromLabels(pm.pod.Labels)
        allMatches, errList := s.podLister.List(selector)
        // In the future maybe we will return them as part of the function.
        if errSvc != nil || errList != nil {
            klog.Errorf("Some Error were found while precomputing svc affinity: \nservices:%v , \npods:%v", errSvc, errList)
        }
        // 3. Keep only the pods in the same namespace.
        pm.serviceAffinityMatchingPodList = FilterPodsByNamespace(allMatches, pm.pod.Namespace)
    }

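affinity.checkServiceAffinity() uses the precomputed metadata to check whether a node satisfies service affinity for the pod being scheduled.

The final set of labels to check:

Final affinityLabels = (A ∩ B) + (B ∩ C), which is then matched against node.Labels (∩ denotes intersection; the second part only fills in keys that the first part left unset)

  • A: the NodeSelector of the pod being scheduled
  • B: the configured service affinity labels (s.labels)
  • C: the labels of the selected affinity target node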

!FILENAME pkg/scheduler/algorithm/predicates/predicates.go:992

    func (s *ServiceAffinity) checkServiceAffinity(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
        var services []*v1.Service
        var pods []*v1.Pod
        if pm, ok := meta.(*predicateMetadata); ok && (pm.serviceAffinityMatchingPodList != nil || pm.serviceAffinityMatchingPodServices != nil) {
            services = pm.serviceAffinityMatchingPodServices
            pods = pm.serviceAffinityMatchingPodList
        } else {
            // Make the predicate resilient in case metadata is missing.
            pm = &predicateMetadata{pod: pod}
            s.serviceAffinityMetadataProducer(pm)
            pods, services = pm.serviceAffinityMatchingPodList, pm.serviceAffinityMatchingPodServices
        }
        // Drop pods that claim to run on this node (nodeInfo) but whose podKey matches none of the node's pods. ①
        filteredPods := nodeInfo.FilterOutPods(pods)
        node := nodeInfo.Node()
        if node == nil {
            return false, nil, fmt.Errorf("node not found")
        }
        // affinityLabels = (A ∩ B)
        // A: NodeSelector of the pod being scheduled  B: the configured affinity labels ②
        affinityLabels := FindLabelsInSet(s.labels, labels.Set(pod.Spec.NodeSelector))
        // Step 1: If we don't have all constraints, introspect nodes to find the missing constraints.
        if len(s.labels) > len(affinityLabels) {
            if len(services) > 0 {
                if len(filteredPods) > 0 {
                    // The "selected affinity node":
                    // fetch the node that runs the first of the filteredPods.
                    nodeWithAffinityLabels, err := s.nodeInfo.GetNodeInfo(filteredPods[0].Spec.NodeName)
                    if err != nil {
                        return false, nil, err
                    }
                    // Input: the intersection labels, the service affinity labels, the labels of the selected affinity node.
                    // affinityLabels = affinityLabels + (B ∩ C)
                    // B: service affinity labels  C: labels of the selected affinity node ③
                    AddUnsetLabelsToMap(affinityLabels, s.labels, labels.Set(nodeWithAffinityLabels.Labels))
                }
            }
        }
        // Final match: affinityLabels against the labels of the node under test. ④
        if CreateSelectorFromLabels(affinityLabels).Matches(labels.Set(node.Labels)) {
            return true, nil, nil
        }
        return false, []algorithm.PredicateFailureReason{ErrServiceAffinityViolated}, nil
    }

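FilterOutPods() drops the pods whose Spec.NodeName points at this node (nodeInfo) but whose podKey equals none of the pods held by the node. In other words: filteredPods = pods not on this node + pods on this node with a matching podKey.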

!FILENAME pkg/scheduler/cache/node_info.go:656

    func (n *NodeInfo) FilterOutPods(pods []*v1.Pod) []*v1.Pod {
        // Fetch the node object.
        node := n.Node()
        if node == nil {
            return pods
        }
        filtered := make([]*v1.Pod, 0, len(pods))
        for _, p := range pods {
            // If the pod's Spec.NodeName differs from this node's name (the pod is not on this node), keep it.
            if p.Spec.NodeName != node.Name {
                filtered = append(filtered, p)
                continue
            }
            // The pod is assigned to this node: compute its podKey (pod.UID),
            // walk the pods held by this node and compare each podKey with it;
            // keep the pod only if an equal key is found.
            podKey, _ := GetPodKey(p)
            for _, np := range n.Pods() {
                npodkey, _ := GetPodKey(np)
                if npodkey == podKey {
                    filtered = append(filtered, p)
                    break
                }
            }
        }
        return filtered
    }
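
The filtering rule can be tried out on toy data. This is a minimal sketch with hypothetical names (`pod`, `filterOutPods`), where a pod is reduced to its UID and node name. Note one deliberate difference: it uses a set lookup in place of the nested loop of the original, which yields the same result as long as pod keys are unique.

```go
package main

import "fmt"

type pod struct {
	uid      string
	nodeName string
}

// filterOutPods keeps pods assigned to other nodes, plus pods assigned to
// this node that the node's cached pod list actually contains.
func filterOutPods(nodeName string, cached []pod, pods []pod) []pod {
	known := make(map[string]bool, len(cached))
	for _, p := range cached {
		known[p.uid] = true
	}
	filtered := make([]pod, 0, len(pods))
	for _, p := range pods {
		if p.nodeName != nodeName || known[p.uid] {
			filtered = append(filtered, p)
		}
	}
	return filtered
}

func main() {
	cached := []pod{{"a", "n1"}} // what the NodeInfo for n1 currently holds
	pods := []pod{
		{"a", "n1"}, // on n1 and known to the cache: kept
		{"b", "n1"}, // claims n1 but unknown to the cache: dropped
		{"c", "n2"}, // on another node: kept
	}
	for _, p := range filterOutPods("n1", cached, pods) {
		fmt.Println(p.uid, p.nodeName)
	}
}
```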

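FindLabelsInSet(). Parameter 1: (B) the configured affinity labels. Parameter 2: (A) the NodeSelector of the pod being scheduled. It finds the affinity labels that are present in the NodeSelector and returns the intersection of the two (A ∩ B).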

!FILENAME pkg/scheduler/algorithm/predicates/utils.go:26

    func FindLabelsInSet(labelsToKeep []string, selector labels.Set) map[string]string {
        aL := make(map[string]string)
        for _, l := range labelsToKeep {
            if selector.Has(l) {
                aL[l] = selector.Get(l)
            }
        }
        return aL
    }

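AddUnsetLabelsToMap(). Parameter 1: (N) the intersection labels computed by FindLabelsInSet(). Parameter 2: (B) the configured affinity labels. Parameter 3: (C) the labels of the selected affinity node. It finds the affinity labels that are present on the selected affinity node and stores that intersection into N for every key N does not hold yet: (B ∩ C) => N.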

!FILENAME pkg/scheduler/algorithm/predicates/utils.go:37

    // Input: the intersection labels, the service affinity labels, the labels of the selected affinity node.
    // Fills in: the intersection (B ∩ C). B: service affinity labels  C: labels of the selected affinity node.
    func AddUnsetLabelsToMap(aL map[string]string, labelsToAdd []string, labelSet labels.Set) {
        for _, l := range labelsToAdd {
            // Already set: leave it untouched.
            if _, exists := aL[l]; exists {
                continue
            }
            // Otherwise take the value from the intersection C ∩ B.
            if labelSet.Has(l) {
                aL[l] = labelSet.Get(l)
            }
        }
    }

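CreateSelectorFromLabels() returns a labels.Selector object built from the label map (a selector that matches everything when the map is empty); its Matches() method performs the final check.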

!FILENAME pkg/scheduler/algorithm/predicates/utils.go:62

    func CreateSelectorFromLabels(aL map[string]string) labels.Selector {
        if aL == nil || len(aL) == 0 {
            return labels.Everything()
        }
        return labels.Set(aL).AsSelector()
    }
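
Putting the three helpers together, the whole label computation of the service affinity check can be followed on a small example. The sketch below uses plain maps and hypothetical lower-case names (`findLabelsInSet`, `addUnsetLabels`, `matches`); the label keys and values are invented, and `matches` only models the equality selector produced from a label map.

```go
package main

import "fmt"

// findLabelsInSet keeps the wanted keys that the selector defines (A ∩ B).
func findLabelsInSet(keys []string, selector map[string]string) map[string]string {
	out := map[string]string{}
	for _, k := range keys {
		if v, ok := selector[k]; ok {
			out[k] = v
		}
	}
	return out
}

// addUnsetLabels fills in wanted keys that are still missing, taking the
// values from labelSet (B ∩ C); keys that are already set are not changed.
func addUnsetLabels(out map[string]string, keys []string, labelSet map[string]string) {
	for _, k := range keys {
		if _, exists := out[k]; exists {
			continue
		}
		if v, ok := labelSet[k]; ok {
			out[k] = v
		}
	}
}

// matches reports whether labels carries every required key/value pair.
// An empty requirement matches everything.
func matches(required, labels map[string]string) bool {
	for k, v := range required {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	affinityKeys := []string{"region", "zone"}                         // B
	nodeSelector := map[string]string{"region": "r1"}                  // A
	firstPodNode := map[string]string{"region": "r2", "zone": "z1"}    // C

	final := findLabelsInSet(affinityKeys, nodeSelector)
	addUnsetLabels(final, affinityKeys, firstPodNode)
	fmt.Println(final["region"], final["zone"]) // region comes from A, zone from C

	fmt.Println(matches(final, map[string]string{"region": "r1", "zone": "z1"}))
	fmt.Println(matches(final, map[string]string{"region": "r2", "zone": "z1"}))
}
```

The example shows the precedence: a key pinned by the pod's NodeSelector wins, and the node of the service's first pod only supplies the keys that are still open.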

End