Cluster Metrics Extension

Dependency

To use the Cluster Metrics Extension, add the following dependency to your project:

    <!-- Maven -->
    <dependency>
      <groupId>com.typesafe.akka</groupId>
      <artifactId>akka-cluster-metrics_2.12</artifactId>
      <version>2.5.22</version>
    </dependency>

    // Gradle
    dependencies {
      compile group: 'com.typesafe.akka', name: 'akka-cluster-metrics_2.12', version: '2.5.22'
    }

    // sbt
    libraryDependencies += "com.typesafe.akka" %% "akka-cluster-metrics" % "2.5.22"

Then add the following configuration to application.conf:

    akka.extensions = [ "akka.cluster.metrics.ClusterMetricsExtension" ]

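Introduction

With the help of the Cluster Metrics Extension, the member nodes of a cluster can collect system health metrics and publish them to other cluster nodes and to subscribers registered on the system event bus.

Cluster metrics information is primarily used by load-balancing routers. It can also be used to implement advanced metrics-based node life cycles, such as "node let-it-crash" when CPU steal time becomes excessive.

If that feature is enabled, cluster members with status WeaklyUp take part in cluster metrics collection and dissemination.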

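Metrics Collector

Metrics collection is delegated to an implementation of akka.cluster.metrics.MetricsCollector.

Different collector implementations provide different subsets of the metrics published to the cluster. Certain message routing and let-it-crash functions may not work when Sigar is not provisioned.

The Cluster Metrics Extension comes with two built-in collector implementations: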

  1. akka.cluster.metrics.SigarMetricsCollector, which requires Sigar provisioning and is richer and more precise
  2. akka.cluster.metrics.JmxMetricsCollector, which is used as a fallback and is less rich and less precise
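To give a feel for what "plain JMX" offers, the following minimal sketch reads heap usage, system load average and processor count from the standard JMX MBeans of the JDK. The class JmxSnapshotSketch is a hypothetical illustration that uses only java.lang.management; it is not the code of JmxMetricsCollector, and which values the built-in collector actually reads is not stated here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper (not part of Akka): one sample of values exposed by plain JMX. */
final class JmxSnapshotSketch {
    final long heapUsed;      // bytes currently used on the JVM heap
    final long heapCommitted; // bytes guaranteed to be available to the JVM
    final long heapMax;       // maximum heap size in bytes, or -1 if undefined
    final double loadAverage; // 1-minute system load average, negative if unavailable
    final int processors;     // number of processors visible to the JVM

    private JmxSnapshotSketch(
            long heapUsed, long heapCommitted, long heapMax, double loadAverage, int processors) {
        this.heapUsed = heapUsed;
        this.heapCommitted = heapCommitted;
        this.heapMax = heapMax;
        this.loadAverage = loadAverage;
        this.processors = processors;
    }

    /** Reads the current values from the platform MBeans. */
    static JmxSnapshotSketch sample() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return new JmxSnapshotSketch(
                heap.getUsed(),
                heap.getCommitted(),
                heap.getMax(),
                os.getSystemLoadAverage(),
                os.getAvailableProcessors());
    }

    public static void main(String[] args) {
        JmxSnapshotSketch s = sample();
        System.out.println("heap used (bytes): " + s.heapUsed);
        System.out.println("heap max (bytes):  " + s.heapMax);
        System.out.println("load average:      " + s.loadAverage);
        System.out.println("processors:        " + s.processors);
    }
}
```

Note that the load average is reported as a negative number on platforms where it is unavailable, which is one reason a JMX-only view is less precise.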

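You can also plug in your own metrics collector implementation.

By default, the metrics extension uses collector provider fallback and tries to load the collectors in this order: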

  1. the configured user-provided collector
  2. the built-in akka.cluster.metrics.SigarMetricsCollector
  3. and finally akka.cluster.metrics.JmxMetricsCollector
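The loading order above follows a simple "first one that works" pattern. The sketch below illustrates that pattern with plain Java; CollectorFallbackSketch and its firstAvailable method are hypothetical names, the collectors are stood in for by strings, and this is not the extension's actual loading code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/** Hypothetical illustration of provider fallback: try factories in priority order. */
final class CollectorFallbackSketch {

    /** Returns the result of the first factory that does not throw an exception. */
    static <T> T firstAvailable(List<Callable<T>> factories) {
        IllegalStateException failure = new IllegalStateException("no collector could be loaded");
        for (Callable<T> factory : factories) {
            try {
                return factory.call();
            } catch (Exception e) {
                // Remember why this candidate failed, then fall back to the next one.
                failure.addSuppressed(e);
            }
        }
        throw failure;
    }

    public static void main(String[] args) {
        List<Callable<String>> factories = new ArrayList<>();
        // 1) custom collector: assumed here to be not configured
        factories.add(() -> { throw new ClassNotFoundException("no custom collector configured"); });
        // 2) Sigar-based collector: assumed here to fail because the native library is missing
        factories.add(() -> { throw new IllegalStateException("Sigar native library not provisioned"); });
        // 3) JMX-based collector: the last resort
        factories.add(() -> "jmx");

        System.out.println("selected collector: " + firstAvailable(factories));
    }
}
```

In the reference configuration this behaviour is governed by the akka.cluster.metrics.collector.fallback setting; with fallback disabled, a failure of the configured custom provider is not masked by the built-in collectors.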

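Metrics Events

The metrics extension periodically publishes the current snapshot of the cluster metrics to the node's system event bus.

The publication interval is controlled by the akka.cluster.metrics.collector.sample-interval setting.

The payload of the akka.cluster.metrics.ClusterMetricsChanged event contains the latest metrics of the node itself, together with the metrics gossip of other cluster member nodes that was received during the collector sample interval.

You can subscribe a metrics listener actor to these events in order to implement a custom node life cycle: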

    ClusterMetricsExtension.get(system).subscribe(metricsListenerActor);
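As an illustration of what a custom node life cycle might decide on, here is a minimal sketch of a "let it crash on excessive CPU steal time" policy. StealTimePolicySketch, its threshold and its "consecutive samples" rule are all assumptions made for this example; how the steal value is obtained from the published metrics is deliberately left out.

```java
/**
 * Hypothetical policy (not part of Akka): signal shutdown after several
 * consecutive samples with excessive CPU steal time.
 */
final class StealTimePolicySketch {
    private final double threshold;    // steal fraction in [0, 1] regarded as excessive
    private final int requiredInARow;  // consecutive excessive samples before acting
    private int excessiveInARow = 0;

    StealTimePolicySketch(double threshold, int requiredInARow) {
        this.threshold = threshold;
        this.requiredInARow = requiredInARow;
    }

    /** Records one sample and returns true once the node should be taken down. */
    boolean record(double stealFraction) {
        if (stealFraction > threshold) {
            excessiveInARow++;
        } else {
            excessiveInARow = 0; // a healthy sample resets the streak
        }
        return excessiveInARow >= requiredInARow;
    }

    public static void main(String[] args) {
        StealTimePolicySketch policy = new StealTimePolicySketch(0.2, 3);
        double[] samples = {0.05, 0.30, 0.35, 0.40};
        for (double sample : samples) {
            System.out.println("steal=" + sample + " shutdown=" + policy.record(sample));
        }
    }
}
```

A listener actor could call record(...) from its ClusterMetricsChanged handler and stop the node when it returns true. Requiring a streak rather than a single sample is a design choice that avoids reacting to short spikes.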

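Hyperic Sigar Provisioning

Both user-provided and built-in metrics collectors can optionally use Hyperic Sigar to obtain a wider and more accurate range of metrics than can be retrieved from ordinary JMX MBeans.

Sigar uses a native O/S library and therefore requires library provisioning, that is, deployment, extraction and loading of the O/S native library into the JVM at runtime.

Users can provision the Sigar classes and the native library in one of the following ways: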

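  1. Use the Kamon sigar-loader as a project dependency of the user project. The metrics extension will extract and load the Sigar library on demand with the help of the Kamon sigar provisioner.
  2. Use the Kamon sigar-loader as a Java agent: java -javaagent:/path/to/sigar-loader.jar. The Kamon sigar loader agent will extract and load the Sigar library during JVM start.
  3. Place sigar.jar on the classpath and the Sigar native library for the O/S on the java.library.path. In this case the user must manage both the project dependency and the library deployment manually.
  • Warning: When using the Kamon sigar-loader and running multiple instances of the same application on the same host, you must make sure that the Sigar library is extracted to a unique per-instance directory. You can control the extract directory with the akka.cluster.metrics.native-library-extract-folder configuration setting.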
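One way to obtain a unique per-instance directory is to derive it from an instance identifier and hand it to the configuration before the actor system starts. The sketch below assumes that the application loads its configuration in the default way, so that a JVM system property overrides the value from the configuration files; ExtractFolderSketch and the "instance-1" identifier are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Akka): builds a per-instance Sigar extract folder. */
final class ExtractFolderSketch {

    /** Returns baseDir/native/instanceId, so each instance extracts into its own folder. */
    static Path extractFolder(String baseDir, String instanceId) {
        return Paths.get(baseDir, "native", instanceId);
    }

    public static void main(String[] args) {
        // Assumption: the instance id is passed as the first program argument.
        String instanceId = args.length > 0 ? args[0] : "instance-1";
        Path folder = extractFolder(System.getProperty("user.dir"), instanceId);

        // Assumption: set before the ActorSystem is created and before the
        // configuration is first loaded, so the override is picked up.
        System.setProperty(
                "akka.cluster.metrics.native-library-extract-folder", folder.toString());
        System.out.println("Sigar extract folder: " + folder);
    }
}
```

The same effect can be achieved by setting the property on the command line with -D, or by giving each instance its own working directory.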

To use Sigar, add the following dependency to the user project:

    <!-- Maven -->
    <dependency>
      <groupId>io.kamon</groupId>
      <artifactId>sigar-loader</artifactId>
      <version>1.6.6-rev002</version>
    </dependency>

    // Gradle
    dependencies {
      compile group: 'io.kamon', name: 'sigar-loader', version: '1.6.6-rev002'
    }

    // sbt
    libraryDependencies += "io.kamon" % "sigar-loader" % "1.6.6-rev002"

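You can download the Kamon sigar-loader artifact from Maven Central.

Adaptive Load Balancing

AdaptiveLoadBalancingPool / AdaptiveLoadBalancingGroup performs load balancing of messages to cluster nodes based on the cluster metrics data. It selects routees at random, with probabilities derived from the remaining capacity of the corresponding node. It can be configured to use a specific MetricsSelector to produce the probabilities, a.k.a. weights: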

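  • heap / HeapMetricsSelector - Used and max JVM heap memory. Weights based on remaining heap capacity: (max - used) / max
  • load / SystemLoadAverageMetricsSelector - System load average for the past 1 minute; the corresponding value is shown by top on Linux systems. If the system load average approaches the number of CPUs/cores, the system is possibly nearing a bottleneck. Weights based on remaining load capacity: 1 - (load / processors)
  • cpu / CpuMetricsSelector - CPU utilization in percentage, the sum of User + Sys + Nice + Wait. Weights based on remaining CPU capacity: 1 - utilization
  • mix / MixMetricsSelector - Combines heap, cpu and load. Weights based on the mean of the remaining capacities of the combined selectors.
  • Any custom implementation of akka.cluster.metrics.MetricsSelector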
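The capacity formulas in the list above, and the idea of picking a node with probability proportional to its remaining capacity, can be sketched in a few lines of plain Java. CapacitySketch and its method names are hypothetical, CPU utilization is taken as a fraction in [0, 1], and the pick method is a generic weighted random choice rather than the router's actual implementation.

```java
/** Hypothetical illustration of selector capacities and capacity-weighted selection. */
final class CapacitySketch {

    /** heap: (max - used) / max */
    static double heapCapacity(long used, long max) {
        return (double) (max - used) / max;
    }

    /** load: 1 - (load / processors) */
    static double loadCapacity(double load, int processors) {
        return 1.0 - load / processors;
    }

    /** cpu: 1 - utilization, with utilization given as a fraction in [0, 1] */
    static double cpuCapacity(double utilization) {
        return 1.0 - utilization;
    }

    /** mix: mean of the capacities of the combined selectors */
    static double mixCapacity(double... capacities) {
        double sum = 0.0;
        for (double c : capacities) {
            sum += c;
        }
        return sum / capacities.length;
    }

    /**
     * Picks index i with probability capacities[i] / sum(capacities).
     * roll must be in [0, 1); negative capacities are treated as 0.
     * If every capacity is 0, the last index is returned.
     */
    static int pick(double[] capacities, double roll) {
        double total = 0.0;
        for (double c : capacities) {
            total += Math.max(0.0, c);
        }
        double target = roll * total;
        double cumulative = 0.0;
        for (int i = 0; i < capacities.length; i++) {
            cumulative += Math.max(0.0, capacities[i]);
            if (target < cumulative) {
                return i;
            }
        }
        return capacities.length - 1;
    }

    public static void main(String[] args) {
        double heap = heapCapacity(256L, 1024L);
        double load = loadCapacity(1.0, 4);
        double cpu = cpuCapacity(0.25);
        System.out.println("mix capacity: " + mixCapacity(heap, load, cpu));

        double[] nodeCapacities = {0.75, 0.25};
        System.out.println("picked node index: " + pick(nodeCapacities, Math.random()));
    }
}
```

With capacities 0.75 and 0.25, the first node receives roughly three out of four messages over time.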

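The collected metrics values are smoothed with an exponentially weighted moving average. In the cluster configuration you can adjust how quickly past data is decayed compared to new data.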
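The smoothing can be sketched as follows. The weight formula 1 - 0.5 ^ (collect-interval / half-life) is the one given in the comments of the reference configuration (moving-average-half-life); the update rule is the standard exponentially weighted moving average. EwmaSketch and its method names are hypothetical, and the sample values are made up.

```java
/** Hypothetical illustration of half-life based exponential smoothing. */
final class EwmaSketch {

    /** Weight of a new sample: 1 - 0.5 ^ (collectInterval / halfLife), same time unit for both. */
    static double alpha(double collectIntervalSeconds, double halfLifeSeconds) {
        return 1.0 - Math.pow(0.5, collectIntervalSeconds / halfLifeSeconds);
    }

    /** Standard EWMA step: blend the new sample into the previous smoothed value. */
    static double update(double smoothed, double sample, double alpha) {
        return alpha * sample + (1.0 - alpha) * smoothed;
    }

    public static void main(String[] args) {
        // Defaults from the reference configuration: sample-interval = 3s, half-life = 12s.
        double a = alpha(3.0, 12.0);
        System.out.println("alpha = " + a);

        double smoothed = 0.20;                        // made-up starting CPU utilization
        double[] samples = {0.90, 0.90, 0.90, 0.90};   // made-up sudden load
        for (double sample : samples) {
            smoothed = update(smoothed, sample, a);
            System.out.println("smoothed = " + smoothed);
        }
    }
}
```

A shorter half-life gives a larger weight to new samples, so the routers react faster to load changes at the cost of more jitter.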

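Let's take a look at this router in action. What could be more demanding than calculating factorials?

The backend worker that performs the factorial calculation: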

    import java.math.BigInteger;
    import java.util.concurrent.CompletableFuture;

    import akka.actor.AbstractActor;

    import static akka.pattern.Patterns.pipe;

    // FactorialResult (not shown) is the reply message carrying n and its factorial.
    public class FactorialBackend extends AbstractActor {

      @Override
      public Receive createReceive() {
        return receiveBuilder()
            .match(
                Integer.class,
                n -> {
                  // Compute asynchronously, then pipe the result back to the sender.
                  CompletableFuture<FactorialResult> result =
                      CompletableFuture.supplyAsync(() -> factorial(n))
                          .thenApply((factorial) -> new FactorialResult(n, factorial));

                  pipe(result, getContext().dispatcher()).to(getSender());
                })
            .build();
      }

      // Iterative factorial; BigInteger avoids overflow for large n.
      BigInteger factorial(int n) {
        BigInteger acc = BigInteger.ONE;
        for (int i = 1; i <= n; ++i) {
          acc = acc.multiply(BigInteger.valueOf(i));
        }
        return acc;
      }
    }

The frontend that receives user jobs and delegates them to the backends via the router:

    import java.time.Duration;

    import akka.actor.AbstractActor;
    import akka.actor.ActorRef;
    import akka.actor.ReceiveTimeout;
    import akka.event.Logging;
    import akka.event.LoggingAdapter;
    import akka.routing.FromConfig;

    public class FactorialFrontend extends AbstractActor {
      final int upToN;
      final boolean repeat;

      LoggingAdapter log = Logging.getLogger(getContext().getSystem(), this);

      // The router is created from the deployment configuration for this actor path.
      ActorRef backend =
          getContext().actorOf(FromConfig.getInstance().props(), "factorialBackendRouter");

      public FactorialFrontend(int upToN, boolean repeat) {
        this.upToN = upToN;
        this.repeat = repeat;
      }

      @Override
      public void preStart() {
        sendJobs();
        getContext().setReceiveTimeout(Duration.ofSeconds(10));
      }

      @Override
      public Receive createReceive() {
        return receiveBuilder()
            .match(
                FactorialResult.class,
                result -> {
                  // Only the result for the largest n ends the batch.
                  if (result.n == upToN) {
                    log.debug("{}! = {}", result.n, result.factorial);
                    if (repeat) sendJobs();
                    else getContext().stop(getSelf());
                  }
                })
            .match(
                ReceiveTimeout.class,
                x -> {
                  // No result within the timeout: send the batch again.
                  log.info("Timeout");
                  sendJobs();
                })
            .build();
      }

      void sendJobs() {
        log.info("Starting batch of factorials up to [{}]", upToN);
        for (int n = 1; n <= upToN; n++) {
          backend.tell(n, getSelf());
        }
      }
    }

As you can see, the router is defined in the same way as other routers. In this case it is configured as follows:

    akka.actor.deployment {
      /factorialFrontend/factorialBackendRouter = {
        # Router type provided by metrics extension.
        router = cluster-metrics-adaptive-group
        # Router parameter specific for metrics extension.
        # metrics-selector = heap
        # metrics-selector = load
        # metrics-selector = cpu
        metrics-selector = mix
        #
        routees.paths = ["/user/factorialBackend"]
        cluster {
          enabled = on
          use-roles = ["backend"]
          allow-local-routees = off
        }
      }
    }

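Only the router type and the metrics-selector parameter are specific to this router; everything else works the same way as for other routers.

The same type of router can also be defined in code: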

    // Variant 1: group router that routes to already running backend actors.
    int totalInstances = 100;
    Iterable<String> routeesPaths = Arrays.asList("/user/factorialBackend", "");
    boolean allowLocalRoutees = true;
    Set<String> useRoles = new HashSet<>(Arrays.asList("backend"));
    ActorRef backend =
        getContext()
            .actorOf(
                new ClusterRouterGroup(
                        new AdaptiveLoadBalancingGroup(
                            HeapMetricsSelector.getInstance(), Collections.<String>emptyList()),
                        new ClusterRouterGroupSettings(
                            totalInstances, routeesPaths, allowLocalRoutees, useRoles))
                    .props(),
                "factorialBackendRouter2");

    // Variant 2 (a separate snippet, it reuses the variable names above):
    // pool router that creates and deploys the backend routees itself.
    int totalInstances = 100;
    int maxInstancesPerNode = 3;
    boolean allowLocalRoutees = false;
    Set<String> useRoles = new HashSet<>(Arrays.asList("backend"));
    ActorRef backend =
        getContext()
            .actorOf(
                new ClusterRouterPool(
                        new AdaptiveLoadBalancingPool(
                            SystemLoadAverageMetricsSelector.getInstance(), 0),
                        new ClusterRouterPoolSettings(
                            totalInstances, maxInstancesPerNode, allowLocalRoutees, useRoles))
                    .props(Props.create(FactorialBackend.class)),
                "factorialBackendRouter3");

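The easiest way to run the Adaptive Load Balancing example yourself is to download the code and tutorial of the Akka Cluster Sample with Java. It contains instructions on how to run the example, and its source code can also be found in the Akka Samples Repository.

Subscribe to Metrics Events

It is possible to subscribe to the metrics events directly in order to implement other functionality.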

    import akka.actor.AbstractActor;
    import akka.cluster.Cluster;
    import akka.cluster.ClusterEvent.CurrentClusterState;
    import akka.cluster.metrics.ClusterMetricsChanged;
    import akka.cluster.metrics.NodeMetrics;
    import akka.cluster.metrics.StandardMetrics;
    import akka.cluster.metrics.StandardMetrics.HeapMemory;
    import akka.cluster.metrics.StandardMetrics.Cpu;
    import akka.cluster.metrics.ClusterMetricsExtension;
    import akka.event.Logging;
    import akka.event.LoggingAdapter;

    public class MetricsListener extends AbstractActor {
      LoggingAdapter log = Logging.getLogger(getContext().getSystem(), this);

      Cluster cluster = Cluster.get(getContext().getSystem());

      ClusterMetricsExtension extension = ClusterMetricsExtension.get(getContext().getSystem());

      // Subscribe to ClusterMetricsEvent events.
      @Override
      public void preStart() {
        extension.subscribe(getSelf());
      }

      // Unsubscribe from ClusterMetricsEvent events.
      @Override
      public void postStop() {
        extension.unsubscribe(getSelf());
      }

      @Override
      public Receive createReceive() {
        return receiveBuilder()
            .match(
                ClusterMetricsChanged.class,
                clusterMetrics -> {
                  for (NodeMetrics nodeMetrics : clusterMetrics.getNodeMetrics()) {
                    if (nodeMetrics.address().equals(cluster.selfAddress())) {
                      logHeap(nodeMetrics);
                      logCpu(nodeMetrics);
                    }
                  }
                })
            .match(
                CurrentClusterState.class,
                message -> {
                  // Ignore.
                })
            .build();
      }

      void logHeap(NodeMetrics nodeMetrics) {
        HeapMemory heap = StandardMetrics.extractHeapMemory(nodeMetrics);
        if (heap != null) {
          log.info("Used heap: {} MB", ((double) heap.used()) / 1024 / 1024);
        }
      }

      void logCpu(NodeMetrics nodeMetrics) {
        Cpu cpu = StandardMetrics.extractCpu(nodeMetrics);
        if (cpu != null && cpu.systemLoadAverage().isDefined()) {
          log.info("Load: {} ({} processors)", cpu.systemLoadAverage().get(), cpu.processors());
        }
      }
    }

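Custom Metrics Collector

Metrics collection is delegated to an implementation of akka.cluster.metrics.MetricsCollector.

You can plug in your own metrics collector instead of the built-in akka.cluster.metrics.SigarMetricsCollector or akka.cluster.metrics.JmxMetricsCollector.

Look at those two implementations for inspiration.

The custom metrics collector implementation class must be specified in the akka.cluster.metrics.collector.provider configuration property.

Configuration

The Cluster Metrics Extension can be configured with the following properties: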

    ##############################################
    # Akka Cluster Metrics Reference Config File #
    ##############################################

    # This is the reference config file that contains all the default settings.
    # Make your edits in your application.conf in order to override these settings.

    # Sigar provisioning:
    #
    #  User can provision sigar classes and native library in one of the following ways:
    #
    #  1) Use https://github.com/kamon-io/sigar-loader Kamon sigar-loader as a project dependency for the user project.
    #  Metrics extension will extract and load sigar library on demand with help of Kamon sigar provisioner.
    #
    #  2) Use https://github.com/kamon-io/sigar-loader Kamon sigar-loader as java agent: `java -javaagent:/path/to/sigar-loader.jar`
    #  Kamon sigar loader agent will extract and load sigar library during JVM start.
    #
    #  3) Place `sigar.jar` on the `classpath` and sigar native library for the o/s on the `java.library.path`
    #  User is required to manage both project dependency and library deployment manually.

    # Cluster metrics extension.
    # Provides periodic statistics collection and publication throughout the cluster.
    akka.cluster.metrics {
      # Full path of dispatcher configuration key.
      # Use "" for default key `akka.actor.default-dispatcher`.
      dispatcher = ""
      # How long should any actor wait before starting the periodic tasks.
      periodic-tasks-initial-delay = 1s
      # Sigar native library extract location.
      # Use per-application-instance scoped location, such as program working directory.
      native-library-extract-folder = ${user.dir}"/native"
      # Metrics supervisor actor.
      supervisor {
        # Actor name. Example name space: /system/cluster-metrics
        name = "cluster-metrics"
        # Supervision strategy.
        strategy {
          #
          # FQCN of class providing `akka.actor.SupervisorStrategy`.
          # Must have a constructor with signature `<init>(com.typesafe.config.Config)`.
          # Default metrics strategy provider is a configurable extension of `OneForOneStrategy`.
          provider = "akka.cluster.metrics.ClusterMetricsStrategy"
          #
          # Configuration of the default strategy provider.
          # Replace with custom settings when overriding the provider.
          configuration = {
            # Log restart attempts.
            loggingEnabled = true
            # Child actor restart-on-failure window.
            withinTimeRange = 3s
            # Maximum number of restart attempts before child actor is stopped.
            maxNrOfRetries = 3
          }
        }
      }
      # Metrics collector actor.
      collector {
        # Enable or disable metrics collector for load-balancing nodes.
        # Metrics collection can also be controlled at runtime by sending control messages
        # to /system/cluster-metrics actor: `akka.cluster.metrics.{CollectionStartMessage,CollectionStopMessage}`
        enabled = on
        # FQCN of the metrics collector implementation.
        # It must implement `akka.cluster.metrics.MetricsCollector` and
        # have public constructor with akka.actor.ActorSystem parameter.
        # Will try to load in the following order of priority:
        # 1) configured custom collector 2) internal `SigarMetricsCollector` 3) internal `JmxMetricsCollector`
        provider = ""
        # Try all 3 available collector providers, or else fail on the configured custom collector provider.
        fallback = true
        # How often metrics are sampled on a node.
        # Shorter interval will collect the metrics more often.
        # Also controls frequency of the metrics publication to the node system event bus.
        sample-interval = 3s
        # How often a node publishes metrics information to the other nodes in the cluster.
        # Shorter interval will publish the metrics gossip more often.
        gossip-interval = 3s
        # How quickly the exponential weighting of past data is decayed compared to
        # new data. Set lower to increase the bias toward newer values.
        # The relevance of each data sample is halved for every passing half-life
        # duration, i.e. after 4 times the half-life, a data sample's relevance is
        # reduced to 6% of its original relevance. The initial relevance of a data
        # sample is given by 1 - 0.5 ^ (collect-interval / half-life).
        moving-average-half-life = 12s
      }
    }

    # Cluster metrics extension serializers and routers.
    akka.actor {
      # Protobuf serializer for remote cluster metrics messages.
      serializers {
        akka-cluster-metrics = "akka.cluster.metrics.protobuf.MessageSerializer"
      }
      # Interface binding for remote cluster metrics messages.
      serialization-bindings {
        "akka.cluster.metrics.ClusterMetricsMessage" = akka-cluster-metrics
        "akka.cluster.metrics.AdaptiveLoadBalancingPool" = akka-cluster-metrics
        "akka.cluster.metrics.MixMetricsSelector" = akka-cluster-metrics
        "akka.cluster.metrics.CpuMetricsSelector$" = akka-cluster-metrics
        "akka.cluster.metrics.HeapMetricsSelector$" = akka-cluster-metrics
        "akka.cluster.metrics.SystemLoadAverageMetricsSelector$" = akka-cluster-metrics
      }
      # Globally unique metrics extension serializer identifier.
      serialization-identifiers {
        "akka.cluster.metrics.protobuf.MessageSerializer" = 10
      }
      # Provide routing of messages based on cluster metrics.
      router.type-mapping {
        cluster-metrics-adaptive-pool = "akka.cluster.metrics.AdaptiveLoadBalancingPool"
        cluster-metrics-adaptive-group = "akka.cluster.metrics.AdaptiveLoadBalancingGroup"
      }
    }

Original English documentation: Cluster Metrics Extension.