Prometheus Monitoring

Prometheus is a widely used tool for monitoring and alerting across a wide variety of systems. Dask.distributed exposes scheduler and worker metrics in the Prometheus text-based format. Metrics are available at http://scheduler-address:8787/metrics.
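
As a rough illustration of how this endpoint could be consumed, the following Python sketch parses the text-based format into a dictionary. It is a minimal sketch and not part of Dask: `parse_metrics`, `fetch_metrics`, and the `SAMPLE` payload are hypothetical names and hand-written data, and the parser assumes each sample line ends with its value (no trailing timestamp).

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text-format output into {sample_name: float}.

    Assumes every sample line ends with its value, i.e. no timestamp follows.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(url="http://scheduler-address:8787/metrics", timeout=5):
    """Download and parse metrics; the default URL is a placeholder host."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


# Hand-written sample for illustration only, not captured from a real scheduler.
SAMPLE = """\
# HELP dask_scheduler_workers Number of workers connected.
# TYPE dask_scheduler_workers gauge
dask_scheduler_workers 4.0
# HELP dask_scheduler_clients Number of clients connected.
# TYPE dask_scheduler_clients gauge
dask_scheduler_clients 1.0
"""

metrics = parse_metrics(SAMPLE)
print(metrics["dask_scheduler_workers"])
```

Against a running scheduler, `fetch_metrics()` could be called with the real scheduler address in place of the placeholder host.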

The following metrics are available:

Metric name                                  Description                                     Scheduler  Worker
-------------------------------------------  ----------------------------------------------  ---------  ------
python_gc_objects_collected_total            Objects collected during gc.                    Yes        Yes
python_gc_objects_uncollectable_total        Uncollectable object found during GC.           Yes        Yes
python_gc_collections_total                  Number of times this generation was collected.  Yes        Yes
python_info                                  Python platform information.                    Yes        Yes
dask_scheduler_workers                       Number of workers connected.                    Yes
dask_scheduler_clients                       Number of clients connected.                    Yes
dask_scheduler_tasks                         Number of tasks at scheduler.                   Yes
dask_worker_tasks                            Number of tasks at worker.                                 Yes
dask_worker_connections                      Number of task connections to other workers.               Yes
dask_worker_threads                          Number of worker threads.                                  Yes
dask_worker_latency_seconds                  Latency of worker connection.                              Yes
dask_worker_tick_duration_median_seconds     Median tick duration at worker.                            Yes
dask_worker_task_duration_median_seconds     Median task runtime at worker.                             Yes
dask_worker_transfer_bandwidth_median_bytes  Bandwidth for transfer at worker in Bytes.                 Yes