Web Interface

Information about the current state of the network helps to track progress, identify performance issues, and debug failures.

Dask.distributed includes a web interface to help deliver this information over a normal web page in real time. This web interface is launched by default wherever the scheduler is launched if the scheduler machine has Bokeh installed (conda install bokeh -c bokeh).

These diagnostic pages are:

  • Main Scheduler pages at http://scheduler-address:8787. These pages, particularly the /status page, are the main pages that most people associate with Dask. They are served from a separate standalone Bokeh server application running in a separate process.

The available pages are http://scheduler-address:8787/<page>/ where <page> is one of

  • status: a stream of recently run tasks, progress bars, resource use
  • tasks: a larger stream of the last 100k tasks
  • workers: basic information about workers and their current load
  • health: basic health check, returns ok if service is running
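
As a rough illustration, the page addresses listed above can be assembled, and the health page polled, with only the standard library. The helper names dashboard_url and is_healthy are hypothetical, the URLs follow the no-trailing-slash form that appears in the scheduler log, and the check assumes (per the list above) that the health page answers with the text ok.

```python
from urllib.request import urlopen

# Page names taken from the list above.
PAGES = ("status", "tasks", "workers", "health")


def dashboard_url(host, page, port=8787):
    """Build the address of one diagnostic page (hypothetical helper)."""
    if page not in PAGES:
        raise ValueError(f"unknown page: {page!r}")
    return f"http://{host}:{port}/{page}"


def is_healthy(host, port=8787, timeout=2.0):
    """Return True if the health page responds with 'ok' (assumed body)."""
    try:
        with urlopen(dashboard_url(host, "health", port), timeout=timeout) as resp:
            return resp.read().decode().strip().lower() == "ok"
    except OSError:
        # Covers refused connections, timeouts and HTTP error statuses.
        return False
```

Calling is_healthy("scheduler-address") from a monitoring script gives a quick yes/no answer without opening a browser.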

Plots

Example Computation

The following plots show a trace of the following computation:

    from distributed import Client
    from time import sleep
    import random

    # Each function sleeps for up to 0.1 s to simulate work.
    def inc(x):
        sleep(random.random() / 10)
        return x + 1

    def dec(x):
        sleep(random.random() / 10)
        return x - 1

    def add(x, y):
        sleep(random.random() / 10)
        return x + y


    # Connect to a scheduler that is already running at this address.
    client = Client('127.0.0.1:8786')

    incs = client.map(inc, range(100))
    decs = client.map(dec, range(100))
    adds = client.map(add, incs, decs)
    total = client.submit(sum, adds)

    # Drop the intermediate futures so their results can be released.
    del incs, decs, adds
    total.result()

Progress

The interface shows the progress of the various computations as well as the exact number completed.

[Figure: Resources view of Dask web interface]

Each bar is assigned a color according to the function being run. Each bar has a few components. On the left, the lighter shade is the number of tasks that have both completed and have been released from memory. The darker shade to the right corresponds to the tasks that are completed and whose data still reside in memory. If errors occur, they appear as a black block to the right.

Typical computations may involve dozens of kinds of functions. We handle this visually with the following approaches:

  • Functions are ordered by the number of total tasks
  • The colors are assigned in a round-robin fashion from a standard palette
  • The progress bars shrink horizontally to make space for more functions
  • Only the largest functions (in terms of number of tasks) are displayed

[Figure: Progress bar plot of Dask web interface]

Counts of tasks processing, waiting for dependencies, etc. are displayed in the title bar.
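
The four approaches listed above can be sketched in a few lines. This is only an illustration of the idea, not the code the dashboard uses: the function name layout_progress_bars, the three-color palette and the max_bars cutoff are all assumptions made for the example.

```python
from itertools import cycle

# Hypothetical palette; the real dashboard uses its own standard palette.
PALETTE = ("#1f77b4", "#ff7f0e", "#2ca02c")


def layout_progress_bars(task_counts, max_bars=3, palette=PALETTE):
    """Pick which functions get a bar, and each bar's color and width."""
    # Order functions by their total number of tasks, largest first.
    ordered = sorted(task_counts.items(), key=lambda item: item[1], reverse=True)
    # Only the largest functions are displayed.
    shown = ordered[:max_bars]
    if not shown:
        return []
    # Bars shrink horizontally so that all of them fit side by side.
    width = 1.0 / len(shown)
    # Colors are handed out round-robin from the palette.
    colors = cycle(palette)
    return [
        {"function": name, "tasks": n, "color": next(colors), "width": width}
        for name, n in shown
    ]


counts = {"inc": 100, "dec": 100, "add": 100, "sum": 1}
bars = layout_progress_bars(counts)
```

With the task counts from the example computation, the single sum task is the one left out when only three bars are shown.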

Memory Use

The interface shows the relative memory use of each function with a horizontal bar sorted by function name.

[Figure: Memory use plot of Dask web interface]

The title shows the total number of bytes in use. Hovering over any bar tells you the specific function and how many bytes its results are actively taking up in memory. This does not count data that has been released.

Task Stream

The task stream plot shows when tasks complete on which workers. Worker cores are on the y-axis and time is on the x-axis. As a worker completes a task, its start and end times are recorded and a rectangle is added to this plot accordingly.

[Figure: Task stream plot of Dask web interface]

The colors signify the following:

  • Serialization (gray)
  • Communication between workers (red)
  • Disk I/O (orange)
  • Error (black)
  • Execution times (colored by task: purple, green, yellow, etc.)

If data transfer occurs between workers, a red bar appears preceding the task bar, showing the duration of the transfer. If an error occurs, then a black bar replaces the normal color. This plot shows the last 1000 tasks. It resets if there is a delay greater than 10 seconds.

For a full history of the last 100,000 tasks see the tasks/ page.
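
The "last 1000 tasks, reset after a 10 second delay" behavior can be pictured as a bounded buffer. The sketch below is an assumption-laden illustration rather than the dashboard's implementation: the class name TaskStreamBuffer is made up, and "delay" is interpreted here as the gap between the end of the most recently recorded task and the start of the next one.

```python
from collections import deque


class TaskStreamBuffer:
    """Keep the most recent task rectangles (hypothetical helper)."""

    def __init__(self, maxlen=1000, reset_gap=10.0):
        # A deque with maxlen silently drops the oldest record when full.
        self.records = deque(maxlen=maxlen)
        self.reset_gap = reset_gap

    def add(self, worker, key, start, stop):
        # Assumed rule: clear the plot if nothing happened for too long.
        if self.records and start - self.records[-1]["stop"] > self.reset_gap:
            self.records.clear()
        self.records.append(
            {"worker": worker, "key": key, "start": start, "stop": stop}
        )


stream = TaskStreamBuffer()
for i in range(1500):
    stream.add("worker-0", f"inc-{i}", start=i * 0.1, stop=i * 0.1 + 0.05)
```

After the loop only the latest 1000 records remain, which is why the tasks/ page is the place to look for a longer history.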

Resources

The resources plot shows the average CPU and memory use over time as well as average network traffic. More detailed information on a per-worker basis is available in the workers/ page.

[Figure: Resources view of Dask web interface]

Per-worker resources

The workers/ page shows per-worker resources, the main ones being CPU and memory use. Custom metrics can be registered and displayed in this page. Here is an example showing how to display GPU utilization and GPU memory use:

    import subprocess

    def nvidia_data(name):
        # Build a metric function that queries nvidia-smi for one field.
        def dask_function(dask_worker):
            cmd = 'nvidia-smi --query-gpu={} --format=csv,noheader'.format(name)
            result = subprocess.check_output(cmd.split())
            return result.strip().decode()
        return dask_function

    def register_metrics(dask_worker):
        # Attach one metric function per GPU field to this worker.
        for name in ['utilization.gpu', 'utilization.memory']:
            dask_worker.metrics[name] = nvidia_data(name)

    # Run the registration on every worker; assumes `client` is connected.
    client.run(register_metrics)
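
The same registration pattern can be tried without a GPU or a running cluster by using a stand-in object. In this sketch FakeWorker is hypothetical and only mimics the metrics dictionary that the example above writes into; the active_threads metric name is made up, and calling each metric function with the worker as its only argument is an assumption based on the signature used above.

```python
import threading


class FakeWorker:
    """Stand-in with only the attribute used here (not a Dask class)."""

    def __init__(self):
        self.metrics = {}


def active_threads(dask_worker):
    # A metric that needs no special hardware: threads in this process.
    return threading.active_count()


def register_metrics(dask_worker):
    # Same shape as the GPU example: metric name -> function of the worker.
    dask_worker.metrics["active_threads"] = active_threads


worker = FakeWorker()
register_metrics(worker)

# Evaluate every registered metric (assumed calling convention).
snapshot = {name: fn(worker) for name, fn in worker.metrics.items()}
print(snapshot)
```

Once the functions behave as expected locally, the same register_metrics function could be passed to client.run as in the example above.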

Connecting to Web Interface

Default

By default, dask-scheduler prints out the address of the web interface:

    INFO - Bokeh UI at: http://10.129.39.91:8787/status
    ...
    INFO - Starting Bokeh server on port 8787 with applications at paths ['/status', '/tasks']

The machine hosting the scheduler runs an HTTP server serving at that address.
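
When the scheduler output is captured to a file or a pipe, the address can be pulled out of it programmatically. The sketch below assumes the log line has exactly the "Bokeh UI at:" wording shown above; find_dashboard_url is a hypothetical helper name.

```python
import re

# Matches the address that follows "Bokeh UI at:" in the sample output.
_UI_LINE = re.compile(r"Bokeh UI at:\s*(\S+)")


def find_dashboard_url(log_lines):
    """Return the first web interface address found, or None."""
    for line in log_lines:
        match = _UI_LINE.search(line)
        if match:
            return match.group(1)
    return None


sample = [
    "INFO - Bokeh UI at: http://10.129.39.91:8787/status",
    "INFO - Starting Bokeh server on port 8787 with applications at paths ['/status', '/tasks']",
]
url = find_dashboard_url(sample)
```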

Troubleshooting

Some clusters restrict the ports that are visible to the outside world. These ports may include the default port for the web interface, 8787. There are a few ways to handle this:

  • Open port 8787 to the outside world. Often this involves asking your cluster administrator.
  • Use a different port that is publicly accessible using the --dashboard-address :8787 option on the dask-scheduler command, substituting the accessible port for 8787.
  • Use fancier techniques, like Port Forwarding

Running distributed on a remote machine can cause issues with viewing the web UI; this depends on the remote machine's network configuration.
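
Before choosing among these options, it can help to confirm whether the port is actually reachable from your machine. The following is a minimal sketch using only the standard library; port_is_open is a hypothetical helper, and a successful TCP connection shows only that something is listening, not that it is the Dask web interface.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, unresolvable host, or timed out.
        return False
```

For example, port_is_open("scheduler-address", 8787) returning False from your laptop while the scheduler is running suggests that the port is blocked somewhere in between.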

Port Forwarding

If you have SSH access then one way to gain access to a blocked port is through SSH port forwarding. A typical use case looks like the following:

    local$ ssh -L 8000:localhost:8787 user@remote
    remote$ dask-scheduler  # now, the web UI is visible at localhost:8000
    remote$ # continue to set up dask if needed -- add workers, etc

It is then possible to go to localhost:8000 and see the Dask web UI. This approach is not specific to dask.distributed; it can be used by any service that operates over a network, such as Jupyter notebooks. For example, we could forward port 8888 (the default Jupyter port) to port 8001 with ssh -L 8001:localhost:8888 user@remote.
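
If several ports need forwarding, the commands can be generated rather than typed by hand. This is a small sketch, assuming the ssh -L local:host:remote form used above; ssh_forward_command is a hypothetical helper and user@remote is the same placeholder login as in the text.

```python
import shlex


def ssh_forward_command(local_port, remote_port, target, remote_host="localhost"):
    """Build the argument list for an SSH local port forward."""
    return ["ssh", "-L", f"{local_port}:{remote_host}:{remote_port}", target]


# Forward the Dask web UI and a Jupyter notebook, as in the text above.
dask_cmd = ssh_forward_command(8000, 8787, "user@remote")
jupyter_cmd = ssh_forward_command(8001, 8888, "user@remote")

# shlex.join (Python 3.8+) renders a list as a shell-safe command line.
print(shlex.join(dask_cmd))
print(shlex.join(jupyter_cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems.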