Diagnostics (distributed)

The Dask distributed scheduler provides live feedback in two forms:

  • An interactive dashboard containing many plots and tables with live information
  • A progress bar suitable for interactive use in consoles or notebooks

Dashboard

If Bokeh is installed, the dashboard will start up automatically whenever the scheduler is created. For local use this happens when you create a client with no arguments:
    from dask.distributed import Client

    client = Client()  # start distributed scheduler locally. Launch dashboard
It is typically served at http://localhost:8787/status, but may be served elsewhere if this port is taken. The address of the dashboard will be displayed if you are in a Jupyter Notebook, or can be queried from client.scheduler_info()['services']. There are numerous pages with information about task runtimes, communication, statistical profiling, load balancing, memory use, and much more.
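For example, a minimal sketch of recovering the dashboard location from an existing client; scheduler_info()['services'] is the mechanism described above, and client.dashboard_link (available in recent distributed releases) is a convenient shortcut:

    from dask.distributed import Client

    client = Client()

    # Port mapping of the scheduler's services, e.g. {'dashboard': 8787}
    print(client.scheduler_info()['services'])

    # Full dashboard URL, e.g. http://localhost:8787/status
    print(client.dashboard_link)
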
  • Client([address, loop, timeout, …]): Connect to and submit computation to a Dask cluster

Capture diagnostics

  • get_task_stream([client, plot, filename]): Collect task stream within a context block
  • Client.profile(self[, key, start, stop, …]): Collect statistical profiling information about recent work
  • performance_report([filename]): Gather performance report
You can capture some of the same information that the dashboard presents for offline processing using the get_task_stream and Client.profile functions. These capture the start and stop time of every task and transfer, as well as the results of a statistical profiler.
    with get_task_stream(plot='save', filename="task-stream.html") as ts:
        x.compute()

    client.profile(filename="dask-profile.html")

    history = ts.data
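Because the captured history is a plain list of records, it can be post-processed offline. Below is a minimal sketch, assuming the record layout used by recent distributed releases, where each record carries a 'startstops' list of dicts with 'action', 'start', and 'stop' entries:

    import dask.array as da
    from dask.distributed import Client, get_task_stream

    client = Client()
    x = da.random.random((4_000, 4_000), chunks=(1_000, 1_000))

    with get_task_stream() as ts:
        x.sum().compute()

    # Total time spent computing (as opposed to transferring data),
    # summed across every task in the captured history
    compute_seconds = sum(
        ss["stop"] - ss["start"]
        for record in ts.data
        for ss in record["startstops"]
        if ss["action"] == "compute"
    )
    print(f"{len(ts.data)} tasks, {compute_seconds:.2f}s total compute time")
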
Additionally, Dask can save many diagnostics dashboards at once including the task stream, worker profiles, bandwidths, etc. with the performance_report context manager:
    from dask.distributed import performance_report

    with performance_report(filename="dask-report.html"):
        ## some dask computation
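A runnable sketch of the same pattern, assuming a local cluster; the filename and the dask.array workload are illustrative:

    import dask.array as da
    from dask.distributed import Client, performance_report

    client = Client()
    x = da.random.random((4_000, 4_000), chunks=(1_000, 1_000))

    with performance_report(filename="dask-report.html"):
        (x + x.T).mean().compute()  # any Dask computation works here

    # dask-report.html is now a standalone file viewable in any browser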

Progress bar

  • progress(*futures[, notebook, multi, complete]): Track progress of futures

The dask.distributed progress bar differs from the ProgressBar used for local diagnostics. The progress function takes a Dask object that is executing in the background:

    # Single machine progress bar
    from dask.diagnostics import ProgressBar

    with ProgressBar():
        x.compute()

    # Distributed scheduler ProgressBar

    from dask.distributed import Client, progress

    client = Client()  # use dask.distributed by default

    x = x.persist()  # start computation in the background
    progress(x)      # watch progress

    x.compute()  # convert to final result when done if desired
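
progress also accepts futures directly, such as those returned by client.map; a small sketch (the inc function is an illustrative placeholder):

    from dask.distributed import Client, progress

    client = Client()

    def inc(i):
        return i + 1

    futures = client.map(inc, range(1000))  # submit many tasks, get futures back
    progress(futures)                       # one bar tracking their completion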

External Documentation

More in-depth technical documentation about Dask’s distributed scheduler is available at https://distributed.dask.org/en/latest

API

  • dask.distributed.progress(*futures, notebook=None, multi=True, complete=True, **kwargs)
    Track progress of futures

This operates differently in the notebook and the console:

  • Notebook: This returns immediately, leaving an IPython widget on screen
  • Console: This blocks until the computation completes

Parameters:

  • futures: Futures
    A list of futures or keys to track
  • notebook: bool (optional)
    Running in the notebook or not (defaults to guess)
  • multi: bool (optional)
    Track different functions independently (defaults to True)
  • complete: bool (optional)
    Track all keys (True) or only keys that have not yet run (False) (defaults to True)
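
A small sketch exercising these parameters; the defaults shown match the documented values, and the squaring workload is illustrative:

    from dask.distributed import Client, progress

    client = Client()
    futures = client.map(lambda i: i ** 2, range(100))

    # One bar per function name, tracking all keys (the defaults)
    progress(futures, multi=True, complete=True)

    # A single combined bar covering only the keys that have not yet run
    progress(futures, multi=False, complete=False)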

Notes

In the notebook, the output of progress must be the last statement in the cell. Typically, this means calling progress at the end of a cell.

Examples

    >>> progress(futures)  # doctest: +SKIP
    [########################################] | 100% Completed | 1.7s
  • dask.distributed.get_task_stream(client=None, plot=False, filename='task-stream.html')
    Collect task stream within a context block

This provides diagnostic information about every task that was run during the time when this block was active.

This must be used as a context manager.

Parameters:

  • plot: boolean, str
    If true then also return a Bokeh figure. If plot == 'save' then save the figure to a file
  • filename: str (optional)
    The filename to save to if you set plot='save'

See also

  • Client.get_task_stream
    Function version of this context manager (a sketch of its use follows below)
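
The method form can also collect history retroactively rather than requiring a context block. A sketch, assuming the Client.get_task_stream signature in recent distributed releases, where start accepts a time offset string such as '60s':

    from dask.distributed import Client

    client = Client()
    # ... run some computations ...

    # Records for everything that ran during the last minute
    recent = client.get_task_stream(start="60s")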

Examples

    >>> with get_task_stream() as ts:
    ...     x.compute()
    >>> ts.data
    [...]

Get back a Bokeh figure and optionally save to a file

    >>> with get_task_stream(plot='save', filename='task-stream.html') as ts:
    ...     x.compute()
    >>> ts.figure
    <Bokeh Figure>

To share this file with others you may wish to upload and serve it online. A common way to do this is to upload the file as a gist, and then serve it on https://raw.githack.com:

    $ pip install gist
    $ gist task-stream.html
    https://gist.github.com/8a5b3c74b10b413f612bb5e250856ceb

You can then navigate to that site, click the “Raw” button to the right of the task-stream.html file, and then provide that URL to https://raw.githack.com. This process should provide a sharable link that others can use to see your task stream plot.