Quickstart

Install

    $ pip install dask distributed --upgrade

See the installation documentation for more information.

Set Up Dask.distributed the Easy Way

If you create a client without providing an address, it will start up a local scheduler and worker for you.

    >>> from dask.distributed import Client
    >>> client = Client()  # set up local cluster on your laptop
    >>> client
    <Client: scheduler="127.0.0.1:8786" processes=8 cores=8>
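The local setup can also be configured explicitly. The following is a minimal sketch, assuming that keyword arguments such as `processes`, `n_workers`, `threads_per_worker`, and `dashboard_address` are forwarded to the local cluster that `Client` creates; the values and the helper `inc` are illustrative only.

```python
from dask.distributed import Client


def inc(x):
    """Illustrative helper, not part of Dask."""
    return x + 1


def main():
    # processes=False runs the scheduler and workers as threads in this
    # process, which is convenient for small experiments.
    # dashboard_address=None disables the diagnostics dashboard.
    # All values here are illustrative assumptions.
    with Client(
        processes=False,
        n_workers=2,
        threads_per_worker=1,
        dashboard_address=None,
    ) as client:
        future = client.submit(inc, 41)
        return future.result()


if __name__ == "__main__":
    print(main())
```

Using the client as a context manager closes it, together with the local cluster it started, when the block exits.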

Set Up Dask.distributed the Hard Way

This allows dask.distributed to use multiple machines as workers.

Set up scheduler and worker processes on your local computer:

    $ dask-scheduler
    Scheduler started at 127.0.0.1:8786

    $ dask-worker 127.0.0.1:8786
    $ dask-worker 127.0.0.1:8786
    $ dask-worker 127.0.0.1:8786

Note

At least one dask-worker must be running after launching a scheduler.

Launch a Client and point it to the IP/port of the scheduler.

    >>> from dask.distributed import Client
    >>> client = Client('127.0.0.1:8786')
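Because at least one worker has to be connected before tasks can run, it can help to wait for workers right after connecting. The sketch below assumes a scheduler is already listening at the given address; the function name `connect_and_count_workers` and the ten-second timeout are hypothetical choices for illustration.

```python
from dask.distributed import Client


def connect_and_count_workers(address, timeout=10):
    """Connect to a running scheduler and report how many workers it has.

    Hypothetical helper: blocks until at least one worker has connected,
    or raises if none appears within `timeout` seconds.
    """
    with Client(address) as client:
        client.wait_for_workers(n_workers=1, timeout=timeout)
        # scheduler_info() returns a dict that includes a "workers" mapping.
        return len(client.scheduler_info()["workers"])


if __name__ == "__main__":
    # Address taken from the scheduler output shown above.
    print(connect_and_count_workers("127.0.0.1:8786"))
```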

See the setup documentation for advanced use.

Map and Submit Functions

Use the map and submit methods to launch computations on the cluster. The map/submit functions send the function and arguments to the remote workers for processing. They return Future objects that refer to remote data on the cluster. The Future is returned immediately while the computations run remotely in the background.

    >>> def square(x):
    ...     return x ** 2

    >>> def neg(x):
    ...     return -x

    >>> A = client.map(square, range(10))
    >>> B = client.map(neg, A)
    >>> total = client.submit(sum, B)
    >>> total.result()
    -285
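To make the "returns immediately" behaviour concrete, the following sketch submits a deliberately slow task, reads the future's status straight away, and passes the future as an argument to a second task. It runs on an in-process local cluster for convenience; `slow_square`, `add`, and the half-second sleep are illustrative assumptions.

```python
import time

from dask.distributed import Client


def slow_square(x):
    """Illustrative task that takes a noticeable amount of time."""
    time.sleep(0.5)
    return x ** 2


def add(a, b):
    return a + b


def main():
    with Client(
        processes=False,
        n_workers=1,
        threads_per_worker=2,
        dashboard_address=None,
    ) as client:
        # submit() hands back a Future without waiting for the task.
        future = client.submit(slow_square, 4)
        status_after_submit = future.status

        # A Future can be used as an argument to another task; the
        # dependency is resolved on the cluster.
        dependent = client.submit(add, future, 1)

        result = dependent.result()  # blocks until the chain has finished
        squared = future.result()
        return status_after_submit, future.status, squared, result


if __name__ == "__main__":
    print(main())
```

The status read right after `submit` depends on timing, so it should not be relied on; once `result()` has returned, the status is `finished`.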

Gather

The map/submit functions return Future objects, lightweight tokens that refer to results on the cluster. By default, the results of computations stay on the cluster.

    >>> total  # Function hasn't yet completed
    <Future: status: waiting, key: sum-58999c52e0fa35c7d7346c098f5085c7>

    >>> total  # Function completed, result ready on remote worker
    <Future: status: finished, key: sum-58999c52e0fa35c7d7346c098f5085c7>

Gather results to your local machine either with the Future.result method for a single future, or with the Client.gather method for many futures at once.

    >>> total.result()  # result for single future
    -285
    >>> client.gather(A)  # gather for many futures
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
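The two ways of collecting results can be compared side by side. This sketch, run on an in-process local cluster with illustrative settings, fetches the same futures one at a time with `result`, all together with `gather`, and as a nested list, on the assumption that `gather` mirrors the shape of the container it is given.

```python
from dask.distributed import Client


def square(x):
    return x ** 2


def main():
    with Client(
        processes=False,
        n_workers=1,
        threads_per_worker=2,
        dashboard_address=None,
    ) as client:
        futures = client.map(square, range(5))

        # One call to result() per future.
        one_by_one = [f.result() for f in futures]

        # A single gather() call for the whole list.
        all_at_once = client.gather(futures)

        # gather() also accepts nested containers of futures.
        nested = client.gather([futures[0], [futures[1], futures[2]]])

        return one_by_one, all_at_once, nested


if __name__ == "__main__":
    print(main())
```

For many futures, a single `gather` call is usually the more convenient choice, since it collects everything in one request rather than one call per future.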

Restart

When things go wrong, or when you want to reset the cluster state, call the restart method.

    >>> client.restart()

See the client documentation for advanced use.