Install Dask

You can install Dask with conda, with pip, or from source.

Conda

Dask is installed by default in Anaconda. You can update Dask using the conda command:

  conda install dask

This installs Dask and all common dependencies, including Pandas and NumPy. Dask packages are maintained both on the default channel and on conda-forge. Optionally, you can obtain a minimal Dask installation using the following command:

  conda install dask-core

This installs the minimal set of dependencies required to run Dask, similar to (but not exactly the same as) pip install dask below.

Pip

You can install everything required for most common uses of Dask (arrays, dataframes, …). This installs both Dask and dependencies like NumPy, Pandas, and so on that are necessary for different workloads. This is often the right choice for Dask users:

  pip install "dask[complete]" # Install everything

You can also install only the Dask library. Modules like dask.array, dask.dataframe, dask.delayed, or dask.distributed won’t work until you also install NumPy, Pandas, Toolz, or Tornado, respectively. This is common for downstream library maintainers:

  pip install dask # Install only core parts of dask

We also maintain other dependency sets for different subsets of functionality:

  pip install "dask[array]" # Install requirements for dask array
  pip install "dask[bag]" # Install requirements for dask bag
  pip install "dask[dataframe]" # Install requirements for dask dataframe
  pip install "dask[delayed]" # Install requirements for dask delayed
  pip install "dask[distributed]" # Install requirements for distributed dask

We have these options so that users of the lightweight core Dask scheduler aren’t required to download the more exotic dependencies of the collections (NumPy, Pandas, Tornado, etc.).
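Because a core-only install leaves some modules unusable, it can help to check which ones are ready in the current environment. The following is a minimal sketch, not part of Dask itself: the helper names `is_installed` and `usable_modules` are hypothetical, and the module-to-library mapping is taken from the description of the core install above.

```python
from importlib.util import find_spec

# Mapping taken from the text above: each Dask module and the library
# it needs before it will work. The import names are the usual ones.
REQUIRED_BY_MODULE = {
    "dask.array": "numpy",
    "dask.dataframe": "pandas",
    "dask.delayed": "toolz",
    "dask.distributed": "tornado",
}


def is_installed(package):
    """Return True if a top-level package can be located without importing it."""
    return find_spec(package) is not None


def usable_modules(required=REQUIRED_BY_MODULE):
    """Map each Dask module to whether its required library is present."""
    return {module: is_installed(dep) for module, dep in required.items()}


if __name__ == "__main__":
    for module, ready in usable_modules().items():
        status = "ready" if ready else "missing dependency"
        print(f"{module}: {status}")
```

This only checks that the required library can be found; it does not verify versions or import Dask itself.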

Install from Source

To install Dask from source, clone the repository from GitHub:

  git clone https://github.com/dask/dask.git
  cd dask
  pip install .

You can also install all dependencies:

  pip install ".[complete]"

You can view the list of all dependencies within the extras_require field of setup.py.

Or do a developer install by using the -e flag:

  pip install -e .
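After a source or developer install, you may want to confirm that Python picks up the package from your clone rather than from an older copy elsewhere in the environment. A small sketch, assuming nothing beyond the standard library; `package_location` is a hypothetical helper name:

```python
from importlib.util import find_spec


def package_location(name):
    """Return the file a top-level package would be loaded from, or None."""
    spec = find_spec(name)
    if spec is None:
        return None  # the package is not installed in this environment
    return spec.origin


if __name__ == "__main__":
    # With a developer install this path should point inside the cloned
    # repository; None means Dask is not installed at all.
    print(package_location("dask"))
```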

Anaconda

Dask is included by default in the Anaconda distribution.

Optional dependencies

Specific functionality in Dask may require additional optional dependencies. For example, reading from Amazon S3 requires s3fs. These optional dependencies and their minimum supported versions are listed below.

Dependency    Version     Description
bokeh         >=1.0.0     Visualizing dask diagnostics
cloudpickle   >=0.2.1     Pickling support for Python objects
cityhash                  Faster hashing of arrays
distributed   >=2.0       Distributed computing in Python
fastparquet               Storing and reading data from parquet files
fsspec        >=0.6.0     Used for local, cluster and remote data IO
gcsfs         >=0.4.0     File-system interface to Google Cloud Storage
murmurhash                Faster hashing of arrays
numpy         >=1.13.0    Required for dask.array
pandas        >=0.21.0    Required for dask.dataframe
partd         >=0.3.10    Concurrent appendable key-value storage
psutil                    Enables a more accurate CPU count
pyarrow       >=0.14.0    Python library for Apache Arrow
s3fs          >=0.4.0     Reading from Amazon S3
sqlalchemy                Writing and reading from SQL databases
toolz         >=0.7.3     Utility functions for iterators, functions, and dictionaries
xxhash                    Faster hashing of arrays
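The minimum versions above can be compared against what is installed in the current environment. The sketch below is illustrative only: the helper names are hypothetical, only a few rows of the table are copied in, and the version parsing is deliberately simplified (it keeps the leading digits of each dot-separated part and ignores pre-release ordering). A dedicated version-parsing library would be more robust.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# A few minimum versions copied from the table above.
MINIMUM_VERSIONS = {
    "numpy": "1.13.0",
    "pandas": "0.21.0",
    "toolz": "0.7.3",
    "fsspec": "0.6.0",
}


def parse_version(text):
    """Turn '1.13.0' into (1, 13, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(package, minimum):
    """Return 'ok', 'too old', or 'not installed' for one dependency."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return "not installed"
    return "ok" if meets_minimum(installed, minimum) else "too old"


if __name__ == "__main__":
    for package, minimum in MINIMUM_VERSIONS.items():
        print(f"{package} (>={minimum}): {check(package, minimum)}")
```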

Test

Test Dask with py.test:

  cd dask
  py.test dask

Please be aware that installing Dask naively may not install all requirements by default. Please read the pip section above, which discusses requirements. You may choose to install the dask[complete] version, which includes all dependencies for all collections. Alternatively, you may choose to test only certain submodules depending on the libraries within your environment. For example, to test only Dask core and Dask array we would run tests as follows:

  py.test dask/tests dask/array/tests
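Choosing test directories based on the libraries in your environment can also be scripted. The following is a rough sketch under stated assumptions: `select_test_paths` is a hypothetical helper, and the dask/dataframe/tests path is assumed to follow the same layout as the dask/array/tests path shown above.

```python
from importlib.util import find_spec

# Core tests need no extra libraries. The other directories are assumed
# to follow the dask/<collection>/tests layout used in the example above.
TEST_DIRS_BY_LIBRARY = {
    "numpy": "dask/array/tests",
    "pandas": "dask/dataframe/tests",
}


def select_test_paths(available=None):
    """Return test directories whose required library is present.

    `available` is an optional set of library names; when omitted, the
    current environment is inspected instead.
    """
    paths = ["dask/tests"]
    for library, path in TEST_DIRS_BY_LIBRARY.items():
        if available is not None:
            present = library in available
        else:
            present = find_spec(library) is not None
        if present:
            paths.append(path)
    return paths


if __name__ == "__main__":
    # Print the command to run from the root of the cloned repository.
    print("py.test " + " ".join(select_test_paths()))
```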