GPUs

Dask works with GPUs in a few ways.

Custom Computations

Many people use Dask alongside GPU-accelerated libraries like PyTorch and TensorFlow to manage workloads across several machines. They typically use Dask’s custom APIs, notably Delayed and Futures.

Dask doesn’t need to know that these functions use GPUs. It just runs Python functions. Whether or not those Python functions use a GPU is orthogonal to Dask. It will work regardless.
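As a minimal sketch of this pattern with Delayed, suppose each shard of work is handled by a function that could internally call into a GPU library. Here `train_on_shard` is a hypothetical stand-in; its body does plain CPU arithmetic for illustration, but Dask would schedule it identically if it called PyTorch instead:

```python
import dask

def train_on_shard(shard):
    # A real version might move `shard` to the GPU and run a model here;
    # Dask only sees an opaque Python function either way.
    return sum(shard)  # CPU stand-in for illustration

shards = [[1, 2], [3, 4], [5, 6]]

# Build a lazy task graph with Delayed, then compute it
results = [dask.delayed(train_on_shard)(s) for s in shards]
total = dask.delayed(sum)(results).compute()
print(total)  # 21
```

The same structure works with Futures by replacing `dask.delayed(...)` calls with `client.submit(...)` on a connected client.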

As a worked example, you may want to view this talk.

High Level Collections

Dask can also help to scale out large array and dataframe computations by combining the Dask Array and DataFrame collections with a GPU-accelerated array or dataframe library.

Recall that Dask Array creates a large array out of many NumPy arrays and Dask DataFrame creates a large dataframe out of many Pandas dataframes. We can use these same systems with GPUs if we swap out the NumPy/Pandas components with GPU-accelerated versions of those same libraries, as long as the GPU-accelerated version looks enough like NumPy/Pandas to interoperate with Dask.

Fortunately, libraries that mimic NumPy, Pandas, and Scikit-Learn on the GPU do exist.

DataFrames

The RAPIDS libraries provide a GPU-accelerated Pandas-like library, cuDF, which interoperates well with, and is tested against, Dask DataFrame.

If you have cuDF installed then you should be able to convert a Pandas-backed Dask DataFrame to a cuDF-backed Dask DataFrame as follows:

```python
import cudf

df = df.map_partitions(cudf.from_pandas)  # convert pandas partitions into cudf partitions
```

However, cuDF does not support the entire Pandas interface, and so a variety of Dask DataFrame operations will not function properly. Check the cuDF API Reference for the currently supported interface.

Arrays

Note

Dask’s integration with CuPy relies on features recently added to NumPy and CuPy, particularly numpy>=1.17 and cupy>=6

Chainer’s CuPy library provides a GPU-accelerated NumPy-like library that interoperates nicely with Dask Array.

If you have CuPy installed then you should be able to convert a NumPy-backed Dask Array into a CuPy-backed Dask Array as follows:

```python
import cupy

x = x.map_blocks(cupy.asarray)
```

CuPy is fairly mature and adheres closely to the NumPy API. However, small differences do exist and these can cause Dask Array operations to function improperly. Check the CuPy Reference Manual for API compatibility.
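Because CuPy needs a GPU, the following CPU-only sketch shows the same `map_blocks` pattern with NumPy standing in; with CuPy installed, `np.asarray` below would be `cupy.asarray`, moving each block onto the device:

```python
import numpy as np
import dask.array as da

x = da.ones((4, 4), chunks=(2, 2))

# map_blocks applies a function to each underlying chunk; here it is a
# no-op conversion, but cupy.asarray would transfer each block to the GPU
x = x.map_blocks(np.asarray)

# Downstream operations dispatch to whichever library backs the blocks
print(x.sum().compute())  # 16.0
```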

Scikit-Learn

There are a variety of GPU-accelerated machine learning libraries that follow the Scikit-Learn Estimator API of fit, transform, and predict. These can generally be used within Dask-ML’s meta-estimators, such as hyperparameter optimization.

Some of these include cuML, the GPU-accelerated machine learning library from the RAPIDS project.
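The Estimator contract these libraries follow can be sketched with a minimal CPU-only class. `MeanRegressor` is purely illustrative, not a real library API; a GPU estimator would run fit and predict on device, but expose the same interface:

```python
class MeanRegressor:
    """Toy estimator following the Scikit-Learn fit/predict convention."""

    def __init__(self):
        self.mean_ = None  # learned attributes end in an underscore

    def fit(self, X, y):
        # "Training": remember the mean of the targets
        self.mean_ = sum(y) / len(y)
        return self  # fit returns self, per the Estimator convention

    def predict(self, X):
        # Predict the stored mean for every sample
        return [self.mean_ for _ in X]

model = MeanRegressor().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
print(model.predict([[5], [6]]))  # [2.0, 2.0]
```

Anything with this shape can be dropped into a meta-estimator that only calls fit, transform, and predict, which is why GPU implementations slot into Dask-ML without special handling.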

Setup

From the examples above we can see that the user experience of using Dask with GPU-backed libraries isn’t very different from using it with CPU-backed libraries. However, there are some changes you might consider making when setting up your cluster.

Restricting Work

By default Dask allows as many tasks as you have CPU cores to run concurrently. However if your tasks primarily use a GPU then you probably want far fewer tasks running at once. There are a few ways to limit parallelism here:

  • Limit the number of threads explicitly on your workers using the --nthreads keyword in the CLI or the ncores= keyword in the Cluster constructor.
  • Use worker resources and tag certain tasks as GPU tasks so that the scheduler will limit them, while leaving the rest of your CPU cores for other work.
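As a sketch of both options, assuming a scheduler already running at scheduler-address:8786 (a placeholder address), a worker can either be limited to a single thread, or can advertise an abstract "GPU" resource that the scheduler uses to throttle tasks tagged with it:

```shell
# Option 1: limit the worker to a single concurrent task
dask-worker scheduler-address:8786 --nthreads 1

# Option 2: advertise an abstract resource; tasks submitted with
# resources={"GPU": 1} are then limited to two at a time on this worker
dask-worker scheduler-address:8786 --resources "GPU=2"
```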

Specifying GPUs per Machine

Some configurations may have many GPU devices per node. Dask is often used to balance and coordinate work between these devices.

In these situations it is common to start one Dask worker per device, and use the CUDA environment variable CUDA_VISIBLE_DEVICES to pin each worker to prefer one device.

```shell
# If we have four GPUs on one machine
CUDA_VISIBLE_DEVICES=0 dask-worker ...
CUDA_VISIBLE_DEVICES=1 dask-worker ...
CUDA_VISIBLE_DEVICES=2 dask-worker ...
CUDA_VISIBLE_DEVICES=3 dask-worker ...
```

The Dask CUDA project contains some convenience CLI and Python utilities to automate this process.

Work in Progress

GPU computing is a quickly moving field today and as a result the information on this page is likely to go out of date quickly. We encourage interested readers to check out Dask’s Blog, which has more timely updates on ongoing work.