Dask Use Cases

Dask is a versatile tool that supports a variety of workloads. This page contains brief and illustrative examples of how people use Dask in practice. These emphasize breadth and hopefully inspire readers to find new ways that Dask can serve them beyond their original intent.

Overview

Dask uses can be roughly divided into the following two categories:

  • Large NumPy/Pandas/list-like collections, using Dask Array, Dask DataFrame, and Dask Bag to analyze large datasets with familiar techniques. This is similar to databases, Spark, or big array libraries.
  • Custom task scheduling. You submit a graph of functions that depend on each other for custom workloads. This is similar to Luigi, Airflow, Celery, or Makefiles.

Most people today approach Dask assuming it is a framework like Spark, designed for the first use case around large collections of uniformly shaped data. However, many of the more productive and novel use cases fall into the second category, where Dask is used to parallelize custom workflows.
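As a rough illustration of the second category, the sketch below writes a small task graph in the dictionary style described by Dask's low-level graph specification: each key maps either to a plain value or to a tuple whose first element is a callable and whose remaining elements are arguments, where an argument naming another key stands for that key's result. The evaluator `toy_get` is hypothetical and written only for this example; it runs tasks one at a time, whereas Dask's own schedulers can run independent tasks (here `y` and `w`) in parallel. In everyday use, people usually build such graphs indirectly, for example with `dask.delayed`, rather than writing the dictionary by hand.

```python
from operator import add


def inc(x):
    return x + 1


def double(x):
    return 2 * x


# A task graph in the style of Dask's low-level graph specification.
# "y" and "w" both depend on "x" but not on each other, so a parallel
# scheduler could run them at the same time; "z" waits for both.
graph = {
    "x": 1,
    "y": (inc, "x"),
    "w": (double, "x"),
    "z": (add, "y", "w"),
}


def is_task(value):
    # A task is a non-empty tuple whose first element is callable.
    return isinstance(value, tuple) and len(value) > 0 and callable(value[0])


def toy_get(dsk, key, cache=None):
    """Hypothetical toy scheduler: evaluate `key` sequentially and recursively.

    Assumes string keys only. Each key is computed at most once thanks to
    the cache. This is NOT Dask's scheduler, just an illustration.
    """
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    value = dsk[key]
    if is_task(value):
        func, *args = value
        resolved = []
        for arg in args:
            # Arguments that name another key are replaced by its result;
            # anything else is passed through as a literal.
            if isinstance(arg, str) and arg in dsk:
                resolved.append(toy_get(dsk, arg, cache))
            else:
                resolved.append(arg)
        result = func(*resolved)
    else:
        result = value
    cache[key] = result
    return result


print(toy_get(graph, "z"))  # prints 4
```

The dictionary only describes what to compute and in what order dependencies must be satisfied; deciding when and where each task runs is left to the scheduler, which is what lets the same graph be executed sequentially, on threads, or across a cluster.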

In the real-world applications above we see that people end up using both sides of Dask to achieve novel results.

Contributing

If you solve interesting problems with Dask then we want you to share your story. Hearing from experienced users like yourself can help newcomers quickly identify the parts of Dask and the surrounding ecosystem that are likely to be valuable to them.

Stories are collected as pull requests to github.com/dask/dask-stories. You may wish to read a few of the stories above to get a sense of the typical level of information. There is a template in the repository with suggestions, but you can also structure your story a different way.