Prioritizing Work

When there is more work than workers, Dask has to decide which tasks toprioritize over others. Dask can determine these priorities automatically tooptimize performance, or a user can specify priorities manually according totheir needs.

Dask uses the following priorities, in order:

  • User priorities: A user defined priority, provided by the priority= keyword argumentto functions like compute(), persist(), submit(), or map().Tasks with higher priorities run before tasks with lower priorities withthe default priority being zero.
  1. future = client.submit(func, *args, priority=10) # high priority task
  2. future = client.submit(func, *args, priority=-10) # low priority task
  3.  
  4. df = df.persist(priority=10) # high priority computation
  • First in first out chronologically: Dask prefers computations that weresubmitted early. Because users can submit computations asynchronously itmay be that several different computations are running on the workers atthe same time. Generally Dask prefers those groups of tasks that weresubmitted first.

As a nuance, tasks that are submitted within a close window are oftenconsidered to be submitted at the same time.

  1. x = x.persist() # submitted first and so has higher priority
  2. # wait a while
  3. x = x.persist() # submitted second and so has lower priority

In this case “a while” depends on the kind of computation. Operationsthat are often used in bulk processing, like compute and persistconsider any two computations submitted in the same sixty secondsto have the same priority. Operations that are often used in real-timeprocessing, like submit or map are considered the same priority ifthey are submitted within the 100 milliseconds of each other. Thisbehavior can be controlled with the fifo_timeout= keyword:

  1. x = x.persist()
  2. # wait one minute
  3. x = x.persist(fifo_timeout='10 minutes') # has the same priority
  4.  
  5. a = client.submit(func, *args)
  6. # wait no time at all
  7. b = client.submit(func, *args, fifo_timeout='0ms') # is lower priority
  • Graph Structure: Within any given computation (a compute or persistcall) Dask orders tasks in such a way as to minimize the memory-footprintof the computation. This is discussed in more depth in thetask ordering documentation.