Invalidating Jug Task Results (invalidate subcommand)

When you invalidate the results of a task, you are telling jug that nothing computed using that task's function should be used any longer. Any result that depended on that function therefore needs to be recomputed.

Invalidation is manual

To invalidate a task, you must run the invalidate subcommand yourself. Jug does not automatically detect that you have changed your code, for example to fix a bug.

Invalidation is dependency aware

When you invalidate a task, all results that depend on the invalid results will also be invalidated. This is what makes the invalidate subcommand so powerful.
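The idea of dependency-aware invalidation can be illustrated with a minimal sketch. This is not jug's implementation; the pipeline and the task names (load, clean, plot, summarize) are hypothetical, and the graph is simply a dictionary mapping each task to the tasks that consume its result.

```python
from collections import deque

def invalidate(dependents, target):
    """Return the target plus every task downstream of it."""
    invalid = {target}
    queue = deque([target])
    while queue:
        task = queue.popleft()
        # Anything that consumed this task's result is invalid too.
        for child in dependents.get(task, ()):
            if child not in invalid:
                invalid.add(child)
                queue.append(child)
    return invalid

# Hypothetical pipeline: load -> clean -> {plot, summarize}
dependents = {
    "load": ["clean"],
    "clean": ["plot", "summarize"],
}

print(sorted(invalidate(dependents, "clean")))
# ['clean', 'plot', 'summarize']
```

Note that load is untouched: invalidation only travels downstream, so results that the target itself depended on remain usable.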

Example

Consider the British parliament example we used elsewhere in this guide:

  allcounts = []
  for mp in MPs:
      article = get_data(mp)
      words = count_words(mp, article)
      allcounts.append(words)
  global_counts = add_counts(allcounts) # Here all processes must sync

Its dependency graph looks like this:

  M0 -> get_data -> count_words -> C0
  M1 -> get_data -> count_words -> C1
  M2 -> get_data -> count_words -> C2
  ...
  C0 C1 C2 -> add_counts -> avgs
  C0 + avgs -> divergence
  C1 + avgs -> divergence
  C2 + avgs -> divergence
  ...

This is a typical fan-in-fan-out structure. After you have run the code, jug status will give you this output:

     Waiting     Ready  Finished   Running  Task name
  --------------------------------------------------------------
           0         0         1         0  jugfile.add_counts
           0         0       656         0  jugfile.count_words
           0         0       656         0  jugfile.divergence
           0         0       656         0  jugfile.get_data
  ..............................................................
           0         0      1969         0  Total

Now assume that add_counts has a bug. You must then:

  1. Fix the bug (well, of course)
  2. Rerun everything that could have been affected by the bug.

The invalidate subcommand helps you with the second step. Running

  $ jug invalidate --target add_counts
  Invalidated  Task name
  -----------------------------------------
            1  jugfile.add_counts
          656  jugfile.divergence
  .........................................
          657  Total

will remove the result of the single add_counts task and of all 656 divergence tasks, because the latter depended on the result of add_counts. Now, jug status gives us:

     Waiting     Ready  Finished   Running  Task name
  --------------------------------------------------------------
           0         1         0         0  jugfile.add_counts
           0         0       656         0  jugfile.count_words
         656         0         0         0  jugfile.divergence
           0         0       656         0  jugfile.get_data
  ..............................................................
         656         1      1312         0  Total

Now, when you run jug execute, add_counts will be re-run, along with every task whose result could have changed.
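The numbers in the listings above can be reproduced with a small model of the dependency graph. This is only a sketch: the task labels such as count_words_0 and the helper functions downstream and status are made up for illustration and say nothing about how jug stores or tracks tasks internally. It assumes a task is Ready when all of its dependencies are finished and Waiting otherwise.

```python
from collections import Counter

N_MPS = 656  # number of MPs, taken from the status output above

# Map each task to the tasks it depends on (hypothetical labels).
deps = {"add_counts": [f"count_words_{i}" for i in range(N_MPS)]}
for i in range(N_MPS):
    deps[f"get_data_{i}"] = []
    deps[f"count_words_{i}"] = [f"get_data_{i}"]
    deps[f"divergence_{i}"] = [f"count_words_{i}", "add_counts"]

def downstream(deps, target):
    """Target plus every task that depends on it, directly or indirectly."""
    invalid = {target}
    changed = True
    while changed:  # repeat until no new task is marked invalid
        changed = False
        for task, parents in deps.items():
            if task not in invalid and any(p in invalid for p in parents):
                invalid.add(task)
                changed = True
    return invalid

def status(deps, finished):
    """Count tasks that are Finished, Ready (all inputs done) or Waiting."""
    counts = Counter()
    for task, parents in deps.items():
        if task in finished:
            counts["Finished"] += 1
        elif all(p in finished for p in parents):
            counts["Ready"] += 1
        else:
            counts["Waiting"] += 1
    return counts

invalid = downstream(deps, "add_counts")
finished = set(deps) - invalid
counts = status(deps, finished)
print(len(invalid), counts["Waiting"], counts["Ready"], counts["Finished"])
# 657 656 1 1312
```

In this model, add_counts is Ready because the count_words results it needs were never invalidated, while every divergence task is Waiting until add_counts has been recomputed.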