6.7 Rebuilding Certain Targets

The list of the top 100 e-books on Project Gutenberg changes daily. We have seen that running the Drake workflow a second time does not download the HTML file containing this list again. Luckily, Drake allows us to run specific steps again, so that we can update this HTML file:

  $ drake -w 02.drake '=top.html'

There is a more convenient way than using the output filename to specify which step you want to execute again. We can add so-called tags to both the inputs and outputs of steps. A tag starts with a percent sign (%). It is a good idea to choose a short and descriptive tag name so that you can easily specify it on the command line. Let’s add the tag %html to the first step and %filter to the second step:

  NUM:=5
  BASE=data/

  top.html, %html <- [-timecheck]
      curl -s 'http://www.gutenberg.org/browse/scores/top' > $OUTPUT

  top-$[NUM], %filter <- top.html
      < $INPUT grep -E '^<li>' |
      head -n $[NUM] |
      sed -E "s/.*ebooks\/([0-9]+)\">([^<]+)<.*/\\1,\\2/" > $OUTPUT
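
To see what the step tagged %filter produces without running Drake or touching the network, the same pipeline can be tried on a few lines of HTML. The following is only a sketch: the function names sample_html and filter_top are hypothetical, and the sample markup, e-book IDs, and download counts are made up for illustration; the real page may be structured differently.

```shell
#!/bin/sh

# Hypothetical stand-in for top.html. The markup and numbers are invented
# for illustration and only mimic the shape the pipeline expects.
sample_html() {
  cat <<'EOF'
<h2>Top 100 EBooks yesterday</h2>
<ol>
<li><a href="/ebooks/1342">Pride and Prejudice by Jane Austen (1000)</a></li>
<li><a href="/ebooks/84">Frankenstein by Mary Shelley (900)</a></li>
<li><a href="/ebooks/2701">Moby Dick by Herman Melville (800)</a></li>
</ol>
EOF
}

# Same commands as the %filter step, with Drake's $[NUM] replaced by the
# first argument: keep the list items, take the first N, and rewrite each
# one as "id,title".
filter_top() {
  grep -E '^<li>' |
  head -n "$1" |
  sed -E "s/.*ebooks\/([0-9]+)\">([^<]+)<.*/\\1,\\2/"
}

sample_html | filter_top 2
```

With this sample input, each surviving line consists of the e-book ID, a comma, and the link text. Changing the argument of filter_top plays the role of changing NUM in the workflow.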

We can now rebuild the first step by specifying the %html tag:

  $ drake -w 03.drake '=%html'