8.1 Overview

This intermezzo chapter discusses several approaches to speeding up tasks that require commands and pipelines to be run many times. Its main goal is to demonstrate the flexibility and power of a tool called GNU Parallel. Because this tool can be combined with any other tool discussed in this book, it can change the way you use the command line for data science. In this chapter, you’ll learn about:

  • Running commands serially over a range of numbers, lines, and files.
  • Breaking a large task into several smaller tasks.
  • Running pipelines in parallel using GNU Parallel.
  • Distributing pipelines across multiple machines.