2.2 Installing the Docker Image

This section used to be called Setting up the Data Science Toolbox and described how to install a Vagrant box containing all the command-line tools. That Vagrant box was created in 2014, and because the technology around virtualisation and containerisation has moved on, it was high time for an update. So now, instead of a Vagrant box, we use a Docker image.

In this book we use many different command-line tools. Linux often comes with a whole bunch of command-line tools pre-installed. Moreover, Linux offers many packages that contain other, relevant tools. Installing these packages yourself is not too difficult. However, we also use tools that are not available as packages and require a more manual, and more involved, installation. In order to acquire the necessary command-line tools without having to go through the involved installation process of each, we encourage you to install a Docker image that was created specifically for this book.

If you still prefer to run the command-line tools natively rather than inside a Docker image, then you can, of course, install the command-line tools individually yourself. Please be aware that this is a very time-consuming process. The Appendix lists all the command-line tools used in the book. The installation instructions are for Ubuntu only. The scripts and data sets used in the book can be obtained by cloning this book’s GitHub repository.

To install the Docker image, you first need to download and install Docker itself from the Docker website. Once Docker is installed, run the following command in your terminal or command prompt to download the Docker image (don’t type the dollar sign):

  $ docker pull datascienceworkshops/data-science-at-the-command-line
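
If the pull fails with a "command not found" error, the docker client is probably not on your PATH yet. The following is a minimal sketch of a pre-flight check; the function name check_docker is our own invention for illustration and is not part of Docker or of the book's image:

```shell
# Hypothetical helper: report whether the docker client is on the PATH.
check_docker() {
  if command -v docker >/dev/null 2>&1; then
    # command -v prints the path of the executable that would be run
    echo "docker found: $(command -v docker)"
  else
    echo "docker not found; install it from the Docker website first" >&2
    return 1
  fi
}
```

After defining the function, you could run check_docker && docker pull datascienceworkshops/data-science-at-the-command-line so that the pull is only attempted when the client is available.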

You can run the Docker image as follows:

  $ docker run --rm -it datascienceworkshops/data-science-at-the-command-line

You’re now inside an isolated Linux environment—known as a Docker container—with all the necessary command-line tools installed. If the following command produces an enthusiastic cow, then you know everything is working correctly:

  $ cowsay "Let's go!"
   ___________
  < Let's go! >
   -----------
          \   ^__^
           \  (oo)\_______
              (__)\       )\/\
                  ||----w |
                  ||     ||

Run exit to exit the container. If you want to get data in and out of the container, you can add a volume, which means that a local directory gets mapped to a directory inside the container. We recommend that you create a new directory, navigate to this new directory, and then run the following when you’re on macOS or Linux:

  $ docker run --rm -it -v "$(pwd)":/data datascienceworkshops/data-science-at-the-command-line

Or the following when you’re on Windows and using the Command Prompt:

  $ docker run --rm -it -v %cd%:/data datascienceworkshops/data-science-at-the-command-line

Or the following when you’re using Windows PowerShell:

  $ docker run --rm -it -v ${PWD}:/data datascienceworkshops/data-science-at-the-command-line

In the above commands, the option -v instructs docker to map the current directory to the /data directory inside the container, so this is the place to get data in and out of the Docker container.
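
If you start the container often, typing the full command gets tedious. As a rough sketch for macOS and Linux, you could wrap it in a small shell function. The function name dsatcl and the DRY_RUN variable are hypothetical names chosen for this example; they are not provided by Docker or by the image:

```shell
# Hypothetical helper: start the book's container with the current
# directory mapped to /data inside the container.
dsatcl() {
  image="datascienceworkshops/data-science-at-the-command-line"
  # Build the command as positional parameters; quoting "$(pwd)" keeps
  # directory names that contain spaces intact.
  set -- docker run --rm -it -v "$(pwd):/data" "$image"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print the command, so you can inspect it before running it.
    echo "$*"
  else
    "$@"
  fi
}
```

Setting DRY_RUN=1 before calling dsatcl prints the command without starting a container, which is a cheap way to confirm which local directory would be mapped to /data.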

If you would like to know more about the Docker image, you can visit its page on Docker Hub.