1.4 What is the Command Line?

Before we discuss why you should use the command line for data science, let’s take a peek at what the command line actually looks like (it may already be familiar to you). Figure 1.1 and Figure 1.2 show screenshots of the command line as it appears by default on macOS and Ubuntu, respectively. Ubuntu is a particular distribution of GNU/Linux, which we’ll be assuming throughout the book.

Figure 1.1: Command line on macOS

Figure 1.2: Command line on Ubuntu

The window shown in the two screenshots is called the terminal. This is the program that enables you to interact with the shell. It is the shell that executes the commands we type in. (On both Ubuntu and macOS, the default shell is Bash.)
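If you are unsure which shell your terminal is talking to, you can ask the shell itself. The following is a minimal sketch, assuming a POSIX-compatible shell; the variable names default_shell and bash_version are ours, chosen for illustration.

```shell
# SHELL holds the path of your default (login) shell, for example /bin/bash.
# If it is not set, fall back to the word "unknown".
default_shell="${SHELL:-unknown}"
echo "Default shell: $default_shell"

# BASH_VERSION is only set when the commands are being executed by Bash itself.
bash_version="${BASH_VERSION:-not running Bash}"
echo "Bash version: $bash_version"
```

The terminal program plays no part in either answer: you could swap it for a different terminal and the shell would report the same values.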

We’re not showing the Microsoft Windows command line (also known as the Command Prompt or PowerShell), because it’s fundamentally different and incompatible with the commands presented in this book. The good news is that you can install the Data Science Toolbox on Microsoft Windows, so that you’re able to follow along. How to install the Data Science Toolbox is explained in Chapter 2.

Typing commands is a very different way of interacting with your computer than through a graphical user interface. If you are mostly used to processing data in, say, Microsoft Excel, then this approach may seem intimidating at first. Don’t be afraid. Trust us when we say that you’ll get used to working at the command line very quickly.

In this book, the commands that we type in, and the output that they generate, are displayed as text. For example, the contents of the terminal (after the welcome message) in the two screenshots would look like this:

  $ whoami
  vagrant
  $ hostname
  data-science-toolbox
  $ date
  Tue Jul 22 02:52:09 UTC 2014
  $ echo 'The command line is awesome!' | cowsay
   ______________________________
  < The command line is awesome! >
   ------------------------------
          \   ^__^
           \  (oo)\_______
              (__)\       )\/\
                  ||----w |
                  ||     ||

You’ll notice that each command is preceded by a dollar sign ($). This is called the prompt. The prompt in the two screenshots shows more information, namely the username (vagrant), the hostname (data-science-toolbox), and the current working directory (~). It’s a convention to show only a dollar sign in examples, because the prompt (1) can change during a session (when you go to a different directory), (2) can be customized by the user (e.g., it can also show the time or the current git (Torvalds and Hamano 2014) branch you’re working on), and (3) is irrelevant for the commands themselves.
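To make the point about customization concrete: in Bash, the prompt is defined by the PS1 variable. The following is a minimal sketch, assuming Bash. The detailed prompt string is our own illustration of a definition that would display a username, hostname, and working directory; it is not necessarily the setting used in the screenshots.

```shell
# Bash builds the prompt from the PS1 variable.
# \u = username, \h = hostname, \w = current working directory,
# \$ = "$" for a regular user ("#" for root).
# Illustrative definition; would display something like:
#   vagrant@data-science-toolbox:~$
PS1='\u@\h:\w\$ '

# Switch to the minimal prompt used in this book's examples.
PS1='$ '
```

Setting PS1 like this only affects the current session. Closing the terminal restores whatever prompt your shell's startup files define.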

In the next chapter we’ll explain much more about essential command-line concepts. First, though, it’s time to explain why you should learn to use the command line for doing data science.