3.2 Copying Local Files to the Data Science Toolbox

A common situation is that you already have the necessary files on your own computer. This section explains how you can get those files onto the local or remote version of the Data Science Toolbox.

3.2.1 Local Version of Data Science Toolbox

We mentioned in Chapter 2 that the Vagrant version of the Data Science Toolbox is an isolated virtual environment. Luckily, there is one exception: files can be transferred in and out of the Data Science Toolbox. The local directory from which you ran vagrant up (the one that contains the file Vagrantfile) is mapped to a directory in the Data Science Toolbox called /vagrant. Please note that this is not your home directory. Let us check the contents of this directory:

  $ ls -1 /vagrant
  build
  Vagrantfile

If you have a file on your local computer and you want to apply some command-line tools to it, all you have to do is copy or move the file to that directory. Let’s assume that you have a file called logs.csv on your Desktop. If you are running Linux or macOS, execute the following command on your operating system (and not inside the Data Science Toolbox), from the directory that contains Vagrantfile:

  $ cp ~/Desktop/logs.csv .
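
Because the destination in this command is the current directory (.), it only has the intended effect when you run it from the directory that contains Vagrantfile. The following is a minimal sketch of a guard against copying into the wrong place. The function name copy_to_toolbox is hypothetical and is not part of the Data Science Toolbox.

```shell
# Hypothetical helper (not part of the Data Science Toolbox): copy a file into
# the current directory only if that directory contains a Vagrantfile.
copy_to_toolbox() {
  if [ ! -f Vagrantfile ]; then
    echo "No Vagrantfile here; cd to the directory where you ran vagrant up" >&2
    return 1
  fi
  cp "$1" .
}
```

With this function defined in your shell, copy_to_toolbox ~/Desktop/logs.csv behaves like the cp command above, but refuses to copy when no Vagrantfile is present.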

And if you are running Windows, you can run the following commands at the command prompt:

  > cd %UserProfile%\Desktop
  > copy logs.csv MyDataScienceToolbox\

You may also drag-and-drop the file into the directory using Windows Explorer.

The file is now located in the directory /vagrant. It is a good idea to keep your data in a separate directory, as we do with ~/book/ch03/data. So, after you have copied the file, you can move it by running the following inside the Data Science Toolbox:

  $ mv /vagrant/logs.csv ~/book/ch03/data
  $ cd ~/book/ch03
  $ cat data/logs.csv
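
Note that the mv command above assumes that the directory ~/book/ch03/data already exists. If it does not, but ~/book/ch03 does, mv renames the file to data instead of moving it into a directory. A minimal sketch of a safer variant, which creates the directory first, might look like this. The function name move_to_data is hypothetical.

```shell
# Hypothetical helper: move a file into a data directory, creating that
# directory (and any missing parent directories) first.
move_to_data() {
  src="$1"
  dest_dir="$2"
  # The trailing slash makes mv treat the destination as a directory.
  mkdir -p "$dest_dir" && mv "$src" "$dest_dir"/
}
```

For example, move_to_data /vagrant/logs.csv ~/book/ch03/data would leave the file at ~/book/ch03/data/logs.csv, whether or not the data directory existed beforehand.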

3.2.2 Remote Version of Data Science Toolbox

If you are running Linux or macOS, you can use scp (Rinne and Ylonen 2014), which stands for secure copy, to copy files onto the EC2 instance. You will need the same key pair file that you used to log in to the EC2 instance.

  $ scp -i mykey.pem ~/Desktop/logs.csv \
  > ubuntu@ec2-184-73-72-150.compute-1.amazonaws.com:data

Replace the host name in the example (ec2-184-73-72-150.compute-1.amazonaws.com) with the value you see on the EC2 overview page in the AWS console.
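
If you copy files to the instance regularly, it may help to keep the key file and host name in one place. The following is a minimal sketch under that assumption. The function name copy_to_ec2, the variables ec2_key and ec2_host, and the DRY_RUN switch are all hypothetical; the key file and host name are the placeholder values from the example above and must be replaced with your own.

```shell
# Placeholder values taken from the example above; replace with your own.
ec2_key="mykey.pem"
ec2_host="ec2-184-73-72-150.compute-1.amazonaws.com"

# Hypothetical helper: copy a local file into a directory on the instance
# (default: data). With DRY_RUN=1 the command is printed instead of executed.
copy_to_ec2() {
  local_file="$1"
  remote_dir="${2:-data}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo scp -i "$ec2_key" "$local_file" "ubuntu@$ec2_host:$remote_dir"
  else
    scp -i "$ec2_key" "$local_file" "ubuntu@$ec2_host:$remote_dir"
  fi
}
```

Setting DRY_RUN=1 before calling copy_to_ec2 ~/Desktop/logs.csv prints the scp command that would be run, which is a convenient way to check the host name before transferring anything.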