
For this initial tutorial we are just going to try to create a model that can classify any image as a 3 or a 7. So let’s download a sample of MNIST that contains images of just these digits:

In [ ]:

    from fastai.vision.all import *  # assumed setup: provides untar_data, URLs, Path, Image, tensor
    path = untar_data(URLs.MNIST_SAMPLE)

In [ ]:

    #hide
    Path.BASE_PATH = path

We can see what’s in this directory by using ls, a method added by fastai. This method returns an object of a special fastai class called L, which has all the same functionality as Python’s built-in list, plus a lot more. One of its handy features is that, when printed, it displays the count of items before listing the items themselves (if there are more than 10 items, it shows only the first few):

In [ ]:

    path.ls()

Out[ ]:

    (#9) [Path('cleaned.csv'),Path('item_list.txt'),Path('trained_model.pkl'),Path('models'),Path('valid'),Path('labels.csv'),Path('export.pkl'),Path('history.csv'),Path('train')]
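To make the printing behavior described above concrete, here is a minimal sketch of a list subclass that prints a count and truncates long listings. The class name CountedList and the max_shown attribute are hypothetical, chosen for illustration; this is not how fastai implements L, only an imitation of the output format shown in this section.

```python
class CountedList(list):
    """Hypothetical list subclass that imitates the way L prints itself."""

    max_shown = 10  # assumed cutoff, taken from the description in the text

    def __repr__(self):
        # Show at most max_shown items, then "..." if anything was left out.
        shown = ",".join(repr(item) for item in self[: self.max_shown])
        ellipsis = "..." if len(self) > self.max_shown else ""
        return f"(#{len(self)}) [{shown}{ellipsis}]"


short = CountedList(["train", "valid", "labels.csv"])
long = CountedList(range(25))

print(short)  # (#3) ['train','valid','labels.csv']
print(long)   # (#25) [0,1,2,3,4,5,6,7,8,9...]
```

Because CountedList inherits from list, indexing, slicing, and len work exactly as they do for an ordinary list; only the printed form changes.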

The MNIST dataset follows a common layout for machine learning datasets: separate folders for the training set and the validation set (and/or test set). Let’s see what’s inside the training set:

In [ ]:

    (path/'train').ls()

Out[ ]:

    (#2) [Path('train/7'),Path('train/3')]

There’s a folder of 3s, and a folder of 7s. In machine learning parlance, we say that “3” and “7” are the labels (or targets) in this dataset. Let’s take a look in one of these folders (using sorted to ensure we all get the same order of files):

In [ ]:

    threes = (path/'train'/'3').ls().sorted()
    sevens = (path/'train'/'7').ls().sorted()
    threes

Out[ ]:

    (#6131) [Path('train/3/10.png'),Path('train/3/10000.png'),Path('train/3/10011.png'),Path('train/3/10031.png'),Path('train/3/10034.png'),Path('train/3/10042.png'),Path('train/3/10052.png'),Path('train/3/1007.png'),Path('train/3/10074.png'),Path('train/3/10091.png')...]
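The folder-per-label layout and the sorted file order can be explored without downloading anything. The sketch below uses only the standard library (pathlib and tempfile rather than fastai’s ls) and a few made-up, empty files; the file names are assumptions chosen to mirror the listing above. It shows that a file’s label can be read from its parent folder, and that sorting paths is lexicographic, which is why 10000.png comes before 1007.png.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a tiny stand-in for the train/3 and train/7 folders (empty files).
    layout = {"3": ["10.png", "1007.png", "10000.png"], "7": ["25.png"]}
    for label, names in layout.items():
        folder = root / "train" / label
        folder.mkdir(parents=True)
        for name in names:
            (folder / name).touch()

    # Sorting paths compares them as text, not as numbers.
    files = sorted((root / "train").glob("*/*.png"))
    sorted_names = [f.name for f in files if f.parent.name == "3"]
    # The label of each file is simply the name of the folder it sits in.
    labels = [f.parent.name for f in files]

print(sorted_names)  # ['10.png', '10000.png', '1007.png']
print(labels)        # ['3', '3', '3', '7']
```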

As we might expect, it’s full of image files. Let’s take a look at one now. Here’s an image of a handwritten number 3, taken from the famous MNIST dataset of handwritten numbers:

In [ ]:

    im3_path = threes[1]
    im3 = Image.open(im3_path)
    im3

Out[ ]:

[Figure 1: the handwritten 3 displayed by the cell above]

Here we are using the Image class from the Python Imaging Library (PIL), which is the most widely used Python package for opening, manipulating, and viewing images. Jupyter knows about PIL images, so it displays the image for us automatically.
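To get a feel for the PIL Image class without the dataset, the sketch below builds a small synthetic grayscale image instead of opening an MNIST file. The 28×28 size and the "L" (8-bit grayscale) mode are assumptions made for illustration, not properties read from the actual file.

```python
from PIL import Image

# A blank 28x28 grayscale image ("L" mode stores one 8-bit value per pixel).
img = Image.new("L", (28, 28), color=0)

# Set one pixel; PIL coordinates are (x, y), that is (column, row).
img.putpixel((5, 2), 200)

print(img.mode)              # L
print(img.size)              # (28, 28), reported as (width, height)
print(img.getpixel((5, 2)))  # 200
```

Note that PIL addresses pixels as (x, y), whereas array-style indexing is row first, then column.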

In a computer, everything is represented as a number. To view the numbers that make up this image, we have to convert it to a NumPy array or a PyTorch tensor. For instance, here’s what a section of the image looks like, converted to a NumPy array:

In [ ]:

    array(im3)[4:10,4:10]

Out[ ]:

    array([[  0,   0,   0,   0,   0,   0],
           [  0,   0,   0,   0,   0,  29],
           [  0,   0,   0,  48, 166, 224],
           [  0,  93, 244, 249, 253, 187],
           [  0, 107, 253, 253, 230,  48],
           [  0,   3,  20,  20,  15,   0]], dtype=uint8)

The 4:10 indicates we requested the rows from index 4 (included) to 10 (not included) and the same for the columns. NumPy indexes from top to bottom and left to right, so this section is located in the top-left corner of the image. Here’s the same thing as a PyTorch tensor:

In [ ]:

    tensor(im3)[4:10,4:10]

Out[ ]:

    tensor([[  0,   0,   0,   0,   0,   0],
            [  0,   0,   0,   0,   0,  29],
            [  0,   0,   0,  48, 166, 224],
            [  0,  93, 244, 249, 253, 187],
            [  0, 107, 253, 253, 230,  48],
            [  0,   3,  20,  20,  15,   0]], dtype=torch.uint8)
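The half-open slicing rule (start included, stop excluded, rows first and then columns) can be checked on a synthetic array. This sketch uses NumPy only; the array img is made up so that each value encodes its own position, which makes it easy to see which rows and columns a slice picks out.

```python
import numpy as np

# Synthetic 28x28 "image": the value at (row, col) is row * 28 + col.
img = np.arange(28 * 28).reshape(28, 28)

patch = img[4:10, 4:10]  # rows 4..9, then columns 4..9

print(patch.shape)                   # (6, 6)
print(patch[0, 0] == img[4, 4])      # True: the patch starts at row 4, column 4
print(patch[-1, -1] == img[9, 9])    # True: index 10 itself is excluded
```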

We can slice the array to pick just the part with the top of the digit in it, and then use a Pandas DataFrame to color-code the values using a gradient, which shows us clearly how the image is created from the pixel values:

In [ ]:

    #hide_output
    im3_t = tensor(im3)
    df = pd.DataFrame(im3_t[4:15,4:22])
    df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')

Out[ ]:

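         0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
     1   0   0   0   0   0  29 150 195 254 255 254 176 193 150  96   0   0   0
     2   0   0   0  48 166 224 253 253 234 196 253 253 253 253 233   0   0   0
     3   0  93 244 249 253 187  46  10   8   4  10 194 253 253 233   0   0   0
     4   0 107 253 253 230  48   0   0   0   0   0 192 253 253 156   0   0   0
     5   0   3  20  20  15   0   0   0   0   0  43 224 253 245  74   0   0   0
     6   0   0   0   0   0   0   0   0   0   0 249 253 245 126   0   0   0   0
     7   0   0   0   0   0   0   0  14 101 223 253 248 124   0   0   0   0   0
     8   0   0   0   0   0  11 166 239 253 253 253 187  30   0   0   0   0   0
     9   0   0   0   0   0  16 248 250 253 253 253 253 232 213 111   2   0   0
    10   0   0   0   0   0   0   0  43  98  98 208 253 253 253 253 187  22   0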

[Figure 2: the same pixel values, color-coded with a grayscale gradient]

You can see that the background white pixels are stored as the number 0, black is the number 255, and shades of gray are between the two. The entire image contains 28 pixels across and 28 pixels down, for a total of 784 pixels. (This is much smaller than an image that you would get from a phone camera, which has millions of pixels, but is a convenient size for our initial learning and experiments. We will build up to bigger, full-color images soon.)
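As a rough illustration of how pixel values turn into a picture, the sketch below redraws the 6×6 patch printed earlier in this section as text, using darker characters for larger values. The SHADES palette and the shade helper are assumptions made for this example; they are not part of fastai, NumPy, or PIL.

```python
# The 6x6 patch printed earlier in this section (rows 4-9, columns 4-9).
patch = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 29],
    [0, 0, 0, 48, 166, 224],
    [0, 93, 244, 249, 253, 187],
    [0, 107, 253, 253, 230, 48],
    [0, 3, 20, 20, 15, 0],
]

SHADES = " .:*#"  # assumed palette: first character is lightest, last is darkest


def shade(value):
    """Map a pixel value in 0..255 to one character of SHADES."""
    return SHADES[value * len(SHADES) // 256]


rows = ["".join(shade(v) for v in row) for row in patch]
print("\n".join(rows))

print(28 * 28)  # 784 pixels in a full 28x28 image
```

Values near 0 come out as blanks (the white background) and values near 255 as "#" (the black ink), matching the encoding described above.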

So, now that you’ve seen what an image looks like to a computer, let’s recall our goal: create a model that can recognize 3s and 7s. How might you go about getting a computer to do that?

Warning: Stop and Think! Before you read on, take a moment to think about how a computer might be able to recognize these two different digits. What kinds of features might it be able to look at? How might it be able to identify these features? How could it combine them together? Learning works best when you try to solve problems yourself, rather than just reading somebody else’s answers; so step away from this book for a few minutes, grab a piece of paper and pen, and jot some ideas down…