How to Use VisualDL in PyTorch

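This walkthrough shows how to use VisualDL with PyTorch to visualize both the training process and the final model. As the example, we train a convolutional neural network (CNN) on the Cifar10 dataset.

The main body of the program comes from the PyTorch Tutorial. We also provide an interactive Jupyter Notebook version; see pytorch_cifar10.ipynb in this folder.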

    import torch
    import torchvision
    import torchvision.transforms as transforms
    from torch.autograd import Variable
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim

    import matplotlib
    matplotlib.use('Agg')  # non-interactive backend: figures are written to files

    import matplotlib.pyplot as plt
    import numpy as np

    from visualdl import LogWriter


    # convert images to tensors, then map each channel from [0, 1] to [-1, 1]
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=500,
                                              shuffle=True, num_workers=2)

    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=500,
                                             shuffle=False, num_workers=2)

    classes = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


    # function to show an image
    def imshow(img):
        img = img / 2 + 0.5  # unnormalize back to [0, 1]
        npimg = img.numpy()
        fig, ax = plt.subplots()
        # reorder from (channels, height, width) to (height, width, channels)
        ax.imshow(np.transpose(npimg, (1, 2, 0)))
        # we can either show the image or save it locally
        # plt.show()
        fig.savefig('out' + str(np.random.randint(0, 10000)) + '.pdf')
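
The `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` transform and the `img / 2 + 0.5` step in `imshow` undo each other. The following is a minimal pure-Python sketch of that arithmetic on a few single pixel values; the helper names `normalize` and `unnormalize` are illustrative only and are not part of torchvision.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel normalization, computed as (x - mean) / std."""
    return (x - mean) / std


def unnormalize(y):
    """Inverse of normalize for mean=0.5, std=0.5, as done in imshow."""
    return y / 2 + 0.5


# pixel intensities as produced by ToTensor(), in the range [0, 1]
pixels = [0.0, 0.25, 0.5, 1.0]

normalized = [normalize(p) for p in pixels]
restored = [unnormalize(v) for v in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```

Values in [0, 1] are mapped to [-1, 1] for training and mapped back to [0, 1] before plotting.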

We can preview the Cifar10 images that will be analyzed:

[Figure 1: preview of sample Cifar10 images]

Next, we create the VisualDL loggers that collect the data:

    logdir = "/workspace"
    logger = LogWriter(logdir, sync_cycle=100)

    # mark the components with 'train' label.
    with logger.mode("train"):
        # create a scalar component under the 'scalars/' prefix
        scalar_pytorch_train_loss = logger.scalar("scalars/scalar_pytorch_train_loss")
        # create two image components and one histogram component
        image1 = logger.image("images/image1", 1)
        image2 = logger.image("images/image2", 1)
        histogram0 = logger.histogram("histogram/histogram0", num_buckets=100)

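Cifar10 contains 50,000 training images and 10,000 test images. We group every 500 images into one training batch, and image sampling also uses 500. Each batch has the following dimensions: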

500 x 3 x 32 x 32
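
The batch arithmetic above can be checked with a short sketch. It only uses the numbers stated in the text (50,000 training images, 10,000 test images, batches of 500); the variable names are chosen here for illustration.

```python
train_images = 50000
test_images = 10000
batch_size = 500

# number of mini-batches the loaders yield per pass over each split
batches_per_epoch = train_images // batch_size
test_batches = test_images // batch_size

# (batch, channels, height, width) for one batch of RGB 32x32 images
batch_shape = (batch_size, 3, 32, 32)

values_per_batch = 1
for dim in batch_shape:
    values_per_batch *= dim

print(batches_per_epoch)  # 100
print(test_batches)       # 20
print(values_per_batch)   # 1536000
```

With this batch size, one epoch consists of 100 training steps.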

Next, we create the CNN model:

    # get some random training images
    dataiter = iter(trainloader)
    images, labels = next(dataiter)

    # show images
    imshow(torchvision.utils.make_grid(images))
    # print the labels of the first four images
    print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

    # Define a Convolution Neural Network
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            # flatten the 16 feature maps of size 5x5 into one vector per image
            x = x.view(-1, 16 * 5 * 5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return x


    net = Net()
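
The `16 * 5 * 5` input size of `fc1` follows from how each layer shrinks the 32 x 32 input. The sketch below traces the spatial size through the two convolution and pooling stages using the standard output-size formula for unpadded, stride-1 convolutions; the helper functions `conv_out` and `pool_out` are written here for illustration and are not PyTorch APIs.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output width/height of a max-pooling layer along one dimension."""
    return (size - kernel) // stride + 1


size = 32                 # Cifar10 images are 32 x 32
size = conv_out(size, 5)  # conv1, 5x5 kernel: 32 -> 28
size = pool_out(size)     # 2x2 max pool:      28 -> 14
size = conv_out(size, 5)  # conv2, 5x5 kernel: 14 -> 10
size = pool_out(size)     # 2x2 max pool:      10 -> 5

flat_features = 16 * size * size  # 16 output channels from conv2
print(size, flat_features)  # 5 400
```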

Next, we train the network while using VisualDL to collect the relevant data:

    # Loss function and optimizer. These definitions are missing from the
    # original listing; the values are assumed to follow the PyTorch tutorial.
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    # global step counter used as the x-axis of the VisualDL records
    train_step = 0

    # Train the network
    for epoch in range(5):  # loop over the dataset multiple times
        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data

            # wrap them in Variable (a no-op on PyTorch 0.4 and later)
            inputs, labels = Variable(inputs), Variable(labels)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)

            loss.backward()
            optimizer.step()

            # use VisualDL to retrieve metrics
            # scalar
            scalar_pytorch_train_loss.add_record(train_step, float(loss))

            # histogram
            weight_list = net.conv1.weight.view(6*3*5*5, -1)
            histogram0.add_record(train_step, weight_list)

            # image
            image1.start_sampling()
            image1.add_sample([96, 25], net.conv2.weight.view(16*6*5*5, -1))
            image1.finish_sampling()

            image2.start_sampling()
            image2.add_sample([18, 25], net.conv1.weight.view(6*3*5*5, -1))
            image2.finish_sampling()

            train_step += 1

            # print statistics
            running_loss += loss.item()
            # with batch_size=500 an epoch has only 100 mini-batches, so the
            # original "every 2000" condition never fires; report every 20
            if i % 20 == 19:
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 20))
                running_loss = 0.0

    print('Finished Training')
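
The shapes `[96, 25]` and `[18, 25]` passed to `add_sample` come from laying out each convolution's weights as one row per 5 x 5 kernel. A small sketch of that bookkeeping, assuming the usual PyTorch weight layout of (out_channels, in_channels, kernel_height, kernel_width); the helper `as_image_shape` is hypothetical and not part of VisualDL or PyTorch.

```python
from math import prod

# weight shapes of the two convolution layers defined in Net
conv1_weight_shape = (6, 3, 5, 5)
conv2_weight_shape = (16, 6, 5, 5)


def as_image_shape(weight_shape):
    """Lay the kernels out as rows: one row per (out, in) pair, one column per kernel cell."""
    out_c, in_c, kh, kw = weight_shape
    return [out_c * in_c, kh * kw]


for name, shape in [("conv1", conv1_weight_shape), ("conv2", conv2_weight_shape)]:
    image_shape = as_image_shape(shape)
    # the 2-D layout must hold exactly as many values as the weight tensor
    assert prod(image_shape) == prod(shape)
    print(name, image_shape, prod(shape))
# conv1 [18, 25] 450
# conv2 [96, 25] 2400
```

The element counts 450 and 2400 match the `6*3*5*5` and `16*6*5*5` arguments used in the `view` calls.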

Finally, because PyTorch uses dynamic computation graphs, we run the model once on a dummy input so that a graph is produced and can be exported:

    import torch.onnx
    dummy_input = Variable(torch.randn(4, 3, 32, 32))
    torch.onnx.export(net, dummy_input, "pytorch_cifar10.onnx")

    print('Done')

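After training finishes, the visualization of each component looks as follows.

The scalar plot of the training loss:

[Figure 2: scalar plot of the training loss]

The weight images of the first and second convolutional layers after training:

[Figure 3: weight images of the first and second convolutional layers]

The histogram of the trained parameters:

[Figure 4: histogram of the trained parameters]

The model graph:

[Figure 5: model graph]

The complete generated graph image can be downloaded here.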