| Product Manager | Machine Learning Practitioner | UI/UX Designer/Preacher | Full-Stack Developer |
In 2019, the war for ML frameworks has two remaining main contenders: PyTorch and TensorFlow. My analysis suggests that researchers are abandoning TensorFlow and flocking to PyTorch in droves. Meanwhile in industry, Tensorflow is currently the platform of choice, but that may not be true for long. — The Gradient
```python
# import standard PyTorch modules
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.tensorboard import SummaryWriter # TensorBoard support

# import torchvision module to handle image manipulation
import torchvision
import torchvision.transforms as transforms

# calculate train time, writing train data to files etc.
import time
import pandas as pd
import json
from IPython.display import clear_output, display

torch.set_printoptions(linewidth=120)
torch.set_grad_enabled(True) # On by default, leave it here for clarity
```
`torch` is the main module that holds all the things you need for Tensor computation. You can build a fully functional neural network using Tensor computation alone, but this is not what this article is about. We’ll make use of the more powerful and convenient `torch.nn`, `torch.optim` and `torchvision` modules to quickly build our CNN. For those of you interested in knowing how to do this from ‘scratch scratch’, visit this fantastic official PyTorch tutorial by Jeremy Howard.
The `torch.nn` module provides many classes and functions to build neural networks. You can think of it as the fundamental building blocks of neural networks: models, all kinds of layers, activation functions, parameter classes, etc. It allows us to build the model like putting a LEGO set together.
`torch.optim` offers the common optimizers, such as SGD and Adam, so you don’t have to write them from scratch.
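To make this concrete, here is a minimal sketch of a single `optim.SGD` step. It uses a toy two-element parameter and a quadratic loss, not the article’s CNN. For plain SGD without momentum, the update is simply `w = w - lr * grad`, and the optimizer applies it to every parameter it was given:

```python
import torch
import torch.optim as optim

# Toy parameter and quadratic loss L(w) = sum(w^2), so dL/dw = 2w
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
optimizer.zero_grad()   # clear any old gradients
loss.backward()         # w.grad is now 2 * w = [2.0, -4.0]
optimizer.step()        # in-place update: w <- w - lr * w.grad

print(w.detach())       # w is now [0.8, -1.6]
```

Swapping in a different algorithm is a one-line change to the `optimizer = ...` line. The training loop itself stays the same.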
`torchvision` contains a lot of popular datasets, model architectures, and common image transformations for computer vision. We get our Fashion MNIST dataset from it and also use its transforms.
`SummaryWriter` enables PyTorch to generate reports for TensorBoard. We’ll use TensorBoard to look at our training data, compare results and gain intuition. TensorBoard used to be TensorFlow’s biggest advantage over PyTorch, but it has been officially supported by PyTorch since v1.2.
We also import helpers such as `time`, `json` and `pandas` to measure training time and to write the training data to files.
`torchvision` already has the Fashion MNIST dataset. If you’re not familiar with the Fashion MNIST dataset:

Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. (From GitHub)
```python
# Use standard FashionMNIST dataset
train_set = torchvision.datasets.FashionMNIST(
    root = './data/FashionMNIST',
    train = True,
    download = True,
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
)
```
We use `transforms.ToTensor` to turn the images into Tensors so we can use them directly with our network. The dataset is stored in a `Dataset` object named `train_set`.
```python
# Build the neural network, expand on top of nn.Module
class Network(nn.Module):
    def __init__(self):
        super().__init__()

        # define layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    # define forward function
    def forward(self, t):
        # conv 1
        t = self.conv1(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # conv 2
        t = self.conv2(t)
        t = F.relu(t)
        t = F.max_pool2d(t, kernel_size=2, stride=2)

        # fc1
        t = t.reshape(-1, 12*4*4)
        t = self.fc1(t)
        t = F.relu(t)

        # fc2
        t = self.fc2(t)
        t = F.relu(t)

        # output
        t = self.out(t)
        # no softmax here: F.cross_entropy applies log_softmax to these raw scores internally
        return t
```
We build our network on top of `nn.Module`. It packs all the basics: weights, biases, the forward method, and also some utility attributes and methods like `.parameters()` and `.zero_grad()`, which we will be using too.

The layers are defined in the `__init__` dunder function:
```python
def __init__(self):
    super().__init__()

    # define layers
    self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
    self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

    self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
    self.fc2 = nn.Linear(in_features=120, out_features=60)
    self.out = nn.Linear(in_features=60, out_features=10)
```
`nn.Conv2d` and `nn.Linear` are two standard PyTorch layers defined within the `torch.nn` module. They are quite self-explanatory. One thing to note is that we only define the actual layers here. The activation and max-pooling operations are included in the forward function, which is explained below.
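Every layer is assigned as an attribute inside `__init__`, so `nn.Module` registers its weights and biases automatically, and `.parameters()` / `.named_parameters()` can find them. Here is a minimal sketch that rebuilds only the layer definitions above (no `forward`) and counts the trainable parameters:

```python
import torch.nn as nn

# Same layer definitions as the Network class above; forward() omitted
class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

network = Network()

# Each layer contributes a weight and a bias, registered automatically by nn.Module
for name, param in network.named_parameters():
    print(f'{name:14} {tuple(param.shape)}')

total = sum(p.numel() for p in network.parameters())
print('total trainable parameters:', total)  # 32998
```

Most of the parameters live in `fc1` (192 x 120 weights plus 120 biases = 23,160). This split between convolutional and dense layers is typical of small CNNs.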
```python
# define forward function
def forward(self, t):
    # conv 1
    t = self.conv1(t)
    t = F.relu(t)
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    # conv 2
    t = self.conv2(t)
    t = F.relu(t)
    t = F.max_pool2d(t, kernel_size=2, stride=2)

    # fc1
    t = t.reshape(-1, 12*4*4)
    t = self.fc1(t)
    t = F.relu(t)

    # fc2
    t = self.fc2(t)
    t = F.relu(t)

    # output
    t = self.out(t)
    # no softmax here: F.cross_entropy applies log_softmax to these raw scores internally
    return t
```
In `fc1` (Fully Connected layer 1), we use PyTorch’s tensor operation `t.reshape` to flatten the tensor so it can be passed to the dense layer afterward. Also, we didn’t add a softmax activation function at the output layer, since PyTorch’s cross-entropy function takes care of that for us.
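The `12*4*4` in `fc1` comes from the shape arithmetic of the two conv/pool stages. A 5x5 convolution without padding shrinks each side by 4, and 2x2 max-pooling with stride 2 halves it, so the side length goes 28, 24, 12, 8, 4, with 12 channels at the end. The sketch below traces this with a dummy batch of random inputs and freshly initialized layers. It also checks that `F.cross_entropy` equals `F.nll_loss` applied to `F.log_softmax`, which is why the network can output raw scores:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

t = torch.randn(8, 1, 28, 28)                                # dummy batch of 8 grayscale 28x28 images
t = F.max_pool2d(F.relu(conv1(t)), kernel_size=2, stride=2)
print(t.shape)                                               # torch.Size([8, 6, 12, 12])
t = F.max_pool2d(F.relu(conv2(t)), kernel_size=2, stride=2)
print(t.shape)                                               # torch.Size([8, 12, 4, 4])
flat = t.reshape(-1, 12*4*4)
print(flat.shape)                                            # torch.Size([8, 192])

# cross_entropy = log_softmax followed by negative log-likelihood
logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
ce = F.cross_entropy(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, manual))                            # True
```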
```python
from collections import OrderedDict

# put all hyper params into an OrderedDict, easily expandable
params = OrderedDict(
    lr = [.01, .001],
    batch_size = [100, 1000],
    shuffle = [True, False]
)
epochs = 3
```
- `lr`: Learning rate. We want to try 0.01 and 0.001 for our models.
- `batch_size`: Batch size, to speed up the training process. We’ll use 100 and 1000.
- `shuffle`: Shuffle toggle, whether the training data is shuffled at every epoch.

We’ll build two helper classes, `RunBuilder` and `RunManager`, to manage our hyperparameters and training process.
The main purpose of the `RunBuilder` class is to offer a static method, `get_runs`. It takes the OrderedDict (with all hyperparameters stored in it) as a parameter and generates a list of `Run` named tuples, where each `run` represents one possible combination of the hyperparameters. This list is later consumed by the training loop. The code is easy to understand.
```python
# import modules to build RunBuilder and RunManager helper classes
from collections import OrderedDict
from collections import namedtuple
from itertools import product

# Read in the hyper-parameters and return a Run namedtuple containing all the
# combinations of hyper-parameters
class RunBuilder():
    @staticmethod
    def get_runs(params):

        Run = namedtuple('Run', params.keys())

        runs = []
        for v in product(*params.values()):
            runs.append(Run(*v))

        return runs
```
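For the `params` dictionary defined earlier, `get_runs` produces the Cartesian product of the value lists: 2 x 2 x 2 = 8 runs. Here is a quick self-contained check, with the class and dictionary copied from above:

```python
from collections import OrderedDict, namedtuple
from itertools import product

class RunBuilder():
    @staticmethod
    def get_runs(params):
        Run = namedtuple('Run', params.keys())
        runs = []
        for v in product(*params.values()):
            runs.append(Run(*v))
        return runs

params = OrderedDict(
    lr = [.01, .001],
    batch_size = [100, 1000],
    shuffle = [True, False]
)

runs = RunBuilder.get_runs(params)
print(len(runs))   # 8 combinations
print(runs[0])     # Run(lr=0.01, batch_size=100, shuffle=True)
print(runs[-1])    # Run(lr=0.001, batch_size=1000, shuffle=False)
```

Each `Run` is a named tuple, so its fields are accessed as `run.lr`, `run.batch_size` and `run.shuffle`. Its string form is what `begin_run` passes to `SummaryWriter(comment=f'-{run}')`, which appends it to the TensorBoard log directory name so every run is easy to tell apart.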
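Next comes the `RunManager` class. It tracks loss, accuracy, epoch time, run time and hyperparameters, records them to TensorBoard, and writes the results into `csv` and `json` for future reference or API extraction.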
```python
# Helper class, help track loss, accuracy, epoch time, run time,
# hyper-parameters etc. Also record to TensorBoard and write into csv, json
class RunManager():
    def __init__(self):

        # tracking every epoch count, loss, accuracy, time
        self.epoch_count = 0
        self.epoch_loss = 0
        self.epoch_num_correct = 0
        self.epoch_start_time = None

        # tracking every run count, run data, hyper-params used, time
        self.run_params = None
        self.run_count = 0
        self.run_data = []
        self.run_start_time = None

        # record model, loader and TensorBoard
        self.network = None
        self.loader = None
        self.tb = None

    # record the count, hyper-param, model, loader of each run
    # record sample images and network graph to TensorBoard
    def begin_run(self, run, network, loader):

        self.run_start_time = time.time()

        self.run_params = run
        self.run_count += 1

        self.network = network
        self.loader = loader
        self.tb = SummaryWriter(comment=f'-{run}')

        images, labels = next(iter(self.loader))
        grid = torchvision.utils.make_grid(images)

        self.tb.add_image('images', grid)
        self.tb.add_graph(self.network, images)

    # when run ends, close TensorBoard, zero epoch count
    def end_run(self):
        self.tb.close()
        self.epoch_count = 0

    # start timing the epoch, increment epoch count, zero loss and correct count
    def begin_epoch(self):
        self.epoch_start_time = time.time()

        self.epoch_count += 1
        self.epoch_loss = 0
        self.epoch_num_correct = 0

    # compute epoch stats, log them to TensorBoard and display progress
    def end_epoch(self):
        # calculate epoch duration and run duration (accumulated)
        epoch_duration = time.time() - self.epoch_start_time
        run_duration = time.time() - self.run_start_time

        # record epoch loss and accuracy
        loss = self.epoch_loss / len(self.loader.dataset)
        accuracy = self.epoch_num_correct / len(self.loader.dataset)

        # Record epoch loss and accuracy to TensorBoard
        self.tb.add_scalar('Loss', loss, self.epoch_count)
        self.tb.add_scalar('Accuracy', accuracy, self.epoch_count)

        # Record params to TensorBoard
        for name, param in self.network.named_parameters():
            self.tb.add_histogram(name, param, self.epoch_count)
            self.tb.add_histogram(f'{name}.grad', param.grad, self.epoch_count)

        # Write into 'results' (OrderedDict) for all run related data
        results = OrderedDict()
        results["run"] = self.run_count
        results["epoch"] = self.epoch_count
        results["loss"] = loss
        results["accuracy"] = accuracy
        results["epoch duration"] = epoch_duration
        results["run duration"] = run_duration

        # Record hyper-params into 'results'
        for k, v in self.run_params._asdict().items():
            results[k] = v
        self.run_data.append(results)
        df = pd.DataFrame.from_dict(self.run_data, orient = 'columns')

        # display epoch information and show progress
        clear_output(wait=True)
        display(df)

    # accumulate loss of batch into entire epoch loss
    def track_loss(self, loss):
        # loss is a batch mean; multiply by batch size so runs with different batch sizes are comparable
        self.epoch_loss += loss.item() * self.loader.batch_size

    # accumulate number of corrects of batch into entire epoch num_correct
    def track_num_correct(self, preds, labels):
        self.epoch_num_correct += self._get_num_correct(preds, labels)

    @torch.no_grad()
    def _get_num_correct(self, preds, labels):
        return preds.argmax(dim=1).eq(labels).sum().item()

    # save end results of all runs into csv, json for further analysis
    def save(self, fileName):

        pd.DataFrame.from_dict(
            self.run_data,
            orient = 'columns',
        ).to_csv(f'{fileName}.csv')

        with open(f'{fileName}.json', 'w', encoding='utf-8') as f:
            json.dump(self.run_data, f, ensure_ascii=False, indent=4)
```
In `begin_run`, we create a `SummaryWriter` object to store everything we want to export into TensorBoard during the run, and we write the network graph and sample images into that `SummaryWriter` object.

In `begin_epoch`, we record the epoch start time, increment the epoch count, and zero `epoch_loss` and `epoch_num_correct`.

In `end_epoch`, we create an OrderedDict named `results` and put all our run data (loss, accuracy, run count, epoch count, run duration, epoch duration, all hyperparameters) into it. Then we use Pandas to read it in and display it in a neat table format.

In `save`, we write all run data (a list of `results` OrderedDict objects for all runs) to `csv` and `json` format for further analysis or API access.
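The per-epoch numbers in `end_epoch` come from two small pieces of arithmetic. `track_loss` turns each batch’s mean loss back into a sum by multiplying by the batch size, and `_get_num_correct` counts how many `argmax` predictions match the labels. Here is a minimal sketch on two hand-made batches of logits.

One design note: `track_loss` multiplies by `self.loader.batch_size`, which would over-count if the last batch were smaller. Fashion-MNIST’s 60,000 training images divide evenly by both 100 and 1000, so this does not matter here. Still, this sketch multiplies by the actual `len(labels)`:

```python
import math
import torch
import torch.nn.functional as F

# Two hand-made batches of (logits, labels); the second batch is smaller than the first
batches = [
    (torch.tensor([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.3, 0.2, 0.9]]), torch.tensor([0, 1, 0])),
    (torch.tensor([[0.1, 0.2, 3.0]]), torch.tensor([2])),
]

epoch_loss = 0.0
epoch_num_correct = 0
num_samples = 0
for preds, labels in batches:
    loss = F.cross_entropy(preds, labels)                              # mean loss over this batch
    epoch_loss += loss.item() * len(labels)                            # mean -> per-batch sum
    epoch_num_correct += preds.argmax(dim=1).eq(labels).sum().item()   # same logic as _get_num_correct
    num_samples += len(labels)

epoch_mean_loss = epoch_loss / num_samples
accuracy = epoch_num_correct / num_samples
print(epoch_num_correct, accuracy)  # 3 0.75 (third sample of the first batch is misclassified)

# The weighted average equals the mean loss computed over all samples at once
full_loss = F.cross_entropy(torch.cat([p for p, _ in batches]),
                            torch.cat([y for _, y in batches])).item()
print(math.isclose(epoch_mean_loss, full_loss, rel_tol=1e-5))  # True
```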
That’s it for the `RunManager` class. Congrats on making it this far! The hardest part is already behind you. From now on, everything will start to come together and make sense.

With the `RunBuilder` and `RunManager` classes, the training process is a breeze:
```python
m = RunManager()

# get all runs from params using RunBuilder class
for run in RunBuilder.get_runs(params):

    # if params change, the following lines should reflect the changes too
    network = Network()
    loader = torch.utils.data.DataLoader(train_set, batch_size = run.batch_size, shuffle = run.shuffle)
    optimizer = optim.Adam(network.parameters(), lr=run.lr)

    m.begin_run(run, network, loader)
    for epoch in range(epochs):

        m.begin_epoch()
        for batch in loader:

            images = batch[0]
            labels = batch[1]
            preds = network(images)
            loss = F.cross_entropy(preds, labels)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            m.track_loss(loss)
            m.track_num_correct(preds, labels)

        m.end_epoch()
    m.end_run()

# when all runs are done, save results to files
m.save('results')
```
We use `RunBuilder` to create a list of hyperparameter combinations, then loop through each combination to carry out our training:

```python
for run in RunBuilder.get_runs(params):
```
Then we create a `network` object from the `Network` class defined above: `network = Network()`. This `network` object holds all the weights/biases we need to train.
Next, we create a `DataLoader` object. This PyTorch class wraps our training/validation/test dataset and iterates through it, giving us training data in batches of the specified `batch_size`.
```python
loader = torch.utils.data.DataLoader(train_set, batch_size = run.batch_size, shuffle = run.shuffle)
```
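Here is a minimal sketch of how `DataLoader` batches a dataset, using a 10-sample `TensorDataset` as a stand-in for `train_set`. With `batch_size=4`, the last batch holds the 2 leftover samples. `shuffle=True` is the natural place for the `shuffle` hyperparameter to plug in: it draws a new sample order each time the loader is iterated (that is, every epoch) without dropping or repeating samples:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset of 10 (input, label) pairs standing in for train_set
images = torch.arange(10, dtype=torch.float32).reshape(10, 1)
labels = torch.arange(10)
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=False)
batch_sizes = [len(batch[1]) for batch in loader]
print(batch_sizes)  # [4, 4, 2]: the last batch holds the remainder

# With shuffle=True the order changes per iteration, but every sample appears exactly once
shuffled = DataLoader(dataset, batch_size=4, shuffle=True)
seen = torch.cat([batch[1] for batch in shuffled])
print(sorted(seen.tolist()) == list(range(10)))  # True
```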
We then create an optimizer from the `torch.optim` module. The `optim` optimizer takes the network parameters and the learning rate as input, and during training it updates the weights using the gradients computed in each step. We’ll use Adam as our optimization algorithm here.
```python
optimizer = optim.Adam(network.parameters(), lr=run.lr)
```
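Adam adapts the step size per parameter, so its update is not simply `lr * grad`. Here is a minimal sketch on a toy quadratic loss (not the article’s model). With the default betas, Adam’s bias-corrected first step reduces to roughly `lr * sign(grad)`, so each weight moves by about `lr` regardless of how large its gradient is:

```python
import torch
import torch.optim as optim

# Toy parameter and quadratic loss L(w) = sum(w^2), gradient 2w = [2.0, -4.0]
w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = optim.Adam([w], lr=0.1)

loss = (w ** 2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# First Adam step: bias-corrected m / sqrt(v) is ~ grad / |grad|, so each weight moves ~lr
print(w.detach())  # approximately [0.9, -1.9]
```

Later steps diverge from this simple picture as Adam’s running averages accumulate history.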
Then we call the `begin_run` method of our `RunManager` object to start tracking the run’s training data.
```python
m.begin_run(run, network, loader)
for epoch in range(epochs):

    m.begin_epoch()
    for batch in loader:

        images = batch[0]
        labels = batch[1]
        preds = network(images)
        loss = F.cross_entropy(preds, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        m.track_loss(loss)
        m.track_num_correct(preds, labels)
```
For each batch, we pass the images to the `network` to do the forward propagation (remember the `forward` method above?) and get the predictions. With the predictions, we calculate the loss of this batch using the `cross_entropy` function. Once the loss is calculated, we reset the gradients with `.zero_grad()`, since otherwise PyTorch would accumulate gradients across batches, which is not what we want. We then do one backpropagation pass with the `loss.backward()` method to calculate all the gradients of the weights/biases. Next, we use the optimizer defined above to update the weights/biases. Finally, we accumulate this batch’s loss and number of correct predictions using the `track_loss` and `track_num_correct` methods of our `RunManager` class.
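To see why `optimizer.zero_grad()` matters, here is a minimal sketch with a single scalar weight. Every call to `backward()` adds into `.grad` rather than overwriting it, so skipping the reset would mix the gradients of different batches:

```python
import torch

w = torch.tensor([3.0], requires_grad=True)

(w ** 2).sum().backward()
first = w.grad.clone()          # d(w^2)/dw = 2 * 3 = 6

(w ** 2).sum().backward()       # no reset in between
accumulated = w.grad.clone()    # 6 + 6 = 12: the new gradient was ADDED

w.grad.zero_()                  # reset, as optimizer.zero_grad() does for each parameter it manages
(w ** 2).sum().backward()
after_reset = w.grad.clone()    # 6 again

print(first.item(), accumulated.item(), after_reset.item())  # 6.0 12.0 6.0
```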
When all runs are done, we save the results to files with `m.save('results')`.
We use `ngrok` to proxy and access the TensorBoard running on the Colab virtual machine. Install `ngrok` first:

```
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip
```
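Then start TensorBoard in the background, pointing it at `./runs`, the default directory `SummaryWriter` writes to:

```python
LOG_DIR = './runs'
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)
```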
Finally, start the `ngrok` proxy and print the public URL where TensorBoard can be reached:

```
get_ipython().system_raw('./ngrok http 6006 &')
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
```