This article is a simple guide to the concepts behind a basic CNN and how to build one. By the end, you will be able to build a simple CNN with the PyTorch API and use it to classify clothing from the FashionMNIST dataset. It assumes prior knowledge of artificial neural networks.

CNN

The concept of the CNN, or Convolutional Neural Network, was popularized by Yann André LeCun, who is often called the father of convolutional nets. A CNN is loosely inspired by how the human visual system works. The core operations behind a CNN are matrix additions and multiplications, so there is nothing to worry about there. To understand how a CNN works, however, we first need to know how an image is stored in a computer.

The above example shows how an image is stored: as an array of pixel values. That covers only grayscale images. An RGB, or color, image is three such matrices stacked on one another, one per color channel.

The above multi-dimensional matrix represents a color image. In this article, however, we discuss only the classification of grayscale images.

CNN Architecture

The core function behind a CNN is the convolution operation. A small filter matrix slides across the image matrix; at each position the filter is multiplied element-wise with the image patch beneath it and the products are summed, giving one value of the output. The result picks out important features of the image.

The above diagram shows the convolution operation on an image matrix. The convolution matrix is filled by moving the filter matrix across the image matrix.

Another important component of a CNN is the max-pool layer. It shrinks each feature map by keeping only the largest value in each small window, so the strongest responses survive and the later layers have fewer values to process.

We apply the ReLU activation function to all of the convolutional layers. To map the convolutional layers to the output we need linear layers, called fully connected layers and abbreviated fc. For binary outputs the final fc layer's activation is often a sigmoid; as its plot shows, the output lies between 0 and 1 for all input values. For a multi-class problem like this one, softmax plays that role, and in PyTorch it is folded into the cross-entropy loss.
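The convolution, ReLU, max-pool and sigmoid operations described above can be sketched from scratch in a few lines. This is a minimal NumPy illustration, not the PyTorch implementation: the helper names, the toy 6x6 image and the 3x3 edge filter are assumptions made up for the example. It also checks the output-size rule (W - F)/S + 1 on a 28x28 input, the size of a FashionMNIST image.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the filter element-wise with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0)

def max_pool2d(x, size=2):
    """Keep the largest value in each non-overlapping size x size window.

    Trailing rows/columns that do not fill a whole window are dropped.
    """
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def sigmoid(x):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Toy 6x6 grayscale image (assumed): dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = conv2d(image, kernel)      # (6 - 3)/1 + 1 = 4, so shape (4, 4)
pooled = max_pool2d(relu(features))   # 2x2 pooling halves it to (2, 2)

print(features)
print(pooled)
print(sigmoid(0.0))

# Same rule on a 28x28 input: (28 - 3)/1 + 1 = 26, then pooling gives 13
big = max_pool2d(conv2d(np.zeros((28, 28)), kernel))
print(big.shape)
```

Note that `conv2d` here does not flip the filter, which matches what deep learning libraries such as PyTorch compute under the name convolution. The explicit loops are kept for readability; a real layer also handles batches, multiple channels and learned filter weights.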
So now you are aware of the layers we are going to use. This knowledge is enough to build a simple CNN, but one optional layer, called dropout, will help the network perform well. A dropout layer is placed between the fc layers and randomly zeroes activations with a set probability during training, which helps the CNN generalize better.

The CNN figure shows our architecture; at the end we will also add a dropout layer between the fc layers. Without wasting any more time, let's get into the code.

    import torch
    import torchvision

    # data loading and transforming
    from torchvision.datasets import FashionMNIST
    from torch.utils.data import DataLoader
    from torchvision import transforms

    # torchvision datasets return PIL images.
    # ToTensor converts them to Tensors with values in the range [0, 1] for input into a CNN

    ## Define a transform to read the data in as a tensor
    data_transform = transforms.ToTensor()

    # choose the training and test datasets
    train_data = FashionMNIST(root='./data', train=True,
                              download=True, transform=data_transform)

    test_data = FashionMNIST(root='./data', train=False,
                             download=True, transform=data_transform)

    # Print out some stats about the training and test data
    print('Train data, number of images: ', len(train_data))
    print('Test data, number of images: ', len(test_data))

    # prepare data loaders, set the batch_size
    batch_size = 20

    train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

    # specify the image classes
    classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

For visualizing the data:

    import numpy as np
    import matplotlib.pyplot as plt

    # in a Jupyter notebook, also run:
    # %matplotlib inline

    # obtain one batch of training images
    dataiter = iter(train_loader)
    images, labels = next(dataiter)
    images = images.numpy()

    # plot the images in the batch, along with the corresponding labels
    fig = plt.figure(figsize=(25, 4))
    for idx in np.arange(batch_size):
        # add_subplot needs integer grid sizes, hence the integer division
        ax = fig.add_subplot(2, batch_size // 2, idx + 1, xticks=[], yticks=[])
        ax.imshow(np.squeeze(images[idx]), cmap='gray')
        ax.set_title(classes[labels[idx]])

    # Defining the CNN
    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):

        def __init__(self):
            super(Net, self).__init__()

            # 1 input image channel (grayscale), 10 output channels/feature maps
            # 3x3 square convolution kernel
            ## output size = (W-F)/S +1 = (28-3)/1 +1 = 26
            # the output Tensor for one image will have the dimensions: (10, 26, 26)
            # after one pool layer, this becomes (10, 13, 13)
            self.conv1 = nn.Conv2d(1, 10, 3)

            # maxpool layer
            # pool with kernel_size=2, stride=2
            self.pool = nn.MaxPool2d(2, 2)

            # second conv layer: 10 inputs, 20 outputs, 3x3 conv
            ## output size = (W-F)/S +1 = (13-3)/1 +1 = 11
            # the output tensor will have dimensions: (20, 11, 11)
            # after another pool layer this becomes (20, 5, 5); 5.5 is rounded down
            self.conv2 = nn.Conv2d(10, 20, 3)

            # 20 outputs * the 5*5 filtered/pooled map size
            self.fc1 = nn.Linear(20 * 5 * 5, 50)

            # dropout with p=0.4
            self.fc1_drop = nn.Dropout(p=0.4)

            # finally, create 10 output channels (for the 10 classes)
            self.fc2 = nn.Linear(50, 10)

        # define the feedforward behavior
        def forward(self, x):
            # two conv/relu + pool layers
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))

            # prep for linear layer
            # this line of code is the equivalent of Flatten in Keras
            x = x.view(x.size(0), -1)

            # two linear layers with dropout in between
            x = F.relu(self.fc1(x))
            x = self.fc1_drop(x)
            x = self.fc2(x)

            # final output
            return x

    # instantiate and print your Net
    net = Net()
    print(net)

    import torch.optim as optim

    # using cross entropy, which combines softmax and NLL loss
    criterion = nn.CrossEntropyLoss()

    # stochastic gradient descent with a small learning rate and some momentum
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    # Training the CNN
    def train(n_epochs):

        loss_over_time = []  # to track the loss as the network trains

        for epoch in range(n_epochs):  # loop over the dataset multiple times

            running_loss = 0.0

            for batch_i, data in enumerate(train_loader):
                # get the input images and their corresponding labels
                inputs, labels = data

                # zero the parameter (weight) gradients
                optimizer.zero_grad()

                # forward pass to get outputs
                outputs = net(inputs)

                # calculate the loss
                loss = criterion(outputs, labels)

                # backward pass to calculate the parameter gradients
                loss.backward()

                # update the parameters
                optimizer.step()

                # print loss statistics
                # to convert loss into a scalar and add it to running_loss, we use .item()
                running_loss += loss.item()

                if batch_i % 1000 == 999:  # print every 1000 batches
                    avg_loss = running_loss / 1000
                    # record and print the avg loss over the 1000 batches
                    loss_over_time.append(avg_loss)
                    print('Epoch: {}, Batch: {}, Avg. Loss: {}'.format(epoch + 1, batch_i + 1, avg_loss))
                    running_loss = 0.0

        print('Finished Training')
        return loss_over_time

    # define the number of epochs to train for
    n_epochs = 30  # start small to see if your model works, initially

    # call train
    training_loss = train(n_epochs)

    # visualize the loss as the network trained
    plt.plot(training_loss)
    plt.xlabel('1000\'s of batches')
    plt.ylabel('loss')
    plt.ylim(0, 2.5)  # consistent scale
    plt.show()

    # switch to evaluation mode so that dropout is disabled for predictions
    net.eval()

    # obtain one batch of test images
    dataiter = iter(test_loader)
    images, labels = next(dataiter)

    # get predictions
    preds = np.squeeze(net(images).data.max(1, keepdim=True)[1].numpy())
    images = images.numpy()

    # plot the images in the batch, along with predicted and true labels
    fig = plt.figure(figsize=(25, 4))
    for idx in np.arange(batch_size):
        ax = fig.add_subplot(2, batch_size // 2, idx + 1, xticks=[], yticks=[])
        ax.imshow(np.squeeze(images[idx]), cmap='gray')
        ax.set_title("{} ({})".format(classes[preds[idx]], classes[labels[idx]]),
                     color=("green" if preds[idx] == labels[idx] else "red"))

Conclusion

This is how we build a simple CNN. If you want to get in touch, my LinkedIn profile is linked below. Feel free to connect.

Link: https://www.linkedin.com/in/srimanth-tenneti-662b7117b/