PyTorch has more or less become one of the de facto standards for creating neural networks, and I love its interface. Yet, it is somewhat difficult for beginners to get a hold of.

I remember picking PyTorch up only after some extensive experimentation a couple of years back. To tell you the truth, it took me a lot of time to pick it up, but am I glad that I moved from Keras to PyTorch. With its high customizability and pythonic syntax, PyTorch is just a joy to work with, and I would recommend it to anyone who wants to do some heavy lifting with deep learning.

So, in this PyTorch guide, I will try to ease some of the pain with PyTorch for starters and go through some of the most important classes and modules that you will require while creating any neural network with PyTorch.

But that is not to say that this is aimed at beginners only, as I will also talk about the high customizability PyTorch provides and will cover custom Layers, Datasets, Dataloaders, and Loss functions.

So let's get some coffee ☕ and start it up.

Tensors

Tensors are the basic building blocks in PyTorch and, put very simply, they are like NumPy arrays that can also live on a GPU. In this part, I will list some of the most used operations we can use while working with Tensors. This is by no means an exhaustive list of operations you can do with Tensors, but it is helpful to understand what tensors are before going towards the more exciting parts.

1. Create a Tensor

We can create a PyTorch tensor in multiple ways. This includes converting to a tensor from a NumPy array. Below is just a small gist with some examples to start with, but you can do a whole lot of more things with tensors, just like you can do with NumPy arrays.

    import numpy as np
    import torch

    # Using torch.Tensor
    t = torch.Tensor([[1, 2, 3], [3, 4, 5]])
    print(f"Created Tensor Using torch.Tensor:\n{t}")

    # Using torch.randn
    t = torch.randn(3, 5)
    print(f"Created Tensor Using torch.randn:\n{t}")

    # using torch.[ones|zeros](*size)
    t = torch.ones(3, 5)
    print(f"Created Tensor Using torch.ones:\n{t}")
    t = torch.zeros(3, 5)
    print(f"Created Tensor Using torch.zeros:\n{t}")

    # using torch.randint - a tensor of size 4,5 with entries between 0 and 10(excluded)
    t = torch.randint(low=0, high=10, size=(4, 5))
    print(f"Created Tensor Using torch.randint:\n{t}")

    # Using from_numpy to convert from Numpy Array to Tensor
    a = np.array([[1, 2, 3], [3, 4, 5]])
    t = torch.from_numpy(a)
    print(f"Convert to Tensor From Numpy Array:\n{t}")

    # Using .numpy() to convert from Tensor to Numpy array
    t = t.numpy()
    print(f"Convert to Numpy Array From Tensor:\n{t}")

2. Tensor Operations

Again, there are a lot of operations you can do on these tensors. The full list of functions can be found in the PyTorch documentation.

    A = torch.randn(3, 4)
    W = torch.randn(4, 2)

    # Multiply Matrix A and W
    t = A.mm(W)
    print(f"Created Tensor t by Multiplying A and W:\n{t}")

    # Transpose Tensor t
    t = t.t()
    print(f"Transpose of Tensor t:\n{t}")

    # Square each element of t
    t = t**2
    print(f"Square each element of Tensor t:\n{t}")

    # return the size of a tensor
    print(f"Size of Tensor t using .size():\n{t.size()}")

Note: What are PyTorch Variables? In previous versions of PyTorch, Tensors and Variables used to be different and provided different functionality, but now the Variable API is deprecated, and all methods for Variables work with Tensors. So, if you don't know about them, it's fine as they're not needed, and if you know them, you can forget about them.

The nn.Module

Here comes the fun part, as we are now going to talk about some of the most used constructs in PyTorch while creating deep learning projects.
nn.Module lets you create your deep learning models as a class. You can inherit from nn.Module to define any model as a class. Every model class necessarily contains an __init__ block and a block for the forward pass.

In the __init__ part, the user defines all the layers the network is going to have but doesn't yet define how those layers are connected to each other.

In the forward pass block, the user defines how data flows from one layer to another inside the network.

So, put simply, any network we define will look like:

    import torch
    from torch import nn

    class myNeuralNet(nn.Module):
        def __init__(self):
            super().__init__()
            # Define all Layers Here
            self.lin1 = nn.Linear(784, 30)
            self.lin2 = nn.Linear(30, 10)

        def forward(self, x):
            # Connect the layer Outputs here to define the forward pass
            x = self.lin1(x)
            x = self.lin2(x)
            return x

Here we have defined a very simple network that takes an input of size 784 and passes it through two linear layers in a sequential manner. But the thing to note is that we can define any sort of calculation while defining the forward pass, and that makes PyTorch highly customizable for research purposes. For example, in our crazy experimentation mode, we might have used the network below, where we attach our layers arbitrarily. Here we add the input to the output of the second linear layer (a skip connection) and send the result back through the first linear layer again (I honestly don't know what that will do).

    class myCrazyNeuralNet(nn.Module):
        def __init__(self):
            super().__init__()
            # Define all Layers Here
            self.lin1 = nn.Linear(784, 30)
            self.lin2 = nn.Linear(30, 784)
            self.lin3 = nn.Linear(30, 10)

        def forward(self, x):
            # Connect the layer Outputs here to define the forward pass
            x_lin1 = self.lin1(x)
            x_lin2 = x + self.lin2(x_lin1)
            x_lin2 = self.lin1(x_lin2)
            x = self.lin3(x_lin2)
            return x

We can also check if the neural network forward pass works.
I usually do that by first creating some random input and just passing that through the network I have created.

    x = torch.randn((100, 784))
    model = myCrazyNeuralNet()
    model(x).size()
    --------------------------
    torch.Size([100, 10])

A word about Layers

PyTorch is pretty powerful, and you can actually create any new experimental layer by yourself using nn.Module. For example, rather than using the predefined linear layer nn.Linear from PyTorch above, we could have created our own custom linear layer.

    class myCustomLinearLayer(nn.Module):
        def __init__(self, in_size, out_size):
            super().__init__()
            self.weights = nn.Parameter(torch.randn(in_size, out_size))
            self.bias = nn.Parameter(torch.zeros(out_size))

        def forward(self, x):
            return x.mm(self.weights) + self.bias

You can see how we wrap our weights tensor in nn.Parameter. This is done so that the tensor is treated as a model parameter. From the PyTorch docs:

    Parameters are Tensor subclasses, that have a very special property when used with Modules - when they're assigned as Module attributes they are automatically added to the list of its parameters, and will appear in parameters() iterator

As you will later see, the model.parameters() iterator will be an input to the optimizer. But more on that later.

Right now, we can use this custom layer in any PyTorch network, just like any other layer.

    class myCustomNeuralNet(nn.Module):
        def __init__(self):
            super().__init__()
            # Define all Layers Here
            self.lin1 = myCustomLinearLayer(784, 10)

        def forward(self, x):
            # Connect the layer Outputs here to define the forward pass
            x = self.lin1(x)
            return x

    x = torch.randn((100, 784))
    model = myCustomNeuralNet()
    model(x).size()
    ------------------------------------------
    torch.Size([100, 10])

But then again, PyTorch would not be so widely used if it didn't provide a lot of ready-made layers that are used very frequently in a wide variety of neural network architectures.
Some examples are: nn.Linear, nn.Conv2d, nn.MaxPool2d, nn.ReLU, nn.BatchNorm2d, nn.Dropout, nn.Embedding, nn.GRU/nn.LSTM, nn.Softmax, nn.LogSoftmax, nn.MultiheadAttention, nn.TransformerEncoder, nn.TransformerDecoder

Each of these layers is described in the PyTorch documentation, where you can read all about them, but to show how I usually try to understand a layer and read the docs, I will look at a very simple convolutional layer here.

So, a Conv2d layer needs as input an image of height H and width W, with Cin channels. Now, for the first layer in a convnet, the number of in_channels would be 3 (RGB), and the number of out_channels can be defined by the user. The kernel_size mostly used is 3x3, and the stride normally used is 1.

To check a new layer which I don't know much about, I usually try to see the input as well as the output for the layer, like below, where I first initialize the layer:

    conv_layer = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(3, 3), stride=1, padding=1)

And then pass some random input through it. Here 100 is the batch size.

    x = torch.randn((100, 3, 24, 24))
    conv_layer(x).size()
    --------------------------------
    torch.Size([100, 64, 24, 24])

So, we get the output from the convolution operation as required, and I have sufficient information on how to use this layer in any neural network I design.

Datasets and DataLoaders

How would we pass data to our neural nets while training or while testing? We can definitely pass tensors as we have done above, but PyTorch also provides us with pre-built Datasets to make it easier for us to pass data to our neural nets. You can check out the complete list of datasets provided at torchvision.datasets and torchtext.datasets. But, to give a concrete example for datasets, let's say we had to pass images to an image neural net using a folder which has images in this structure:

    data
        train
            sailboat
            kayak
            .
            .
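As a rough illustration of how a layout like the one above becomes (path, label) pairs, the sketch below derives a class index from each file's parent folder. The file names and the helper name label_samples are hypothetical. Sorting the class names mirrors how torchvision's ImageFolder assigns indices, but this is a simplified stand-in, not its actual implementation.

```python
from pathlib import PurePosixPath


def label_samples(paths):
    """Map each image path to (path, class_index) using its parent folder name."""
    # Sorted class names give a stable folder-name -> index mapping
    class_names = sorted({PurePosixPath(p).parent.name for p in paths})
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    samples = [(p, class_to_idx[PurePosixPath(p).parent.name]) for p in paths]
    return samples, class_to_idx


# Hypothetical file names laid out like the folder structure above
paths = [
    "data/train/sailboat/img_001.jpg",
    "data/train/kayak/img_002.jpg",
    "data/train/sailboat/img_003.jpg",
]
samples, class_to_idx = label_samples(paths)
print(class_to_idx)  # {'kayak': 0, 'sailboat': 1}
print(samples[0])    # ('data/train/sailboat/img_001.jpg', 1)
```

The important design point is that the mapping from folder name to integer label is deterministic, so the same class always gets the same index from one run to the next.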
We can use the torchvision.datasets.ImageFolder dataset to get an example image like below:

    from torchvision import transforms
    from torchvision.datasets import ImageFolder

    traindir = "data/train/"
    t = transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor()])
    train_dataset = ImageFolder(root=traindir, transform=t)
    print("Num Images in Dataset:", len(train_dataset))
    print("Example Image and Label:", train_dataset[2])

This dataset has 847 images, and we can get an image and its label using an index. Now we can pass images one by one to any image neural network using a for loop:

    for i in range(0, len(train_dataset)):
        image, label = train_dataset[i]
        pred = model(image)

But that is not optimal. We want to do batching. We could write some more code to append images and labels in a batch and then pass it to the neural network. But PyTorch provides us with a utility iterator, torch.utils.data.DataLoader, to do precisely that. Now we can simply wrap our train_dataset in the DataLoader, and we will get batches instead of individual examples.

    from torch.utils.data import DataLoader

    train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=10)

We can simply iterate with batches using:

    for image_batch, label_batch in train_dataloader:
        print(image_batch.size(), label_batch.size())
        break
    ------------------------------------------------------------------
    torch.Size([64, 3, 224, 224]) torch.Size([64])

So actually, the whole process of using Datasets and DataLoaders becomes:

    t = transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor()])
    train_dataset = ImageFolder(root=traindir, transform=t)
    train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=10)
    for image_batch, label_batch in train_dataloader:
        pred = myImageNeuralNet(image_batch)

You can look at this particular example in action in my previous blog post on image classification using deep learning.

This is great, and PyTorch does provide a lot of functionality out of the box. But the main power of PyTorch comes with its immense customization. We can also create our own custom datasets if the datasets provided by PyTorch don't fit our use case.

Understanding Custom Datasets

To write our custom datasets, we can make use of the abstract class torch.utils.data.Dataset provided by PyTorch. We need to inherit from this Dataset class and define two methods to create a custom Dataset.

- __len__: a function that returns the size of the dataset. This one is pretty simple to write in most cases.
- __getitem__: a function that takes as input an index i and returns the sample at index i.

For example, we can create a simple custom dataset that returns an image and a label from a folder. See that most of the work happens in the __init__ part, where we use glob.glob to get the image names and do some general preprocessing.

    from glob import glob
    from PIL import Image
    from torch.utils.data import Dataset

    class customImageFolderDataset(Dataset):
        """Custom Image Loader dataset."""
        def __init__(self, root, transform=None):
            """
            Args:
                root (string): Path to the images organized in a particular folder structure.
                transform: Any Pytorch transform to be applied
            """
            # Get all image paths from a directory
            self.image_paths = glob(f"{root}/*/*")
            # Get the labels from the image paths
            self.labels = [x.split("/")[-2] for x in self.image_paths]
            # Create a dictionary mapping each label to an index from 0 to len(classes).
            # Sorting keeps the mapping identical from one run to the next.
            self.label_to_idx = {x: i for i, x in enumerate(sorted(set(self.labels)))}
            self.transform = transform

        def __len__(self):
            # return length of dataset
            return len(self.image_paths)

        def __getitem__(self, idx):
            # open and send one image and label
            img_name = self.image_paths[idx]
            label = self.labels[idx]
            image = Image.open(img_name)
            if self.transform:
                image = self.transform(image)
            return image, self.label_to_idx[label]

Also, note that we open our images one at a time in the __getitem__ method and not while initializing. This is not done in __init__ because we don't want to load all our images into memory; we just need to load the required ones.

We can now use this dataset with the DataLoader utility just like before. It works just like the previous dataset provided by PyTorch but without some utility functions.

    t = transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor()])
    train_dataset = customImageFolderDataset(root=traindir, transform=t)
    train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=10)
    for image_batch, label_batch in train_dataloader:
        pred = myImageNeuralNet(image_batch)

Understanding Custom DataLoaders

I am adding this section mainly for completeness.
This particular section is a little advanced and can be skipped while going through this post, as it will not be needed in a lot of situations.

So let's say you are looking to provide batches to a network that processes text input, and the network can take sequences of any length as long as the length remains constant within a batch. For example, we can have a BiLSTM network that can process sequences of any length. It's alright if you don't understand the layers used in it right now; just know that it can process sequences with variable sizes.

    class BiLSTM(nn.Module):
        def __init__(self):
            super().__init__()
            self.hidden_size = 64
            drp = 0.1
            max_features, embed_size = 10000, 300
            self.embedding = nn.Embedding(max_features, embed_size)
            self.lstm = nn.LSTM(embed_size, self.hidden_size, bidirectional=True, batch_first=True)
            self.linear = nn.Linear(self.hidden_size*4, 64)
            self.relu = nn.ReLU()
            self.dropout = nn.Dropout(drp)
            self.out = nn.Linear(64, 1)

        def forward(self, x):
            h_embedding = self.embedding(x)
            # Leaves the shape unchanged as long as no dimension has size 1
            h_embedding = torch.squeeze(torch.unsqueeze(h_embedding, 0))
            h_lstm, _ = self.lstm(h_embedding)
            # Pool over the sequence dimension
            avg_pool = torch.mean(h_lstm, 1)
            max_pool, _ = torch.max(h_lstm, 1)
            conc = torch.cat((avg_pool, max_pool), 1)
            conc = self.relu(self.linear(conc))
            conc = self.dropout(conc)
            out = self.out(conc)
            return out

This network expects its input to be of shape (batch_size, seq_length) and works with any seq_length. We can check this by passing our model two random batches with different sequence lengths (10 and 25).

    model = BiLSTM()
    input_batch_1 = torch.randint(low=0, high=10000, size=(100, 10))
    input_batch_2 = torch.randint(low=0, high=10000, size=(100, 25))
    print(model(input_batch_1).size())
    print(model(input_batch_2).size())
    ------------------------------------------------------------------
    torch.Size([100, 1])
    torch.Size([100, 1])

Now, we want to provide tight batches to this model, such that each batch is padded only up to the maximum sequence length within that batch, which minimizes padding. This has the added benefit of making the neural net run faster. It was, in fact, one of the methods used in the winning submission of the Quora Insincere challenge on Kaggle, where running time was of utmost importance.

So, how do we do this? Let's write a very simple custom dataset class first.

    class CustomTextDataset(Dataset):
        '''
        Simple Dataset initializes with X and y vectors
        We start by sorting our X and y vectors by sequence lengths
        '''
        def __init__(self, X, y=None):
            self.data = list(zip(X, y))
            # Sort by length of first element in tuple
            self.data = sorted(self.data, key=lambda x: len(x[0]))

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            return self.data[idx]

Also, let's generate some random data which we will use with this custom Dataset.

    import numpy as np

    train_data_size = 1024
    sizes = np.random.randint(low=50, high=300, size=(train_data_size,))
    X = [np.random.randint(0, 10000, (sizes[i])) for i in range(train_data_size)]
    y = np.random.rand(train_data_size).round()
    # checking one example in dataset
    print((X[0], y[0]))

The printed example is one random sequence and its label. Each integer in the sequence corresponds to a word in the sentence.

We can use the custom dataset now using:

    train_dataset = CustomTextDataset(X, y)

If we now try to use the DataLoader on this dataset with batch_size > 1, we will get an error. Why is that?

    train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=False, num_workers=10)
    for xb, yb in train_dataloader:
        print(xb.size(), yb.size())

This happens because the sequences have different lengths, and our data loader expects sequences of the same length so that it can stack them into one tensor. Remember that in the previous image example, we resized all images to size 224 using the transforms, so we didn't face this error.

So, how do we iterate through this dataset so that each batch has sequences with the same length, but different batches may have different sequence lengths?

We can use the collate_fn parameter in the DataLoader, which lets us define how to stack sequences in a particular batch. To use this, we need to define a function that takes a batch as input and returns (x_batch, y_batch), with the sequences padded to the max_sequence_length in the batch. The functions I have used below are simple NumPy operations. Also, the function is commented so you can understand what is happening.

    def collate_text(batch):
        # get text sequences in batch
        data = [item[0] for item in batch]
        # get labels in batch
        target = [item[1] for item in batch]
        # get max_seq_length in batch
        max_seq_len = max([len(x) for x in data])
        # pad text sequences based on max_seq_len
        data = [np.pad(p, (0, max_seq_len - len(p)), 'constant') for p in data]
        # convert data and target to tensor
        data = torch.LongTensor(data)
        target = torch.LongTensor(target)
        return [data, target]

We can now use this collate_fn with our DataLoader as:

    train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=False, num_workers=10, collate_fn=collate_text)
    for xb, yb in train_dataloader:
        print(xb.size(), yb.size())

It will work this time as we have provided a custom collate_fn. And see that the batches have different sequence lengths now. Thus we would be able to train our BiLSTM using variable input sizes just like we wanted.
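The padding step inside a collate function can also be sketched without any libraries. In the minimal example below, pad_batch and batches are hypothetical helper names, and padding with 0 assumes that index 0 is reserved for padding. It pads every sequence to the longest one in its own batch and shows why sorting by length first keeps the padding small.

```python
def pad_batch(sequences, pad_value=0):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Toy "sentences" of word indices with different lengths
data = [[5, 2], [7, 1, 4, 9, 3], [8], [6, 6, 6, 6], [3, 1, 2]]

# Sorting by length first groups similar lengths into the same batch
data_sorted = sorted(data, key=len)
padded_batches = [pad_batch(b) for b in batches(data_sorted, batch_size=2)]
for b in padded_batches:
    print(len(b), "x", len(b[0]))
```

Each batch ends up rectangular, so it can be stacked into a tensor, while the width differs from batch to batch. Without the sort, the short sequence [8] could share a batch with the length-5 sequence and would need four padding values instead of one.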

Training a Neural Network

We know how to create a neural network using nn.Module. But how do we train it? Any neural network that has to be trained will have a training loop that looks something like the one below:

    num_epochs = 5
    for epoch in range(num_epochs):
        # Set model to train mode
        model.train()
        for x_batch, y_batch in train_dataloader:
            # Clear gradients
            optimizer.zero_grad()
            # Forward pass - Predicted outputs
            pred = model(x_batch)
            # Find Loss and backpropagation of gradients
            loss = loss_criterion(pred, y_batch)
            loss.backward()
            # Update the parameters
            optimizer.step()
        # Set model to eval mode and skip gradient tracking for validation
        model.eval()
        with torch.no_grad():
            for x_batch, y_batch in valid_dataloader:
                pred = model(x_batch)
                val_loss = loss_criterion(pred, y_batch)

In the above code, we are running five epochs, and in each epoch:

- We iterate through the dataset using a data loader.
- In each iteration, we do a forward pass using model(x_batch).
- We calculate the loss using a loss_criterion.
- We back-propagate that loss using the loss.backward() call. We don't have to worry about the calculation of the gradients at all, as this simple call does it all for us.
- We take an optimizer step to change the weights in the whole network using optimizer.step(). This is where the weights of the network get modified using the gradients calculated in the loss.backward() call.
- We go through the validation data loader to check the validation score/metrics. Before doing validation, we set the model to eval mode using model.eval(), which switches layers such as dropout and batch norm to their inference behaviour. Please note we don't back-propagate losses during validation, so the loop runs under torch.no_grad() to skip gradient tracking.

Till now, we have talked about how to use nn.Module to create networks and how to use custom Datasets and DataLoaders with PyTorch. So let's talk about the various options available for loss functions and optimizers.

Loss functions

PyTorch provides us with a variety of loss functions for our most common tasks, like classification and regression. Some of the most used examples are nn.CrossEntropyLoss, nn.NLLLoss, nn.KLDivLoss and nn.MSELoss. You can read the documentation of each loss function, but to explain how to use these loss functions, I will go through the example of nn.NLLLoss.

The documentation for NLLLoss is pretty succinct. As in, this loss function is used for multiclass classification, and based on the documentation:

- The input expected needs to be of size (batch_size x Num_Classes). These are the predictions from the neural network we have created.
- We need to have the log-probabilities of each class in the input. To get log-probabilities from a neural network, we can add a LogSoftmax layer as the last layer of our network.
- The target needs to be a tensor of classes with class numbers in the range 0 to C-1, where C is the number of classes.

So, we can try to use this loss function for a simple classification network. Please note the LogSoftmax layer after the final linear layer. If you don't want to use this LogSoftmax layer, you could just use nn.CrossEntropyLoss instead, which takes the raw outputs of the linear layer.

    class myClassificationNet(nn.Module):
        def __init__(self):
            super().__init__()
            # Define all Layers Here
            self.lin = nn.Linear(784, 10)
            self.logsoftmax = nn.LogSoftmax(dim=1)

        def forward(self, x):
            # Connect the layer Outputs here to define the forward pass
            x = self.lin(x)
            x = self.logsoftmax(x)
            return x

Let's define a random input to pass to our network to test it:

    # some random input:
    X = torch.randn(100, 784)
    y = torch.randint(low=0, high=10, size=(100,))

And pass it through the model to get predictions:

    model = myClassificationNet()
    preds = model(X)

We can now get the loss as:

    criterion = nn.NLLLoss()
    loss = criterion(preds, y)
    loss
    ------------------------------------------
    tensor(2.4852, grad_fn=<NllLossBackward>)

Custom Loss Function

Defining your custom loss functions is again a piece of cake, and you should be okay as long as you use tensor operations in your loss function.
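To see the arithmetic that such losses carry out, here is a plain-Python sketch (deliberately without PyTorch, so it tracks no gradients) of mean squared error and negative log-likelihood. The function names are hypothetical and the inputs are toy values chosen so the results can be checked by hand.

```python
import math


def mse_loss(outputs, targets):
    """Mean of squared differences between predictions and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


def log_softmax(scores):
    """Log-probabilities for one row of raw class scores."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll_loss(log_probs, targets):
    """Mean negative log-probability of the correct class for each row."""
    return -sum(row[t] for row, t in zip(log_probs, targets)) / len(targets)


print(mse_loss([1.0, 2.0, 4.0], [1.0, 1.0, 2.0]))  # (0 + 1 + 4) / 3

# Two samples, three classes, all scores equal: every class has probability 1/3
log_probs = [log_softmax([0.0, 0.0, 0.0]), log_softmax([0.0, 0.0, 0.0])]
print(nll_loss(log_probs, [0, 2]))  # approximately log(3)
```

In PyTorch, the same computations are written with tensor operations so that loss.backward() can propagate gradients through them.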
For example, here is the customMseLoss:

    def customMseLoss(output, target):
        loss = torch.mean((output - target)**2)
        return loss

You can use this custom loss just like before. But note that we don't instantiate the loss as a criterion object this time, as we have defined it as a function.

    output = model(x)
    loss = customMseLoss(output, target)
    loss.backward()

If we wanted, we could have also written it as a class using nn.Module, and then we would have been able to use it as an object. Here is a custom NLLLoss example:

    class CustomNLLLoss(nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, x, y):
            # x should be output from LogSoftmax Layer
            log_prob = -1.0 * x
            # Get log_prob based on y class_index as loss=-mean(ylogp)
            loss = log_prob.gather(1, y.unsqueeze(1))
            loss = loss.mean()
            return loss

    criterion = CustomNLLLoss()
    loss = criterion(preds, y)

Optimizers

Once we get gradients using the loss.backward() call, we need to take an optimizer step to change the weights in the whole network. PyTorch provides a variety of ready-to-use optimizers in the torch.optim module. For example: torch.optim.Adadelta, torch.optim.Adagrad, torch.optim.RMSprop and the most widely used torch.optim.Adam.

To use the Adam optimizer from PyTorch, we can simply instantiate it with:

    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, betas=(0.9, 0.999))

And then use optimizer.zero_grad() and optimizer.step() while training the model.

I am not discussing how to write custom optimizers as it is an infrequent use case, but if you want to have more optimizers, do check out the pytorch-optimizer library, which provides a lot of other optimizers used in research papers. Also, if you do want to create your own optimizers, you can take inspiration from the source code of the optimizers implemented in PyTorch or pytorch-optimizers.

[Figure: other optimizers from the pytorch-optimizer library]

Using GPU/Multiple GPUs

Till now, whatever we have done is on the CPU. If you want to use a GPU, you can put your model on the GPU using model.to('cuda'). Or if you want to use multiple GPUs, you can use nn.DataParallel. Here is a utility snippet that checks the number of GPUs in the machine and sets up parallel training automatically using DataParallel if needed.

    # Whether to train on a gpu
    train_on_gpu = torch.cuda.is_available()
    print(f'Train on gpu: {train_on_gpu}')

    # Number of gpus
    multi_gpu = False
    if train_on_gpu:
        gpu_count = torch.cuda.device_count()
        print(f'{gpu_count} gpus detected.')
        if gpu_count > 1:
            multi_gpu = True

    if train_on_gpu:
        model = model.to('cuda')
    if multi_gpu:
        model = nn.DataParallel(model)

The only other thing that we need to change is to load our data to the GPU while training if we have GPUs. It's as simple as adding a few lines of code to our training loop.

    num_epochs = 5
    for epoch in range(num_epochs):
        model.train()
        for x_batch, y_batch in train_dataloader:
            if train_on_gpu:
                x_batch, y_batch = x_batch.cuda(), y_batch.cuda()
            optimizer.zero_grad()
            pred = model(x_batch)
            loss = loss_criterion(pred, y_batch)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            for x_batch, y_batch in valid_dataloader:
                if train_on_gpu:
                    x_batch, y_batch = x_batch.cuda(), y_batch.cuda()
                pred = model(x_batch)
                val_loss = loss_criterion(pred, y_batch)

Conclusion

PyTorch provides a lot of customizability with minimal code. While at first it might be hard to understand how the whole ecosystem is structured with classes, in the end, it is simple Python. In this post, I have tried to break down most of the parts you might need while using PyTorch, and I hope it makes a little more sense for you after reading this.

You can find the code for this post on my GitHub repo, where I keep the code for all my blogs.

If you want to learn more about PyTorch using a course-based structure, take a look at the Deep Neural Networks with PyTorch course by IBM on Coursera. Also, if you want to know more about deep learning, I would like to recommend this excellent course on Deep Learning in Computer Vision in the Advanced machine learning specialization.

Thanks for the read. I am going to be writing more beginner-friendly posts in the future too. Follow me on Medium or subscribe to my blog to be informed about them. As always, I welcome feedback and constructive criticism and can be reached on Twitter @mlwhiz.

Full disclosure: There are some affiliate links in this post to relevant resources, as sharing knowledge is never a bad idea.

Also published on: https://mlwhiz.com/blog/2020/09/09/pytorch_guide/