This is actually an assignment from Jeremy Howard's fast.ai course, lesson 5. I've already showcased how easy it is to build a Convolutional Neural Network from scratch using PyTorch. Today, let's delve even deeper and see if we can write our own nn.Linear module. Why waste your time writing your own PyTorch module when it's already been written by the devs over at Facebook?

Well, for one, you'll gain a deeper understanding of how all the pieces are put together. By comparing your code with the PyTorch code, you will get a sense of why and how these libraries are developed. Also, once you're done, you'll have more confidence in implementing and using these libraries, knowing how things work under the hood. There will be no myth to you. And last but not least, you'll be able to modify or tweak these modules should the situation require it. And this is the difference between a noob and a pro.

OK, enough of the motivation, let's get to it.

Simple MNIST one-layer NN as the backdrop

First of all, we need some 'backdrop' code to test whether and how well our module performs. Let's build a very simple one-layer neural network to solve the good-old MNIST dataset. The code snippet (running in a Jupyter Notebook) is below:

```python
# We'll use fast.ai to showcase how to build your own 'nn.Linear' module
%matplotlib inline
from fastai.basics import *
import sys

# create and download/prepare our MNIST dataset
path = Config().data_path()/'mnist'
path.mkdir(parents=True)
!wget http://deeplearning.net/data/mnist/mnist.pkl.gz -P {path}

# Get the images downloaded into data set
with gzip.open(path/'mnist.pkl.gz', 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')

# Have a look at the images and shape
plt.imshow(x_train[0].reshape((28, 28)), cmap="gray")
x_train.shape

# convert numpy into PyTorch tensor
x_train, y_train, x_valid, y_valid = map(torch.tensor, (x_train, y_train, x_valid, y_valid))
n, c = x_train.shape
x_train.shape, y_train.min(), y_train.max()

# prepare dataset and create fast.ai DataBunch for training
bs = 64
train_ds = TensorDataset(x_train, y_train)
valid_ds = TensorDataset(x_valid, y_valid)
data = DataBunch.create(train_ds, valid_ds, bs=bs)

# create a simple MNIST logistic model with only one Linear layer
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10, bias=True)

    def forward(self, xb): return self.lin(xb)

model = Mnist_Logistic()

lr = 2e-2
loss_func = nn.CrossEntropyLoss()

# define update function with weight decay
def update(x, y, lr):
    wd = 1e-5
    y_hat = model(x)
    # weight decay
    w2 = 0.
    for p in model.parameters(): w2 += (p**2).sum()
    # add to regular loss
    loss = loss_func(y_hat, y) + w2 * wd
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p.sub_(lr * p.grad)
            p.grad.zero_()
    return loss.item()

# iterate through one epoch and plot losses
losses = [update(x, y, lr) for x, y in data.train_dl]
plt.plot(losses);
```

This code is quite self-explanatory. We used the fast.ai library for this project. Download the MNIST pickle file and unzip it, transfer it into PyTorch tensors, then stuff them into a fast.ai DataBunch object for further training. Then we created a simple neural network with only one Linear layer. We also write our own update function instead of using the torch.optim optimizers, since we could be writing our own optimizers from scratch as the next step of our PyTorch learning journey. Finally, we iterate through the dataset and plot the losses to see whether and how well it works.
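As a quick aside (my own addition, not part of the original assignment), here is a minimal sketch of what the same training step would look like if we let torch.optim handle the update. It assumes the model, loss_func, lr and data objects defined above; the name update_with_optim is just for illustration, and SGD's built-in weight_decay is only roughly equivalent to the manual w2 * wd penalty.

```python
# For comparison only: the same update step using torch.optim instead of the
# manual parameter loop above.
opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=1e-5)

def update_with_optim(x, y):
    y_hat = model(x)
    loss = loss_func(y_hat, y)
    loss.backward()    # compute gradients
    opt.step()         # apply the SGD update (with weight decay folded in)
    opt.zero_grad()    # reset gradients for the next batch
    return loss.item()
```

The manual loop above is doing by hand exactly what opt.step() does here, which is why it's a useful stepping stone toward writing your own optimizer later.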
First Iteration: Just make it work

All PyTorch modules/layers are extended from torch.nn.Module.

```python
class myLinear(nn.Module):
```

Within the class, we'll need an __init__ dunder function to initialize our linear layer and a forward function to do the forward calculation. Let's look at the __init__ function first.

We'll use the PyTorch official documentation as a guideline to build our module. From the documentation, an nn.Linear module has three attributes: in_features, out_features and bias. So we'll get these three attributes in:

```python
def __init__(self, in_features, out_features, bias=True):
    super().__init__()
    self.in_features = in_features
    self.out_features = out_features
    self.bias = bias
```

The class also needs to hold weight and bias parameters so it can be trained. We initialize those as well:

```python
self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
self.bias = torch.nn.Parameter(torch.randn(out_features))
```

Here we used torch.nn.Parameter to set our weight and bias; otherwise, they won't train.

Also, note that we used torch.randn instead of what's described in the documentation to initialize the parameters. This is not the best way of doing weight initialization, but our purpose is to get it to work first; we'll tweak it in the next iteration.

OK, now that the __init__ part is done, let's move on to the forward function. This is actually the easy part:

```python
def forward(self, input):
    _, y = input.shape
    if y != self.in_features:
        sys.exit(f'Wrong Input Features. Please use tensor with {self.in_features} Input Features')
    output = input @ self.weight.t() + self.bias
    return output
```

We first get the shape of the input, figure out how many columns are in the input, then check whether the input size matches. Then we do the matrix multiplication (note the transpose here to align the weights) and return the result. We can test whether it works by giving it some data:

```python
my = myLinear(20, 10)
a = torch.randn(5, 20)
my(a)
```

We have a 5x20 input; it goes through our layer and we get a 5x10 tensor of results back.

OK, now go back to our neural network code, find the Mnist_Logistic class, and change self.lin = nn.Linear(784, 10, bias=True) to self.lin = myLinear(784, 10, bias=True). Run the code again. As you can see from the loss plot, it doesn't converge very well (the loss stays around 2.5 after one epoch). That's probably because of our poor initialization. Also, we didn't take care of the bias part yet. Let's fix that in the next iteration. The final code for iteration 1 looks like this:

```python
class myLinear(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.bias = bias
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.randn(out_features))

    def forward(self, input):
        x, y = input.shape
        if y != self.in_features:
            sys.exit(f'Wrong Input Features. Please use tensor with {self.in_features} Input Features')
        output = input @ self.weight.t() + self.bias
        return output
```

Second iteration: Proper weight initialization and bias handling

We've handled __init__ and forward, but remember we also have a bias attribute that, if set to False, means the layer should not learn an additive bias. We haven't implemented that yet. Also, we used torch.randn to initialize the weight and bias, which is not optimal. Let's fix both.
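Before we do, here is a quick illustrative check (my own addition, not from the original article; the names naive and ref are arbitrary) of why drawing weights from a standard normal is a poor choice: the output scale grows with the number of input features, which makes training unstable.

```python
# Illustrative check: compare the output scale of our randn-initialized layer
# (iteration 1 above) against PyTorch's properly initialized nn.Linear.
x = torch.randn(5, 784)

naive = myLinear(784, 10)   # weights and bias drawn from N(0, 1)
print(naive(x).std())       # roughly sqrt(784) ≈ 28 -- far too large

ref = nn.Linear(784, 10)    # Kaiming-style init, which we'll implement next
print(ref(x).std())         # stays well below 1
```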
The updated __init__ function looks like this:

```python
def __init__(self, in_features, out_features, bias=True):
    super().__init__()
    self.in_features = in_features
    self.out_features = out_features
    self.bias = bias
    self.weight = torch.nn.Parameter(torch.Tensor(out_features, in_features))
    if bias:
        self.bias = torch.nn.Parameter(torch.Tensor(out_features))
    else:
        self.register_parameter('bias', None)
    self.reset_parameters()
```

First of all, when we create the weight and bias parameters, we don't initialize them as in the last iteration. We just allocate a plain, uninitialized Tensor for each. The actual initialization is done in another function, reset_parameters (explained below).

For bias, we added a condition: if bias is True, do what we did in the last iteration, but if it is False, use register_parameter('bias', None) to give it no value. Now for the reset_parameters function, it looks like this:

```python
def reset_parameters(self):
    torch.nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        torch.nn.init.uniform_(self.bias, -bound, bound)
```

The above code is taken directly from the PyTorch source code. What PyTorch does for weight initialization is called kaiming_uniform_. It comes from the paper Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification — He, K. et al. (2015).

What it actually does is initialize the weight from a zero-centered distribution whose bound is scaled by the layer's fan-in, which keeps the activations at a sensible scale and avoids the vanishing/exploding gradients issue (though we only have one layer here, when writing the Linear class we should still keep multi-layer networks in mind).

Notice that for self.weight we pass a=math.sqrt(5) instead of something derived from math.sqrt(fan_in); this choice is explained in a GitHub issue of the PyTorch repo, for whoever might be interested.

Also, we can add an extra_repr string to the model:

```python
def extra_repr(self):
    return 'in_features={}, out_features={}, bias={}'.format(
        self.in_features, self.out_features, self.bias is not None
    )
```

The final model looks like this:

```python
class myLinear(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.bias = bias
        self.weight = torch.nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = torch.nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        torch.nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            torch.nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, input):
        x, y = input.shape
        if y != self.in_features:
            print(f'Wrong Input Features. Please use tensor with {self.in_features} Input Features')
            return 0
        output = input.matmul(self.weight.t())
        if self.bias is not None:
            output += self.bias
        ret = output
        return ret

    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )
```

Rerun the code and plot the losses again: this time it converges much faster, reaching roughly 0.5 loss within one epoch.
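As a final sanity check (again my own addition, not part of the original walkthrough; mine and ref are just illustrative names), we can confirm that our finished layer behaves exactly like PyTorch's nn.Linear when both hold the same parameters, and that extra_repr shows up when the module is printed:

```python
# Sanity check: myLinear should match nn.Linear given identical parameters.
mine = myLinear(784, 10)
ref = nn.Linear(784, 10)

# copy our freshly initialized parameters into the reference layer
with torch.no_grad():
    ref.weight.copy_(mine.weight)
    ref.bias.copy_(mine.bias)

x = torch.randn(5, 784)
print(torch.allclose(mine(x), ref(x)))  # True
print(mine)  # myLinear(in_features=784, out_features=10, bias=True)
```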
Conclusion

I hope this helps clear the clouds around these PyTorch nn.modules a bit. It might seem boring and redundant, but sometimes the fastest (and shortest) way is the 'boring' way. Once you get to the very bottom of this, the feeling of knowing that there's nothing 'more' is priceless. You'll come to the realization that:

Underneath PyTorch, there's no trick, no myth, no catch, just rock-solid Python code.

Also, by writing your own code and then comparing it with the official source code, you'll be able to see where the differences are and learn from the best in the industry. How cool is that?

Found this article useful? Follow me on Twitter @lymenlee or my blog site wayofnumbers.com.

Previously published at https://towardsdatascience.com/how-to-build-your-own-pytorch-neural-network-layer-from-scratch-842144d623f6