date 2020-02-04 | Product Manager | Machine Learning Practitioner | UI/UX Designer/Preacher | Full-Stack Developer |
# We'll use fast.ai to showcase how to build your own 'nn.Linear' module
%matplotlib inline
from fastai.basics import *
import sys
# create and download/prepare our MNIST dataset
path = Config().data_path()/'mnist'
path.mkdir(parents=True, exist_ok=True)  # do not fail when the folder already exists
!wget http://deeplearning.net/data/mnist/mnist.pkl.gz -P {path}
# Load the downloaded images into training and validation sets
with gzip.open(path/'mnist.pkl.gz', 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')
# Have a look at the images and shape
plt.imshow(x_train[0].reshape((28,28)), cmap="gray")
x_train.shape
# convert numpy into PyTorch tensor
x_train,y_train,x_valid,y_valid = map(torch.tensor, (x_train,y_train,x_valid,y_valid))
n,c = x_train.shape
x_train.shape, y_train.min(), y_train.max()
# prepare dataset and create fast.ai DataBunch for training
bs=64
train_ds = TensorDataset(x_train, y_train)
valid_ds = TensorDataset(x_valid, y_valid)
data = DataBunch.create(train_ds, valid_ds, bs=bs)
# create a simple MNIST logistic model with only one Linear layer
class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10, bias=True)
    def forward(self, xb): return self.lin(xb)
model = Mnist_Logistic()
lr=2e-2
loss_func = nn.CrossEntropyLoss()
# define update function with weight decay
def update(x,y,lr):
    wd = 1e-5
    y_hat = model(x)
    # weight decay: sum of all squared parameters
    w2 = 0.
    for p in model.parameters(): w2 += (p**2).sum()
    # add to regular loss
    loss = loss_func(y_hat, y) + w2*wd
    loss.backward()
    # plain SGD step, done outside autograd tracking
    with torch.no_grad():
        for p in model.parameters():
            p.sub_(lr * p.grad)
            p.grad.zero_()
    return loss.item()
# iterate through one epoch and plot losses
losses = [update(x,y,lr) for x,y in data.train_dl]
plt.plot(losses);
The model has only one Linear layer. We also write our own update function instead of using the torch.optim optimizers, since writing our own optimizers from scratch could be the next step of our PyTorch learning journey. Finally, we iterate through the dataset and plot the losses to see whether and how well it works.
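The weight-decay term in update adds wd times the sum of squared parameters to the loss, so its gradient contributes 2*wd*p to every parameter. As a rough sanity check, the sketch below compares one such hand-written step with one step of torch.optim.SGD using weight_decay=2*wd. The tiny 4-to-3 layer, the random batch, and the names model_a and model_b are assumptions made for illustration; they are not part of the MNIST setup.

```python
import copy
import torch
from torch import nn

torch.manual_seed(0)

# Tiny stand-in model and batch (hypothetical sizes, chosen only for this check)
model_a = nn.Linear(4, 3)
model_b = copy.deepcopy(model_a)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
loss_func = nn.CrossEntropyLoss()
lr, wd = 2e-2, 1e-5

# (a) hand-written step: L2 penalty added to the loss, then a plain SGD update
w2 = sum((p ** 2).sum() for p in model_a.parameters())
loss = loss_func(model_a(x), y) + wd * w2
loss.backward()
with torch.no_grad():
    for p in model_a.parameters():
        p.sub_(lr * p.grad)
        p.grad.zero_()

# (b) torch.optim.SGD adds weight_decay * p to the gradient, hence the factor 2
opt = torch.optim.SGD(model_b.parameters(), lr=lr, weight_decay=2 * wd)
opt.zero_grad()
loss_func(model_b(x), y).backward()
opt.step()

same = all(
    torch.allclose(pa, pb, atol=1e-6)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

If the two updates agree, the hand-written loop is doing the same job as the built-in optimizer for this simple case, which makes it a reasonable starting point for writing an optimizer from scratch.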
Every PyTorch layer, including the one we are about to write, extends torch.nn.Module:
class myLinear(nn.Module):
Inside the class, we need an __init__ dunder function to initialize our linear layer and a forward function to do the forward calculation. Let's look at the __init__ function first.
The nn.Linear module has the following attributes:
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.bias = bias  # flag is not honored yet; overwritten by the Parameter below
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.randn(out_features))
We need to use torch.nn.Parameter to set our weight and bias; otherwise they are not registered as model parameters and won't train.
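A small sketch of why this matters: a plain tensor stored on a module is invisible to parameters(), so an update loop that iterates over model.parameters() never touches it. The two class names below are hypothetical and exist only for this comparison.

```python
import torch
from torch import nn

class WithPlainTensor(nn.Module):
    def __init__(self):
        super().__init__()
        # a plain tensor attribute is NOT registered as a parameter
        self.weight = torch.randn(3, 2)

class WithParameter(nn.Module):
    def __init__(self):
        super().__init__()
        # wrapping the tensor in nn.Parameter registers it with the module
        self.weight = nn.Parameter(torch.randn(3, 2))

n_plain = len(list(WithPlainTensor().parameters()))
n_param = len(list(WithParameter().parameters()))
print(n_plain, n_param)  # 0 1
```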
Also, we use torch.randn instead of what's described in the documentation to initialize the parameters. This is not the best way to initialize weights, but our purpose is to get it working first; we'll tweak it in the next iteration.
Now that the __init__ part is done, let's move on to the forward function. This is actually the easy part:
    def forward(self, input):
        _, y = input.shape
        if y != self.in_features:
            sys.exit(f'Wrong Input Features. Please use tensor with {self.in_features} Input Features')
        # (batch, in_features) @ (in_features, out_features) + (out_features,)
        output = input @ self.weight.t() + self.bias
        return output
my = myLinear(20,10)
a = torch.randn(5,20)
my(a)
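The forward step is a matrix product followed by a broadcast add. As a quick check under assumed shapes (a batch of 5, 20 input features, 10 output features, matching the test call above), the sketch below compares the hand-written expression with torch.nn.functional.linear, which computes the same thing.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Same layout as in the layer: weight is (out_features, in_features)
weight = torch.randn(10, 20)
bias = torch.randn(10)
a = torch.randn(5, 20)

manual = a @ weight.t() + bias        # (5, 20) @ (20, 10) + (10,) -> (5, 10)
reference = F.linear(a, weight, bias)  # PyTorch's functional form of the same operation

shapes_ok = tuple(manual.shape) == (5, 10)
values_ok = torch.allclose(manual, reference, atol=1e-5)
print(shapes_ok, values_ok)
```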
Now change self.lin = nn.Linear(784,10, bias=True) to self.lin = myLinear(784, 10, bias=True). Run the code, and you should see a plot of the training losses.
We have not handled the bias part yet. Let's fix that in the next iteration. The final code for iteration 1 looks like this:
class myLinear(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.bias = bias  # flag is not honored yet; overwritten by the Parameter below
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.randn(out_features))
    def forward(self, input):
        x, y = input.shape
        if y != self.in_features:
            sys.exit(f'Wrong Input Features. Please use tensor with {self.in_features} Input Features')
        output = input @ self.weight.t() + self.bias
        return output
We have written __init__ and forward, but remember we also have a bias attribute that, if False, means the layer will not learn an additive bias. We have not implemented that yet. Also, we used torch.randn to initialize the weight and bias, which is not optimal. Let's fix this. The updated __init__ function looks like this:
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # The bool flag is not stored on self.bias: register_parameter('bias', None)
        # raises KeyError when a plain attribute with that name already exists.
        self.weight = torch.nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = torch.nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()
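The bias=False branch relies on register_parameter('bias', None). As a minimal sketch of what that call does, the hypothetical NoBias module below registers a None bias: the attribute exists and can be tested with "is not None", but it is skipped by named_parameters() and by state_dict().

```python
import torch
from torch import nn

class NoBias(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(2, 3))
        # reserve the name 'bias' without creating a trainable tensor
        self.register_parameter('bias', None)

m = NoBias()
names = [name for name, _ in m.named_parameters()]
keys = list(m.state_dict().keys())
print(m.bias is None, names, keys)  # True ['weight'] ['weight']
```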
First, for the weight and bias parameters, we didn't initialize them as in the last iteration. We just allocate a regular Tensor object for each. The actual initialization is done in another function, reset_parameters (explained later).
For bias, we added a condition: if it is True, we do what we did in the last iteration, but if it is False, we use register_parameter('bias', None) to give it a None value. Now for the reset_parameters function, it looks like this:
    def reset_parameters(self):
        torch.nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            # fan_in is the number of input features feeding each output unit
            fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            torch.nn.init.uniform_(self.bias, -bound, bound)
The weight initialization used here is kaiming_uniform_. It comes from the paper Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (He, K. et al., 2015).
By keeping the initial values within a bound, it avoids the vanishing/exploding gradients issue (though we only have one layer here, we should still keep MLN in mind when writing the Linear class).
Notice that for self.weight, we actually give a a value of math.sqrt(5) instead of math.sqrt(fan_in); this is explained in a GitHub issue in the PyTorch repo, for those who might be interested.
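To see what a=math.sqrt(5) amounts to, here is a short worked check. It assumes the defaults of kaiming_uniform_ (mode='fan_in', nonlinearity='leaky_relu'), for which the gain is sqrt(2 / (1 + a^2)) and the sampling bound is sqrt(3) * gain / sqrt(fan_in). With a = sqrt(5) the gain is sqrt(1/3), so the bound reduces to 1 / sqrt(fan_in), the same range used for the bias. The 10 x 784 shape mirrors the MNIST layer.

```python
import math
import torch

torch.manual_seed(0)

out_features, in_features = 10, 784
weight = torch.empty(out_features, in_features)
torch.nn.init.kaiming_uniform_(weight, a=math.sqrt(5))

# Bound derived by hand from the leaky_relu gain formula
gain = math.sqrt(2.0 / (1 + math.sqrt(5) ** 2))         # sqrt(2 / 6) = sqrt(1 / 3)
derived_bound = math.sqrt(3.0) * gain / math.sqrt(in_features)
simple_bound = 1 / math.sqrt(in_features)                # 1 / 28 for 784 inputs

largest = weight.abs().max().item()
print(derived_bound, simple_bound, largest)
```

Every sampled weight should fall inside the interval from -1/28 to 1/28, which is the point of the comparison: the unusual-looking a value is just a way of reaching that range.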
Finally, we can add an extra_repr string to the model:
    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )
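For a feel of what extra_repr changes, the sketch below uses a hypothetical Scale module (not part of the article's model). Whatever string extra_repr returns is placed inside the parentheses when the module is printed.

```python
from torch import nn

class Scale(nn.Module):
    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor

    def extra_repr(self):
        # shown between the parentheses of the module's printed form
        return 'factor={}'.format(self.factor)

text = repr(Scale(2.0))
print(text)  # Scale(factor=2.0)
```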
class myLinear(nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # The bool flag is not stored on self.bias: register_parameter('bias', None)
        # raises KeyError when a plain attribute with that name already exists.
        self.weight = torch.nn.Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = torch.nn.Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()
    def reset_parameters(self):
        torch.nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            torch.nn.init.uniform_(self.bias, -bound, bound)
    def forward(self, input):
        x, y = input.shape
        if y != self.in_features:
            print(f'Wrong Input Features. Please use tensor with {self.in_features} Input Features')
            return 0
        # use the layer's own parameters (self.weight / self.bias)
        output = input.matmul(self.weight.t())
        if self.bias is not None:
            output += self.bias
        return output
    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )
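One way to gain confidence in a hand-written layer like this is to load the parameters of a real nn.Linear into it and compare outputs. The sketch below does that with a condensed, self-contained copy of the layer; the name MyLinearSketch is hypothetical, torch.empty stands in for torch.Tensor, and fan_in is taken to be in_features (true for a 2-D weight) so the private helper is not needed.

```python
import math
import torch
from torch import nn

class MyLinearSketch(nn.Module):
    """Condensed stand-in for the layer above, for comparison only."""
    def __init__(self, in_features, out_features, bias=True):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            bound = 1 / math.sqrt(self.in_features)  # fan_in of a 2-D weight
            nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, input):
        output = input @ self.weight.t()
        if self.bias is not None:
            output = output + self.bias
        return output

torch.manual_seed(0)
reference = nn.Linear(20, 10)
mine = MyLinearSketch(20, 10)
mine.load_state_dict(reference.state_dict())  # same keys: 'weight' and 'bias'

a = torch.randn(5, 20)
outputs_match = torch.allclose(mine(a), reference(a), atol=1e-5)
n_params_no_bias = len(list(MyLinearSketch(20, 10, bias=False).parameters()))
print(outputs_match, n_params_no_bias)
```

Because the parameter names and shapes match nn.Linear, the state dict loads without any key remapping, and the bias=False variant exposes only the weight to the update loop.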
From here, it is worth exploring the other nn.modules a bit. It might seem boring and redundant, but sometimes the fastest (and shortest) way is the ‘boring’ way. Once you get to the very bottom of this, the feeling of knowing that there’s nothing ‘more’ is priceless. You’ll come to the realization that:
Underneath PyTorch, there’s no trick, no myth, no catch, just rock-solid Python code.