The basic principles required to solve classification tasks with neural networks are used as building blocks in more complicated deep learning problems such as object detection and instance segmentation. It is therefore important to understand the reasoning behind choosing a particular activation and loss function. This post answers the question "What activation and loss functions do you need to use to solve a binary classification task?" with an explanation of the underlying theory and a working PyTorch example.
Traditionally, binary classification models use sigmoid activation and binary cross-entropy (BCE) loss. These two functions are also broadly used in more complicated neural networks, such as object detection CNN models and recurrent neural networks. The YOLOX object detection model, for example, uses sigmoid activation and BCE in two of its branches, as you can see in the figure below.
Recurrent neural networks with gated units, such as LSTM, use sigmoid to help the recurrent NN decide whether to update or forget the data.
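To make this concrete, here is a rough sketch of a single LSTM-style forget gate (this snippet is an illustration only; the shapes, weights, and variable names are made up for the example and are not part of any library API):
import torch

hidden_size = 4
h_prev = torch.randn(1, hidden_size)                 # previous hidden state (made-up values)
x_t = torch.randn(1, hidden_size)                    # current input (made-up values)
W_f = torch.randn(2 * hidden_size, hidden_size)      # forget-gate weights (made-up values)
b_f = torch.zeros(hidden_size)                       # forget-gate bias

# sigmoid squashes the gate pre-activation into (0, 1):
# values close to 0 mean "forget", values close to 1 mean "keep"
f_t = torch.sigmoid(torch.cat([h_prev, x_t], dim=1) @ W_f + b_f)

c_prev = torch.randn(1, hidden_size)                 # previous cell state
c_kept = f_t * c_prev                                # element-wise gating: keep or forget parts of the state
print(f_t)
print(c_kept)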
If you know the logic behind applying sigmoid activation and BCE loss you are one step closer to understanding and building more complicated NN models.
In supervised machine learning, classification problems can be represented as a set of samples {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x_i is an m-dimensional vector that contains the features of sample i and y_i is the class to which x_i belongs. The goal is to build a model that predicts the label y_i for each input sample x_i. There are three types of classification problems: binary classification (two classes), multi-class classification (more than two mutually exclusive classes), and multi-label classification (each sample can belong to several classes at once).
Moreover, there are two main types of classifiers: hard classifiers, which output the predicted class label directly, and probabilistic classifiers, which output a probability for each class.
Binary classification can be applied to many real-life problems, such as spam detection, medical diagnosis (sick vs. healthy), or fraud detection.
As discussed before, in binary classification you are given a set of samples (x_i, y_i) where each label y_i takes one of two values, 0 or 1, and the goal is to predict the correct class for new samples.
To build a binary classification neural network as a probabilistic classifier we need an output layer of size 1, an activation function that converts the raw model output into a probability p in the range [0, 1], and a loss function that works with probabilities.
The final linear layer of a neural network outputs a vector of "raw output values". In the case of classification, the output values represent the model's confidence that the input belongs to one of the classes. As discussed before, the output layer needs to be of size 1, and the output value should be converted into a probability p. To obtain the probability you can use the sigmoid activation function, which maps its input to an output between 0 and 1. The sigmoid function is defined as sigmoid(x) = 1 / (1 + e^(-x)).
An example of input-output values for sigmoid is provided in the table below.
| Input | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Output | 0.007 | 0.018 | 0.047 | 0.119 | 0.269 | 0.5 | 0.731 | 0.881 | 0.953 | 0.982 | 0.993 |
Let's plot this table with input values as the x-axis and output values as the y-axis to visualize the sigmoid function.
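A minimal sketch to reproduce such a plot with numpy and matplotlib (the exact code and styling of the original figure are not shown here, so this only approximates it):
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)
y = 1 / (1 + np.exp(-x))  # same sigmoid formula as in the table above

plt.figure(figsize=(4, 3))
plt.plot(x, y)
plt.xlabel('input')
plt.ylabel('output')
plt.title('sigmoid')
plt.show()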
As you can see, the sigmoid function maps all input values into the range from 0 to 1, so we can use it for the binary classification task with an output layer of size 1.
The most common loss function for probabilistic binary classifiers is the binary cross-entropy loss, which is defined as BCE = -1/N * Σ (y_i * log(p_i) + (1 - y_i) * log(1 - p_i)), where N is the number of input samples, y_i is the ground truth, and p_i is the predicted probability.
The table below shows loss values when the ground truth is 1 and the predicted values range from 0 to 1. From the table we can make several observations: the loss is 0 when the prediction matches the ground truth, it grows as the prediction moves away from the ground truth, and it goes to infinity when the prediction is the exact opposite of the ground truth.
| ground truth | 1 | 1 | 1 | 1 | 1 | 1 |
|---|---|---|---|---|---|---|
| prediction | 0 | 0.2 | 0.4 | 0.6 | 0.8 | 1 |
| BCE loss | inf | 1.609 | 0.916 | 0.511 | 0.223 | 0 |
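For example, with ground truth y = 1 and prediction p = 0.8, the BCE term reduces to -log(0.8) ≈ 0.223, which matches the value in the table.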
Let's remove the sum from the equation and analyze the term inside: -(y * log(p) + (1 - y) * log(1 - p)). For a ground truth of 1 it reduces to -log(p).
The plot of -log(x) below shows that the function has the minimum value at x=1.
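If you want to regenerate the plot yourself, here is a minimal sketch with numpy and matplotlib (the original figure's code is not shown here, so this only approximates it):
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.01, 1, 200)
plt.figure(figsize=(4, 3))
plt.plot(x, -np.log(x))  # -log(x): equals 0 at x=1 and grows to infinity as x approaches 0
plt.xlabel('x')
plt.ylabel('-log(x)')
plt.show()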
There are two things that can be observed from the plot and the formula: the term is 0 when the predicted probability matches the ground truth (x = 1 for ground truth 1), and it grows towards infinity as the prediction approaches the opposite class (x → 0).
The observed properties make BCE a perfect loss function for binary classification problems.
Before heading to the code, let's summarize what we need to implement a probabilistic binary classification NN: an output layer of size 1, the sigmoid activation function to convert the raw output into a probability, the binary cross-entropy loss, and a threshold (usually 0.5) to turn the predicted probability into a class label.
Let's code a neural network for binary classification with the PyTorch framework.
First, install torchmetrics:
# used for accuracy metric and confusion matrix
!pip install torchmetrics
Import packages that will be used later in the code
from sklearn.datasets import make_classification
import numpy as np
import torch
import torchmetrics
import matplotlib.pyplot as plt
import seaborn as sn
import pandas as pd
from sklearn.decomposition import PCA
Set a global variable with the number of classes:
number_of_classes=2
I will use the make_classification function from scikit-learn to generate the dataset.
The generated dataset will have X with shape [n_samples, n_features] and Y with shape [n_samples, ].
def get_dataset(n_samples=10000, n_features=20, n_classes=2):
# https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn.datasets.make_classification
data_X, data_y = make_classification(n_samples=n_samples, n_features=n_features, n_classes=n_classes,
n_informative=n_classes, n_redundant=0, n_clusters_per_class=2,
random_state=42,
class_sep=2)
return data_X, data_y
Define functions to visualize and print out dataset statistics.
The show_dataset function uses PCA to reduce the feature dimensionality to two components, so that datasets with more than two features can still be visualized as a 2D scatter plot.
def print_dataset(X, y):
print(f'X shape: {X.shape}, min: {X.min()}, max: {X.max()}')
print(f'y shape: {y.shape}')
print(y[:10])
def show_dataset(X, y, title=''):
if X.shape[1] > 2:
X_pca = PCA(n_components=2).fit_transform(X)
else:
X_pca = X
fig = plt.figure(figsize=(4, 4))
plt.scatter(x=X_pca[:, 0], y=X_pca[:, 1], c=y, alpha=0.5)
# generate colors for all classes
colors = plt.cm.rainbow(np.linspace(0, 1, number_of_classes))
# iterate over classes and visualize them with the dedicated color
for class_id in range(number_of_classes):
class_mask = np.argwhere(y == class_id)
X_class = X_pca[class_mask[:, 0]]
plt.scatter(x=X_class[:, 0], y=X_class[:, 1],
c=np.full((X_class[:, 0].shape[0], 4), colors[class_id]),
label=class_id, alpha=0.5)
plt.title(title)
plt.legend(loc="best", title="Classes")
plt.xticks()
plt.yticks()
plt.show()
Scale the dataset features X to the range [0, 1] with a min-max scaler. This is usually done for faster and more stable training.
def scale(x_in):
return (x_in - x_in.min(axis=0))/(x_in.max(axis=0)-x_in.min(axis=0))
Let's print out the generated dataset statistics and visualize the dataset with the functions defined above.
X, y = get_dataset(n_classes=number_of_classes, n_features=2)
print('before scaling')
print_dataset(X, y)
show_dataset(X, y, 'before')
X_scaled = scale(X)
print('after scaling')
print_dataset(X_scaled, y)
show_dataset(X_scaled, y, 'after')
The outputs you should get are below.
before scaling
X shape: (10000, 2), min: -6.049090666105036, max: 5.311074029997754
y shape: (10000,)
[0 0 1 1 0 1 1 0 1 0]
after scaling
X shape: (10000, 2), min: 0.0, max: 1.0
y shape: (10000,)
[0 0 1 1 0 1 1 0 1 0]
As you can see, min-max scaling does not distort the dataset features; it just transforms them into the range [0, 1].
Create PyTorch data loaders.
def get_data_loaders(dataset, batch_size=32, shuffle=True):
data_X, data_y = dataset
# https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset
torch_dataset = torch.utils.data.TensorDataset(torch.tensor(data_X, dtype=torch.float32),
torch.tensor(data_y, dtype=torch.float32))
# https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split
train_dataset, val_dataset = torch.utils.data.random_split(torch_dataset, [int(len(torch_dataset)*0.8),
int(len(torch_dataset)*0.2)],
torch.Generator().manual_seed(42))
# https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
loader_train = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=shuffle)
loader_val = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=shuffle)
return loader_train, loader_val
Test PyTorch data loaders
dataloader_train, dataloader_val = get_data_loaders(get_dataset(n_classes=number_of_classes), batch_size=32)
train_batch_0 = next(iter(dataloader_train))
print(f'Batches in the train dataloader: {len(dataloader_train)}, X: {train_batch_0[0].shape}, Y: {train_batch_0[1].shape}')
val_batch_0 = next(iter(dataloader_val))
print(f'Batches in the validation dataloader: {len(dataloader_val)}, X: {val_batch_0[0].shape}, Y: {val_batch_0[1].shape}')
The output:
Batches in the train dataloader: 250, X: torch.Size([32, 20]), Y: torch.Size([32])
Batches in the validation dataloader: 63, X: torch.Size([32, 20]), Y: torch.Size([32])
Create pre- and postprocessing functions. As you may have noted, the current Y shape is [N,], and we need it to be [N, 1]. To do that we can expand the Y shape to [N, 1] with numpy.expand_dims or torch.unsqueeze, depending on the input type.
def preprocessing(y):
'''
expand input labels shape [N,] to [N,1]
input: y - [N,] numpy array or pytorch Tensor
output: [N, 1] the same type as input
'''
assert type(y) == np.ndarray or torch.is_tensor(
y), f'input should be numpy array or torch tensor. Received input is: {type(y)}'
assert len(y.shape) == 1, f'input shape should be [N,]. Received input shape is: {y.shape}'
if torch.is_tensor(y):
return torch.unsqueeze(y, dim=1)
else:
return np.expand_dims(y, axis=1)
Postprocessing simply thresholds the input values: if a value is greater than or equal to the threshold it is set to 1, otherwise it is set to 0. Postprocessing is used to output class 0 or 1 based on the model's output probability.
def postprocessing(y, threshold=0.5):
'''
set input y with values larger than threshold to 1 and lower than threshold to 0
input: y - [N,1] numpy array or pytorch Tensor
output: int array [N,1] the same class type as input
'''
assert type(y) == np.ndarray or torch.is_tensor(
y), f'input should be numpy array or torch tensor. Received input is: {type(y)}'
assert len(y.shape) == 2, f'input shape should be [N,classes]. Received input shape is: {y.shape}'
if torch.is_tensor(y):
return (y >= threshold).int()
else:
return (y >= threshold).astype(int)
Test the defined pre and postprocessing functions.
y = np.random.rand(10, )
y_preprocessed = preprocessing(y)
print(f'y shape: {y.shape}, y preprocessed shape: {y_preprocessed.shape}')
y_postprocessed = postprocessing(y_preprocessed, threshold=0.5)
print(f'y preprocessed shape: {y_preprocessed.shape},y postprocessed shape: {y_postprocessed.shape}')
print('Postprocessing sets array elements>=threshold to 1 and elements<threshold to 0:')
for i in range(10):
print(f'\t{y_preprocessed[i, 0]:.2f} >> {y_postprocessed[i, 0]}')
The output:
y shape: (10,), y preprocessed shape: (10, 1)
y preprocessed shape: (10, 1),y postprocessed shape: (10, 1)
Postprocessing sets array elements>=threshold to 1 and elements<threshold to 0:
0.81 >> 1
0.67 >> 1
0.66 >> 1
0.10 >> 0
0.39 >> 0
0.50 >> 1
0.54 >> 1
0.06 >> 0
0.92 >> 1
0.93 >> 1
This section shows an implementation of all functions required to train a binary classification model.
The PyTorch-based implementation of the sigmoid formula
def sigmoid(x):
return 1/(1+torch.exp(-x))
Let's test sigmoid:
test_input = torch.arange(-10, 11, 1, dtype=torch.float32)
test_input = preprocessing(test_input)
sigmoid_output = sigmoid(test_input)
print(f'Input data shape: {test_input.shape}')
print(f'input data range: [{test_input.min():.3f}, {test_input.max():.3f}]')
print(f'sigmoid output data range: [{sigmoid_output.min():.3f}, {sigmoid_output.max():.3f}]')
print(test_input[:2])
print(sigmoid_output[:2])
# compare the sigmoid implementation with pytorch implementation
torch_sigmoid_output = torch.nn.functional.sigmoid(test_input)
print(f'sigmoid output is the same with pytorch implementation: {(torch_sigmoid_output == sigmoid_output).all().numpy()}')
fig = plt.figure(figsize=(4, 2), facecolor=(0.0, 1.0, 0.0))
ax = fig.add_subplot(1, 1, 1)
ax.plot(test_input, sigmoid_output, color='red')
ax.set_ylim([0, 1])
ax.set_title('sigmoid')
ax.set_facecolor((0.0, 1.0, 0.0))
fig.show()
The output of the code above:
Input data shape: torch.Size([21, 1])
input data range: [-10.000, 10.000]
sigmoid output data range: [0.000, 1.000]
tensor([[-10.],
[ -9.]])
tensor([[4.5398e-05],
[1.2339e-04]])
sigmoid output is the same with pytorch implementation: True
The PyTorch-based implementation of the BCE formula. To make sure that the inner term of the logarithm is never 0, the predicted values are clamped to the range [eps, 1 - eps].
def binary_cross_entropy(pred, y):
# log(0)=-inf
# to prevent that clamp NN output values into [eps, 1-eps] values
eps = 1e-8
pred = torch.clamp(pred, min=eps, max=1 - eps)
loss = -y * torch.log(pred) - (1 - y) * torch.log(1 - pred)
return loss.mean()
Test BCE implementation:
test_input = torch.rand(10, 1, dtype=torch.float32)
# get "ground truth" for test input by thresholding test_input
test_input_gt = postprocessing(test_input).float()
print(f'test input shape: {test_input.shape}, gt shape: {test_input_gt.shape}')
print(f'test_input range: [{test_input.min().numpy():.2f}, {test_input.max().numpy():.2f}]')
print(f'test_input gt range: [{test_input_gt.min().numpy()}, {test_input_gt.max().numpy()}]')
# get loss with the binary_cross_entropy implementation
loss = binary_cross_entropy(test_input, test_input_gt)
# get loss with pytorch binary_cross_entropy implementation
loss_pytorch = torch.nn.functional.binary_cross_entropy(test_input, test_input_gt)
print(f'loss outputs are the same: {(loss == loss_pytorch).numpy()}')
The expected output
test input shape: torch.Size([10, 1]), gt shape: torch.Size([10, 1])
test_input range: [0.02, 0.80]
test_input gt range: [0.0, 1.0]
loss outputs are the same: True
I will use the torchmetrics package to compute the model accuracy. To create the binary classification accuracy metric, two parameters are required: the task type ("binary") and the decision threshold (0.5).
# https://torchmetrics.readthedocs.io/en/stable/classification/accuracy.html#module-interface
accuracy_metric=torchmetrics.classification.Accuracy(task="binary", threshold=0.5)
def compute_accuracy(y_pred, y):
return accuracy_metric(y_pred, y)
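A quick test of the accuracy metric, following the same pattern as the other tests in this post (the example tensors below are made up for illustration):
test_preds = torch.tensor([[0.9], [0.2], [0.7], [0.1]])  # predicted probabilities, shape [4, 1]
test_gt = torch.tensor([[1], [0], [0], [0]])             # ground truth labels, shape [4, 1]
# 3 of the 4 probabilities fall on the correct side of the 0.5 threshold
print(compute_accuracy(test_preds, test_gt))             # expected output: tensor(0.7500)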
The NN used in this example is a small fully connected network with three linear layers followed by an output layer. The first three layers use ReLU activation, and the final layer uses the activation function provided as a constructor argument (here it will be the sigmoid activation function implemented above).
class ClassifierNN(torch.nn.Module):
def __init__(self, loss_function, activation_function, input_dims=2, output_dims=1):
super().__init__()
self.linear1 = torch.nn.Linear(input_dims, input_dims * 4)
self.linear2 = torch.nn.Linear(input_dims * 4, input_dims * 8)
self.linear3 = torch.nn.Linear(input_dims * 8, input_dims * 4)
self.output = torch.nn.Linear(input_dims * 4, output_dims)
self.loss_function = loss_function
self.activation_function = activation_function
def forward(self, x):
x = torch.nn.functional.relu(self.linear1(x))
x = torch.nn.functional.relu(self.linear2(x))
x = torch.nn.functional.relu(self.linear3(x))
x = self.activation_function(self.output(x))
return x
The figure above depicts the binary classification training logic for a single batch. Later, the train_epoch function will be called once per epoch, for the chosen number of epochs.
def train_epoch(model, optimizer, dataloader_train):
# set the model to the training mode
# https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train
model.train()
losses = []
accuracies = []
for step, (X_batch, y_batch) in enumerate(dataloader_train):
### forward propagation
# get model output and use loss function
y_pred = model(X_batch) # get class probabilities with shape [N,1]
# apply loss function on predicted probabilities and ground truth
loss = model.loss_function(y_pred, y_batch)
### backward propagation
# set gradients to zero before backpropagation
# https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html
optimizer.zero_grad()
# compute gradients
# https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html
loss.backward()
# update weights
# https://pytorch.org/docs/stable/optim.html#taking-an-optimization-step
optimizer.step() # update model weights
# calculate batch accuracy
acc = compute_accuracy(y_pred, y_batch)
# append batch loss and accuracy to corresponding lists for later use
accuracies.append(acc)
losses.append(float(loss.detach().numpy()))
# compute average epoch accuracy
train_acc = np.array(accuracies).mean()
# compute average epoch loss
loss_epoch = np.array(losses).mean()
return train_acc, loss_epoch
The evaluate function iterates over the provided PyTorch dataloader, computes the current model accuracy, and returns the average loss and average accuracy.
def evaluate(model, dataloader_in):
# set the model to the evaluation mode
# https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval
model.eval()
val_acc_epoch = 0
losses = []
accuracies = []
# disable gradient calculation for evaluation
# https://pytorch.org/docs/stable/generated/torch.no_grad.html
with torch.no_grad():
for step, (X_batch, y_batch) in enumerate(dataloader_in):
# get predictions
y_pred = model(X_batch)
# calculate loss
loss = model.loss_function(y_pred, y_batch)
# calculate batch accuracy
acc = compute_accuracy(y_pred, y_batch)
accuracies.append(acc)
losses.append(float(loss.detach().numpy()))
# compute average accuracy
val_acc = np.array(accuracies).mean()
# compute average loss
loss_epoch = np.array(losses).mean()
return val_acc, loss_epoch
The predict function iterates over the provided dataloader, collects the post-processed model predictions and ground truth values into [N, 1] PyTorch tensors, and returns them together with the input features. Later this function is used to compute the confusion matrix and visualize predictions.
def predict(model, dataloader):
# set the model to the evaluation mode
# https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval
model.eval()
xs, ys = next(iter(dataloader))
y_pred = torch.empty([0, ys.shape[1]])
x = torch.empty([0, xs.shape[1]])
y = torch.empty([0, ys.shape[1]])
# disable gradient calculation for evaluation
# https://pytorch.org/docs/stable/generated/torch.no_grad.html
with torch.no_grad():
for step, (X_batch, y_batch) in enumerate(dataloader):
# get predictions
y_batch_pred = model(X_batch)
y_pred = torch.cat([y_pred, y_batch_pred])
y = torch.cat([y, y_batch])
x = torch.cat([x, X_batch])
# print(y_pred.shape, y.shape)
y_pred = postprocessing(y_pred)
y = postprocessing(y)
return y_pred, y, x
To train the model we just need to call the train_epoch function N times, where N is the number of epochs. The evaluate function is called to log the current model accuracy on the validation dataset. Finally, the best model is updated based on the validation accuracy. The model_train function returns the best validation accuracy and the training history.
def model_train(model, optimizer, dataloader_train, dataloader_val, n_epochs=50):
best_acc = 0
best_weights = None
history = {'loss': {'train': [], 'validation': []},
'accuracy': {'train': [], 'validation': []}}
for epoch in range(n_epochs):
# train on dataloader_train
acc_train, loss_train = train_epoch(model, optimizer, dataloader_train)
# evaluate on dataloader_val
acc_val, loss_val = evaluate(model, dataloader_val)
print(f'Epoch: {epoch} | Accuracy: {acc_train:.3f} / {acc_val:.3f} | ' +
f'loss: {loss_train:.5f} / {loss_val:.5f}')
# save epoch losses and accuracies in history dictionary
history['loss']['train'].append(loss_train)
history['loss']['validation'].append(loss_val)
history['accuracy']['train'].append(acc_train)
history['accuracy']['validation'].append(acc_val)
# Save the best validation accuracy model
if acc_val >= best_acc:
print(f'\tBest weights updated. Old accuracy: {best_acc:.4f}. New accuracy: {acc_val:.4f}')
best_acc = acc_val
torch.save(model.state_dict(), 'best_weights.pt')
# restore model and return best accuracy
model.load_state_dict(torch.load('best_weights.pt'))
return best_acc, history
def plot_history(history):
fig = plt.figure(figsize=(8, 4), facecolor=(0.0, 1.0, 0.0))
ax = fig.add_subplot(1, 2, 1)
ax.plot(np.arange(0, len(history['loss']['train'])), history['loss']['train'], color='red', label='train')
ax.plot(np.arange(0, len(history['loss']['validation'])), history['loss']['validation'], color='blue',
label='validation')
ax.set_title('Loss history')
ax.set_facecolor((0.0, 1.0, 0.0))
ax.legend()
ax = fig.add_subplot(1, 2, 2)
ax.plot(np.arange(0, len(history['accuracy']['train'])), history['accuracy']['train'], color='red', label='train')
ax.plot(np.arange(0, len(history['accuracy']['validation'])), history['accuracy']['validation'], color='blue',
label='validation')
ax.set_title('Accuracy history')
ax.legend()
fig.tight_layout()
ax.set_facecolor((0.0, 1.0, 0.0))
fig.show()
Let's put everything together and train the binary classification model.
#########################################
# Get the dataset
X, y = get_dataset(n_classes=number_of_classes)
print(f'Generated dataset shape. X:{X.shape}, y:{y.shape}')
# change y numpy array shape from [N,] to [N, 1] for binary classification
y = preprocessing(y)
print(f'Dataset shape prepared for binary classification with sigmoid activation and BCE loss.')
print(f'X:{X.shape}, y:{y.shape}')
# Get train and validation dataloaders
dataloader_train, dataloader_val = get_data_loaders(dataset=(scale(X), y), batch_size=32)
# get a batch from the dataloader and print the input and output shapes
X_0, y_0 = next(iter(dataloader_train))
print(f'Model input data shape: {X_0.shape}, output (ground truth) data shape: {y_0.shape}')
#########################################
# Create ClassifierNN for binary classification problem
# input dims: [N, features]
# output dims: [N, 1]
# activation - sigmoid to output probability p in range [0,1]
# loss - binary cross-entropy
model = ClassifierNN(loss_function=binary_cross_entropy,
activation_function=sigmoid,
input_dims=X.shape[1],
output_dims=y.shape[1])
#########################################
# create optimizer and train the model on the dataset
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
print(f'Model size: {sum([x.reshape(-1).shape[0] for x in model.parameters()])} parameters')
print('#' * 10)
print('Start training')
acc, history = model_train(model, optimizer, dataloader_train, dataloader_val, n_epochs=20)
print('Finished training')
print('#' * 10)
print("Model accuracy: %.2f%%" % (acc * 100))
plot_history(history)
The expected output should be similar to the one provided below.
Generated dataset shape. X:(10000, 20), y:(10000,)
Dataset shape prepared for binary classification with sigmoid activation and BCE loss.
X:(10000, 20), y:(10000, 1)
Model input data shape: torch.Size([32, 20]), output (ground truth) data shape: torch.Size([32, 1])
Model size: 27601 parameters
##########
Start training
Epoch: 0 | Accuracy: 0.690 / 0.952 | loss: 0.65095 / 0.53560
Best weights updated. Old accuracy: 0.0000. New accuracy: 0.9524
Epoch: 1 | Accuracy: 0.956 / 0.970 | loss: 0.33146 / 0.18328
Best weights updated. Old accuracy: 0.9524. New accuracy: 0.9702
Epoch: 2 | Accuracy: 0.965 / 0.973 | loss: 0.14162 / 0.11417
Best weights updated. Old accuracy: 0.9702. New accuracy: 0.9732
Epoch: 3 | Accuracy: 0.970 / 0.975 | loss: 0.10551 / 0.09519
Best weights updated. Old accuracy: 0.9732. New accuracy: 0.9752
Epoch: 4 | Accuracy: 0.972 / 0.976 | loss: 0.09295 / 0.09127
Best weights updated. Old accuracy: 0.9752. New accuracy: 0.9762
Epoch: 5 | Accuracy: 0.974 / 0.977 | loss: 0.08666 / 0.08467
Best weights updated. Old accuracy: 0.9762. New accuracy: 0.9772
Epoch: 6 | Accuracy: 0.976 / 0.977 | loss: 0.08243 / 0.08312
Best weights updated. Old accuracy: 0.9772. New accuracy: 0.9772
Epoch: 7 | Accuracy: 0.977 / 0.979 | loss: 0.07981 / 0.08914
Best weights updated. Old accuracy: 0.9772. New accuracy: 0.9787
Epoch: 8 | Accuracy: 0.977 / 0.981 | loss: 0.07876 / 0.08224
Best weights updated. Old accuracy: 0.9787. New accuracy: 0.9807
Epoch: 9 | Accuracy: 0.978 / 0.979 | loss: 0.07692 / 0.08362
Epoch: 10 | Accuracy: 0.979 / 0.979 | loss: 0.07478 / 0.07739
Epoch: 11 | Accuracy: 0.980 / 0.980 | loss: 0.07375 / 0.07708
Epoch: 12 | Accuracy: 0.980 / 0.980 | loss: 0.07253 / 0.07613
Epoch: 13 | Accuracy: 0.981 / 0.979 | loss: 0.07119 / 0.07788
Epoch: 14 | Accuracy: 0.982 / 0.982 | loss: 0.07148 / 0.07483
Best weights updated. Old accuracy: 0.9807. New accuracy: 0.9816
Epoch: 15 | Accuracy: 0.982 / 0.981 | loss: 0.06973 / 0.07474
Epoch: 16 | Accuracy: 0.981 / 0.982 | loss: 0.06900 / 0.07401
Best weights updated. Old accuracy: 0.9816. New accuracy: 0.9821
Epoch: 17 | Accuracy: 0.982 / 0.979 | loss: 0.06850 / 0.08130
Epoch: 18 | Accuracy: 0.982 / 0.980 | loss: 0.06796 / 0.07966
Epoch: 19 | Accuracy: 0.982 / 0.981 | loss: 0.06714 / 0.07458
Finished training
##########
Model accuracy: 98.21%
acc_train, _ = evaluate(model, dataloader_train)
acc_validation, _ = evaluate(model, dataloader_val)
print(f'Accuracy - Train: {acc_train:.4f} | Validation: {acc_validation:.4f}')
Accuracy - Train: 0.9816 | Validation: 0.9816
val_preds, val_y, _ = predict(model, dataloader_val)
print(val_preds.shape, val_y.shape)
binary_confusion_matrix = torchmetrics.classification.ConfusionMatrix('binary')
cm = binary_confusion_matrix(val_preds, val_y)
print(cm)
df_cm = pd.DataFrame(cm)
plt.figure(figsize=(6, 5), facecolor=(0.0,1.0,0.0))
sn.heatmap(df_cm, annot=True, fmt='d')
plt.show()
val_preds, val_y, val_x = predict(model, dataloader_val)
show_dataset(val_x.numpy(), postprocessing(val_y).numpy(), 'Ground Truth')
show_dataset(val_x.numpy(), postprocessing(val_preds).numpy(), 'Predictions')
Binary classification is a foundation for many deep learning tasks. For binary classification, the standard choice is sigmoid activation combined with binary cross-entropy loss. If you understand how these two functions work, you will be able to understand not only classification NN models but also more complicated NN architectures.