我的上一篇文章阐述了分类问题,并将其分为 3 种类型(二元、多类和多标签),并回答了“需要使用哪些激活函数和损失函数来解决二元分类任务?”的问题。
在这篇文章中,我将回答相同的问题,但针对多类分类任务,并为您提供
您需要使用哪些激活函数和损失函数来解决多类分类任务?
提供的代码主要基于二元分类实现,因为您只需对代码和神经网络进行很少的修改即可从二元分类切换到多类。修改后的代码块标有(已更改)以便于导航。
正如后面将要展示的,用于多类分类的激活函数是softmax激活。 Softmax 广泛应用于多类分类之外的不同神经网络架构中。例如,softmax 是 Transformer 模型中使用的多头注意力模块的核心(请参阅“注意力就是您所需要的”),因为它能够将输入值转换为概率分布(请参阅稍后的更多内容)。
如果您知道应用 softmax 激活和 CE 损失来解决多类分类问题背后的动机,您将能够理解和实现更复杂的 NN 架构和损失函数。
多类分类问题可以表示为一组样本{(x_1, y_1), (x_2, y_2),...,(x_n, y_n)} ,其中x_i是包含样本特征的 m 维向量i和y_i是x_i所属的类。其中标签y_i可以采用k个值之一,其中 k 是大于 2 的类数。目标是构建一个模型来预测每个输入样本x_i的标签 y_i 。
可以视为多类分类问题的任务示例:
在多类分类中,您将得到:
一组样本{(x_1, y_1), (x_2, y_2),...,(x_n, y_n)}
x_i是一个m维向量,包含样本i的特征
y_i是x_i所属的类,并且可以采用k个值之一,其中k>2是类的数量。
要构建多类分类神经网络作为概率分类器,我们需要:
神经网络的最后一个线性层输出“原始输出值”向量。在分类的情况下,输出值表示模型对输入属于k个类别之一的置信度。正如之前所讨论的,输出层需要具有大小k并且输出值应表示 k 个类别中每个类别的概率p_i且SUM(p_i)=1 。
关于二元分类的文章使用 sigmoid 激活将 NN 输出值转换为概率。让我们尝试对 [-3, 3] 范围内的k 个输出值应用 sigmoid,看看 sigmoid 是否满足前面列出的要求:
k输出值应在 (0,1) 范围内,其中k是类别数
k 个输出值的总和应等于 1
上一篇文章展示了 sigmoid 函数将输入值映射到范围 (0,1) 中。我们来看看 sigmoid 激活是否满足第二个要求。在下面的示例表中,我使用 sigmoid 激活处理了大小为k (k=7) 的向量,并对所有这些值求和 - 这 7 个值的总和等于 3.5。解决这个问题的一个简单方法是将所有k值除以它们的总和。
输入 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 和 |
---|---|---|---|---|---|---|---|---|
乙状结肠输出 | 0.04743 | 0.11920 | 0.26894 | 0.50000 | 0.73106 | 0.88080 | 0.95257 | 3.5000 |
另一种方法是获取输入值的指数并将其除以所有输入值的指数之和:
softmax 函数将实数向量转换为概率向量。结果中的每个概率都在 (0,1) 范围内,概率之和为 1。
输入 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 和 |
---|---|---|---|---|---|---|---|---|
软最大 | 0.00157 | 0.00426 | 0.01159 | 0.03150 | 0.08563 | 0.23276 | 0.63270 | 1 |
使用 softmax 时需要注意一件事:输出值p_i取决于输入数组中的所有值,因为我们将其除以所有值的指数总和。下表演示了这一点:两个输入向量有 3 个公共值 {1, 3, 4},但输出 softmax 值不同,因为第二个元素不同(2 和 4)。
输入1 | 1 | 2 | 3 | 4 |
---|---|---|---|---|
软最大1 | 0.0321 | 0.0871 | 0.2369 | 0.6439 |
输入2 | 1 | 4 | 3 | 4 |
软最大2 | 0.0206 | 0.4136 | 0.1522 | 0.4136 |
二元交叉熵损失定义为:
在二元分类中,有两个输出概率p_i和(1-p_i)以及真实值y_i和(1-y_i)。
多类分类问题使用 N 个类的 BCE 损失的推广:交叉熵损失。
N 是输入样本的数量, y_i是真实情况, p_i是类别i的预测概率。
为了实现概率多类分类神经网络,我们需要:
代码的大部分部分都是基于上一篇关于二元分类的文章中的代码。
更改的部分标有(Changed) :
让我们使用 PyTorch 框架编写一个用于多类分类的神经网络。
首先,安装
# used for accuracy metric and confusion matrix !pip install torchmetrics
导入稍后将在代码中使用的包
from sklearn.datasets import make_classification import numpy as np import torch import torchmetrics import matplotlib.pyplot as plt import seaborn as sn import pandas as pd from sklearn.decomposition import PCA
设置全局变量与类的数量(如果将其设置为 2 并获得使用 softmax 和交叉熵损失的二元分类神经网络)
number_of_classes=4
我会用
n_samples - 是生成的样本数
n_features - 设置生成样本X的维度数
n_classes - 生成的数据集中的类数。在多类分类问题中,应该有2个以上的类
生成的数据集将具有形状为[n_samples, n_features] 的X 和形状为[n_samples, ]的 Y。
def get_dataset(n_samples=10000, n_features=20, n_classes=2): # https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn.datasets.make_classification data_X, data_y = make_classification(n_samples=n_samples, n_features=n_features, n_classes=n_classes, n_informative=n_classes, n_redundant=0, n_clusters_per_class=2, random_state=42, class_sep=4) return data_X, data_y
定义函数来可视化和打印数据集统计数据。 show_dataset函数使用
def print_dataset(X, y): print(f'X shape: {X.shape}, min: {X.min()}, max: {X.max()}') print(f'y shape: {y.shape}') print(y[:10]) def show_dataset(X, y, title=''): if X.shape[1] > 2: X_pca = PCA(n_components=2).fit_transform(X) else: X_pca = X fig = plt.figure(figsize=(4, 4)) plt.scatter(x=X_pca[:, 0], y=X_pca[:, 1], c=y, alpha=0.5) # generate colors for all classes colors = plt.cm.rainbow(np.linspace(0, 1, number_of_classes)) # iterate over classes and visualize them with the dedicated color for class_id in range(number_of_classes): class_mask = np.argwhere(y == class_id) X_class = X_pca[class_mask[:, 0]] plt.scatter(x=X_class[:, 0], y=X_class[:, 1], c=np.full((X_class[:, 0].shape[0], 4), colors[class_id]), label=class_id, alpha=0.5) plt.title(title) plt.legend(loc="best", title="Classes") plt.xticks() plt.yticks() plt.show()
使用最小最大缩放器将数据集特征 X 缩放到范围 [0,1]。这样做通常是为了更快、更稳定的训练。
def scale(x_in): return (x_in - x_in.min(axis=0))/(x_in.max(axis=0)-x_in.min(axis=0))
让我们打印生成的数据集统计信息,并使用上面的函数将其可视化。
X, y = get_dataset(n_classes=number_of_classes) print('before scaling') print_dataset(X, y) show_dataset(X, y, 'before') X_scaled = scale(X) print('after scaling') print_dataset(X_scaled, y) show_dataset(X_scaled, y, 'after')
您应该得到的输出如下。
before scaling X shape: (10000, 20), min: -9.549551632357336, max: 9.727761741276673 y shape: (10000,) [0 2 1 2 0 2 0 1 1 2]
after scaling X shape: (10000, 20), min: 0.0, max: 1.0 y shape: (10000,) [0 2 1 2 0 2 0 1 1 2]
最小-最大缩放不会扭曲数据集特征,它将它们线性变换到范围 [0,1] 中。与上图相比,“最小-最大缩放后的数据集”图似乎失真,因为 PCA 算法将 20 个维度减少为 2,并且 PCA 算法可能会受到最小-最大缩放的影响。
创建 PyTorch 数据加载器。
def get_data_loaders(dataset, batch_size=32, shuffle=True): data_X, data_y = dataset # https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset torch_dataset = torch.utils.data.TensorDataset(torch.tensor(data_X, dtype=torch.float32), torch.tensor(data_y, dtype=torch.float32)) # https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split train_dataset, val_dataset = torch.utils.data.random_split(torch_dataset, [int(len(torch_dataset)*0.8), int(len(torch_dataset)*0.2)], torch.Generator().manual_seed(42)) # https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader loader_train = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=shuffle) loader_val = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=shuffle) return loader_train, loader_val
测试 PyTorch 数据加载器
dataloader_train, dataloader_val = get_data_loaders(get_dataset(n_classes=number_of_classes), batch_size=32) train_batch_0 = next(iter(dataloader_train)) print(f'Batches in the train dataloader: {len(dataloader_train)}, X: {train_batch_0[0].shape}, Y: {train_batch_0[1].shape}') val_batch_0 = next(iter(dataloader_val)) print(f'Batches in the validation dataloader: {len(dataloader_val)}, X: {val_batch_0[0].shape}, Y: {val_batch_0[1].shape}')
输出:
Batches in the train dataloader: 250, X: torch.Size([32, 20]), Y: torch.Size([32]) Batches in the validation dataloader: 63, X: torch.Size([32, 20]), Y: torch.Size([32])
创建预处理和后处理函数。正如您之前可能已经注意到的,当前 Y 形状是 [N],我们需要它是 [N,number_of_classes]。为此,我们需要对 Y 向量中的值进行 one-hot 编码。
One-hot 编码是将类索引转换为二进制表示的过程,其中每个类由唯一的二进制向量表示。
换句话说:创建一个大小为 [number_of_classes] 的零向量,并将位置 class_id 处的元素设置为 1,其中 class_ids {0,1,…,number_of_classes-1}:
0 >> [1。 0.0.0.]
1 >> [0。 1.0.0.]
2 >> [0。 0.1.0.]
2 >> [0。 0.0.1.]
Pytorch 张量可以使用 torch.nn.function.one_hot 进行处理,并且 numpy 实现非常简单。输出向量的形状为 [N,number_of_classes]。
def preprocessing(y, n_classes): ''' one-hot encoding for input numpy array or pytorch Tensor input: y - [N,] numpy array or pytorch Tensor output: [N, n_classes] the same type as input ''' assert type(y)==np.ndarray or torch.is_tensor(y), f'input should be numpy array or torch tensor. Received input is: {type(categorical)}' assert len(y.shape)==1, f'input shape should be [N,]. Received input shape is: {y.shape}' if torch.is_tensor(y): return torch.nn.functional.one_hot(y, num_classes=n_classes) else: categorical = np.zeros([y.shape[0], n_classes]) categorical[np.arange(y.shape[0]), y]=1 return categorical
要将 one-hot 编码向量转换回类 id,我们需要找到 one-hot 编码向量中最大元素的索引。可以使用下面的 torch.argmax 或 np.argmax 来完成。
def postprocessing(categorical): ''' one-hot to classes decoding with .argmax() input: categorical - [N,classes] numpy array or pytorch Tensor output: [N,] the same type as input ''' assert type(categorical)==np.ndarray or torch.is_tensor(categorical), f'input should be numpy array or torch tensor. Received input is: {type(categorical)}' assert len(categorical.shape)==2, f'input shape should be [N,classes]. Received input shape is: {categorical.shape}' if torch.is_tensor(categorical): return torch.argmax(categorical,dim=1) else: return np.argmax(categorical, axis=1)
测试定义的预处理和后处理函数。
y = get_dataset(n_classes=number_of_classes)[1] y_logits = preprocessing(y, n_classes=number_of_classes) y_class = postprocessing(y_logits) print(f'y shape: {y.shape}, y preprocessed shape: {y_logits.shape}, y postprocessed shape: {y_class.shape}') print('Preprocessing does one-hot encoding of class ids.') print('Postprocessing does one-hot decoding of class one-hot encoded class ids.') for i in range(10): print(f'{y[i]} >> {y_logits[i]} >> {y_class[i]}')
输出:
y shape: (10000,), y preprocessed shape: (10000, 4), y postprocessed shape: (10000,) Preprocessing does one-hot encoding of class ids. Postprocessing does one-hot decoding of one-hot encoded class ids. id>>one-hot encoding>>id 0 >> [1. 0. 0. 0.] >> 0 2 >> [0. 0. 1. 0.] >> 2 1 >> [0. 1. 0. 0.] >> 1 2 >> [0. 0. 1. 0.] >> 2 0 >> [1. 0. 0. 0.] >> 0 2 >> [0. 0. 1. 0.] >> 2 0 >> [1. 0. 0. 0.] >> 0 1 >> [0. 1. 0. 0.] >> 1 1 >> [0. 1. 0. 0.] >> 1 2 >> [0. 0. 1. 0.] >> 2
本节展示了训练二元分类模型所需的所有函数的实现。
基于 PyTorch 的 softmax 公式实现
def softmax(x): assert len(x.shape)==2, f'input shape should be [N,classes]. Received input shape is: {x.shape}' # Subtract the maximum value for numerical stability # you can find explanation here: https://www.deeplearningbook.org/contents/numerical.html x = x - torch.max(x, dim=1, keepdim=True)[0] # Exponentiate the values exp_x = torch.exp(x) # Sum along the specified dimension sum_exp_x = torch.sum(exp_x, dim=1, keepdim=True) # Compute the softmax return exp_x / sum_exp_x
让我们测试一下softmax:
使用步骤 1 生成 [-10, 11] 范围内的test_input numpy 数组
将其重塑为形状为 [7,3] 的张量
使用已实现的softmax函数和 PyTorch 默认实现torch.nn.function.softmax处理test_input
比较结果(它们应该相同)
输出所有七个 [1,3] 张量的 softmax 值和总和
test_input = torch.arange(-10, 11, 1, dtype=torch.float32) test_input = test_input.reshape(-1,3) softmax_output = softmax(test_input) print(f'Input data shape: {test_input.shape}') print(f'input data range: [{test_input.min():.3f}, {test_input.max():.3f}]') print(f'softmax output data range: [{softmax_output.min():.3f}, {softmax_output.max():.3f}]') print(f'softmax output data sum along axis 1: [{softmax_output.sum(axis=1).numpy()}]') softmax_output_pytorch = torch.nn.functional.softmax(test_input, dim=1) print(f'softmax output is the same with pytorch implementation: {(softmax_output_pytorch==softmax_output).all().numpy()}') print('Softmax activation changes values in the chosen axis (1) so that they always sum up to 1:') for i in range(softmax_output.shape[0]): print(f'\t{i}. Sum before softmax: {test_input[i].sum().numpy()} | Sum after softmax: {softmax_output[i].sum().numpy()}') print(f'\t values before softmax: {test_input[i].numpy()}, softmax output values: {softmax_output[i].numpy()}')
输出:
Input data shape: torch.Size([7, 3]) input data range: [-10.000, 10.000] softmax output data range: [0.090, 0.665] softmax output data sum along axis 1: [[1. 1. 1. 1. 1. 1. 1.]] softmax output is the same with pytorch implementation: True Softmax activation changes values in the chosen axis (1) so that they always sum up to 1: 0. Sum before softmax: -27.0 | Sum after softmax: 1.0 values before softmax: [-10. -9. -8.], softmax output values: [0.09003057 0.24472848 0.66524094] 1. Sum before softmax: -18.0 | Sum after softmax: 1.0 values before softmax: [-7. -6. -5.], softmax output values: [0.09003057 0.24472848 0.66524094] 2. Sum before softmax: -9.0 | Sum after softmax: 1.0 values before softmax: [-4. -3. -2.], softmax output values: [0.09003057 0.24472848 0.66524094] 3. Sum before softmax: 0.0 | Sum after softmax: 1.0 values before softmax: [-1. 0. 1.], softmax output values: [0.09003057 0.24472848 0.66524094] 4. Sum before softmax: 9.0 | Sum after softmax: 1.0 values before softmax: [2. 3. 4.], softmax output values: [0.09003057 0.24472848 0.66524094] 5. Sum before softmax: 18.0 | Sum after softmax: 1.0 values before softmax: [5. 6. 7.], softmax output values: [0.09003057 0.24472848 0.66524094] 6. Sum before softmax: 27.0 | Sum after softmax: 1.0 values before softmax: [ 8. 9. 10.], softmax output values: [0.09003057 0.24472848 0.66524094]
CE公式基于PyTorch的实现
def cross_entropy_loss(softmax_logits, labels): # Calculate the cross-entropy loss loss = -torch.sum(labels * torch.log(softmax_logits)) / softmax_logits.size(0) return loss
测试CE实施:
生成形状为 [10,5] 且值在 [0,1) 范围内的test_input数组
生成形状为 [10,] 且值在 [0,4] 范围内的test_target数组。
独热编码test_target数组
使用已实现的cross_entropy函数和 PyTorch 实现计算损失
比较结果(它们应该相同)
test_input = torch.rand(10, 5, requires_grad=False) test_target = torch.randint(0, 5, (10,), requires_grad=False) test_target = preprocessing(test_target, n_classes=5).float() print(f'test_input shape: {list(test_input.shape)}, test_target shape: {list(test_target.shape)}') # get loss with the cross_entropy_loss implementation loss = cross_entropy_loss(softmax(test_input), test_target) # get loss with the torch.nn.functional.cross_entropy implementation # !!!torch.nn.functional.cross_entropy applies softmax on input logits # !!!pass it test_input without softmax activation loss_pytorch = torch.nn.functional.cross_entropy(test_input, test_target) print(f'Loss outputs are the same: {(loss==loss_pytorch).numpy()}')
预期输出:
test_input shape: [10, 5], test_target shape: [10, 5] Loss outputs are the same: True
我会用
要创建多类分类准确度度量,需要两个参数:
任务类型“多类别”
类数 num_classes
# https://torchmetrics.readthedocs.io/en/stable/classification/accuracy.html#module-interface accuracy_metric=torchmetrics.classification.Accuracy(task="multiclass", num_classes=number_of_classes) def compute_accuracy(y_pred, y): assert len(y_pred.shape)==2 and y_pred.shape[1] == number_of_classes, 'y_pred shape should be [N, C]' assert len(y.shape)==2 and y.shape[1] == number_of_classes, 'y shape should be [N, C]' return accuracy_metric(postprocessing(y_pred), postprocessing(y))
本示例中使用的神经网络是具有 2 个隐藏层的深度神经网络。输入层和隐藏层使用 ReLU 激活,最后一层使用作为类输入提供的激活函数(它将是之前实现的 sigmoid 激活函数)。
class ClassifierNN(torch.nn.Module): def __init__(self, loss_function, activation_function, input_dims=2, output_dims=1): super().__init__() self.linear1 = torch.nn.Linear(input_dims, input_dims * 4) self.linear2 = torch.nn.Linear(input_dims * 4, input_dims * 8) self.linear3 = torch.nn.Linear(input_dims * 8, input_dims * 4) self.output = torch.nn.Linear(input_dims * 4, output_dims) self.loss_function = loss_function self.activation_function = activation_function def forward(self, x): x = torch.nn.functional.relu(self.linear1(x)) x = torch.nn.functional.relu(self.linear2(x)) x = torch.nn.functional.relu(self.linear3(x)) x = self.activation_function(self.output(x)) return x
上图描述了单批次的训练逻辑。稍后,train_epoch 函数将被多次调用(选择的纪元数)。
def train_epoch(model, optimizer, dataloader_train): # set the model to the training mode # https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train model.train() losses = [] accuracies = [] for step, (X_batch, y_batch) in enumerate(dataloader_train): ### forward propagation # get model output and use loss function y_pred = model(X_batch) # get class probabilities with shape [N,1] # apply loss function on predicted probabilities and ground truth loss = model.loss_function(y_pred, y_batch) ### backward propagation # set gradients to zero before backpropagation # https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html optimizer.zero_grad() # compute gradients # https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html loss.backward() # update weights # https://pytorch.org/docs/stable/optim.html#taking-an-optimization-step optimizer.step() # update model weights # calculate batch accuracy acc = compute_accuracy(y_pred, y_batch) # append batch loss and accuracy to corresponding lists for later use accuracies.append(acc) losses.append(float(loss.detach().numpy())) # compute average epoch accuracy train_acc = np.array(accuracies).mean() # compute average epoch loss loss_epoch = np.array(losses).mean() return train_acc, loss_epoch
评估函数迭代提供的 PyTorch 数据加载器,计算当前模型精度并返回平均损失和平均精度。
def evaluate(model, dataloader_in): # set the model to the evaluation mode # https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval model.eval() val_acc_epoch = 0 losses = [] accuracies = [] # disable gradient calculation for evaluation # https://pytorch.org/docs/stable/generated/torch.no_grad.html with torch.no_grad(): for step, (X_batch, y_batch) in enumerate(dataloader_in): # get predictions y_pred = model(X_batch) # calculate loss loss = model.loss_function(y_pred, y_batch) # calculate batch accuracy acc = compute_accuracy(y_pred, y_batch) accuracies.append(acc) losses.append(float(loss.detach().numpy())) # compute average accuracy val_acc = np.array(accuracies).mean() # compute average loss loss_epoch = np.array(losses).mean() return val_acc, loss_epoch
预测函数迭代提供的数据加载器,将后处理(单热解码)模型预测和地面实况值收集到 [N,1] PyTorch 数组中,并返回两个数组。稍后该函数将用于计算混淆矩阵并可视化预测。
def predict(model, dataloader): # set the model to the evaluation mode # https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval model.eval() xs, ys = next(iter(dataloader)) y_pred = torch.empty([0, ys.shape[1]]) x = torch.empty([0, xs.shape[1]]) y = torch.empty([0, ys.shape[1]]) # disable gradient calculation for evaluation # https://pytorch.org/docs/stable/generated/torch.no_grad.html with torch.no_grad(): for step, (X_batch, y_batch) in enumerate(dataloader): # get predictions y_batch_pred = model(X_batch) y_pred = torch.cat([y_pred, y_batch_pred]) y = torch.cat([y, y_batch]) x = torch.cat([x, X_batch]) # print(y_pred.shape, y.shape) y_pred = postprocessing(y_pred) y = postprocessing(y) return y_pred, y, x
为了训练模型,我们只需要调用train_epoch函数 N 次,其中 N 是纪元数。调用评估函数来记录验证数据集上的当前模型准确性。最后,根据验证准确性更新最佳模型。 model_train函数返回最佳验证精度和训练历史记录。
def model_train(model, optimizer, dataloader_train, dataloader_val, n_epochs=50): best_acc = 0 best_weights = None history = {'loss': {'train': [], 'validation': []}, 'accuracy': {'train': [], 'validation': []}} for epoch in range(n_epochs): # train on dataloader_train acc_train, loss_train = train_epoch(model, optimizer, dataloader_train) # evaluate on dataloader_val acc_val, loss_val = evaluate(model, dataloader_val) print(f'Epoch: {epoch} | Accuracy: {acc_train:.3f} / {acc_val:.3f} | ' + f'loss: {loss_train:.5f} / {loss_val:.5f}') # save epoch losses and accuracies in history dictionary history['loss']['train'].append(loss_train) history['loss']['validation'].append(loss_val) history['accuracy']['train'].append(acc_train) history['accuracy']['validation'].append(acc_val) # Save the best validation accuracy model if acc_val >= best_acc: print(f'\tBest weights updated. Old accuracy: {best_acc:.4f}. New accuracy: {acc_val:.4f}') best_acc = acc_val torch.save(model.state_dict(), 'best_weights.pt') # restore model and return best accuracy model.load_state_dict(torch.load('best_weights.pt')) return best_acc, history
让我们将所有内容放在一起并训练多类分类模型。
######################################### # Get the dataset X, y = get_dataset(n_classes=number_of_classes) print(f'Generated dataset shape. X:{X.shape}, y:{y.shape}') # change y numpy array shape from [N,] to [N, C] for multi-class classification y = preprocessing(y, n_classes=number_of_classes) print(f'Dataset shape prepared for multi-class classification with softmax activation and CE loss.') print(f'X:{X.shape}, y:{y.shape}') # Get train and validation datal loaders dataloader_train, dataloader_val = get_data_loaders(dataset=(scale(X), y), batch_size=32) # get a batch from dataloader and output intput and output shape X_0, y_0 = next(iter(dataloader_train)) print(f'Model input data shape: {X_0.shape}, output (ground truth) data shape: {y_0.shape}') ######################################### # Create ClassifierNN for multi-class classification problem # input dims: [N, features] # output dims: [N, C] where C is number of classes # activation - softmax to output [,C] probabilities so that their sum(p_1,p_2,...,p_c)=1 # loss - cross-entropy model = ClassifierNN(loss_function=cross_entropy_loss, activation_function=softmax, input_dims=X.shape[1], output_dims=y.shape[1]) ######################################### # create optimizer and train the model on the dataset optimizer = torch.optim.Adam(model.parameters(), lr=0.001) print(f'Model size: {sum([x.reshape(-1).shape[0] for x in model.parameters()])} parameters') print('#'*10) print('Start training') acc, history = model_train(model, optimizer, dataloader_train, dataloader_val, n_epochs=20) print('Finished training') print('#'*10) print("Model accuracy: %.2f%%" % (acc*100))
预期输出应类似于下面提供的输出。
Generated dataset shape. X:(10000, 20), y:(10000,) Dataset shape prepared for multi-class classification with softmax activation and CE loss. X:(10000, 20), y:(10000, 4) Model input data shape: torch.Size([32, 20]), output (ground truth) data shape: torch.Size([32, 4]) Model size: 27844 parameters ########## Start training Epoch: 0 | Accuracy: 0.682 / 0.943 | loss: 0.78574 / 0.37459 Best weights updated. Old accuracy: 0.0000. New accuracy: 0.9435 Epoch: 1 | Accuracy: 0.960 / 0.967 | loss: 0.20272 / 0.17840 Best weights updated. Old accuracy: 0.9435. New accuracy: 0.9668 Epoch: 2 | Accuracy: 0.978 / 0.962 | loss: 0.12004 / 0.17931 Epoch: 3 | Accuracy: 0.984 / 0.979 | loss: 0.10028 / 0.13246 Best weights updated. Old accuracy: 0.9668. New accuracy: 0.9787 Epoch: 4 | Accuracy: 0.985 / 0.981 | loss: 0.08838 / 0.12720 Best weights updated. Old accuracy: 0.9787. New accuracy: 0.9807 Epoch: 5 | Accuracy: 0.986 / 0.981 | loss: 0.08096 / 0.12174 Best weights updated. Old accuracy: 0.9807. New accuracy: 0.9812 Epoch: 6 | Accuracy: 0.986 / 0.981 | loss: 0.07944 / 0.12036 Epoch: 7 | Accuracy: 0.988 / 0.982 | loss: 0.07605 / 0.11773 Best weights updated. Old accuracy: 0.9812. New accuracy: 0.9821 Epoch: 8 | Accuracy: 0.989 / 0.982 | loss: 0.07168 / 0.11514 Best weights updated. Old accuracy: 0.9821. New accuracy: 0.9821 Epoch: 9 | Accuracy: 0.989 / 0.983 | loss: 0.06890 / 0.11409 Best weights updated. Old accuracy: 0.9821. New accuracy: 0.9831 Epoch: 10 | Accuracy: 0.989 / 0.984 | loss: 0.06750 / 0.11128 Best weights updated. Old accuracy: 0.9831. New accuracy: 0.9841 Epoch: 11 | Accuracy: 0.990 / 0.982 | loss: 0.06505 / 0.11265 Epoch: 12 | Accuracy: 0.990 / 0.983 | loss: 0.06507 / 0.11272 Epoch: 13 | Accuracy: 0.991 / 0.985 | loss: 0.06209 / 0.11240 Best weights updated. Old accuracy: 0.9841. New accuracy: 0.9851 Epoch: 14 | Accuracy: 0.990 / 0.984 | loss: 0.06273 / 0.11157 Epoch: 15 | Accuracy: 0.991 / 0.984 | loss: 0.05998 / 0.11029 Epoch: 16 | Accuracy: 0.990 / 0.985 | loss: 0.06056 / 0.11164 Epoch: 17 | Accuracy: 0.991 / 0.984 | loss: 0.05981 / 0.11096 Epoch: 18 | Accuracy: 0.991 / 0.985 | loss: 0.05642 / 0.10975 Best weights updated. Old accuracy: 0.9851. New accuracy: 0.9851 Epoch: 19 | Accuracy: 0.990 / 0.986 | loss: 0.05929 / 0.10821 Best weights updated. Old accuracy: 0.9851. New accuracy: 0.9856 Finished training ########## Model accuracy: 98.56%
def plot_history(history): fig = plt.figure(figsize=(8, 4), facecolor=(0.0, 1.0, 0.0)) ax = fig.add_subplot(1, 2, 1) ax.plot(np.arange(0, len(history['loss']['train'])), history['loss']['train'], color='red', label='train') ax.plot(np.arange(0, len(history['loss']['validation'])), history['loss']['validation'], color='blue', label='validation') ax.set_title('Loss history') ax.set_facecolor((0.0, 1.0, 0.0)) ax.legend() ax = fig.add_subplot(1, 2, 2) ax.plot(np.arange(0, len(history['accuracy']['train'])), history['accuracy']['train'], color='red', label='train') ax.plot(np.arange(0, len(history['accuracy']['validation'])), history['accuracy']['validation'], color='blue', label='validation') ax.set_title('Accuracy history') ax.legend() fig.tight_layout() ax.set_facecolor((0.0, 1.0, 0.0)) fig.show()
acc_train, _ = evaluate(model, dataloader_train) acc_validation, _ = evaluate(model, dataloader_val) print(f'Accuracy - Train: {acc_train:.4f} | Validation: {acc_validation:.4f}')
Accuracy - Train: 0.9901 | Validation: 0.9851
val_preds, val_y, _ = predict(model, dataloader_val) print(val_preds.shape, val_y.shape) multiclass_confusion_matrix = torchmetrics.classification.ConfusionMatrix('multiclass', num_classes=number_of_classes) cm = multiclass_confusion_matrix(val_preds, val_y) print(cm) df_cm = pd.DataFrame(cm) plt.figure(figsize = (6,5), facecolor=(0.0,1.0,0.0)) sn.heatmap(df_cm, annot=True, fmt='d') plt.show()
val_preds, val_y, val_x = predict(model, dataloader_val) val_preds, val_y, val_x = val_preds.numpy(), val_y.numpy(), val_x.numpy() show_dataset(val_x, val_y,'Ground Truth') show_dataset(val_x, val_preds, 'Predictions')
对于多类分类,需要使用softmax激活和交叉熵损失。从二元分类切换到多类分类需要进行一些代码修改:数据预处理和后处理、激活和损失函数。此外,您可以通过使用 one-hot 编码、softmax 和交叉熵损失将类别数设置为 2 来解决二元分类问题。