Multi-class Classification: Understanding Activation and Loss Functions in Neural Networks

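In my previous article, I described classification problems and divided them into 3 types (binary, multi-class, and multi-label). I also answered the question "What activation and loss functions do you need to use to solve binary classification tasks?". In this article, I will answer the same question for multi-class classification tasks and provide an example of a PyTorch implementation in Google Colab.

What activation and loss functions do you need to use to solve multi-class classification tasks?

The provided code is largely based on the binary classification implementation. Switching from binary to multi-class classification requires only a few modifications to the code and the NN. The modified code blocks are marked with (Changed) for easier navigation.

1 Why is it important to understand the activation function and loss used for multi-class classification?

As will be shown later, the activation function used for multi-class classification is the softmax activation. Softmax is widely used in NN architectures beyond multi-class classification. For example, softmax is at the core of the multi-head attention block used in Transformer models (see "Attention Is All You Need"), because it converts input values into a probability distribution (more on this later).

If you know the motivation behind applying the softmax activation and the cross-entropy (CE) loss to multi-class classification problems, you will be able to understand and implement much more complex NN architectures and loss functions.

2 Multi-class classification problem formulation

A multi-class classification problem can be represented as a set of samples {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}. Here x_i is an m-dimensional vector containing the features of sample i, and y_i is the class to which x_i belongs. The label y_i can take one of k values, where k > 2 is the number of classes. The goal is to build a model that predicts the label y_i for each input sample x_i.

Examples of tasks that can be treated as multi-class classification problems:

- Medical diagnosis: diagnosing a patient with one of several diseases based on the provided data (medical history, test results, symptoms)
- Product categorization: automatically classifying products on an e-commerce platform
- Weather forecasting: classifying the future weather as sunny, cloudy, rainy, etc.
- Classifying movies, music, and articles into genres
- Classifying online customer reviews into categories such as product feedback, service feedback, complaints, etc.

3 Activation and loss functions for multi-class classification

In multi-class classification, you are given:

- a set of samples {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}
- x_i, an m-dimensional vector containing the features of sample i
- y_i, the class to which x_i belongs, which can take one of k values, where k > 2 is the number of classes

To build a multi-class classification NN as a probabilistic classifier, we need:

- An output fully connected layer of size k
- Output values in the range [0, 1]
- Output values that sum to 1. In multi-class classification, each input x can belong to only one class (mutually exclusive classes), so the probabilities of all classes should sum to 1: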
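$$p_1 + p_2 + \dots + p_k = 1$$

- A loss function that has its lowest value when the prediction and the ground truth are the same

3.1 Softmax activation function

The last linear layer of a neural network outputs a vector of "raw output values". In classification, these values represent the model's confidence that the input belongs to each of the k classes. As discussed earlier, the output layer needs to have size k. Its output values should represent the probability p_i of each of the k classes, with $\sum_{i=1}^{k} p_i = 1$.

The previous article on binary classification used the sigmoid activation to convert NN output values into probabilities. Let's apply sigmoid to k output values in the range [-3, 3] and check whether it satisfies the requirements listed earlier:

- the k output values should be in the range (0, 1), where k is the number of classes
- the k output values should sum to 1

The previous article showed that the sigmoid function maps input values into the range (0, 1), so the first requirement holds. Let's check the second one. In the example table below, I processed a vector of size k = 7 with the sigmoid activation and summed the results. The 7 values add up to 3.5, not 1. A simple way to fix this is to divide each of the k values by their sum.

| Input | -3 | -2 | -1 | 0 | 1 | 2 | 3 | Sum |
|---|---|---|---|---|---|---|---|---|
| Sigmoid output | 0.04743 | 0.11920 | 0.26894 | 0.50000 | 0.73106 | 0.88080 | 0.95257 | 3.5000 |

Another approach is to take the exponent of each input value and divide it by the sum of the exponents of all input values. This is the softmax function, where z_i is the raw output value for class i:

$$p_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}$$

The softmax function converts a vector of real numbers into a vector of probabilities. Each probability in the result is in the range (0, 1), and the probabilities sum to 1.

| Input | -3 | -2 | -1 | 0 | 1 | 2 | 3 | Sum |
|---|---|---|---|---|---|---|---|---|
| Softmax output | 0.00157 | 0.00426 | 0.01159 | 0.03150 | 0.08563 | 0.23276 | 0.63270 | 1 |

One thing to keep in mind when using softmax: each output value p_i depends on all values in the input array, because it is divided by the sum of the exponents of all values. The table below demonstrates this. The two input vectors share 3 values {1, 3, 4}, but their softmax outputs differ because the second element differs (2 vs. 4).

| Vector | x_1 | x_2 | x_3 | x_4 |
|---|---|---|---|---|
| Input 1 | 1 | 2 | 3 | 4 |
| Softmax 1 | 0.0321 | 0.0871 | 0.2369 | 0.6439 |
| Input 2 | 1 | 4 | 3 | 4 |
| Softmax 2 | 0.0206 | 0.4136 | 0.1522 | 0.4136 |

3.2 Cross-entropy loss

The binary cross-entropy (BCE) loss is defined as:

$$L_{BCE} = -\frac{1}{N}\sum_{n=1}^{N}\left[y_n \log p_n + (1 - y_n)\log(1 - p_n)\right]$$

In binary classification, there are two output probabilities, p_n and (1 - p_n), and two ground truth values, y_n and (1 - y_n).

Multi-class classification uses a generalization of the BCE loss to k classes, the cross-entropy loss:

$$L_{CE} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{k} y_{n,i} \log p_{n,i}$$

Here N is the number of input samples, y_{n,i} is the ground truth (1 if sample n belongs to class i, 0 otherwise), and p_{n,i} is the predicted probability of class i for sample n.

4 Multi-class classification NN example with PyTorch

To implement a probabilistic multi-class classification NN, we need:

- Ground truth and predictions with dimensions [N, k], where N is the number of input samples and k is the number of classes. Each class id needs to be encoded into a vector of size k.
- A final linear layer of size k
- The outputs of the last layer processed with the softmax activation to obtain output probabilities
- The CE loss applied to the predicted class probabilities and the ground truth
- A way to find the output class id from an output vector of size k

Most of the code is based on the code from the previous article on binary classification. The changed parts are marked with (Changed):

- data preprocessing and postprocessing
- activation function
- loss function
- performance metric
- confusion matrix

Let's write a neural network for multi-class classification using the PyTorch framework.

First, install torchmetrics. This package will be used later to compute the classification accuracy and the confusion matrix.

```python
# used for accuracy metric and confusion matrix
!pip install torchmetrics
```

Import the packages that will be used later in the code:

```python
from sklearn.datasets import make_classification
import numpy as np
import torch
import torchmetrics
import matplotlib.pyplot as plt
import seaborn as sn
import pandas as pd
from sklearn.decomposition import PCA
```

4.1 Create the dataset

Set the global variable with the number of classes. If you set it to 2, you get a binary classification NN that uses softmax and cross-entropy loss.

```python
number_of_classes = 4
```

I will use sklearn.datasets.make_classification to generate a multi-class classification dataset:

- n_samples is the number of generated samples
- n_features sets the number of dimensions of the generated samples X
- n_classes is the number of classes in the generated dataset; a multi-class classification problem should have more than 2 classes

The generated dataset will have X with shape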
[n_samples, n_features] and Y with shape [n_samples, ].

```python
def get_dataset(n_samples=10000, n_features=20, n_classes=2):
    # https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn.datasets.make_classification
    data_X, data_y = make_classification(n_samples=n_samples, n_features=n_features, n_classes=n_classes,
                                         n_informative=n_classes, n_redundant=0, n_clusters_per_class=2,
                                         random_state=42,
                                         class_sep=4)
    return data_X, data_y
```

4.2 Dataset visualization

Define functions to visualize the dataset and print its statistics. The show_dataset function uses PCA to reduce the dimensionality of X from any number to 2, so the input data X can be shown in a 2D plot.

```python
def print_dataset(X, y):
    print(f'X shape: {X.shape}, min: {X.min()}, max: {X.max()}')
    print(f'y shape: {y.shape}')
    print(y[:10])

def show_dataset(X, y, title=''):
    if X.shape[1] > 2:
        X_pca = PCA(n_components=2).fit_transform(X)
    else:
        X_pca = X
    fig = plt.figure(figsize=(4, 4))
    plt.scatter(x=X_pca[:, 0], y=X_pca[:, 1], c=y, alpha=0.5)
    # generate colors for all classes
    colors = plt.cm.rainbow(np.linspace(0, 1, number_of_classes))
    # iterate over classes and visualize them with the dedicated color
    for class_id in range(number_of_classes):
        class_mask = np.argwhere(y == class_id)
        X_class = X_pca[class_mask[:, 0]]
        plt.scatter(x=X_class[:, 0], y=X_class[:, 1],
                    c=np.full((X_class[:, 0].shape[0], 4), colors[class_id]),
                    label=class_id, alpha=0.5)
    plt.title(title)
    plt.legend(loc="best", title="Classes")
    plt.xticks()
    plt.yticks()
    plt.show()
```

4.3 Dataset scaler

Scale the dataset features X to the range [0, 1] with a min-max scaler. This is usually done for faster and more stable training.

```python
def scale(x_in):
    # per-feature (column-wise) min-max scaling to [0, 1]
    return (x_in - x_in.min(axis=0))/(x_in.max(axis=0)-x_in.min(axis=0))
```

Let's print the generated dataset statistics and visualize it with the functions above.

```python
X, y = get_dataset(n_classes=number_of_classes)
print('before scaling')
print_dataset(X, y)
show_dataset(X, y, 'before')

X_scaled = scale(X)
print('after scaling')
print_dataset(X_scaled, y)
show_dataset(X_scaled, y, 'after')
```

You should get the following output:

```
before scaling
X shape: (10000, 20), min: -9.549551632357336, max: 9.727761741276673
y shape: (10000,)
[0 2 1 2 0 2 0 1 1 2]
after scaling
X shape: (10000, 20), min: 0.0, max: 1.0
y shape: (10000,)
[0 2 1 2 0 2 0 1 1 2]
```
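The scale function divides every feature by its own max - min range. A feature that is constant across the dataset would therefore make the denominator zero and produce NaN values in NumPy. This does not happen with the make_classification data used here.

Below is a minimal, dependency-free sketch of the same per-column formula. It assumes constant columns should be mapped to 0.0 rather than divided by zero; min_max_scale is a hypothetical helper name, not part of the article's code.

```python
def min_max_scale(rows):
    """Scale each column of a list-of-rows matrix to [0, 1].

    Hypothetical helper: same formula as scale(), but a column with
    max == min is mapped to 0.0 instead of dividing by zero.
    """
    n_cols = len(rows[0])
    col_min = [min(row[j] for row in rows) for j in range(n_cols)]
    col_max = [max(row[j] for row in rows) for j in range(n_cols)]
    scaled = []
    for row in rows:
        scaled.append([
            (row[j] - col_min[j]) / (col_max[j] - col_min[j])
            if col_max[j] > col_min[j] else 0.0  # guard: constant column
            for j in range(n_cols)
        ])
    return scaled


# the middle column is constant
data = [[-2.0, 5.0, 1.0],
        [0.0, 5.0, 3.0],
        [2.0, 5.0, 2.0]]
print(min_max_scale(data))  # [[0.0, 0.0, 0.0], [0.5, 0.0, 1.0], [1.0, 0.0, 0.5]]
```

Mapping a constant feature to 0.0 is only one possible choice. Dropping such features before training is another.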
Min-max scaling does not distort the dataset features; it linearly transforms each of them into the range [0, 1]. The "after" plot appears distorted compared with the "before" plot because PCA reduces the 20 dimensions to 2. Min-max scaling shifts and rescales each feature by a different factor, which can change the directions PCA finds.

Create the PyTorch data loaders. sklearn.datasets.make_classification generates the dataset as two numpy arrays. To create PyTorch data loaders, we convert these arrays into torch.tensor objects and wrap them in torch.utils.data.TensorDataset.

```python
def get_data_loaders(dataset, batch_size=32, shuffle=True):
    data_X, data_y = dataset
    # https://pytorch.org/docs/stable/data.html#torch.utils.data.TensorDataset
    torch_dataset = torch.utils.data.TensorDataset(torch.tensor(data_X, dtype=torch.float32),
                                                   torch.tensor(data_y, dtype=torch.float32))
    # 80/20 train/validation split; the validation size is the remainder,
    # so the two lengths always add up to len(torch_dataset)
    n_train = int(len(torch_dataset) * 0.8)
    n_val = len(torch_dataset) - n_train
    # https://pytorch.org/docs/stable/data.html#torch.utils.data.random_split
    train_dataset, val_dataset = torch.utils.data.random_split(torch_dataset, [n_train, n_val],
                                                               torch.Generator().manual_seed(42))
    # https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
    loader_train = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=shuffle)
    loader_val = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=shuffle)
    return loader_train, loader_val
```

Test the PyTorch data loaders:

```python
dataloader_train, dataloader_val = get_data_loaders(get_dataset(n_classes=number_of_classes), batch_size=32)

train_batch_0 = next(iter(dataloader_train))
print(f'Batches in the train dataloader: {len(dataloader_train)}, X: {train_batch_0[0].shape}, Y: {train_batch_0[1].shape}')
val_batch_0 = next(iter(dataloader_val))
print(f'Batches in the validation dataloader: {len(dataloader_val)}, X: {val_batch_0[0].shape}, Y: {val_batch_0[1].shape}')
```

Output:

```
Batches in the train dataloader: 250, X: torch.Size([32, 20]), Y: torch.Size([32])
Batches in the validation dataloader: 63, X: torch.Size([32, 20]), Y: torch.Size([32])
```

4.4 Dataset preprocessing and postprocessing (Changed)

Create the preprocessing and postprocessing functions. As you may have noticed, the current Y shape is [N], but we need it to be [N, number_of_classes]. To get there, we one-hot encode the values in the Y vector.

One-hot encoding is the process of converting class indices into a binary representation in which each class is represented by a unique binary vector.

In other words: create a zero vector of size [number_of_classes] and set the element at position class_id to 1, where class_id is one of
{0, 1, …, number_of_classes-1}:

```
0 >> [1. 0. 0. 0.]
1 >> [0. 1. 0. 0.]
2 >> [0. 0. 1. 0.]
3 >> [0. 0. 0. 1.]
```

PyTorch tensors can be processed with torch.nn.functional.one_hot, and the numpy implementation is straightforward. The output has shape [N, number_of_classes].

```python
def preprocessing(y, n_classes):
    '''
    one-hot encoding for input numpy array or pytorch Tensor
    input: y - [N,] numpy array or pytorch Tensor
    output: [N, n_classes] the same type as input
    '''
    assert type(y)==np.ndarray or torch.is_tensor(y), f'input should be numpy array or torch tensor. Received input is: {type(y)}'
    assert len(y.shape)==1, f'input shape should be [N,]. Received input shape is: {y.shape}'
    if torch.is_tensor(y):
        # one_hot expects integer (int64) class ids
        return torch.nn.functional.one_hot(y.long(), num_classes=n_classes)
    else:
        categorical = np.zeros([y.shape[0], n_classes])
        # set the element at position class_id to 1 in every row
        categorical[np.arange(y.shape[0]), y.astype(int)] = 1
        return categorical
```

To convert a one-hot encoded vector back to a class id, find the index of its maximum element with torch.argmax or np.argmax, as shown below. The same function also works on predicted class probabilities, because the predicted class is the one with the highest probability.

```python
def postprocessing(categorical):
    '''
    one-hot to classes decoding with .argmax()
    input: categorical - [N,classes] numpy array or pytorch Tensor
    output: [N,] the same type as input
    '''
    assert type(categorical)==np.ndarray or torch.is_tensor(categorical), f'input should be numpy array or torch tensor. Received input is: {type(categorical)}'
    assert len(categorical.shape)==2, f'input shape should be [N,classes]. Received input shape is: {categorical.shape}'
    if torch.is_tensor(categorical):
        return torch.argmax(categorical, dim=1)
    else:
        return np.argmax(categorical, axis=1)
```
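The same encode/decode round trip can be sketched without NumPy or PyTorch. The helper names one_hot and argmax_decode are illustrative, not part of the article's code. Like np.argmax, argmax_decode returns the first index on ties, and it accepts rows of predicted probabilities as well as one-hot rows.

```python
def one_hot(class_ids, n_classes):
    """Encode integer class ids as one-hot rows (plain-Python sketch)."""
    return [[1.0 if j == c else 0.0 for j in range(n_classes)] for c in class_ids]


def argmax_decode(rows):
    """Return the index of the largest value in each row.

    Works for one-hot rows and for rows of predicted probabilities.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


ids = [0, 2, 1, 3]
encoded = one_hot(ids, n_classes=4)
print(encoded)
# [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(argmax_decode(encoded))            # [0, 2, 1, 3]
print(argmax_decode([[0.1, 0.7, 0.2]]))  # [1]
```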
Test the defined preprocessing and postprocessing functions.

```python
y = get_dataset(n_classes=number_of_classes)[1]
y_one_hot = preprocessing(y, n_classes=number_of_classes)
y_class = postprocessing(y_one_hot)
print(f'y shape: {y.shape}, y preprocessed shape: {y_one_hot.shape}, y postprocessed shape: {y_class.shape}')
print('Preprocessing does one-hot encoding of class ids.')
print('Postprocessing does one-hot decoding of one-hot encoded class ids.')
print('id>>one-hot encoding>>id')
for i in range(10):
    print(f'{y[i]} >> {y_one_hot[i]} >> {y_class[i]}')
```

Output:

```
y shape: (10000,), y preprocessed shape: (10000, 4), y postprocessed shape: (10000,)
Preprocessing does one-hot encoding of class ids.
Postprocessing does one-hot decoding of one-hot encoded class ids.
id>>one-hot encoding>>id
0 >> [1. 0. 0. 0.] >> 0
2 >> [0. 0. 1. 0.] >> 2
1 >> [0. 1. 0. 0.] >> 1
2 >> [0. 0. 1. 0.] >> 2
0 >> [1. 0. 0. 0.] >> 0
2 >> [0. 0. 1. 0.] >> 2
0 >> [1. 0. 0. 0.] >> 0
1 >> [0. 1. 0. 0.] >> 1
1 >> [0. 1. 0. 0.] >> 1
2 >> [0. 0. 1. 0.] >> 2
```

4.5 Create and train a multi-class classification model

This section implements all the functions required to train the multi-class classification model.

4.5.1 Softmax activation (Changed)

An implementation of the softmax formula with PyTorch:

```python
def softmax(x):
    assert len(x.shape)==2, f'input shape should be [N,classes]. Received input shape is: {x.shape}'
    # Subtract the maximum value for numerical stability
    # you can find explanation here: https://www.deeplearningbook.org/contents/numerical.html
    x = x - torch.max(x, dim=1, keepdim=True)[0]
    # Exponentiate the values
    exp_x = torch.exp(x)
    # Sum along the class dimension
    sum_exp_x = torch.sum(exp_x, dim=1, keepdim=True)
    # Compute the softmax
    return exp_x / sum_exp_x
```

Let's test softmax:

- generate a test_input tensor with values from -10 to 10 (step 1) using torch.arange
- reshape it into a tensor of shape [7, 3]
- process test_input with the implemented softmax function and with the PyTorch default implementation torch.nn.functional.softmax
- compare the results (they should be the same)
- output the softmax values and sums of all seven [1, 3] rows

```python
test_input = torch.arange(-10, 11, 1, dtype=torch.float32)
test_input = test_input.reshape(-1, 3)
softmax_output = softmax(test_input)
print(f'Input data shape: {test_input.shape}')
print(f'input data range: [{test_input.min():.3f}, {test_input.max():.3f}]')
print(f'softmax output data range: [{softmax_output.min():.3f}, {softmax_output.max():.3f}]')
print(f'softmax output data sum along axis 1: [{softmax_output.sum(axis=1).numpy()}]')
softmax_output_pytorch = torch.nn.functional.softmax(test_input, dim=1)
# compare with a floating-point tolerance instead of exact equality
print(f'softmax output is the same with pytorch implementation: {torch.allclose(softmax_output_pytorch, softmax_output)}')

print('Softmax activation changes values in the chosen axis (1) so that they always sum up to 1:')
for i in range(softmax_output.shape[0]):
    print(f'\t{i}. Sum before softmax: {test_input[i].sum().numpy()} | Sum after softmax: {softmax_output[i].sum().numpy()}')
    print(f'\t values before softmax: {test_input[i].numpy()}, softmax output values: {softmax_output[i].numpy()}')
```

Output:

```
Input data shape: torch.Size([7, 3])
input data range: [-10.000, 10.000]
softmax output data range: [0.090, 0.665]
softmax output data sum along axis 1: [[1. 1. 1. 1. 1. 1. 1.]]
softmax output is the same with pytorch implementation: True
Softmax activation changes values in the chosen axis (1) so that they always sum up to 1:
	0. Sum before softmax: -27.0 | Sum after softmax: 1.0
	 values before softmax: [-10. -9. -8.], softmax output values: [0.09003057 0.24472848 0.66524094]
	1. Sum before softmax: -18.0 | Sum after softmax: 1.0
	 values before softmax: [-7. -6. -5.], softmax output values: [0.09003057 0.24472848 0.66524094]
	2. Sum before softmax: -9.0 | Sum after softmax: 1.0
	 values before softmax: [-4. -3. -2.], softmax output values: [0.09003057 0.24472848 0.66524094]
	3. Sum before softmax: 0.0 | Sum after softmax: 1.0
	 values before softmax: [-1. 0. 1.], softmax output values: [0.09003057 0.24472848 0.66524094]
	4. Sum before softmax: 9.0 | Sum after softmax: 1.0
	 values before softmax: [2. 3. 4.], softmax output values: [0.09003057 0.24472848 0.66524094]
	5. Sum before softmax: 18.0 | Sum after softmax: 1.0
	 values before softmax: [5. 6. 7.], softmax output values: [0.09003057 0.24472848 0.66524094]
	6. Sum before softmax: 27.0 | Sum after softmax: 1.0
	 values before softmax: [ 8. 9. 10.], softmax output values: [0.09003057 0.24472848 0.66524094]
```
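All seven rows in the output above produce the same softmax values. Each row has the form [a, a+1, a+2], and softmax is unchanged when the same constant c is added to every input: $e^{z_i + c} / \sum_j e^{z_j + c} = e^{z_i} / \sum_j e^{z_j}$. This property is also what makes subtracting the row maximum in softmax() safe.

The plain-Python sketch below checks the shift invariance and shows the overflow that the max subtraction avoids. The names softmax_list and naive_softmax_list are illustrative, not part of the article's code.

```python
import math


def softmax_list(values):
    """Numerically stable softmax for a list of floats."""
    shift = max(values)  # subtracting a constant does not change the result
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def naive_softmax_list(values):
    """Softmax without the max shift; math.exp raises OverflowError for large inputs."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Rows 0 and 6 of test_input differ by a constant shift of 18
row_a = softmax_list([-10.0, -9.0, -8.0])
row_b = softmax_list([8.0, 9.0, 10.0])
print(row_a)
print(row_b)

# Large inputs: the naive version overflows, the shifted version does not
try:
    naive_softmax_list([1000.0, 1001.0, 1002.0])
except OverflowError as err:
    print('naive softmax failed:', err)
print(softmax_list([1000.0, 1001.0, 1002.0]))
```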
4.5.2 Loss function: cross-entropy (Changed)

An implementation of the CE formula with PyTorch:

```python
def cross_entropy_loss(probabilities, labels):
    # probabilities: [N, C] softmax outputs, labels: [N, C] one-hot ground truth
    # CE = -(1/N) * sum over samples and classes of labels * log(probabilities)
    loss = -torch.sum(labels * torch.log(probabilities)) / probabilities.size(0)
    return loss
```

Test the CE implementation:

- generate a test_input tensor of shape [10, 5] with values in the range [0, 1) using torch.rand
- generate a test_target tensor of shape [10,] with integer class ids in the range [0, 4] using torch.randint
- one-hot encode test_target
- compute the loss with the implemented cross_entropy_loss function and with the PyTorch implementation torch.nn.functional.cross_entropy
- compare the results (they should be the same)

```python
test_input = torch.rand(10, 5, requires_grad=False)
test_target = torch.randint(0, 5, (10,), requires_grad=False)
test_target = preprocessing(test_target, n_classes=5).float()
print(f'test_input shape: {list(test_input.shape)}, test_target shape: {list(test_target.shape)}')

# get loss with the cross_entropy_loss implementation
loss = cross_entropy_loss(softmax(test_input), test_target)
# get loss with the torch.nn.functional.cross_entropy implementation
# !!!torch.nn.functional.cross_entropy applies (log-)softmax to the input logits,
# !!!so pass it test_input without softmax activation
loss_pytorch = torch.nn.functional.cross_entropy(test_input, test_target)

# compare with a floating-point tolerance instead of exact equality
print(f'Loss outputs are the same: {torch.allclose(loss, loss_pytorch)}')
```

Expected output:

```
test_input shape: [10, 5], test_target shape: [10, 5]
Loss outputs are the same: True
```
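With a one-hot target, only the true-class term of the inner sum in the CE formula is non-zero, so each sample contributes -log p_true. Note also the difference in inputs. cross_entropy_loss takes the log of softmax probabilities, while torch.nn.functional.cross_entropy receives raw logits. Working from logits allows the log-softmax to be computed as z_i - log Σ_j e^{z_j}, which stays finite even when a probability underflows to zero.

The plain-Python sketch below compares the two routes. All helper names are illustrative; it is not how PyTorch is implemented internally, only the same math.

```python
import math


def cross_entropy_from_probs(probs, one_hot):
    """Mean CE over samples: -(1/N) * sum over samples and classes of y * log(p).

    Terms with y == 0 are skipped, so a zero probability on a wrong class is fine
    (in the tensor version, 0 * log(0) would give nan).
    """
    total = 0.0
    for p_row, y_row in zip(probs, one_hot):
        total -= sum(y * math.log(p) for p, y in zip(p_row, y_row) if y > 0)
    return total / len(probs)


def log_softmax(logits):
    """log(softmax(z)) computed as z_i - logsumexp(z), shifted by max(z) for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def cross_entropy_from_logits(logits_batch, one_hot):
    """Mean CE computed directly from raw logits via log-softmax."""
    total = 0.0
    for z_row, y_row in zip(logits_batch, one_hot):
        total -= sum(y * lp for lp, y in zip(log_softmax(z_row), y_row))
    return total / len(logits_batch)


logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
probs = [[math.exp(lp) for lp in log_softmax(row)] for row in logits]

# Both routes give the same mean loss for moderate logits
print(cross_entropy_from_probs(probs, targets), cross_entropy_from_logits(logits, targets))

# With a one-hot target, one sample's CE is just -log(p_true)
print(cross_entropy_from_probs([[0.1, 0.2, 0.6, 0.1]], [[0, 0, 1, 0]]), -math.log(0.6))

# Extreme logits: the true-class probability underflows to 0.0, so log(p) fails,
# while the log-softmax route stays finite
extreme_logits, extreme_target = [[1000.0, 0.0]], [[0.0, 1.0]]
extreme_probs = [[math.exp(lp) for lp in log_softmax(extreme_logits[0])]]
try:
    cross_entropy_from_probs(extreme_probs, extreme_target)
except ValueError as err:
    print('probability route failed:', err)
print(cross_entropy_from_logits(extreme_logits, extreme_target))  # 1000.0
```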
4.5.3 Accuracy metric (Changed)

I will use torchmetrics to compute the accuracy from the model predictions and the ground truth. Creating a multi-class classification accuracy metric requires two parameters:

- the task type "multiclass"
- the number of classes, num_classes

```python
# https://torchmetrics.readthedocs.io/en/stable/classification/accuracy.html#module-interface
accuracy_metric = torchmetrics.classification.Accuracy(task="multiclass", num_classes=number_of_classes)

def compute_accuracy(y_pred, y):
    assert len(y_pred.shape)==2 and y_pred.shape[1] == number_of_classes, 'y_pred shape should be [N, C]'
    assert len(y.shape)==2 and y.shape[1] == number_of_classes, 'y shape should be [N, C]'
    # decode probabilities and one-hot ground truth into class ids, then compare them
    return accuracy_metric(postprocessing(y_pred), postprocessing(y))
```

4.5.4 Neural network model

The neural network used in this example is a deep NN with 2 hidden layers. The input and hidden layers use the ReLU activation. The last layer uses the activation function passed to the class constructor; here that is the softmax activation implemented earlier.

```python
class ClassifierNN(torch.nn.Module):
    def __init__(self, loss_function, activation_function, input_dims=2, output_dims=1):
        super().__init__()
        self.linear1 = torch.nn.Linear(input_dims, input_dims * 4)
        self.linear2 = torch.nn.Linear(input_dims * 4, input_dims * 8)
        self.linear3 = torch.nn.Linear(input_dims * 8, input_dims * 4)
        self.output = torch.nn.Linear(input_dims * 4, output_dims)
        self.loss_function = loss_function
        self.activation_function = activation_function

    def forward(self, x):
        x = torch.nn.functional.relu(self.linear1(x))
        x = torch.nn.functional.relu(self.linear2(x))
        x = torch.nn.functional.relu(self.linear3(x))
        x = self.activation_function(self.output(x))
        return x
```

4.5.5 Training, evaluation, and prediction

The figure above describes the training logic for a single batch. Later, the train_epoch function will be called once per epoch, for the chosen number of epochs.

```python
def train_epoch(model, optimizer, dataloader_train):
    # set the model to the training mode
    # https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train
    model.train()
    losses = []
    accuracies = []
    for step, (X_batch, y_batch) in enumerate(dataloader_train):
        ### forward propagation
        # get model output and use loss function
        y_pred = model(X_batch)  # get class probabilities with shape [N, C]
        # apply loss function on predicted probabilities and ground truth
        loss = model.loss_function(y_pred, y_batch)

        ### backward propagation
        # set gradients to zero before backpropagation
        # https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html
        optimizer.zero_grad()
        # compute gradients
        # https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html
        loss.backward()
        # update weights
        # https://pytorch.org/docs/stable/optim.html#taking-an-optimization-step
        optimizer.step()  # update model weights

        # calculate batch accuracy
        acc = compute_accuracy(y_pred, y_batch)

        # append batch loss and accuracy to corresponding lists for later use
        accuracies.append(float(acc))
        losses.append(float(loss.detach().numpy()))

    # compute average epoch accuracy
    train_acc = np.array(accuracies).mean()
    # compute average epoch loss
    loss_epoch = np.array(losses).mean()
    return train_acc, loss_epoch
```

The evaluate function iterates over the provided PyTorch data loader, computes the current model accuracy, and returns the average accuracy and the average loss.

```python
def evaluate(model, dataloader_in):
    # set the model to the evaluation mode
    # https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval
    model.eval()
    losses = []
    accuracies = []
    # disable gradient calculation for evaluation
    # https://pytorch.org/docs/stable/generated/torch.no_grad.html
    with torch.no_grad():
        for step, (X_batch, y_batch) in enumerate(dataloader_in):
            # get predictions
            y_pred = model(X_batch)
            # calculate loss
            loss = model.loss_function(y_pred, y_batch)
            # calculate batch accuracy
            acc = compute_accuracy(y_pred, y_batch)
            accuracies.append(float(acc))
            losses.append(float(loss.detach().numpy()))
    # compute average accuracy
    val_acc = np.array(accuracies).mean()
    # compute average loss
    loss_epoch = np.array(losses).mean()
    return val_acc, loss_epoch
```

The predict function iterates over the provided data loader. It collects the post-processed (one-hot decoded) model predictions and ground truth values into [N] PyTorch tensors and returns them together with the input features. Later, this function will be used to compute the confusion matrix and visualize the predictions.

```python
def predict(model, dataloader):
    # set the model to the evaluation mode
    # https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval
    model.eval()
    xs, ys = next(iter(dataloader))
    y_pred = torch.empty([0, ys.shape[1]])
    x = torch.empty([0, xs.shape[1]])
    y = torch.empty([0, ys.shape[1]])
    # disable gradient calculation for evaluation
    # https://pytorch.org/docs/stable/generated/torch.no_grad.html
    with torch.no_grad():
        for step, (X_batch, y_batch) in enumerate(dataloader):
            # get predictions
            y_batch_pred = model(X_batch)
            y_pred = torch.cat([y_pred, y_batch_pred])
            y = torch.cat([y, y_batch])
            x = torch.cat([x, X_batch])
    # one-hot decode predictions and ground truth: [N, C] -> [N]
    y_pred = postprocessing(y_pred)
    y = postprocessing(y)
    return y_pred, y, x
```

To train the model, we call the train_epoch function N times, where N is the number of epochs. After each epoch, the evaluate function logs the current model accuracy on the validation dataset, and the best model is updated based on the validation accuracy. The model_train function returns the best validation accuracy and the training history.

```python
def model_train(model, optimizer, dataloader_train, dataloader_val, n_epochs=50):
    best_acc = 0
    history = {'loss': {'train': [], 'validation': []},
               'accuracy': {'train': [], 'validation': []}}
    for epoch in range(n_epochs):
        # train on dataloader_train
        acc_train, loss_train = train_epoch(model, optimizer, dataloader_train)
        # evaluate on dataloader_val
        acc_val, loss_val = evaluate(model, dataloader_val)
        print(f'Epoch: {epoch} | Accuracy: {acc_train:.3f} / {acc_val:.3f} | ' +
              f'loss: {loss_train:.5f} / {loss_val:.5f}')

        # save epoch losses and accuracies in history dictionary
        history['loss']['train'].append(loss_train)
        history['loss']['validation'].append(loss_val)
        history['accuracy']['train'].append(acc_train)
        history['accuracy']['validation'].append(acc_val)

        # Save the best validation accuracy model
        if acc_val >= best_acc:
            print(f'\tBest weights updated. Old accuracy: {best_acc:.4f}. New accuracy: {acc_val:.4f}')
            best_acc = acc_val
            torch.save(model.state_dict(), 'best_weights.pt')
    # restore model and return best accuracy
    model.load_state_dict(torch.load('best_weights.pt'))
    return best_acc, history
```

4.5.6 Get the dataset, create the model, and train it (Changed)

Let's put everything together and train the multi-class classification model.

```python
#########################################
# Get the dataset
X, y = get_dataset(n_classes=number_of_classes)
print(f'Generated dataset shape. X:{X.shape}, y:{y.shape}')
# change y numpy array shape from [N,] to [N, C] for multi-class classification
y = preprocessing(y, n_classes=number_of_classes)
print(f'Dataset shape prepared for multi-class classification with softmax activation and CE loss.')
print(f'X:{X.shape}, y:{y.shape}')

# Get train and validation data loaders
dataloader_train, dataloader_val = get_data_loaders(dataset=(scale(X), y), batch_size=32)

# get a batch from dataloader and output input and output shape
X_0, y_0 = next(iter(dataloader_train))
print(f'Model input data shape: {X_0.shape}, output (ground truth) data shape: {y_0.shape}')

#########################################
# Create ClassifierNN for multi-class classification problem
# input dims: [N, features]
# output dims: [N, C] where C is number of classes
# activation - softmax to output [N, C] probabilities so that sum(p_1, p_2, ..., p_C)=1 for each sample
# loss - cross-entropy
model = ClassifierNN(loss_function=cross_entropy_loss,
                     activation_function=softmax,
                     input_dims=X.shape[1],
                     output_dims=y.shape[1])

#########################################
# create optimizer and train the model on the dataset
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
print(f'Model size: {sum([x.reshape(-1).shape[0] for x in model.parameters()])} parameters')
print('#'*10)
print('Start training')
acc, history = model_train(model, optimizer, dataloader_train, dataloader_val, n_epochs=20)
print('Finished training')
print('#'*10)
print("Model accuracy: %.2f%%" % (acc*100))
```

The output should be similar to the one below.

```
Generated dataset shape. X:(10000, 20), y:(10000,)
Dataset shape prepared for multi-class classification with softmax activation and CE loss.
X:(10000, 20), y:(10000, 4)
Model input data shape: torch.Size([32, 20]), output (ground truth) data shape: torch.Size([32, 4])
Model size: 27844 parameters
##########
Start training
Epoch: 0 | Accuracy: 0.682 / 0.943 | loss: 0.78574 / 0.37459
	Best weights updated. Old accuracy: 0.0000. New accuracy: 0.9435
Epoch: 1 | Accuracy: 0.960 / 0.967 | loss: 0.20272 / 0.17840
	Best weights updated. Old accuracy: 0.9435. New accuracy: 0.9668
Epoch: 2 | Accuracy: 0.978 / 0.962 | loss: 0.12004 / 0.17931
Epoch: 3 | Accuracy: 0.984 / 0.979 | loss: 0.10028 / 0.13246
	Best weights updated. Old accuracy: 0.9668. New accuracy: 0.9787
Epoch: 4 | Accuracy: 0.985 / 0.981 | loss: 0.08838 / 0.12720
	Best weights updated. Old accuracy: 0.9787. New accuracy: 0.9807
Epoch: 5 | Accuracy: 0.986 / 0.981 | loss: 0.08096 / 0.12174
	Best weights updated. Old accuracy: 0.9807. New accuracy: 0.9812
Epoch: 6 | Accuracy: 0.986 / 0.981 | loss: 0.07944 / 0.12036
Epoch: 7 | Accuracy: 0.988 / 0.982 | loss: 0.07605 / 0.11773
	Best weights updated. Old accuracy: 0.9812. New accuracy: 0.9821
Epoch: 8 | Accuracy: 0.989 / 0.982 | loss: 0.07168 / 0.11514
	Best weights updated. Old accuracy: 0.9821. New accuracy: 0.9821
Epoch: 9 | Accuracy: 0.989 / 0.983 | loss: 0.06890 / 0.11409
	Best weights updated. Old accuracy: 0.9821. New accuracy: 0.9831
Epoch: 10 | Accuracy: 0.989 / 0.984 | loss: 0.06750 / 0.11128
	Best weights updated. Old accuracy: 0.9831. New accuracy: 0.9841
Epoch: 11 | Accuracy: 0.990 / 0.982 | loss: 0.06505 / 0.11265
Epoch: 12 | Accuracy: 0.990 / 0.983 | loss: 0.06507 / 0.11272
Epoch: 13 | Accuracy: 0.991 / 0.985 | loss: 0.06209 / 0.11240
	Best weights updated. Old accuracy: 0.9841. New accuracy: 0.9851
Epoch: 14 | Accuracy: 0.990 / 0.984 | loss: 0.06273 / 0.11157
Epoch: 15 | Accuracy: 0.991 / 0.984 | loss: 0.05998 / 0.11029
Epoch: 16 | Accuracy: 0.990 / 0.985 | loss: 0.06056 / 0.11164
Epoch: 17 | Accuracy: 0.991 / 0.984 | loss: 0.05981 / 0.11096
Epoch: 18 | Accuracy: 0.991 / 0.985 | loss: 0.05642 / 0.10975
	Best weights updated. Old accuracy: 0.9851. New accuracy: 0.9851
Epoch: 19 | Accuracy: 0.990 / 0.986 | loss: 0.05929 / 0.10821
	Best weights updated. Old accuracy: 0.9851. New accuracy: 0.9856
Finished training
##########
Model accuracy: 98.56%
```

4.5.7 Plot the training history

```python
def plot_history(history):
    fig = plt.figure(figsize=(8, 4), facecolor=(0.0, 1.0, 0.0))
    ax = fig.add_subplot(1, 2, 1)
    ax.plot(np.arange(0, len(history['loss']['train'])), history['loss']['train'], color='red', label='train')
    ax.plot(np.arange(0, len(history['loss']['validation'])), history['loss']['validation'], color='blue', label='validation')
    ax.set_title('Loss history')
    ax.set_facecolor((0.0, 1.0, 0.0))
    ax.legend()
    ax = fig.add_subplot(1, 2, 2)
    ax.plot(np.arange(0, len(history['accuracy']['train'])), history['accuracy']['train'], color='red', label='train')
    ax.plot(np.arange(0, len(history['accuracy']['validation'])), history['accuracy']['validation'], color='blue', label='validation')
    ax.set_title('Accuracy history')
    ax.legend()
    fig.tight_layout()
    ax.set_facecolor((0.0, 1.0, 0.0))
    plt.show()

plot_history(history)
```

4.6 Evaluate the model

4.6.1 Compute the training and validation accuracy

```python
acc_train, _ = evaluate(model, dataloader_train)
acc_validation, _ = evaluate(model, dataloader_val)
print(f'Accuracy - Train: {acc_train:.4f} | Validation: {acc_validation:.4f}')
```

```
Accuracy - Train: 0.9901 | Validation: 0.9851
```

4.6.2 Print the confusion matrix (Changed)

```python
val_preds, val_y, _ = predict(model, dataloader_val)
print(val_preds.shape, val_y.shape)
multiclass_confusion_matrix = torchmetrics.classification.ConfusionMatrix(task='multiclass', num_classes=number_of_classes)
cm = multiclass_confusion_matrix(val_preds, val_y)
print(cm)

# convert the torch tensor to a numpy array for pandas/seaborn
df_cm = pd.DataFrame(cm.numpy())
plt.figure(figsize=(6, 5), facecolor=(0.0, 1.0, 0.0))
sn.heatmap(df_cm, annot=True, fmt='d')
plt.show()
```
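To see what the confusion matrix contains, here is a minimal plain-Python sketch with made-up labels for 4 classes; the labels and helper names are illustrative. In this sketch, cell [i][j] counts samples with ground-truth class i (row) predicted as class j (column). The diagonal therefore holds the correct predictions, and accuracy is the diagonal sum divided by the total number of samples.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = ground-truth class, columns = predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy_from_cm(cm):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


y_true = [0, 1, 2, 2, 3, 1, 0, 3]
y_pred = [0, 1, 2, 1, 3, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=4)
for row in cm:
    print(row)
# [2, 0, 0, 0]
# [0, 2, 0, 0]
# [0, 1, 1, 0]
# [1, 0, 0, 1]
print(accuracy_from_cm(cm))  # 0.75
```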
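4.6.3 Plot predictions and ground truth

```python
val_preds, val_y, val_x = predict(model, dataloader_val)
val_preds, val_y, val_x = val_preds.numpy(), val_y.numpy(), val_x.numpy()
show_dataset(val_x, val_y, 'Ground Truth')
show_dataset(val_x, val_preds, 'Predictions')
```

Conclusion

For multi-class classification, use the softmax activation and the cross-entropy loss. Switching from binary to multi-class classification requires a few code modifications: data preprocessing and postprocessing, the activation function, and the loss function. Moreover, you can solve a binary classification problem with one-hot encoding, softmax, and cross-entropy loss by setting the number of classes to 2.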
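The last claim can be checked numerically. With two classes, softmax over the logits (z_0, z_1) gives p_1 = e^{z_1}/(e^{z_0}+e^{z_1}) = 1/(1+e^{-(z_1-z_0)}), which is sigmoid(z_1 - z_0). The 2-class cross-entropy with a one-hot target then equals the BCE with y = 1 for class 1 and y = 0 for class 0.

A minimal plain-Python sketch follows; the helper names are illustrative, not part of the article's code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax_two(z0, z1):
    """Softmax over two logits, returned as (p0, p1)."""
    m = max(z0, z1)  # shift for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


def two_class_ce(p0, p1, target_class):
    """Cross-entropy for one sample with a one-hot target over 2 classes."""
    return -math.log(p1 if target_class == 1 else p0)


def bce(p, y):
    """Binary cross-entropy for one sample, p = predicted probability of class 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


for z0, z1 in [(0.0, 2.0), (1.5, -0.5), (3.0, 3.0)]:
    p0, p1 = softmax_two(z0, z1)
    p_sigmoid = sigmoid(z1 - z0)
    print(f'softmax p1={p1:.6f}  sigmoid(z1 - z0)={p_sigmoid:.6f}')
    for target in (0, 1):
        print(f'  target={target}: CE={two_class_ce(p0, p1, target):.6f}  BCE={bce(p_sigmoid, target):.6f}')
```

In practice, a 2-output softmax layer and a 1-output sigmoid layer can represent the same classifier. The softmax version simply has one redundant logit.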