PyTorch Notes (11): A Detailed Look at `torch.nn.Dropout`

PyTorch `torch.nn.Dropout` Explained
- 1. What Is Dropout?
- 2. `torch.nn.Dropout` Syntax
- 3. `torch.nn.Dropout` Examples
  - 📌 Example 1: Basic usage
  - 📌 Example 2: Dropout with `p=0.75` and `p=0.25`
- 4. Dropout Is Only Active During Training
  - 📌 Example 3: Training vs. evaluation
- 5. Using Dropout in a Neural Network
  - 📌 Example 4: Dropout inside a neural network
- 6. `torch.nn.functional.dropout`
- 7. When to Use Dropout
- 8. Summary

PyTorch `torch.nn.Dropout` Explained

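In deep learning, overfitting is a common problem: the model performs well on the training set but generalizes poorly to the test set. Dropout is a regularization method that helps prevent it. In PyTorch, `torch.nn.Dropout` implements this mechanism by randomly zeroing some neuron outputs during training, which tends to improve the model's ability to generalize.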

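1. What Is Dropout?

Dropout is a widely used regularization technique. Its core idea is:

- During training: each neuron output is set to 0 independently with probability p (typically between 0.1 and 0.5), and the surviving outputs are scaled by $\frac{1}{1-p}$ so that the expected value is unchanged.
- During testing: Dropout is not applied, and all neuron outputs pass through unchanged.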

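Mathematical formulation:
Given an input x, Dropout computes
$y = \frac{x \odot M}{1 - p}$
where:

- M is a binary mask with the same shape as x; each element is 0 with probability p and 1 with probability 1 - p.
- p is the dropout rate (the probability that an element is dropped).
- The division by 1 - p scales up the surviving outputs during training, so the expected value stays the same: $E[y] = \frac{x \cdot (1 - p)}{1 - p} = x$.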
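The formula above can be written out by hand as a quick check. This is only a minimal sketch: `manual_dropout` is a hypothetical helper made up for illustration, not how PyTorch implements the layer internally, and the mask is drawn with `torch.bernoulli`.

```python
import torch

def manual_dropout(x: torch.Tensor, p: float) -> torch.Tensor:
    """Hypothetical helper: the dropout formula written out by hand."""
    # Each mask entry is 1 with probability 1 - p and 0 with probability p.
    mask = torch.bernoulli(torch.full_like(x, 1.0 - p))
    # Zero the dropped entries and rescale the survivors by 1 / (1 - p).
    return x * mask / (1.0 - p)

torch.manual_seed(0)  # fixed seed so the run is repeatable
x = torch.ones(3, 4)
y = manual_dropout(x, p=0.5)
print(y)  # every entry is either 0 or 2
```

With an all-ones input and p=0.5, each output entry is 0 or 2, which matches the behaviour of the built-in layer shown in the examples of this post.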

2. `torch.nn.Dropout` Syntax

In PyTorch, Dropout is applied with `torch.nn.Dropout`:

```python
torch.nn.Dropout(p=0.5, inplace=False)
```

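Parameters:

| Parameter | Description |
| --- | --- |
| `p` | Probability that an element is dropped (set to 0). Defaults to `0.5`. |
| `inplace` | Whether to perform the operation in place. Defaults to `False`. If set to `True`, the input tensor is modified directly. |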
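A small sketch of how the two parameters behave in practice may help. It assumes that `p` has to lie in the range [0, 1] (a value outside it is rejected with a `ValueError` when the layer is constructed) and shows that `inplace=True` writes the result into the input tensor's own memory.

```python
import torch

# p must lie in [0, 1]; anything else is rejected when the layer is built.
try:
    torch.nn.Dropout(p=1.5)
    rejected = False
except ValueError as err:
    rejected = True
    print("ValueError:", err)

# With inplace=True the result is written into the input tensor's storage.
x = torch.ones(4, 4)
drop_inplace = torch.nn.Dropout(p=0.5, inplace=True)
y = drop_inplace(x)
print(y.data_ptr() == x.data_ptr())  # same underlying memory
print(torch.equal(x, y))             # x itself now holds the dropped values
```

In-place operation saves a little memory, but the original values of `x` are lost, so the default `inplace=False` is usually the safer choice.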

3. `torch.nn.Dropout` Examples

📌 Example 1: Basic usage

```python
import torch

# Create a Dropout layer; p=0.5 means each element is dropped with probability 0.5
dropout = torch.nn.Dropout(0.5)

# Create a 6×6 matrix of ones
example = torch.ones(6, 6)

# Apply Dropout
output = dropout(example)
print(output)
```

Sample output (it differs on every run because Dropout is random):

```
tensor([[0., 0., 2., 0., 0., 2.],
        [0., 2., 2., 0., 2., 2.],
        [0., 2., 0., 2., 0., 0.],
        [2., 0., 2., 0., 2., 0.],
        [2., 2., 0., 0., 2., 0.],
        [0., 2., 0., 2., 0., 2.]])
```

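🔍 Analysis:

- With p=0.5, roughly 50% of the elements become 0.
- The remaining elements are scaled by a factor of 2 ( $1 / (1 - 0.5) = 2$ ) so that the expected value of the output is unchanged.
- For example, an input of 1 passed through Dropout:
  - becomes 0 with probability 50%;
  - becomes 2 with probability 50%.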
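The claim that the expected value is preserved can be checked empirically on a larger tensor. The sketch below is only an illustration: the 1000×1000 size and the fixed seed are arbitrary choices made here so that the averages are stable.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable
dropout = torch.nn.Dropout(0.5)
big = torch.ones(1000, 1000)  # arbitrary size, large enough to average out noise
out = dropout(big)

zero_fraction = (out == 0).float().mean().item()
mean_value = out.mean().item()
print(f"fraction of zeros: {zero_fraction:.3f}")  # close to p = 0.5
print(f"mean of output:    {mean_value:.3f}")     # close to the input mean of 1
```

Because each element is dropped independently, the fraction of zeros is only approximately p for any single run; it gets closer to p as the tensor grows.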

📌 Example 2: Dropout with `p=0.75` and `p=0.25`

```python
import torch

# Set up two different dropout rates
dropout_75 = torch.nn.Dropout(0.75)  # each element is dropped with probability 0.75
dropout_25 = torch.nn.Dropout(0.25)  # each element is dropped with probability 0.25

# Create a 6×6 matrix of ones
example = torch.ones(6, 6)

# Apply Dropout(0.75)
output_75 = dropout_75(example)
print("Dropout p=0.75:")
print(output_75)

# Apply Dropout(0.25)
output_25 = dropout_25(example)
print("\nDropout p=0.25:")
print(output_25)
```

Sample output (it may differ on every run):

```
Dropout p=0.75:
tensor([[4., 0., 0., 0., 0., 4.],
        [0., 4., 0., 0., 0., 4.],
        [0., 0., 0., 4., 4., 0.],
        [0., 0., 0., 4., 0., 0.],
        [4., 0., 0., 0., 4., 0.],
        [0., 0., 4., 4., 0., 0.]])

Dropout p=0.25:
tensor([[1.3333, 1.3333, 1.3333, 0.0000, 1.3333, 1.3333],
        [1.3333, 0.0000, 1.3333, 0.0000, 0.0000, 1.3333],
        [1.3333, 1.3333, 1.3333, 1.3333, 1.3333, 1.3333],
        [1.3333, 1.3333, 1.3333, 1.3333, 1.3333, 0.0000],
        [1.3333, 0.0000, 1.3333, 1.3333, 1.3333, 1.3333],
        [1.3333, 1.3333, 1.3333, 1.3333, 1.3333, 0.0000]])
```

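🔍 Analysis:

- p=0.75:
  - About 75% of the elements become 0.
  - The remaining elements are scaled by a factor of $1 / (1 - 0.75) = 4$.
- p=0.25:
  - About 25% of the elements become 0.
  - The remaining elements are scaled by a factor of $1 / (1 - 0.25) \approx 1.3333$.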
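To compare several rates side by side, a short loop can report the observed fraction of zeros and the value of the surviving elements for each p. This is a sketch under assumptions: the 200×200 input size, the seed, and the `results` dictionary are choices made here for illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable
example = torch.ones(200, 200)
results = {}  # p -> (observed fraction of zeros, value of a surviving element)

for p in (0.25, 0.5, 0.75):
    out = torch.nn.Dropout(p)(example)
    kept = out[out != 0]  # surviving elements
    zero_fraction = 1.0 - kept.numel() / out.numel()
    results[p] = (zero_fraction, kept[0].item())
    print(f"p={p}: zeros={zero_fraction:.3f}, "
          f"survivor value={kept[0].item():.4f}, 1/(1-p)={1 / (1 - p):.4f}")
```

The observed zero fraction should land near p in each case, and every surviving element should equal $1 / (1 - p)$ up to floating-point rounding.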

🚀 Different values of p control how many neurons are dropped; choosing p sensibly can help the model generalize better! 🎯

4. Dropout Is Only Active During Training

In PyTorch, Dropout only takes effect in training mode (`train()`) and is automatically turned off in evaluation mode (`eval()`).

📌 Example 3: Training vs. evaluation

```python
import torch

dropout = torch.nn.Dropout(0.5)
example = torch.ones(6, 6)

# Training mode (the default)
dropout.train()
print("Train Mode Output:")
print(dropout(example))  # about 50% of the elements are zeroed, the rest are scaled by 2

# Evaluation mode
dropout.eval()
print("\nEval Mode Output:")
print(dropout(example))  # all elements are left unchanged
```

Sample output (the training-mode part may differ on every run):

```
Train Mode Output:
tensor([[0., 0., 2., 0., 2., 0.],
        [0., 2., 2., 0., 2., 0.],
        [2., 0., 0., 2., 0., 2.],
        [0., 2., 2., 0., 0., 2.],
        [2., 0., 0., 0., 2., 2.],
        [2., 2., 2., 0., 0., 0.]])

Eval Mode Output:
tensor([[1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1.]])
```

💡 Key points:

- In `train()` mode, Dropout randomly drops some of the neuron outputs.
- In `eval()` mode, Dropout is disabled and all values pass through unchanged.
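In a real model you rarely call `train()` or `eval()` on the Dropout layer itself; calling them on the parent module switches every submodule. The sketch below uses a hypothetical two-layer `nn.Sequential` model purely to inspect the `training` flag.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, only used to inspect the mode flags.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.5))

model.eval()                 # switches every submodule to evaluation mode
print(model[1].training)     # False
x = torch.ones(2, 4)
same_in_eval = torch.equal(model(x), model(x))  # no randomness in eval mode
print(same_in_eval)          # True

model.train()                # switches every submodule back to training mode
print(model[1].training)     # True
```

Forgetting `model.eval()` before validation or inference is a common source of noisy, non-reproducible predictions.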

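5. Using Dropout in a Neural Network

In deep neural networks, Dropout is mainly inserted between layers, such as fully connected and convolutional layers, to keep the model from overfitting.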

📌 Example 4: Dropout inside a neural network

```python
import torch
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)
        self.dropout = nn.Dropout(p=0.5)  # Dropout layer
        self.fc2 = nn.Linear(5, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # apply Dropout to the hidden layer
        x = self.fc2(x)
        return x

# Create the model
model = SimpleNN()
print(model)
```

Output:

```
SimpleNN(
  (fc1): Linear(in_features=10, out_features=5, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc2): Linear(in_features=5, out_features=2, bias=True)
)
```

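💡 Explanation:

- Dropout is added after the `fc1` layer to keep the fully connected layer from overfitting.
- Dropout only takes effect in `train()` mode and is automatically turned off in `eval()` mode.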
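To show where the mode switches sit in practice, here is a minimal sketch of one training step followed by inference. Everything in it is assumed for illustration: the `nn.Sequential` stand-in model, the random data, the SGD optimizer, and the MSE loss.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a model with one Dropout layer; the data is random.
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(5, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
inputs, targets = torch.randn(8, 10), torch.randn(8, 2)

# Training step: Dropout is active.
model.train()
optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()
optimizer.step()

# Inference: Dropout is disabled, so repeated predictions agree.
model.eval()
with torch.no_grad():
    pred_a = model(inputs)
    pred_b = model(inputs)
print(torch.equal(pred_a, pred_b))  # True
```

Note that `torch.no_grad()` only turns off gradient tracking; it is `model.eval()` that disables Dropout.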

6. `torch.nn.functional.dropout`

Besides `torch.nn.Dropout`, PyTorch also provides `torch.nn.functional.dropout()`, which applies Dropout directly to a tensor:

```python
import torch
import torch.nn.functional as F

x = torch.ones(6, 6)
output = F.dropout(x, p=0.5, training=True)  # Dropout with p=0.5
print(output)
```

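Unlike `nn.Dropout()`, this approach does not require creating a Dropout layer, but it also does not follow the module's `train()` / `eval()` mode. The `training` argument (default `True`) has to be controlled by hand, for example by passing `training=self.training` inside a module's `forward`; otherwise Dropout stays active during evaluation.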
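A common way to keep the functional form in sync with the model's mode is to pass the module's own `training` flag. The sketch below assumes a hypothetical module named `FunctionalDropoutNet`, made up for this illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionalDropoutNet(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(6, 6)

    def forward(self, x):
        x = torch.relu(self.fc(x))
        # Tie the functional call to the module's current mode.
        return F.dropout(x, p=0.5, training=self.training)

net = FunctionalDropoutNet()
x = torch.ones(2, 6)

net.eval()
eval_is_deterministic = torch.equal(net(x), net(x))
print(eval_is_deterministic)  # True: Dropout is off in eval mode

# Passing training=False directly also turns Dropout off.
unchanged = torch.equal(F.dropout(x, p=0.5, training=False), x)
print(unchanged)  # True
```

If `training=self.training` were left out here, the call would fall back to `training=True` and keep dropping values even after `net.eval()`.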

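7. When to Use Dropout

📌 Dropout is suitable for:

- Deep neural networks (DNNs), to keep fully connected layers from overfitting.
- Convolutional neural networks (CNNs), where Dropout can be applied after the fully connected layers.
- Recurrent neural networks (RNNs), to keep time-series models from overfitting.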

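📌 Dropout is not suitable for:

- Inference after training is finished (call `model.eval()` first).
- Directly after Batch Normalization (BN), since BN already provides some regularization of its own.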
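As an illustration of the CNN case, the sketch below places Dropout on the fully connected features of a very small convolutional network. The architecture, the layer sizes, and the 8×8 single-channel input are all hypothetical choices made for this example.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):  # hypothetical architecture, for illustration only
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(4 * 4 * 4, 16)
        self.dropout = nn.Dropout(p=0.5)  # applied to the fully connected features
        self.fc2 = nn.Linear(16, 3)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))  # (N, 4, 4, 4) for 8×8 inputs
        x = torch.flatten(x, start_dim=1)        # (N, 64)
        x = self.dropout(torch.relu(self.fc1(x)))
        return self.fc2(x)

model = SmallCNN()
model.eval()  # inference: Dropout disabled
with torch.no_grad():
    logits = model(torch.randn(2, 1, 8, 8))
print(logits.shape)  # torch.Size([2, 3])
```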

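8. Summary

| Item | Description |
| --- | --- |
| `torch.nn.Dropout(p=0.5)` | Use Dropout in a neural network (default `p=0.5`) |
| Dropout is active in `train()` | During training, each neuron output is dropped with probability `p` |
| Dropout is disabled in `eval()` | During evaluation, all neurons are active |
| `torch.nn.functional.dropout(x, p=0.5, training=True)` | Apply Dropout directly to a tensor |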

🚀 Used sensibly, Dropout can improve a model's generalization and help prevent overfitting! 🎯