BERT outputs

news/2024/12/26 3:56:07/

Yes, BERT (the base model without any heads on top) outputs 2 things: last_hidden_state and pooler_output.

First question:

last_hidden_state contains the hidden representation of every token in every sequence of the batch, so its size is (batch_size, seq_len, hidden_size).
pooler_output contains a "representation" of each sequence in the batch, and is of size (batch_size, hidden_size). It takes the hidden representation of the [CLS] token of each sequence in the batch (a vector of size hidden_size) and runs it through the BertPooler nn.Module, which consists of a linear layer followed by a Tanh activation function. The weights of this linear layer are already pretrained on the next sentence prediction task (note that BERT is pretrained on 2 tasks: masked language modeling and next sentence prediction). I assume that the authors of the Transformers library took these weights from the original TF implementation and initialized the layer with them. In theory, they would come from BertForPreTraining, which is BERT with the 2 pretraining heads on top.
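To make the shapes concrete, here is a minimal sketch in plain PyTorch. It does not load BERT: PoolerSketch is a hypothetical stand-in that mirrors the linear-plus-Tanh structure described above, the tensor standing in for last_hidden_state is random, and the sizes are toy values rather than BERT's real ones.

```python
import torch
from torch import nn


class PoolerSketch(nn.Module):
    """Hypothetical stand-in for BertPooler: linear layer + Tanh on the [CLS] vector."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # [CLS] is the first token of every sequence.
        cls_vector = hidden_states[:, 0]  # (batch_size, hidden_size)
        return self.activation(self.dense(cls_vector))


torch.manual_seed(0)
batch_size, seq_len, hidden_size = 2, 5, 8  # toy sizes (assumption)

# Random tensor standing in for the encoder's last_hidden_state.
last_hidden_state = torch.randn(batch_size, seq_len, hidden_size)

pooler = PoolerSketch(hidden_size)
pooler_output = pooler(last_hidden_state)

print(last_hidden_state.shape)  # (batch_size, seq_len, hidden_size)
print(pooler_output.shape)      # (batch_size, hidden_size)
```

Because of the Tanh, every value in pooler_output lies between -1 and 1, whereas the raw [CLS] row of last_hidden_state is unbounded.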
Second question:
Yes, you can fine-tune them, just like the hidden states: loss.backward() computes gradients for the weights of the linear layer, and those weights are then updated when the optimizer takes its step.
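As a rough illustration of that point, the sketch below trains a toy stand-in for the pooler's linear layer together with a hypothetical classification head. All names and sizes here are assumptions for illustration, and the [CLS] vectors are random rather than produced by BERT.

```python
import torch
from torch import nn

torch.manual_seed(0)
hidden_size, num_labels = 8, 2  # toy sizes (assumption)

pooler_dense = nn.Linear(hidden_size, hidden_size)  # stand-in for the pooler's linear layer
classifier = nn.Linear(hidden_size, num_labels)     # hypothetical task head

params = list(pooler_dense.parameters()) + list(classifier.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

cls_vectors = torch.randn(4, hidden_size)  # stand-in for [CLS] hidden states
labels = torch.tensor([0, 1, 1, 0])

weights_before = pooler_dense.weight.detach().clone()

# Forward pass: linear layer + Tanh (the pooler), then the task head.
logits = classifier(torch.tanh(pooler_dense(cls_vectors)))
loss = nn.functional.cross_entropy(logits, labels)

optimizer.zero_grad()
loss.backward()   # fills .grad for the pooler weights
grad_is_set = pooler_dense.weight.grad is not None
optimizer.step()  # applies the update

weights_changed = not torch.equal(weights_before, pooler_dense.weight)
print(grad_is_set, weights_changed)
```

In real fine-tuning the [CLS] vectors come from the encoder, so the same backward pass also reaches the encoder weights unless they are frozen.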

BTW, please ask questions about BERT or other models that are not bug reports on the forum, rather than posting them here.
