Network Architecture of the Post-Trained DeepSeek-VL2 Model

2025/2/12 10:53:51


flyfish

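The vision module extracts image features, the projector maps those features into a feature space compatible with the language module, and the language module combines visual and textual information for causal language modeling. The model is fine-tuned parameter-efficiently with PEFT and LoRA, and it uses an MoE architecture to improve computational efficiency and expressive capacity.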

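Overall Architecture

This is a multimodal causal language model, fine-tuned with parameter-efficient fine-tuning (PEFT), that combines vision and language processing. It consists of a wrapper layer, a vision module, a projector module, and a language module, which work together to process visual and textual input.

Detailed Analysis of Each Part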

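1. PeftModelForCausalLM and LoraModel wrapper layer
  • PeftModelForCausalLM: the PEFT wrapper class for causal language models. To reduce the compute and time cost of fine-tuning, PEFT methods adjust only a small part of the model's parameters instead of all of them. The model can then adapt quickly to a specific task while most of the pretrained weights stay unchanged.
  • LoraModel: LoRA (Low-Rank Adaptation) is one specific PEFT technique. LoraModel adjusts the model's weights by adding low-rank matrices alongside the original linear layers. During training only these low-rank matrices are updated, which greatly reduces the number of trainable parameters. In the many lora.Linear layers of this model, the lora_A and lora_B matrices are the low-rank matrices introduced by LoRA.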
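To make the low-rank update concrete, here is a minimal pure-Python sketch; it is not the PEFT implementation. It assumes the usual LoRA form y = W x + (alpha / r) · B(A x). The scaling factor `alpha` and the helper names `matvec`, `lora_forward`, and `lora_param_counts` are illustrative assumptions and do not appear in the model print. The parameter count at the end uses the q_proj shapes from the print (2048 to 3072, rank 8).

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base output plus the scaled low-rank update B(Ax)."""
    base = matvec(W, x)                 # frozen pretrained weight
    update = matvec(B, matvec(A, x))    # rank-r bottleneck: d_in -> r -> d_out
    scale = alpha / r                   # hypothetical scaling hyperparameter
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_counts(d_in, d_out, r):
    """Return (frozen base weights, trainable LoRA weights) for one bias-free layer."""
    return d_in * d_out, r * d_in + d_out * r


# Toy example: 3 inputs, 2 outputs, rank 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]   # shape (r, d_in)
B = [[0.5], [-0.5]]     # shape (d_out, r)
y = lora_forward(W, A, B, [1.0, 2.0, 3.0], alpha=2, r=1)

# q_proj in the printed model: Linear(2048 -> 3072) with rank-8 adapters.
base_params, lora_params = lora_param_counts(2048, 3072, 8)
print(y, base_params, lora_params, f"{lora_params / base_params:.2%}")
```

For this one layer the adapters add 40,960 trainable weights next to 6,291,456 frozen ones, which is where the savings described above come from.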
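2. Vision module (vision)

This module is based on the Vision Transformer (ViT) architecture and processes the visual input. Its structure is as follows:

  • PatchEmbed
    • proj: a convolution layer Conv2d(3, 1152, kernel_size=(14, 14), stride=(14, 14)) that splits the 3-channel input image into non-overlapping 14×14 patches and maps each patch to a 1152-dimensional feature vector.
    • norm: Identity() here, so no normalization is applied.
  • pos_drop: a Dropout layer with drop probability p = 0.0, so no elements are dropped during training.
  • blocks: a sequence of 27 Block modules (indices 0 to 26 in the print). Each Block contains multi-head self-attention (Attention) and a multilayer perceptron (Mlp), each preceded by a LayerNorm.
  • norm: a LayerNorm layer that normalizes the visual features.
  • attn_pool: the attention pooling layer AttentionPoolLatent, which extracts a latent representation from the visual features.
  • head: currently Identity(); this is where a task-specific output layer could be attached.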
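The sizes in the list above can be checked with a little arithmetic. The sketch below is plain Python, not code from the model. It assumes a 384×384 input image, which the print does not state, and the helper names are hypothetical. It computes the patch grid produced by the 14×14 stride-14 convolution and the parameter count of one Block from the layer shapes in the print.

```python
def patch_grid(height, width, patch=14):
    """Patches per side for Conv2d(kernel_size=patch, stride=patch) without padding."""
    return height // patch, width // patch


def linear_params(d_in, d_out, bias=True):
    """Weights plus optional bias of one Linear layer."""
    return d_in * d_out + (d_out if bias else 0)


def vit_block_params(dim=1152, hidden=4304):
    """Parameters of one Block: two LayerNorms, fused qkv, output proj, two-layer Mlp."""
    layer_norms = 2 * (2 * dim)   # weight and bias for norm1 and norm2
    attention = linear_params(dim, 3 * dim) + linear_params(dim, dim)
    mlp = linear_params(dim, hidden) + linear_params(hidden, dim)
    return layer_norms + attention + mlp


# Assumed input resolution of 384 x 384; the model print does not state it.
rows, cols = patch_grid(384, 384)
tokens = rows * cols
per_block = vit_block_params()
all_blocks = 27 * per_block
print(rows, cols, tokens, per_block, all_blocks)
```

Under that resolution assumption each image yields a 27×27 grid of 1152-dimensional tokens, and the 27 blocks together hold roughly 411 million parameters.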
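3. Projector module (projector)

(projector): MlpProjector(
  (layers): Sequential(
    (0): Linear(in_features=4608, out_features=2048, bias=True)
    (1): GELU(approximate='none')
    (2): Linear(in_features=2048, out_features=2048, bias=True)
  )
)

  • This is a multilayer perceptron (MLP) projector that maps 4608-dimensional inputs to 2048 dimensions, matching the input dimension of the language module. A GELU activation between the two linear layers introduces nonlinearity. Note that 4608 = 4 × 1152, so the projector input is four times as wide as a single token from the vision module, which suggests that the features of several tokens are concatenated before projection.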
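One way a 1152-dimensional token grid can become 4608-dimensional projector input is to concatenate each 2×2 neighborhood of tokens. The sketch below illustrates that idea in plain Python. It is an assumption made for illustration: the print does not show how the 4608 features are assembled, and the zero padding, the concatenation order, the 27×27 grid size, and the function name `group_2x2` are all hypothetical.

```python
def group_2x2(grid, channels):
    """Concatenate each 2x2 neighborhood of feature vectors into one vector.

    grid is a list of rows; each cell is a list of `channels` floats.
    Odd-sized grids are zero-padded on the bottom and right edges.
    """
    height, width = len(grid), len(grid[0])
    pad = [0.0] * channels
    padded_h, padded_w = height + height % 2, width + width % 2

    def cell(r, c):
        return grid[r][c] if r < height and c < width else pad

    return [
        [
            cell(r, c) + cell(r, c + 1) + cell(r + 1, c) + cell(r + 1, c + 1)
            for c in range(0, padded_w, 2)
        ]
        for r in range(0, padded_h, 2)
    ]


# Toy 3x3 grid with 2 channels per token -> 2x2 grid with 8 channels per token.
toy = [[[float(r), float(c)] for c in range(3)] for r in range(3)]
merged = group_2x2(toy, channels=2)

# Same arithmetic at the sizes in this article (the 27x27 grid is an assumption).
merged_side = (27 + 27 % 2) // 2
merged_dim = 4 * 1152
projector_params = (4608 * 2048 + 2048) + (2048 * 2048 + 2048)
print(len(merged), len(merged[0]), len(merged[0][0]), merged_side, merged_dim, projector_params)
```

Grouping this way cuts the token count by about a factor of four while widening each token by the same factor, so the language module sees fewer, richer visual tokens.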
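4. Language module (language)

This module is based on DeepseekV2ForCausalLM, a model for causal language modeling. Its structure is as follows:

  • embed_tokens: an embedding layer Embedding(102400, 2048) that maps input token indices to 2048-dimensional embedding vectors.
  • layers: a ModuleList of 27 DeepseekV2DecoderLayer modules. Each layer contains a self-attention block (DeepseekV2Attention) and a feed-forward block (DeepseekV2MLP in layer 0, DeepseekV2MoE in layers 1 to 26), and uses DeepseekV2RMSNorm for normalization.
    • DeepseekV2Attention: the self-attention block. Its projections (q_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj) are lora.Linear layers, so they are fine-tuned with LoRA (rank 8 and dropout 0.05 in the print). rotary_emb is the rotary embedding layer, which encodes position information.
    • DeepseekV2MLP / DeepseekV2MoE: the feed-forward block. Layer 0 is a dense DeepseekV2MLP; the remaining layers use a mixture-of-experts (MoE) block. DeepseekV2MoE contains 64 routed expert networks (experts), a shared expert (shared_experts), and a gating network (MoEGate) that decides, based on the input, which experts process it.
  • norm: a DeepseekV2RMSNorm layer that normalizes the language features.
  • lm_head: a linear layer Linear(in_features=2048, out_features=102400, bias=False) that maps the hidden states to vocabulary-size logits (102400), from which the probability of the next token is computed.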
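The routing idea behind DeepseekV2MoE can be sketched in a few lines of plain Python. This is a generic top-k mixture-of-experts illustration and not the DeepseekV2MoE code: the print shows MoEGate() without any details, so the softmax scoring, the value of k, the absence of weight renormalization, and every function name below are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Return [(expert_index, weight)] for the k experts with the highest gate probability."""
    probs = softmax(gate_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


def moe_forward(x, experts, shared_expert, gate_scores, k):
    """Shared expert output plus the gate-weighted sum of the selected routed experts."""
    out = list(shared_expert(x))
    for index, weight in route_top_k(gate_scores, k):
        expert_out = experts[index](x)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy setup: 4 routed experts that scale the input, 1 shared expert that copies it.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
shared = lambda x: list(x)
scores = [0.1, 2.0, 0.3, 1.5]   # in a real gate these come from a learned projection
chosen = route_top_k(scores, k=2)
y = moe_forward([1.0, -1.0], experts, shared, scores, k=2)
print(chosen, y)
```

Only the selected experts run for a given token, which is how an MoE layer can hold 64 expert networks while spending the compute of just a few of them per token.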
model: PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): DeepseekVLV2ForCausalLM(
      (vision): VisionTransformer(
        (patch_embed): PatchEmbed(
          (proj): Conv2d(3, 1152, kernel_size=(14, 14), stride=(14, 14))
          (norm): Identity()
        )
        (pos_drop): Dropout(p=0.0, inplace=False)
        (patch_drop): Identity()
        (norm_pre): Identity()
        (blocks): Sequential(
          (0-26): 27 x Block(
            (norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
            (attn): Attention(
              (qkv): Linear(in_features=1152, out_features=3456, bias=True)
              (q_norm): Identity()
              (k_norm): Identity()
              (attn_drop): Dropout(p=0.0, inplace=False)
              (proj): Linear(in_features=1152, out_features=1152, bias=True)
              (proj_drop): Identity()
            )
            (ls1): Identity()
            (drop_path1): Identity()
            (norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
            (mlp): Mlp(
              (fc1): Linear(in_features=1152, out_features=4304, bias=True)
              (act): GELU(approximate='tanh')
              (drop1): Dropout(p=0.0, inplace=False)
              (norm): Identity()
              (fc2): Linear(in_features=4304, out_features=1152, bias=True)
              (drop2): Dropout(p=0.0, inplace=False)
            )
            (ls2): Identity()
            (drop_path2): Identity()
          )
        )
        (norm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
        (attn_pool): AttentionPoolLatent(
          (q): Linear(in_features=1152, out_features=1152, bias=True)
          (kv): Linear(in_features=1152, out_features=2304, bias=True)
          (q_norm): Identity()
          (k_norm): Identity()
          (proj): Linear(in_features=1152, out_features=1152, bias=True)
          (proj_drop): Dropout(p=0.0, inplace=False)
          (norm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
          (mlp): Mlp(
            (fc1): Linear(in_features=1152, out_features=4304, bias=True)
            (act): GELU(approximate='none')
            (drop1): Dropout(p=0.0, inplace=False)
            (norm): Identity()
            (fc2): Linear(in_features=4304, out_features=1152, bias=True)
            (drop2): Dropout(p=0.0, inplace=False)
          )
        )
        (fc_norm): Identity()
        (head_drop): Dropout(p=0.0, inplace=False)
        (head): Identity()
      )
      (projector): MlpProjector(
        (layers): Sequential(
          (0): Linear(in_features=4608, out_features=2048, bias=True)
          (1): GELU(approximate='none')
          (2): Linear(in_features=2048, out_features=2048, bias=True)
        )
      )
      (language): DeepseekV2ForCausalLM(
        (model): DeepseekV2Model(
          (embed_tokens): Embedding(102400, 2048)
          (layers): ModuleList(
            (0): DeepseekV2DecoderLayer(
              (self_attn): DeepseekV2Attention(
                (q_proj): lora.Linear(
                  (base_layer): Linear(in_features=2048, out_features=3072, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=2048, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=3072, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (kv_a_proj_with_mqa): lora.Linear(
                  (base_layer): Linear(in_features=2048, out_features=576, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=2048, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=576, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (kv_a_layernorm): DeepseekV2RMSNorm()
                (kv_b_proj): lora.Linear(
                  (base_layer): Linear(in_features=512, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=512, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=4096, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (o_proj): lora.Linear(
                  (base_layer): Linear(in_features=2048, out_features=2048, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=2048, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=2048, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (rotary_emb): DeepseekV2RotaryEmbedding()
              )
              (mlp): DeepseekV2MLP(
                (gate_proj): lora.Linear(
                  (base_layer): Linear(in_features=2048, out_features=10944, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=2048, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=10944, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (up_proj): lora.Linear(
                  (base_layer): Linear(in_features=2048, out_features=10944, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=2048, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=10944, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (down_proj): lora.Linear(
                  (base_layer): Linear(in_features=10944, out_features=2048, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=10944, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=2048, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (act_fn): SiLU()
              )
              (input_layernorm): DeepseekV2RMSNorm()
              (post_attention_layernorm): DeepseekV2RMSNorm()
            )
            (1-26): 26 x DeepseekV2DecoderLayer(
              (self_attn): DeepseekV2Attention(
                (q_proj): lora.Linear(
                  (base_layer): Linear(in_features=2048, out_features=3072, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=2048, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=3072, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (kv_a_proj_with_mqa): lora.Linear(
                  (base_layer): Linear(in_features=2048, out_features=576, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=2048, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=576, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (kv_a_layernorm): DeepseekV2RMSNorm()
                (kv_b_proj): lora.Linear(
                  (base_layer): Linear(in_features=512, out_features=4096, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=512, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=4096, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (o_proj): lora.Linear(
                  (base_layer): Linear(in_features=2048, out_features=2048, bias=False)
                  (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                  (lora_A): ModuleDict((default): Linear(in_features=2048, out_features=8, bias=False))
                  (lora_B): ModuleDict((default): Linear(in_features=8, out_features=2048, bias=False))
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                  (lora_magnitude_vector): ModuleDict()
                )
                (rotary_emb): DeepseekV2RotaryEmbedding()
              )
              (mlp): DeepseekV2MoE(
                (experts): ModuleList(
                  (0-63): 64 x DeepseekV2MLP(
                    (gate_proj): lora.Linear(
                      (base_layer): Linear(in_features=2048, out_features=1408, bias=False)
                      (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                      (lora_A): ModuleDict((default): Linear(in_features=2048, out_features=8, bias=False))
                      (lora_B): ModuleDict((default): Linear(in_features=8, out_features=1408, bias=False))
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                      (lora_magnitude_vector): ModuleDict()
                    )
                    (up_proj): lora.Linear(
                      (base_layer): Linear(in_features=2048, out_features=1408, bias=False)
                      (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                      (lora_A): ModuleDict((default): Linear(in_features=2048, out_features=8, bias=False))
                      (lora_B): ModuleDict((default): Linear(in_features=8, out_features=1408, bias=False))
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                      (lora_magnitude_vector): ModuleDict()
                    )
                    (down_proj): lora.Linear(
                      (base_layer): Linear(in_features=1408, out_features=2048, bias=False)
                      (lora_dropout): ModuleDict((default): Dropout(p=0.05, inplace=False))
                      (lora_A): ModuleDict((default): Linear(in_features=1408, out_features=8, bias=False))
                      (lora_B): ModuleDict((default): Linear(in_features=8, out_features=2048, bias=False))
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                      (lora_magnitude_vector): ModuleDict()
                    )
                    (act_fn): SiLU()
                  )
                )
                (gate): MoEGate()
                (shared_experts): DeepseekV2MLP(
                  (gate_proj): Linear(in_features=2048, out_features=2816, bias=False)
                  (up_proj): Linear(in_features=2048, out_features=2816, bias=False)
                  (down_proj): Linear(in_features=2816, out_features=2048, bias=False)
                  (act_fn): SiLU()
                )
              )
              (input_layernorm): DeepseekV2RMSNorm()
              (post_attention_layernorm): DeepseekV2RMSNorm()
            )
          )
          (norm): DeepseekV2RMSNorm()
        )
        (lm_head): Linear(in_features=2048, out_features=102400, bias=False)
      )
    )
  )
)
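As a rough check on how much of the model LoRA actually trains, the adapter sizes in the print can be added up. The sketch below is plain arithmetic in Python, with a hypothetical helper name. It counts only the lora_A and lora_B weights of the lora.Linear layers shown in the print and assumes that nothing else is trainable, which the print alone cannot confirm.

```python
RANK = 8  # out_features of every lora_A in the print


def lora_params(d_in, d_out, rank=RANK):
    """Trainable weights added by one lora.Linear wrapper (lora_A plus lora_B, no bias)."""
    return rank * d_in + d_out * rank


# (in_features, out_features) of each wrapped base_layer, copied from the print.
attention = [(2048, 3072), (2048, 576), (512, 4096), (2048, 2048)]  # q, kv_a, kv_b, o
dense_mlp = [(2048, 10944), (2048, 10944), (10944, 2048)]           # layer 0
expert_mlp = [(2048, 1408), (2048, 1408), (1408, 2048)]             # one routed expert

attn_per_layer = sum(lora_params(i, o) for i, o in attention)
dense_layer0 = sum(lora_params(i, o) for i, o in dense_mlp)
experts_per_layer = 64 * sum(lora_params(i, o) for i, o in expert_mlp)

# 27 decoder layers: all have adapted attention, layer 0 is dense, layers 1-26 are MoE.
total_lora = 27 * attn_per_layer + dense_layer0 + 26 * experts_per_layer
print(attn_per_layer, dense_layer0, experts_per_layer, total_lora)
```

Most of the adapter weights sit in the 64 routed experts of each MoE layer. The vision module, the projector, the shared experts, and lm_head appear as plain Linear layers in the print, so they carry no adapters.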

