$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
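To make the formula concrete, here is a minimal sketch of scaled dot-product attention in PyTorch. The function name, the optional `mask` argument, and the tensor shapes are illustrative assumptions, not from the original text:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Minimal sketch of the attention formula above.

    Q, K: (batch, seq_len, d_k); V: (batch, seq_len, d_v).
    """
    d_k = Q.size(-1)
    # Similarity scores, scaled by sqrt(d_k) to keep softmax gradients stable
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention distribution over keys
    return weights @ V
```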
2.1.2 Comparison of Positional Encoding Schemes
| Type | Formula | Advantages |
| --- | --- | --- |
| Absolute positional encoding | $PE(pos, 2i) = \sin(pos / 10000^{2i/d})$ | Simple and easy to implement |
| Relative positional encoding | $a_{ij} = q_i^T k_j + q_i^T r_{i-j}$ | Handles long sequences better |
| Rotary positional encoding (RoPE) | $q_m = f_q(x_m)\,e^{im\theta}$ | Encodes relative position via rotation of absolute positions |
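As a concrete instance of the first row, a minimal sketch of the sinusoidal absolute encoding (the function name and the assumption of an even dimension `d` are illustrative, not from the original text):

```python
import torch

def sinusoidal_positional_encoding(max_len: int, d: int) -> torch.Tensor:
    """PE(pos, 2i) = sin(pos / 10000^(2i/d)); odd dims use cos. Assumes d is even."""
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
    i = torch.arange(0, d, 2, dtype=torch.float32)                 # even dimensions
    angle = pos / (10000 ** (i / d))                               # (max_len, d/2)
    pe = torch.zeros(max_len, d)
    pe[:, 0::2] = torch.sin(angle)
    pe[:, 1::2] = torch.cos(angle)
    return pe
```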
```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, num_experts=8, d_model=1024):
        super().__init__()
        # Each expert is a standard two-layer FFN with GELU activation
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        ])
        # Gating network produces a mixture weight per expert
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x):
        gates = torch.softmax(self.gate(x), dim=-1)    # (..., num_experts)
        expert_outputs = [e(x) for e in self.experts]  # each (..., d_model)
        # Weighted sum of expert outputs by the gate probabilities
        return sum(g[..., None] * o
                   for g, o in zip(gates.unbind(-1), expert_outputs))
```
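A quick shape check for the layer above (batch size and sequence length are arbitrary choices for illustration):

```python
x = torch.randn(2, 16, 1024)  # (batch, seq_len, d_model)
moe = MoELayer(num_experts=8, d_model=1024)
y = moe(x)
print(y.shape)  # torch.Size([2, 16, 1024])
```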
```python
# Generate code with CodeLlama
# (code_llama is assumed to be a model handle loaded earlier in the document)
prompt = """
Implement a Python function to calculate Fibonacci sequence with memoization
Include type hints and docstring
"""
response = code_llama.generate(
    prompt,
    max_tokens=200,
    temperature=0.2,
)
print(response)
```
6.2 Multimodal Understanding Example
```python
# Visual question answering with Flamingo
# (load_image and flamingo_model are assumed to be provided by the surrounding setup)
image = load_image("chart.png")
question = "What is the main trend shown in this chart?"
answer = flamingo_model.generate(
    image=image,
    text=question,
    max_length=100,
)
```
GGUF (a binary file format from the ggml/llama.cpp ecosystem for storing quantized model weights) and LLaMA (Large Language Model Meta AI) are concepts at different layers of the LLM technology stack, belonging to different parts of the pipeline. Their core difference lies in positioning and function: 1. LLaMA (Meta's large language…
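For illustration, a minimal sketch of loading a LLaMA-family model stored as a GGUF file via the llama-cpp-python bindings. The file path is a placeholder, and the parameter values are common choices rather than anything specified in the original text:

```python
from llama_cpp import Llama

# Load a quantized GGUF checkpoint; the path is a placeholder
llm = Llama(model_path="./llama-2-7b.Q4_K_M.gguf", n_ctx=2048)

output = llm(
    "Explain the difference between GGUF and LLaMA in one sentence.",
    max_tokens=64,
    temperature=0.2,
)
print(output["choices"][0]["text"])
```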