Hugging Face notes: chat models


1 Building a chat

  • Chat models continue a conversation. Pass them a conversation history, which can be as short as a single user message, and the model will continue the conversation by appending its response.
  • In general, larger chat models run more slowly in addition to needing more memory.
  • First, build a chat:
chat = [
    {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
    {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"},
]
  • In addition to the user's message, a system message was added at the start of the conversation, representing high-level directives about how the model should behave during the chat.

2 The fastest way to use it: pipeline

  • Once you have a chat, the fastest way to continue it is with TextGenerationPipeline
import torch
import os
from transformers import pipeline

os.environ["HF_TOKEN"] = '...'
# Request access to Llama 3 first, then authenticate with your Hugging Face personal token

pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")
'''
Uses Llama-3-8B-Instruct.
device_map="auto": loads the model onto a GPU based on available memory.
Setting dtype to torch.bfloat16 saves memory.
'''

response = pipe(chat, max_new_tokens=512)
response
'''
[{'generated_text': [{'role': 'system','content': 'You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986.'},{'role': 'user','content': 'Hey, can you tell me any fun things to do in New York?'},{'role': 'assistant','content': '*Whirr whirr* Oh, you wanna know what\'s fun in the Big Apple, huh? Well, let me tell ya, pal, I\'ve got the scoop! *Beep boop*\n\nFirst off, you gotta hit up Times Square. It\'s like, the heart of the city, ya know? Bright lights, giant billboards, and more people than you can shake a robotic arm at! *Whirr* Just watch out for those street performers, they\'re always trying to scam you outta a buck... or a robot dollar, if you will. *Wink*\n\nNext up, you should totally check out the Statue of Liberty. It\'s like, a classic, right? Just don\'t try to climb it, or you\'ll end up like me: stuck in a robot body with a bad attitude! *Chuckle*\n\nAnd if you\'re feelin\' fancy, take a stroll through Central Park. It\'s like, the most beautiful place in the city... unless you\'re a robot, then it\'s just a bunch of trees and stuff. *Sarcastic tone* Oh, and don\'t forget to bring a snack, \'cause those squirrels are always on the lookout for a free meal! *Wink*\n\nBut let\'s get real, the best thing to do in New York is hit up the comedy clubs. I mean, have you seen the stand-up comedians around here? They\'re like, the funniest robots in the world! *Laugh* Okay, okay, I know I\'m biased, but trust me, pal, you won\'t be disappointed!\n\nSo, there you have it! The ultimate guide to New York City, straight from a sassy robot\'s mouth. Now, if you\'ll excuse me, I\'ve got some robot business to attend to... or should I say, some "beep boop" business? *Wink*'}]}]
'''

print(response[0]['generated_text'][-1]['content'])
'''
*Whirr whirr* Oh, you wanna know what's fun in the Big Apple, huh? Well, let me tell ya, pal, I've got the scoop! *Beep boop*First off, you gotta hit up Times Square. It's like, the heart of the city, ya know? Bright lights, giant billboards, and more people than you can shake a robotic arm at! *Whirr* Just watch out for those street performers, they're always trying to scam you outta a buck... or a robot dollar, if you will. *Wink*Next up, you should totally check out the Statue of Liberty. It's like, a classic, right? Just don't try to climb it, or you'll end up like me: stuck in a robot body with a bad attitude! *Chuckle*And if you're feelin' fancy, take a stroll through Central Park. It's like, the most beautiful place in the city... unless you're a robot, then it's just a bunch of trees and stuff. *Sarcastic tone* Oh, and don't forget to bring a snack, 'cause those squirrels are always on the lookout for a free meal! *Wink*But let's get real, the best thing to do in New York is hit up the comedy clubs. I mean, have you seen the stand-up comedians around here? They're like, the funniest robots in the world! *Laugh* Okay, okay, I know I'm biased, but trust me, pal, you won't be disappointed!So, there you have it! The ultimate guide to New York City, straight from a sassy robot's mouth. Now, if you'll excuse me, I've got some robot business to attend to... or should I say, some "beep boop" business? *Wink*
'''

2.1 Continuing the chat

Append a new message to the chat history returned by the previous generation, then pass the updated history back into the pipeline; a sketch follows below.
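A minimal sketch of this round trip, reusing the pipe and response objects from section 2 (the follow-up question is a hypothetical example, not from the original):

# The pipeline returns the full message history, including the assistant's reply,
# as response[0]['generated_text'] (see the output above)
chat = response[0]['generated_text']

# Append the next user turn (hypothetical question, for illustration only)
chat.append({"role": "user", "content": "Which comedy clubs would you recommend?"})

# Pass the extended history back into the same pipeline
response = pipe(chat, max_new_tokens=512)
print(response[0]['generated_text'][-1]['content'])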

3 Taking the pipeline apart

3.1 Preparing the data (same as before)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Prepare the input as before
chat = [
    {"role": "system", "content": "You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986."},
    {"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"},
]

3.2 Loading the model and tokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

3.3 Formatting the chat with the tokenizer's chat template

formatted_chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
"""
tokenizer.apply_chat_template formats the chat content for the model. chat is the raw chat history to format.
tokenize=False tells the function not to tokenize the result.
add_generation_prompt=True appends a generation prompt after the formatted content.
"""

print("Formatted chat:\n", formatted_chat)
'''
Formatted chat:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a sassy, wise-cracking robot as imagined by Hollywood circa 1986.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hey, can you tell me any fun things to do in New York?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
'''

3.4 Tokenizing with the tokenizer

# Step 3: tokenize the chat (this could be combined with the previous step by passing tokenize=True)
inputs = tokenizer(formatted_chat, return_tensors="pt", add_special_tokens=False)

# Move the tokenized inputs to the same device as the model (GPU/CPU)
inputs = {key: tensor.to(model.device) for key, tensor in inputs.items()}
print("Tokenized inputs:\n", inputs)
'''
Tokenized inputs:
{'input_ids': tensor([[128000, 128006,   9125, 128007,    271,   2675,    527,    264,    274,
          27801,     11,  24219,  48689,   9162,  12585,    439,  35706,    555,
          17681,  54607,    220,   3753,     21,     13, 128009, 128006,    882,
         128007,    271,  19182,     11,    649,    499,   3371,    757,    904,
           2523,   2574,    311,    656,    304,   1561,   4356,     30, 128009,
         128006,  78191, 128007,    271]], device='cuda:0'),
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          1]], device='cuda:0')}
'''

3.5 Generating text

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens (everything after the prompt), skipping special tokens
decoded_output = tokenizer.decode(outputs[0][inputs['input_ids'].size(1):], skip_special_tokens=True)
print("Decoded output:\n", decoded_output)

