测试大语言模型在嵌入式设备部署的可能性——模型TinyLlama-1.1B-Chat-v1.0

devtools/2024/11/29 22:41:31/

测试模型TinyLlama-1.1B-Chat-v1.0修改推理参数,观察参数变化与推理时间变化之间的关系。
本地环境:

处理器 Intel® Core™ i5-8400 CPU @ 2.80GHz 2.80 GHz
机带 RAM 16.0 GB (15.9 GB 可用)
集显 Intel® UHD Graphics 630
独显 NVIDIA GeForce GTX 1050

主要测试修改:

outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)

源代码来源(镜像):https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0

'''
https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0
测试tinyLlama 1.1B效果不错,比Qwen1.8B经过量化的都好很多
'''# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerateimport os
from datetime import datetime
import torchos.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
from transformers import pipeline'''
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")# We use the tokenizer's chat template to format each message - see https://hf-mirror.com/docs/transformers/main/en/chat_templating
messages = [{"role": "system","content": "You are a friendly chatbot who always responds in the style of a pirate",},# {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},{"role": "user", "content": "你叫什么名字?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
'''# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...
def load_pipeline():pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16,device_map="auto")return pipedef generate_text(content, length=20):"""根据给定的prompt生成文本"""messages = [{"role": "提示","content": "这是个友好的聊天机器人...",},# {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},{"role": "user", "content": content},]prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)datetime1 = datetime.now()outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)print(outputs[0]["generated_text"])datetime2 = datetime.now()time12_interval = datetime2 - datetime1print("时间间隔", time12_interval)if False:outputs = pipe(prompt, max_new_tokens=32, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)print(outputs[0]["generated_text"])datetime3 = datetime.now()time23_interval = datetime3 - datetime2print("时间间隔2", time23_interval)outputs = pipe(prompt, max_new_tokens=32, do_sample=False, top_k=50)print(outputs[0]["generated_text"])datetime4 = datetime.now()time34_interval = datetime4 - datetime3print("时间间隔3", time34_interval)outputs = pipe(prompt, max_new_tokens=32, do_sample=True, temperature=0.7, top_k=30, top_p=0.95)print(outputs[0]["generated_text"])datetime5 = datetime.now()time45_interval = datetime5 - datetime4print("时间间隔4", time45_interval)outputs = pipe(prompt, max_new_tokens=32, do_sample=False, top_k=30)print(outputs[0]["generated_text"])datetime6 = datetime.now()time56_interval = datetime6 - datetime5print("时间间隔5", time56_interval)outputs = pipe(prompt, max_new_tokens=12, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)print(outputs[0]["generated_text"])datetime7 = datetime.now()time67_interval = datetime7 - datetime6print("时间间隔6", time67_interval)'''结论:修改top_p不会显著降低推理时间,并且中英文相同的问题,中文问题推理时间是英文的两倍do_sample修改成False基本不会降低推理时间只有max_new_tokens才能显著降低推理时间,但是max_new_tokens与推理时间不是呈线性关系比如max_new_tokens=256,推理时间2分钟当max_new_tokens=32的时候,推理时间才会变成约1分钟因此,不如将max_new_tokens设置大些用于获取比较完整的答案'''return outputsif __name__ == "__main__":'''main function'''global pipepipe = load_pipeline()# print('load pipe ok')while True:prompt = input("请输入一个提示(或输入'exit'退出):")if prompt.lower() == 'exit':breaktry:generated_text = generate_text(prompt)print("生成的文本:")print(generated_text[0]["generated_text"])except Exception as e:print("发生错误:", e)
请输入一个提示(或输入'exit'退出):如何开门?
<|user|>
如何开门?</s>
<|assistant|>
Certainly! Opening a door is a simple process that involves several steps. Here are the general steps to follow to open a door:1. Turn off the lock: Turn off the lock with the key by pressing the "lock" button.2. Press the handle: Use the handle to push the door open. If the door is mechanical, you may need to turn a knob or pull the door handle to activate the door.3. Release the latch: Once the door is open, release the latch by pulling it backward.4. Slide the door: Slide the door forward by pushing it against the wall with your feet or using a push bar.5. Close the door: Once the door is open, close it by pressing the lock button or pulling the handle backward.6. Use a second key: If the lock has a second key, make sure it is properly inserted and then turn it to the correct position to unlock the door.Remember to always double-check the locks before opening a door, as some locks can be tricky to open. If you're unsure about the correct procedure for opening a door,
时间间隔 0:04:23.561065
生成的文本:
<|user|>
如何开门?</s>
<|assistant|>
Certainly! Opening a door is a simple process that involves several steps. Here are the general steps to follow to open a door:1. Turn off the lock: Turn off the lock with the key by pressing the "lock" button.2. Press the handle: Use the handle to push the door open. If the door is mechanical, you may need to turn a knob or pull the door handle to activate the door.3. Release the latch: Once the door is open, release the latch by pulling it backward.4. Slide the door: Slide the door forward by pushing it against the wall with your feet or using a push bar.5. Close the door: Once the door is open, close it by pressing the lock button or pulling the handle backward.6. Use a second key: If the lock has a second key, make sure it is properly inserted and then turn it to the correct position to unlock the door.Remember to always double-check the locks before opening a door, as some locks can be tricky to open. If you're unsure about the correct procedure for opening a door,
请输入一个提示(或输入'exit'退出):

http://www.ppmy.cn/devtools/7732.html

相关文章

SpringCloud技术—Docker详解、案例展示

简介&#xff1a;Docker 将应用程序与该程序的依赖&#xff0c;打包在一个文件里面。运行这个文件&#xff0c;就会生成一个虚拟容器。程序在这个虚拟容器里运行&#xff0c;就好像在真实的物理机上运行一样。有了Docker&#xff0c;就不用担心环境问题。 总体来说&#xff0c;…

【数据结构-树和二叉树-森林-哈夫曼树】

目录 1 树1.1 树的描述&#xff08;基本术语&#xff09; 2 二叉树&#xff08;树的度最大为2&#xff09;2.1 注意事项-五种基本形态2.2 二叉树的抽象数据类型定义 3 二叉树的性质3.1 两种特殊形式的二叉树-重点会计算3.2 题目练习&#xff1a; 4 二叉树的存储结构4.1 顺序存储…

Mac 下安装PostgreSQL经验

使用homebrew终端软件管理器去安装PostgreSQL 如果没有安装brew命令执行以下命令 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" 沙果开源物联网系统 SagooIoT | SagooIoT 1.使用命令安装postgreSQL brew i…

算法课程笔记——STL题目

长度为2的字符串&#xff0c;当in下标为一&#xff0c;也就是\n,当i&#xff01;n&#xff0c;就是输出空格 &&且 city从citys里面取 加速后就不能混用scanf

骑砍2霸主MOD开发(6)-使用C#-Harmony修改本体游戏逻辑

一.C#-Harmony反射及动态注入 利用C#运行时环境的反射原理,实现对已加载DLL,未加载DLL中代码替换和前置后置插桩. C#依赖库下载地址:霸王•吕布 / CSharpHarmonyLib GitCodehttps://gitcode.net/qq_35829452/csharpharmonylib 根据实际运行.Net环境选择对应版本的0Harmony.dll…

IOS 32位调试环境搭建

一、背景 调试IOS程序经常使用gdb&#xff0c;目前gdb只支持32位程序调试&#xff0c;暂不支持IOS 64位程序调试。IOS 32位程序使用GDB调试之前&#xff0c;必须确保手机已越狱&#xff0c;否则无法安装和使用GDB调试软件。下面详细介绍GDB调试IOS 32位程序的环境搭建。 二、I…

子组件向父组件传参的方式?

在Vue.js中&#xff0c;子组件向父组件传参通常通过自定义事件&#xff08;$emit&#xff09;来实现。子组件通过触发一个事件&#xff0c;并将需要传递的参数作为事件的负载&#xff0c;父组件在模板中监听这个事件&#xff0c;并在事件处理函数中接收这些参数。 以下是一个简…

【Hadoop】- YARN架构[7]

前言 Yarn架构是一个用于管理和调度Hadoop集群资源的系统。它是Hadoop生态系统的一部分&#xff0c;主要用于解决Hadoop中的资源管理问题。 通过使用Yarn架构&#xff0c;Hadoop集群中的不同应用程序可以共享集群资源&#xff0c;并根据需要动态分配和回收资源。这种灵活的资…