Surpassing GPT-3.5, Near-Unlimited Context, and a Powerful RAG Trio: A Look at the MiniCPM3-4B Model

2024/10/20 19:23:12

MiniCPM3-4B is a high-performance on-device AI model jointly developed by ModelBest (面壁智能) and the Natural Language Processing Lab at Tsinghua University. It is the third generation of the MiniCPM series, with 4 billion parameters.

In benchmarks, MiniCPM3-4B outperforms Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, and is competitive with several models in the 7B–9B parameter range.

MiniCPM3-4B improves on several fronts over earlier generations, including a larger vocabulary, more layers, and a wider hidden dimension, giving it stronger processing capability.

MiniCPM3-4B ships with a 32k context window and is designed so that, in theory, it can scale to arbitrarily long context, a major advantage for users who need to process large volumes of data and complex queries.

MiniCPM3-4B also supports more efficient code execution and function calling, allowing developers to implement complex tasks more quickly.
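As a rough illustration of what a function-calling setup can look like, the sketch below builds a tool schema in the common OpenAI-style JSON format. The tool name and fields are hypothetical, chosen for illustration, and are not taken from the MiniCPM3 documentation; in practice such a schema is serialized and injected into the chat template so the model can emit a structured call.

```python
import json

# Hypothetical tool schema (illustrative only, not from the MiniCPM3 docs).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Query the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Serialize the schema as it would be embedded into a prompt/chat template.
tools_json = json.dumps(tools, ensure_ascii=False)
print(tools_json[:30])
```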

In addition, ModelBest has released MiniCPM3-RAG-LoRA, a fine-tuned variant for RAG scenarios, together with a RAG toolkit consisting of the MiniCPM-Embedding and MiniCPM-Reranker models.

GitHub repository: https://github.com/OpenBMB/MiniCPM

I. Environment Setup

1. Python environment

Python 3.10 or later is recommended.
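A quick way to confirm the interpreter meets this minimum, before installing the heavier dependencies:

```python
import sys

# Check that the running interpreter is at least Python 3.10.
meets_minimum = sys.version_info >= (3, 10)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"{'OK' if meets_minimum else 'please upgrade to 3.10+'}")
```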

2. Installing pip packages

pip install torch==2.3.0+cu118 torchvision==0.18.0+cu118 torchaudio==2.3.0 --extra-index-url https://download.pytorch.org/whl/cu118

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install datamodel_code_generator -i https://pypi.tuna.tsinghua.edu.cn/simple

3. Downloading the MiniCPM3-4B model

git lfs install

git clone https://modelscope.cn/models/OpenBMB/MiniCPM3-4B

4. Downloading the MiniCPM3-RAG-LoRA model

git lfs install

git clone https://modelscope.cn/models/OpenBMB/MiniCPM3-RAG-LoRA

5. Downloading the MiniCPM-Reranker model

git lfs install

git clone https://modelscope.cn/models/OpenBMB/MiniCPM-Reranker

6. Downloading the MiniCPM-Embedding model

git lfs install

git clone https://modelscope.cn/models/OpenBMB/MiniCPM-Embedding

II. Functional Testing

1. Running the tests

(1) Calling the models from Python

import torch
from modelscope import AutoModelForCausalLM, AutoModel, AutoTokenizer, snapshot_download
from transformers import AutoModelForSequenceClassification
from peft import PeftModel
import torch.nn.functional as F
import numpy as np


def MiniCPM3_4B_inference(message, model_path="OpenBMB/MiniCPM3-4B", device="cuda"):
    # Plain chat inference with the base MiniCPM3-4B model.
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True
    )
    messages = [{"role": "user", "content": message}]
    model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
    model_outputs = model.generate(
        model_inputs,
        max_new_tokens=1024,
        top_p=0.7,
        temperature=0.7,
        repetition_penalty=1.02,
    )
    # Strip the prompt tokens, keeping only the newly generated ones.
    output_token_ids = [
        model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
    ]
    responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
    return responses


def MiniCPM3_RAG_LoRA_inference(instruction, passages_list,
                                base_model_dir="OpenBMB/MiniCPM3-4B",
                                lora_model_dir="OpenBMB/MiniCPM3-RAG-LoRA"):
    # RAG-style inference: load the base model and apply the RAG LoRA adapter.
    base_model_dir = snapshot_download(base_model_dir)
    lora_model_dir = snapshot_download(lora_model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        base_model_dir, device_map="auto", torch_dtype=torch.bfloat16
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(lora_model_dir)
    model = PeftModel.from_pretrained(model, lora_model_dir)

    # Prepend the retrieved passages as background context.
    passages = '\n'.join(passages_list)
    input_text = 'Background:\n' + passages + '\n\n' + instruction
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": input_text},
    ]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    outputs = model.chat(tokenizer, prompt, temperature=0.8, top_p=0.8)
    return outputs[0]


def MiniCPM_Embedding_inference(queries, passages, model_name="OpenBMB/MiniCPM-Embedding", device="cuda"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(
        model_name, trust_remote_code=True,
        attn_implementation="flash_attention_2", torch_dtype=torch.float16
    ).to(device)
    model.eval()

    def weighted_mean_pooling(hidden, attention_mask):
        # Weight each valid token by its 1-based position in the sequence.
        attention_mask_ = attention_mask * attention_mask.cumsum(dim=1)
        s = torch.sum(hidden * attention_mask_.unsqueeze(-1).float(), dim=1)
        d = attention_mask_.sum(dim=1, keepdim=True).float()
        reps = s / d
        return reps

    @torch.no_grad()
    def encode(input_texts):
        batch_dict = tokenizer(
            input_texts, max_length=512, padding=True, truncation=True,
            return_tensors='pt', return_attention_mask=True
        ).to(device)
        outputs = model(**batch_dict)
        attention_mask = batch_dict["attention_mask"]
        hidden = outputs.last_hidden_state
        reps = weighted_mean_pooling(hidden, attention_mask)
        embeddings = F.normalize(reps, p=2, dim=1).detach().cpu().numpy()
        return embeddings

    # Queries are prefixed with an instruction; passages are encoded as-is.
    INSTRUCTION = "Query: "
    queries = [INSTRUCTION + query for query in queries]
    embeddings_query = encode(queries)
    embeddings_doc = encode(passages)
    scores = embeddings_query @ embeddings_doc.T
    return scores.tolist()


def MiniCPM_Reranker_rerank(queries, passages, model_name='OpenBMB/MiniCPM-Reranker',
                            device="cuda", max_len_q=512, max_len_d=512):
    model_name = snapshot_download(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    tokenizer.padding_side = "right"
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, trust_remote_code=True,
        attn_implementation="flash_attention_2", torch_dtype=torch.float16
    ).to(device)
    model.eval()

    def tokenize_our(query, doc):
        # Pack query and document as: <bos> query <eos> doc, padded to a fixed length.
        input_id_query = tokenizer.encode(query, add_special_tokens=False, max_length=max_len_q, truncation=True)
        input_id_doc = tokenizer.encode(doc, add_special_tokens=False, max_length=max_len_d, truncation=True)
        pad_input = {"input_ids": [tokenizer.bos_token_id] + input_id_query + [tokenizer.eos_token_id] + input_id_doc}
        return tokenizer.pad(
            pad_input,
            padding="max_length",
            max_length=max_len_q + max_len_d + 2,
            return_tensors="pt",
        )

    @torch.no_grad()
    def rerank(input_query, input_docs):
        tokenized_inputs = [tokenize_our(input_query, input_doc).to(device) for input_doc in input_docs]
        input_ids = {
            "input_ids": [tokenized_input["input_ids"] for tokenized_input in tokenized_inputs],
            "attention_mask": [tokenized_input["attention_mask"] for tokenized_input in tokenized_inputs],
        }
        for k in input_ids:
            input_ids[k] = torch.stack(input_ids[k]).to(device)
        outputs = model(**input_ids)
        score = outputs.logits
        return score.float().detach().cpu().numpy()

    INSTRUCTION = "Query: "
    queries = [INSTRUCTION + query for query in queries]
    scores = [rerank(query, docs) for query, docs in zip(queries, passages)]
    return np.array(scores)


def main():
    # Example use cases
    response_4B = MiniCPM3_4B_inference("推荐5个北京的景点。")
    print(f"MiniCPM3-4B Response: {response_4B}")

    instruction = "Q: What is the name of the lead character in the novel 'The Silent Watcher'?\nA:"
    passages_list = [
        "In the novel 'The Silent Watcher,' the lead character is named Alex Carter. Alex is a private detective who uncovers a series of mysterious events in a small town.",
        "Set in a quiet town, 'The Silent Watcher' follows Alex Carter, a former police officer turned private investigator, as he unravels the town's dark secrets.",
        "'The Silent Watcher' revolves around Alex Carter's journey as he confronts his past while solving complex cases in his hometown.",
    ]
    response_RAG_LoRA = MiniCPM3_RAG_LoRA_inference(instruction, passages_list)
    print(f"MiniCPM3-RAG-LoRA Response: {response_RAG_LoRA}")

    queries = ["China capital?"]
    passages = ["beijing", "shanghai"]
    scores_embedding = MiniCPM_Embedding_inference(queries, passages)
    print(f"MiniCPM-Embedding Scores: {scores_embedding}")

    rerank_queries = ["China capital?"]
    rerank_passages = [["beijing", "shanghai"]]
    scores_reranker = MiniCPM_Reranker_rerank(rerank_queries, rerank_passages)
    print(f"MiniCPM-Reranker Scores: {scores_reranker}")


if __name__ == "__main__":
    main()
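The weighted mean pooling used for MiniCPM-Embedding weights each token by its cumulative position, so later tokens contribute more to the sentence vector. It can be sketched standalone in NumPy with toy inputs (the shapes and values below are illustrative, not real model outputs):

```python
import numpy as np

def weighted_mean_pooling_np(hidden, attention_mask):
    # Weight each valid token by its 1-based position within the sequence,
    # mirroring attention_mask * attention_mask.cumsum(dim=1) in the torch code.
    weights = attention_mask * np.cumsum(attention_mask, axis=1)
    s = (hidden * weights[..., None]).sum(axis=1)
    d = weights.sum(axis=1, keepdims=True)
    return s / d

# Toy batch: 2 sequences, 4 tokens each, hidden size 3; the second
# sequence has two padding positions (mask = 0).
hidden = np.arange(24, dtype=np.float64).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]], dtype=np.float64)
reps = weighted_mean_pooling_np(hidden, mask)
print(reps.shape)  # (2, 3)
```

Padding positions get zero weight, so they never contribute to the pooled vector; the position weighting is what distinguishes this from plain mean pooling.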

To be continued...

For more details, follow: 杰哥新技术

