微调大模型(Finetuning Large Language Models)—Evaluation(六)

devtools/2024/9/29 13:49:32/

1. 微调后对模型进行评估

模型的评估目前没有统一的标准,有从正向角度,核对是否命中,当然也有从反向角度,考虑未命中的错误分析。

常见的评估方式如图所示:
在这里插入图片描述
本节学习资料地址:传送门

2. 代码测试

2.1 准备环境

import datasets
import tempfile
import logging
import random
import config
import os
import yaml
import logging
import difflib
import pandas as pdimport transformers
import datasets
import torchfrom tqdm import tqdm
from utilities import *
from transformers import AutoTokenizer, AutoModelForCausalLMlogger = logging.getLogger(__name__)
global_config = None

2.2 读取数据

dataset = datasets.load_dataset("lamini/lamini_docs")test_dataset = dataset["test"]
print(test_dataset[0]["question"])
print(test_dataset[0]["answer"])

输出如下:

Can Lamini generate technical documentation or user manuals for software projects?
Yes, Lamini can generate technical documentation and user manuals for software projects. It uses natural language generation techniques to create clear and concise documentation that is easy to understand for both technical and non-technical users. This can save developers a significant amount of time and effort in creating documentation, allowing them to focus on other aspects of their projects.

2.3 读取模型

model_name = "lamini/lamini_docs_finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

2.4 设置基础评估函数

def is_exact_match(a, b):return a.strip() == b.strip()model.eval()

输出如下:

GPTNeoXForCausalLM((gpt_neox): GPTNeoXModel((embed_in): Embedding(50304, 512)(emb_dropout): Dropout(p=0.0, inplace=False)(layers): ModuleList((0-5): 6 x GPTNeoXLayer((input_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(post_attention_layernorm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)(post_attention_dropout): Dropout(p=0.0, inplace=False)(post_mlp_dropout): Dropout(p=0.0, inplace=False)(attention): GPTNeoXAttention((rotary_emb): GPTNeoXRotaryEmbedding()(query_key_value): Linear(in_features=512, out_features=1536, bias=True)(dense): Linear(in_features=512, out_features=512, bias=True)(attention_dropout): Dropout(p=0.0, inplace=False))(mlp): GPTNeoXMLP((dense_h_to_4h): Linear(in_features=512, out_features=2048, bias=True)(dense_4h_to_h): Linear(in_features=2048, out_features=512, bias=True)(act): GELUActivation())))(final_layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True))(embed_out): Linear(in_features=512, out_features=50304, bias=False)
)

2.5 设置生成函数

def inference(text, model, tokenizer, max_input_tokens=1000, max_output_tokens=100):# Tokenizetokenizer.pad_token = tokenizer.eos_tokeninput_ids = tokenizer.encode(text,return_tensors="pt",truncation=True,max_length=max_input_tokens)# Generatedevice = model.devicegenerated_tokens_with_prompt = model.generate(input_ids=input_ids.to(device),max_length=max_output_tokens)# Decodegenerated_text_with_prompt = tokenizer.batch_decode(generated_tokens_with_prompt, skip_special_tokens=True)# Strip the promptgenerated_text_answer = generated_text_with_prompt[0][len(text):]return generated_text_answer

2.6 运行模型并对比输出结果

test_question = test_dataset[0]["question"]
generated_answer = inference(test_question, model, tokenizer)
print(test_question)
print(generated_answer)

输出如下:

Can Lamini generate technical documentation or user manuals for software projects?
Yes, Lamini can generate technical documentation or user manuals for software projects. This can be achieved by providing a prompt for a specific technical question or question to the LLM Engine, or by providing a prompt for a specific technical question or question. Additionally, Lamini can be trained on specific technical questions or questions to help users understand the process and provide feedback to the LLM Engine. Additionally, Lamini
answer = test_dataset[0]["answer"]
print(answer)

输出如下:

Yes, Lamini can generate technical documentation and user manuals for software projects. It uses natural language generation techniques to create clear and concise documentation that is easy to understand for both technical and non-technical users. This can save developers a significant amount of time and effort in creating documentation, allowing them to focus on other aspects of their projects.

进行评估

exact_match = is_exact_match(generated_answer, answer)
print(exact_match)

输出如下:

False

2.7 在整个数据集上运行

n = 10
metrics = {'exact_matches': []}
predictions = []
for i, item in tqdm(enumerate(test_dataset)):print("i Evaluating: " + str(item))question = item['question']answer = item['answer']try:predicted_answer = inference(question, model, tokenizer)except:continuepredictions.append([predicted_answer, answer])#fixed: exact_match = is_exact_match(generated_answer, answer)exact_match = is_exact_match(predicted_answer, answer)metrics['exact_matches'].append(exact_match)if i > n and n != -1:break
print('Number of exact matches: ', sum(metrics['exact_matches']))

输出如下:

0it [00:00, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'Can Lamini generate technical documentation or user manuals for software projects?', 'answer': 'Yes, Lamini can generate technical documentation and user manuals for software projects. It uses natural language generation techniques to create clear and concise documentation that is easy to understand for both technical and non-technical users. This can save developers a significant amount of time and effort in creating documentation, allowing them to focus on other aspects of their projects.', 'input_ids': [5804, 418, 4988, 74, 6635, 7681, 10097, 390, 2608, 11595, 84, 323, 3694, 6493, 32, 4374, 13, 418, 4988, 74, 476, 6635, 7681, 10097, 285, 2608, 11595, 84, 323, 3694, 6493, 15, 733, 4648, 3626, 3448, 5978, 5609, 281, 2794, 2590, 285, 44003, 10097, 326, 310, 3477, 281, 2096, 323, 1097, 7681, 285, 1327, 14, 48746, 4212, 15, 831, 476, 5321, 12259, 247, 1534, 2408, 273, 673, 285, 3434, 275, 6153, 10097, 13, 6941, 731, 281, 2770, 327, 643, 7794, 273, 616, 6493, 15], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [5804, 418, 4988, 74, 6635, 7681, 10097, 390, 2608, 11595, 84, 323, 3694, 6493, 32, 4374, 13, 418, 4988, 74, 476, 6635, 7681, 10097, 285, 2608, 11595, 84, 323, 3694, 6493, 15, 733, 4648, 3626, 3448, 5978, 5609, 281, 2794, 2590, 285, 44003, 10097, 326, 310, 3477, 281, 2096, 323, 1097, 7681, 285, 1327, 14, 48746, 4212, 15, 831, 476, 5321, 12259, 247, 1534, 2408, 273, 673, 285, 3434, 275, 6153, 10097, 13, 6941, 731, 281, 2770, 327, 643, 7794, 273, 616, 6493, 15]}
1it [00:00,  1.47it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'How do I include my API key in the Authorization HTTP header?', 'answer': 'The Authorization HTTP header should include the API key in the following format: Authorization: Bearer <YOUR-KEY-HERE>.', 'input_ids': [2347, 513, 309, 2486, 619, 8990, 2234, 275, 253, 10360, 1320, 17607, 10478, 32, 510, 10360, 1320, 17607, 10478, 943, 2486, 253, 8990, 2234, 275, 253, 1563, 5981, 27, 10360, 1320, 27, 2325, 12287, 654, 58, 11862, 14, 13888, 14, 41, 8147, 13208], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [2347, 513, 309, 2486, 619, 8990, 2234, 275, 253, 10360, 1320, 17607, 10478, 32, 510, 10360, 1320, 17607, 10478, 943, 2486, 253, 8990, 2234, 275, 253, 1563, 5981, 27, 10360, 1320, 27, 2325, 12287, 654, 58, 11862, 14, 13888, 14, 41, 8147, 13208]}
2it [00:01,  1.58it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': "Is there a section explaining the code's approach to handling versioning and compatibility?", 'answer': 'Yes, the code includes a version parameter in the FeedbackOperation class constructor, which allows for handling versioning and compatibility.', 'input_ids': [2513, 627, 247, 2593, 15571, 253, 2127, 434, 2746, 281, 10885, 2715, 272, 285, 22862, 32, 4374, 13, 253, 2127, 3797, 247, 2715, 4764, 275, 253, 34600, 2135, 17547, 966, 16757, 13, 534, 4483, 323, 10885, 2715, 272, 285, 22862, 15], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [2513, 627, 247, 2593, 15571, 253, 2127, 434, 2746, 281, 10885, 2715, 272, 285, 22862, 32, 4374, 13, 253, 2127, 3797, 247, 2715, 4764, 275, 253, 34600, 2135, 17547, 966, 16757, 13, 534, 4483, 323, 10885, 2715, 272, 285, 22862, 15]}
3it [00:01,  1.62it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'Is there a community or support forum available for Lamini users?', 'answer': 'Yes, there is a community forum available for Lamini users. The Lamini community forum can be accessed through the Lamini website and provides a platform for users to ask questions, share ideas, and collaborate with other developers using the library. Additionally, the Lamini team is active on the forum and provides support and guidance to users as needed.', 'input_ids': [2513, 627, 247, 3114, 390, 1329, 12209, 2130, 323, 418, 4988, 74, 4212, 32, 4374, 13, 627, 310, 247, 3114, 12209, 2130, 323, 418, 4988, 74, 4212, 15, 380, 418, 4988, 74, 3114, 12209, 476, 320, 19197, 949, 253, 418, 4988, 74, 4422, 285, 3400, 247, 5147, 323, 4212, 281, 1642, 3533, 13, 3894, 5697, 13, 285, 42124, 342, 643, 12259, 970, 253, 6335, 15, 9157, 13, 253, 418, 4988, 74, 2285, 310, 3939, 327, 253, 12209, 285, 3400, 1329, 285, 12925, 281, 4212, 347, 3058, 15], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [2513, 627, 247, 3114, 390, 1329, 12209, 2130, 323, 418, 4988, 74, 4212, 32, 4374, 13, 627, 310, 247, 3114, 12209, 2130, 323, 418, 4988, 74, 4212, 15, 380, 418, 4988, 74, 3114, 12209, 476, 320, 19197, 949, 253, 418, 4988, 74, 4422, 285, 3400, 247, 5147, 323, 4212, 281, 1642, 3533, 13, 3894, 5697, 13, 285, 42124, 342, 643, 12259, 970, 253, 6335, 15, 9157, 13, 253, 418, 4988, 74, 2285, 310, 3939, 327, 253, 12209, 285, 3400, 1329, 285, 12925, 281, 4212, 347, 3058, 15]}
4it [00:02,  1.64it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'Can the Lamini library be utilized for text completion or auto-completion tasks, such as filling in missing words or predicting the next word in a sentence?', 'answer': 'The Lamini library is not specifically designed for text completion or auto-completion tasks. However, it can be used for language modeling and generating text based on a given prompt.', 'input_ids': [5804, 253, 418, 4988, 74, 6335, 320, 12845, 323, 2505, 12240, 390, 6753, 14, 45634, 8892, 13, 824, 347, 12868, 275, 5816, 3000, 390, 21565, 253, 1735, 3159, 275, 247, 6197, 32, 510, 418, 4988, 74, 6335, 310, 417, 5742, 4158, 323, 2505, 12240, 390, 6753, 14, 45634, 8892, 15, 1723, 13, 352, 476, 320, 908, 323, 3448, 14053, 285, 11365, 2505, 1754, 327, 247, 1677, 8959, 15], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [5804, 253, 418, 4988, 74, 6335, 320, 12845, 323, 2505, 12240, 390, 6753, 14, 45634, 8892, 13, 824, 347, 12868, 275, 5816, 3000, 390, 21565, 253, 1735, 3159, 275, 247, 6197, 32, 510, 418, 4988, 74, 6335, 310, 417, 5742, 4158, 323, 2505, 12240, 390, 6753, 14, 45634, 8892, 15, 1723, 13, 352, 476, 320, 908, 323, 3448, 14053, 285, 11365, 2505, 1754, 327, 247, 1677, 8959, 15]}
5it [00:02,  1.77it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'Are there any costs associated with using Lamini for machine learning tasks, and how does the pricing structure work?', 'answer': 'Lamini offers both free and paid plans for using their machine learning services. The free plan includes limited access to their models and data generator, while the paid plans offer more advanced features and higher usage limits. The pricing structure is based on a pay-as-you-go model, where users are charged based on the number of API requests and data processed. Lamini also offers custom enterprise plans for larger organizations with specific needs.', 'input_ids': [6723, 627, 667, 4815, 2330, 342, 970, 418, 4988, 74, 323, 5145, 4715, 8892, 13, 285, 849, 1057, 253, 20910, 2605, 789, 32, 45, 4988, 74, 6131, 1097, 1959, 285, 5087, 5827, 323, 970, 616, 5145, 4715, 3238, 15, 380, 1959, 2098, 3797, 3710, 2289, 281, 616, 3210, 285, 941, 14156, 13, 1223, 253, 5087, 5827, 3959, 625, 7269, 3386, 285, 2169, 10393, 7787, 15, 380, 20910, 2605, 310, 1754, 327, 247, 2075, 14, 284, 14, 5658, 14, 2184, 1566, 13, 835, 4212, 403, 6636, 1754, 327, 253, 1180, 273, 8990, 9762, 285, 941, 11742, 15, 418, 4988, 74, 671, 6131, 2840, 16100, 5827, 323, 4067, 8889, 342, 2173, 3198, 15], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [6723, 627, 667, 4815, 2330, 342, 970, 418, 4988, 74, 323, 5145, 4715, 8892, 13, 285, 849, 1057, 253, 20910, 2605, 789, 32, 45, 4988, 74, 6131, 1097, 1959, 285, 5087, 5827, 323, 970, 616, 5145, 4715, 3238, 15, 380, 1959, 2098, 3797, 3710, 2289, 281, 616, 3210, 285, 941, 14156, 13, 1223, 253, 5087, 5827, 3959, 625, 7269, 3386, 285, 2169, 10393, 7787, 15, 380, 20910, 2605, 310, 1754, 327, 247, 2075, 14, 284, 14, 5658, 14, 2184, 1566, 13, 835, 4212, 403, 6636, 1754, 327, 253, 1180, 273, 8990, 9762, 285, 941, 11742, 15, 418, 4988, 74, 671, 6131, 2840, 16100, 5827, 323, 4067, 8889, 342, 2173, 3198, 15]}
6it [00:03,  1.80it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'How do I instantiate the LLM engine using the Lamini Python package?', 'answer': 'You can instantiate the LLM engine using the llama module in the Lamini Python package. To do this, you need to import the LLM engine from the llama module, like this: from llama import LLM.', 'input_ids': [2347, 513, 309, 8164, 4513, 253, 21708, 46, 3948, 970, 253, 418, 4988, 74, 13814, 5522, 32, 1394, 476, 8164, 4513, 253, 21708, 46, 3948, 970, 253, 26198, 2902, 6333, 275, 253, 418, 4988, 74, 13814, 5522, 15, 1916, 513, 436, 13, 368, 878, 281, 1395, 253, 21708, 46, 3948, 432, 253, 26198, 2902, 6333, 13, 751, 436, 27, 432, 26198, 2902, 1395, 21708, 46, 15], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [2347, 513, 309, 8164, 4513, 253, 21708, 46, 3948, 970, 253, 418, 4988, 74, 13814, 5522, 32, 1394, 476, 8164, 4513, 253, 21708, 46, 3948, 970, 253, 26198, 2902, 6333, 275, 253, 418, 4988, 74, 13814, 5522, 15, 1916, 513, 436, 13, 368, 878, 281, 1395, 253, 21708, 46, 3948, 432, 253, 26198, 2902, 6333, 13, 751, 436, 27, 432, 26198, 2902, 1395, 21708, 46, 15]}
7it [00:04,  1.77it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'Does Lamini provide any mechanisms for model compression or optimization to reduce memory footprint?', 'answer': 'Yes, Lamini provides mechanisms for model compression and optimization to reduce memory footprint. These include techniques such as pruning, quantization, and distillation, which can significantly reduce the size of the model while maintaining its performance. Additionally, Lamini offers support for deploying customized LLMs on edge devices with limited resources, such as mobile phones or IoT devices, through techniques such as model quantization and on-device inference.', 'input_ids': [10795, 418, 4988, 74, 2085, 667, 6297, 323, 1566, 13800, 390, 13757, 281, 4796, 3541, 33257, 32, 4374, 13, 418, 4988, 74, 3400, 6297, 323, 1566, 13800, 285, 13757, 281, 4796, 3541, 33257, 15, 2053, 2486, 5609, 824, 347, 819, 25004, 13, 36643, 13, 285, 940, 21755, 13, 534, 476, 3012, 4796, 253, 1979, 273, 253, 1566, 1223, 11850, 697, 3045, 15, 9157, 13, 418, 4988, 74, 6131, 1329, 323, 45021, 32176, 21708, 12822, 327, 5024, 4095, 342, 3710, 5300, 13, 824, 347, 6109, 15169, 390, 37377, 4095, 13, 949, 5609, 824, 347, 1566, 36643, 285, 327, 14, 10933, 17032, 15], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [10795, 418, 4988, 74, 2085, 667, 6297, 323, 1566, 13800, 390, 13757, 281, 4796, 3541, 33257, 32, 4374, 13, 418, 4988, 74, 3400, 6297, 323, 1566, 13800, 285, 13757, 281, 4796, 3541, 33257, 15, 2053, 2486, 5609, 824, 347, 819, 25004, 13, 36643, 13, 285, 940, 21755, 13, 534, 476, 3012, 4796, 253, 1979, 273, 253, 1566, 1223, 11850, 697, 3045, 15, 9157, 13, 418, 4988, 74, 6131, 1329, 323, 45021, 32176, 21708, 12822, 327, 5024, 4095, 342, 3710, 5300, 13, 824, 347, 6109, 15169, 390, 37377, 4095, 13, 949, 5609, 824, 347, 1566, 36643, 285, 327, 14, 10933, 17032, 15]}
8it [00:04,  1.76it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'How does the performance of LLMs trained using Lamini compare to models fine-tuned with traditional approaches?', 'answer': 'According to the information provided, Lamini allows developers to train high-performing LLMs on large datasets with just a few lines of code from the Lamini library. The optimizations in this library reach far beyond what’s available to developers now, from more challenging optimizations like RLHF to simpler ones like reducing hallucinations. While there is no direct comparison to traditional approaches mentioned, Lamini aims to make training LLMs faster and more accessible to a wider range of developers.', 'input_ids': [2347, 1057, 253, 3045, 273, 21708, 12822, 10166, 970, 418, 4988, 74, 7277, 281, 3210, 4030, 14, 85, 37437, 342, 5899, 7274, 32, 7130, 281, 253, 1491, 2530, 13, 418, 4988, 74, 4483, 12259, 281, 6194, 1029, 14, 468, 14692, 21708, 12822, 327, 1781, 15302, 342, 816, 247, 1643, 3104, 273, 2127, 432, 253, 418, 4988, 74, 6335, 15, 380, 5556, 5904, 275, 436, 6335, 3986, 2080, 4457, 752, 457, 84, 2130, 281, 12259, 1024, 13, 432, 625, 11132, 5556, 5904, 751, 40228, 21996, 281, 19554, 4394, 751, 8493, 33092, 7097, 15, 3900, 627, 310, 642, 1480, 5301, 281, 5899, 7274, 5393, 13, 418, 4988, 74, 13698, 281, 1056, 3733, 21708, 12822, 7938, 285, 625, 12482, 281, 247, 14200, 2491, 273, 12259, 15], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [2347, 1057, 253, 3045, 273, 21708, 12822, 10166, 970, 418, 4988, 74, 7277, 281, 3210, 4030, 14, 85, 37437, 342, 5899, 7274, 32, 7130, 281, 253, 1491, 2530, 13, 418, 4988, 74, 4483, 12259, 281, 6194, 1029, 14, 468, 14692, 21708, 12822, 327, 1781, 15302, 342, 816, 247, 1643, 3104, 273, 2127, 432, 253, 418, 4988, 74, 6335, 15, 380, 5556, 5904, 275, 436, 6335, 3986, 2080, 4457, 752, 457, 84, 2130, 281, 12259, 1024, 13, 432, 625, 11132, 5556, 5904, 751, 40228, 21996, 281, 19554, 4394, 751, 8493, 33092, 7097, 15, 3900, 627, 310, 642, 1480, 5301, 281, 5899, 7274, 5393, 13, 418, 4988, 74, 13698, 281, 1056, 3733, 21708, 12822, 7938, 285, 625, 12482, 281, 247, 14200, 2491, 273, 12259, 15]}
9it [00:05,  1.79it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'Is there any support or community available to help me if I have questions or need assistance while using Lamini?', 'answer': 'Yes, there is a support community available to assist you with any questions or issues you may have while using Lamini. You can join the Lamini Discord server or reach out to the Lamini team directly for assistance.', 'input_ids': [2513, 627, 667, 1329, 390, 3114, 2130, 281, 1361, 479, 604, 309, 452, 3533, 390, 878, 8385, 1223, 970, 418, 4988, 74, 32, 4374, 13, 627, 310, 247, 1329, 3114, 2130, 281, 10073, 368, 342, 667, 3533, 390, 3374, 368, 778, 452, 1223, 970, 418, 4988, 74, 15, 1422, 476, 6604, 253, 418, 4988, 74, 15292, 636, 4771, 390, 3986, 562, 281, 253, 418, 4988, 74, 2285, 3587, 323, 8385, 15], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [2513, 627, 667, 1329, 390, 3114, 2130, 281, 1361, 479, 604, 309, 452, 3533, 390, 878, 8385, 1223, 970, 418, 4988, 74, 32, 4374, 13, 627, 310, 247, 1329, 3114, 2130, 281, 10073, 368, 342, 667, 3533, 390, 3374, 368, 778, 452, 1223, 970, 418, 4988, 74, 15, 1422, 476, 6604, 253, 418, 4988, 74, 15292, 636, 4771, 390, 3986, 562, 281, 253, 418, 4988, 74, 2285, 3587, 323, 8385, 15]}
10it [00:05,  1.82it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'Are there any code samples illustrating how to implement custom logging handlers?', 'answer': 'Yes, the Python logging module documentation provides several examples of how to implement custom logging handlers. You can find them in the official documentation here: https://docs.python.org/3/howto/logging-cookbook.html#developing-new-handlers', 'input_ids': [6723, 627, 667, 2127, 3530, 34805, 849, 281, 3359, 2840, 20893, 40093, 32, 4374, 13, 253, 13814, 20893, 6333, 10097, 3400, 2067, 6667, 273, 849, 281, 3359, 2840, 20893, 40093, 15, 1422, 476, 1089, 731, 275, 253, 3565, 10097, 1060, 27, 5987, 1358, 13880, 15, 16659, 15, 2061, 16, 20, 16, 5430, 936, 16, 36193, 14, 29519, 3305, 15, 2974, 4, 16714, 272, 14, 1826, 14, 4608, 10787], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [6723, 627, 667, 2127, 3530, 34805, 849, 281, 3359, 2840, 20893, 40093, 32, 4374, 13, 253, 13814, 20893, 6333, 10097, 3400, 2067, 6667, 273, 849, 281, 3359, 2840, 20893, 40093, 15, 1422, 476, 1089, 731, 275, 253, 3565, 10097, 1060, 27, 5987, 1358, 13880, 15, 16659, 15, 2061, 16, 20, 16, 5430, 936, 16, 36193, 14, 29519, 3305, 15, 2974, 4, 16714, 272, 14, 1826, 14, 4608, 10787]}
11it [00:06,  1.76it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
i Evaluating: {'question': 'Are there any code samples illustrating how to handle authentication and authorization?', 'answer': 'Yes, there is a separate section in the documentation explaining authentication, for more information visit https://lamini-ai.github.io/auth/', 'input_ids': [6723, 627, 667, 2127, 3530, 34805, 849, 281, 6016, 19676, 285, 26239, 32, 4374, 13, 627, 310, 247, 4858, 2593, 275, 253, 10097, 15571, 19676, 13, 323, 625, 1491, 4143, 5987, 1358, 77, 4988, 74, 14, 2284, 15, 7280, 15, 900, 16, 14399, 16], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [6723, 627, 667, 2127, 3530, 34805, 849, 281, 6016, 19676, 285, 26239, 32, 4374, 13, 627, 310, 247, 4858, 2593, 275, 253, 10097, 15571, 19676, 13, 323, 625, 1491, 4143, 5987, 1358, 77, 4988, 74, 14, 2284, 15, 7280, 15, 900, 16, 14399, 16]}
11it [00:06,  1.58it/s]
Number of exact matches:  0

2.8 输出并查看结果

df = pd.DataFrame(predictions, columns=["predicted_answer", "target_answer"])
print(df)

输出如下:

                                     predicted_answer  \
0   Yes, Lamini can generate technical documentati...   
1   You can use the Authorization HTTP header to g...   
2   Lamini’s LLM Engine is a LLM Engine for develo...   
3   Yes, there is a community or support forum ava...   
4   Yes, the Lamini library can be utilized for te...   
5   Lamini is designed to be flexible and scalable...   
6   You can instantiate the LLM engine using the L...   
7   Yes, Lamini provides mechanisms for compressio...   
8   The performance of LLMs trained using Lamini c...   
9   Yes, there is a support and community availabl...   
10  Yes, there are some code samples available in ...   
11  Yes, there are some code samples available tha...   target_answer  
0   Yes, Lamini can generate technical documentati...  
1   The Authorization HTTP header should include t...  
2   Yes, the code includes a version parameter in ...  
3   Yes, there is a community forum available for ...  
4   The Lamini library is not specifically designe...  
5   Lamini offers both free and paid plans for usi...  
6   You can instantiate the LLM engine using the l...  
7   Yes, Lamini provides mechanisms for model comp...  
8   According to the information provided, Lamini ...  
9   Yes, there is a support community available to...  
10  Yes, the Python logging module documentation p...  
11  Yes, there is a separate section in the docume...  

2.9 评估整体数据

evaluation_dataset_path = "lamini/lamini_docs_evaluation"
evaluation_dataset = datasets.load_dataset(evaluation_dataset_path)pd.DataFrame(evaluation_dataset)

输出如下:

train
0	{'predicted_answer': 'Yes, Lamini can generate...
1	{'predicted_answer': 'You can use the Authoriz...
2	{'predicted_answer': 'Lamini’s LLM Engine is a...
3	{'predicted_answer': 'Yes, there is a communit...
4	{'predicted_answer': 'Yes, the Lamini library ...
...	...
134	{'predicted_answer': 'Yes, Lamini has the abil...
135	{'predicted_answer': 'I wish! This documentati...
136	{'predicted_answer': 'Yes, Lamini can generate...
137	{'predicted_answer': 'Yes, the documentation h...
138	{'predicted_answer': 'Yes, you can learn to us...
139 rows × 1 columns

2.10 使用ARC benchmark评估

!python lm-evaluation-harness/main.py --model hf-causal --model_args pretrained=lamini/lamini_docs_finetuned --tasks arc_easy --device cpu

输出如下:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:- Avoid using `tokenizers` before the fork if possible- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Selected Tasks: ['arc_easy']
Using device 'cpu'
/usr/local/lib/python3.9/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.warnings.warn(
Downloading readme: 100%|██████████████████| 9.00k/9.00k [00:00<00:00, 11.6MB/s]
Downloading data files:   0%|                             | 0/3 [00:00<?, ?it/s]
Downloading data:   0%|                              | 0.00/331k [00:00<?, ?B/s]
Downloading data: 100%|██████████████████████| 331k/331k [00:00<00:00, 1.60MB/s]
Downloading data files:  33%|███████              | 1/3 [00:00<00:00,  4.82it/s]
Downloading data:   0%|                              | 0.00/346k [00:00<?, ?B/s]
Downloading data: 100%|██████████████████████| 346k/346k [00:00<00:00, 1.79MB/s]
Downloading data files:  67%|██████████████       | 2/3 [00:00<00:00,  4.98it/s]
Downloading data:   0%|                             | 0.00/86.1k [00:00<?, ?B/s]
Downloading data: 100%|█████████████████████| 86.1k/86.1k [00:00<00:00, 513kB/s]
Downloading data files: 100%|█████████████████████| 3/3 [00:00<00:00,  5.24it/s]
Extracting data files: 100%|████████████████████| 3/3 [00:00<00:00, 3100.77it/s]
Generating train split: 100%|████| 2251/2251 [00:00<00:00, 526128.63 examples/s]
Generating test split: 100%|█████| 2376/2376 [00:00<00:00, 879659.84 examples/s]
Generating validation split: 100%|█| 570/570 [00:00<00:00, 352566.48 examples/s]
Task: arc_easy; number of docs: 2376
Task: arc_easy; document 0; context prompt (starting on next line):
Question: Which is the function of the gallbladder?
Answer:
(end of prompt on previous line)
Requests: [Req_loglikelihood('Question: Which is the function of the gallbladder?\nAnswer:', ' store bile')[0]
, Req_loglikelihood('Question: Which is the function of the gallbladder?\nAnswer:', ' produce bile')[0]
, Req_loglikelihood('Question: Which is the function of the gallbladder?\nAnswer:', ' store digestive enzymes')[0]
, Req_loglikelihood('Question: Which is the function of the gallbladder?\nAnswer:', ' produce digestive enzymes')[0]
]
Running loglikelihood requests
100%|███████████████████████████████████████| 9496/9496 [03:23<00:00, 46.63it/s]
{"results": {"arc_easy": {"acc": 0.3122895622895623,"acc_stderr": 0.009509325983631455,"acc_norm": 0.3135521885521885,"acc_norm_stderr": 0.009519779157242258}},"versions": {"arc_easy": 0},"config": {"model": "hf-causal","model_args": "pretrained=lamini/lamini_docs_finetuned","num_fewshot": 0,"batch_size": null,"batch_sizes": [],"device": "cpu","no_cache": false,"limit": null,"bootstrap_iters": 100000,"description_dict": {}}
}
hf-causal (pretrained=lamini/lamini_docs_finetuned), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|  Task  |Version| Metric |Value |   |Stderr|
|--------|------:|--------|-----:|---|-----:|
|arc_easy|      0|acc     |0.3123|±  |0.0095|
|        |       |acc_norm|0.3136|±  |0.0095|

3. 总结

整体看,模型的评估目前没有一个统一的指标或者标准,场景适配大于统一标准,其次是介绍了几种适配的评估方法,可以借鉴参考。


http://www.ppmy.cn/devtools/118711.html

相关文章

Python/大数据/机器识别毕业设计选题题目推荐

基于Python和Diango在线购物商城系统报告文档指导搭建视频 基于深度学习的人脸识别与管理系统&#xff0c;Python实现 基于Python/机器学习链家网新房数据可视化及预测系统 Python豆瓣电影情感分析推荐系统爬虫可视化&#xff0c;过滤算法 基于python的django框架生鲜商城管…

【系统架构设计师】专题:系统质量属性和架构评估

更多内容请见: 备考系统架构设计师-核心总结目录 文章目录 一、质量属性概念1、开发期质量属性2、运行期质量属性二、面向架构评估的质量属性三、质量属性场景描述四、系统架构评估1、系统架构评估中的重要概念2、系统架构评估方法(1)`软件架构分析方法(Software Architecture…

【Qt】Qt中的窗口坐标 信号与槽

Qt中的窗口坐标 && 信号与槽 1. Qt中的窗口坐标2. 信号与槽的概述3. 信号和槽的使用3.1 connect函数的使用3.2 查看内置信号和槽3.2 connect的参数类型不匹配问题 4. 自定义信号 && 自定义槽4.1 自定义槽4.2 自定义信号 5. 带参数的信号和槽6. 信号与槽的关联方…

跨境电商平台内部功能与工具

前言 跨境电商平台的内部功能与工具是支撑其运营的核心&#xff0c;涵盖了从市场分析、产品管理到客户服务、物流跟踪等多个方面。以下具体介绍一些关键的内部功能与工具&#xff1a; 一、选品与市场分析工具 Google关键词规划师&#xff1a;这是一款强大的关键词研究工具&a…

在CentOS 7上配置NFS服务与Kali客户端的挂载及定时备份

在CentOS 7上配置NFS服务与Kali客户端的挂载及定时备份 网络文件系统&#xff08;NFS&#xff09;是一种协议&#xff0c;允许在不同计算机之间共享文件和目录。本文将逐步指导您如何在CentOS 7上搭建NFS服务器&#xff0c;如何在Kali客户端上挂载共享目录&#xff0c;并实现定…

前端面试题14 | Http1和Http2有什么区别?

哈喽小伙伴们大家好&#xff0c;今天来分享一道面试常考的题目&#xff0c;Http1和Http2的区别 HTTP/1 和 HTTP/2 是两种不同版本的 HTTP 协议&#xff0c;它们在性能、效率和功能上有一些显著的区别。以下是 HTTP/1 和 HTTP/2 的主要区别&#xff1a; 多路复用&#xff1a; H…

Linux---文件io

1.系统调用 由操作系统实现并提供给外部应用程序的编程接口。(Application Programming Interface&#xff0c;API)。是应用程序同系统之间数据交互的桥梁。 C标准函数和系统函数调用关系。一个helloworld如何打印到屏幕。 man手册中一共有九卷&#xff0c;其中一卷就有讲到系…

使用API有效率地管理Dynadot域名,设置域名服务器(NS)

前言 Dynadot是通过ICANN认证的域名注册商&#xff0c;自2002年成立以来&#xff0c;服务于全球108个国家和地区的客户&#xff0c;为数以万计的客户提供简洁&#xff0c;优惠&#xff0c;安全的域名注册以及管理服务。 Dynadot平台操作教程索引&#xff08;包括域名邮箱&…