LLaMA-Factory, a Tool for Developing and Fine-Tuning Large Models: Inference and Evaluation

Inference

LLaMA-Factory supports several ways of running inference.

To chat with a model, run llamafactory-cli chat inference_config.yaml for a command-line session or llamafactory-cli webchat inference_config.yaml for a web interface.

For chat, the configuration file only needs to specify the base model model_name_or_path and the template. If the model has been fine-tuned, also specify adapter_name_or_path and finetuning_type.

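If you want to feed a large dataset to the model and record its outputs, you can run batch inference in two ways: with a dataset, via llamafactory-cli train inference_config.yaml, or through the API, via llamafactory-cli api.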
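All of the commands above follow the pattern llamafactory-cli <subcommand> <config file>. As a minimal sketch, the helper below assembles that argument list for the four subcommands named in this section so that a script can launch them. Note that build_command is a hypothetical helper for illustration, not part of LLaMA-Factory.

```python
import shlex

# Subcommands mentioned in this section; the helper itself is hypothetical.
SUBCOMMANDS = ("chat", "webchat", "train", "api")


def build_command(subcommand: str, config_path: str) -> list[str]:
    """Return the argv list for `llamafactory-cli <subcommand> <config_path>`."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    return ["llamafactory-cli", subcommand, config_path]


cmd = build_command("chat", "inference_config.yaml")
print(shlex.join(cmd))  # shell-ready form of the command
# To actually launch it (requires LLaMA-Factory to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when a config path contains spaces.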

Note:
Whichever inference method you use, the model at model_name_or_path must exist and must match the template.

1. Base model inference configuration

To run inference with a base model, inference_config.yaml only needs to specify the base model model_name_or_path and the template.

### examples/inference/llama3.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3

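2. Fine-tuned model inference configuration

To run inference with a fine-tuned model, specify the adapter path adapter_name_or_path and the fine-tuning type finetuning_type in addition to the base model and template.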

### examples/inference/llama3_lora_sft.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
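The two configurations above differ only in the adapter keys. The sketch below encodes the rule as this section states it: model_name_or_path and template are always required, and the two adapter keys appear together. The function check_inference_config is a hypothetical helper, and the check is deliberately strict. It reflects the wording of this section, not LLaMA-Factory's own argument validation.

```python
REQUIRED_KEYS = ("model_name_or_path", "template")
ADAPTER_KEYS = ("adapter_name_or_path", "finetuning_type")


def check_inference_config(config: dict) -> list[str]:
    """Return a list of problems found in an inference config (empty if none)."""
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config]
    present = [key for key in ADAPTER_KEYS if key in config]
    # Per the text above, the two adapter keys are given together for a fine-tuned model.
    if len(present) == 1:
        missing = next(key for key in ADAPTER_KEYS if key not in config)
        problems.append(f"{present[0]} is set but {missing} is missing")
    return problems


base = {"model_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct", "template": "llama3"}
tuned = {**base, "adapter_name_or_path": "saves/llama3-8b/lora/sft", "finetuning_type": "lora"}
print(check_inference_config(base))   # base model config
print(check_inference_config(tuned))  # fine-tuned model config
print(check_inference_config({**base, "adapter_name_or_path": "saves/llama3-8b/lora/sft"}))
```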

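3. Multimodal models

For a multimodal model, you can start a web chat session with the following command:

llamafactory-cli webchat examples/inference/llava1_5.yaml

An example of the examples/inference/llava1_5.yaml configuration is shown below: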

model_name_or_path: llava-hf/llava-1.5-7b-hf
template: vicuna
visual_inputs: true

4. vLLM inference backend

To use vLLM as the inference backend, specify infer_backend and vllm_enforce_eager in the configuration:

### examples/inference/llama3_vllm.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
infer_backend: vllm
vllm_enforce_eager: true
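Because the vLLM configuration is just the base configuration plus two keys, such variants can be generated from a script. The following is a minimal sketch under stated assumptions: to_flat_yaml is a hypothetical helper that only handles flat mappings of plain strings, integers, and booleans without quoting. For anything more complex, a real YAML library would be the safer choice.

```python
def to_flat_yaml(config: dict) -> str:
    """Render a flat dict of str/int/bool values as `key: value` lines."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):
            # Write booleans as true/false, matching the configs in this section.
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


base = {"model_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct", "template": "llama3"}
# Switch the same base config to the vLLM backend by adding the two keys named above.
vllm_config = {**base, "infer_backend": "vllm", "vllm_enforce_eager": True}
print(to_flat_yaml(vllm_config), end="")
```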

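5. Batch inference

1. Dataset

For batch inference with a dataset, specify the model, the adapter (optional), the evaluation dataset, the output path, and so on, and set do_predict to true.

An example is provided below. You can run batch inference on a dataset with llamafactory-cli train examples/train_lora/llama3_lora_predict.yaml.

If you need multi-GPU inference, specify the deepspeed parameter in the configuration file.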

# examples/train_lora/llama3_lora_predict.yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft

deepspeed: examples/deepspeed/ds_z3_config.yaml # DeepSpeed configuration file

### method
stage: sft
do_predict: true
finetuning_type: lora

### dataset
eval_dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 50
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/predict
overwrite_output_dir: true

### eval
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000

predict_with_generate can only be set to true when stage is sft.
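Once a batch run has finished, the recorded outputs can be inspected from a script. The sketch below rests on an assumption that this section does not state: that predictions are written into output_dir as a JSON Lines file named generated_predictions.jsonl, with a label and a predict field per record. Check the contents of your own output_dir, since the file name and fields may differ between LLaMA-Factory versions. Both functions are hypothetical helpers.

```python
import json
from pathlib import Path


def load_predictions(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def exact_match_rate(records: list[dict]) -> float:
    """Fraction of records whose prediction equals the label after stripping whitespace."""
    if not records:
        return 0.0
    hits = sum(r["predict"].strip() == r["label"].strip() for r in records)
    return hits / len(records)


# Assumed location; adjust to the file actually present in your output_dir.
predictions_file = Path("saves/llama3-8b/lora/predict") / "generated_predictions.jsonl"
if predictions_file.exists():
    records = load_predictions(predictions_file)
    print(f"{len(records)} predictions, exact match rate {exact_match_rate(records):.2%}")
```

Exact match is a coarse measure for free-form generation. It is used here only because it needs no extra dependencies.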

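2. API

For batch inference through the API, you only need to specify the model, the adapter (optional), the template, the fine-tuning type, and so on.

An example configuration file is shown below: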

# examples/inference/llama3_lora_sft.yaml
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora

The following example starts the API service and then calls it.

Start the API service with API_PORT=8000 CUDA_VISIBLE_DEVICES=0 llamafactory-cli api examples/inference/llama3_lora_sft.yaml, then run the following example program to call it:

# api_call_example.py
from openai import OpenAI

# base_url points at the API service started above on port 8000.
client = OpenAI(api_key="0", base_url="http://0.0.0.0:8000/v1")
messages = [{"role": "user", "content": "Who are you?"}]
result = client.chat.completions.create(messages=messages, model="meta-llama/Meta-Llama-3-8B-Instruct")
# Print the assistant message of the first returned choice.
print(result.choices[0].message)
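The example above sends a single message. For batch inference through the API, the same request can be repeated over a list of prompts while recording each output. The following is a minimal sketch using only the standard library instead of the openai client. It assumes the service exposes the OpenAI-compatible chat/completions route under the same base URL, which is the route the client call above uses. The names build_payload, ask, run_batch, and the output file name are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:8000/v1"  # same address as the example above
MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for one single-turn chat completion request."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def ask(prompt: str, base_url: str = BASE_URL) -> str:
    """POST one prompt to the chat completions route and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer 0"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


def run_batch(prompts: list[str], out_path: str) -> None:
    """Send prompts one by one and record each prompt with its output as JSON Lines."""
    with open(out_path, "w", encoding="utf-8") as out:
        for prompt in prompts:
            record = {"prompt": prompt, "output": ask(prompt)}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")


# With the API service running:
#   run_batch(["Who are you?", "What can you do?"], "api_outputs.jsonl")
```

Requests are sent sequentially to keep the sketch simple. Concurrency, retries, and timeouts are left out.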

Evaluation

After training is complete, you can evaluate the model with llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml.

The example configuration file examples/train_lora/llama3_lora_eval.yaml is as follows:

### examples/train_lora/llama3_lora_eval.yaml
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft # optional

### method
finetuning_type: lora

### dataset
task: mmlu_test
template: fewshot
lang: en
n_shot: 5

### output
save_dir: saves/llama3-8b/lora/eval

### eval
batch_size: 4

During batch inference, the model's BLEU and ROUGE scores are computed and saved automatically, so batch inference is another way to evaluate the model.
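To give an intuition for what an overlap-based score such as ROUGE measures, the sketch below computes a simplified unigram-overlap F1 between a prediction and a reference. It is an illustration only. It splits on whitespace and is not the implementation LLaMA-Factory uses, so its numbers will not match the saved scores. The function unigram_f1 is a hypothetical helper.

```python
from collections import Counter


def unigram_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1: overlap of whitespace-separated tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# 4 shared tokens, precision 4/5, recall 4/4
print(round(unigram_f1("I am an AI assistant", "I am an assistant"), 3))  # 0.889
```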

The relevant parameters are described below:

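Parameter	Type	Description
task	str	Name of the evaluation task; the options are `mmlu_test`, `ceval_validation`, and `cmmlu_test`.
task_dir	str	Path to the folder containing the evaluation datasets; defaults to `evaluation`.
batch_size	int	Batch size per GPU; defaults to `4`.
seed	int	Random seed for the data loader; defaults to `42`.
lang	str	Language used for evaluation, either `en` or `zh`; defaults to `en`.
n_shot	int	Number of few-shot examples; defaults to `5`.
save_dir	str	Path where the evaluation results are saved; defaults to `None`. An error is raised if the path already exists.
download_mode	str	Download mode for the evaluation datasets; defaults to `DownloadMode.REUSE_DATASET_IF_EXISTS`, which reuses the dataset if it already exists and downloads it otherwise.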
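The defaults and constraints in the table can be summarised in code. The sketch below is a hypothetical mirror of these parameters, not LLaMA-Factory's actual argument class. It reproduces the documented defaults, the two allowed lang values, and the rule that an existing save_dir raises an error. download_mode is left out to keep the example free of third-party imports.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalArgs:
    """Hypothetical mirror of the evaluation parameters and defaults in the table above."""

    task: str
    task_dir: str = "evaluation"
    batch_size: int = 4
    seed: int = 42
    lang: str = "en"
    n_shot: int = 5
    save_dir: Optional[str] = None

    def __post_init__(self) -> None:
        if self.lang not in ("en", "zh"):
            raise ValueError(f"lang must be 'en' or 'zh', got {self.lang!r}")
        # The table states that an existing save_dir raises an error.
        if self.save_dir is not None and os.path.exists(self.save_dir):
            raise ValueError(f"save_dir already exists: {self.save_dir}")


args = EvalArgs(task="mmlu_test")
print(args)
```

Checking save_dir up front, before any model is loaded, means an evaluation run fails immediately rather than after the work is done.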