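Fine-tuning the Qwen-7B model on a custom dataset with swift, then converting the model and running it with ollama

I previously described in detail how to fine-tune with swift, but the datasets there all came from the ModelScope community. Training on a custom dataset is actually just as simple.

1. Fine-tuning on a custom dataset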

MKL_THREADING_LAYER=GNU \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
swift sft \
  --model_type qwen2-7b-instruct \
  --model_id_or_path /root/.cache/modelscope/hub/qwen/Qwen2-7B-Instruct \
  --sft_type lora \
  --dtype AUTO \
  --dataset AI-ModelScope/alpaca-gpt4-data-zh#200 \
  --self_cognition_sample 3000 \
  --model_name 阿盛 'Master Coder' \
  --model_author 盛世芳华 LLM_ROME \
  --num_train_epochs 1 \
  --lora_rank 8 \
  --lora_alpha 32 \
  --lora_dropout_p 0.05 \
  --lora_target_modules ALL \
  --gradient_checkpointing true \
  --batch_size 1 \
  --weight_decay 0.1 \
  --learning_rate 1e-4 \
  --gradient_accumulation_steps 16 \
  --output_dir output

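To fine-tune on your own data, just set --dataset to the path of a local CSV file. The CSV file has three columns: instruction is the question, input is roughly the background or context for the question, and output is the answer. Once the dataset is ready, training can start right away.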
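As a minimal sketch of the layout described above, the following writes a tiny CSV with the three columns. The file name train.csv and both sample rows are made up for illustration and are not taken from the original dataset. Fields that contain commas or line breaks would need to be wrapped in double quotes.

```shell
# Write a tiny illustrative dataset; train.csv and both rows are hypothetical.
# The input column may be left empty when a question needs no extra context.
cat > train.csv <<'EOF'
instruction,input,output
Who are you?,,I am an assistant fine-tuned with swift.
Translate into English,你好,Hello
EOF

# Show the header, then count the data rows (total lines minus the header).
head -n 1 train.csv
echo "rows: $(($(wc -l < train.csv) - 1))"
```

The resulting path is what would be passed to --dataset in place of the ModelScope dataset name.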

Reference: ms-swift/docs/source/LLM/自定义与拓展.md at main · modelscope/ms-swift (github.com)

Training in the previous post was slow because it used only one GPU. For multi-GPU training, always remember to add:

export MKL_THREADING_LAYER=GNU \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4
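In the settings above the two values line up: four GPU ids and four processes. A small sanity check, assuming the intent is one training process per visible GPU (other layouts are possible, so treat this as a sketch rather than a rule), could look like this:

```shell
# Hypothetical sanity check: compare the number of GPU ids listed in
# CUDA_VISIBLE_DEVICES with NPROC_PER_NODE. Values copied from the text above.
CUDA_VISIBLE_DEVICES=0,1,2,3
NPROC_PER_NODE=4

# Put each comma-separated id on its own line and count the non-empty lines.
gpu_count=$(printf '%s' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .)

if [ "$gpu_count" -eq "$NPROC_PER_NODE" ]; then
  echo "ok: $gpu_count GPUs, $NPROC_PER_NODE processes"
else
  echo "mismatch: $gpu_count GPUs but NPROC_PER_NODE=$NPROC_PER_NODE" >&2
fi
```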

2. Inference

CUDA_VISIBLE_DEVICES=0,1,2,3 swift infer --ckpt_dir output/qwen2-7b-instruct/v2-20240826-101129/checkpoint-48

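Try inference from the CLI and ask it "贝贝是谁" ("Who is Beibei?"). It returns the answer I trained in, which shows the fine-tuning took effect.

3. Merging the LoRA weights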

CUDA_VISIBLE_DEVICES=0,1,2,3 swift export --ckpt_dir output/qwen2-7b-instruct/v2-20240826-101129/checkpoint-48 --merge_lora true

After it runs, the directory ./output/qwen2-7b-instruct/v2-20240826-101129/checkpoint-48-merged is created, which contains the model files.
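Before moving on, it can help to confirm that the merged directory looks like a complete model folder. The sketch below assumes the merged checkpoint contains a config.json plus weight files ending in .safetensors or .bin. The helper name check_merged_dir is made up here and is not part of swift.

```shell
# check_merged_dir is a hypothetical helper, not part of swift.
# It returns 0 if DIR holds config.json and at least one weight file.
check_merged_dir() {
  dir=$1
  if [ ! -f "$dir/config.json" ]; then
    echo "missing config.json in $dir" >&2
    return 1
  fi
  # An unmatched glob stays as a literal pattern, so test each candidate with -f.
  found=0
  for f in "$dir"/*.safetensors "$dir"/*.bin; do
    if [ -f "$f" ]; then found=1; fi
  done
  if [ "$found" -ne 1 ]; then
    echo "no weight files in $dir" >&2
    return 1
  fi
  echo "looks complete: $dir"
}

if check_merged_dir output/qwen2-7b-instruct/v2-20240826-101129/checkpoint-48-merged; then
  echo "ready for conversion"
fi
```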

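4. Installing ollama

Anyone who works with large models knows ollama, and its benefits need no introduction. To run the merged model in ollama, the model must first be converted into the model format that ollama requires.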

curl -fsSL https://ollama.com/install.sh | sh

After the installation succeeds, start the ollama service:

ollama serve

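5. Downloading the ollama and llm (llama.cpp) source code

The ollama source is needed because the tools used for model conversion are built from source.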

git clone https://github.com/ollama/ollama.git
cd ollama
git submodule init
git submodule update llm/llama.cpp

5.1 Installing the environment dependencies

python -m venv llm/llama.cpp/.venv
source llm/llama.cpp/.venv/bin/activate
pip install -r llm/llama.cpp/requirements.txt

5.2 Building the quantization tool

cd llm/llama.cpp/
make

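See: llama.cpp/docs/build.md at 1e6f6554aa11fa10160a5fda689e736c3c34169f · ggerganov/llama.cpp (github.com)

After a successful build, the directory contains many tools, including the llama-quantize binary used in the quantization step.

If the build reports errors, install these first: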

apt install cmake
apt install ccache

6. Model conversion

Convert the model with convert_hf_to_gguf.py:

python ../ollama/llm/llama.cpp/convert_hf_to_gguf.py output/qwen2-7b-instruct/v2-20240826-101129/checkpoint-48-merged --outtype f16 --outfile converted.bin

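Here output/qwen2-7b-instruct/v2-20240826-101129/checkpoint-48-merged is the path to the folder produced by the LoRA merge, --outtype f16 keeps the weights in 16-bit floating point so that no quantization loss is introduced, and --outfile converted.bin is the name of the converted file.

When it finishes, you get converted.bin, which is 14.2G.

7. Model quantization

For a 7B model, 14.2G is still rather large, so I shrink it with the quantization tool, here using 4-bit quantization.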

../ollama/llm/llama.cpp/llama-quantize converted.bin quantized.bin q4_0

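The result is quantized.bin, which is 4.1G.

8. Building the ollama package

An ollama package can be thought of as something like a Dockerfile. Create a file named Modelfile with the following content: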
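The two file sizes are close to what simple arithmetic predicts. The back-of-the-envelope sketch below assumes roughly 7.6 billion parameters for Qwen2-7B (a round figure that is not stated in this article), 16 bits per weight for f16, and about 4.5 bits per weight for q4_0 (4-bit values plus one 16-bit scale per block of 32 weights). Real files differ somewhat because of metadata and tensors kept at higher precision.

```shell
# Rough size estimate: parameters * bits per weight / 8 bytes, shown in GiB.
# params is an assumed round figure, not a value taken from the article.
params=7600000000

estimate_gib() {
  # $1 = parameter count, $2 = bits per weight
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 / 1073741824 }'
}

f16_gib=$(estimate_gib "$params" 16)
q4_gib=$(estimate_gib "$params" 4.5)
echo "f16: ~${f16_gib} GiB, q4_0: ~${q4_gib} GiB"
```

Under these assumptions the estimates come out at about 14.2 GiB for f16 and about 4.0 GiB for q4_0, in line with the sizes reported above.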

FROM quantized.bin
# set the temperature to 0.7 [higher is more creative, lower is more coherent]
PARAMETER temperature 0.7
PARAMETER top_p 0.8
PARAMETER repeat_penalty 1.05
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>"""
# set the system message
SYSTEM """
You are a helpful assistant.
"""

The most important part is the first line: FROM quantized.bin (the file name must match the name of your quantized file).
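Since a mismatched file name is the easiest mistake to make here, a quick check can catch it early. The sketch below assumes the FROM line holds a file name relative to the Modelfile's directory, as in this article, rather than the name of an existing ollama model. The helper name check_modelfile is made up and is not an ollama command.

```shell
# check_modelfile is a hypothetical helper, not an ollama command.
# It reads the first FROM line of a Modelfile and checks that the file it
# names exists, resolving the name relative to the Modelfile's directory.
check_modelfile() {
  modelfile=$1
  if [ ! -f "$modelfile" ]; then
    echo "no such Modelfile: $modelfile" >&2
    return 1
  fi
  target=$(awk '$1 == "FROM" { print $2; exit }' "$modelfile")
  if [ -z "$target" ]; then
    echo "no FROM line in $modelfile" >&2
    return 1
  fi
  if [ -f "$(dirname "$modelfile")/$target" ]; then
    echo "FROM target found: $target"
  else
    echo "FROM target missing: $target" >&2
    return 1
  fi
}

if check_modelfile Modelfile; then
  echo "Modelfile and weights line up"
fi
```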