1. Create a virtual environment
As a good habit, first create a dedicated runtime environment.
conda create -n uie python=3.10.9
conda activate uie
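As a quick sanity check after activation, a minimal sketch like the following can confirm that the interpreter and the conda environment are the ones just created. The helper names describe_environment and matches_expected are hypothetical; the only external fact relied on is that conda sets the CONDA_DEFAULT_ENV variable when an environment is activated.

```python
import os
import sys


def describe_environment(version_info=sys.version_info, environ=os.environ):
    """Summarise the interpreter version and the active conda environment."""
    return {
        "python": f"{version_info[0]}.{version_info[1]}",
        # conda sets CONDA_DEFAULT_ENV when an environment is activated.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
    }


def matches_expected(info, python="3.10", conda_env="uie"):
    """Return True if the summary matches the environment created above."""
    return info["python"] == python and info["conda_env"] == conda_env


print(describe_environment())
```

If matches_expected(describe_environment()) returns False, the shell is probably still using a different environment.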
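2. Install the paddle framework and paddlenlp
2.1 Install paddle by following the official documentation
Getting Started (开始使用), PaddlePaddle (飞桨): an open-source deep learning platform rooted in industrial practice
First check the CUDA version on your server with nvidia-smi. As shown below, mine is 10.2.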
(PyTorch-1.8) [ma-user work]$nvidia-smi
Wed Apr 19 23:35:11 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 00000000:00:0E.0 Off | 0 |
| N/A 39C P0 28W / 250W | 0MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Then copy the install command that matches this CUDA version directly from the Paddle website.
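When several servers need checking, reading the version by eye gets tedious. The sketch below pulls the "CUDA Version" field out of nvidia-smi text with a regular expression; parse_cuda_version is a hypothetical helper, not part of any Paddle tooling. Note that this field reports the highest CUDA version the installed driver supports, which is the number to compare against the options on the Paddle install page.

```python
import re


def parse_cuda_version(smi_output):
    """Return the 'CUDA Version' field from nvidia-smi text as (major, minor), or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Header line copied from the nvidia-smi output shown above.
header = "| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |"
print(parse_cuda_version(header))  # (10, 2)
```

On a live machine the text could come from subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout instead of a pasted string.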
2.2 Install paddlenlp
pip install --upgrade paddlenlp
2.2.1 Problem 1: ERROR: Failed building wheel for numpy / Failed to build numpy
-x86_64-3.10/numpy/core/src/multiarray/scalartypes.o -MMD -MF build/temp.linux-x86_64-3.10/build/src.linux-x86_64-3.10/numpy/core/src/multiarray/scalartypes.o.d" failed with exit status 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for numpy
Failed to build numpy
ERROR: Could not build wheels for numpy, which is required to install pyproject.toml-based projects
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip
Installing numpy manually and then rerunning the paddlenlp installation still failed:
pip install numpy
Switching to the following command, which installs through the Baidu PyPI mirror, succeeded:
python3 -m pip install --upgrade paddlenlp -i https://mirror.baidu.com/pypi/simple
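For a scripted setup, the same install can be assembled and verified from Python. This is a minimal sketch under stated assumptions: build_pip_install and installed_version are hypothetical helper names, the command mirrors the one above, and the version lookup uses the standard-library importlib.metadata. Nothing here explains why the mirror succeeded where the default index failed; it only reproduces the working command.

```python
import sys
from importlib import metadata


def build_pip_install(package, index_url=None, upgrade=True):
    """Build the argument list for `python -m pip install`, optionally via a mirror."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(package)
    if index_url:
        cmd += ["-i", index_url]
    return cmd


def installed_version(package):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


cmd = build_pip_install("paddlenlp", "https://mirror.baidu.com/pypi/simple")
print(" ".join(cmd))
print("paddlenlp:", installed_version("paddlenlp"))
```

The list can be passed to subprocess.run(cmd, check=True) to perform the install. Using sys.executable ensures that pip runs inside the currently active environment.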
3. Download the PaddleNLP source code
git clone https://github.com/PaddlePaddle/PaddleNLP.git
4. Run training
4.1 Preprocess the annotated data
python ../PaddleNLP/model_zoo/uie/doccano.py --doccano_file ./data.json --task_type ext --save_dir ./ --splits 0.7 0.2 0.1 --schema_lang ch
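The --splits 0.7 0.2 0.1 argument asks for a 70/20/10 division into training, development, and test sets. The sketch below illustrates the idea of a ratio-based split. It is not the code inside doccano.py: the function name split_examples, the seeded shuffle, and the rounding rule are all assumptions made for illustration.

```python
import random


def split_examples(examples, splits=(0.7, 0.2, 0.1), seed=42):
    """Shuffle examples and cut them into train/dev/test by the given ratios."""
    if len(splits) != 3 or abs(sum(splits) - 1.0) > 1e-9:
        raise ValueError("splits must be three ratios that sum to 1")
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n = len(items)
    train_end = round(n * splits[0])
    dev_end = round(n * (splits[0] + splits[1]))
    # The test set takes whatever remains, so no example is dropped.
    return items[:train_end], items[train_end:dev_end], items[dev_end:]


train, dev, test = split_examples(range(10))
print(len(train), len(dev), len(test))  # 7 2 1
```

With only a handful of annotated documents, the development and test sets become very small, so evaluation scores from them should be read with caution.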
4.2 Fine-tune the model
The command below expects the shell variable finetuned_model to be set beforehand to the output directory, for example ./checkpoint/model_best, which is the path loaded in the inference step.
python ../PaddleNLP/model_zoo/uie/finetune.py \
    --device gpu \
    --logging_steps 10 \
    --save_steps 100 \
    --eval_steps 100 \
    --seed 42 \
    --model_name_or_path uie-base \
    --output_dir $finetuned_model \
    --train_path ./train.txt \
    --dev_path ./dev.txt \
    --max_seq_length 512 \
    --per_device_eval_batch_size 16 \
    --per_device_train_batch_size 16 \
    --num_train_epochs 20 \
    --learning_rate 1e-5 \
    --label_names "start_positions" "end_positions" \
    --do_train \
    --do_eval \
    --do_export \
    --export_model_dir $finetuned_model \
    --overwrite_output_dir \
    --disable_tqdm True \
    --metric_for_best_model eval_f1 \
    --load_best_model_at_end True \
    --save_total_limit 1
If output like the figure below appears, training has succeeded.
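The fine-tuning command selects the best checkpoint by eval_f1. As a reminder of what an F1 score over extracted spans measures, here is a minimal sketch that compares predicted and gold (start, end) spans by exact match. The function span_f1 is hypothetical, and PaddleNLP's own evaluator may count spans differently.

```python
def span_f1(predicted, gold):
    """Precision, recall and F1 over sets of (start, end) spans, exact match only."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# One of two predicted spans matches one of two gold spans.
print(span_f1({(4, 9), (9, 12)}, {(4, 9), (12, 18)}))  # (0.5, 0.5, 0.5)
```

Because F1 is the harmonic mean of precision and recall, a model that extracts many spurious spans or misses many true ones is penalised either way.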
5. Apply the model
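from pprint import pprint
from paddlenlp import Taskflow

# Entity types to extract: time (时间), region (地区), indicator name (指标名)
schema = ['时间', '地区', '指标名']
# Load the fine-tuned checkpoint produced by the training step
ie = Taskflow('information_extraction', schema=schema, task_path="./checkpoint/model_best")
# Query: "I want to look up the 2022 main business revenue data for Shandong Province"
pprint(ie("我想查询2022年山东省主营业务收入数据"))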
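Taskflow returns one dictionary per input text, mapping each schema label to a list of spans with text, start, end, and probability fields. A small post-processing step often helps when feeding the result into downstream code. The sketch below is an assumption-laden illustration: flatten_results is a hypothetical helper, and the sample data is hand-written with made-up probabilities, not real model output.

```python
def flatten_results(results, min_probability=0.5):
    """Turn Taskflow-style output into (label, text, start, end) rows."""
    rows = []
    for per_text in results:  # one dict per input text
        for label, spans in per_text.items():
            for span in spans:
                # Drop low-confidence spans; missing probability counts as 1.0.
                if span.get("probability", 1.0) >= min_probability:
                    rows.append((label, span["text"], span["start"], span["end"]))
    return rows


# Hypothetical result for illustration only; the probabilities are made up.
sample = [{
    "时间": [{"text": "2022年", "start": 4, "end": 9, "probability": 0.98}],
    "地区": [{"text": "山东省", "start": 9, "end": 12, "probability": 0.97}],
    "指标名": [{"text": "主营业务收入", "start": 12, "end": 18, "probability": 0.31}],
}]
for row in flatten_results(sample):
    print(row)
```

The threshold of 0.5 is arbitrary. A suitable value depends on how the fine-tuned model behaves on your own data.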