PaddleOCR训练模型转换推理模型

运行环境准备

Windows和Mac用户推荐使用Anaconda搭建Python环境，Linux用户建议使用docker搭建Python环境。

推荐环境：

PaddlePaddle >= 2.1.2
Python 3.7
CUDA10.1 / CUDA10.2
CUDNN 7.6

1.1 安装PaddlePaddle

如果您没有基础的Python运行环境，请参考运行环境准备。

pip3 install --upgrade pip# 如果您的机器安装的是CUDA9或CUDA10，请运行以下命令安装
python3 -m pip install paddlepaddle-gpu==2.0.0 -i https://mirror.baidu.com/pypi/simple# 如果您的机器是CPU，请运行以下命令安装
python3 -m pip install paddlepaddle==2.0.0 -i https://mirror.baidu.com/pypi/simple# 更多的版本需求，请参照[安装文档](https://www.paddlepaddle.org.cn/install/quick)中的说明进行操作。

更多的版本需求，请参照飞桨官网安装文档中的说明进行操作。

1.2 安装PaddleOCR whl包

pip install paddleocr

对于Windows环境用户：直接通过pip安装的shapely库可能出现[winRrror 126] 找不到指定模块的问题。建议从这里下载shapely安装包完成安装。

1.3 克隆PaddleOCR repo代码

#【推荐】
git clone https://github.com/PaddlePaddle/PaddleOCR# 如果因为网络问题无法pull成功，也可选择使用码云上的托管：git clone https://gitee.com/paddlepaddle/PaddleOCR# 注：码云托管代码可能无法实时同步本github项目更新，存在3~5天延时，请优先使用推荐方式。

安装第三方库

cd PaddleOCR
pip3 install -r requirements.txt

注意，windows环境下，建议从这里下载shapely安装包完成安装，直接通过pip安装的shapely库可能出现[winRrror 126] 找不到指定模块的问题。

模型转换

这里以液晶屏读数识别这个垂直领域的导出模型为例，

模型下载地址：《动手学OCR》、《OCR产业范例20讲》电子书、OCR垂类模型、PDF2Word软件以及其他学习大礼包领取链接：百度网盘 PaddleOCR 开源大礼包，提取码：4232

下载好的模型包含了训练过程中的教师模型和学生模型，模型会比较大，导出推理后就可以选择需要的类型

在这里插入图片描述

检测模型转换

# -c 后面设置训练算法的yml配置文件
# -o 配置可选参数
# Global.pretrained_model 参数设置待转换的训练模型地址，不用添加文件后缀 .pdmodel，.pdopt或.pdparams。
# Global.load_static_weights 参数需要设置为 False。
# Global.save_inference_dir参数设置转换的模型将保存的地址。python .\tools\export_model.py -c .\configs\det\ch_PP-OCRv3\ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model=.\output\ch_PP-OCR_v3_det_finetune\best_accuracy Global.load_static_weights=False Global.save_inference_dir=.\inference\ch_PP-OCR_v3_det_finetune

*（注意相对路径填写正确,“./ ”表示同级目录，“…/”表示上级目录，“…/…/”表示上上级目录，以当前路径为主）*

转inference模型时，使用的配置文件和训练时使用的配置文件相同（参考液晶屏读数识别，用到的配置文件为configs\det\ch_PP-OCRv3\ch_PP-OCRv3_det_cml.yml）。另外，还需要设置配置文件中的Global.pretrained_model参数，其指向训练中保存的模型参数文件。转换成功后，在模型保存目录下有三个文件：

inference/ch_PP-OCR_v3_det_finetune ├── Student├── inference.pdiparams         # 检测inference模型的参数文件├── inference.pdiparams.info    # 检测inference模型的参数信息，可忽略└── inference.pdmodel           # 检测inference模型的program文件├── Student2├── inference.pdiparams         # 检测inference模型的参数文件├── inference.pdiparams.info    # 检测inference模型的参数信息，可忽略└── inference.pdmodel           # 检测inference模型的program文件    ├── Teacher├── inference.pdiparams         # 检测inference模型的参数文件├── inference.pdiparams.info    # 检测inference模型的参数信息，可忽略└── inference.pdmodel           # 检测inference模型的program文件

三个文件内容是一样的，但是大小不同，使用的训练策略也不同，具体请参考液晶屏读数识别，保存生成后的文件

在这里插入图片描述

如果生成后的inference.pdmodel文件大小如下，这种模型是无法使用的，推理过程中会出现异常退出，

解决办法：见注意事项1

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

文本识别模型转换

文本模型的转换同检测模型一样，指定配置文件和训练文件即可，

# -c 后面设置训练算法的yml配置文件
# -o 配置可选参数
# Global.pretrained_model 参数设置待转换的训练模型地址，不用添加文件后缀 .pdmodel，.pdopt或.pdparams。
# Global.load_static_weights 参数需要设置为 False。
# Global.save_inference_dir参数设置转换的模型将保存的地址。python .\tools\export_model.py -c .\configs\rec\PP-OCRv3\ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model=.\output\ch_PP-OCR_v3_rec\best_accuracy Global.load_static_weights=False Global.save_inference_dir=.\inference\ch_PP-OCR_v3_rec

文字识别使用的配置文件和训练时使用的配置文件相同（参考液晶屏读数识别，用到的配置文件为configs\rec\PP-OCRv3\ch_PP-OCRv3_rec_distillation.yml）。转换成功后，在模型保存目录下有两个文件：

inference/ch_PP-OCR_v3_det_finetune ├── Student├── inference.pdiparams         # 检测inference模型的参数文件├── inference.pdiparams.info    # 检测inference模型的参数信息，可忽略└── inference.pdmodel           # 检测inference模型的program文件├── Teacher├── inference.pdiparams         # 检测inference模型的参数文件├── inference.pdiparams.info    # 检测inference模型的参数信息，可忽略└── inference.pdmodel           # 检测inference模型的program文件

测试模型

检测模型

python tools/infer/predict_det.py --image_dir=./doc/imgs_en/img_10.jpg --det_model_dir=./inference/ch_PP-OCR_v3_det_finetune/Student

输出结果

[2024/11/14 10:09:12] ppocr INFO: img_10.jpg    [[[38.0, 89.0], [201.0, 73.0], [204.0, 100.0], [40.0, 116.0]], [[34.0, 56.0], [202.0, 50.0], [203.0, 77.0], [35.0, 82.0]], [[27.0, 17.0], [248.0, 25.0], [247.0, 53.0], [26.0, 45.0]]]
[2024/11/14 10:09:12] ppocr INFO: 0 The predict time of ./doc/imgs_en/img_10.jpg: 0.5007777214050293
[2024/11/14 10:09:12] ppocr INFO: The visualized image saved in ./inference_results\det_res_img_10.jpg

识别模型

python .\tools\infer\predict_rec.py --image_dir=./doc/imgs_en/img623.jpg --rec_model_dir=./inference/ch_PP-OCR_v3_rec/Teacher

输出结果

[2024/11/14 10:08:15] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2024/11/14 10:08:15] ppocr INFO: Predicts of ./doc/imgs_en/img623.jpg:('④', 0.46987295150756836)

联合测试

python .\tools\infer\predict_system.py --det_model_dir=./inference/ch_PP-OCR_v3_det_finetune/Teacher --image_dir=./doc/imgs_en/img_10.jpg --rec_model_dir=./inference/ch_PP-OCR_v3_rec/Teacher

输出结果

[2024/11/14 10:16:27] ppocr DEBUG: The visualized image saved in ./inference_results\img_10.jpg
[2024/11/14 10:16:27] ppocr INFO: The predict total time is 17.56923508644104img_10.jpg	[{"transcription": "Plegse loweryour yolum", "points": [[35, 21], [247, 25], [246, 48], [34, 44]]}, {"transcription": "wnenvoupas![在这里插入图片描述](https://i-blog.csdnimg.cn/direct/405c2e3b14c54fd18d938d8c51848407.png)
sb)", "points": [[40, 57], [201, 57], [201, 75], [40, 75]]}, {"transcription": "residentialoreos", "points": [[41, 89], [200, 78], [202, 99], [42, 110]]}]

在这里插入图片描述