How to Run a Custom OCR Model in OpenCV


We first introduce how to obtain a custom OCR model, then describe how to convert your own OCR model so that it can be run correctly by the opencv_dnn module, and finally we provide some pre-trained models.

Train Your Own OCR Model

This repository is a good starting point for training your own OCR model. In the repository, MJSynth+SynthText is set as the training set by default. In addition, you can configure the model structure and datasets you want.
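As a rough illustration, a training run in that codebase (these recognition models follow the deep-text-recognition-benchmark design, linked in the sample script below) might look like the sketch that follows. Treat the exact flag names as assumptions that may differ in your copy of the repository: --select_data MJ-ST selects MJSynth and SynthText, and the last four flags configure the model structure (here a VGG backbone with a BiLSTM sequence model and a CTC decoder).

# A hedged sketch of a training invocation, assuming the
# deep-text-recognition-benchmark command-line interface;
# verify the flags against the repository you are using.
python train.py \
    --train_data data_lmdb_release/training \
    --valid_data data_lmdb_release/validation \
    --select_data MJ-ST --batch_ratio 0.5-0.5 \
    --Transformation None --FeatureExtraction VGG \
    --SequenceModeling BiLSTM --Prediction CTC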

Convert the OCR Model to ONNX Format and Use It in OpenCV DNN

After completing model training, please use transform_to_onnx.py to convert the model into ONNX format.
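Under the hood the conversion boils down to a standard torch.onnx.export call. The following minimal sketch mirrors the CRNN export quoted in the header comment of the sample script below; the CRNN class and the 'crnn.pth' checkpoint come from https://github.com/meijieru/crnn.pytorch, and the dummy input shape (1, 1, 32, 100) matches the grayscale 32x100 crops the recognizer consumes:

# Minimal ONNX export sketch, based on the CRNN example quoted in the
# sample script's header. CRNN(32, 1, 37, 256): input height 32,
# 1 input channel, 37 output classes (36 characters + CTC blank),
# 256 hidden units.
import torch
from models.crnn import CRNN

model = CRNN(32, 1, 37, 256)
model.load_state_dict(torch.load('crnn.pth'))
dummy_input = torch.randn(1, 1, 32, 100)  # N, C, H, W
torch.onnx.export(model, dummy_input, "crnn.onnx", verbose=True)

The exported crnn.onnx can then be loaded directly with cv.dnn.readNet, as the sample below does.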

Running on a Webcam

Source code:

'''
    Text detection model: https://github.com/argman/EAST
    Download link: https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1

    CRNN Text recognition model taken from here: https://github.com/meijieru/crnn.pytorch
    How to convert from pb to onnx:
    Using classes from here: https://github.com/meijieru/crnn.pytorch/blob/master/models/crnn.py

    More converted onnx text recognition models can be downloaded directly here:
    Download link: https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing
    And these models taken from here:
    https://github.com/clovaai/deep-text-recognition-benchmark

    import torch
    from models.crnn import CRNN

    model = CRNN(32, 1, 37, 256)
    model.load_state_dict(torch.load('crnn.pth'))
    dummy_input = torch.randn(1, 1, 32, 100)
    torch.onnx.export(model, dummy_input, "crnn.onnx", verbose=True)
'''

# Import required modules
import numpy as np
import cv2 as cv
import math
import argparse

############ Add argument parser for command line arguments ############
parser = argparse.ArgumentParser(
    description="Use this script to run the TensorFlow implementation (https://github.com/argman/EAST) of "
                "EAST: An Efficient and Accurate Scene Text Detector (https://arxiv.org/abs/1704.03155v2). "
                "The OCR model can be obtained by converting the pretrained CRNN model to .onnx format from the github repository https://github.com/meijieru/crnn.pytorch. "
                "Or you can download a trained OCR model directly from https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing")
parser.add_argument('--input',
                    help='Path to input image or video file. Skip this argument to capture frames from a camera.')
parser.add_argument('--model', '-m', required=True,
                    help='Path to a binary .pb file containing the trained detector network.')
parser.add_argument('--ocr', default="crnn.onnx",
                    help='Path to a binary .pb or .onnx file containing the trained recognition network.')
parser.add_argument('--width', type=int, default=320,
                    help='Preprocess input image by resizing to a specific width. It should be a multiple of 32.')
parser.add_argument('--height', type=int, default=320,
                    help='Preprocess input image by resizing to a specific height. It should be a multiple of 32.')
parser.add_argument('--thr', type=float, default=0.5,
                    help='Confidence threshold.')
parser.add_argument('--nms', type=float, default=0.4,
                    help='Non-maximum suppression threshold.')
args = parser.parse_args()


############ Utility functions ############

def fourPointsTransform(frame, vertices):
    # Warp the quadrilateral text region into an axis-aligned 100x32 patch
    vertices = np.asarray(vertices)
    outputSize = (100, 32)
    targetVertices = np.array([
        [0, outputSize[1] - 1],
        [0, 0],
        [outputSize[0] - 1, 0],
        [outputSize[0] - 1, outputSize[1] - 1]], dtype="float32")

    rotationMatrix = cv.getPerspectiveTransform(vertices, targetVertices)
    result = cv.warpPerspective(frame, rotationMatrix, outputSize)
    return result


def decodeText(scores):
    text = ""
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    for i in range(scores.shape[0]):
        c = np.argmax(scores[i][0])
        if c != 0:
            text += alphabet[c - 1]
        else:
            text += '-'

    # adjacent same letters as well as background text must be removed to get the final output
    char_list = []
    for i in range(len(text)):
        if text[i] != '-' and (not (i > 0 and text[i] == text[i - 1])):
            char_list.append(text[i])
    return ''.join(char_list)


def decodeBoundingBoxes(scores, geometry, scoreThresh):
    detections = []
    confidences = []

    ############ CHECK DIMENSIONS AND SHAPES OF geometry AND scores ############
    assert len(scores.shape) == 4, "Incorrect dimensions of scores"
    assert len(geometry.shape) == 4, "Incorrect dimensions of geometry"
    assert scores.shape[0] == 1, "Invalid dimensions of scores"
    assert geometry.shape[0] == 1, "Invalid dimensions of geometry"
    assert scores.shape[1] == 1, "Invalid dimensions of scores"
    assert geometry.shape[1] == 5, "Invalid dimensions of geometry"
    assert scores.shape[2] == geometry.shape[2], "Invalid dimensions of scores and geometry"
    assert scores.shape[3] == geometry.shape[3], "Invalid dimensions of scores and geometry"
    height = scores.shape[2]
    width = scores.shape[3]
    for y in range(0, height):

        # Extract data from scores
        scoresData = scores[0][0][y]
        x0_data = geometry[0][0][y]
        x1_data = geometry[0][1][y]
        x2_data = geometry[0][2][y]
        x3_data = geometry[0][3][y]
        anglesData = geometry[0][4][y]
        for x in range(0, width):
            score = scoresData[x]

            # If score is lower than threshold score, move to next x
            if (score < scoreThresh):
                continue

            # Calculate offset
            offsetX = x * 4.0
            offsetY = y * 4.0
            angle = anglesData[x]

            # Calculate cos and sin of angle
            cosA = math.cos(angle)
            sinA = math.sin(angle)
            h = x0_data[x] + x2_data[x]
            w = x1_data[x] + x3_data[x]

            # Calculate offset
            offset = ([offsetX + cosA * x1_data[x] + sinA * x2_data[x],
                       offsetY - sinA * x1_data[x] + cosA * x2_data[x]])

            # Find points for rectangle
            p1 = (-sinA * h + offset[0], -cosA * h + offset[1])
            p3 = (-cosA * w + offset[0], sinA * w + offset[1])
            center = (0.5 * (p1[0] + p3[0]), 0.5 * (p1[1] + p3[1]))
            detections.append((center, (w, h), -1 * angle * 180.0 / math.pi))
            confidences.append(float(score))

    # Return detections and confidences
    return [detections, confidences]


def main():
    # Read and store arguments
    confThreshold = args.thr
    nmsThreshold = args.nms
    inpWidth = args.width
    inpHeight = args.height
    modelDetector = args.model
    modelRecognition = args.ocr

    # Load networks
    detector = cv.dnn.readNet(modelDetector)
    recognizer = cv.dnn.readNet(modelRecognition)

    # Create a new named window
    kWinName = "EAST: An Efficient and Accurate Scene Text Detector"
    cv.namedWindow(kWinName, cv.WINDOW_NORMAL)
    outNames = []
    outNames.append("feature_fusion/Conv_7/Sigmoid")
    outNames.append("feature_fusion/concat_3")

    # Open a video file or an image file or a camera stream
    cap = cv.VideoCapture(args.input if args.input else 0)

    tickmeter = cv.TickMeter()
    while cv.waitKey(1) < 0:
        # Read frame
        hasFrame, frame = cap.read()
        if not hasFrame:
            cv.waitKey()
            break

        # Get frame height and width
        height_ = frame.shape[0]
        width_ = frame.shape[1]
        rW = width_ / float(inpWidth)
        rH = height_ / float(inpHeight)

        # Create a 4D blob from frame.
        blob = cv.dnn.blobFromImage(frame, 1.0, (inpWidth, inpHeight), (123.68, 116.78, 103.94), True, False)

        # Run the detection model
        detector.setInput(blob)

        tickmeter.start()
        outs = detector.forward(outNames)
        tickmeter.stop()

        # Get scores and geometry
        scores = outs[0]
        geometry = outs[1]
        [boxes, confidences] = decodeBoundingBoxes(scores, geometry, confThreshold)

        # Apply NMS
        indices = cv.dnn.NMSBoxesRotated(boxes, confidences, confThreshold, nmsThreshold)
        for i in indices:
            # get 4 corners of the rotated rect
            vertices = cv.boxPoints(boxes[i])
            # scale the bounding box coordinates based on the respective ratios
            for j in range(4):
                vertices[j][0] *= rW
                vertices[j][1] *= rH

            # get cropped image using perspective transform
            if modelRecognition:
                cropped = fourPointsTransform(frame, vertices)
                cropped = cv.cvtColor(cropped, cv.COLOR_BGR2GRAY)

                # Create a 4D blob from cropped image
                blob = cv.dnn.blobFromImage(cropped, size=(100, 32), mean=127.5, scalefactor=1 / 127.5)
                recognizer.setInput(blob)

                # Run the recognition model
                tickmeter.start()
                result = recognizer.forward()
                tickmeter.stop()

                # decode the result into text
                wordRecognized = decodeText(result)
                cv.putText(frame, wordRecognized, (int(vertices[1][0]), int(vertices[1][1])),
                           cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0))

            for j in range(4):
                p1 = (int(vertices[j][0]), int(vertices[j][1]))
                p2 = (int(vertices[(j + 1) % 4][0]), int(vertices[(j + 1) % 4][1]))
                cv.line(frame, p1, p2, (0, 255, 0), 1)

        # Put efficiency information
        label = 'Inference time: %.2f ms' % (tickmeter.getTimeMilli())
        cv.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))

        # Display the frame
        cv.imshow(kWinName, frame)
        tickmeter.reset()


if __name__ == "__main__":
    main()
$ python text_detection.py --model=[path_to_text_detect_model] --ocr=[path_to_text_recognition_model]

Provided Pre-trained ONNX Models

Some pre-trained models can be found at https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing.

The following table shows their performance on different text recognition datasets:

The performance of the text recognition models was tested on OpenCV DNN, and the measurements do not include the text detection model.
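To sanity-check a downloaded model without running the full detection pipeline, you can feed the recognizer a pre-cropped word image directly. Below is a minimal sketch under the same assumptions as the sample above (a CTC model over the 0-9a-z alphabet with 100x32 grayscale input); 'word.png' is a hypothetical image of a single cropped word:

# Standalone check of a recognition model, reusing the preprocessing
# and greedy CTC decoding from the sample script above.
# 'word.png' is a hypothetical cropped word image.
import cv2 as cv
import numpy as np

recognizer = cv.dnn.readNet("crnn.onnx")
img = cv.imread("word.png", cv.IMREAD_GRAYSCALE)
# Same blob parameters as the sample: 100x32 input, mean 127.5, scale 1/127.5
blob = cv.dnn.blobFromImage(img, size=(100, 32), mean=127.5, scalefactor=1 / 127.5)
recognizer.setInput(blob)
scores = recognizer.forward()  # shape: (sequence steps, 1, alphabet size + 1)

# Greedy CTC decode: drop blanks (class 0) and collapse repeated characters
alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
text, prev = "", 0
for t in range(scores.shape[0]):
    c = int(np.argmax(scores[t][0]))
    if c != 0 and c != prev:
        text += alphabet[c - 1]
    prev = c
print(text)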

Model Selection Suggestions

The input of the text recognition model is the output of the text detection model, which means the performance of text detection greatly affects the performance of text recognition.

DenseNet_CTC has the fewest parameters and the best FPS, making it suitable for edge devices where computational cost is critical. If your computing resources are limited but you want better accuracy, VGG_CTC is a good choice.

CRNN_VGG_BiLSTM_CTC is suitable for scenarios that demand high recognition accuracy.

