How to Run a Custom OCR Model in OpenCV

2024/12/18 13:39:56

We first describe how to obtain a custom OCR model, then how to convert your own OCR model so that it can be run correctly by the opencv_dnn module, and finally we provide some pre-trained models.

Train your own OCR model

This repository is a good starting point for training your own OCR model. In the repository, MJSynth+SynthText is set as the training set by default. You can also configure the model structure and dataset you want.

Convert the OCR model to ONNX format and use it in OpenCV DNN

After completing model training, use transform_to_onnx.py to convert the model to ONNX format.

Running on a webcam

Source code:

'''
Text detection model: https://github.com/argman/EAST
Download link: https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1

CRNN Text recognition model taken from here: https://github.com/meijieru/crnn.pytorch
How to convert from pb to onnx:
Using classes from here: https://github.com/meijieru/crnn.pytorch/blob/master/models/crnn.py

More converted onnx text recognition models can be downloaded directly here:
Download link: https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing
And these models taken from here:
https://github.com/clovaai/deep-text-recognition-benchmark

import torch
from models.crnn import CRNN

model = CRNN(32, 1, 37, 256)
model.load_state_dict(torch.load('crnn.pth'))
dummy_input = torch.randn(1, 1, 32, 100)
torch.onnx.export(model, dummy_input, "crnn.onnx", verbose=True)
'''

# Import required modules
import numpy as np
import cv2 as cv
import math
import argparse

############ Add argument parser for command line arguments ############
parser = argparse.ArgumentParser(
    description="Use this script to run TensorFlow implementation (https://github.com/argman/EAST) of "
                "EAST: An Efficient and Accurate Scene Text Detector (https://arxiv.org/abs/1704.03155v2)"
                "The OCR model can be obtained from converting the pretrained CRNN model to .onnx format from the github repository https://github.com/meijieru/crnn.pytorch"
                "Or you can download trained OCR model directly from https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing")
parser.add_argument('--input',
                    help='Path to input image or video file. Skip this argument to capture frames from a camera.')
parser.add_argument('--model', '-m', required=True,
                    help='Path to a binary .pb file contains trained detector network.')
parser.add_argument('--ocr', default="crnn.onnx",
                    help="Path to a binary .pb or .onnx file contains trained recognition network")
parser.add_argument('--width', type=int, default=320,
                    help='Preprocess input image by resizing to a specific width. It should be multiple by 32.')
parser.add_argument('--height', type=int, default=320,
                    help='Preprocess input image by resizing to a specific height. It should be multiple by 32.')
parser.add_argument('--thr', type=float, default=0.5,
                    help='Confidence threshold.')
parser.add_argument('--nms', type=float, default=0.4,
                    help='Non-maximum suppression threshold.')
args = parser.parse_args()


############ Utility functions ############

def fourPointsTransform(frame, vertices):
    vertices = np.asarray(vertices)
    outputSize = (100, 32)
    targetVertices = np.array([
        [0, outputSize[1] - 1],
        [0, 0],
        [outputSize[0] - 1, 0],
        [outputSize[0] - 1, outputSize[1] - 1]], dtype="float32")

    rotationMatrix = cv.getPerspectiveTransform(vertices, targetVertices)
    result = cv.warpPerspective(frame, rotationMatrix, outputSize)
    return result


def decodeText(scores):
    text = ""
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    for i in range(scores.shape[0]):
        c = np.argmax(scores[i][0])
        if c != 0:
            text += alphabet[c - 1]
        else:
            text += '-'

    # adjacent same letters as well as background text must be removed to get the final output
    char_list = []
    for i in range(len(text)):
        if text[i] != '-' and (not (i > 0 and text[i] == text[i - 1])):
            char_list.append(text[i])
    return ''.join(char_list)


def decodeBoundingBoxes(scores, geometry, scoreThresh):
    detections = []
    confidences = []

    ############ CHECK DIMENSIONS AND SHAPES OF geometry AND scores ############
    assert len(scores.shape) == 4, "Incorrect dimensions of scores"
    assert len(geometry.shape) == 4, "Incorrect dimensions of geometry"
    assert scores.shape[0] == 1, "Invalid dimensions of scores"
    assert geometry.shape[0] == 1, "Invalid dimensions of geometry"
    assert scores.shape[1] == 1, "Invalid dimensions of scores"
    assert geometry.shape[1] == 5, "Invalid dimensions of geometry"
    assert scores.shape[2] == geometry.shape[2], "Invalid dimensions of scores and geometry"
    assert scores.shape[3] == geometry.shape[3], "Invalid dimensions of scores and geometry"
    height = scores.shape[2]
    width = scores.shape[3]
    for y in range(0, height):

        # Extract data from scores
        scoresData = scores[0][0][y]
        x0_data = geometry[0][0][y]
        x1_data = geometry[0][1][y]
        x2_data = geometry[0][2][y]
        x3_data = geometry[0][3][y]
        anglesData = geometry[0][4][y]
        for x in range(0, width):
            score = scoresData[x]

            # If score is lower than threshold score, move to next x
            if (score < scoreThresh):
                continue

            # Calculate offset
            offsetX = x * 4.0
            offsetY = y * 4.0
            angle = anglesData[x]

            # Calculate cos and sin of angle
            cosA = math.cos(angle)
            sinA = math.sin(angle)
            h = x0_data[x] + x2_data[x]
            w = x1_data[x] + x3_data[x]

            # Calculate offset
            offset = ([offsetX + cosA * x1_data[x] + sinA * x2_data[x],
                       offsetY - sinA * x1_data[x] + cosA * x2_data[x]])

            # Find points for rectangle
            p1 = (-sinA * h + offset[0], -cosA * h + offset[1])
            p3 = (-cosA * w + offset[0], sinA * w + offset[1])
            center = (0.5 * (p1[0] + p3[0]), 0.5 * (p1[1] + p3[1]))
            detections.append((center, (w, h), -1 * angle * 180.0 / math.pi))
            confidences.append(float(score))

    # Return detections and confidences
    return [detections, confidences]


def main():
    # Read and store arguments
    confThreshold = args.thr
    nmsThreshold = args.nms
    inpWidth = args.width
    inpHeight = args.height
    modelDetector = args.model
    modelRecognition = args.ocr

    # Load network
    detector = cv.dnn.readNet(modelDetector)
    recognizer = cv.dnn.readNet(modelRecognition)

    # Create a new named window
    kWinName = "EAST: An Efficient and Accurate Scene Text Detector"
    cv.namedWindow(kWinName, cv.WINDOW_NORMAL)
    outNames = []
    outNames.append("feature_fusion/Conv_7/Sigmoid")
    outNames.append("feature_fusion/concat_3")

    # Open a video file or an image file or a camera stream
    cap = cv.VideoCapture(args.input if args.input else 0)

    tickmeter = cv.TickMeter()
    while cv.waitKey(1) < 0:
        # Read frame
        hasFrame, frame = cap.read()
        if not hasFrame:
            cv.waitKey()
            break

        # Get frame height and width
        height_ = frame.shape[0]
        width_ = frame.shape[1]
        rW = width_ / float(inpWidth)
        rH = height_ / float(inpHeight)

        # Create a 4D blob from frame.
        blob = cv.dnn.blobFromImage(frame, 1.0, (inpWidth, inpHeight), (123.68, 116.78, 103.94), True, False)

        # Run the detection model
        detector.setInput(blob)

        tickmeter.start()
        outs = detector.forward(outNames)
        tickmeter.stop()

        # Get scores and geometry
        scores = outs[0]
        geometry = outs[1]
        [boxes, confidences] = decodeBoundingBoxes(scores, geometry, confThreshold)

        # Apply NMS
        indices = cv.dnn.NMSBoxesRotated(boxes, confidences, confThreshold, nmsThreshold)
        for i in indices:
            # get 4 corners of the rotated rect
            vertices = cv.boxPoints(boxes[i])

            # scale the bounding box coordinates based on the respective ratios
            for j in range(4):
                vertices[j][0] *= rW
                vertices[j][1] *= rH

            # get cropped image using perspective transform
            if modelRecognition:
                cropped = fourPointsTransform(frame, vertices)
                cropped = cv.cvtColor(cropped, cv.COLOR_BGR2GRAY)

                # Create a 4D blob from cropped image
                blob = cv.dnn.blobFromImage(cropped, size=(100, 32), mean=127.5, scalefactor=1 / 127.5)
                recognizer.setInput(blob)

                # Run the recognition model
                tickmeter.start()
                result = recognizer.forward()
                tickmeter.stop()

                # decode the result into text
                wordRecognized = decodeText(result)
                cv.putText(frame, wordRecognized, (int(vertices[1][0]), int(vertices[1][1])),
                           cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0))

            for j in range(4):
                p1 = (int(vertices[j][0]), int(vertices[j][1]))
                p2 = (int(vertices[(j + 1) % 4][0]), int(vertices[(j + 1) % 4][1]))
                cv.line(frame, p1, p2, (0, 255, 0), 1)

        # Put efficiency information
        label = 'Inference time: %.2f ms' % (tickmeter.getTimeMilli())
        cv.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))

        # Display the frame
        cv.imshow(kWinName, frame)

        tickmeter.reset()


if __name__ == "__main__":
    main()
$ python text_detection.py -m=[path_to_text_detect_model] --ocr=[path_to_text_recognition_model]
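The decodeText helper in the script above performs greedy CTC decoding: at each timestep it takes the argmax class, then collapses adjacent repeats and drops the blank (class 0). A minimal dependency-free sketch of the same collapsing rule, using the same 37-class alphabet as the CRNN model:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # class 0 is the CTC blank

def ctc_greedy_decode(class_per_step):
    # class_per_step: the argmax class index at each model timestep
    # (the CRNN here emits scores of shape (timesteps, 1, 37)).
    out, prev = [], None
    for c in class_per_step:
        if c != 0 and c != prev:   # drop blanks and collapse adjacent repeats
            out.append(ALPHABET[c - 1])
        prev = c
    return ''.join(out)

# 'h','h',blank,'e','l','l',blank,'l','o'  ->  "hello"
steps = [18, 18, 0, 15, 22, 22, 0, 22, 25]
print(ctc_greedy_decode(steps))  # -> hello
```

Note that a blank between two identical classes keeps both characters ("ll" above), which is exactly why CTC needs the blank symbol for doubled letters.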

Pre-trained ONNX models provided

Some pre-trained models can be found at https://drive.google.com/drive/folders/1cTbQ3nuZG-EKWak6emD_s8_hHXWz7lAr?usp=sharing.

The table below shows their performance on different text recognition datasets:

The performance of the text recognition models was measured with OpenCV DNN, and it does not include the text detection model.
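The script above accumulates recognition-only inference time with cv.TickMeter (start/stop around each forward pass, reset per frame). If you want to reproduce such measurements without OpenCV's timer, the same pattern can be sketched with the standard library; the summed workload below is only a stand-in for the real recognizer.forward() call:

```python
import time

class TickMeter:
    # Minimal stand-in for cv.TickMeter's start/stop/getTimeMilli/reset.
    def __init__(self):
        self._total = 0.0
        self._t0 = None

    def start(self):
        self._t0 = time.perf_counter()

    def stop(self):
        self._total += time.perf_counter() - self._t0

    def getTimeMilli(self):
        return self._total * 1000.0

    def reset(self):
        self._total = 0.0

tm = TickMeter()
tm.start()
sum(i * i for i in range(100000))   # stand-in for recognizer.forward()
tm.stop()
print('Inference time: %.2f ms' % tm.getTimeMilli())
```

Because the meter accumulates across start/stop pairs, timing several text boxes in one frame yields the total recognition time for that frame, matching the label the script draws.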

Model selection suggestions

The input of the text recognition model is the output of the text detection model, so the performance of text detection greatly affects the performance of text recognition.
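Concretely, the handoff between the two models is the fourPointsTransform step in the script: each detected quadrilateral is rectified with a perspective transform into the fixed 100x32 input the recognizer expects, so a sloppy detection box directly degrades the crop the recognizer sees. As an illustration (not the OpenCV implementation), the same homography that cv.getPerspectiveTransform computes can be obtained with plain numpy by solving the standard 8x8 linear system:

```python
import numpy as np

def perspective_matrix(src, dst):
    # Solve for the 3x3 homography H (with h33 fixed to 1) that maps
    # each src point (x, y) to the corresponding dst point (u, v).
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply(H, p):
    # Apply the homography to one point (homogeneous coordinates).
    v = H @ np.array([p[0], p[1], 1.0])
    return (v[0] / v[2], v[1] / v[2])

# A hypothetical detected text quadrilateral in the frame, listed
# bottom-left, top-left, top-right, bottom-right as in fourPointsTransform.
quad = [(40.0, 90.0), (40.0, 60.0), (240.0, 60.0), (240.0, 90.0)]
# Corner targets of the fixed 100x32 recognizer input.
target = [(0.0, 31.0), (0.0, 0.0), (99.0, 0.0), (99.0, 31.0)]

H = perspective_matrix(quad, target)
print(apply(H, (40.0, 60.0)))   # top-left corner maps to ~(0.0, 0.0)
```

cv.warpPerspective then resamples the frame through this matrix to produce the grayscale crop fed to the CRNN.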

DenseNet_CTC has the fewest parameters and the best FPS, making it suitable for edge devices that are very sensitive to computation cost. If you have limited computing resources but want better accuracy, VGG_CTC is a good choice.

CRNN_VGG_BiLSTM_CTC is suitable for scenarios that demand high recognition accuracy.


