Fixing yolo11s RKNN detection failures, step by step

devtools/2025/2/22 4:13:10/



  1. Version mismatch issues
  2. Channel-order and parameter-passing issues


training env:

torch                        2.4.0+cpu
torchaudio                   2.4.0+cpu
torchvision                  0.19.0+cpu

ultralytics                  8.3.68      /ultralytics
ultralytics-thop             2.0.14

rknn env:

torch                    2.4.0+cpu
torchaudio               2.4.0+cpu
torchvision              0.19.0+cpu 



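Late on Sunday I started a training run from yolov11s.pt on a test dataset called moonpie. This time I consolidated the pt -> onnx -> rknn conversion into a single Docker container and raised the epoch count to 250 (batch=16, imgsz=640).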


result = model.train(data=r'./moonpie.yaml', epochs=250, batch=16, imgsz=640, device='cpu')


results_dict: {'metrics/precision(B)': 0.8658830071855359, 'metrics/recall(B)': 0.770949720670391, 'metrics/mAP50(B)': 0.8821807607242769, 'metrics/mAP50-95(B)': 0.6566184427590052, 'fitness': 0.6791746745555324}
save_dir: PosixPath('/app/rk3588_build/yolo_sdk/ultralytics/runs/detect/train4')
speed: {'preprocess': 0.9470678144885648, 'inference': 83.09212807686097, 'loss': 0.00011536382859753024, 'postprocess': 1.3212234743179814}
task: 'detect'
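
As a quick sanity check on the numbers above: as far as I know, Ultralytics computes the detection fitness as a weighted sum of precision, recall, mAP50 and mAP50-95 with weights 0, 0, 0.1 and 0.9. The sketch below recomputes it from the reported values. The weights are an assumption about this version, and the helper name `fitness` is hypothetical.

```python
# Assumed weights for [precision, recall, mAP50, mAP50-95].
WEIGHTS = [0.0, 0.0, 0.1, 0.9]

# Values copied from the results_dict printed after training.
results = {
    "metrics/precision(B)": 0.8658830071855359,
    "metrics/recall(B)": 0.770949720670391,
    "metrics/mAP50(B)": 0.8821807607242769,
    "metrics/mAP50-95(B)": 0.6566184427590052,
}


def fitness(metrics):
    """Weighted sum of the four box metrics (hypothetical helper)."""
    keys = (
        "metrics/precision(B)",
        "metrics/recall(B)",
        "metrics/mAP50(B)",
        "metrics/mAP50-95(B)",
    )
    return sum(w * metrics[k] for w, k in zip(WEIGHTS, keys))


recomputed = fitness(results)
print(recomputed)  # agrees with the reported 'fitness' up to float rounding
```

If the recomputed value matches the logged one, the training metrics are at least internally consistent, so the detection failure is more likely to sit in export or inference than in training.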

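I suddenly noticed something: in the tests I started on Thursday, the new object was placed in the 81st class slot. Could it be that the detection code never accounted for the larger class count when handling class_id? If so, that would be an embarrassingly basic mistake.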

2.1 Final test directly in the simulation environment

 step1. pt2onnx

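from ultralytics import YOLO

# Load a model
model = YOLO("/app/rk3588_build/last_moonpie_yolov11s.pt")  # load the custom-trained moonpie model
#model = YOLO(r"./best.pt")  # load a custom trained model
# Export the model (the call was missing from these notes; format inferred from the log below)
model.export(format="onnx")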

Ultralytics 8.3.68 🚀 Python-3.8.10 torch-2.4.0+cpu CPU (unknown)
YOLO11 summary (fused): 238 layers, 2,617,701 parameters, 0 gradients, 6.5 GFLOPs

PyTorch: starting from '/app/rk3588_build/last_moonpie_yolov11s.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 85, 8400) (5.3 MB)

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 0.6s, saved as '/app/rk3588_build/last_moonpie_yolov11s.onnx' (10.2 MB)

Export complete (0.9s)
Results saved to /app/rk3588_build
Predict:         yolo predict task=detect model=/app/rk3588_build/last_moonpie_yolov11s.onnx imgsz=640  
Validate:        yolo val task=detect model=/app/rk3588_build/last_moonpie_yolov11s.onnx imgsz=640 data=./moonpie.yaml  
Visualize:       https://netron.app
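
The export log reports an output shape of (1, 85, 8400). For an Ultralytics detection head that channel count is 4 box values plus one score per class, so 85 implies 81 classes, which matches the "81st slot" suspicion. A minimal sketch of the check follows. `num_classes_from_output` and `safe_class_name` are hypothetical helpers, not part of any library, and the `coco_*` names are placeholders.

```python
def num_classes_from_output(shape):
    """Return the class count implied by a (batch, 4 + nc, anchors) output shape."""
    _, channels, _ = shape
    return channels - 4


def safe_class_name(class_id, names):
    """Look up a class name without raising when class_id is past the end of the list."""
    if 0 <= class_id < len(names):
        return names[class_id]
    return f"class_{class_id}"


nc = num_classes_from_output((1, 85, 8400))

# An 80-entry name list, as in a COCO-style CLASSES table (placeholder names).
names = ["moonpie"] + [f"coco_{i}" for i in range(1, 80)]

print(nc, len(names))              # 81 classes in the model, 80 names in the table
print(safe_class_name(80, names))  # index 80 is the 81st slot, outside the table
```

A mismatch between `nc` and `len(names)` would mean that a detection with class id 80 cannot be looked up in the name table, and that any post-processing loop over `len(names)` classes never inspects the last score channel.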

step2. test rknn detect:

>>>>>>>>>>>>>>>original model: /app/rk3588_build/last_moonpie_yolov11s.onnx
--> Running model
I GraphPreparing : 100%|███████████████████████████████████████| 238/238 [00:00<00:00, 19028.68it/s]
I SessionPreparing : 100%|██████████████████████████████████████| 238/238 [00:00<00:00, 5188.41it/s]
target pic has no object concerned.



    # Set inputs
    img = cv2.imread(IMG_PATH)
    # img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
    print(f'>>>>>>>>>>>>>>>original model: {MODEL1}')
    # Inference
    print('--> Running model')
    img2 = np.expand_dims(img, 0)
    outputs = rknn.inference(inputs=[img2], data_format=['nhwc'])


>> Key point: print the shapes of the model's inputs and outputs. If the input shape is (1, 3, 640, 640), then the data_format is NCHW: [batch_size, channels, height, width].
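
A rough way to apply that rule in code is to look at where the small channel dimension sits. The helper below is a hypothetical sketch that assumes images have 1 or 3 channels. It only inspects a shape tuple and does not call the RKNN API.

```python
import numpy as np


def guess_layout(shape):
    """Guess 'nchw' or 'nhwc' for a 4-D image tensor shape, assuming 1 or 3 channels."""
    if len(shape) != 4:
        raise ValueError(f"expected a 4-D shape, got {shape}")
    if shape[1] in (1, 3) and shape[3] not in (1, 3):
        return "nchw"
    if shape[3] in (1, 3) and shape[1] not in (1, 3):
        return "nhwc"
    return "ambiguous"


model_input_shape = (1, 3, 640, 640)                  # shape reported by the exporter
img2 = np.zeros((1, 640, 640, 3), dtype=np.uint8)     # stand-in for the array fed to inference

print(guess_layout(model_input_shape))  # nchw
print(guess_layout(img2.shape))         # nhwc
```

When the two answers differ, the array either has to be transposed or the runtime has to be told which layout it is receiving.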

3.1 Reviewing the channel settings used in training:


# Parameters

nc: 80 # number of classes

scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'

  # [depth, width, max_channels]

  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs

  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs

  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs

  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs

  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone


  # [from, repeats, module, args]

  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2


Detect - Ultralytics YOLO Docs


 3.1 Question 1: does ONNX place any restriction on the color-channel order of the input?

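 This seems to depend on the channel order in which images were read during training. Reportedly, Python's OpenCV library reads images in BGR order by default, while Pillow reads them in RGB order.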
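
To make the BGR/RGB difference concrete, here is a tiny NumPy-only sketch. It builds a two-pixel array by hand instead of reading a file, so the pixel values are illustrative. Reversing the last axis is the same channel swap that cv2.COLOR_BGR2RGB performs on a 3-channel image.

```python
import numpy as np

# A 1x2 "image" in BGR order: one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the channel axis: B and R trade places, G stays in the middle.
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255] -> blue, now expressed in RGB order
print(rgb[0, 1].tolist())  # [255, 0, 0] -> red, now expressed in RGB order
```

If a model trained on RGB input is fed BGR data, it still runs, but red and blue are swapped, which can degrade or eliminate detections.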

yolov11- patches.py

# OpenCV Multilanguage-friendly functions ------------------------------------------------------------------------------
_imshow = cv2.imshow  # copy to avoid recursion errors


def imread(filename: str, flags: int = cv2.IMREAD_COLOR):
    """
    Read an image from a file.

    Args:
        filename (str): Path to the file to read.
        flags (int, optional): Flag that can take values of cv2.IMREAD_*. Defaults to cv2.IMREAD_COLOR.

    Returns:
        (np.ndarray): The read image.
    """
    return cv2.imdecode(np.fromfile(filename, np.uint8), flags)


./ultralytics/data/loaders.py:                    # Load HEIC image using Pillow with pillow-heif
./ultralytics/data/loaders.py:                    check_requirements("pillow-heif")
./ultralytics/data/loaders.py:                    from pillow_heif import register_heif_opener 

./ultralytics/cfg/datasets/ImageNet.yaml:  721: pillow
./ultralytics/cfg/datasets/ImageNet.yaml:  n03938244: pillow
./ultralytics/cfg/datasets/lvis.yaml:  803: pillow 

./pyproject.toml:    "pillow>=7.1.2", 


                    register_heif_opener()  # Register HEIF opener with Pillow
                    with Image.open(path) as img:
                        im0 = cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)  # convert image to BGR nparray


    def preprocess(self):
        """
        Preprocesses the input image before performing inference.

        Returns:
            image_data: Preprocessed image data ready for inference.
        """
        # Read the input image using OpenCV
        self.img = cv2.imread(self.input_image)

        # Get the height and width of the input image
        self.img_height, self.img_width = self.img.shape[:2]

        # Convert the image color space from BGR to RGB
        img = cv2.cvtColor(self.img, cv2.COLOR_BGR2RGB)

        # Resize the image to match the input shape
        img = cv2.resize(img, (self.input_width, self.input_height))

        # Normalize the image data by dividing it by 255.0
        image_data = np.array(img) / 255.0

        # Transpose the image to have the channel dimension as the first dimension
        image_data = np.transpose(image_data, (2, 0, 1))  # Channel first

        # Expand the dimensions of the image data to match the expected input shape
        image_data = np.expand_dims(image_data, axis=0).astype(np.float32)

        # Return the preprocessed image data
        return image_data


3.2 A more detailed explanation:

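1. Color-space conversion
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
This line converts the image from the BGR (blue, green, red) color space to RGB (red, green, blue). At this point the image data is a three-dimensional array of shape (H, W, C), where H is the image height, W is the image width, and C is the number of channels (C = 3 here, because it is an RGB image).
2. Resizing
img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
This line resizes the image to (IMG_SIZE, IMG_SIZE). The result is still a three-dimensional array, now of shape (IMG_SIZE, IMG_SIZE, 3), and it remains in HWC format.
3. Adding the batch dimension
img2 = np.expand_dims(img, 0)
np.expand_dims inserts a new dimension at the given axis. Inserting at axis 0 turns the (IMG_SIZE, IMG_SIZE, 3) array into a four-dimensional array of shape (1, IMG_SIZE, IMG_SIZE, 3). The new leading dimension is the batch size (N = 1), so the data is now in NHWC format.
4. Specifying the format at inference
outputs = rknn.inference(inputs=[img2], data_format=['nhwc'])
This line calls rknn's inference function and explicitly declares the input img2 as NHWC. In summary, after these steps the img2 passed to rknn.inference is in NHWC format.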
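
The shape changes described above can be checked without OpenCV or RKNN. This sketch uses a zero-filled array as a stand-in for the resized image, and also shows the transpose that would be needed if NCHW were required instead. Which layout a given runtime expects is not asserted here.

```python
import numpy as np

IMG_SIZE = 640

# Stand-in for the resized RGB image: height, width, channels (HWC).
img = np.zeros((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8)

# Add the batch dimension in front: NHWC.
nhwc = np.expand_dims(img, 0)

# Move channels ahead of height and width: NCHW.
nchw = np.transpose(nhwc, (0, 3, 1, 2))

print(img.shape)   # (640, 640, 3)
print(nhwc.shape)  # (1, 640, 640, 3)
print(nchw.shape)  # (1, 3, 640, 640)
```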




4.1 Final detection code

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import argparse
import os
import sys
import os.path as osp
import cv2
import torch
import numpy as np
import onnxruntime as ort
from math import exp

ROOT = os.getcwd()
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))

#ONNX_MODEL = r'/app/rk3588_build/yolo_sdk/ultralytics/yolo11s.onnx'
ONNX_MODEL = r'/app/rk3588_build/yolo11_selfgen.onnx'
#ONNX_MODEL = 'yolov5s_relu.onnx'
#ONNX_MODEL= '/app/rk3588_build/last_moonpie.onnx'
#ONNX_MODEL= '/app/rk3588_build/last_moonpie_yolov11s.onnx'
#ONNX_MODEL= '/app/rk3588_build/best.onnx'
#PYTORCH_MODEL=r"/app/rk3588_build/yolo_sdk/ultralytics/best.pt" # driller model does not work; version requirements are too strict
RKNN_MODEL = r'/app/rk3588_build/rknn_models/sim_moonpie-640-640_rk3588.rknn'
#IMG_PATH = './frame_2266.png'
DATASET = './dataset.txt'
#IMG_PATH = './bus.jpg'
IMG_PATH = '/app/rk3588_build/cake26.jpg'
QUANTIZE_ON = False

CLASSES = ['moonpie', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
           'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
           'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
           'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
           'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
           'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
           'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
           'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
           'hair drier', 'toothbrush']

meshgrid = []

class_num = len(CLASSES)
headNum = 3
strides = [8, 16, 32]
mapSize = [[80, 80], [40, 40], [20, 20]]
nmsThresh = 0.45
objectThresh = 0.5

input_imgH = 640
input_imgW = 640


class DetectBox:
    def __init__(self, classId, score, xmin, ymin, xmax, ymax):
        self.classId = classId
        self.score = score
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax


def GenerateMeshgrid():
    # Store the (x, y) center of every grid cell, head by head, row by row.
    for index in range(headNum):
        for i in range(mapSize[index][0]):
            for j in range(mapSize[index][1]):
                meshgrid.append(j + 0.5)
                meshgrid.append(i + 0.5)


def IOU(xmin1, ymin1, xmax1, ymax1, xmin2, ymin2, xmax2, ymax2):
    xmin = max(xmin1, xmin2)
    ymin = max(ymin1, ymin2)
    xmax = min(xmax1, xmax2)
    ymax = min(ymax1, ymax2)

    innerWidth = xmax - xmin
    innerHeight = ymax - ymin

    innerWidth = innerWidth if innerWidth > 0 else 0
    innerHeight = innerHeight if innerHeight > 0 else 0

    innerArea = innerWidth * innerHeight

    area1 = (xmax1 - xmin1) * (ymax1 - ymin1)
    area2 = (xmax2 - xmin2) * (ymax2 - ymin2)

    total = area1 + area2 - innerArea

    return innerArea / total


def NMS(detectResult):
    predBoxs = []

    sort_detectboxs = sorted(detectResult, key=lambda x: x.score, reverse=True)

    for i in range(len(sort_detectboxs)):
        xmin1 = sort_detectboxs[i].xmin
        ymin1 = sort_detectboxs[i].ymin
        xmax1 = sort_detectboxs[i].xmax
        ymax1 = sort_detectboxs[i].ymax
        classId = sort_detectboxs[i].classId

        # classId == -1 marks a box already suppressed by a higher-scoring one.
        if sort_detectboxs[i].classId != -1:
            predBoxs.append(sort_detectboxs[i])
            for j in range(i + 1, len(sort_detectboxs), 1):
                if classId == sort_detectboxs[j].classId:
                    xmin2 = sort_detectboxs[j].xmin
                    ymin2 = sort_detectboxs[j].ymin
                    xmax2 = sort_detectboxs[j].xmax
                    ymax2 = sort_detectboxs[j].ymax
                    iou = IOU(xmin1, ymin1, xmax1, ymax1, xmin2, ymin2, xmax2, ymax2)
                    if iou > nmsThresh:
                        sort_detectboxs[j].classId = -1
    return predBoxs


def sigmoid(x):
    return 1 / (1 + exp(-x))


def postprocess(out, img_h, img_w):
    print('postprocess ... ')

    detectResult = []
    output = []
    for i in range(len(out)):
        print(out[i].shape)
        output.append(out[i].reshape((-1)))

    scale_h = img_h / input_imgH
    scale_w = img_w / input_imgW

    gridIndex = -2
    cls_index = 0
    cls_max = 0

    for index in range(headNum):
        # Outputs are interleaved per head: reg1, cls1, reg2, cls2, reg3, cls3.
        reg = output[index * 2 + 0]
        cls = output[index * 2 + 1]

        for h in range(mapSize[index][0]):
            for w in range(mapSize[index][1]):
                gridIndex += 2

                if 1 == class_num:
                    cls_max = sigmoid(cls[0 * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w])
                    cls_index = 0
                else:
                    # Pick the class with the highest raw score, then apply sigmoid once.
                    for cl in range(class_num):
                        cls_val = cls[cl * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w]
                        if 0 == cl:
                            cls_max = cls_val
                            cls_index = cl
                        else:
                            if cls_val > cls_max:
                                cls_max = cls_val
                                cls_index = cl
                    cls_max = sigmoid(cls_max)

                if cls_max > objectThresh:
                    # DFL decode: softmax over 16 bins per box side, then the expected bin index.
                    regdfl = []
                    for lc in range(4):
                        sfsum = 0
                        locval = 0
                        for df in range(16):
                            temp = exp(reg[((lc * 16) + df) * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w])
                            reg[((lc * 16) + df) * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w] = temp
                            sfsum += temp

                        for df in range(16):
                            sfval = reg[((lc * 16) + df) * mapSize[index][0] * mapSize[index][1] + h * mapSize[index][1] + w] / sfsum
                            locval += sfval * df
                        regdfl.append(locval)

                    x1 = (meshgrid[gridIndex + 0] - regdfl[0]) * strides[index]
                    y1 = (meshgrid[gridIndex + 1] - regdfl[1]) * strides[index]
                    x2 = (meshgrid[gridIndex + 0] + regdfl[2]) * strides[index]
                    y2 = (meshgrid[gridIndex + 1] + regdfl[3]) * strides[index]

                    xmin = x1 * scale_w
                    ymin = y1 * scale_h
                    xmax = x2 * scale_w
                    ymax = y2 * scale_h

                    xmin = xmin if xmin > 0 else 0
                    ymin = ymin if ymin > 0 else 0
                    xmax = xmax if xmax < img_w else img_w
                    ymax = ymax if ymax < img_h else img_h

                    box = DetectBox(cls_index, cls_max, xmin, ymin, xmax, ymax)
                    detectResult.append(box)
    # NMS
    print('detectResult:', len(detectResult))
    predBox = NMS(detectResult)

    return predBox


def process_image(img_src, resize_w, resize_h):
    image = cv2.resize(img_src, (resize_w, resize_h), interpolation=cv2.INTER_LINEAR)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image.astype(np.float32)
    image /= 255.0

    return image


def detect(img_path):
    orig = cv2.imread(img_path)
    img_h, img_w = orig.shape[:2]
    image = process_image(orig, input_imgW, input_imgH)

    # HWC -> CHW, then add the batch dimension: the ONNX input 'data' is NCHW.
    image = image.transpose((2, 0, 1))
    image = np.expand_dims(image, axis=0)

    # image = np.ones((1, 3, 384, 640), dtype=np.float32)
    # print(image.shape)

    ort_session = ort.InferenceSession(ONNX_MODEL)
    pred_results = (ort_session.run(None, {'data': image}))

    out = []
    for i in range(len(pred_results)):
        out.append(pred_results[i])
    predbox = postprocess(out, img_h, img_w)

    print('obj num is :', len(predbox))

    for i in range(len(predbox)):
        xmin = int(predbox[i].xmin)
        ymin = int(predbox[i].ymin)
        xmax = int(predbox[i].xmax)
        ymax = int(predbox[i].ymax)
        classId = predbox[i].classId
        score = predbox[i].score

        cv2.rectangle(orig, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
        ptext = (xmin, ymin)
        title = CLASSES[classId] + "%.2f" % score
        cv2.putText(orig, title, ptext, cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2, cv2.LINE_AA)

    cv2.imwrite('./test_onnx_result.jpg', orig)


if __name__ == '__main__':
    print('This is main ....')
    GenerateMeshgrid()
    img_path = IMG_PATH
    detect(img_path)
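
The innermost loops of postprocess decode each box side with what is usually called DFL: a softmax over 16 bins followed by the expected bin index. Below is a vectorized NumPy sketch of that one step for a single grid cell. `dfl_decode` is a hypothetical helper, and it subtracts the row maximum before exponentiating for numerical stability, which the loop version above does not do.

```python
import numpy as np


def dfl_decode(logits):
    """Softmax over the bins of each box side, then the expected bin index.

    logits: array of shape (4, 16), one row per side (left, top, right, bottom).
    Returns an array of 4 distances in grid-cell units.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # stability tweak, not in the original loop
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    bins = np.arange(logits.shape[1])
    return (probs * bins).sum(axis=1)


# Uniform logits give the mean bin index; a sharp peak at bin 3 gives about 3.
uniform = np.zeros((4, 16))
peaked = np.full((4, 16), -50.0)
peaked[:, 3] = 50.0

print(dfl_decode(uniform))  # 7.5 for every side
print(dfl_decode(peaked))   # approximately 3.0 for every side
```

The four decoded distances are then subtracted from or added to the cell center and multiplied by the head's stride, as in the x1, y1, x2, y2 lines of postprocess.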

4.2 Code for .pt2onnx

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Get the parent directory of the current script's directory and build a relative path
import os
import sys
current_dir = os.path.dirname(os.path.abspath(__file__))
project_path = os.path.join(current_dir, '..')
#based: https://docs.ultralytics.com/modes/export/#key-features-of-export-mode
from ultralytics import YOLO

# Load a model
model = YOLO("/app/rk3588_build/last_moonpie_yolov11s.pt")  # load the custom-trained moonpie model
#model = YOLO("./best_moonpie.pt")  # load a custom trained model
results = model(task='detect', source='../../cake26.jpg', save=True)  # predict on an image

4.1.1 Related change 1: modify the yolov11-ultralytics source, ./nn/head.py, replacing the code of Detect.forward

    def forward(self, x):
        # fengxh modified here. at Feb17,2025
        y = []
        for i in range(self.nl):
            t1 = self.cv2[i](x[i])
            t2 = self.cv3[i](x[i])
            y.append(t1)
            y.append(t2)
        return y

4.1.2 Related change 2: modify the ONNX model loading part in ./engine/model.py, which redefines the output parameters of the exported .onnx model:

      print("===================onnx====================")
      import torch
      dummy_input = torch.randn(1,3,640,640)
      input_names=['data']
      output_names=['reg1', 'cls1','reg2', 'cls2','reg3', 'cls3']
      torch.onnx.export(self.model, dummy_input, '/app/rk3588_build/yolo11_selfgen.onnx', verbose=False, input_names=input_names, output_names=output_names, opset_version=11)
      print("==================onnx self gened==========")
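
With the modified forward, the exported model has the six raw outputs named above (reg1, cls1, reg2, cls2, reg3, cls3) instead of one fused tensor. The sketch below lists the shapes I would expect for them, assuming a 640x640 input, strides 8/16/32, 16 DFL bins per box side (matching the range(16) loops in the post-processing code) and nc equal to the length of the CLASSES list. `expected_head_shapes` is a hypothetical helper. The real nc of the trained model may differ, since the earlier export log suggests 81.

```python
def expected_head_shapes(img_size=640, strides=(8, 16, 32), nc=80, reg_max=16):
    """Return (name, shape) pairs for the reg/cls outputs of each detection head."""
    shapes = []
    for i, stride in enumerate(strides, start=1):
        side = img_size // stride
        shapes.append((f"reg{i}", (1, 4 * reg_max, side, side)))  # box distribution logits
        shapes.append((f"cls{i}", (1, nc, side, side)))           # per-class logits
    return shapes


shapes = expected_head_shapes()
for name, shape in shapes:
    print(name, shape)
```

Comparing these against the shapes printed at the start of postprocess is a quick way to spot a class-count mismatch: if a cls output has 81 channels while the post-processing loops over 80 classes, the last class is never examined.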

4.3 Detection results


Appendix A: Additional notes




