Using OpenCV with YOLOv3: Real-Time Object Tracking

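Code Analysis

1. Loading the YOLO Model

2. Loading the Video and Initialization

3. Processing Video Frames

4. Object Detection

5. Processing the Detection Results

6. Drawing Bounding Boxes and Class Labels

7. Frame Rate (FPS) Calculation

8. Displaying Results and Exiting

9. Releasing Resources

Complete Code

Results

Summary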

Code Analysis

This code uses the YOLO (You Only Look Once) model to detect objects in a video and displays the detections with OpenCV. A step-by-step analysis follows.

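1. Loading the YOLO Model

```python
net = cv2.dnn.readNet('../../needFiles/yolov3.weights', '../../needFiles/yolov3.cfg')
```

This line loads the pretrained YOLOv3 weights file (yolov3.weights) and configuration file (yolov3.cfg). YOLOv3 is a real-time object detection model that can detect objects of many classes.

```python
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
```

getLayerNames() returns the names of all layers in the network. getUnconnectedOutLayers() returns the 1-based indices of the network's output layers (for YOLOv3, usually three), which is why the code subtracts 1 when indexing into layer_names. The resulting output-layer names are passed to the forward method later. Note that older OpenCV releases return these indices as an N×1 array, in which case the indices need to be flattened before this comprehension works.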
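The index-to-name mapping can be tried out without loading a network. The sketch below is only an illustration: the helper name output_layer_names and the layer names in the list are made up, and the indices are passed in by hand rather than read from a real model. Flattening first is one way to accept both the flat and the N×1 index layout.

```python
import numpy as np

def output_layer_names(layer_names, out_indices):
    """Map 1-based output-layer indices to layer names.

    Flattening first makes this work whether the indices arrive as a
    flat sequence or as an N x 1 array.
    """
    flat = np.asarray(out_indices).flatten()
    return [layer_names[int(i) - 1] for i in flat]

# Stand-in values for illustration only (not read from a real network).
names = ["conv_0", "yolo_82", "conv_83", "yolo_94", "yolo_106"]
flat_result = output_layer_names(names, [2, 4, 5])
nested_result = output_layer_names(names, np.array([[2], [4], [5]]))
print(flat_result)
print(nested_result)
```

With a real network this would be called as output_layer_names(net.getLayerNames(), net.getUnconnectedOutLayers()). OpenCV also offers net.getUnconnectedOutLayersNames(), which returns the names directly.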

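2. Loading the Video and Initialization

```python
video_path = 'D:/Dji/DJIneo.mp4'
cap = cv2.VideoCapture(video_path)
```

cv2.VideoCapture opens the video file. If the path is valid, cap is then used to read the video frame by frame. cv2.VideoCapture does not raise an error for a bad path, so cap.isOpened() can be used to confirm that the file was actually opened.

```python
resize_scale = 0.3
```

The scale factor is set to 0.3 and is used later to shrink each frame. Because the network input is resized to 416×416 in any case, the saving comes mainly from a smaller display window and cheaper drawing rather than from faster inference.

```python
prev_time = 0
```

This initializes prev_time, which is used to compute the frame rate (FPS, frames per second).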

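3. Processing Video Frames

```python
while True:
    ret, frame = cap.read()
    if not ret:
        break
```

The video is read frame by frame. cap.read() returns two values: ret is a boolean indicating whether a frame was read successfully, and frame is the current frame image. If no frame can be read (for example, at the end of the video), the loop exits.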
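The read-until-failure pattern can also be wrapped in a generator, which keeps the loop body free of the ret check. This is a minimal sketch, not part of the original code: iter_frames is a made-up helper, and FakeCapture is a hypothetical stand-in that only mimics the read() contract described above so the example runs without a video file.

```python
def iter_frames(cap):
    """Yield frames from a capture-like object until read() fails."""
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        yield frame

class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture, for illustration only."""
    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None

frames = list(iter_frames(FakeCapture(["frame0", "frame1", "frame2"])))
print(len(frames))
```

With a real capture, the main loop would become for frame in iter_frames(cap): followed by the per-frame processing.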

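```python
frame_resized = cv2.resize(frame, (0, 0), fx=resize_scale, fy=resize_scale)
```

The current frame is scaled down to 30% of its original width and height (via resize_scale). All later detection, drawing, and display steps operate on this smaller frame.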

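4. Object Detection

```python
blob = cv2.dnn.blobFromImage(frame_resized, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)
```

The YOLO model expects its input in a specific format. blobFromImage converts the image into the 4D blob the network needs: pixel values are multiplied by 0.00392 (approximately 1/255, which maps them into the range 0 to 1), the image is resized to (416, 416), no mean is subtracted ((0, 0, 0)), and the True argument (swapRB) swaps the red and blue channels so that OpenCV's BGR order becomes RGB. net.setInput(blob) feeds the blob to the network, and net.forward(output_layers) returns the detection results, one array per output layer.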
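To make the blob layout concrete, the following NumPy sketch reproduces the scaling, channel swap, and dimension reordering by hand. It is an approximation for illustration, not OpenCV's implementation: to_blob is a made-up name, the resize step is left out, and the input is assumed to already be a 416×416 BGR image.

```python
import numpy as np

def to_blob(image_bgr, scale=1 / 255.0):
    """Approximate blobFromImage(..., swapRB=True) without the resize step.

    image_bgr: H x W x 3 uint8 array in BGR order.
    Returns an array of shape (1, 3, H, W) in RGB order.
    """
    rgb = image_bgr[:, :, ::-1]                # BGR -> RGB
    scaled = rgb.astype(np.float32) * scale    # 0..255 -> 0..1
    chw = np.transpose(scaled, (2, 0, 1))      # HWC -> CHW
    return chw[np.newaxis, ...]                # add the batch dimension

# Synthetic 416 x 416 image that is pure blue in BGR order.
image = np.zeros((416, 416, 3), dtype=np.uint8)
image[:, :, 0] = 255
blob = to_blob(image)
print(blob.shape)
```

After the channel swap, the blue values end up in the last channel of the blob, and every value lies between 0 and 1.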

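5. Processing the Detection Results

```python
class_ids = []
confidences = []
boxes = []
```

Three lists are initialized: class_ids stores the class ID of each detected object, confidences stores each object's confidence score, and boxes stores the bounding-box coordinates.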
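The class IDs collected here are integer indices, not names. If readable labels are wanted, one option is to look them up in a text file with one class name per line, in the order the model was trained with. The sketch below assumes such a file exists; the helper load_class_names and the file demo.names are made up for illustration and are not part of the original code.

```python
import os
import tempfile

def load_class_names(path):
    """Read one class name per line, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Demo with a tiny temporary file standing in for a real names file.
with tempfile.TemporaryDirectory() as tmp:
    names_path = os.path.join(tmp, "demo.names")
    with open(names_path, "w", encoding="utf-8") as f:
        f.write("person\nbicycle\ncar\n")
    class_names = load_class_names(names_path)

class_id = 2
# Fall back to the numeric ID if the index is out of range.
label = class_names[class_id] if class_id < len(class_names) else str(class_id)
print(label)
```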

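```python
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            ...
```

The code iterates over outs, the arrays returned by YOLO. Each detection describes one candidate box. Its first four values give the box position (center x, center y, width, and height, relative to the image size), the fifth is an objectness score, and the remaining values are the per-class scores, which is why the code slices detection[5:]. np.argmax(scores) picks the class with the highest score, and confidence holds that score. If the confidence exceeds 0.5, the detection is kept.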
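The step elided above converts the relative center-based box into pixel coordinates of the top-left corner, as the complete code at the end of the article does. Here is the same arithmetic as a standalone sketch; decode_detection is a made-up helper name, and the detection row is synthetic with only three class scores.

```python
import numpy as np

def decode_detection(detection, frame_w, frame_h, conf_threshold=0.5):
    """Return ([x, y, w, h], confidence, class_id), or None if below threshold."""
    scores = detection[5:]
    class_id = int(np.argmax(scores))
    confidence = float(scores[class_id])
    if confidence <= conf_threshold:
        return None
    # Scale the relative center and size to pixels.
    center_x = int(detection[0] * frame_w)
    center_y = int(detection[1] * frame_h)
    w = int(detection[2] * frame_w)
    h = int(detection[3] * frame_h)
    # Convert the center point to the top-left corner.
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return [x, y, w, h], confidence, class_id

# Synthetic row: box centered in the frame, half its width and height,
# followed by an objectness value and three made-up class scores.
row = np.array([0.5, 0.5, 0.5, 0.5, 0.9, 0.1, 0.8, 0.05])
result = decode_detection(row, frame_w=640, frame_h=480)
print(result)  # ([160, 120, 320, 240], 0.8, 1)
```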

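6. Drawing Bounding Boxes and Class Labels

```python
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
# NMSBoxes can return an empty tuple when nothing is kept, so wrap it first
for i in np.array(indexes).flatten():
    x, y, w, h = boxes[i]
    label = str(class_ids[i])
    cv2.rectangle(frame_resized, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(frame_resized, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
```

Non-maximum suppression (NMS) removes overlapping bounding boxes to reduce redundant detections; here 0.5 is the score threshold and 0.4 is the overlap threshold. The code then iterates over the boxes that were kept and draws a rectangle and a label on the frame. The label is the numeric class ID converted to a string, not a class name. When no box passes the thresholds, NMSBoxes can return an empty tuple, which has no flatten() method, so the result is wrapped in np.array before flattening.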
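To show what the suppression step does, here is a small greedy NMS written by hand. It is a teaching sketch under simple assumptions (boxes as [x, y, w, h], overlap measured by intersection over union) and is not claimed to match the internals of cv2.dnn.NMSBoxes; the function names iou and greedy_nms and the sample boxes are made up.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, confidences, score_threshold, nms_threshold):
    """Return indices of kept boxes, highest confidence first."""
    order = [i for i in np.argsort(confidences)[::-1]
             if confidences[i] > score_threshold]
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(int(i))
    return keep

boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
confidences = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, confidences, 0.5, 0.4)
print(kept)  # [0, 2]
```

The first two boxes overlap heavily (IoU of about 0.68), so the lower-scoring one is dropped, while the third box does not overlap with anything and is kept.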

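7. Frame Rate (FPS) Calculation

```python
current_time = time.time()
fps = 1 / (current_time - prev_time)
prev_time = current_time
cv2.putText(frame_resized, f'FPS: {int(fps)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
```

FPS is computed from the time elapsed between two consecutive loop iterations and drawn on the frame. It reflects the speed of the whole loop (reading, inference, drawing, and display), not just the model. Because prev_time starts at 0, the value shown for the first frame is not meaningful.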
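The same calculation can be isolated in a small function that also guards against a zero interval, which would otherwise raise ZeroDivisionError. This is a sketch rather than the original code: fps_from_interval is a made-up name, time.perf_counter() is swapped in for time.time() because it is a monotonic high-resolution clock, and the sleep only simulates per-frame work.

```python
import time

def fps_from_interval(prev_time, current_time):
    """Frames per second for one frame interval; 0.0 if the interval is not positive."""
    elapsed = current_time - prev_time
    return 1.0 / elapsed if elapsed > 0 else 0.0

prev_time = time.perf_counter()
time.sleep(0.01)  # stand-in for the work done on one frame
current_time = time.perf_counter()
print(f"FPS: {fps_from_interval(prev_time, current_time):.1f}")
```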

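8. Displaying Results and Exiting

```python
cv2.imshow('Object Detection', frame_resized)
if cv2.waitKey(1) & 0xFF == ord('q'):
    break
```

imshow displays the annotated frame. waitKey(1) waits up to 1 ms for a key press, and the loop exits when the 'q' key is pressed.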
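The & 0xFF mask keeps only the low byte of the key code before comparing it. The sketch below illustrates the effect with plain integers; is_quit_key is a made-up helper, and the value with extra high bits is a constructed example (such values are reported on some platforms, but that is an assumption here). waitKey returns -1 when no key was pressed.

```python
def is_quit_key(key_code, quit_char="q"):
    """True if the low byte of key_code matches quit_char."""
    return (key_code & 0xFF) == ord(quit_char)

print(is_quit_key(ord("q")))             # True
print(is_quit_key(-1))                   # False: no key pressed
print(is_quit_key(ord("q") | 0x100000))  # True: high bits are masked off
```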

9. Releasing Resources

```python
cap.release()
cv2.destroyAllWindows()
```

This releases the video capture and closes all OpenCV windows.
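If an exception occurs inside the loop, the two cleanup calls above are never reached. One possible way to make the release unconditional is a small context manager, sketched below. The names releasing and FakeCapture are made up; FakeCapture is a hypothetical stand-in for cv2.VideoCapture so the example runs without OpenCV.

```python
from contextlib import contextmanager

@contextmanager
def releasing(resource):
    """Yield resource and call its release() method on exit, even on error."""
    try:
        yield resource
    finally:
        resource.release()

class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture, for illustration only."""
    def __init__(self):
        self.released = False

    def release(self):
        self.released = True

fake = FakeCapture()
try:
    with releasing(fake) as cap:
        raise RuntimeError("simulated failure while processing a frame")
except RuntimeError:
    pass
print(fake.released)  # True
```

With a real capture this would read with releasing(cv2.VideoCapture(video_path)) as cap: around the main loop, with cv2.destroyAllWindows() called afterwards.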

Complete Code

```python
import cv2
import numpy as np
import time

# Load the YOLO model
net = cv2.dnn.readNet('../../needFiles/yolov3.weights', '../../needFiles/yolov3.cfg')
layer_names = net.getLayerNames()
# Layer indices are 1-based, hence i - 1
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]

# Load the video
video_path = 'D:/Dji/DJIneo.mp4'
cap = cv2.VideoCapture(video_path)

# Scale factor for shrinking the displayed frame
resize_scale = 0.3

# Timestamp of the previous frame, used for the FPS calculation
prev_time = 0

# Process every frame of the video
while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Shrink the current frame
    frame_resized = cv2.resize(frame, (0, 0), fx=resize_scale, fy=resize_scale)

    # Run detection
    blob = cv2.dnn.blobFromImage(frame_resized, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)

    # Process the detection results
    class_ids = []
    confidences = []
    boxes = []

    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:  # confidence threshold
                center_x = int(detection[0] * frame_resized.shape[1])
                center_y = int(detection[1] * frame_resized.shape[0])
                w = int(detection[2] * frame_resized.shape[1])
                h = int(detection[3] * frame_resized.shape[0])
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # Apply non-maximum suppression to remove redundant boxes
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

    # Flatten the indices; np.array handles the empty tuple returned when nothing is kept
    for i in np.array(indexes).flatten():
        x, y, w, h = boxes[i]
        label = str(class_ids[i])
        cv2.rectangle(frame_resized, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame_resized, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

    # Compute FPS
    current_time = time.time()
    fps = 1 / (current_time - prev_time)
    prev_time = current_time

    # Draw FPS
    cv2.putText(frame_resized, f'FPS: {int(fps)}', (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)

    # Show the result
    cv2.imshow('Object Detection', frame_resized)

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()
```