YOLOv5 / YOLOv8 Improvement: Decoupled Head


Contents

1. Introduction to the Decoupled Head
2. Adding Decoupled_Detect to YOLOv5
   2.1 Add DecoupledHead to common.py
   2.2 Add Decoupled_Detect to yolo.py
   2.3 Modify yolov5s_decoupled.yaml


1. Introduction to the Decoupled Head

The decoupled head is a detection-head design that separates classification from localization. A conventional (coupled) detection head predicts class scores, box coordinates, and objectness jointly from one shared convolution, whereas a decoupled head splits these predictions into parallel branches that are processed separately.

The core idea is to attach extra branch networks to the feature maps produced by the backbone and neck; such a branch is conventionally called a "head", hence the name. Concretely, after a shared stem, one branch predicts the class scores while another predicts the box regression (and objectness).

The benefit is that each sub-task can learn the features it actually needs. Because classification and localization rely on different semantic cues, decoupling them typically improves detection accuracy and localization quality, at the cost of some extra computation.

Advantages of the decoupled head:

  1. Separation of classification and regression: each branch specializes in its own sub-task, removing the conflict that arises when both share one set of convolutions.

  2. Multi-scale prediction: a head is attached at every pyramid level (P3/P4/P5 in YOLOv5), so objects of different scales are predicted from features of matching resolution.

  3. Better localization: a dedicated regression/objectness branch preserves edge and boundary detail, improving box precision.

  4. Extensibility: the number and depth of the branches can be adjusted, for example adding branches or changing their structure to suit different tasks and datasets.

YOLOv6 also adopts a decoupled head. Balancing operator representational capacity against hardware compute cost, it redesigns a more efficient decoupled head using a Hybrid Channels strategy, which reduces latency while maintaining accuracy and mitigates the extra latency introduced by the 3x3 convolutions in the decoupled head.

The original YOLOv5 detection head produces classification and regression from a single shared (coupled) branch, which is why we add a Decoupled Head.

Why use a decoupled head?

Because classification and localization focus on different cues:
- classification attends mainly to the texture and content of the target;
- localization attends mainly to the target's edge information.
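This contrast can be sketched in a few lines of PyTorch. The channel widths, branch depths, and names below are illustrative assumptions, not the YOLOv5 code from section 2:

```python
import torch
import torch.nn as nn

nc, na = 80, 3  # classes and anchors per scale (COCO-style assumption)

# Coupled head (YOLOv5 default): one 1x1 conv predicts box, objectness and class together
coupled = nn.Conv2d(256, na * (5 + nc), 1)

# Decoupled head: shared 1x1 stem, then parallel branches for classification and regression
stem = nn.Conv2d(256, 256, 1)
cls_branch = nn.Sequential(nn.Conv2d(256, 256, 3, padding=1), nn.SiLU(),
                           nn.Conv2d(256, na * nc, 1))  # texture/content -> class scores
reg_branch = nn.Sequential(nn.Conv2d(256, 256, 3, padding=1), nn.SiLU(),
                           nn.Conv2d(256, na * 5, 1))   # edges -> xywh + objectness

x = torch.randn(1, 256, 20, 20)  # a stride-32 feature map for a 640x640 input
f = stem(x)
out = torch.cat([reg_branch(f), cls_branch(f)], 1)
print(coupled(x).shape, out.shape)  # both: (1, na*(5+nc), 20, 20) = (1, 255, 20, 20)
```

Both heads emit the same na*(5+nc) channels per grid cell; only the computation path differs.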

2. Adding Decoupled_Detect to YOLOv5

2.1 Add DecoupledHead to common.py:

```python
# ======================= Decoupled head ============================= #
class DecoupledHead(nn.Module):
    def __init__(self, ch=256, nc=80, anchors=()):
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.merge = Conv(ch, 256, 1, 1)
        self.cls_convs1 = Conv(256, 256, 3, 1, 1)
        self.cls_convs2 = Conv(256, 256, 3, 1, 1)
        self.reg_convs1 = Conv(256, 256, 3, 1, 1)
        self.reg_convs2 = Conv(256, 256, 3, 1, 1)
        self.cls_preds = nn.Conv2d(256, self.nc * self.na, 1)  # 1x1 conv to nc channels per anchor, e.g. 80 COCO classes (class scores)
        self.reg_preds = nn.Conv2d(256, 4 * self.na, 1)        # 1x1 conv to 4 channels per anchor: box position xywh
        self.obj_preds = nn.Conv2d(256, 1 * self.na, 1)        # 1x1 conv to 1 channel per anchor: objectness (confidence)

    def forward(self, x):
        x = self.merge(x)
        x1 = self.cls_convs1(x)
        x1 = self.cls_convs2(x1)
        x1 = self.cls_preds(x1)
        x2 = self.reg_convs1(x)
        x2 = self.reg_convs2(x2)
        x21 = self.reg_preds(x2)
        x22 = self.obj_preds(x2)
        out = torch.cat([x21, x22, x1], 1)  # concatenate regression, objectness and classification along the channel dim (dim=1)
        return out


class Decoupled_Detect(nn.Module):
    stride = None  # strides computed during build
    onnx_dynamic = False  # ONNX export parameter
    export = False  # export mode

    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer
        super().__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
        self.nl = len(anchors)  # number of detection layers
        self.na = len(anchors[0]) // 2  # number of anchors
        self.grid = [torch.zeros(1)] * self.nl  # init grid
        self.anchor_grid = [torch.zeros(1)] * self.nl  # init anchor grid
        self.register_buffer('anchors', torch.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)
        self.m = nn.ModuleList(DecoupledHead(x, nc, anchors) for x in ch)
        self.inplace = inplace  # use in-place ops (e.g. slice assignment)

    def forward(self, x):
        z = []  # inference output
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)

                y = x[i].sigmoid()
                if self.inplace:
                    y[..., 0:2] = (y[..., 0:2] * 2 + self.grid[i]) * self.stride[i]  # xy
                    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                    xy, wh, conf = y.split((2, 2, self.nc + 1), 4)  # y.tensor_split((2, 4, 5), 4)  # torch 1.8.0
                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy
                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh
                    y = torch.cat((xy, wh, conf), 4)
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x)

    def _make_grid(self, nx=20, ny=20, i=0):
        d = self.anchors[i].device
        t = self.anchors[i].dtype
        shape = 1, self.na, ny, nx, 2  # grid shape
        y, x = torch.arange(ny, device=d, dtype=t), torch.arange(nx, device=d, dtype=t)
        if check_version(torch.__version__, '1.10.0'):  # torch>=1.10.0 meshgrid workaround for torch>=0.7 compatibility
            yv, xv = torch.meshgrid(y, x, indexing='ij')
        else:
            yv, xv = torch.meshgrid(y, x)
        grid = torch.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5
        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)
        return grid, anchor_grid
```
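As a quick shape check, the head can be run standalone. The sketch below re-declares a compact DecoupledHead with a minimal stand-in for YOLOv5's Conv (conv + BN + SiLU); the stand-in and the single-scale anchors are assumptions for illustration:

```python
import torch
import torch.nn as nn

def Conv(c1, c2, k=1, s=1, p=0):
    # Minimal stand-in for YOLOv5's Conv (conv + BN + SiLU), enough to run the head standalone
    return nn.Sequential(nn.Conv2d(c1, c2, k, s, p, bias=False), nn.BatchNorm2d(c2), nn.SiLU())

class DecoupledHead(nn.Module):
    def __init__(self, ch=256, nc=80, anchors=()):
        super().__init__()
        self.nc, self.na = nc, len(anchors[0]) // 2
        self.merge = Conv(ch, 256, 1, 1)
        self.cls_convs1, self.cls_convs2 = Conv(256, 256, 3, 1, 1), Conv(256, 256, 3, 1, 1)
        self.reg_convs1, self.reg_convs2 = Conv(256, 256, 3, 1, 1), Conv(256, 256, 3, 1, 1)
        self.cls_preds = nn.Conv2d(256, self.nc * self.na, 1)
        self.reg_preds = nn.Conv2d(256, 4 * self.na, 1)
        self.obj_preds = nn.Conv2d(256, 1 * self.na, 1)

    def forward(self, x):
        x = self.merge(x)
        c = self.cls_preds(self.cls_convs2(self.cls_convs1(x)))
        r = self.reg_convs2(self.reg_convs1(x))
        return torch.cat([self.reg_preds(r), self.obj_preds(r), c], 1)

head = DecoupledHead(ch=128, nc=80, anchors=[[10, 13, 16, 30, 33, 23]])  # one scale, na=3
out = head(torch.randn(1, 128, 20, 20))
print(out.shape)  # (1, na*(4+1+nc), 20, 20) = (1, 255, 20, 20)
```

Per anchor, the channel order is 4 box values, 1 objectness value, then nc class scores, matching the torch.cat in forward.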

2.2 Add Decoupled_Detect to yolo.py:

```python
class BaseModel(nn.Module):

    def _apply(self, fn):
        # Apply to(), cpu(), cuda(), half() to model tensors that are not parameters or registered buffers
        self = super()._apply(fn)
        m = self.model[-1]  # Detect()
        if isinstance(m, (Detect, Segment, Decoupled_Detect)):
            m.stride = fn(m.stride)
            m.grid = list(map(fn, m.grid))
            if isinstance(m.anchor_grid, list):
                m.anchor_grid = list(map(fn, m.anchor_grid))
        return self
```

```python
class DetectionModel(BaseModel):

    def _initialize_dh_biases(self, cf=None):  # initialize biases into Decoupled_Detect(), cf is class frequency
        # https://arxiv.org/abs/1708.02002 section 3.3
        # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
        m = self.model[-1]  # Decoupled_Detect() module
        for mi, s in zip(m.m, m.stride):  # from
            b = mi.obj_preds.bias.view(m.na, -1)  # objectness bias
            b.data += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
            mi.obj_preds.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
            b = mi.cls_preds.bias.view(m.na, -1)  # classification bias
            b.data += math.log(0.6 / (m.nc - 0.999999)) if cf is None else torch.log(cf / cf.sum())  # cls
            mi.cls_preds.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
```
```python
        # in DetectionModel.__init__: build strides and anchors
        if isinstance(m, (Detect, Segment)):
            s = 256  # 2x min stride
            m.inplace = self.inplace
            forward = lambda x: self.forward(x)[0] if isinstance(m, Segment) else self.forward(x)
            m.stride = torch.tensor([s / x.shape[-2] for x in forward(torch.zeros(1, ch, s, s))])  # forward
            check_anchor_order(m)
            m.anchors /= m.stride.view(-1, 1, 1)
            self.stride = m.stride
            self._initialize_biases()  # only run once
        elif isinstance(m, Decoupled_Detect):
            s = 256  # 2x min stride
            m.inplace = self.inplace
            m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])  # forward
            check_anchor_order(m)  # must be in pixel-space (not grid-space)
            m.anchors /= m.stride.view(-1, 1, 1)
            self.stride = m.stride
            self._initialize_dh_biases()  # only run once
```
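The bias shifts used by `_initialize_dh_biases` follow the prior-probability initialisation from RetinaNet (arXiv:1708.02002, section 3.3): the objectness and class sigmoids start near realistic priors instead of 0.5, which stabilises early training. A small numeric sketch of the values involved:

```python
import math

# Objectness prior: roughly 8 objects per 640x640 image, spread over (640/s)^2 cells at stride s
for s in (8, 16, 32):
    b_obj = math.log(8 / (640 / s) ** 2)
    p_obj = 1 / (1 + math.exp(-b_obj))  # sigmoid of the initial bias
    print(f"stride {s:2d}: obj bias {b_obj:+.3f} -> initial objectness {p_obj:.5f}")

# Classification prior: each of nc classes starts near probability 0.6/(nc - 1)
nc = 80
b_cls = math.log(0.6 / (nc - 0.999999))
p_cls = 1 / (1 + math.exp(-b_cls))
print(f"cls bias {b_cls:+.3f} -> initial class probability {p_cls:.5f}")
```

With these biases, the head begins by predicting "no object, no class" almost everywhere, rather than 50% confidence at every cell.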

```python
def parse_model(d, ch):  # model_dict, input_channels(3)
    ...
        elif m in {Detect, Segment, Decoupled_Detect}:
            args.append([ch[x] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
            if m is Segment:
                args[3] = make_divisible(args[3] * gw, 8)
```
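The `args[1]` handling above can be seen in isolation: when the yaml gives an integer anchor count instead of explicit anchor pairs, it is expanded into one placeholder list of 2*n values per input layer. A minimal sketch, where `f` mimics the `[17, 20, 23]` from-layers and the first arg is a placeholder:

```python
f = [17, 20, 23]  # the three feature maps feeding the detection head
args = ['nc', 3]  # args[1] == 3 means "3 anchors per layer", not explicit anchor pairs

if isinstance(args[1], int):  # number of anchors
    args[1] = [list(range(args[1] * 2))] * len(f)

print(args[1])  # [[0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 4, 5]]
```

Each inner list has 2*na entries because every anchor is a (w, h) pair; the placeholder values only fix na, and real anchor sizes come from the yaml or from autoanchor.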

2.3 Modify yolov5s_decoupled.yaml

```yaml
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 1  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13
   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)
   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)
   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)
   [[17, 20, 23], 1, Decoupled_Detect, [nc, anchors]],  # Detect(P3, P4, P5), decoupled
  ]
```
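The config is ordinary YAML: module names like `Decoupled_Detect` reach `parse_model` as plain strings before being resolved to classes. A quick sketch parsing just a fragment of the file (PyYAML assumed available, as in YOLOv5's requirements):

```python
import yaml  # PyYAML, already a YOLOv5 dependency

cfg = yaml.safe_load("""
nc: 1
anchors:
  - [10,13, 16,30, 33,23]
  - [30,61, 62,45, 59,119]
  - [116,90, 156,198, 373,326]
head:
  [[-1, 3, C3, [1024, False]],
   [[17, 20, 23], 1, Decoupled_Detect, [nc, anchors]]]
""")

last = cfg['head'][-1]
print(last[0], last[2])  # [17, 20, 23] Decoupled_Detect
```

parse_model then eval()s the string 'Decoupled_Detect' and substitutes nc/anchors when assembling the model, which is why registering the class in yolo.py (section 2.2) is enough for the yaml to pick it up.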

