
A Guided Reading of YOLOv13: What Exactly Are HyperACE and FullPAD?

Contents

  • Experimental Validation
  • A Guided Reading of YOLOv13
    • HyperACE
    • FullPAD

Experimental Validation

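On the VOC dataset, training without pretrained weights, YOLOv13 falls short of both YOLOv12 and YOLOv11: it adds noticeably more inference overhead while its detection accuracy drops (mAP50-95 is the same at 0.553, but mAP50 is lower by varying amounts: 0.751 versus 0.757 for YOLOv11n and 0.762 for YOLOv12n).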

Test results (epochs: 100; image size: 640; batch size: 32; dataset: VOC):

| Model | mAP50-95 | mAP50 | run time (h) | params (M) | inference + postprocess time (ms) |
|---|---|---|---|---|---|
| YOLOv8n | 0.549 | 0.760 | 1.051 | 3.01 | 0.2 + 0.3 |
| YOLOv11n | 0.553 | 0.757 | 1.142 | 2.59 | 0.2 + 0.3 |
| YOLOv12n | 0.553 | 0.762 | 1.965 | 2.51 | 0.4 + 0.2 |
| YOLOv13n | 0.553 | 0.751 | 2.263 | 2.45 | 0.5 + 0.2 |
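To make the comparison concrete, the short Python snippet below recomputes YOLOv13n's differences from the other models using only the numbers reported in the table (the dictionary layout and the `compare` helper are just an illustrative choice).

```python
# Values copied from the results table: (mAP50-95, mAP50, run time in hours, params in M)
results = {
    "YOLOv8n": (0.549, 0.760, 1.051, 3.01),
    "YOLOv11n": (0.553, 0.757, 1.142, 2.59),
    "YOLOv12n": (0.553, 0.762, 1.965, 2.51),
    "YOLOv13n": (0.553, 0.751, 2.263, 2.45),
}


def compare(target, baseline):
    """Return (mAP50-95 delta, mAP50 delta, run-time ratio) of target vs. baseline."""
    t, b = results[target], results[baseline]
    return round(t[0] - b[0], 3), round(t[1] - b[1], 3), round(t[2] / b[2], 2)


for baseline in ("YOLOv8n", "YOLOv11n", "YOLOv12n"):
    d_map, d_map50, time_ratio = compare("YOLOv13n", baseline)
    print(f"YOLOv13n vs {baseline}: mAP50-95 {d_map:+.3f}, mAP50 {d_map50:+.3f}, run time x{time_ratio}")
```

For example, against YOLOv12n the mAP50 gap is -0.011 while the run time is about 1.15 times longer.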

A Guided Reading of YOLOv13


The paper YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception proposes the following main improvements:

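  1. HyperACE: Hypergraph-based Adaptive Correlation Enhancement

    • The C3AH module learns global, high-order perception information, mainly through adaptive hypergraph computation with only linear complexity.

    • The DS-C3k module learns local, low-order perception information.

  2. FullPAD: Full-Pipeline Aggregation-and-Distribution Paradigm

    • Multi-scale feature maps are collected from the backbone and passed to HyperACE; the enhanced features are then redistributed to multiple positions across the pipeline (the neck) through separate FullPAD tunnels.

    • According to the paper, this enables fine-grained information flow and representational synergy, significantly improving gradient propagation and detection performance.

    • Gold-YOLO (NeurIPS 2023) proposed a similar gather-and-distribute mechanism: it collects and aligns features from different levels to aggregate them, then injects the aggregated information back into each level with a simple attention mechanism.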
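The C3AH bullet says the adaptive hypergraph computation has only linear complexity in the number of pixels. A rough multiply-accumulate (MAC) count illustrates why. This is a back-of-the-envelope estimate, not a measurement: it assumes N vertices (pixels) with D channels and E hyperedges, counts only the large matrix products (two N x N products for plain self-attention, three N x E products for the hypergraph layer), and uses D = 64 and E = 8 purely as example values.

```python
def attention_macs(n, d):
    """MACs of the two N x N products in plain self-attention: Q @ K^T and A @ V."""
    return 2 * n * n * d


def hypergraph_macs(n, d, e):
    """MACs of the three N x E products in an adaptive hypergraph layer:
    vertex-prototype similarity, vertex-to-edge and edge-to-vertex aggregation."""
    return 3 * n * e * d


d, e = 64, 8  # illustrative channel count and number of hyperedges
for side in (20, 40, 80):  # feature-map sides of a 640 input at strides 32, 16, 8
    n = side * side
    ratio = attention_macs(n, d) / hypergraph_macs(n, d, e)
    print(f"N={n:5d}: attention {attention_macs(n, d):>13,d} MACs, "
          f"hypergraph {hypergraph_macs(n, d, e):>11,d} MACs, ratio {ratio:.0f}x")
```

Doubling N doubles the hypergraph cost but quadruples the attention cost, because E stays fixed while the attention matrix grows with N squared.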

HyperACE


1. Downsample backbone feature B3 and upsample B5, concatenate both with B4, and then apply a 1x1 convolution to adjust the channel count.
```python
import torch
import torch.nn as nn
from ultralytics.nn.modules.conv import Conv  # Conv2d + BatchNorm2d + SiLU


class FuseModule(nn.Module):
    """
    A module to fuse multi-scale features for the HyperACE block.

    This module takes a list of three feature maps from different scales, aligns them to a common
    spatial resolution by downsampling the first and upsampling the third, and then concatenates
    and fuses them with a convolution layer.

    Attributes:
        c_in (int): The number of channels of the input feature maps.
        channel_adjust (bool): Whether to adjust the channel count of the concatenated features.

    Methods:
        forward: Fuses a list of three multi-scale feature maps.

    Examples:
        >>> import torch
        >>> model = FuseModule(c_in=64, channel_adjust=False)
        >>> # Input is a list of features from different backbone stages
        >>> x_list = [torch.randn(2, 64, 64, 64), torch.randn(2, 64, 32, 32), torch.randn(2, 64, 16, 16)]
        >>> output = model(x_list)
        >>> print(output.shape)
        torch.Size([2, 64, 32, 32])
    """

    def __init__(self, c_in, channel_adjust):
        super().__init__()
        self.downsample = nn.AvgPool2d(kernel_size=2)  # halves H and W
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")  # doubles H and W
        if channel_adjust:
            # The three inputs together carry 4 * c_in channels (e.g. when B5 has 2 * c_in)
            self.conv_out = Conv(4 * c_in, c_in, 1)
        else:
            # All three inputs carry c_in channels
            self.conv_out = Conv(3 * c_in, c_in, 1)

    def forward(self, x):
        x1_ds = self.downsample(x[0])  # B3 -> B4 resolution
        x3_up = self.upsample(x[2])  # B5 -> B4 resolution
        x_cat = torch.cat([x1_ds, x[1], x3_up], dim=1)  # concatenate along channels
        return self.conv_out(x_cat)  # 1x1 conv back to c_in channels
```
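The fusion step can be reproduced without PyTorch to check the shape bookkeeping. The NumPy sketch below is an illustration under assumptions, not the YOLOv13 code: it works on a single image in (C, H, W) layout, replaces `Conv` (convolution + BN + SiLU) with a bare 1x1 convolution written as a channel-mixing matrix product, and uses made-up sizes in which B5 has 2 * c_in channels, so the concatenation is 4 * c_in wide, as `channel_adjust=True` expects.

```python
import numpy as np


def avg_pool2x2(x):
    """2x2 average pooling with stride 2 on a (C, H, W) array (H and W even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2x_nearest(x):
    """Nearest-neighbour 2x upsampling on a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def fuse(b3, b4, b5, weight):
    """Align B3/B5 to B4's resolution, concatenate on channels, then apply a 1x1 conv.

    weight has shape (C_out, C_total); a 1x1 convolution is a per-pixel matrix product.
    """
    x_cat = np.concatenate([avg_pool2x2(b3), b4, upsample2x_nearest(b5)], axis=0)
    return np.einsum("oc,chw->ohw", weight, x_cat)


rng = np.random.default_rng(0)
c_in = 8  # hypothetical channel count
b3 = rng.standard_normal((c_in, 32, 32))        # stride 8
b4 = rng.standard_normal((c_in, 16, 16))        # stride 16
b5 = rng.standard_normal((2 * c_in, 8, 8))      # stride 32, assumed to have 2x channels
weight = rng.standard_normal((c_in, 4 * c_in))  # 4 * c_in inputs, as with channel_adjust=True
out = fuse(b3, b4, b5, weight)
print(out.shape)  # (8, 16, 16)
```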
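2. The C3AH module learns global, high-order perception information in three steps: (1) AdaHyperedgeGen generates a soft hyperedge participation matrix; (2) AdaHGConv performs hypergraph convolution with that matrix; (3) AdaHGComputation applies the hypergraph convolution to 4D CNN feature maps to capture high-order correlations.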


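2.1 AdaHyperedgeGen generates an adaptive hyperedge participation matrix. It dynamically builds hyperedge prototypes from the global context of the vertex features ('mean', 'max', or 'both'), then computes a continuous (soft) participation value between every vertex and every hyperedge. (Annotated code below.)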

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaHyperedgeGen(nn.Module):
    """
    Generates an adaptive hyperedge participation matrix from a set of vertex features.
    """

    def __init__(self, node_dim, num_hyperedges, num_heads=4, dropout=0.1, context="both"):
        super().__init__()
        self.num_heads = num_heads
        self.num_hyperedges = num_hyperedges
        self.head_dim = node_dim // num_heads
        self.context = context

        # Base (static) hyperedge prototypes
        self.prototype_base = nn.Parameter(torch.Tensor(num_hyperedges, node_dim))
        nn.init.xavier_uniform_(self.prototype_base)

        # Context network: maps the global context vector to per-hyperedge prototype offsets
        if context in ("mean", "max"):
            self.context_net = nn.Linear(node_dim, num_hyperedges * node_dim)
        elif context == "both":
            self.context_net = nn.Linear(2 * node_dim, num_hyperedges * node_dim)
        else:
            raise ValueError(
                f"Unsupported context '{context}'. "
                "Expected one of: 'mean', 'max', 'both'."
            )

        self.pre_head_proj = nn.Linear(node_dim, node_dim)
        self.dropout = nn.Dropout(dropout)
        self.scaling = math.sqrt(self.head_dim)

    def forward(self, X):
        B, N, D = X.shape

        # Global context over all N vertices
        if self.context == "mean":
            context_cat = X.mean(dim=1)
        elif self.context == "max":
            context_cat, _ = X.max(dim=1)
        else:
            avg_context = X.mean(dim=1)
            max_context, _ = X.max(dim=1)
            context_cat = torch.cat([avg_context, max_context], dim=-1)

        # Dynamic prototypes: base prototypes + context-generated offsets
        prototype_offsets = self.context_net(context_cat).view(B, self.num_hyperedges, D)
        prototypes = self.prototype_base.unsqueeze(0) + prototype_offsets

        # Split vertex features and prototypes into heads for scaled dot-product similarity
        X_proj = self.pre_head_proj(X)
        X_heads = X_proj.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        proto_heads = prototypes.view(B, self.num_hyperedges, self.num_heads, self.head_dim).permute(0, 2, 1, 3)

        X_heads_flat = X_heads.reshape(B * self.num_heads, N, self.head_dim)
        proto_heads_flat = proto_heads.reshape(B * self.num_heads, self.num_hyperedges, self.head_dim).transpose(1, 2)

        # Vertex-hyperedge similarity, averaged over heads: [B, N, num_hyperedges]
        logits = torch.bmm(X_heads_flat, proto_heads_flat) / self.scaling
        logits = logits.view(B, self.num_heads, N, self.num_hyperedges).mean(dim=1)
        logits = self.dropout(logits)

        # Softmax over the vertex dimension (dim=1): for each hyperedge,
        # the participation weights of all N vertices sum to 1
        return F.softmax(logits, dim=1)
```
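The core of AdaHyperedgeGen is a soft assignment between N vertices and E prototypes. The NumPy sketch below strips it down to one head and one image, assumes the 'mean' context, and uses a random matrix `w_ctx` as a hypothetical stand-in for the learned `context_net` (no bias, no pre-head projection, no dropout). It keeps the two details that matter: prototypes are base prototypes plus a context-dependent offset, and the softmax runs over the vertex axis, as in the original `F.softmax(logits, dim=1)`.

```python
import numpy as np


def softmax(x, axis):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def participation_matrix(X, proto_base, w_ctx):
    """Single-head, single-image sketch of AdaHyperedgeGen with 'mean' context.

    X: (N, D) vertex features; proto_base: (E, D) base prototypes;
    w_ctx: (D, E * D) hypothetical stand-in for the learned context_net.
    Returns A: (N, E), normalised over vertices (each column sums to 1).
    """
    n, d = X.shape
    e = proto_base.shape[0]
    context = X.mean(axis=0)                   # global 'mean' context, (D,)
    offsets = (context @ w_ctx).reshape(e, d)  # context-generated offsets, (E, D)
    prototypes = proto_base + offsets          # dynamic prototypes, (E, D)
    logits = X @ prototypes.T / np.sqrt(d)     # scaled dot-product similarity, (N, E)
    return softmax(logits, axis=0)             # softmax over the N vertices


rng = np.random.default_rng(0)
N, D, E = 6, 4, 3
X = rng.standard_normal((N, D))
A = participation_matrix(X, rng.standard_normal((E, D)), 0.1 * rng.standard_normal((D, E * D)))
print(A.shape, A.sum(axis=0))  # (6, 3) and column sums of 1
```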
    

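2.2 AdaHGConv uses AdaHyperedgeGen to generate the adaptive hyperedge participation matrix, then aggregates features from vertices to hyperedges (vertex-to-edge) and back from hyperedges to vertices (edge-to-vertex). (Annotated code below.)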

```python
import torch
import torch.nn as nn

# AdaHyperedgeGen is defined in 2.1


class AdaHGConv(nn.Module):
    """
    Performs the adaptive hypergraph convolution.

    This module contains the two-stage message passing process of hypergraph convolution:
    1. Generates an adaptive participation matrix using AdaHyperedgeGen.
    2. Aggregates vertex features into hyperedge features (vertex-to-edge).
    3. Disseminates hyperedge features back to update vertex features (edge-to-vertex).
    A residual connection is added to the final output.
    """

    def __init__(self, embed_dim, num_hyperedges=16, num_heads=4, dropout=0.1, context="both"):
        super().__init__()
        # Generator for the hyperedge participation matrix
        self.edge_generator = AdaHyperedgeGen(embed_dim, num_hyperedges, num_heads, dropout, context)
        self.edge_proj = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.GELU())
        self.node_proj = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.GELU())

    def forward(self, X):
        # X: [B, N, D]
        A = self.edge_generator(X)  # A: [B, N, num_hyperedges]

        # Vertex to edge: [B, num_hyperedges, N] @ [B, N, D] -> He: [B, num_hyperedges, D]
        He = torch.bmm(A.transpose(1, 2), X)
        He = self.edge_proj(He)

        # Edge to vertex: [B, N, num_hyperedges] @ [B, num_hyperedges, D] -> X_new: [B, N, D]
        X_new = torch.bmm(A, He)
        X_new = self.node_proj(X_new)

        return X_new + X  # residual connection
```
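The two-stage message passing in AdaHGConv reduces to two matrix products with the participation matrix A. The NumPy sketch below assumes A is already given (here a column-normalised random matrix) and uses hypothetical weight matrices `w_edge` and `w_node` with a tanh-approximated GELU in place of the `nn.Linear` + `nn.GELU` projections (no biases). It mirrors the vertex-to-edge, edge-to-vertex and residual steps for one image.

```python
import numpy as np


def gelu(x):
    """Tanh approximation of GELU (nn.GELU defaults to the exact erf form)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def hypergraph_conv(X, A, w_edge, w_node):
    """One adaptive hypergraph convolution on a single image.

    X: (N, D) vertex features; A: (N, E) participation matrix;
    w_edge, w_node: (D, D) hypothetical projection weights.
    """
    He = gelu(A.T @ X @ w_edge)    # vertex -> edge: each hyperedge is a weighted sum of vertices, (E, D)
    X_new = gelu(A @ He @ w_node)  # edge -> vertex: each vertex gathers from its hyperedges, (N, D)
    return X_new + X               # residual connection


rng = np.random.default_rng(0)
N, D, E = 10, 8, 4
X = rng.standard_normal((N, D))
logits = rng.standard_normal((N, E))
A = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)  # columns sum to 1
out = hypergraph_conv(X, A, rng.standard_normal((D, D)), rng.standard_normal((D, D)))
print(out.shape)  # (10, 8)
```

With all-zero weights both GELU outputs are zero, so only the residual survives; this is a quick sanity check that the residual path is wired correctly.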
    

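2.3 AdaHGComputation wraps AdaHGConv for 4D feature maps: it flattens (B, C, H, W) into H*W vertex tokens, applies the hypergraph convolution, and reshapes the result back to (B, C, H, W).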

```python
import torch.nn as nn

# AdaHGConv is defined in 2.2


class AdaHGComputation(nn.Module):
    """
    A wrapper module for applying adaptive hypergraph convolution to 4D feature maps.

    This class makes the hypergraph convolution compatible with standard CNN architectures. It flattens a
    4D input tensor (B, C, H, W) into a sequence of vertices (tokens), applies the AdaHGConv layer to
    model high-order correlations, and then reshapes the output back into a 4D tensor.
    """

    def __init__(self, embed_dim, num_hyperedges=16, num_heads=8, dropout=0.1, context="both"):
        super().__init__()
        self.embed_dim = embed_dim
        self.hgnn = AdaHGConv(
            embed_dim=embed_dim,
            num_hyperedges=num_hyperedges,
            num_heads=num_heads,
            dropout=dropout,
            context=context,
        )

    def forward(self, x):
        B, C, H, W = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # [B, C, H, W] -> [B, H*W, C]: one vertex per pixel
        tokens = self.hgnn(tokens)  # hypergraph convolution over the H*W vertices
        x_out = tokens.transpose(1, 2).view(B, C, H, W)  # back to [B, C, H, W]
        return x_out
```
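The only work AdaHGComputation does itself is the layout change between CNN feature maps and token sequences. The NumPy sketch below (with `to_tokens` and `from_tokens` as hypothetical helper names) checks that flattening to (B, H*W, C) and reshaping back is lossless, so every pixel becomes one hypergraph vertex carrying its C-dimensional feature.

```python
import numpy as np


def to_tokens(x):
    """(B, C, H, W) -> (B, H*W, C): one vertex per spatial position."""
    b, c, h, w = x.shape
    return x.reshape(b, c, h * w).transpose(0, 2, 1)


def from_tokens(tokens, h, w):
    """(B, H*W, C) -> (B, C, H, W): inverse of to_tokens."""
    b, n, c = tokens.shape
    return tokens.transpose(0, 2, 1).reshape(b, c, h, w)


x = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)
tokens = to_tokens(x)
print(tokens.shape)  # (2, 20, 3)
print(tokens[0, 0])  # channel values of pixel (0, 0) in image 0: [ 0. 20. 40.]
```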
    

2.4 C3AH module

```python
import torch
import torch.nn as nn
from ultralytics.nn.modules.conv import Conv

# AdaHGComputation is defined in 2.3


class C3AH(nn.Module):
    """
    A CSP-style block integrating Adaptive Hypergraph Computation (C3AH).

    The input feature map is split into two paths. One path is processed by the AdaHGComputation
    module to model high-order correlations, while the other serves as a shortcut. The outputs are
    then concatenated to fuse features.
    """

    def __init__(self, c1, c2, e=1.0, num_hyperedges=8, context="both"):
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        assert c_ % 16 == 0, "Dimension of AdaHGComputation should be a multiple of 16."
        num_heads = c_ // 16  # so each head in AdaHyperedgeGen is 16 channels wide
        self.cv1 = Conv(c1, c_, 1, 1)  # hypergraph branch
        self.cv2 = Conv(c1, c_, 1, 1)  # shortcut branch
        self.m = AdaHGComputation(
            embed_dim=c_, num_hyperedges=num_hyperedges, num_heads=num_heads, dropout=0.1, context=context
        )
        self.cv3 = Conv(2 * c_, c2, 1)  # fuse the two branches

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))
```
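C3AH fixes the head width of the hypergraph branch at 16 channels: `num_heads = c_ // 16`, guarded by an assertion that `c_` is a multiple of 16. The small helper below (hypothetical, for illustration only) reproduces that bookkeeping so a configuration can be checked before building the model.

```python
def c3ah_config(c2, e=1.0):
    """Return (hidden channels c_, num_heads, head_dim) as computed in C3AH.__init__."""
    c_ = int(c2 * e)
    if c_ % 16 != 0:
        raise ValueError(f"hidden width {c_} is not a multiple of 16")
    num_heads = c_ // 16
    return c_, num_heads, c_ // num_heads


for c2 in (32, 64, 128):
    print(c2, c3ah_config(c2))
# 32 (32, 2, 16)
# 64 (64, 4, 16)
# 128 (128, 8, 16)
```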
    
3. HyperACE module (annotated code below)

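```python
import torch
import torch.nn as nn
from ultralytics.nn.modules.conv import Conv

# FuseModule and C3AH are defined above; DSC3k and DSBottleneck come from the
# YOLOv13 code base and are not reproduced here.


class HyperACE(nn.Module):
    """Hypergraph-based Adaptive Correlation Enhancement (HyperACE)."""

    def __init__(self, c1, c2, n=1, num_hyperedges=8, dsc3k=True, shortcut=False, e1=0.5, e2=1, context="both", channel_adjust=True):
        super().__init__()
        self.c = int(c2 * e1)
        self.cv1 = Conv(c1, 3 * self.c, 1, 1)
        # Inputs to cv2: 3 chunks + n low-order outputs + 1 extra high-order output = (4 + n) * c
        self.cv2 = Conv((4 + n) * self.c, c2, 1)

        # Low-order perception: n chained DS-C3k (or DSBottleneck) blocks
        self.m = nn.ModuleList(
            DSC3k(self.c, self.c, 2, shortcut, k1=3, k2=7) if dsc3k else DSBottleneck(self.c, self.c, shortcut=shortcut)
            for _ in range(n)
        )

        # Fusion of B3, B4 and B5
        self.fuse = FuseModule(c1, channel_adjust)

        # High-order perception: two C3AH branches
        self.branch1 = C3AH(self.c, self.c, e2, num_hyperedges, context)
        self.branch2 = C3AH(self.c, self.c, e2, num_hyperedges, context)

    def forward(self, X):
        x = self.fuse(X)  # X is the list [B3, B4, B5]

        # Split the fused features into three equal chunks along the channel dimension
        y = list(self.cv1(x).chunk(3, 1))

        # The second chunk feeds both high-order (C3AH) branches
        out1 = self.branch1(y[1])
        out2 = self.branch2(y[1])

        # The third chunk feeds the low-order chain; each block consumes the previous output
        y.extend(m(y[-1]) for m in self.m)

        y[1] = out1  # replace the second chunk with the first high-order output
        y.append(out2)
        return self.cv2(torch.cat(y, 1))
```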
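The channel count entering HyperACE's final `cv2` follows from the forward pass: three chunks of width `c` from `cv1` (the second replaced by the first C3AH output), plus `n` outputs of the low-order chain and one extra C3AH output. The helper below (hypothetical, for illustration only) lists the concatenated pieces in order and checks that they add up to `(4 + n) * c`.

```python
def hyperace_concat_layout(c2, n=1, e1=0.5):
    """Return the (name, channels) pieces concatenated before HyperACE.cv2."""
    c = int(c2 * e1)
    pieces = [("chunk0 (identity)", c), ("branch1 (C3AH)", c), ("chunk2", c)]
    pieces += [(f"low-order block {i}", c) for i in range(n)]
    pieces.append(("branch2 (C3AH)", c))
    return pieces


layout = hyperace_concat_layout(c2=256, n=2)
for name, ch in layout:
    print(f"{name:18s} {ch}")
print("total:", sum(ch for _, ch in layout))  # (4 + 2) * 128 = 768
```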

FullPAD

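FullPAD_Tunnel is the main building block of FullPAD. At its core is a gated fusion module that fuses two feature maps, using a learnable gate parameter to balance their contributions automatically: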
$$\text{output} = \text{original} + \text{gate} \times \text{enhanced}$$

```python
import torch
import torch.nn as nn


class FullPAD_Tunnel(nn.Module):
    """
    A gated fusion module for the Full-Pipeline Aggregation-and-Distribution (FullPAD) paradigm.

    This module implements a gated residual connection used to fuse features. It takes two inputs: the original
    feature map and a correlation-enhanced feature map. It then computes `output = original + gate * enhanced`,
    where `gate` is a learnable scalar parameter that adaptively balances the contribution of the enhanced features.

    Methods:
        forward: Performs the gated fusion of two input feature maps.

    Examples:
        >>> import torch
        >>> model = FullPAD_Tunnel()
        >>> original_feature = torch.randn(2, 64, 32, 32)
        >>> enhanced_feature = torch.randn(2, 64, 32, 32)
        >>> output = model([original_feature, enhanced_feature])
        >>> print(output.shape)
        torch.Size([2, 64, 32, 32])
    """

    def __init__(self):
        super().__init__()
        # Initialized to 0, so the tunnel starts as an identity mapping of the original features
        self.gate = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        # x = [original, enhanced]
        return x[0] + self.gate * x[1]
```
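Because the gate is initialised to 0, each FullPAD tunnel starts out passing the original features through unchanged, and the enhanced features only contribute once training moves the gate away from 0. The NumPy sketch below uses a few assumed gate values (not trained ones) to show how the scalar gate scales that contribution.

```python
import numpy as np


def fullpad_tunnel(original, enhanced, gate):
    """output = original + gate * enhanced, with a scalar gate."""
    return original + gate * enhanced


rng = np.random.default_rng(0)
original = rng.standard_normal((64, 8, 8))
enhanced = rng.standard_normal((64, 8, 8))

for gate in (0.0, 0.5, 1.0):  # 0.0 is the initial value of the learnable gate
    out = fullpad_tunnel(original, enhanced, gate)
    print(gate, float(np.abs(out - original).mean()))  # mean change grows linearly with the gate
```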