论文解读:DiAD之SG网络

devtools/2024/12/22 15:26:46/

目录

  • 一、SG网络功能介绍
  • 二、SG网络代码实现

一、SG网络功能介绍

DiAD论文最主要的创新点就是使用SG网络解决多类别异常检测中的语义信息丢失问题,那么它是怎么实现的保留原始图像语义信息的同时重建异常区域?

与稳定扩散去噪网络的连接: SG网络被设计为与稳定扩散(Stable Diffusion, SD)去噪网络相连接。SD去噪网络本身具有强大的图像生成能力,但可能无法在多类异常检测任务中保持图像的语义信息一致性。SG网络通过引入语义引导机制,使得在重构异常区域时能够参考并保留原始图像的语义上下文。整个框架图中,SG网络与去噪网络的连接如下图所示。
在这里插入图片描述在这里插入图片描述
这是论文给出的最终输出,我认为图中圈出来的地方有问题,应该改为SG网络的编码器才对。

语义一致性保持: SG网络在重构过程中,通过在不同尺度下处理噪声,并利用空间感知特征融合(Spatial-aware Feature Fusion, SFF)块融合特征,确保重建过程中保留语义信息。这样,即使在重构异常区域时,也能使修复后的区域与原始图像的语义上下文保持一致。
多尺度特征融合: SFF块将高尺度的语义信息集成到低尺度中,使得在保留原始正常样本信息的同时,能够处理大规模异常区域的重建。这种机制有助于在处理需要广泛重构的区域时,最大化重构的准确性,同时保持图像的语义一致性。从下图中可以看到,特征融合模块还是很好理解的。

在这里插入图片描述

与预训练特征提取器的结合: SG网络还与特征空间中的预训练特征提取器相结合。预训练特征提取器能够处理输入图像和重建图像,并在不同尺度上提取特征。通过比较这些特征,系统能够生成异常图(anomaly maps),这些图显示了图像中可能存在的异常区域,并给出了异常得分或置信度。这一步骤进一步验证了SG网络在保留语义信息方面的有效性。
避免类别错误: 相比于传统的扩散模型(如DDPM),SG网络通过引入类别条件解决了在多类异常检测任务中可能出现的类别错误问题。LDM虽然通过交叉注意力引入了条件约束,但在随机高斯噪声下去噪时仍可能丢失语义信息。SG网络则通过其语义引导机制,有效地避免了这一问题。

二、SG网络代码实现

这部分代码大概有300行

python">class SemanticGuidedNetwork(nn.Module):def __init__(self,image_size,in_channels,model_channels,hint_channels,num_res_blocks,attention_resolutions,dropout=0,channel_mult=(1, 2, 4, 8),conv_resample=True,dims=2,use_checkpoint=False,use_fp16=False,num_heads=-1,num_head_channels=-1,num_heads_upsample=-1,use_scale_shift_norm=False,resblock_updown=False,use_new_attention_order=False,use_spatial_transformer=False,  # custom transformer supporttransformer_depth=1,  # custom transformer supportcontext_dim=None,  # custom transformer supportn_embed=None,  # custom support for prediction of discrete ids into codebook of first stage vq modellegacy=True,disable_self_attentions=None,num_attention_blocks=None,disable_middle_self_attn=False,use_linear_in_transformer=False,):super().__init__()if use_spatial_transformer:assert context_dim is not None, 'Fool!! You forgot to include the dimension of your cross-attention conditioning...'if context_dim is not None:assert use_spatial_transformer, 'Fool!! You forgot to use the spatial transformer for your cross-attention conditioning...'from omegaconf.listconfig import ListConfigif type(context_dim) == ListConfig:context_dim = list(context_dim)if num_heads_upsample == -1:num_heads_upsample = num_headsif num_heads == -1:assert num_head_channels != -1, 'Either num_heads or num_head_channels has to be set'if num_head_channels == -1:assert num_heads != -1, 'Either num_heads or num_head_channels has to be set'self.dims = dimsself.image_size = image_sizeself.in_channels = in_channelsself.model_channels = model_channelsif isinstance(num_res_blocks, int):self.num_res_blocks = len(channel_mult) * [num_res_blocks]else:if len(num_res_blocks) != len(channel_mult):raise ValueError("provide num_res_blocks either as an int (globally constant) or ""as a list/tuple (per-level) with the same length as channel_mult")self.num_res_blocks = num_res_blocksif disable_self_attentions is not None:# should be a list of booleans, indicating whether to disable self-attention in TransformerBlocks or notassert len(disable_self_attentions) == len(channel_mult)if num_attention_blocks is not None:assert len(num_attention_blocks) == len(self.num_res_blocks)assert all(map(lambda i: self.num_res_blocks[i] >= num_attention_blocks[i], range(len(num_attention_blocks))))print(f"Constructor of UNetModel received num_attention_blocks={num_attention_blocks}. "f"This option has LESS priority than attention_resolutions {attention_resolutions}, "f"i.e., in cases where num_attention_blocks[i] > 0 but 2**i not in attention_resolutions, "f"attention will still not be set.")self.attention_resolutions = attention_resolutionsself.dropout = dropoutself.channel_mult = channel_multself.conv_resample = conv_resampleself.use_checkpoint = use_checkpointself.dtype = th.float16 if use_fp16 else th.float32self.num_heads = num_headsself.num_head_channels = num_head_channelsself.num_heads_upsample = num_heads_upsampleself.predict_codebook_ids = n_embed is not Nonetime_embed_dim = model_channels * 4self.time_embed = nn.Sequential(linear(model_channels, time_embed_dim),nn.SiLU(),linear(time_embed_dim, time_embed_dim),)self.input_blocks = nn.ModuleList([TimestepEmbedSequential(conv_nd(dims, in_channels, model_channels, 3, padding=1))])self.zero_convs = nn.ModuleList([self.make_zero_conv(model_channels)])self.input_hint_block = TimestepEmbedSequential(conv_nd(dims, hint_channels, 16, 3, padding=1),nn.SiLU(),conv_nd(dims, 16, 16, 3, padding=1),nn.SiLU(),conv_nd(dims, 16, 32, 3, padding=1, stride=2),nn.SiLU(),conv_nd(dims, 32, 32, 3, padding=1),nn.SiLU(),conv_nd(dims, 32, 96, 3, padding=1, stride=2),nn.SiLU(),conv_nd(dims, 96, 96, 3, padding=1),nn.SiLU(),conv_nd(dims, 96, 256, 3, padding=1, stride=2),nn.SiLU(),zero_module(conv_nd(dims, 256, model_channels, 3, padding=1)))self._feature_size = model_channelsinput_block_chans = [model_channels]ch = model_channelsds = 1for level, mult in enumerate(channel_mult):for nr in range(self.num_res_blocks[level]):layers = [ResBlock(ch,time_embed_dim,dropout,out_channels=mult * model_channels,dims=dims,use_checkpoint=use_checkpoint,use_scale_shift_norm=use_scale_shift_norm,)]ch = mult * model_channelsif ds in attention_resolutions:if num_head_channels == -1:dim_head = ch // num_headselse:num_heads = ch // num_head_channelsdim_head = num_head_channelsif legacy:# num_heads = 1dim_head = ch // num_heads if use_spatial_transformer else num_head_channelsif exists(disable_self_attentions):disabled_sa = disable_self_attentions[level]else:disabled_sa = Falseif not exists(num_attention_blocks) or nr < num_attention_blocks[level]:layers.append(AttentionBlock(ch,use_checkpoint=use_checkpoint,num_heads=num_heads,num_head_channels=dim_head,use_new_attention_order=use_new_attention_order,) if not use_spatial_transformer else SpatialTransformer(ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim,disable_self_attn=disabled_sa, use_linear=use_linear_in_transformer,use_checkpoint=use_checkpoint))self.input_blocks.append(TimestepEmbedSequential(*layers))self.zero_convs.append(self.make_zero_conv(ch))self._feature_size += chinput_block_chans.append(ch)if level != len(channel_mult) - 1:out_ch = chself.input_blocks.append(TimestepEmbedSequential(ResBlock(ch,time_embed_dim,dropout,out_channels=out_ch,dims=dims,use_checkpoint=use_checkpoint,use_scale_shift_norm=use_scale_shift_norm,down=True,)if resblock_updownelse Downsample(ch, conv_resample, dims=dims, out_channels=out_ch)))ch = out_chinput_block_chans.append(ch)self.zero_convs.append(self.make_zero_conv(ch))ds *= 2self._feature_size += chif num_head_channels == -1:dim_head = ch // num_headselse:num_heads = ch // num_head_channelsdim_head = num_head_channelsif legacy:# num_heads = 1dim_head = ch // num_heads if use_spatial_transformer else num_head_channelsself.middle_block = TimestepEmbedSequential(ResBlock(ch,time_embed_dim,dropout,dims=dims,use_checkpoint=use_checkpoint,use_scale_shift_norm=use_scale_shift_norm,),AttentionBlock(ch,use_checkpoint=use_checkpoint,num_heads=num_heads,num_head_channels=dim_head,use_new_attention_order=use_new_attention_order,) if not use_spatial_transformer else SpatialTransformer(  # always uses a self-attnch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim,disable_self_attn=disable_middle_self_attn, use_linear=use_linear_in_transformer,use_checkpoint=use_checkpoint),ResBlock(ch,time_embed_dim,dropout,dims=dims,use_checkpoint=use_checkpoint,use_scale_shift_norm=use_scale_shift_norm,),)self.middle_block_out = self.make_zero_conv(ch)self._feature_size += ch#SFF Blockself.down11 = nn.Sequential(zero_module(nn.Conv2d(640, 1280, kernel_size=3, stride=2, padding=1, bias=False)),nn.InstanceNorm2d(1280),nn.SiLU(),)self.down12 = nn.Sequential(zero_module(nn.Conv2d(640, 1280, kernel_size=3, stride=2, padding=1, bias=False)),nn.InstanceNorm2d(1280),nn.SiLU(),)self.down13 = nn.Sequential(zero_module(nn.Conv2d(640, 1280, kernel_size=3, stride=2, padding=1, bias=False)),nn.InstanceNorm2d(1280),nn.SiLU(),)self.down21 = nn.Sequential(zero_module(nn.Conv2d(1280, 1280, kernel_size=3, stride=2, padding=1, bias=False)),nn.InstanceNorm2d(1280),nn.SiLU(),)self.down22 = nn.Sequential(zero_module(nn.Conv2d(1280, 1280, kernel_size=3, stride=2, padding=1, bias=False)),nn.InstanceNorm2d(1280),nn.SiLU(),)self.down23 = nn.Sequential(zero_module(nn.Conv2d(1280, 1280, kernel_size=3, stride=2, padding=1, bias=False)),nn.InstanceNorm2d(1280),nn.SiLU(),)self.down31 = nn.Sequential(zero_module(nn.Conv2d(1280, 1280, kernel_size=3, stride=2, padding=1, bias=False)),nn.InstanceNorm2d(1280),nn.SiLU(),)self.down32 = nn.Sequential(zero_module(nn.Conv2d(1280, 1280, kernel_size=3, stride=2, padding=1, bias=False)),nn.InstanceNorm2d(1280),nn.SiLU(),)self.down33 = nn.Sequential(zero_module(nn.Conv2d(1280, 1280, kernel_size=3, stride=2, padding=1, bias=False)),nn.InstanceNorm2d(1280),nn.SiLU(),)self.silu = nn.SiLU()def make_zero_conv(self, channels):return TimestepEmbedSequential(zero_module(conv_nd(self.dims, channels, channels, 1, padding=0)))def forward(self, x, hint, timesteps, context, **kwargs):t_emb = timestep_embedding(timesteps, self.model_channels, repeat_only=False)emb = self.time_embed(t_emb)guided_hint = self.input_hint_block(hint, emb, context)outs = []h = x.type(self.dtype)for module, zero_conv in zip(self.input_blocks, self.zero_convs):if guided_hint is not None:h = module(h, emb, context)h += guided_hintguided_hint = Noneelse:h = module(h, emb, context)outs.append(zero_conv(h, emb, context))#SFF Block Implementationouts[9] = self.silu(outs[9]+self.down11(outs[6])+self.down21(outs[7])+self.down31(outs[8]))outs[10] = self.silu(outs[10]+self.down12(outs[6])+self.down22(outs[7])+self.down32(outs[8]))outs[11] = self.silu(outs[11]+self.down13(outs[6])+self.down23(outs[7])+self.down33(outs[8]))h = self.middle_block(h, emb, context)outs.append(self.middle_block_out(h, emb, context))return outs

http://www.ppmy.cn/devtools/85397.html

相关文章

elementPuls 表格反选实现

真的在网上搜了很多资料发现根本实现不了反选 最下面有示例 然后去看了下官网 发现官网有教你怎么选中某个值的方法 官网中的”多选“ 官网地址 <template><el-tableref"multipleTableRef":data"tableData"style"width: 100%"selectio…

wps在pc端在线预览,而不是下载

如果有有java后端代码如下 SneakyThrowsApiOperation("访问文件")GetMapping("/download/{name}")public void getImage(HttpServletResponse response, PathVariable("name") String name) {String imagePath uploadFilePath File.separator …

基于springboot+vue+uniapp的网上花店小程序

开发语言&#xff1a;Java框架&#xff1a;springbootuniappJDK版本&#xff1a;JDK1.8服务器&#xff1a;tomcat7数据库&#xff1a;mysql 5.7&#xff08;一定要5.7版本&#xff09;数据库工具&#xff1a;Navicat11开发软件&#xff1a;eclipse/myeclipse/ideaMaven包&#…

操作系统:进程1

一.进程 1.什么是进程 一个进程创建&#xff0c;他会生成几块&#xff1a; 代码段&#xff1a;进程执行的程序代码数据段&#xff1a;全局变量&#xff0c;静态变量&#xff0c;在进程生命周期中是动态可变的堆&#xff1a;动态分配的内存区域&#xff0c;malloc、calloc、real…

Unity Canvas动画:UI元素的动态展示

在Unity中&#xff0c;Canvas是用于管理和展示用户界面&#xff08;UI&#xff09;元素的系统。Canvas动画是UI设计中的重要组成部分&#xff0c;它能够提升用户体验&#xff0c;使界面更加生动和响应用户操作。本文将探讨Unity Canvas动画的基本概念、实现方法以及一些实用的技…

HDShredder 7 企业版案例分享: 依照国际权威标准,安全清除企业数据

HDShredder 7 企业版用户案例 天津鸿萌科贸发展有限公司是德国 Miray 公司 HDShredder 数据清除软件的授权代理商。近日&#xff0c;上海某网络科技有限公司采购 HDShredder 7 企业版x4&#xff0c;为公司数据存储资产的安全清除工作流程配备高效的执行工具。HDShredder 7 企业…

LC 128.最长连续序列

128.最长连续序列 给定一个未排序的整数数组 nums &#xff0c;找出数字连续的最长序列&#xff08;不要求序列元素在原数组中连续&#xff09;的长度。 请你设计并实现时间复杂度为 O(n) 的算法解决此问题。 示例 1&#xff1a; 输入&#xff1a; nums [100,4,200,1,3,2]…

防火墙——网络环境支持

目录 网络环境支持 防火墙的组网 web连接上防火墙 web管理口 让防火墙接到网络环境中 ​编辑 管理员用户管理 缺省管理员 接口 配置一个普通接口 创建安全区域 路由模式 透明模式 混合模式 防火墙的安全策略 防火墙转发流程 与传统包过滤的区别 创建安全策略 …