Joint EM Image Denoising and Segmentation with Instance-Aware Interaction
Code: https://github.com/zhichengwang-tri/EM-DenoiSeg
The code is really, extraordinarily badly written! Whoever wants to reproduce it is welcome to.
MICCAI 2024
Abstract
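In large-scale electron microscopy (EM), the demand for rapid imaging often leads to substantial imaging noise, which considerably degrades segmentation accuracy. While conventional approaches typically treat denoising as a preliminary stage, the potential synergy between the denoising and segmentation processes has seen limited exploration. To bridge this gap, we propose an instance-aware interaction framework that tackles EM image denoising and segmentation simultaneously, aiming at mutual enhancement between the two tasks. Specifically, our framework comprises three components: a denoising network, a segmentation network, and a fusion network that facilitates feature-level interaction. First, the denoising network mitigates noise degradation. Subsequently, the segmentation network learns an instance-level affinity prior that encodes vital spatial structure information. Finally, in the fusion network, we propose a novel Instance-aware Embedding Module (IEM) that uses the vital spatial structure information in the segmentation features for denoising. IEM enables interaction between the two tasks within a unified framework and, through a joint training mechanism, facilitates implicit feedback from denoising to segmentation. Through extensive experiments across multiple datasets, our framework demonstrates substantial performance improvements over existing solutions. Moreover, our framework exhibits strong generalization capability across different network architectures.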
Contributions
1) We present the first unified framework for joint EM image denoising and instance segmentation, leveraging synergies between the two tasks.
2) We introduce a novel instance-aware embedding module that integrates a segmentation prior to enhance the performance of both denoising and segmentation through interaction design.
3) Extensive experiments validate the superiority of our framework over existing solutions in both denoising and segmentation performance across multiple datasets, demonstrating robust generalization capabilities.
Related Works
Efforts to integrate denoising and segmentation can be broadly categorized into denoising-guided segmentation and segmentation-guided denoising methods.
Denoising-guided segmentation methods focus on enhancing the robustness of segmentation models by incorporating noise resilience during training [4,19,28].
Conversely, segmentation-guided denoising methods utilize an advanced segmentation prior to optimize the network’s ability to reduce noise while preserving structural details [15,24].
Recent work explores the synergy between semantic segmentation and image denoising via alternate boosting [26], but it cannot handle instance segmentation. Despite progress in denoising [3,5,14] and instance segmentation [10,11,16] in the field of EM, there is a gap in research exploring their symbiotic relationship.
Method
Overview
Our framework consists of three components: a denoising network, a segmentation network, and a fusion network.
We facilitate collaborative learning and promote task interaction at the feature level. Initially, we use a denoising network to process noisy images, improving segmentation performance by mitigating noise degradation. Subsequently, a segmentation network predicts a pixel affinity map that encodes crucial spatial structure information.
Given a noisy input image In, denoising is first performed to mitigate noise degradation and obtain a coarse denoised image Ic. Then, a segmentation network predicts the affinity map Sp from this less noisy result. After segmentation, the affinity map Sp is used as a segmentation prior to guide the fusion process with the coarse denoised image and produce the final denoised image If. At the same time, the affinity map predicted by the segmentation network can be converted to the final instance segmentation map by the Mutex [25] post-processing algorithm.
*affinity map: represents the association or similarity between different pixels.
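To make the notion of an affinity map concrete, here is a minimal sketch of how binary ground-truth affinities could be derived from an instance label image. The function name `affinities_from_labels`, the two nearest-neighbour offsets, and the convention that label 0 is background are all assumptions for illustration; the paper's notes above do not specify the offsets used.

```python
import numpy as np

def affinities_from_labels(labels, offsets=((0, 1), (1, 0))):
    """Binary affinity maps from an instance label image (illustrative sketch).

    labels : (H, W) integer array; 0 is treated as background (assumption).
    offsets: non-negative (dy, dx) neighbour offsets, one channel per offset.
    Returns a (len(offsets), H, W) float32 array in which 1 means the pixel
    and its neighbour at that offset belong to the same instance.
    """
    h, w = labels.shape
    aff = np.zeros((len(offsets), h, w), dtype=np.float32)
    for c, (dy, dx) in enumerate(offsets):
        src = labels[: h - dy, : w - dx]   # pixel at (y, x)
        dst = labels[dy:, dx:]             # neighbour at (y + dy, x + dx)
        same = (src == dst) & (src > 0)    # same instance, not background
        aff[c, : h - dy, : w - dx] = same  # border pixels without a neighbour stay 0
    return aff

# Toy label image with two instances (1 and 2) and one background pixel.
labels = np.array([[1, 1, 2],
                   [1, 0, 2]])
aff = affinities_from_labels(labels)
print(aff[0])  # affinity to the right-hand neighbour
print(aff[1])  # affinity to the neighbour below
```

A low affinity marks a boundary between instances, which is the spatial structure information the segmentation prior carries.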
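In this framework, the role of Sp includes:
- Serving as the segmentation prior that guides the subsequent fusion process, in which Sp is combined with the coarse denoised image Ic to produce the final denoised image If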
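The data flow described above (In to Ic, Ic to Sp, then Ic and Sp to If) can be sketched in PyTorch as follows. This is only an outline under stated assumptions: `tiny_net` and `JointDenoiseSeg` are hypothetical names, the tiny convolutional stacks stand in for real backbones such as a UNet, and plain channel concatenation stands in for the IEM-based fusion network.

```python
import torch
import torch.nn as nn

def tiny_net(in_ch, out_ch):
    # Hypothetical placeholder for a real backbone (for example a UNet).
    return nn.Sequential(
        nn.Conv2d(in_ch, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, out_ch, kernel_size=3, padding=1),
    )

class JointDenoiseSeg(nn.Module):
    """Three-stage pipeline: denoise, predict affinities, fuse (sketch)."""

    def __init__(self, n_affinity=2):
        super().__init__()
        self.denoiser = tiny_net(1, 1)
        self.segmenter = tiny_net(1, n_affinity)
        # Assumption: concatenation replaces the attention-based fusion here.
        self.fusion = tiny_net(1 + n_affinity, 1)

    def forward(self, noisy):
        coarse = self.denoiser(noisy)          # coarse denoised image Ic
        aff_logits = self.segmenter(coarse)    # affinity map Sp (as logits)
        aff = torch.sigmoid(aff_logits)        # affinities in [0, 1]
        final = self.fusion(torch.cat([coarse, aff], dim=1))  # final image If
        return coarse, aff_logits, final

model = JointDenoiseSeg(n_affinity=2)
noisy = torch.rand(1, 1, 32, 32)               # stand-in for the noisy input In
coarse, aff_logits, final = model(noisy)
print(coarse.shape, aff_logits.shape, final.shape)
```

Returning all three outputs keeps each of them available for its own supervision during joint training.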
The denoising network and segmentation network can be various network combinations. We implement a dual UNet architecture in Tables 1, 2, and 4. In Table 3, we expand the denoising network to DnCNN [27] and RIDNet [1], and the segmentation network to TransUNet [7] to show the generalization capacity of our framework. Across all experiments, the comparison solutions for both denoising and segmentation networks are the same as ours. (Surprisingly, these are all still fairly conventional networks...)
Instance-Aware Embedding Module
In the fusion network, we introduce a novel Instance-aware Embedding Module (IEM) to fuse semantic and image features in a structure-aware manner, preserving cellular integrity during reconstruction. IEM computes the similarity between semantic and image features, facilitating cross-modal interaction between heterogeneous representations.
Incorporating a high-level prior into low-level denoising necessitates detailed consideration of the gaps between the sources. (It feels like this paper's idea could even be applied to multimodal imaging.)
IEM establishes connections between the segmentation network and the denoising network, thereby facilitating the integration of these two heterogeneous tasks. We choose a UNet-like architecture [20] for the fusion network due to its exceptional performance. The network is further augmented with two IEMs that perform pixel-wise attention between image and semantic features to obtain the fused features. We integrate the two IEMs into the second and third layers of the UNet encoder. As illustrated in Fig. 1(a), the coarse denoised image Ic is passed through a cascade of convolution layers to extract feature representations. We utilize the predicted affinity from the segmentation network as a multi-scale segmentation prior.
We take two semantic/image features with two spatial resolutions
with H and W denoting the height and width of the input image, which are then fused into a refined output feature through the IEM.
To reconcile the discrepancies in channel dimensions and spatial resolutions for the computation of attention, the affinity maps Sp undergo a transformation, after which we get the image feature F(n) and the semantic feature Fs(n) with the same shape.
Next, we adopt the MultiHeadAttention [23] mechanism to compute an attention map, which is then applied to the image feature F(n) to get the refined image feature,
Then, we apply ReLU(·) to the refined image feature and add it to the input image feature,
The final image feature is then sent to the next layer of the fusion network. Finally, a bottleneck layer followed by a decoder network reconstructs the final denoised image If.
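The IEM steps above (match the shape of the affinity maps to the image feature, compute multi-head attention, apply ReLU, add the residual) might look roughly like this in PyTorch. It is a sketch, not the authors' implementation: the class name `IEMSketch`, the bilinear resize plus 1x1 convolution used as the "transformation", and the choice of semantic features as query with image features as key and value are all assumptions, since the notes do not state these details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IEMSketch(nn.Module):
    """Rough sketch of an instance-aware embedding step (assumptions inside)."""

    def __init__(self, img_ch, aff_ch, num_heads=4):
        super().__init__()
        # Assumed transformation: 1x1 conv lifts affinities to img_ch channels.
        self.to_sem = nn.Conv2d(aff_ch, img_ch, kernel_size=1)
        self.attn = nn.MultiheadAttention(img_ch, num_heads, batch_first=True)

    def forward(self, img_feat, affinity):
        b, c, h, w = img_feat.shape
        # Resize the affinity prior to the image feature's spatial resolution.
        sem = F.interpolate(affinity, size=(h, w), mode="bilinear",
                            align_corners=False)
        sem = self.to_sem(sem)                        # (B, C, h, w)
        # Flatten spatial positions into tokens: (B, h*w, C).
        sem_tok = sem.flatten(2).transpose(1, 2)
        img_tok = img_feat.flatten(2).transpose(1, 2)
        # Assumption: semantic tokens query the image tokens (key and value).
        refined, _ = self.attn(sem_tok, img_tok, img_tok, need_weights=False)
        refined = refined.transpose(1, 2).reshape(b, c, h, w)
        # ReLU on the refined feature, then residual addition with the input.
        return img_feat + torch.relu(refined)

iem = IEMSketch(img_ch=16, aff_ch=2)
img_feat = torch.rand(1, 16, 8, 8)     # encoder feature at a reduced resolution
affinity = torch.rand(1, 2, 32, 32)    # predicted affinity map Sp
out = iem(img_feat, affinity)
print(out.shape)
```

Pixel-wise attention scales quadratically with the number of positions, which is one plausible reason to place such modules in deeper, lower-resolution encoder layers rather than at full resolution.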
Joint Training Mechanism
Lastly, our joint learning mechanism enables the fusion network to provide implicit yet effective feedback to the affinity learning process, thereby benefiting segmentation.
We train our framework in an end-to-end manner. As shown in Fig. 1(a), for the coarse denoised image Ic, we employ a restoration loss,
For the predicted affinity Sp, we utilize a weighted binary cross-entropy loss for optimization, where Sgt is the ground-truth affinity.
The final denoising result If is also supervised by a restoration loss
Distinct from existing solutions [15,19], we employ a joint training approach, where the overall objective function is the combination of the above three losses, in which
α = 3 and β = 50 are the hyper-parameters.
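As a rough illustration of combining the three losses, the sketch below sums a restoration loss on Ic, a weighted binary cross-entropy on Sp, and a restoration loss on If. Several points are assumptions because the notes do not record them: the restoration loss is taken to be L1, the BCE weights are simple class-balancing weights, and α is attached to the segmentation term while β is attached to the final restoration term. The function name `joint_loss` is hypothetical.

```python
import torch
import torch.nn.functional as F

def joint_loss(coarse, final, aff_logits, clean, aff_gt, alpha=3.0, beta=50.0):
    """Combined objective for joint training (sketch under assumptions)."""
    # Restoration losses on Ic and If; L1 is an assumed choice.
    loss_coarse = F.l1_loss(coarse, clean)
    loss_final = F.l1_loss(final, clean)
    # Weighted BCE on the affinities; class balancing is an assumed scheme:
    # the rarer class receives the larger weight.
    pos_frac = aff_gt.mean().clamp(1e-6, 1 - 1e-6)
    weight = torch.where(aff_gt > 0.5, 1 - pos_frac, pos_frac)
    loss_seg = F.binary_cross_entropy_with_logits(aff_logits, aff_gt,
                                                  weight=weight)
    # Assumption: alpha scales the segmentation term, beta the final term.
    return loss_coarse + alpha * loss_seg + beta * loss_final

torch.manual_seed(0)
clean = torch.rand(1, 1, 16, 16)
coarse = clean + 0.1 * torch.randn_like(clean)
final = clean + 0.05 * torch.randn_like(clean)
aff_gt = (torch.rand(1, 2, 16, 16) > 0.2).float()
aff_logits = torch.randn(1, 2, 16, 16, requires_grad=True)
loss = joint_loss(coarse, final, aff_logits, clean, aff_gt)
loss.backward()   # gradients flow through the single combined objective
print(loss.item())
```

Because one scalar is backpropagated, the restoration error of the fused output also shapes the affinity prediction, which is the implicit feedback to segmentation mentioned above.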
Experiments
Noise Simulation
We simulate two types of noise degradation. For film noise, we set the kernel size to 5 and the maximum intensity to 1.5. For Gaussian-Poisson mixture noise, the noise level of Gaussian noise is randomly set between 55 and 85, and the lambda parameter of the Poisson component is set to a random number between 0.6 and 0.8.
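For the Gaussian-Poisson mixture, one plausible reading of these settings is sketched below with NumPy. The assumptions are that intensities are on a 0 to 255 scale, that the Gaussian noise level is a standard deviation on that scale, and that lambda acts as the gain of a standard Poisson-Gaussian model; the function name `add_gaussian_poisson_noise` is hypothetical, and the actual repository may define these parameters differently. Film noise is not sketched because the notes do not describe its construction.

```python
import numpy as np

def add_gaussian_poisson_noise(img, rng, sigma_range=(55, 85),
                               lam_range=(0.6, 0.8)):
    """Gaussian-Poisson mixture noise on a [0, 255] image (illustrative)."""
    sigma = rng.uniform(*sigma_range)   # Gaussian noise level for this image
    lam = rng.uniform(*lam_range)       # Poisson parameter for this image
    # Assumption: lam is a gain, so the Poisson term keeps the mean at img
    # and has variance lam * img.
    poisson_part = lam * rng.poisson(img / lam)
    noisy = poisson_part + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.float32), sigma, lam

rng = np.random.default_rng(0)
clean = rng.uniform(0, 255, size=(64, 64))   # stand-in for a clean EM patch
noisy, sigma, lam = add_gaussian_poisson_noise(clean, rng)
print(noisy.shape, round(float(sigma), 2), round(float(lam), 2))
```

Sampling sigma and lambda per image, rather than fixing them, exposes the networks to a range of degradation strengths during training.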