Abstract
Infrared small target denoising is widely used in military and civilian fields. Existing deep learning-based methods are specially designed for optical images and tend to over-smooth informative image details, thus losing the response of small targets. To both denoise and maintain informative image details, this paper proposes a gradient-aware channel attention network (GCAN) for infrared small target image denoising before detection. Specifically, we use an encoder-decoder network to remove the additive noise of infrared images. Then, a gradient-aware channel attention module is designed to adaptively enhance the informative high-gradient image channels, so that high-gradient target regions are maintained. After that, we develop a large dataset with 3981 noisy infrared images. Experimental results show that our proposed GCAN can both effectively remove additive noise and maintain informative target regions. Additional experiments on infrared small target detection further verify the effectiveness of our method.
With the rapid development of infrared imaging technology, the infrared imaging system has been widely used in marine resource utilization, high-precision navigation, and ecological environment monitoring.

Fig. 1 (a1)-(a3) Visual results of noisy input images; (b1)-(b3) detected results without denoising; (c1)-(c3) denoised images by our method; (d1)-(d3) detected results with denoising
To alleviate the negative effect caused by the additive noise, numerous traditional methods have been proposed, including filtering-based methods such as BM3D, dictionary learning-based methods such as K-SVD, and low-rank-based methods such as WNNM.
Different from the previous model-driven traditional methods, the convolutional neural network (CNN) can achieve high-performance image denoising in a data-driven manner and has yielded promising results in optical image denoising. Jain et al. first applied convolutional networks to natural image denoising.
To both denoise IR images and maintain the response of small targets, we propose a novel infrared image denoising method named gradient-aware channel attention network (GCAN). We design an encoder-decoder network with residual connections to remove the additive noise of infrared images. Then, a gradient-based channel attention module (GCAM) is designed and embedded into the residual connections to adaptively enhance the informative high-gradient image channels and thus preserve informative details. In this way, high-gradient target regions are preserved while the additive noise of IR images is removed.
The contributions of this paper can be summarized as follows:
1) An encoder-decoder denoising framework and a gradient-based channel attention module are proposed to remove the additive noise and adaptively enhance the informative image channels, respectively.
2) We develop an NUDT-IRSTDn dataset with various signal-to-clutter ratios (SCRs) based on our previous NUDT-SIRST dataset. It enables evaluation of both IR image denoising performance and the corresponding influence on subsequent target detection tasks.
3) Experimental results on both denoising and high-level object detection demonstrate that our GCAN not only achieves high denoising performance compared with other state-of-the-art methods, but also effectively keeps the performance of subsequent detection tasks stable under severe imaging conditions.
Assuming that $X$ is a noise-disturbed image and $Y$ is the corresponding clean image, the relationship between them can be formulated as:

$X = \mathcal{H}(Y)$,  (1)

where $\mathcal{H}(\cdot)$ denotes the complex degradation process involving internal and external IR imaging conditions.
The noise reduction process aims to recover the clean images from the degraded images. This process can be transformed into seeking a function $f$ that minimizes the mean square error (MSE) between $f(X)$ and $Y$:

$f^{*} = \arg\min_{f} \left\| f(X) - Y \right\|_2^2, \quad \hat{Y} = f^{*}(X)$,  (2)

where $f^{*}$ is regarded as the optimal approximation of $\mathcal{H}^{-1}$, and $\hat{Y}$ denotes the recovered clean image.
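Eq. (2) corresponds to a standard supervised MSE objective. A minimal PyTorch sketch is given below; the model and tensor names are placeholders, not the actual GCAN implementation.

```python
import torch.nn.functional as F

def denoising_loss(model, noisy_x, clean_y):
    """Supervised denoising objective of Eq. (2)."""
    recovered = model(noisy_x)             # \hat{Y} = f(X)
    return F.mse_loss(recovered, clean_y)  # || f(X) - Y ||_2^2
```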
1) Overall architecture: In this section, we introduce our infrared image denoising network (GCAN) in detail. First, we follow the encoder-decoder architecture combined with residual connections to remove the varied additive noise and initially pass image details to the top layers. It is worth noting that the pooling layers and ReLU layers before the summation with residuals are removed to avoid losing details. Then, we propose a gradient-based channel attention module to maintain the potential target regions (e.g., high-gradient regions) while denoising. The overall architecture of GCAN is shown in Fig. 2.

Fig. 2 An illustration of the proposed gradient-aware channel attention network (GCAN) for infrared small target image denoising before detection
2) Encoder-decoder structure: The encoder-decoder structure consists of several stacked Conv-Blocks and Deconv-Blocks. The encoder part is designed to suppress image noise step by step from low level to high level while preserving the informative details of the input images. As shown in Fig. 2, the output of each convolutional layer in a Conv-Block is

$F_k = \mathrm{ReLU}\left(w_i * F_{k-1} + b_i\right)$.  (3)

Each Deconv-Block is symmetric with the corresponding Conv-Block, and the output of each deconvolutional layer is

$F_k = \mathrm{ReLU}\left(w_i \circledast F_{k-1} + b_i\right)$,  (4)

where N is the number of Conv-Blocks (Deconv-Blocks), $w_i$ and $b_i$ denote the weights and biases of the i-th (i ∈ 1, ..., I) convolutional (deconvolutional) layer, respectively, * and ⊛ represent the convolution and deconvolution operators, respectively, $F_0$ is the input image, $F_k$ (k > 0) is the feature extracted from the previous layers, and ReLU(X) = max(0, X) is the activation function.
3) Residual connections: The residual connection is used to avoid vanishing gradients as the network goes deep. It also serves as a simple detail-recovery structure that connects matched Conv-Blocks and Deconv-Blocks to propagate informative details from low-level to high-level features. As shown in Fig. 2, each Conv-Block is bridged to its matched Deconv-Block, and the residual is summed before the activation.
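A minimal PyTorch sketch of the Conv-Block/Deconv-Block pair of Eqs. (3)-(4) and of the residual summation placed before the activation is given below. The layer counts, channel widths, and kernel sizes are illustrative assumptions, not the exact configuration of GCAN.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Stacked Conv + ReLU layers, following Eq. (3)."""
    def __init__(self, in_ch, out_ch, num_layers=2):  # num_layers is an assumption
        super().__init__()
        layers = []
        for i in range(num_layers):
            layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)

class DeconvBlock(nn.Module):
    """Symmetric deconvolution block, following Eq. (4); the final ReLU is
    applied only after the residual summation, as described in the text."""
    def __init__(self, in_ch, out_ch, num_layers=2):
        super().__init__()
        layers = []
        for i in range(num_layers):
            layers.append(nn.ConvTranspose2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1))
            if i < num_layers - 1:
                layers.append(nn.ReLU(inplace=True))
        self.body = nn.Sequential(*layers)

    def forward(self, x, skip):
        # Residual connection: the skip feature from the matched Conv-Block is
        # summed before the activation (skip must match out_ch and spatial size).
        return torch.relu(self.body(x) + skip)
```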
4) Gradient-based channel attention module (GCAM): To avoid over-smoothing the informative small target region, we design a GCAM, as shown in Fig. 2. For the input feature map $F$, a channel-wise gradient descriptor is first computed as:

$G_c = \sum_{m=1}^{M}\sum_{n=1}^{N}\left(\left|\nabla_x F_c(m,n)\right| + \left|\nabla_y F_c(m,n)\right|\right)$,  (5)

where M and N represent the length and width of the image, respectively. Then $G$ is fed to a mean operation to generate the channel attention weights $w$. After element-wise multiplication $F' = w \odot F$, GCAM can adaptively enhance the input feature map along the channel dimension.
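A minimal sketch of the GCAM idea described above, assuming finite differences for the gradient map and a small fully connected gating network for the mean-to-weight mapping; the reduction ratio and gating layers are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GCAM(nn.Module):
    """Sketch of a gradient-based channel attention module: per-channel
    gradient statistics are pooled into weights that re-scale the input."""
    def __init__(self, channels, reduction=4):  # reduction ratio is an assumption
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        # Horizontal and vertical finite differences approximate the gradient map.
        gx = torch.abs(x[:, :, :, 1:] - x[:, :, :, :-1])
        gy = torch.abs(x[:, :, 1:, :] - x[:, :, :-1, :])
        # Mean operation: average gradient magnitude per channel.
        g = gx.mean(dim=(2, 3)) + gy.mean(dim=(2, 3))    # (B, C)
        w = self.fc(g).unsqueeze(-1).unsqueeze(-1)       # channel attention weights
        # Element-wise multiplication enhances high-gradient channels.
        return x * w
```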
A high-quality dataset is essential for data-driven CNN-based methods. However, existing denoising methods are essentially data-driven and evaluated on their in-house datasets.
These noisy images are manually synthesized by adding Gaussian white noise to clean long-wave IR images, whose wavelength lies between 8 μm and 14 μm. The statistics of the developed dataset are summarized in the table below.
| Metrics | NUDT-SIRST | NUDT-IRSTDn (Noise.v1) | NUDT-IRSTDn (Noise.v2) | NUDT-IRSTDn (Noise.v3) |
|---|---|---|---|---|
| LSCR | 0.402~19.05 | 0.402~5 | 0.402~3.5 | 0.402~2 |
| LSCR' | 5.68 | 4.364 | 3.205 | 1.687 |
| σ | - | 0~0.06 | 0~0.1 | 0~0.5 |
| σ' | - | 0.013 | 0.04 | 0.154 |
| PSNR | - | 21.5~40.2 | 20.9~34.1 | 9.9~24.4 |
| PSNR' | - | 31.88 | 25.89 | 17.31 |
| Number | 1327 | 1327 | 1327 | 1327 |
To simulate IR images under complex noise interference scenarios and to better compare the influence of different noise intensities on subsequent tasks, we did not directly add the same level of noise to each initial image. The synthesis process of our dataset is shown in Fig. 4.
The local signal-to-clutter ratio (LSCR) is defined as:

$\mathrm{LSCR} = \dfrac{\left|\mu_t - \mu_b\right|}{\sigma_b}$,  (6)

where $\mu_b$, $\mu_t$, and $\sigma_b$ are the local background gray mean, the target gray mean, and the local background gray standard deviation, respectively. We set the local background of a target as a rectangle centered at the target position with a fixed width and height of 20 pixels. To eliminate the influence of the target region, we exclude the target pixels inside the rectangle. Some examples of the developed dataset are shown in Fig. 3.
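A sketch of how the LSCR of Eq. (6) can be computed for a single target, assuming a binary target mask and the 20×20-pixel local background window described above; the function and argument names are illustrative.

```python
import numpy as np

def local_scr(image, cx, cy, target_mask, half=10):
    """LSCR of Eq. (6): |mu_t - mu_b| / sigma_b, with the background statistics
    taken from a 20x20 window centered at (cx, cy), excluding target pixels."""
    y0, y1 = max(cy - half, 0), min(cy + half, image.shape[0])
    x0, x1 = max(cx - half, 0), min(cx + half, image.shape[1])
    window = image[y0:y1, x0:x1]
    bg = window[~target_mask[y0:y1, x0:x1]]   # local background pixels only
    tg = image[target_mask]                   # target pixels
    return np.abs(tg.mean() - bg.mean()) / (bg.std() + 1e-8)
```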

Fig. 3 Examples of the developed dataset, including (a0)-(i0) clean images; (a1)-(i1) level-1 noisy images; (a2)-(i2) level-2 noisy images; (a3)-(i3) level-3 noisy images

Fig. 4 Synthesis process of our dataset
As shown in
1) Implementation Details: We conducted extensive experiments on the NUDT-IRSTDn dataset. To keep consistent with the NUDT-SIRST dataset, we divided each subset into a training set and a test set with a ratio of 1:1. We resized all input IR images to 256×256 pixels. The batch size and learning rate during network training were set to 8 and 1×10⁻⁵, respectively. We used the mean square error (MSE) as the loss function of our network. All models were implemented in PyTorch on a computer with an Intel Xeon Gold 5117 CPU and an Nvidia Tesla V100 GPU.
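A minimal sketch of this training setup (batch size 8, learning rate 1×10⁻⁵, MSE loss, 256×256 inputs). The optimizer is not specified in the text, so Adam is assumed here; `model` and `train_set` are placeholders.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, epochs=100, device="cuda"):    # epoch count is an assumption
    loader = DataLoader(train_set, batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
    criterion = torch.nn.MSELoss()
    model.to(device).train()
    for _ in range(epochs):
        for noisy, clean in loader:                         # tensors of shape (B, 1, 256, 256)
            noisy, clean = noisy.to(device), clean.to(device)
            optimizer.zero_grad()
            loss = criterion(model(noisy), clean)           # MSE loss, as stated above
            loss.backward()
            optimizer.step()
    return model
```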
2) Evaluation Metrics: Following previous works, we adopted the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to evaluate denoising performance, and the intersection over union (IoU), probability of detection (Pd), and false-alarm rate (Fa) to evaluate the subsequent detection performance.
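For reference, PSNR can be computed as below (SSIM is typically taken from a standard library implementation); the value range of the images is assumed to be [0, 1].

```python
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```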
1) Denoising results: To verify the superiority of our method, we compared our GCAN with state-of-the-art methods, including conventional model-based methods (BM3D, WNNM, and K-SVD) and deep learning-based methods (RED-CNN, DnCNN, and SwinIR).
| Denoising Method | Noise.v1 PSNR (dB) | Noise.v1 SSIM | Noise.v2 PSNR (dB) | Noise.v2 SSIM | Noise.v3 PSNR (dB) | Noise.v3 SSIM |
|---|---|---|---|---|---|---|
| BM3D | 36.7 | 0.75 | 31.0 | 0.52 | 19.4 | 0.23 |
| WNNM | 34.6 | 0.38 | 33.1 | 0.36 | 30.3 | 0.28 |
| K-SVD | 35.2 | 0.62 | 34.0 | 0.43 | 31.2 | 0.27 |
| RED-CNN | 36.8 | 0.87 | 35.6 | 0.82 | 29.5 | 0.74 |
| DnCNN | 44.3 | 0.93 | 40.3 | 0.91 | 33.6 | 0.87 |
| SwinIR | 44.9 | 0.92 | 41.7 | 0.97 | 34.3 | 0.87 |
| GCAN | 45.5 | 0.96 | 42.1 | 0.96 | 33.7 | 0.88 |

Fig. 5 Input images and corresponding denoising results of different methods on the Noise.v1, Noise.v2, and Noise.v3 datasets. For better visualization, two regions of interest are enlarged and highlighted in red and blue, respectively. It can be observed that our GCAN preserves target regions better.
| Method | #Params (M) | FLOPs (G) | PSNR (dB) |
|---|---|---|---|
| GCAN w/o GCAM | 1.848 | 83.89 | 44.6 |
| GCAN | 2.345 | 157.30 | 45.5 |
2) Effectiveness of Denoising for Detection: In this subsection, we evaluated the effectiveness of the denoising methods by examining whether they can help the subsequent detection task maintain its performance under varied noisy environments.
Firstly, we evaluated the influence of additive noise on subsequent target detection. We selected five typical infrared small target detection methods (Top-Hat, RIPT, ACM, UNet, and DNANet) and evaluated them on the original clean images and the three noisy subsets. The results are listed in the table below.
| Detection Method | *Oriset | Noise.v1 | Noise.v2 | Noise.v3 |
|---|---|---|---|---|
| Top-Hat | 25.8 | 23.6 | 13.0 | 5.21 |
| RIPT | 35.2 | 26.3 | 14.9 | 7.75 |
| ACM | 44.1 | 39.1 | 20.7 | 1.19 |
| UNet | 79.5 | 64.7 | 38.4 | 19.0 |
| DNANet | 88.6 | 64.6 | 38.3 | 5.5 |

*Oriset indicates the noise-free images in the image pairs.
Then, we compared the detection results on denoised images to evaluate the performance of the denoising methods. We adopted Top-Hat and DNANet as representative model-driven and CNN-based detection methods, respectively. The results are listed in the table below, where each entry is reported as IoU/Pd/Fa.
| Denoising Method | Noise.v1 Top-Hat | Noise.v1 DNANet | Noise.v2 Top-Hat | Noise.v2 DNANet | Noise.v3 Top-Hat | Noise.v3 DNANet |
|---|---|---|---|---|---|---|
| BM3D | 23.6/37.5/1.9 | 61.1/72.1/17.7 | 13.2/27.4/3.04 | 39.4/49.3/32.9 | 5.42/21.3/128 | 5.25/30.8/18.0 |
| WNNM | 1.89/6.55/14.5 | 1.75/1.58/1.13 | 2.11/7.07/21.55 | 2.07/1.90/0.95 | 1.13/3.91/7.82 | 0.75/0.63/0.70 |
| K-SVD | 21.1/26.3/12.3 | 58.9/67.3/28.1 | 13.3/26.2/45.1 | 42.1/51.2/52.0 | 5.14/18.5/86.7 | 2.12/32.5/29.1 |
| RED-CNN | 13.2/26.9/39.4 | 44.5/58.1/1.91 | 5.33/14.8/3.25 | 28.1/28.8/3.92 | 1.67/6.61/3.76 | 3.57/10.2/10.0 |
| DnCNN | 23.9/39.4/2.05 | 72.9/95.1/1.21 | 21.1/35.4/1.96 | 60.4/86.2/1.30 | 6.29/18.3/2.75 | 15.2/26.2/5.43 |
| GCAN (ours) | 24.1/41.7/1.48 | 74.5/96.7/0.40 | 22.0/38.4/1.70 | 61.6/87.9/1.00 | 8.38/20.2/2.61 | 17.5/29.2/1.07 |
3) Computational Efficiency: As shown in the table below, our GCAN introduces more FLOPs and parameters than the compared methods but achieves the highest PSNR with a comparable inference time.
| Denoising Method | FLOPs (G) | Inference Time (s) | Params (M) | PSNR (dB) |
|---|---|---|---|---|
| RED-CNN | 83.89 | 0.156 | 1.848 | 44.6 |
| DnCNN | 43.79 | 0.307 | 0.668 | 44.3 |
| SwinIR | 49.64 | 0.271 | 11.80 | 44.9 |
| GCAN | 157.30 | 0.206 | 2.345 | 45.5 |
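For reference, the parameter counts and inference times in the table above can be measured with a simple PyTorch routine like the one below; a single-channel 256×256 input on a CUDA device is assumed, and FLOPs are typically obtained with a third-party profiler (not computed here).

```python
import time
import torch

def profile(model, device="cuda", runs=100):
    """Return (#parameters in millions, mean inference time in seconds)."""
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    x = torch.randn(1, 1, 256, 256, device=device)
    model.to(device).eval()
    with torch.no_grad():
        model(x)                          # warm-up pass
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(runs):
            model(x)
        torch.cuda.synchronize()
    return params_m, (time.time() - start) / runs
```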
In this paper, we propose a simple yet effective gradient-aware channel attention network (GCAN) for infrared small target image denoising before detection. To support this data-driven learning manner, we develop an infrared image denoising dataset that contains three noise-level subsets. Then, we propose a novel infrared image denoising method (namely, GCAN) to achieve high-performance image denoising. Specifically, an encoder-decoder denoising network is used to initially remove the additive noise. Then, a residual connection structure and a gradient-based channel attention module (GCAM) are designed to preserve informative image details in IR images. Some conclusions can be summarized as follows:
(1) Compared with the benchmark denoising methods, GCAN achieves better denoising performance in terms of PSNR and SSIM, as well as better visual denoising quality.
(2) The gradient-based channel attention module (GCAM) can avoid over-smoothing IR images and effectively maintain the response of small target regions. Extensive experiments with five benchmark detection methods verify the effectiveness of our method in terms of IoU, Pd, and Fa.
(3) Although GCAN achieves better performance, it introduces a larger model size and extra computation cost (i.e., FLOPs). In future work, more lightweight operators and simpler network structures will be explored to improve practicality on devices with limited computational resources.
References
Sun Y, Yang J, An W. Infrared dim and small target detection via multiple subspace learning and spatial-temporal patch-tensor model[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 59(5): 3737-3752. 10.1109/tgrs.2020.3022069
Wu T, Li B, Luo Y. MTU-Net: Multilevel TransUNet for space-based infrared tiny ship detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61(2): 1-15. 10.1109/tgrs.2023.3235002
Li B, Wang Y, Wang L, et al. Monte Carlo linear clustering with single-point supervision is enough for infrared small target detection[C]//IEEE International Conference on Computer Vision. 2023: 455-468. 10.1109/iccv51070.2023.00099
Liu T, Yang J, Li B, et al. Nonconvex tensor low-rank approximation for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60(3): 1-18. 10.1109/tgrs.2021.3130310
Li B, Guo Y, Yang J, et al. Gated recurrent multi-attention network for VHR remote sensing image classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 60(3): 1-13. 10.1109/tgrs.2021.3093914
Liu T, Yang J, Li B, et al. Infrared small target detection via nonconvex tensor Tucker decomposition with factor prior[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 62(4): 25-38.
Zhou J, Wang L, Liu B. Analysis of the causes of non-uniformity in infrared images[J]. Infrared and Laser Engineering, 1997, 26(3): 11-13.
Goyal B, Dogra A, Agrawal S, et al. Image denoising review: From classical to state-of-the-art approaches[J]. Information Fusion, 2020, 55(1): 220-244. 10.1016/j.inffus.2019.09.003
Li B, Xiao C, Wang L, et al. Dense nested attention network for infrared small target detection[J]. IEEE Transactions on Image Processing, 2023, 32(5): 1745-1758. 10.1109/tip.2022.3199107
Dabov K, Foi A, Katkovnik V, et al. Image denoising by sparse 3-D transform-domain collaborative filtering[J]. IEEE Transactions on Image Processing, 2007, 16(8): 2080-2095. 10.1109/tip.2007.901238
Aharon M, Elad M, Bruckstein A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation[J]. IEEE Transactions on Signal Processing, 2006, 54(11): 4311-4322. 10.1109/tsp.2006.881199
Bouboulis P, Slavakis K, Theodoridis S. Adaptive kernel-based image denoising employing semi-parametric regularization[J]. IEEE Transactions on Image Processing, 2010, 19(6): 1465-1479. 10.1109/tip.2010.2042995
Gu S, Xie Q, Meng D, et al. Weighted nuclear norm minimization and its applications to low level vision[J]. International Journal of Computer Vision, 2017, 121(2): 183-208. 10.1007/s11263-016-0930-5
Jain V, Seung H S. Natural image denoising with convolutional networks[C]//International Conference on Neural Information Processing Systems. 2008: 455-468.
Zhang K, Zuo W, Chen Y, et al. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising[J]. IEEE Transactions on Image Processing, 2017, 26(7): 3142-3155. 10.1109/tip.2017.2662206
Chen H, Zhang Y, Zhang W, et al. Low-dose CT via convolutional neural network[J]. Biomedical Optics Express, 2017, 8(2): 679-694. 10.1364/boe.8.000679
Rivest J F, Fortin R. Detection of dim targets in digital infrared imagery by morphological image processing[J]. Optical Engineering, 1996, 35(7): 1886-1893. 10.1117/1.600620
Chirdchoo N, Soh W S, Chua K C. RIPT: A receiver-initiated reservation-based protocol for underwater acoustic networks[J]. IEEE Journal on Selected Areas in Communications, 2008, 26(9): 1744-1753. 10.1109/jsac.2008.081213
Dai Y, Wu Y, Zhou F, et al. Asymmetric contextual modulation for infrared small target detection[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2021: 950-959. 10.1109/wacv48630.2021.00099
Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. 2015: 234-241. 10.1007/978-3-319-24574-4_28
Liang J, Cao J, Sun G, et al. SwinIR: Image restoration using Swin Transformer[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 1833-1844. 10.1109/iccvw54120.2021.00210