Abstract
The coded aperture snapshot spectral imaging (CASSI) system, built on compressive sensing theory, can be regarded as an encoder: it efficiently acquires compressed two-dimensional spectral measurements, which are then decoded into a three-dimensional spectral data cube by deep neural networks. However, training such networks requires a large amount of clean data that is difficult to obtain. To address this shortage of training data, a self-supervised hyperspectral denoising neural network based on neighborhood sampling is proposed and integrated into a deep plug-and-play framework to achieve self-supervised spectral reconstruction. The study also examines the impact of different noise degradation models on the final reconstruction quality. Experimental results demonstrate that, compared with the supervised learning method, the self-supervised method improves the average peak signal-to-noise ratio by 1.18 dB and the structural similarity by 0.009, while also producing better visual reconstruction results.
Spectral imaging technology enables the simultaneous acquisition of both spectral and image information.
Xiong et al. pioneered the use of deep learning methods for hyperspectral compressive sensing reconstruction.
By applying deep learning to compressive sensing reconstruction, high-quality and fast reconstruction can be achieved through powerful deep feature representation. However, most existing research on deep learning-based reconstruction algorithms is limited to supervised learning, which restricts applicability in real-world scenarios: supervised approaches require a large number of clean images as labels, and acquiring clean images in the hyperspectral domain is expensive. In the realm of deep learning denoising, self-supervised learning has shown promise. For zero-mean noise, the Noise2Noise method demonstrates that, for a clean scene $x$ and two independently observed noisy images $y$ and $z$, a denoising network trained with $(y, z)$ pairs is equivalent to one trained with $(y, x)$ pairs. Neighbor2Neighbor extends Noise2Noise by downsampling a single noisy image and using the two noisy sub-images obtained from downsampling as the model input and label, respectively. However, these self-supervised methods have only been validated on RGB or greyscale images; their performance on hyperspectral images has not been explored.
In this paper, we propose a self-supervised compressive sensing hyperspectral image reconstruction algorithm based on the plug-and-play (PnP) method and the Neighbor2Neighbor strategy. First, we train a self-supervised hyperspectral denoising network with Neighbor2Neighbor, incorporating a channel attention mechanism to capture inter-spectral correlations in hyperspectral images. The denoising network is then embedded into a deep plug-and-play framework based on the alternating direction method of multipliers (ADMM) to achieve compressive sensing image reconstruction with the denoising model as the prior. We evaluate the algorithm's effectiveness in terms of both quantitative metrics and visual quality, and compare the effects of different noise degradation models on the final reconstruction results.
The main contributions of this paper are as follows:
(1) Based on Neighbor2Neighbor and SENet, we propose a self-supervised hyperspectral image denoising network, Self-HSIDeCNN, for subsequent compressive sensing image reconstruction.
(2) Building on Self-HSIDeCNN and the PnP method, we propose the PnP-Self-HSIDeCNN method to realize self-supervised hyperspectral compressive sensing image reconstruction.
(3) Through ablation experiments, we verify the effects of several hyperparameters of the PnP-Self-HSIDeCNN method on reconstruction speed and quality, laying a foundation for its practical application.
The rest of this paper is organized as follows: Section 1 describes the mathematical model of CASSI; Section 2 introduces the mathematical principles of the PnP method and the Neighbor2Neighbor method, and presents the self-supervised denoising network Self-HSIDeCNN; Section 3 demonstrates the effectiveness of the proposed methods through extensive experiments; Section 4 concludes the paper.
Let $X \in \mathbb{R}^{H \times W \times \Lambda}$ be the spectral data cube, where $H$ and $W$ are the spatial dimensions and $\Lambda$ is the spectral dimension. Let $M \in \mathbb{R}^{H \times W}$ be the physical mask of the CASSI system, which can be regarded as a matrix of size $H \times W$; each element of this matrix obeys a 0-1 distribution with probability $p$ and is used to modulate the 3D spectral data cube. Let $X'$ be the spectral data after passing through the mask; for the $\lambda$-th band we have

$X'_{\lambda} = X_{\lambda} \odot M$ ,  (1)

where $\odot$ represents element-wise multiplication. After passing through the dispersion prism, the data cube is sheared along the $v$-axis. Let the shifted signal be $X''$ and $\lambda_c$ be the reference wavelength; then

$X''(u, v, \lambda) = X'\big(u, v + d(\lambda - \lambda_c), \lambda\big)$ ,  (2)

where $(u, v)$ are the pixel coordinates in the detector plane, $\lambda$ indexes the band, $\lambda_c$ is the centre wavelength, and $d(\lambda - \lambda_c)$ denotes the spatial offset of the $\lambda$-th band. The measured value at position $(u, v)$ of the detector plane is then

$Y(u, v) = \sum_{\lambda=1}^{\Lambda} X''(u, v, \lambda)$ .  (3)

The detector receives the signals of all bands and finally obtains a two-dimensional measurement $Y$. Considering the noise $N$ during the measurement, we have

$Y = \sum_{\lambda=1}^{\Lambda} X''_{\lambda} + N$ .  (4)

Let the shifted physical mask (this matrix can be fixed or variable) be $\tilde{M}$ and the correspondingly shifted data cube be $\tilde{X}$:

$\tilde{M}(u, v, \lambda) = M\big(u, v + d(\lambda - \lambda_c)\big)$ ,  (5)

$\tilde{X}(u, v, \lambda) = X\big(u, v + d(\lambda - \lambda_c), \lambda\big)$ .  (6)

Vectorizing the shifted cube as $x$, stacking the shifted masks into the sensing matrix $\Phi = \big[\operatorname{diag}(\operatorname{vec}(\tilde{M}_1)), \ldots, \operatorname{diag}(\operatorname{vec}(\tilde{M}_\Lambda))\big]$, and vectorizing the measurement and noise as $y$ and $n$, the final measurement can be expressed as

$y = \Phi x + n$ .  (7)
The complete process is shown in Fig. 1.

Fig. 1 CASSI forward model
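For concreteness, the discretized forward model of Eqs. (1)-(4) can be sketched in a few lines of NumPy. The function name, the one-pixel-per-band dispersion step, and the toy cube size below are illustrative assumptions rather than parameters of the actual CASSI system.

```python
import numpy as np

def cassi_forward(x, mask, step=1):
    """Sketch of the SD-CASSI forward model: per-band coding by the mask
    (Eq. (1)), a band-dependent spatial shift from the disperser (Eq. (2)),
    and integration over all bands on the detector (Eq. (3))."""
    H, W, L = x.shape
    y = np.zeros((H, W + (L - 1) * step))
    for l in range(L):
        coded = x[:, :, l] * mask                 # element-wise coding
        y[:, l * step:l * step + W] += coded      # shift by l*step pixels and accumulate
    return y                                      # detector noise of Eq. (4) can be added on top

# toy usage: a random 31-band cube and a Bernoulli(0.5) coded aperture
rng = np.random.default_rng(0)
cube = rng.random((256, 256, 31))
mask = (rng.random((256, 256)) > 0.5).astype(float)
meas = cassi_forward(cube, mask)                  # shape (256, 286)
```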
The PnP method decomposes the original problem, which is difficult to solve directly, into subproblems that are easy to solve. The reconstruction can be posed as the inverse problem

$\min_{x} \tfrac{1}{2}\|y - \Phi x\|_2^2 + \lambda R(x)$ ,  (8)

where $y$ is the measured value, $x$ is the original signal, and $R(x)$ is the regularization term. The basic idea of the PnP method for inverse problems is to use a pretrained denoiser for the desired signal as a prior: the whole problem is decomposed into easier subproblems that are solved alternately in an iterative manner, and the denoising network acts as a flexible plug-in (i.e., it can be easily replaced) in this process. Specifically, problem (8) can be decomposed into the following subproblems using the alternating direction method of multipliers:

$x^{k+1} = \arg\min_{x} \tfrac{1}{2}\|y - \Phi x\|_2^2 + \tfrac{\rho}{2}\|x - (z^{k} - u^{k})\|_2^2$ ,  (9)

$z^{k+1} = \arg\min_{z} \lambda R(z) + \tfrac{\rho}{2}\|z - (x^{k+1} + u^{k})\|_2^2$ ,  (10)

$u^{k+1} = u^{k} + x^{k+1} - z^{k+1}$ ,  (11)

where $z$ is the auxiliary variable, $u$ is the scaled multiplier, $\rho$ is the penalty factor, and $k$ is the iteration index. Let the auxiliary variable $\tilde{z}^{k} = z^{k} - u^{k}$; then subproblem (9) has the closed-form solution

$x^{k+1} = (\Phi^{\mathrm T}\Phi + \rho I)^{-1}(\Phi^{\mathrm T} y + \rho\,\tilde{z}^{k})$ ,  (12)

$x^{k+1} = \tilde{z}^{k} + \Phi^{\mathrm T}(\Phi\Phi^{\mathrm T} + \rho I)^{-1}(y - \Phi\tilde{z}^{k})$ ,  (13)

$x^{k+1} = \tilde{z}^{k} + \Phi^{\mathrm T}\big[(y - \Phi\tilde{z}^{k}) \oslash (\psi + \rho)\big]$ ,  (14)

where $\psi = \operatorname{diag}(\Phi\Phi^{\mathrm T})$ and $\oslash$ denotes element-wise division; because $\Phi\Phi^{\mathrm T}$ is diagonal for CASSI, (14) can be computed element by element. Subproblem (10) is equivalent to denoising $x^{k+1} + u^{k}$ at a noise level $\sigma$:

$z^{k+1} = \arg\min_{z} \tfrac{1}{2\sigma^{2}}\|z - (x^{k+1} + u^{k})\|_2^2 + R(z)$ ,  (15)

$\sigma = \sqrt{\lambda / \rho}$ ,  (16)

$z^{k+1} = \mathcal{D}_{\sigma}\big(x^{k+1} + u^{k}\big)$ ,  (17)

where $\sigma$ is the estimated noise level and $\mathcal{D}_{\sigma}$ is the denoiser. It should be noted that the performance of the denoiser directly affects the final reconstruction results; the denoiser used in this paper is a self-supervised hyperspectral denoising network, so that the overall image reconstruction is itself self-supervised. In practical use, the initial inputs are the 2D measurement acquired from the detector and the mask matrix, and after several iterations with the pre-trained denoising network the final reconstructed data cube is obtained. This process is shown in Fig. 2.

Fig. 2 PnP image reconstruction framework
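The iterations (9)-(17) translate directly into a short reconstruction loop. The sketch below assumes abstract forward and adjoint operators `A` and `At`, the diagonal `phi_sum` of $\Phi\Phi^{\mathrm T}$, and a generic plug-in `denoise(cube, sigma)`; the default penalty factor and iteration count are placeholders, not values from this paper.

```python
import numpy as np

def pnp_admm(y, A, At, phi_sum, denoise, rho=0.01, iters=50, sigmas=None):
    """Plug-and-play ADMM for CASSI: x-update via the closed form (12)-(14),
    z-update via the plug-in denoiser (17), multiplier update via (11)."""
    x = At(y)                                   # rough initialisation from the measurement
    z, u = x.copy(), np.zeros_like(x)
    for k in range(iters):
        zt = z - u                              # x-update: element-wise because Phi Phi^T is diagonal
        x = zt + At((y - A(zt)) / (phi_sum + rho))
        sigma = sigmas[k] if sigmas is not None else np.sqrt(1.0 / rho)
        z = denoise(x + u, sigma)               # z-update: denoise x^{k+1} + u^k
        u = u + x - z                           # multiplier update
    return z
```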
The most critical part of the PnP-ADMM solution for compressive sensing hyperspectral reconstruction is the denoising step in Eq. (17), i.e., the choice of denoiser.
The core idea of Noise2Noise is that, for an unobserved clean scene $x$ and two observed independent noisy images $y$ and $z$, a denoising network trained with $(y, z)$ pairs is equivalent to a network trained with $(y, x)$ pairs, provided the noise is zero-mean:

$\arg\min_{\theta} \mathbb{E}_{x,y,z}\big\|f_{\theta}(y) - z\big\|_2^2 = \arg\min_{\theta} \mathbb{E}_{x,y}\big\|f_{\theta}(y) - x\big\|_2^2$ ,  (18)

where $f_{\theta}$ is the denoising network. Noise2Noise requires at least two independent noisy images of each scene, which is difficult to satisfy in real scenes. To increase its practical value, the theory of Noise2Noise has been extended: for a single noisy image, one possible way to construct two similar but not identical images is downsampling.
The Neighbor2Neighbor downsampling idea is shown in Fig. 3. A neighbour sub-sampler $(g_1, g_2)$ draws two neighbouring pixels from every $2\times 2$ cell of the noisy image $y$, producing two sub-images $g_1(y)$ and $g_2(y)$ that serve as the network input and label, with the optimization objective

$\arg\min_{\theta} \mathbb{E}_{y}\big\|f_{\theta}\big(g_1(y)\big) - g_2(y)\big\|_2^2$ .  (19)

Fig. 3 Image downsampling method
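A minimal PyTorch sketch of the neighbour sub-sampler of Fig. 3 is given below. It draws two distinct pixels from every non-overlapping 2×2 cell of the noisy image; the original Neighbor2Neighbor sampler restricts the pair to adjacent positions within the cell, so this version is a slight simplification, and the (B, C, H, W) tensor layout is an assumption.

```python
import torch

def neighbor_subsample(noisy, generator=None):
    """Split a noisy (B, C, H, W) image into two (B, C, H/2, W/2) sub-images
    by picking two different pixels from each 2x2 cell."""
    b, c, h, w = noisy.shape
    cells = noisy.view(b, c, h // 2, 2, w // 2, 2).permute(0, 1, 2, 4, 3, 5)
    cells = cells.reshape(b, c, h // 2, w // 2, 4)            # the 4 pixels of each cell
    idx1 = torch.randint(0, 4, (b, 1, h // 2, w // 2, 1), generator=generator)
    shift = torch.randint(1, 4, (b, 1, h // 2, w // 2, 1), generator=generator)
    idx2 = (idx1 + shift) % 4                                 # a different pixel of the same cell
    idx1, idx2 = idx1.to(noisy.device), idx2.to(noisy.device)
    g1 = torch.gather(cells, -1, idx1.expand(b, c, h // 2, w // 2, 1)).squeeze(-1)
    g2 = torch.gather(cells, -1, idx2.expand(b, c, h // 2, w // 2, 1)).squeeze(-1)
    return g1, g2
```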
The Neighbor2Neighbor work further shows that, because $g_1(y)$ and $g_2(y)$ originate from neighbouring (and therefore slightly different) clean pixels, a correction term should be added to the objective:

$\arg\min_{\theta} \mathbb{E}_{y}\big\|f_{\theta}\big(g_1(y)\big) - g_2(y)\big\|_2^2 + \gamma\, \mathbb{E}_{y}\Big\|f_{\theta}\big(g_1(y)\big) - g_2(y) - \big(g_1\big(f_{\theta}(y)\big) - g_2\big(f_{\theta}(y)\big)\big)\Big\|_2^2$ .  (20)
Deep denoising models for greyscale and RGB images have advanced rapidly in recent years. We propose a deep plug-and-play self-supervised hyperspectral image denoising network (Self-HSIDeCNN). FFDNet is chosen as the base framework of our model; it is flexible and fast, and has already shown excellent performance in denoising RGB and greyscale images.

Fig. 4 Self-HSIDeCNN network architecture
In the process of frame-level denoising, for the $\lambda$-th band of the spectral data cube $X$, not only this band image but also the images of its neighbouring bands are fed into the neural network (in this paper, we take $k = 6$ neighbouring bands). The resulting stack of band images is then processed with the sampling strategy of Section 2.2, yielding two sub-images of half the spatial size; one sub-image is used as the model input and the other as the label for computing the loss function. To speed up model training, the input sub-image is downsampled once more after entering the model. Together with the noise estimation level map, these channels constitute the final model input; a sketch of this input assembly is given below.
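The sketch below stacks the target band with its neighbouring bands and appends a constant noise-level map. The clamping at the spectral borders, the (L, H, W) layout, and the interpretation of $k$ as the number of neighbours on each side are assumptions made for this illustration.

```python
import torch

def build_band_input(cube, band, k=6, sigma=25.0 / 255.0):
    """Assemble the frame-level input: the target band, its neighbouring
    bands (clamped at the first/last band), and a noise estimation map."""
    L, H, W = cube.shape
    idx = torch.arange(band - k, band + k + 1).clamp(0, L - 1)
    bands = cube[idx]                               # (2k+1, H, W) spectral neighbourhood
    noise_map = torch.full((1, H, W), sigma)        # constant noise estimation level map
    return torch.cat([bands, noise_map], dim=0)     # (2k+2, H, W) network input
```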
In order to better capture the correlation between different spectral bands of hyperspectral images, an inter-channel attention structure, SENet, is added after every two convolutional layers to form an SEBlock. This structure assigns a separate weight to each channel feature map; it increases the number of parameters of the model but enhances its ability to extract non-local spectral features. The SENet structure is shown in Fig. 5.

Fig. 5 SENet architecture
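For reference, a standard squeeze-and-excitation block can be written in a few lines of PyTorch; the reduction ratio of 16 is an illustrative default and not a value taken from this paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention of the kind inserted after
    every two convolutional layers in Self-HSIDeCNN (Fig. 5)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average
        self.fc = nn.Sequential(                     # excitation: per-channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # re-weight each channel feature map
```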
After obtaining the hyperspectral denoising model, we train it with the Neighbor2Neighbor self-supervised learning method. Following the optimization objective of Eq. (20), the loss function is

$\mathcal{L} = \big\|f_{\theta}\big(g_1(y)\big) - g_2(y)\big\|_2^2 + \gamma \Big\|f_{\theta}\big(g_1(y)\big) - g_2(y) - \big(g_1\big(f_{\theta}(y)\big) - g_2\big(f_{\theta}(y)\big)\big)\Big\|_2^2$ ,  (21)

where $(g_1, g_2)$ is the stochastic downsampling (neighbour sub-sampling) function and $f_{\theta}$ is the denoising neural network. $\gamma$ is a penalty weight (fixed at 5 in this paper) used to balance the level of detail preserved in the denoising results. To keep the gradients stable, the gradients of $g_1\big(f_{\theta}(y)\big)$ and $g_2\big(f_{\theta}(y)\big)$ are not propagated during training. This training process is shown in Fig. 6.

Fig. 6 Self-supervised training process
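Putting the sub-sampler and the loss of Eq. (21) together, one training step could look like the following sketch. The denoiser is treated here as a generic image-to-image network, the squared errors are averaged rather than summed, and the generator-state trick for reusing the same cell choices on the denoised full image is an implementation assumption.

```python
import torch
import torch.nn.functional as F

def self_supervised_step(model, noisy, gamma=5.0):
    """One training step following Eq. (21): the two neighbour sub-images act
    as input and label, and a gradient-stopped correction term weighted by
    gamma discourages over-smoothing."""
    gen = torch.Generator()
    state = gen.get_state()

    g1, g2 = neighbor_subsample(noisy, generator=gen)
    out = model(g1)

    with torch.no_grad():                          # correction-term gradients are not propagated
        denoised = model(noisy)
        gen.set_state(state)                       # reuse the same cell choices as above
        d1, d2 = neighbor_subsample(denoised, generator=gen)

    loss_rec = F.mse_loss(out, g2)                 # ||f(g1(y)) - g2(y)||^2
    loss_reg = F.mse_loss(out - g2, d1 - d2)       # gradient-stopped correction term
    return loss_rec + gamma * loss_reg
```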
Supervised denoising models cannot be trained in the absence of clean images as labels, and for supervised compressive sensing reconstruction models, noisy training data degrades performance (we discuss this in a subsequent section). As can be seen from Fig. 6, in contrast, the self-supervised training process requires only noisy images.
Our proposed method can be divided into two steps. First, we train the hyperspectral denoising network in a self-supervised manner. Then, this network is directly embedded into the PnP framework for self-supervised image reconstruction. In this section, we first describe the dataset and implementation details. Next, to evaluate effectiveness, the proposed method is compared with models trained with a supervised learning approach. Finally, ablation studies are conducted to analyze the effect of the hyperparameters on the results.
Model training was performed on the CAVE hyperspectral dataset, which consists of 32 scenes, each with a spatial resolution of 512×512 and 31 bands covering 400 nm to 700 nm. Five scenes were selected as the test set and the remaining scenes as the training set. The RGB images of the five test scenes are shown in Fig. 7.

Fig. 7 RGB image of test scene
Before comparing the reconstruction outcomes, it is essential to assess both the supervised and the self-supervised denoising results to investigate the impact of the Neighbor2Neighbor self-supervised learning strategy on denoising. The identical model, parameters, and training data are employed, differing solely in the loss function and the back-propagated gradients. For the supervised learning approach, the loss function is
$\mathcal{L}_{\mathrm{sup}} = \big\|f_{\theta}(y) - x\big\|_2^2$ ,  (22)

where $y$ is the noisy input and $x$ is the corresponding clean label.
For the self-supervised learning approach, the loss function is that of Eq. (21).
With a maximum pixel value of 255, let the Gaussian noise variance be $\sigma^2$ and the Poisson noise intensity be $\lambda$. In this paper, we compare four different noise degradation models: fixed Gaussian noise, Gaussian noise with the variance drawn from a range, fixed Poisson noise, and Poisson noise with the intensity drawn from a range. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are used as evaluation metrics. The metrics over the five test scenes are averaged, and the final results are shown in the following table.
| Noise type | Supervised model | Self-supervised model |
|---|---|---|
| Fixed Gaussian | 38.12 dB, 0.951 | 39.33 dB, 0.978 |
| Range Gaussian | 37.63 dB, 0.924 | 39.16 dB, 0.969 |
| Fixed Poisson | 38.28 dB, 0.964 | 40.25 dB, 0.983 |
| Range Poisson | 38.25 dB, 0.966 | 39.86 dB, 0.976 |
The self-supervised learning model, utilizing the Neighbor2Neighbor strategy, demonstrates superior performance compared to the supervised learning model across various noise degradation models. This superiority arises from the slight disparity between the labels of the self-supervised learning model and the desired denoising results of the model inputs. Additionally, with the incorporation of the penalty term in the loss function, the model achieves enhanced generalization and performs better on the test set.
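For reference, the four noise degradation models can be synthesized roughly as in the sketch below; the specific parameter values (σ = 25, λ = 30, and the uniform ranges) are illustrative placeholders, since the exact settings vary across the experiments.

```python
import numpy as np

def degrade(clean, model="gauss_fixed", rng=None):
    """Add synthetic noise to a clean cube with pixel values in [0, 255]."""
    rng = rng or np.random.default_rng()
    if model.startswith("gauss"):
        sigma = 25.0 if model == "gauss_fixed" else rng.uniform(5.0, 50.0)
        return clean + rng.normal(0.0, sigma, clean.shape)      # additive Gaussian noise
    lam = 30.0 if model == "poisson_fixed" else rng.uniform(5.0, 50.0)
    return rng.poisson(clean / 255.0 * lam) / lam * 255.0       # signal-dependent Poisson noise
```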
The hyperparameter $\gamma$ is used to avoid overly smooth denoising results. When $\gamma = 0$, the penalty term in the loss function vanishes; in that case, since $g_1(y)$ and $g_2(y)$ are not exactly the same, the model tends to output something close to their average, because the loss function attains its minimum there. To verify the influence of the hyperparameter $\gamma$ on the denoising results, the noise in the training data is kept unchanged (Gaussian noise with variance of 25 and Poisson noise with intensity of 30) and only the value of $\gamma$ is changed; the denoising performance of the resulting models is evaluated on the test set. The final results are shown in the following table.
| $\gamma$ | PSNR / dB (Gaussian) | SSIM (Gaussian) | PSNR / dB (Poisson) | SSIM (Poisson) |
|---|---|---|---|---|
| 0 | 39.27 | 0.970 | 40.26 | 0.983 |
| 1 | 39.23 | 0.962 | 40.23 | 0.981 |
| 2 | 39.26 | 0.965 | 40.34 | 0.986 |
| 5 | 39.33 | 0.978 | 40.25 | 0.983 |
| 20 | 39.30 | 0.977 | 39.89 | 0.979 |
For Gaussian noise, we first analyse the denoising effect at $\gamma = 0$ and $\gamma = 1$. When $\gamma = 0$, the penalty term does not exist and the denoising result tends to be over-smoothed, so the performance of Self-HSIDeCNN is not optimal. However, because of the downsampling and upsampling modules in Self-HSIDeCNN, the over-smoothing mainly affects the downsampled sub-images; after the sub-images are upsampled, the high-frequency information in the final output of the model remains largely preserved. This explains why the denoising effect becomes slightly worse at $\gamma = 1$: the penalty term makes the denoising inadequate while the over-smoothing it counteracts is already limited. At $\gamma = 5$, although a small amount of noise is left unremoved owing to the larger $\gamma$, the original image information is recovered better than at $\gamma = 0$. At $\gamma = 20$, even more noise is retained along with the high-frequency details, so the denoising effect deteriorates again.
For Poisson noise, the trend is similar to that of Gaussian noise, but the optimum is achieved at $\gamma = 2$. This is because the model denoises Poisson noise more effectively, so only a smaller $\gamma$ is needed to alleviate the over-smoothing caused by the self-supervised learning method.
In this paper, we fix $\gamma$ for the subsequent experiments; in practical applications, its value should be determined according to the scene characteristics and experimental results.
The self-supervised learning model Self-HSIDeCNN based on Neighbor2Neighbor in Section 3.2 can be directly embedded in the deep plug-and-play architecture for compressive sensing image reconstruction; we name the resulting method PnP-Self-HSIDeCNN. Deep plug-and-play frameworks often require a warm start to speed up convergence. Here, the GAP-TV denoiser is used for the first 90 iterations, after which the denoiser is switched to the self-supervised learning model for better reconstruction results. In addition, the estimated noise level at reconstruction time is set to 30 regardless of the noise degradation model (with the data normalised to the range 0 to 1). As the iterations progress, this parameter can be gradually decreased to enhance the reconstruction quality; a sketch of this two-stage schedule is given below. For comparison, we also employ another reconstruction algorithm, PnP-HSI, which plugs a denoising model trained by supervised learning into the same PnP framework.
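A possible driver for PnP-Self-HSIDeCNN is sketched below: it reuses the `pnp_admm` loop from the PnP section, runs a TV-style warm start before switching to the self-supervised denoiser, and supports either a fixed or a gradually decreasing noise-level estimate. The denoiser wrappers passed in as `tv_denoise` and `learned_denoise`, the iteration counts, and the decreasing schedule are hypothetical choices, not the paper's exact settings.

```python
import numpy as np

def reconstruct(y, A, At, phi_sum, tv_denoise, learned_denoise,
                warm_iters=90, main_iters=60, dynamic_sigma=False):
    """Two-stage PnP reconstruction: a TV-style warm start (`tv_denoise`),
    then the self-supervised denoiser (`learned_denoise(cube, sigma)`)."""
    step = {"k": 0}

    def denoiser(cube, sigma):
        step["k"] += 1
        return tv_denoise(cube) if step["k"] <= warm_iters else learned_denoise(cube, sigma)

    total = warm_iters + main_iters
    sigmas = (np.linspace(50.0, 10.0, total) if dynamic_sigma      # decreasing estimate
              else np.full(total, 30.0)) / 255.0                   # fixed sigma = 30 on a 0-255 scale
    return pnp_admm(y, A, At, phi_sum, denoiser, iters=total, sigmas=sigmas)
```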
To compare with end-to-end neural networks, we select the U-net and TSA-net models. U-net consists of two main components, an encoder and a decoder; each encoding block contains two convolutional layers with ReLU activations followed by a max-pooling layer. TSA-net directly uses the structure from its original publication.
The final comparison results are shown in the following table.
| Scene | U-net | TSA-net | PnP-HSI | Ours (fixed Gaussian) | Ours (range Gaussian) | Ours (fixed Poisson) | Ours (range Poisson) |
|---|---|---|---|---|---|---|---|
| Scene1 | 26.29 dB, 0.843 | 26.47 dB, 0.855 | 29.56 dB, 0.875 | 31.12 dB, 0.892 | 30.44 dB, 0.843 | 28.34 dB, 0.788 | 28.51 dB, 0.798 |
| Scene2 | 37.20 dB, 0.941 | 36.98 dB, 0.935 | 37.59 dB, 0.959 | 39.62 dB, 0.959 | 37.01 dB, 0.921 | 37.55 dB, 0.963 | 38.39 dB, 0.966 |
| Scene3 | 33.87 dB, 0.918 | 35.07 dB, 0.917 | 36.27 dB, 0.924 | 35.89 dB, 0.916 | 34.31 dB, 0.888 | 34.14 dB, 0.908 | 34.25 dB, 0.909 |
| Scene4 | 32.99 dB, 0.910 | 33.16 dB, 0.929 | 34.88 dB, 0.933 | 35.22 dB, 0.928 | 34.13 dB, 0.870 | 32.96 dB, 0.910 | 33.02 dB, 0.903 |
| Scene5 | 20.11 dB, 0.788 | 21.14 dB, 0.794 | 21.55 dB, 0.797 | 23.89 dB, 0.838 | 23.94 dB, 0.808 | 23.06 dB, 0.810 | 22.70 dB, 0.802 |
| Mean | 30.09 dB, 0.880 | 30.56 dB, 0.886 | 31.97 dB, 0.898 | 33.15 dB, 0.907 | 31.97 dB, 0.866 | 31.21 dB, 0.876 | 31.37 dB, 0.876 |

Fig. 8 Comparison of the visual effect of the reconstruction results of different algorithms: (a) RGB image; (b) 2D measurements; (c) ground truth; (d) U-net; (e) TSA-net; (f) PnP-HSI; (g) PnP-Self-HSIDeCNN

Fig. 9 Comparison of visual reconstruction effects in the final band: (a) ground truth; (b) U-net; (c) TSA-net; (d) PnP-HSI; (e) PnP-Self-HSIDeCNN
In the evaluation of hyperspectral images, we consider not only spatial metrics but also spectral fidelity. The spectral curves reconstructed by the different algorithms are compared in Fig. 10.

Fig. 10 Comparison of spectral curve reconstruction results:(a) RGB image; (b) spectral curve comparison
Based on the aforementioned metrics, it is evident that the self-supervised reconstruction algorithm proposed in this paper, based on the deep plug-and-play framework, yields superior results across various metrics. However, in terms of runtime, the end-to-end neural networks U-net and TSA-net hold a significant advantage, completing reconstruction in less than 1 second after training, whereas algorithms utilizing deep plug-and-play frameworks require several minutes due to the iterative process. Nevertheless, runtime durations at the minute level are generally acceptable in practical applications. Moreover, the method circumvents the necessity for clean data as labels during training, thereby substantially mitigating the issue of inadequate training data for compressive sensing deep learning reconstruction models in real-world scenarios.
In real-world application scenarios, data often contains noise, and ideally, clean data with minimal noise levels is preferred. In our proposed self-supervised reconstruction model, spectral data cubes containing noise can be obtained from real scenarios and then utilized for training to ensure generalization. However, for supervised deep learning models, training becomes challenging due to the absence of clean images as labels. One approach is to directly train with noisy images as labels, but this typically leads to a degradation in model performance. To evaluate the generalizability of the models, Gaussian noise was introduced to the CAVE dataset to simulate real-world scenarios. U-net and TSA-net were retrained and tested using this noisy data. Conversely, self-supervised networks, which inherently incorporate noise during the training process, can be directly tested with noisy data. The resulting performance is summarized in
| Scene | U-net | TSA-net | PnP-HSI | Ours (fixed Gaussian) |
|---|---|---|---|---|
| Scene1 | 25.89 dB, 0.824 | 25.88 dB, 0.838 | 29.52 dB, 0.873 | 31.13 dB, 0.890 |
| Scene2 | 36.41 dB, 0.931 | 35.49 dB, 0.927 | 37.54 dB, 0.957 | 39.59 dB, 0.957 |
| Scene3 | 31.78 dB, 0.887 | 33.80 dB, 0.904 | 36.23 dB, 0.919 | 35.84 dB, 0.912 |
| Scene4 | 31.03 dB, 0.901 | 32.03 dB, 0.917 | 34.80 dB, 0.934 | 35.21 dB, 0.929 |
| Scene5 | 20.14 dB, 0.773 | 20.50 dB, 0.787 | 21.56 dB, 0.798 | 23.84 dB, 0.834 |
| Mean | 29.05 dB, 0.863 | 29.54 dB, 0.874 | 31.93 dB, 0.899 | 33.12 dB, 0.904 |
| Difference | -1.04 dB, -0.017 | -1.02 dB, -0.012 | -0.04 dB, -0.001 | -0.03 dB, -0.003 |
| Percentage | -3.46%, -1.93% | -3.34%, -1.35% | -0.13%, -1.11% | -0.09%, -0.33% |
From the table, it can be seen that the reconstruction quality of the end-to-end networks U-net and TSA-net drops by about 1 dB when they are trained with noisy labels, whereas the PnP-based methods are almost unaffected, with the proposed PnP-Self-HSIDeCNN showing the smallest degradation.
In Section 3.2, we verified the effect of the hyperparameter $\gamma$ on the model's denoising results. Here, we explore its impact on the hyperspectral image reconstruction. Using the denoising models from Section 3.2 directly and keeping all other parameters of the PnP-Self-HSIDeCNN algorithm unchanged, we present the reconstruction results in the following table.
| Scene | PSNR/dB ($\gamma$=0) | SSIM ($\gamma$=0) | PSNR/dB ($\gamma$=1) | SSIM ($\gamma$=1) | PSNR/dB ($\gamma$=2) | SSIM ($\gamma$=2) | PSNR/dB ($\gamma$=5) | SSIM ($\gamma$=5) | PSNR/dB ($\gamma$=20) | SSIM ($\gamma$=20) |
|---|---|---|---|---|---|---|---|---|---|---|
| Scene1 | 30.92 | 0.862 | 30.42 | 0.862 | 30.92 | 0.878 | 29.70 | 0.785 | 29.89 | 0.855 |
| Scene2 | 37.19 | 0.921 | 38.54 | 0.940 | 40.19 | 0.951 | 35.50 | 0.921 | 37.67 | 0.944 |
| Scene3 | 33.93 | 0.888 | 36.28 | 0.908 | 36.36 | 0.913 | 32.74 | 0.888 | 34.28 | 0.892 |
| Scene4 | 33.98 | 0.883 | 35.07 | 0.899 | 35.47 | 0.894 | 33.19 | 0.861 | 35.38 | 0.916 |
| Scene5 | 22.97 | 0.769 | 23.86 | 0.833 | 23.80 | 0.843 | 23.29 | 0.791 | 23.18 | 0.803 |
| Mean | 31.79 | 0.865 | 32.83 | 0.888 | 33.35 | 0.896 | 30.88 | 0.849 | 32.08 | 0.882 |
For scenes 1 to 4, an appropriate penalty term enhances the reconstruction quality. However, an excessively large $\gamma$ degrades the denoising effect (even below that obtained without the penalty term), which in turn harms the final compressive sensing reconstruction. For scene 5, in contrast, the model performs better whenever $\gamma$ is non-zero; this is attributed to scene 5's rich colour information and more complex spatial-spectral content, for which the penalty term helps preserve detail. Comparing these results with the denoising results in Section 3.2 further shows that the value of $\gamma$ that is best for denoising is not necessarily optimal for the final reconstruction.
In the aforementioned comparison, the estimated noise level $\sigma$ is set to 30, irrespective of the noise degradation model. This choice allows the model to be used directly for reconstruction without tedious parameter tuning; consequently, the reconstruction results of the PnP-Self-HSIDeCNN algorithm in the aforementioned experiments represent expected performance in practical use rather than optimal performance. To explore the model's optimal performance, fixed Gaussian noise is used during model training, and two noise estimation strategies are employed during testing: fixed estimation of the noise level (fixed at 100, 50, 30, and 5, respectively) and dynamic estimation of the noise level (gradually decreasing as the iterations progress). The experimental results are presented in the following two tables.
| Scene | PSNR/dB ($\sigma$=100) | SSIM ($\sigma$=100) | PSNR/dB ($\sigma$=50) | SSIM ($\sigma$=50) | PSNR/dB ($\sigma$=30) | SSIM ($\sigma$=30) | PSNR/dB ($\sigma$=5) | SSIM ($\sigma$=5) | PSNR/dB (dynamic $\sigma$) | SSIM (dynamic $\sigma$) |
|---|---|---|---|---|---|---|---|---|---|---|
| Scene1 | 27.57 | 0.750 | 30.21 | 0.852 | 31.00 | 0.888 | 31.28 | 0.897 | 32.01 | 0.899 |
| Scene2 | 33.56 | 0.921 | 38.16 | 0.941 | 39.38 | 0.955 | 40.18 | 0.973 | 40.32 | 0.977 |
| Scene3 | 32.00 | 0.888 | 34.69 | 0.888 | 35.77 | 0.911 | 36.20 | 0.928 | 36.83 | 0.932 |
| Scene4 | 31.87 | 0.861 | 34.59 | 0.912 | 35.13 | 0.927 | 35.50 | 0.929 | 36.14 | 0.941 |
| Scene5 | 21.58 | 0.749 | 22.97 | 0.803 | 23.75 | 0.832 | 24.29 | 0.850 | 25.88 | 0.859 |
| Mean | 29.32 | 0.834 | 32.12 | 0.879 | 33.01 | 0.903 | 33.49 | 0.915 | 34.23 | 0.921 |

The corresponding per-scene reconstruction times are as follows:

| Scene | $\sigma$=100 | $\sigma$=50 | $\sigma$=30 | $\sigma$=5 | Dynamic $\sigma$ |
|---|---|---|---|---|---|
| Scene1 | 53.10 s | 96.57 s | 99.09 s | 105.39 s | 117.74 s |
| Scene2 | 51.21 s | 91.74 s | 95.67 s | 101.17 s | 110.92 s |
| Scene3 | 48.69 s | 84.51 s | 89.87 s | 94.28 s | 105.03 s |
| Scene4 | 51.84 s | 91.35 s | 96.27 s | 103.77 s | 116.19 s |
| Scene5 | 55.62 s | 104.61 s | 112.84 s | 119.07 s | 145.56 s |
| Mean | 52.09 s | 93.76 s | 98.75 s | 104.74 s | 119.09 s |
When $\sigma = 100$, the estimate significantly exceeds the noise level present in the training data of the denoising model; the iterations then converge quickly but with poor reconstruction quality. Smaller $\sigma$ values yield better reconstruction results but require longer runtimes. Compared with fixed noise estimation levels, letting the noise estimation level gradually decrease with the iterations achieves the best reconstruction results, albeit at the expense of the longest runtime (since $\sigma$ decreases as the iterations progress). In summary, the value of $\sigma$ should be selected according to the specific requirements; larger $\sigma$ values can be used to expedite reconstruction when strict reconstruction quality is not necessary.
Deep plug-and-play frameworks typically require a warm start to expedite convergence. In the aforementioned experiments, the GAP-TV denoiser is employed for 80 iterations before transitioning to the self-supervised learning model to achieve improved reconstruction results. However, the GAP-TV denoiser does not necessarily need 80 iterations to reach its denoising performance limit, so reducing the number of warm-start iterations could accelerate reconstruction; ending the warm start prematurely, on the other hand, may result in suboptimal final image reconstruction. To investigate this, we select a scene with a fixed noise estimation level of 30 and vary the number of warm-start iterations to study their effect on reconstruction quality and speed. The experimental results are presented in the following table.
| Number of warm-start iterations | Warm-start time / s | Self-supervised reconstruction time / s | Total reconstruction time / s | PSNR / dB | SSIM |
|---|---|---|---|---|---|
| 0 | 0 | 39.06 | 39.06 | 33.79 | 0.828 |
| 10 | 5.24 | 40.95 | 46.19 | 34.33 | 0.867 |
| 20 | 10.04 | 39.69 | 49.73 | 34.43 | 0.867 |
| 30 | 15.76 | 40.95 | 56.71 | 34.44 | 0.871 |
| 40 | 20.68 | 40.32 | 61.00 | 34.42 | 0.869 |
| 50 | 26.41 | 40.32 | 66.73 | 34.25 | 0.869 |
| 60 | 31.23 | 39.69 | 70.92 | 34.21 | 0.867 |
| 70 | 36.44 | 39.69 | 76.13 | 34.17 | 0.864 |
| 80 | 41.67 | 40.32 | 81.99 | 34.15 | 0.869 |
| 90 | 46.88 | 40.95 | 87.83 | 34.12 | 0.869 |
| 100 | 52.27 | 39.69 | 91.96 | 34.12 | 0.867 |
As can be seen from the table, only a modest number of warm-start iterations is needed: the reconstruction quality peaks at around 20 to 30 iterations, and further warm-start iterations only increase the total runtime while slightly reducing the final PSNR. Omitting the warm start entirely yields the shortest runtime but a noticeably lower SSIM.
In this paper, we propose a self-supervised learning method, PnP-Self-HSIDeCNN, based on a deep plug-and-play framework and a neighborhood sampling strategy. Unlike traditional methods, it requires only noisy images for training and fully exploits the inter-spectral correlation of hyperspectral images through the SENet structure. Judged by visual quality, PSNR, and SSIM, and compared with previous end-to-end neural network reconstructions, the proposed self-supervised method achieves strong reconstruction results with robust generalization within acceptable runtimes. We conduct a detailed comparison of three hyperparameters (the loss-function penalty factor $\gamma$, the noise level estimation parameter $\sigma$, and the number of warm-start iterations) to explore their impact on reconstruction speed and quality, laying a foundation for the algorithm's practical application. Our findings demonstrate that self-supervised learning can yield satisfactory performance even with limited or poor-quality data, providing a feasible approach for the future application of compressive sensing and CASSI systems in real-world scenes. However, compared with end-to-end neural networks, our approach lacks real-time performance, and real-world noise is far more complex than Gaussian and Poisson noise. In future work, we aim to reduce the iteration time of our method and extend it to real-world data.
References
Wang Jian-Yu, Shu Rong, Liu Yin-Nian, et al. Introduction to imaging spectroscopy [M]. Science Press (王建宇, 舒嵘, 刘银年, 等. 成像光谱技术导论 [M]. 科学出版社), 2011: 1-3, 105-107.
Wu P Z. Characteristics and applications of satellite-borne hyperspectral imaging spectrometer [J]. Remote Sensing of Land Resources, 1999(3): 10.
Ouyang Z Y. Scientific objectives of China's lunar exploration project [J]. Proceedings of the Chinese Academy of Sciences (欧阳自远. 中国探月工程的科学目标 [J]. 中国科学院院刊), 2006(5): 370-371.
Dun X, Fu Q, Li H T, et al. Advances in the frontiers of computational imaging [J]. Chinese Journal of Image Graphics, 2022, 27(6): 37.
Donoho D L. Compressed sensing [J]. IEEE Transactions on Information Theory, 2006, 52(4): 1289-1306. 10.1109/tit.2006.871582
Gehm M E, John R, Brady D J, et al. Single-shot compressive spectral imaging with a dual-disperser architecture [J]. Optics Express, 2007, 15(21): 14013-14027. 10.1364/oe.15.014013
Wagadarikar A, John R, Willett R, et al. Single disperser design for coded aperture snapshot spectral imaging [J]. Applied Optics, 2008, 47(10): B44-B51. 10.1364/ao.47.000b44
Meng Z, Qiao M, Ma J, et al. Snapshot hyperspectral endomicroscopy [J]. Optics Letters, 2020, 45(14). 10.1364/ol.393213
Candes E J, Wakin M B. An introduction to compressive sampling [J]. IEEE Signal Processing Magazine, 2008, 25(2): 21-30. 10.1109/msp.2007.914731
Tropp J A, Gilbert A C. Signal recovery from random measurements via orthogonal matching pursuit [J]. IEEE Transactions on Information Theory, 2007, 53(12): 4655-4666. 10.1109/tit.2007.909108
Blumensath T, Davies M E. Iterative hard thresholding for compressed sensing [J]. Applied and Computational Harmonic Analysis, 2009, 27(3): 265-274. 10.1016/j.acha.2009.04.002
Carrillo R E, Barner K E, et al. Lorentzian iterative hard thresholding: robust compressed sensing with prior information [J]. IEEE Transactions on Signal Processing, 2013, 61(19): 4822-4833. 10.1109/tsp.2013.2274275
Mun S, Fowler J E. Block compressed sensing of images using directional transforms [C]. IEEE International Conference on Image Processing (ICIP), 2009. 10.1109/icip.2009.5414429
Sun Y, Chen J, Liu Q, et al. Learning image compressed sensing with sub-pixel convolutional generative adversarial network [J]. Pattern Recognition, 2019, 98(12): 107051. 10.1016/j.patcog.2019.107051
Xiong Z, Shi Z, Li H, et al. HSCNN: CNN-based hyperspectral image recovery from spectrally undersampled projections [C]. IEEE International Conference on Computer Vision Workshops (ICCVW), 2017: 518-525. 10.1109/iccvw.2017.68
Choi I, Jeon D S, Nam G, et al. High-quality hyperspectral reconstruction using a spectral prior [C]. International Conference on Computer Graphics and Interactive Techniques. ACM, 2017. 10.1145/3130800.3130810
Wang L, Zhang T, Fu Y, et al. HyperReconNet: joint coded aperture optimization and image reconstruction for compressive hyperspectral imaging [J]. IEEE Transactions on Image Processing, 2019, 28(5): 2257-2270. 10.1109/tip.2018.2884076
Miao X, Yuan X, Pu Y, et al. λ-net: Reconstruct hyperspectral images from a snapshot measurement [C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 4059-4069. 10.1109/iccv.2019.00416
Meng Z, Ma J, Yuan X. End-to-end low cost compressive spectral imaging with spatial-spectral self-attention [C]. European Conference on Computer Vision. Springer, Cham, 2020. 10.1007/978-3-030-58592-1_12
Zheng S, Liu Y, Meng Z, et al. Deep plug-and-play priors for spectral snapshot compressive imaging [J]. Photonics Research, 2020, 9(2): I0011-I0022. 10.1364/prj.411745
Wang L, Wu Z, Zhong Y, et al. Spectral compressive imaging reconstruction using convolution and spectral contextual transformer [J]. arXiv e-prints, 2022. 10.1364/prj.458231
Chen Z, Cheng J. Proximal gradient descent unfolding dense-spatial spectral-attention transformer for compressive spectral imaging [J]. arXiv preprint arXiv:2312.16237, 2023.
Chen Y, Lai W, He W, et al. Hyperspectral compressive snapshot reconstruction via coupled low-rank subspace representation and self-supervised deep network [J]. IEEE Transactions on Image Processing, 2024. 10.1109/tip.2024.3354127
Luo F, Chen X, Gong X, et al. Dual-window multiscale transformer for hyperspectral snapshot compressive imaging [C]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(4): 3972-3980. 10.1609/aaai.v38i4.28190
Takabe T, Han X H, Chen Y W. Deep versatile hyperspectral reconstruction model from a snapshot measurement with arbitrary masks [C]. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024: 2390-2394. 10.1109/icassp48485.2024.10445895
Yang S, Ding X, Yuan H, et al. Reconstruction quality evaluation of compressed sensing image mapping spectrometer [C]. AOPC 2023: Computing Imaging Technology. SPIE, 2023, 12967: 57-65. 10.1117/12.3007797
Boyd S, Parikh N, Chu E, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers [J]. Foundations and Trends in Machine Learning, 2010, 3(1): 1-122. 10.1561/2200000016
Yuan X. Generalized alternating projection based total variation minimization for compressive sensing [C]. IEEE International Conference on Image Processing (ICIP), 2016. 10.1109/icip.2016.7532817
Lehtinen J, Munkberg J, Hasselgren J, et al. Noise2Noise: learning image restoration without clean data [J]. arXiv e-prints: 1803.04189, 2018.
Huang T, Li S, Jia X, et al. Neighbor2Neighbor: Self-supervised denoising from single noisy images [C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021: 14781-14790. 10.1109/cvpr46437.2021.01454
Zhang K, Zuo W, Zhang L. FFDNet: Toward a fast and flexible solution for CNN-based image denoising [J]. IEEE Transactions on Image Processing, 2018, 27(9): 4608-4622. 10.1109/tip.2018.2839891