Abstract
To ensure fusion quality and computational efficiency simultaneously, a novel image fusion method based on multi-scale Gaussian filtering and morphological transforms is proposed. The multi-scale Gaussian filtering is designed to decompose the source images into a series of detail images and approximation images. Multi-scale top- and bottom-hat decompositions are used to fully extract the bright and dark details of different scales in each approximation image, and multi-scale morphological inner- and outer-boundary decompositions are constructed to fully extract boundary information in each detail image. Experimental results demonstrate that the proposed method is comparable to, or even outperforms, typical multi-scale decomposition-based fusion methods, while operating much faster than advanced multi-scale decomposition-based methods such as NSCT and NSST.
Visible, infrared, and infrared polarization images individually captured by different sensors present complementary information of the same scene, and image fusion technology can combine them into a new, more accurate, comprehensive, and reliable image description of the scene.
For the decomposition scheme, various methods have been proposed, such as the discrete wavelet transform (DWT), the dual-tree complex wavelet transform (DTCWT), the stationary wavelet transform (SWT), the wavelet packet transform (WPT), the nonsubsampled contourlet transform (NSCT), and the nonsubsampled shearlet transform (NSST).
Fusion rules generally include low- and high-frequency coefficient fusion rules. The AVG-ABS rule is a simple fusion rule that uses the average rule to combine low-frequency coefficients and the absolute-maximum rule to combine high-frequency coefficients. The AVG-ABS rule is easy to compute and simple to implement; however, it tends to cause distortions and artifacts.
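As a minimal sketch (not the paper's code), the AVG-ABS rule for one pair of coefficient arrays can be written in a few lines of NumPy; the function and variable names here are ours:

```python
import numpy as np

def avg_abs_fuse(low_a, low_b, high_a, high_b):
    """AVG-ABS rule: average the low-frequency (approximation) coefficients
    and keep the high-frequency (detail) coefficient of larger magnitude."""
    low_f = 0.5 * (low_a + low_b)  # average rule
    high_f = np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)  # absolute-maximum rule
    return low_f, high_f
```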
To ensure both fusion quality and computational efficiency simultaneously, a novel multi-scale decomposition-based fusion method with dual decomposition structures is proposed. Our method is dedicated to improving image fusion quality and efficiency from the aspect of the image decomposition scheme, while for the fusion rule it only uses the simple AVG-ABS rule. Firstly, inspired by the idea of constructing octaves in SIFT, multi-scale Gaussian filtering is designed to decompose the source images into a series of detail images and an approximation image. Then, multi-scale top- and bottom-hat decompositions are applied to the approximation image, and multi-scale inner- and outer-boundary decompositions are applied to the detail images.
The theory and mathematical representation for constructing a multiresolution pyramid transform scheme are presented in Ref. 25 and extended in Ref. 26. A domain of signals $V_j$ is assigned to each level $j$ of the pyramid. The analysis operator $\psi_j^{\uparrow}: V_j \to V_{j+1}$ maps an image to a higher level in the pyramid, while the synthesis operator $\psi_j^{\downarrow}: V_{j+1} \to V_j$ maps an image to a lower level. The detail signal $y = x \,\dot{-}\, \hat{x}$ contains the information of $x$ that does not exist in $\hat{x} = \psi_j^{\downarrow}(\psi_j^{\uparrow}(x))$, where $\dot{-}$ is a subtraction operator mapping $V_j \times V_j$ into the detail set $W_j$. The decomposition process of an input image $f$ is expressed as Eq. 1:
$f = x_0 \to \{y_0, x_1\} \to \cdots \to \{y_0, y_1, \ldots, y_{K-1}, x_K\}$ ,  (1)

where

$x_{j+1} = \psi_j^{\uparrow}(x_j), \quad y_j = x_j \,\dot{-}\, \psi_j^{\downarrow}(x_{j+1})$ .  (2)
And the reconstruction process through the backward recursion is expressed as Eq.3:
$\hat{x}_K = x_K, \quad \hat{x}_j = \psi_j^{\downarrow}(\hat{x}_{j+1}) \,\dot{+}\, y_j, \quad j = K-1, \ldots, 1, 0$ ,  (3)
Eq. 1 and Eq. 3 are called the pyramid transform and the inverse pyramid transform, respectively.
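The recursion in Eqs. 1-3 is straightforward to express in code. The following is a generic sketch, assuming the abstract operators $\dot{-}$ and $\dot{+}$ reduce to ordinary pixel-wise subtraction and addition (as they do for the undecimated pyramids used later); all names and signatures are ours:

```python
from typing import Callable, List, Tuple
import numpy as np

Image = np.ndarray
Op = Callable[[Image, int], Image]  # level-dependent analysis/synthesis operator

def pyramid_transform(f: Image, analysis: Op, synthesis: Op,
                      levels: int) -> Tuple[List[Image], Image]:
    """Pyramid transform (Eq. 1): x_{j+1} = analysis(x_j), and the detail
    y_j = x_j - synthesis(x_{j+1}) keeps what the round trip loses (Eq. 2)."""
    x, details = f, []
    for j in range(levels):
        x_up = analysis(x, j)                    # one level up the pyramid
        details.append(x - synthesis(x_up, j))   # detail signal y_j
        x = x_up
    return details, x

def inverse_pyramid_transform(details: List[Image], approx: Image,
                              synthesis: Op) -> Image:
    """Inverse pyramid transform (Eq. 3): backward recursion
    x_j = synthesis(x_{j+1}) + y_j, down to the original level."""
    x = approx
    for j in reversed(range(len(details))):
        x = synthesis(x, j) + details[j]
    return x
```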
The scale space of an image can be generated by convolving the image with Gaussian filters, and this idea has been successfully applied in SIFT and SURF.
Inspired by the above algorithms, we repeatedly convolve the source image with Gaussian filters whose standard deviation and size increase simultaneously to construct an undecimated pyramid structure. Then, difference-of-Gaussian (DoG) images are produced by subtracting adjacent Gaussian-blurred images. Accordingly, the transform scheme of such a pyramid is given by Eq. 4:
$f = x_0 \to \{y_1, x_1\} \to \cdots \to \{y_1, y_2, \ldots, y_K, x_K\}$ ,  (4)

where

$x_j = G_{\sigma_j} * x_{j-1}, \quad y_j = x_{j-1} - x_j, \quad j = 1, 2, \ldots, K$ .  (5)
$G_{\sigma_j}$ denotes the Gaussian kernel (filter), whose size grows with its standard deviation, and $*$ denotes the convolution operation. The parameter $\sigma_j$ is the standard deviation, which increases with $j$; in this paper it grows geometrically from $\sigma_0$, i.e., $\sigma_{j+1} = k\,\sigma_j$. Then the source image $f$ can be decomposed into an approximation image $x_K$ and a set of detail images $\{y_1, \ldots, y_K\}$ as shown in scheme (4), and it can also be exactly reconstructed through the following recursion:
$\hat{x}_K = x_K, \quad \hat{x}_{j-1} = \hat{x}_j + y_j, \quad f = \hat{x}_0$ .  (6)
The four-level decomposition scheme is illustrated in Fig. 1.

Fig.1 Example of four-level decomposition by multi-scale Gaussian filtering
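As an illustrative sketch of Eqs. 4-6 (our own code with assumed default parameter values; SciPy's gaussian_filter stands in for the paper's fixed-size kernels):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_decompose(f, levels=4, sigma0=1.0, k=1.6):
    """Undecimated multi-scale Gaussian decomposition (Eqs. 4-5): repeatedly
    blur with a geometrically growing sigma; the details are DoG images."""
    x = f.astype(np.float64)
    details, sigma = [], sigma0
    for _ in range(levels):
        x_next = gaussian_filter(x, sigma)   # x_j = G_sigma * x_{j-1}
        details.append(x - x_next)           # y_j = x_{j-1} - x_j (DoG)
        x, sigma = x_next, sigma * k         # sigma_{j+1} = k * sigma_j
    return details, x                        # ({y_1..y_K}, x_K)

def gaussian_reconstruct(details, approx):
    """Exact reconstruction (Eq. 6): f = x_K + sum_j y_j."""
    return approx + sum(details)
```

Because nothing is downsampled, adding the approximation and all DoG details reproduces the input exactly.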
The multi-scale top-hat transform using structuring elements of up-scaling size can extract the bright and dark details at different image scales in image fusion. The multi-scale morphological bottom-hat transform and its inverse are shown as follows:
$x_{l+1} = \psi_l^{\uparrow}(x_l), \quad y_{l+1} = x_{l+1} - x_l$ ,  (7)

$\hat{x}_L = x_L, \quad \hat{x}_l = \hat{x}_{l+1} - y_{l+1}$ .  (8)
where the analysis operator is the morphological closing operation, $\psi_l^{\uparrow}(x) = x \bullet B_l$, with the size of the structuring element $B_l$ also increasing with $l$. The morphological outer-boundary transform and its inverse are similar to the bottom-hat transform and its inverse, with the closing operation replaced by the dilation operation $\oplus$.
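The four multi-scale morphological decompositions share one recursion and differ only in the analysis operator (opening, closing, erosion, or dilation) and in the sign of the detail. Below is a sketch using SciPy's grey-scale morphology; the square structuring elements and their sizes are our assumptions, since this excerpt does not fix them:

```python
import numpy as np
from scipy.ndimage import grey_opening, grey_closing, grey_erosion, grey_dilation

_OPS = {"top": grey_opening, "bottom": grey_closing,
        "inner": grey_erosion, "outer": grey_dilation}

def morph_decompose(x, kind, levels=3, base=3, step=2):
    """Multi-scale top-/bottom-hat or inner-/outer-boundary decomposition.
    'top'/'inner' details are x_l - x_{l+1}; 'bottom'/'outer' details are
    x_{l+1} - x_l, as in Eq. 7. The SE size grows with the level."""
    op, x, details = _OPS[kind], x.astype(np.float64), []
    for l in range(levels):
        size = base + l * step                  # up-scaling structuring element
        x_next = op(x, size=(size, size))       # analysis operator at level l
        d = x - x_next if kind in ("top", "inner") else x_next - x
        details.append(d)
        x = x_next
    return details, x

def morph_reconstruct(details, approx, kind):
    """Inverse transforms (Eq. 8): add back bright/inner-boundary details,
    subtract dark/outer-boundary details."""
    s = sum(details)
    return approx + s if kind in ("top", "inner") else approx - s
```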
The proposed fusion method comprises three processes: multi-scale decomposition, fusion, and reconstruction.
The K-level decomposition of a given source image $f$ by scheme (4) has the form

$f \to \{y_1, y_2, \ldots, y_K, x_K\}$ ,  (9)

where $y_j$ represents the detail image at level $j$ and $x_K$ denotes the approximation image of this multi-scale structure.
$x_K$ is a coarse representation of $f$ and usually inherits a few bright and dark details; thus the multi-scale top- and bottom-hat decompositions are used to extract bright objects on a dark background and dark objects on a bright background at different scales, respectively. Hence, $x_K$ can be decomposed by the schemes mentioned in subsection 1.3 as
$x_K \to \{y_1^{t}, \ldots, y_L^{t}, x_L^{t}\}, \quad x_K \to \{y_1^{b}, \ldots, y_L^{b}, x_L^{b}\}$ ,  (10)

where $y_l^{t}$ and $y_l^{b}$ represent the detail images at level $l$ obtained by the top- and bottom-hat decomposition processes, respectively, and $x_L^{t}$ and $x_L^{b}$ denote the approximation images of the multi-scale top- and bottom-hat structures, respectively.

Fig. 2 Example of three-level top- and bottom-hat decompositions of the input image f
The detail image $y_j$ in scheme (9) comprises various details like edges and lines; thus the multi-scale inner- and outer-boundary transforms mentioned in subsection 1.3 are used to extract inner- and outer-boundary information at different scales. Hence, $y_j$ can be decomposed as
$y_j \to \{y_{j,1}^{i}, \ldots, y_{j,L}^{i}, x_{j,L}^{i}\}, \quad y_j \to \{y_{j,1}^{o}, \ldots, y_{j,L}^{o}, x_{j,L}^{o}\}$ ,  (11)

where $y_{j,l}^{i}$ and $y_{j,l}^{o}$ represent the detail images at level $l$ of $y_j$ obtained by the inner- and outer-boundary decomposition processes, respectively, and $x_{j,L}^{i}$ and $x_{j,L}^{o}$ are the approximation images of $y_j$ at the highest level of the multi-scale inner- and outer-boundary structures, respectively.

Fig. 3 Example of three-level inner- and outer-boundary decompositions of the input image $y_2$
In this paper, the composite approximation coefficients of the approximation images in the multi-scale top- and bottom-hat structures take the average of the approximations of the two sources. For the composite detail coefficients of the detail images, the absolute-maximum selection rule is used.
The vector coordinate $\mathbf{n}$ is used here to denote a location in an image. For instance, $y_l^{t,A}(\mathbf{n})$ represents the detail coefficient of the multi-scale top-hat structure at location $\mathbf{n}$ within level $l$ of source image A. The notation without a coordinate denotes an image; e.g., $y_l^{t,A}$ refers to the detail image itself.
An arbitrary fused detail coefficient $y_l^{t,F}(\mathbf{n})$ and the fused approximation coefficient $x_L^{t,F}(\mathbf{n})$ of the multi-scale top-hat structure are obtained through
$y_l^{t,F}(\mathbf{n}) = \max\!\left(y_l^{t,A}(\mathbf{n}),\, y_l^{t,B}(\mathbf{n})\right), \quad x_L^{t,F}(\mathbf{n}) = w_A\, x_L^{t,A}(\mathbf{n}) + w_B\, x_L^{t,B}(\mathbf{n})$ .  (12)
The weights $w_A$ and $w_B$ take 0.5, which preserves the mean intensity of the two source images. Likewise, $y_l^{b,F}(\mathbf{n})$ and $x_L^{b,F}(\mathbf{n})$ of the multi-scale bottom-hat structure are obtained through
$y_l^{b,F}(\mathbf{n}) = \max\!\left(y_l^{b,A}(\mathbf{n}),\, y_l^{b,B}(\mathbf{n})\right), \quad x_L^{b,F}(\mathbf{n}) = w_A\, x_L^{b,A}(\mathbf{n}) + w_B\, x_L^{b,B}(\mathbf{n})$ ,  (13)

with $w_A = w_B = 0.5$.
The selection rule in Eq. 12 means that we choose the brighter ones among the bright details, and the selection rule in Eq. 13 means that we choose the darker ones among the dark details. In this way, the bright and dark details at different scales can be fully extracted, and hence the contrast at each level can be improved.
For an arbitrary fused detail coefficient $y_{j,l}^{i,F}(\mathbf{n})$ of the multi-scale inner-boundary structures, we only use the absolute-maximum selection rule:
$y_{j,l}^{i,F}(\mathbf{n}) = \begin{cases} y_{j,l}^{i,A}(\mathbf{n}), & \text{if } \left|y_{j,l}^{i,A}(\mathbf{n})\right| \ge \left|y_{j,l}^{i,B}(\mathbf{n})\right| \\ y_{j,l}^{i,B}(\mathbf{n}), & \text{otherwise} \end{cases}$ .  (14)
So is the fused approximation coefficient $x_{j,L}^{i,F}(\mathbf{n})$. In this way, boundary information such as edges and lines at different scales can be well preserved. Likewise, arbitrary $y_{j,l}^{o,F}(\mathbf{n})$ and $x_{j,L}^{o,F}(\mathbf{n})$ of the multi-scale outer-boundary structures are obtained by the absolute-maximum selection rule as well.
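A compact sketch of the rules in Eqs. 12-14 (our own code, in NumPy):

```python
import numpy as np

def fuse_hat(det_a, det_b, app_a, app_b, w_a=0.5, w_b=0.5):
    """Eqs. 12-13: the per-pixel maximum of the (non-negative) hat details
    picks the brighter bright detail / darker dark detail; the approximation
    coefficients are averaged."""
    fused_details = [np.maximum(a, b) for a, b in zip(det_a, det_b)]
    fused_approx = w_a * app_a + w_b * app_b
    return fused_details, fused_approx

def abs_max(a, b):
    """Eq. 14: absolute-maximum selection."""
    return np.where(np.abs(a) >= np.abs(b), a, b)

def fuse_boundary(det_a, det_b, app_a, app_b):
    """Inner-/outer-boundary structures: absolute-maximum rule for both
    the detail and the approximation coefficients."""
    return [abs_max(a, b) for a, b in zip(det_a, det_b)], abs_max(app_a, app_b)
```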
According to Eqs. 6 and 8, the reconstruction of the fused approximation image $x_K^{F}$ can be obtained through the multi-scale top- and bottom-hat inverse transforms as
$x_K^{F} = \frac{1}{2}\left[\left(x_L^{t,F} + \sum_{l=1}^{L} w_l\, y_l^{t,F}\right) + \left(x_L^{b,F} - \sum_{l=1}^{L} w_l\, y_l^{b,F}\right)\right]$ ,  (15)

where the factor $\frac{1}{2}$ means that bright and dark information are of equal importance to the source image. In addition, we attach equal importance to the features at different scale levels; thus the weights $w_l$ in Eq. 15 are set to be 1.
Similarly, inner- and outer-boundary information are considered equally important to the source image, and so are the features at different scale levels. Thus, according to Eqs. 6 and 8, the reconstruction of an arbitrary fused detail image $y_j^{F}$ through the multi-scale inner- and outer-boundary inverse transforms can be obtained as

$y_j^{F} = \frac{1}{2}\left[\left(x_{j,L}^{i,F} + \sum_{l=1}^{L} y_{j,l}^{i,F}\right) + \left(x_{j,L}^{o,F} - \sum_{l=1}^{L} y_{j,l}^{o,F}\right)\right]$ .  (16)
At last, the fused image $F$ can be reconstructed by

$F = x_K^{F} + \sum_{j=1}^{K} y_j^{F}$ .  (17)
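Putting Eqs. 9-17 together, the whole pipeline can be sketched as below, reusing the helper functions from the previous sketches (again our own code with assumed defaults, not the authors' implementation):

```python
def fuse_images(img_a, img_b, K=3, L=3, sigma0=1.0, k=1.6):
    """End-to-end sketch of the proposed method (Eqs. 9-17)."""
    det_a, app_a = gaussian_decompose(img_a, K, sigma0, k)   # scheme (9)
    det_b, app_b = gaussian_decompose(img_b, K, sigma0, k)

    # Approximation image: top- and bottom-hat branches (Eqs. 10, 12, 13, 15).
    hat_recs = []
    for kind in ("top", "bottom"):
        da, xa = morph_decompose(app_a, kind, L)
        db, xb = morph_decompose(app_b, kind, L)
        df, xf = fuse_hat(da, db, xa, xb)
        hat_recs.append(morph_reconstruct(df, xf, kind))
    fused_approx = 0.5 * (hat_recs[0] + hat_recs[1])         # Eq. 15

    # Detail images: inner- and outer-boundary branches (Eqs. 11, 14, 16).
    fused_details = []
    for ya, yb in zip(det_a, det_b):
        recs = []
        for kind in ("inner", "outer"):
            da, xa = morph_decompose(ya, kind, L)
            db, xb = morph_decompose(yb, kind, L)
            df, xf = fuse_boundary(da, db, xa, xb)
            recs.append(morph_reconstruct(df, xf, kind))
        fused_details.append(0.5 * (recs[0] + recs[1]))      # Eq. 16

    return fused_approx + sum(fused_details)                 # Eq. 17
```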
In order to validate the performance of the proposed method, experiments are conducted on two categories of source images, including ten pairs of infrared-visible images (Fig. 4(a)) and several pairs of infrared intensity-polarization images (Fig. 4(b)).

Fig. 4 The two kinds of source images (a) infrared-visible images, (b) infrared intensity-polarization images
Various pixel-level multi-scale decomposition-based methods, including DWT, DTCWT, SWT, WPT, NSCT, and NSST, are compared with the proposed method. All the compared methods adopt the simple AVG-ABS rule. According to Ref. 13, most of the methods mentioned above perform well when their decomposition levels are set to 3. Thus, to make the comparisons reliable and persuasive, the decomposition levels of all the above methods are set to 3. To let each method achieve good performance, the other parameters are also set as suggested by Ref. 13, some of which are listed in Table 1.
For NSST, the sizes of the local support of the shearing filters at each level are selected as 8, 16, and 32. As for the proposed method, the parameters $\sigma_0$ and $k$ for the multi-scale Gaussian filtering process in Eq. 5 are selected experimentally. In this experiment, the source images are decomposed by the 3-level multi-scale Gaussian decomposition, and different fused images are obtained by varying $\sigma_0$ and $k$. During the fusion process, the AVG-ABS rule is also adopted. For each setting of $\sigma_0$ and $k$, every fused image is evaluated by the seven objective assessment metrics (mentioned in subsection 3.2). For each metric, its mean value is obtained by averaging the evaluation results of the fused images. Then, the seven mean values are summed to get the overall objective score.
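This selection procedure amounts to a simple grid search; a sketch is given below (our code, reusing fuse_images from the previous sketch; for brevity a single stand-in metric, image entropy, replaces the seven metrics used in the paper):

```python
import itertools
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy of an image, one simple stand-in for the seven metrics."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_sigma0_k(image_pairs, sigma0_grid, k_grid, metrics=(entropy,)):
    """Fuse every pair for each (sigma0, k), average each metric over the
    fused images, sum the means, and keep the best-scoring setting."""
    best, best_score = None, -np.inf
    for s0, k in itertools.product(sigma0_grid, k_grid):
        fused = [fuse_images(a, b, sigma0=s0, k=k) for a, b in image_pairs]
        score = sum(np.mean([m(f) for f in fused]) for m in metrics)
        if score > best_score:
            best, best_score = (s0, k), score
    return best
```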

Fig. 5 Estimation of the parameters $\sigma_0$ and $k$ for (a) infrared-visible images, (b) infrared intensity-polarization images
Seven representative metrics, i.e., Q0, QW, QAB/F, entropy (EN), mutual information (MI), Tamura contrast, and visual information fidelity for fusion (VIFF), are adopted to objectively assess the fusion performance.
In this section, the subjective assessment of the fusion methods is done by comparing the visual results obtained from the above methods and the proposed method. One sample pair of each type of source images is selected for visual comparison, as shown in Figs. 6 and 7.

Fig. 6 Fusion results of one pair of the infrared-visible images (a) infrared image, (b) visible image, (c)-(i) the fusion results of the DWT, DTCWT, SWT, WPT, NSCT, NSST, and the proposed methods.

Fig. 7 Fusion results of one pair of the infrared intensity-polarization images (a) Infrared intensity image, (b) Infrared polarization image, (c)-(i) the fusion results of the DWT, DTCWT, SWT, WPT, NSCT, NSST, and the proposed methods.
In Figs. 6 and 7, the fusion results of the compared methods exhibit varying degrees of distortion. The edges of the car are distorted heavily in the results of several compared methods, while the proposed method preserves them well.
The above experiments confirm that the proposed method performs better in visual effect for the two categories of source images. Although adopting the simple AVG-ABS rule, the proposed method does not generate obvious artifacts or distortions, and it preserves the detail information of the source images as much as possible.
The objective assessment results of the seven multi-scale decomposition-based methods are shown in Tables 2 and 3.
To verify the efficiency of the proposed method, an experiment is conducted on the image sequences named "Nato_camp", "Tree", and "Dune" from the TNO Image Fusion Dataset.
Experiments on both visual quality and objective assessment demonstrate that, although adopting the simple AVG-ABS rule, the proposed method does not generate obvious artifacts or distortions and performs very well in aspects like information preservation and contrast improvement. Under the premise of ensuring fusion quality, the proposed method is also proved to be computationally efficient. It provides an option for fusion scenarios requiring both high quality and, particularly, computational efficiency, such as fast high-resolution image fusion and video fusion.
Acknowledgements
This work is supported by National Natural Science Foundation of China (61672472 and 61702465), the Shanxi Province Science Foundation for Youths (201901D211238), the Program of Graduate Innovation in Shanxi Province (2019BY108), and Science Foundation of North University of China (SZ20190011).
References
Ma J, Ma Y, Li C. Infrared and visible image fusion methods and applications: A survey [J]. Information Fusion, 2019, 45: 153-178.
Jin X, Jiang Q, Yao S, et al. A survey of infrared and visual image fusion methods [J]. Infrared Physics & Technology, 2017, 85: 478-501.
Yang F, Wei H. Fusion of infrared polarization and intensity images using support value transform and fuzzy combination rules [J]. Infrared Physics & Technology, 2013, 60: 235-243.
Hu P, Yang F, Wei H, et al. Research on constructing difference-features to guide the fusion of dual-modal infrared images [J]. Infrared Physics & Technology, 2019, 102: 102994.
Li S, Kang X, Fang L, et al. Pixel-level image fusion: A survey of the state of the art [J]. Information Fusion, 2017, 33: 100-112.
Amolins K, Zhang Y, Dare P. Wavelet based image fusion techniques: An introduction, review and comparison [J]. ISPRS Journal of Photogrammetry & Remote Sensing, 2007, 62(4): 249-263.
Selesnick I W, Baraniuk R G, Kingsbury N C. The dual-tree complex wavelet transform [J]. IEEE Signal Processing Magazine, 2005, 22(6): 123-151.
Singh D, Garg D, Pannu H S. Efficient Landsat image fusion using fuzzy and stationary discrete wavelet transform [J]. The Imaging Science Journal, 2017, 65(2): 108-114.
Walczak B, Bogaert B V D, Massart D L. Application of wavelet packet transform in pattern recognition of near-IR data [J]. Analytical Chemistry, 1996, 68(10): 1742-1747.
Da Cunha A L, Zhou J, Do M N. The nonsubsampled contourlet transform: Theory, design, and applications [J]. IEEE Transactions on Image Processing, 2006, 15(10): 3089-3101.
Zhu Z, Zheng M, Qi G, et al. A phase congruency and local Laplacian energy based multi-modality medical image fusion method in NSCT domain [J]. IEEE Access, 2019, 7: 20811-20824.
Ming Y, Wei L, Xia Z, et al. A novel image fusion algorithm based on nonsubsampled shearlet transform [J]. Optik - International Journal for Light and Electron Optics, 2014, 125(10): 2274-2282.
Li S, Yang B, Hu J. Performance comparison of different multi-resolution transforms for image fusion [J]. Information Fusion, 2011, 12(2): 74-84.
Li S, Kang X, Hu J. Image fusion with guided filtering [J]. IEEE Transactions on Image Processing, 2013, 22(7): 2864-2875.
Du J, Li W, Xiao B. Anatomical-functional image fusion by information of interest in local Laplacian filtering domain [J]. IEEE Transactions on Image Processing, 2017, 26(12): 5855-5865.
Bhatnagar G, Wu Q M J, Liu Z. Directive contrast based multimodal medical image fusion in NSCT domain [J]. IEEE Transactions on Multimedia, 2013, 15(5): 1014-1024.
Gong J, Wang B, Lin Q, et al. Image fusion method based on improved NSCT transform and PCNN model [C]. 2016 9th International Symposium on Computational Intelligence and Design (ISCID), 2016.
Ma T, Ma J, Fang B, et al. Multi-scale decomposition based fusion of infrared and visible image via total variation and saliency analysis [J]. Infrared Physics & Technology, 2018, 92: 154-162.
Yin M, Liu X, Liu Y, et al. Medical image fusion with parameter-adaptive pulse coupled neural network in nonsubsampled shearlet transform domain [J]. IEEE Transactions on Instrumentation & Measurement, 2019, 68(1): 49-64.
Li Y, Sun Y, Huang X, et al. An image fusion method based on sparse representation and sum modified-Laplacian in NSCT domain [J]. Entropy, 2018, 20(7): 522.
Lowe D G. Distinctive image features from scale-invariant keypoints [J]. International Journal of Computer Vision, 2004, 60(2): 91-110.
Bay H, Ess A, Tuytelaars T, et al. Speeded-up robust features (SURF) [J]. Computer Vision & Image Understanding, 2008, 110(3): 346-359.
Mukhopadhyay S, Chanda B. Fusion of 2D grayscale images using multiscale morphology [J]. Pattern Recognition, 2001, 34(10): 1939-1949.
Bai X, Gu S, Zhou F, et al. Multiscale top-hat selection transform based infrared and visual image fusion with emphasis on extracting regions of interest [J]. Infrared Physics & Technology, 2013, 60(5): 81-93.
Goutsias J, Heijmans H J A M. Nonlinear multiresolution signal decomposition schemes - Part I: Morphological pyramids [J]. IEEE Transactions on Image Processing, 2000, 9(11): 1862-1876.
Piella G. A general framework for multiresolution image fusion: From pixels to regions [J]. Information Fusion, 2003, 4(4): 259-280.
Wang Z, Bovik A C. A universal image quality index [J]. IEEE Signal Processing Letters, 2002, 9(3): 81-84.
Piella G, Heijmans H. A new quality metric for image fusion [C]. Proceedings of the International Conference on Image Processing, 2003.
Xydeas C S, Petrović V. Objective image fusion performance measure [J]. Military Technical Courier, 2008, 56(2): 181-193.
Roberts J W, van Aardt J A, Ahmed F B. Assessment of image fusion procedures using entropy, image quality, and multispectral classification [J]. Journal of Applied Remote Sensing, 2008, 2(1): 023522.
Qu G, Zhang D, Yan P. Information measure for performance of image fusion [J]. Electronics Letters, 2002, 38(7): 313-315.
Tamura H, Mori S, Yamawaki T. Textural features corresponding to visual perception [J]. IEEE Transactions on Systems, Man, and Cybernetics, 1978, 8(6): 460-473.
Han Y, Cai Y, Cao Y, et al. A new image fusion performance metric based on visual information fidelity [J]. Information Fusion, 2013, 14(2): 127-135.