# Multi-focus Noisy Image Fusion using Low-Rank Representation

###### Abstract

In the process of image acquisition, noise in the source images is inevitable, which makes multi-focus noisy image fusion a very challenging task; no truly adaptive noisy image fusion approach exists at present. As is well known, low-rank representation (LRR) is robust to noise and outliers and can capture the global structure of data. In this paper, we propose a novel LRR-based method for multi-focus noisy image fusion. To the best of our knowledge, this is the first time that LRR is introduced to multi-focus image fusion. In the discrete wavelet transform (DWT) framework, the low frequency coefficients are fused by spatial frequency and the high frequency coefficients are fused by LRR coefficients. Finally, the fused image is obtained by the inverse DWT. Experimental results demonstrate that the proposed algorithm achieves state-of-the-art performance in both qualitative and quantitative evaluations when the source images contain noise.

###### keywords:

multi-focus image fusion, low-rank representation, noisy image fusion^{†}

^{†}journal: IET Image Processing

## 1 Introduction

Multi-focus image fusion is an important technique in the image processing community. Its main purpose is to generate a fused image by integrating complementary information from multiple source images of the same scene [1]. In recent years, this technique has become an active research topic, and it has been used in many fields, such as medical imaging, remote sensing and computer vision.

Multi-focus image fusion methods can be divided into two categories: non-representation learning-based methods and representation learning-based methods.

In non-representation learning-based fusion methods, multi-scale transforms are the most commonly used, such as the discrete wavelet transform (DWT) [2], contourlet [3] and shearlet [4]. Because the wavelet transform has limited detail-preservation ability, the nonsubsampled contourlet transform (NSCT) was successfully applied to image fusion in reference [5]. In addition, morphology, which is also a non-representation learning technique, has been applied to image fusion. Yu Zhang et al. [6] proposed a fusion method based on morphological gradients: detail information (such as texture and edges) is obtained by different morphological gradient operators and is then used to extract the focus boundary region, the focus region and the defocus region, respectively. Finally, the fused image is obtained by an appropriate fusion strategy.

In representation learning-based fusion methods, both the convolutional neural network (CNN) technique and the sparse representation (SR) technique have various applications in image processing. Yu Liu et al. [7] proposed a CNN-based image fusion method: a CNN is trained on high-quality image patches and their blurred versions, a decision map (a binary matrix) is obtained from the output of the CNN, and the fused image is then generated from the decision map.

Sparse representation (SR) [8, 9] is a classical technique among representation learning-based methods. Although SR-based image fusion performs well in some fusion tasks, it suffers from two main drawbacks: 1) its ability to preserve detail is limited; 2) it is highly sensitive to misregistration. To address these drawbacks, Yu Liu et al. [10] proposed a novel image fusion method based on convolutional sparse representation (CSR), which successfully overcomes both.

Besides these two drawbacks, SR-based image fusion also cannot capture the global structure of an image. Furthermore, when the source images contain noise, the fusion performance of the above methods degrades.

To address these problems, we apply a new representation learning technique, low-rank representation (LRR), to the multi-focus image fusion task. As is well known, LRR is robust to noise and outliers and can capture the global structure of data, which makes it a perfect tool for multi-focus noisy image fusion. In this study, we propose a novel LRR-based multi-focus image fusion method for the noisy image fusion task; the method is introduced in Section 3.

## 2 LRR theory

In order to capture the global structure of data, G. Liu et al. [13] proposed a novel representation method, namely low-rank representation (LRR).

In reference [13], the authors apply a self-expression model to avoid training a dictionary, and the LRR problem is solved by the following optimization problem,

$$\min_{Z,E}\ \|Z\|_{*} + \lambda\|E\|_{2,1} \quad \text{s.t.}\quad X = XZ + E \qquad (1)$$

where $X$ denotes the observed data matrix, $\|\cdot\|_{*}$ denotes the nuclear norm (the sum of the singular values of a matrix), $\|\cdot\|_{2,1}$ is the $\ell_{2,1}$-norm, and $\lambda$ is the balance coefficient. Eq. 1 is solved by the inexact Augmented Lagrange Multiplier (ALM) method. Finally, the LRR coefficient matrix $Z$ for $X$ is obtained from Eq. 1.
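As an illustrative sketch (not the authors' implementation), the inexact ALM iterations for Eq. 1 can be written in NumPy as follows. The auxiliary variable `J`, the penalty schedule (`mu`, `rho`, `mu_max`) and the iteration count are standard choices assumed here:

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def l21_shrink(Q, tau):
    """Column-wise l2 shrinkage: the proximal operator of tau * l_{2,1}-norm."""
    norms = np.linalg.norm(Q, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return Q * scale

def lrr(X, lam=0.5, iters=300, mu=1e-2, rho=1.1, mu_max=1e6):
    """Inexact ALM for  min ||Z||_* + lam*||E||_{2,1}  s.t.  X = X Z + E."""
    n = X.shape[1]
    Z = np.zeros((n, n)); J = np.zeros((n, n))
    E = np.zeros_like(X); Y1 = np.zeros_like(X); Y2 = np.zeros((n, n))
    XtX = X.T @ X
    I = np.eye(n)
    for _ in range(iters):
        J = svt(Z + Y2 / mu, 1.0 / mu)                       # nuclear-norm step
        Z = np.linalg.solve(I + XtX,
                            XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        E = l21_shrink(X - X @ Z + Y1 / mu, lam / mu)        # l_{2,1} step
        Y1 += mu * (X - X @ Z - E)                           # multiplier updates
        Y2 += mu * (Z - J)
        mu = min(rho * mu, mu_max)
    return Z, E
```

Here `Z` is the LRR coefficient matrix used later in the fusion rule, while `E` absorbs noise and outliers, which is the robustness property the method relies on.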

## 3 The Proposed Image Fusion method

In this section, we propose a novel method based on LRR theory in the DWT domain. Firstly, the source images are decomposed by the DWT; in this paper, the decomposition level of the DWT is set to 2. This yields one low frequency coefficient matrix and six high frequency coefficient matrices, as shown in Figure 1.
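The wavelet family is not specified above; as a minimal stand-in, one level of a 2-D Haar DWT can be sketched as below (applying it again to the LL band gives the 2-level decomposition with one low frequency matrix and six high frequency matrices):

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar DWT: returns LL and the (LH, HL, HH) detail bands."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # row differences
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, (LH, HL, HH)

def haar_idwt2(LL, bands):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    LH, HL, HH = bands
    a = np.empty((LL.shape[0], LL.shape[1] * 2))
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d = np.empty_like(a)
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    out = np.empty((a.shape[0] * 2, a.shape[1]))
    out[0::2, :], out[1::2, :] = a + d, a - d
    return out
```

Any orthogonal wavelet with an exact inverse would serve the same role in the pipeline.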

Then, the spatial frequency (SF) and the choose-max scheme are used to fuse the low frequency coefficients, since the low frequency coefficients reflect the contour information of the source image. The high frequency coefficients contain more detail information, so we use LRR to obtain a low-rank matrix and then apply the nuclear norm and the choose-max scheme to fuse them. Finally, the fused image is obtained by the inverse DWT. The system diagram of the proposed method is shown in Figure 2.

### 3.1 Fusion of low frequency coefficients

The low frequency coefficients contain more contour information and less detail texture information. Thus, SF [7] is used to fuse the low frequency coefficients. The SF is calculated by Eqs. 2-4,

$$RF = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=2}^{N}\left[F(i,j)-F(i,j-1)\right]^{2}} \qquad (2)$$

$$CF = \sqrt{\frac{1}{MN}\sum_{i=2}^{M}\sum_{j=1}^{N}\left[F(i,j)-F(i-1,j)\right]^{2}} \qquad (3)$$

$$SF = \sqrt{RF^{2}+CF^{2}} \qquad (4)$$

where $RF$ and $CF$ are the spatial frequencies in the row and column directions, and $M$ and $N$ are the numbers of rows and columns of the image.

By the sliding window technique, the coefficient matrices are divided into $k$ patches. Then the SF values of corresponding coefficient patches are obtained by Eqs. 2-4. Finally, we use the choose-max scheme to get the fused low frequency coefficients.

Let $SF_{C}^{i}$ denote the SF value of the $i$-th patch, where $C$ indicates which source image the patch comes from. The fused low frequency coefficients are then obtained by Eq. 5.

$$L_{F}^{i} = \begin{cases} L_{1}^{i}, & SF_{1}^{i} \geq SF_{2}^{i} \\ L_{2}^{i}, & \text{otherwise} \end{cases} \qquad (5)$$
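Eqs. 2-5 can be sketched directly in NumPy; `spatial_frequency` and `fuse_low_patches` are illustrative names, and ties are broken in favour of the first source:

```python
import numpy as np

def spatial_frequency(F):
    """Spatial frequency of a patch F (Eqs. 2-4)."""
    M, N = F.shape
    rf = np.sqrt(np.sum(np.diff(F, axis=1) ** 2) / (M * N))  # row frequency, Eq. 2
    cf = np.sqrt(np.sum(np.diff(F, axis=0) ** 2) / (M * N))  # column frequency, Eq. 3
    return np.sqrt(rf ** 2 + cf ** 2)                        # Eq. 4

def fuse_low_patches(p1, p2):
    """Choose-max rule of Eq. 5 for one pair of corresponding low frequency patches."""
    return p1 if spatial_frequency(p1) >= spatial_frequency(p2) else p2
```

A flat (defocused) patch has low SF, so the rule keeps whichever patch carries more local variation.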

### 3.2 Fusion of high frequency coefficients

Let $H_{C}^{l,\theta}$ denote the high frequency coefficients obtained by the 2-level DWT, where $\theta$ represents the direction of decomposition ($h$ for horizontal, $d$ for diagonal, $v$ for vertical), $C$ denotes which source image the coefficients come from, and $l$ represents the decomposition level. Each level yields 3 high frequency coefficient matrices, so we get 6 in total. The fusion strategy for the high frequency coefficients is shown in Figure 3.

By the sliding window technique, each high frequency coefficient matrix is divided into patches, with a window of size $16\times16$. Let $H_{C}^{l,\theta,i}$ denote the $i$-th patch, where $l$ is the DWT level and $\theta$ is the direction of decomposition. Then two low-rank coefficient matrices $Z_{1}$ and $Z_{2}$ are obtained by LRR for the corresponding patches from the two source images. The locally fused high frequency coefficients are obtained by comparing the nuclear norms of the corresponding low-rank coefficient matrices.
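The sliding window step can be sketched as below; a non-overlapping $16\times16$ window is assumed, since the stride is not stated in the paper:

```python
import numpy as np

def sliding_patches(coeffs, win=16, step=16):
    """Divide a coefficient matrix into win-by-win patches (non-overlapping by default)."""
    H, W = coeffs.shape
    return [coeffs[i:i + win, j:j + win]
            for i in range(0, H - win + 1, step)
            for j in range(0, W - win + 1, step)]
```

Setting `step < win` would give the overlapping variant sometimes used to suppress blocking artifacts.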

Consider a local high frequency coefficient matrix $H$ (one patch). The self-expression model is applied, so $H$ itself is used as the dictionary:

$$\min_{Z,E}\ \|Z\|_{*} + \lambda\|E\|_{2,1} \quad \text{s.t.}\quad H = HZ + E \qquad (6)$$

We choose the inexact ALM to solve problem (6) and obtain the low-rank coefficient matrix $Z$. Its nuclear norm is computed as the sum of the singular values of $Z$,

$$\|Z\|_{*} = \sum_{i}\sigma_{i}(Z) \qquad (7)$$

where $Z$ denotes the low-rank representation of $H$ and $\sigma_{i}(Z)$ is its $i$-th singular value. As shown in Eq. 7, we propose to use the nuclear norm of the low-rank coefficient matrix as the criterion for fusing the high frequency coefficients. Finally, the choose-max scheme is applied to obtain the fused high frequency coefficients $H_F$.
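With the choose-max scheme, the fusion rule of Eq. 7 reduces to a comparison of nuclear norms; in this sketch `Z1` and `Z2` stand for the LRR coefficient matrices of the two corresponding high frequency patches:

```python
import numpy as np

def nuclear_norm(Z):
    """Sum of singular values (Eq. 7)."""
    return np.linalg.svd(Z, compute_uv=False).sum()

def fuse_high_patches(h1, h2, Z1, Z2):
    """Keep the patch whose LRR coefficient matrix has the larger nuclear norm."""
    return h1 if nuclear_norm(Z1) >= nuclear_norm(Z2) else h2
```

Because the noise and outliers are absorbed by $E$ in Eq. 6, the comparison is driven by the clean low-rank structure rather than by the noise energy.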

### 3.3 Reconstruction of fused image

Having the fused coefficients $L_F$ and $H_F$, the fused image $F$ is reconstructed by the inverse DWT, as shown in Figure 4.

The procedure of our method is described as follows.

1) The source images are decomposed by the 2-level DWT. The low frequency coefficients $L_C$ and the high frequency coefficients $H_{C}^{l,\theta}$ are obtained, where $C$ indexes the source images, $l \in \{1,2\}$, and $\theta \in \{h, d, v\}$.

2) By the sliding window technique, the low frequency coefficients and each high frequency coefficient matrix are divided into patches.

3) For the low frequency coefficients, we use SF and the choose-max scheme to fuse them.

4) For the high frequency coefficients, we use LRR to compute the low-rank coefficient matrices, and use their nuclear norms with the choose-max scheme to obtain the fused high frequency coefficients.

5) Finally, the fused image is obtained by inverse DWT.

## 4 Experimental results and discussion

This section first presents the detailed experimental settings. Then we introduce how the LRR parameter ($\lambda$ in Eq. 1) is chosen in different situations. Finally, the experimental results are analyzed visually and quantitatively.

### 4.1 Experimental settings

Firstly, we choose ten sport images from ImageNet (http://www.image-net.org/index), as shown in Figure 5. We blur these images to obtain the source images: a Gaussian smoothing filter is applied to different regions of each image to build the source image sets, as shown in Figure 6. Furthermore, Gaussian noise (with several variances), salt & pepper noise (with noise densities 0.01 and 0.02) and Poisson noise are added to these blurred images. We then use these images to determine the parameter $\lambda$ in each situation and to compare the fusion methods. "image1" with the three different kinds of noise is shown in Figures 6, 7 and 8.

Secondly, the performance of the proposed method is evaluated against other baseline methods, including: the cross bilateral filter fusion method (CBF) [11], the discrete cosine harmonic wavelet transform fusion method (DCHWT) [12], and the sparse-representation-based image fusion approach (SR) [9].

For quantitative comparison between the proposed method and the other fusion methods, two quality metrics are utilized: Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM). Both are reference-based metrics; larger PSNR and SSIM values indicate a better fused image.
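For reference, PSNR against the original all-in-focus image can be computed as below (SSIM is more involved and usually taken from a library, so only PSNR is sketched; the peak value 255 assumes 8-bit images):

```python
import numpy as np

def psnr(ref, img, peak=255.0):
    """Peak Signal-to-Noise Ratio of img against the reference image ref."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(img, float)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```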

In our experiments, the decomposition level of the DWT is set to 2 and the sliding window size is 16×16, which makes our method faster. The parameter $\lambda$ of the low-rank representation in different situations is determined in the next section.

### 4.2 Effects of the parameter $\lambda$

In Eq. 6, the parameter $\lambda$ balances the low rank part and the sparse error part. In this section, we choose image1-5 (Figure 5 a-e) and their blurred and noisy versions as the source images to determine $\lambda$.

The range of $\lambda$ is set to [1, 100] for Gaussian noise and salt & pepper noise, and to [1, 20] for Poisson noise, with a step of 0.5. Three situations are considered: the source images contain Gaussian noise, salt & pepper noise, or Poisson noise. We use the average SSIM to determine the $\lambda$ that will be used in the next section.

As shown in Figure 9, when the source images contain Gaussian noise (a) or Poisson noise (c), the average SSIM reaches a maximum at a particular value of $\lambda$. When the source images contain salt & pepper noise (b), the average SSIM increases monotonically with $\lambda$.

Therefore, when the source images contain salt & pepper noise we choose the largest value in the tested range ($\lambda = 100$), and when they contain Gaussian or Poisson noise we choose the values of $\lambda$ that maximize the average SSIM in Figure 9.

### 4.3 Image fusion results

In this section, fifteen pairs of images (obtained by adding different noise to the source images) are used to assess the performance of these methods numerically. All experiments are implemented in MATLAB R2016a on a 3.2 GHz Intel(R) Core(TM) CPU with 4 GB RAM.

We present fusion results for three differently degraded versions of the source images from ImageNet. The fused results for blurred images with Gaussian noise are presented in Section 4.3.1, with salt & pepper noise in Section 4.3.2, and with Poisson noise in Section 4.3.3.

#### 4.3.1 Fusion on images containing Gaussian noise

When the source images contain noise, such as Gaussian noise or salt & pepper noise, the fusion methods perform differently. In this experiment, we choose Gaussian noise with variances 0.0005 and 0.001 to assess performance. As an example, the fused images of the source images "image6", which contain Gaussian noise, are shown in Figure 10. The values of PSNR and SSIM for the five pairs of images are shown in Table 1 and Table 2.

| Variance | 0.0005 |  |  |  | 0.001 |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Method | CBF | DCHWT | SR | Proposed | CBF | DCHWT | SR | Proposed |
| image6 | 32.4000 | 32.8961 | 30.5160 | **33.2502** | 30.7592 | 30.7740 | 28.5387 | **31.3097** |
| image7 | 32.1316 | **32.3488** | 30.1345 | 32.1918 | 30.4012 | 30.3097 | 28.7027 | **30.4612** |
| image8 | 32.7637 | 33.2852 | 31.2978 | **33.5828** | 30.9462 | 30.9732 | 29.3839 | **31.5098** |
| image9 | 32.0450 | **32.8044** | 30.1370 | 32.5609 | 30.4113 | 30.6523 | 28.6399 | **30.7358** |
| image10 | 32.4145 | 32.9560 | 31.3466 | **33.3068** | 30.7700 | 30.7795 | 29.3150 | **31.3362** |
| Average | 32.3510 | 32.8581 | 30.6864 | **32.9785** | 30.6576 | 30.6977 | 28.9160 | **31.0705** |

| Variance | 0.0005 |  |  |  | 0.001 |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Method | CBF | DCHWT | SR | Proposed | CBF | DCHWT | SR | Proposed |
| image6 | 0.8673 | 0.8602 | 0.8235 | **0.9021** | 0.7973 | 0.7843 | 0.7233 | **0.8323** |
| image7 | **0.9169** | 0.9113 | 0.8775 | 0.9091 | 0.8685 | 0.8592 | 0.8202 | **0.8704** |
| image8 | 0.9017 | 0.8949 | 0.8700 | **0.9193** | 0.8410 | 0.8299 | 0.7969 | **0.8629** |
| image9 | 0.9046 | 0.8995 | 0.8530 | **0.9153** | 0.8499 | 0.8407 | 0.7930 | **0.8635** |
| image10 | 0.9021 | 0.8967 | 0.8839 | **0.9292** | 0.8483 | 0.8380 | 0.8118 | **0.8730** |
| Average | 0.8985 | 0.8925 | 0.8616 | **0.9150** | 0.8410 | 0.8304 | 0.7891 | **0.8604** |

As shown in Figure 10, the fused images obtained by the proposed method and the other fusion methods are listed, and the values of PSNR and SSIM are given in Table 1 and Table 2, with the best results indicated in bold. The proposed fusion method achieves the best value in most cases, and its average PSNR and SSIM over the five image pairs are the best overall. PSNR and SSIM measure the difference between the fused image and the original image; larger values mean the fused image is more similar to the original. As Table 1 and Table 2 show, when the source images contain more noise, the advantage of the proposed method in PSNR and SSIM grows. This means the fused images obtained by our method are more similar to the original image and more natural than those of the other methods.

#### 4.3.2 Fusion on images containing Salt & Pepper noise

As in Section 4.3.1, we present the fused results of source images containing salt & pepper noise with different noise densities (0.01 and 0.02). As an example, the fused images of the source images "image7", which contain salt & pepper noise with density 0.01, are shown in Figure 11. The values of PSNR and SSIM for the five pairs of images are shown in Table 3 and Table 4.

| Density | 0.01 |  |  |  | 0.02 |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Method | CBF | DCHWT | SR | Proposed | CBF | DCHWT | SR | Proposed |
| image6 | 25.4770 | 26.6355 | 25.6136 | **27.5671** | 20.8861 | **22.2331** | 21.0055 | 22.1215 |
| image7 | 24.9016 | 25.7566 | 24.5434 | **26.9044** | 19.8044 | 21.0056 | 19.6716 | **21.2779** |
| image8 | 25.2189 | 26.2912 | 25.1051 | **27.2584** | 20.0331 | 21.3499 | 20.0683 | **21.4161** |
| image9 | 25.2630 | 26.3317 | 25.1170 | **27.4325** | 20.2827 | 21.5675 | 20.2175 | **21.6672** |
| image10 | 25.2509 | 26.3606 | 25.3373 | **27.3844** | 20.4722 | 21.8004 | 20.5579 | **21.8641** |
| Average | 25.2223 | 26.2751 | 25.1433 | **27.3094** | 20.2957 | 21.5913 | 20.3042 | **21.6693** |

| Density | 0.01 |  |  |  | 0.02 |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Method | CBF | DCHWT | SR | Proposed | CBF | DCHWT | SR | Proposed |
| image6 | 0.7577 | 0.7676 | 0.7479 | **0.8221** | 0.48102 | 0.5168 | 0.5080 | **0.5404** |
| image7 | 0.8375 | 0.8366 | 0.8004 | **0.8888** | 0.57490 | 0.5958 | 0.5615 | **0.6628** |
| image8 | 0.8121 | 0.8180 | 0.7932 | **0.8617** | 0.52310 | 0.5519 | 0.5276 | **0.5842** |
| image9 | 0.8224 | 0.8282 | 0.7908 | **0.8796** | 0.54983 | 0.5794 | 0.5411 | **0.6108** |
| image10 | 0.8125 | 0.8200 | 0.7985 | **0.8615** | 0.56020 | 0.5931 | 0.5731 | **0.6188** |
| Average | 0.8084 | 0.8141 | 0.7862 | **0.8628** | 0.53781 | 0.5674 | 0.5423 | **0.6034** |

As shown in Figure 11, the fused images obtained by the proposed method and the other fusion methods are compared, and the values of PSNR and SSIM are listed in Table 3 and Table 4, with the best results indicated in bold. The PSNR and SSIM values and their averages obtained by the proposed method are the best in almost all cases, matching the situation with Gaussian noise. This means our method outperforms the other methods even when the source images contain salt & pepper noise.

#### 4.3.3 Fusion on images containing Poisson noise

In this section, we present the fused results of source images containing Poisson noise. As an example, the fused images of the source images "image8" are shown in Figure 12. The values of PSNR and SSIM are shown in Table 5 and Table 6.

| Method | CBF | DCHWT | SR | Proposed |
| --- | --- | --- | --- | --- |
| image6 | 29.0632 | 28.8324 | 26.9615 | **30.1691** |
| image7 | **29.8907** | 29.6841 | 28.0019 | 28.7165 |
| image8 | **30.0660** | 29.8602 | 28.3629 | 30.0154 |
| image9 | **29.0275** | 28.9895 | 27.4048 | 28.7647 |
| image10 | 29.0376 | 28.7515 | 27.3377 | **29.6735** |
| Average | 29.4170 | 29.2235 | 27.6138 | **29.4678** |

| Method | CBF | DCHWT | SR | Proposed |
| --- | --- | --- | --- | --- |
| image6 | 0.7352 | 0.7177 | 0.6525 | **0.8431** |
| image7 | **0.8677** | 0.8587 | 0.8177 | 0.8198 |
| image8 | 0.8308 | 0.8174 | 0.7791 | **0.8644** |
| image9 | 0.8276 | 0.8159 | 0.7662 | **0.8316** |
| image10 | 0.7969 | 0.7839 | 0.7455 | **0.8627** |
| Average | 0.8117 | 0.7987 | 0.7522 | **0.8443** |

As shown in Figure 12, the fused images obtained by the proposed method and the other fusion methods are compared, and the values of PSNR and SSIM are listed in Table 5 and Table 6. As in Sections 4.3.1 and 4.3.2, the proposed method obtains the best average PSNR and SSIM when the source images contain Poisson noise.

## 5 Conclusions

In this paper, to the best of our knowledge, the low-rank representation technique is applied to an image fusion task for the first time, and a novel noisy image fusion method based on low-rank representation has been proposed. In the DWT framework, the low frequency coefficients are fused by spatial frequency and the choose-max scheme, and the high frequency coefficients are fused by low-rank representation and the choose-max scheme. The experimental results show that the fused images obtained by the proposed method are more natural and more similar to the original image. Judging from both the fused images and the PSNR and SSIM values, the proposed method performs well compared with the other methods in different situations, and its advantage grows as the source images contain more noise.

## References


- (1) S. Li, X. Kang, L. Fang, J. Hu, H. Yin, Pixel-level image fusion: A survey of the state of the art, Information Fusion 33 (2017) 100–112.
- (2) A. B. Hamza, Y. He, H. Krim, A. Willsky, A multiscale approach to pixel-level image fusion, Integrated Computer Aided Engineering 12 (2005) 135–146.
- (3) S. Yang, M. Wang, L. Jiao, R. Wu, Z. Wang, Image fusion based on a new contourlet packet, Information Fusion 11 (2010) 78–84.
- (4) L. Wang, B. Li, L. F. Tian, Eggdd: An explicit dependency model for multi-modal medical image fusion in shift-invariant shearlet transform domain, Information Fusion 19 (2014) 29–37.
- (5) Q. Zhang, B. L. Guo, Multifocus image fusion using the nonsubsampled contourlet transform, Signal Processing 89 (2009) 1334–1346.
- (6) Y. Zhang, X. Bai, T. Wang, Boundary finding based multi-focus image fusion through multi-scale morphological focus-measure, Information Fusion 35 (2016) 81–101.
- (7) Y. Liu, X. Chen, H. Peng, Z. Wang, Multi-focus image fusion with a deep convolutional neural network, Information Fusion 36 (2016) 191–207.
- (8) M. Nejati, S. Samavi, S. Shirani, Multi-focus image fusion using dictionary-based sparse representation, Information Fusion 25 (2015) 72–84.
- (9) H. Yin, Y. Li, Y. Chai, Z. Liu, Z. Zhu, A novel sparse-representation-based multi-focus image fusion approach, Neurocomputing 216 (2016) 216–229.
- (10) Y. Liu, X. Chen, R. Ward, Z. J. Wang, Image fusion with convolutional sparse representation, IEEE Signal Processing Letters PP (2016) 1–1.
- (11) B. K. S. Kumar, Image fusion based on pixel significance using cross bilateral filter, Signal, Image and Video Processing 9 (2015) 1–12.
- (12) B. K. S. Kumar, Multifocus and multispectral image fusion based on pixel significance using discrete cosine harmonic wavelet transform, Signal, Image and Video Processing 7 (2013) 1125–1143.
- (13) G. Liu, Z. Lin, Y. Yu, Robust subspace segmentation by low-rank representation, in: International Conference on Machine Learning, 2010, pp. 663–670.