
Reconstructing high fidelity digital rock images using deep convolutional neural networks

The objective of this study is to recover a high-fidelity image from a corrupted low-resolution input. Since the images are downsampled, blurred, and corrupted by various types of noise, we first approach the problem sequentially. In the first part of the results section, we discuss the performance of CNNs on each individual task. In the last section, we explore end-to-end approaches that reconstruct the final target image directly from the noisy input. Finally, we test the developed networks on an unseen dataset to assess their generalization capability.

Image denoising

In denoising, the aim is to remove noise and restore the true image. However, since noise, edges, and texture are all high-frequency components, they are difficult to distinguish during denoising, and the denoised images inevitably lose some detail11. Traditional denoising methods such as the non-local means filter, Gaussian filtering, or wavelet thresholding often smooth the image, and they require a priori knowledge of the noise type and an estimate of the noise amount.

Digital images are corrupted by various types of noise. In scientific imaging, noise can come from a variety of sources. Poisson noise originates from the varying number of electrons that hit the specimen at each measurement spot, while Gaussian noise is the result of the microscope electronics25. Speckle noise is rarely a problem in SEM or CT imaging; however, we add this type of noise to our images to further complicate the denoising task.

Denoising through filters assumes a particular type of noise or requires iterative procedures to estimate the denoiser’s parameters. Gaussian noise is the most prevalent, in which \(\hat{y} = y + \mathcal{N}(0, \sigma)\), where \(y\) is the true image and \(\hat{y}\) is the noisy image. If the \(\sigma\) of the distribution is known, the true image can be recovered. In reality, the distribution is not known, and that is assuming Gaussian white noise in the first place. SEM images are often corrupted by Poisson noise as well as Gaussian and, rarely, speckle noise (the latter is mostly observed in satellite images).
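The corruption model above can be sketched in a few lines. This is an illustrative numpy version; the parameter values (`sigma`, `peak`, `speckle_var`) are hypothetical placeholders, not those used in the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(y, sigma=0.05, peak=255.0, speckle_var=0.02):
    """Corrupt a clean image y (floats in [0, 1]) with additive Gaussian,
    Poisson (shot), and multiplicative speckle noise."""
    # Additive Gaussian noise: y_hat = y + N(0, sigma)
    noisy = y + rng.normal(0.0, sigma, y.shape)
    # Poisson noise: photon/electron counts drawn around the scaled signal level
    noisy = rng.poisson(np.clip(noisy, 0.0, 1.0) * peak) / peak
    # Speckle noise: multiplicative, y_hat = y * (1 + n)
    noisy = noisy * (1.0 + rng.normal(0.0, np.sqrt(speckle_var), y.shape))
    return np.clip(noisy, 0.0, 1.0)

clean = np.full((64, 64), 0.5)
noisy = corrupt(clean)
```

Drawing the three noise amounts at random per image, as done for the dataset here, makes the effective corruption unknown to the denoiser.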

Denoising using CNNs has been the target of many studies37. While most of these models perform well on their datasets, they often assume a known noise amount and type (e.g., Gaussian noise of a known distribution). Our dataset is corrupted with random amounts of Gaussian, Poisson, and speckle noise. The deep iterative down-up CNN, or DIDN for short, devised by Yu et al.4 is the main framework adopted here for denoising. Details of the network architecture are discussed in the “Methods” section as well as the supplementary materials. The DIDN network denoises images through a series of convolutional blocks and can adapt to different noise types and amounts, presenting an optimal case for our study. We train the network to minimize the L1 (i.e., mean absolute error, MAE) loss. Training stops once the validation loss fails to improve for 5 consecutive epochs. Further detail on the training aspects of the network is presented in the “Methods” section.
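The L1 objective and the early-stopping rule (halt once the validation loss has not improved for 5 consecutive epochs) can be sketched generically; the `step` callback standing in for one epoch of DIDN training is a hypothetical placeholder:

```python
import numpy as np

def l1_loss(pred, target):
    # Mean absolute error (MAE), the training objective used for the denoiser
    return float(np.mean(np.abs(pred - target)))

def train_with_early_stopping(step, patience=5, max_epochs=200):
    """Run `step(epoch)` (one epoch of training, returning the validation
    loss) until the loss has not improved for `patience` epochs."""
    best, stale, epoch = float("inf"), 0, 0
    for epoch in range(max_epochs):
        val_loss = step(epoch)
        if val_loss < best:
            best, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, epoch + 1
```

With `patience=5`, a loss curve that plateaus triggers the stop five epochs after its last improvement.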

Figure 1 shows a sample output of the denoiser network. The input images have varying levels of noise, and the CNN successfully reconstructs the target clean images. For the shale image, the network increases the PSNR from 17.9 to 50 dB; both MS-SSIM and SSIM increase similarly. Note that the images in Fig. 1 are from the test data. The last column of Fig. 1 is the absolute difference between the prediction and the ground truth. The difference maps indicate the exceptional performance of the DIDN for denoising.
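The PSNR values quoted throughout follow the standard definition; a minimal numpy implementation (assuming 8-bit images, i.e., a data range of 255) reads:

```python
import numpy as np

def psnr(pred, target, data_range=255.0):
    """Peak signal-to-noise ratio in dB between a prediction and the
    ground truth; higher is better, infinite for a perfect match."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)
```

A pixel-wise error of one grey level everywhere, for instance, corresponds to roughly 48 dB at an 8-bit data range.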

Figure 1

Sample images denoised using the DIDN network. The orange rectangular box shows the highlighted area with a zoom factor of 2\(\times\). The top row is a sample shale image, the middle row a limestone, and the bottom row a sandstone image. The row order of the rock types is retained in Figs. 3, 4, 6, 7 and 8.

The performance of the denoiser network in terms of PSNR, SSIM, and MS-SSIM on both the training and test data is reported in Fig. 2a. For comparison, we also report the denoising performance of the non-local means filter (a window size of 7 and a search window size of 11 were used). For the CNN, the training data’s PSNR is mostly above 41 dB (based on the 2nd quartile of the boxplot). The results on the test data show improvement over the training data, which may indicate the test data are easier to denoise. We note that the train-test split is random, and the images are shuffled beforehand; additionally, the varying noise level in each image can contribute to the performance differences between test and train data. The non-local means filter performs far below the CNN model. This degradation is partly due to the unknown noise level and type and in part due to the algorithm used by non-local means denoising.

The network’s performance in terms of SSIM and MS-SSIM is exceptional. For SSIM, a median value of \(\approx 0.993\) is achieved, while MS-SSIM is \(\approx 0.999\). Both of these metrics indicate a highly efficient reconstruction in terms of structural similarity.

The analysis of the reconstruction metrics indicates that the network learns to denoise the images effectively. Given the randomness of the noise (in type and amount), it is clear that this deep CNN is highly effective in removing noisy signals from images. Furthermore, by training with respect to the L1 loss, we avoid the smoothing of finer image features that is often associated with traditional filtering methods or the L2 loss.

Figure 2

Reconstruction metrics for (a) the denoiser network, (b) the deblurring networks, and (c) the super-resolution networks.

Image deblurring

Having denoised the images, the next task is to recover a sharp image from the blurry denoised output. Conventional deblurring methods (i.e., deconvolution) require an iterative procedure and an estimate of the point spread function. As with denoising, such information is not readily available.
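For context, the forward model that deconvolution must invert is a convolution of the true image with the point spread function (PSF); a naive numpy sketch (the 3×3 box PSF in the usage below is illustrative only):

```python
import numpy as np

def blur(img, psf):
    """Forward blur model: correlate the image with a point spread
    function (edge-padded). Deconvolution must invert this operation,
    which requires knowing or estimating `psf`."""
    kh, kw = psf.shape
    pad = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * psf)
    return out

# A normalized PSF leaves a constant image unchanged
box_psf = np.ones((3, 3)) / 9.0
```

A learned deblurring network sidesteps this model entirely: it maps blurry to sharp images without an explicit PSF.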

For the deblurring task, we test two CNNs, both with a UNet architecture. The first network uses channel attention and residual blocks and is a combination of the models proposed in38,39; we denote it Attention-UNet. The second network is a variant of the UNet model proposed by Cho et al.8, called MIMO-UNet+. We train both networks to minimize the L1 loss. More detail on the MIMO-UNet+ architecture and training is reported in the supplementary materials. We note that the choice of a UNet architecture opens up many possibilities for designing the deblurring network; CNN models such as DeepLabv340,41 have exceptional performance in segmentation tasks. However, we only test two networks here that have been specifically designed for deblurring.

Figure 2b compares the performance of each network on both the train and test data in terms of the three reconstruction metrics applied in this study. We observe that MIMO-UNet+ consistently performs better on all three metrics than the UNet with channel attention. The MIMO-UNet+ model produces a PSNR roughly above 35 dB (2nd quartile) on both train and test data. Structural similarity is higher than 0.96, while MS-SSIM is nearly 1 for this network. Further analysis in this paper uses MIMO-UNet+ for deblurring.

Figure 3 demonstrates three sample images deblurred using the MIMO-UNet+ network. The good performance of the network on this task is reflected in these images as well as in the associated metrics for each reconstruction. The sandstone images (bottom row) are easier to deblur, while the performance on the shale samples (top row) is slightly impaired by the high-frequency features in their texture. The difference maps show slightly worse performance compared with the denoiser network, and for the shale sample the texture is more difficult to reconstruct. The carbonate image (middle row) is blurred with a higher variance, and the results indicate the network performs well in recovering a deblurred image.

Figure 3

Prediction of the deblurring task using the MIMO-UNet+ network.

Overall, compared with deconvolution methods, CNNs have superior performance both in quality of reconstruction and in ease of use. Deconvolution requires a point spread function and multiple iterations; with CNNs, no user intervention is needed, and the results indicate satisfactory to good performance.

Image super resolution

Image super-resolution aims to reconstruct a high-resolution realization of a given low-resolution image. In the context of scientific images, it is important that we recover true features, especially edges.

For super-resolving images, various network architectures have been developed (e.g., SRCNN30, EDSR42, WDSR43, SRGAN44, ESRGAN45, DFCAN46, and RCAN47, among many others). A recent paper by Qiao et al.46 discussed the use of the Fourier transform for accurate reconstruction of high-frequency features in microscopy images. We experimented with all the mentioned networks and found that EDSR (enhanced deep super-resolution network), WDSR (wide activation deep super-resolution network), and DFCAN (deep Fourier channel attention network) consistently performed better than the other models in terms of reconstruction metrics. We therefore present results of the super-resolution task using the EDSR, WDSR, and DFCAN models only. It is noteworthy that GAN (generative adversarial network)-based methods perform worse than their regular variants for our purposes. The reason could be the training losses used in GANs, which push the solution towards a visually plausible image rather than accurately reconstructing high-frequency features. Details on the architecture and training procedures are reported in the supplementary materials.

EDSR and WDSR are trained with respect to L1 loss, while DFCAN is trained using a weighted L2 and SSIM loss. All three networks increase resolution by a factor of 2.
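For context, EDSR- and WDSR-style networks typically perform this ×2 upscaling with a sub-pixel (pixel-shuffle) layer that trades channels for spatial resolution. A simplified single-image numpy sketch, not the authors' implementation:

```python
import numpy as np

def pixel_shuffle(x, r=2):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r), so that
    out[c, h*r + i, w*r + j] == x[c*r*r + i*r + j, h, w]."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    return (x.reshape(c, r, r, h, w)
             .transpose(0, 3, 1, 4, 2)
             .reshape(c, h * r, w * r))
```

Four 2×2 channels thus become one 4×4 channel, doubling the resolution in each spatial dimension.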

Figure 2c reports the performance of the three selected super-resolving networks. We note that both EDSR and WDSR have \(\approx\) 11–13 times the number of trainable parameters of DFCAN. The PSNR of the reconstructed high-resolution images is broadly similar across the three, with WDSR performing marginally better on the training data.

In terms of SSIM, DFCAN shows slight improvement over WDSR, possibly owing to the use of Fourier transformation that results in better reconstruction of high-frequency features. Nonetheless, all three networks perform exceptionally well on this task. Finally, the MS-SSIM for all three networks approaches 1, indicating a near-perfect score for this metric.

Comparing the performance of each network against its computational cost, the DFCAN model outperforms the other two: it has more than 10 times fewer trainable parameters yet performs equally well or even marginally better on the reconstruction metrics. Therefore, for further experiments, we use DFCAN for the super-resolution task.

Figure 4 shows sample low-resolution images super-resolved using the DFCAN model. Note that for the shale samples (top row), the network restores the boundaries accurately. The difference map is the pixel-wise absolute difference between the super-resolved and ground-truth images. For the sandstone and carbonate, the difference maps indicate highly efficient performance by the network. For shale, there are scattered differences with no particular focus on specific regions of the image; this randomness in the difference map of the shale texture is likely due to the inherent noise-like high-frequency features in the ground-truth SEM images.

Figure 4

Sample images super-resolved using the DFCAN network (magnification factor of 2).
