
Transformers for Image Restoration


Authors:

(1) Hyosun park, Department of Astronomy, Yonsei University, Seoul, Republic of Korea;

(2) Yongsik Jo, Artificial Intelligence Graduate School, UNIST, Ulsan, Republic of Korea;

(3) Seokun Kang, Artificial Intelligence Graduate School, UNIST, Ulsan, Republic of Korea;

(4) Taehwan Kim, Artificial Intelligence Graduate School, UNIST, Ulsan, Republic of Korea;

(5) M. James Jee, Department of Astronomy, Yonsei University, Seoul, Republic of Korea and Department of Physics and Astronomy, University of California, Davis, CA, USA.

Table of Links

Abstract and 1 Introduction

2 Method

2.1. Overview and 2.2. Encoder-Decoder Architecture

2.3. Transformers for Image Restoration

2.4. Implementation Details

3 Data and 3.1. HST Dataset

3.2. GalSim Dataset

3.3. JWST Dataset

4 JWST Test Dataset Results and 4.1. PSNR and SSIM

4.2. Visual Inspection

4.3. Restoration of Morphological Parameters

4.4. Restoration of Photometric Parameters

5 Application to real HST Images and 5.1. Restoration of Single-epoch Images and Comparison with Multi-epoch Images

5.2. Restoration of Multi-epoch HST Images and Comparison with Multi-epoch JWST Images

6 Limitations

6.1. Degradation in Restoration Quality Due to High Noise Level

6.2. Point Source Recovery Test

6.3. Artifacts Due to Pixel Correlation

7 Conclusions and Acknowledgements

Appendix: A. Image restoration test with Blank Noise-Only Images

References

2.3. Transformers for Image Restoration

In the Transformer, the encoder consists of multiple layers of self-attention mechanisms followed by position-wise feed-forward neural networks. “Attention” refers to a mechanism that lets a model focus on specific parts of the input while processing it: the model selectively weighs different parts of the input, giving more importance to relevant information and down-weighting irrelevant or less important parts. The key idea behind attention is to compute these weights dynamically for different parts of the input data, such as words in a sentence or pixels in an image, based on their relevance to the current task. In self-attention, each element (e.g., a word or pixel) in the input sequence is compared to every other element to compute attention weights, which represent the importance of each element with respect to the others. These weights are then used to form a weighted sum of the input elements, yielding an attention-based representation that highlights the relevant information.
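The compare-weigh-sum procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product self-attention, not the implementation used in this work; the projection matrices and dimensions are arbitrary placeholders.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence.

    x: (n, d) array of n input elements (e.g., word or pixel embeddings).
    w_q, w_k, w_v: (d, d) projections for queries, keys, and values.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(x.shape[1])         # (n, n) pairwise relevance
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v                             # weighted sum of the inputs

rng = np.random.default_rng(0)
n, d = 6, 8                                        # 6 elements, 8-dim embeddings
x = rng.normal(size=(n, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (6, 8): one attention-based representation per element
```

Because every element attends to every other element, the (n, n) score matrix is what makes the original formulation expensive for large inputs.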


The Transformer decoder also consists of multiple layers of self-attention mechanisms, along with additional attention mechanisms over the encoder’s output. The decoder predicts one element of the output sequence at a time, conditioned on the previously generated elements and the encoded representation of the input sequence.


The Transformer architecture was initially proposed for machine translation, the task of translating text from one language to another. Its success there demonstrated its effectiveness at capturing long-range dependencies in sequences and at handling sequential data more efficiently than traditional architectures. This breakthrough sparked widespread interest in the Transformer, leading to its adoption and adaptation for various image processing tasks. Transformers now show promising results in tasks traditionally dominated by CNNs, such as image classification, object detection, semantic segmentation, and image generation, because Transformer models capture long-range pixel correlations more effectively than CNN-based models.


However, applying the Transformer to large images is challenging with its original implementation, which computes self-attention over pixels: the computational complexity grows quadratically with the pixel count. Zamir et al. (2022) overcame this obstacle by replacing the original self-attention block with the MDTA (Multi-Dconv head Transposed Attention) block, which computes self-attention in the feature domain so that the complexity grows only linearly with the number of pixels. We propose to use Restormer, the efficient Transformer of Zamir et al. (2022), to apply deconvolution and denoising to astronomical images. We briefly describe the two core components of Restormer in §2.3.1 and §2.3.2; readers are referred to Zamir et al. (2022) for more technical details.
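The scale of the difference is easy to see by comparing attention-map sizes. The sketch below assumes an illustrative 256 × 256 feature map with 48 channels (arbitrary example values, not the dimensions used in this work): spatial self-attention builds a pixel-by-pixel map, while channel-wise (MDTA-style) attention builds a map whose size is independent of the image resolution.

```python
# Attention-map size: spatial self-attention vs channel-wise (MDTA-style) attention
h, w, c = 256, 256, 48        # example feature map: height, width, channels
n_pixels = h * w              # 65536 pixels
spatial_attn = n_pixels ** 2  # pixel-pixel map grows quadratically with pixel count
channel_attn = c ** 2         # channel-channel map does not grow with image size
print(spatial_attn)           # 4294967296 entries
print(channel_attn)           # 2304 entries
```

Doubling the image side length quadruples `n_pixels` and multiplies `spatial_attn` by sixteen, while `channel_attn` is unchanged; this is why the feature-domain formulation scales only linearly with the number of pixels.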

2.3.1. MDTA block

MDTA is one of the two crucial modules within Restormer. By performing self-attention along the channel dimension, MDTA computes query-key interactions between channels of the input feature map rather than between pixels. Modeling these cross-channel interactions facilitates the learning of the global context necessary for image restoration tasks.


MDTA also employs depth-wise convolution to accentuate the local context of the input image, ultimately allowing the module to model both global and local contexts.
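The two ingredients just described can be sketched as follows. This is a simplified NumPy illustration, not Restormer's actual MDTA: it omits the query/key/value projections, multi-head structure, and learned temperature of the real block, keeping only the depth-wise convolution for local context and the transposed (channel-wise) attention for global context.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """Naive 3x3 depth-wise convolution: one kernel per channel (local context)."""
    c, h, w = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.empty_like(x)
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(pad[ch, i:i + 3, j:j + 3] * kernels[ch])
    return out

def channel_attention(x):
    """Transposed self-attention: attend over channels instead of pixels."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)               # each channel becomes one "token"
    scores = flat @ flat.T / np.sqrt(h * w)  # (c, c) channel-channel interactions
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights @ flat).reshape(c, h, w)  # re-weighted channels

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8, 8))               # 4 channels, 8x8 feature map
local = depthwise_conv3x3(x, rng.normal(size=(4, 3, 3)))
out = channel_attention(local)
print(out.shape)  # (4, 8, 8)
```

Note that the attention map here is (c, c) regardless of the spatial size, which is the source of the linear scaling discussed above.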

2.3.2. GDFN block

GDFN, short for Gated-Dconv Feed-Forward Network, is the other crucial module within Restormer. It enhances the standard feed-forward network with a gating mechanism that improves information flow, resulting in high-quality outcomes for image restoration tasks.


GDFN controls the information flow through gating layers, composed of element-wise multiplication of two linear projection layers, one of which is activated by the Gaussian Error Linear Unit (GELU) non-linearity. This allows GDFN to suppress less informative features and hierarchically transmit only valuable information. Similar to the MDTA module, GDFN employs local content mixing. Through this, GDFN emphasizes the local context of the input image, providing a more robust information flow for enhanced results in image restoration tasks.
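The gating layer described above can be sketched as below. This simplified NumPy version operates on token vectors and omits the depth-wise convolutions (the "Dconv" in GDFN) of the real block; the dimensions are arbitrary placeholders, and GELU is computed with its standard tanh approximation.

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the Gaussian Error Linear Unit."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def gated_feed_forward(x, w_gate, w_value, w_out):
    """Element-wise product of two linear projections, one GELU-activated."""
    gate = gelu(x @ w_gate)        # decides which features pass through
    value = x @ w_value            # candidate features
    return (gate * value) @ w_out  # suppressed features contribute little

rng = np.random.default_rng(2)
n, d, hidden = 16, 8, 16
x = rng.normal(size=(n, d))
w_gate, w_value = rng.normal(size=(d, hidden)), rng.normal(size=(d, hidden))
w_out = rng.normal(size=(hidden, d))
out = gated_feed_forward(x, w_gate, w_value, w_out)
print(out.shape)  # (16, 8)
```

Where the GELU branch is near zero, the corresponding entries of `value` are multiplied away, which is how the gate suppresses less informative features while letting valuable ones through.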


This paper is available on arxiv under CC BY 4.0 Deed license.

