II. RELATED WORK
A. Image-to-image translation
Image-to-image translation is a domain of computer vision that focuses on transforming an image from one style or modality to another while preserving its underlying structure. This capability is fundamental to applications ranging from artistic style transfer to the synthesis of realistic training datasets.
One seminal work in this field is the introduction of the Generative Adversarial Network (GAN) by Goodfellow et al. [7]. The GAN framework involves a dual-network architecture where a generator network competes against a discriminator network, fostering the generation of highly realistic images. Building on this, Zhu et al. introduced CycleGAN [8], which allows for image-to-image translation in the absence of paired examples. In the context of medical imaging, Sun et al. [9] leveraged a double U-Net CycleGAN to enhance the synthesis of CT images from MRI images. Their model incorporates a U-Net-based discriminator that improves the local and global accuracy of synthesized images. Chen et al. [10] introduced a correction network module based on an encoder-decoder structure into a CycleGAN model. Their module incorporates residual connections to efficiently extract latent feature representations from medical images and optimize them to generate higher-quality images.
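To make the cycle-consistency idea at the heart of these CycleGAN variants concrete, the following minimal PyTorch sketch shows the unpaired reconstruction penalty: two generators map between domains, and each round trip must reproduce its input. The generators G and F, the loss weight lam, and the L1 choice are illustrative placeholders, not the implementations used in [8]-[10].

```python
# Minimal sketch of the CycleGAN cycle-consistency loss, assuming two
# hypothetical generators G: X -> Y and F: Y -> X (e.g., CT <-> ultrasound).
import torch
import torch.nn as nn

def cycle_consistency_loss(G: nn.Module, F: nn.Module,
                           real_x: torch.Tensor, real_y: torch.Tensor,
                           lam: float = 10.0) -> torch.Tensor:
    """L1 round-trip penalty that lets CycleGAN train without paired data."""
    l1 = nn.L1Loss()
    # Translate to the other domain and back; both round trips should
    # reproduce the inputs, which is what preserves underlying structure.
    loss_x = l1(F(G(real_x)), real_x)   # X -> Y -> X
    loss_y = l1(G(F(real_y)), real_y)   # Y -> X -> Y
    return lam * (loss_x + loss_y)
```

This term is added to the usual adversarial losses of the two discriminators; it is the component that makes training on unpaired medical images feasible.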
B. Ultrasound image synthesis
Medical ultrasound image synthesis has likewise seen notable advances through the integration of deep learning techniques, particularly GANs and Denoising Diffusion Probabilistic Models (DDPMs) [11]. Liang et al. [12] employed GANs to generate high-resolution ultrasound images from low-resolution inputs, enhancing the image clarity and detail that are crucial for effective medical analysis. Stojanovski et al. [13] introduced a novel approach to generating synthetic ultrasound images through DDPMs. Their study leverages cardiac semantic label maps to guide the synthesis process, producing realistic ultrasound images that can substitute for actual data when training deep learning models for tasks such as cardiac segmentation.
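As a rough illustration of the DDPM objective underpinning [11] and [13], the sketch below implements the standard noise-prediction training loss on a linear beta schedule. The network `model`, the schedule constants, and the step count are assumptions for the sketch, not the cited authors' code.

```python
# Hedged sketch of one DDPM training step: add noise at a random timestep
# and train the network to predict that noise.
import torch
import torch.nn.functional as F

def ddpm_training_loss(model, x0: torch.Tensor, num_steps: int = 1000) -> torch.Tensor:
    # Linear beta schedule, as in the original DDPM formulation [11].
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)

    t = torch.randint(0, num_steps, (x0.shape[0],))   # random timestep per sample
    noise = torch.randn_like(x0)                      # epsilon ~ N(0, I)
    a = alpha_bar[t].view(-1, 1, 1, 1)                # broadcast over (C, H, W)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise    # forward process q(x_t | x_0)

    # The network learns to recover the injected noise; conditioning on a
    # semantic label map (as in [13]) would add an extra input to `model` here.
    return F.mse_loss(model(x_t, t), noise)
```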
In the specific context of synthesizing ultrasound images from CT images, Vitale et al. [14] proposed a two-stage pipeline. Their method first generates intermediate synthetic ultrasound images from abdominal CT scans using a ray-casting approach; a CycleGAN is then trained on unpaired sets of these synthetic images and real ultrasound images to refine the results. Song et al. [15] also proposed a CycleGAN-based method to synthesize ultrasound images from abundant CT data. Their approach leverages the rich annotations of CT images to enhance the learning of segmentation networks: the networks are first pretrained on a synthetic dataset that mimics the appearance of ultrasound images while preserving the detailed anatomical features of CT scans, and are then fine-tuned on actual ultrasound images to refine their ability to segment kidneys accurately.
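The pretrain-then-fine-tune strategy of [15] can be summarized with the hedged sketch below. The placeholder model, data loaders, loss, and hyperparameters are illustrative assumptions and do not reproduce the authors' pipeline.

```python
# Illustrative two-stage training: pretrain a segmenter on synthetic
# CT-derived ultrasound, then fine-tune on scarce real ultrasound.
import torch
import torch.nn as nn

def train(model, loader, optimizer, criterion, epochs: int) -> None:
    """One training phase over a loader of (image, mask) pairs."""
    model.train()
    for _ in range(epochs):
        for image, mask in loader:
            optimizer.zero_grad()
            loss = criterion(model(image), mask)
            loss.backward()
            optimizer.step()

# Placeholder 1x1-conv "segmenter"; a real pipeline would use a U-Net.
model = nn.Conv2d(1, 1, kernel_size=1)
criterion = nn.BCEWithLogitsLoss()

# Hypothetical DataLoaders of (image, mask) pairs (empty here for brevity).
synthetic_loader: list = []   # abundant synthetic ultrasound derived from CT
real_us_loader: list = []     # scarce annotated real ultrasound

# Stage 1: pretrain on the synthetic dataset.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
train(model, synthetic_loader, opt, criterion, epochs=50)

# Stage 2: fine-tune on real ultrasound at a lower learning rate.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
train(model, real_us_loader, opt, criterion, epochs=10)
```

The lower learning rate in the second stage is a common choice when adapting a pretrained network to a small real dataset without destroying the features learned from synthetic data.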