Authors:
(1) Xueying Mao, School of Computer Science, Fudan University, China (xymao22@[email protected]);
(2) Xiaoxiao Hu, School of Computer Science, Fudan University, China ([email protected]);
(3) Wanli Peng, School of Computer Science, Fudan University, China ([email protected]);
(4) Zhenliang Gan, School of Computer Science, Fudan University, China (zlgan23@[email protected]);
(5) Qichao Ying, School of Computer Science, Fudan University, China ([email protected]);
(6) Zhenxing Qian, School of Computer Science, Fudan University, China, and corresponding author ([email protected]);
(7) Sheng Li, School of Computer Science, Fudan University, China ([email protected]);
(8) Xinpeng Zhang, School of Computer Science, Fudan University, China ([email protected]).
Editor's note: This is Part 2 of 7 of a study describing the development of a new method to hide secret messages in semantic features of videos, making it more secure and resistant to distortion during online sharing. Read the rest below.
Image Steganography. Conventional image steganography methods primarily modify high-frequency components to embed secret messages. The LSB substitution method [80] operates under the assumption that human eyes cannot perceive changes in the least significant bit of pixel values. HiDDeN [12] introduces an end-to-end trainable framework built on an encoder-decoder architecture. SteganoGAN [31] employs dense encoders to enhance payload capacity. Wei et al. [16] propose an advanced generative steganography network that can generate realistic stego images without using cover images. However, alterations in high-frequency components can be obliterated by common post-processing operations, such as JPEG compression or Gaussian blur.
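To make the LSB idea and its fragility concrete, the following is a minimal sketch in plain Python, not the implementation from [80]. The helper names (embed_lsb, extract_lsb, quantize), the sample pixel values, and the use of coarse quantization as a crude stand-in for lossy post-processing are all assumptions made for illustration.

```python
def embed_lsb(pixels, bits):
    """Write each message bit into the least significant bit of one pixel."""
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read the least significant bit of the first n_bits pixels."""
    return [p & 1 for p in pixels[:n_bits]]


def quantize(pixels, step=4):
    """Hypothetical lossy step: snap each value down to a multiple of `step`."""
    return [(p // step) * step for p in pixels]


# Illustrative 8-bit grayscale values and an 8-bit message (both made up).
cover = [52, 55, 61, 66, 70, 61, 64, 73]
message = [1, 0, 1, 1, 0, 0, 1, 0]

stego = embed_lsb(cover, message)
recovered = extract_lsb(stego, len(message))

# After quantization the lowest bits no longer carry the message.
damaged = extract_lsb(quantize(stego), len(message))

print("max pixel change:", max(abs(a - b) for a, b in zip(cover, stego)))
print("recovered intact:", recovered == message)
print("recovered after quantization:", damaged == message)
```

Each pixel changes by at most one intensity level, which is why the embedding is hard to see, but any operation that rewrites the low-order bits erases the message. Real JPEG compression and blurring behave differently from this toy quantizer, so the sketch only illustrates the general weakness.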
Video Steganography. Early video steganography usually modifies the RGB or YUV color space to embed secret messages. Dong et al. [33] observed that altering intra-frame modes in HEVC significantly affected video coding efficiency, while modifications to multilevel recursive coding units had minimal impact on distortion. PWRN [35] employs a super-resolution CNN, the Wide Residual-Net filter (PWRN), to replace HEVC’s loop filter. Recently, He et al. [36] devised an adaptive distortion function using enhanced Rate Distortion Optimization (RDO) and Syndrome-Trellis Code (STC) to minimize embedding distortion. However, these methods struggle to handle the various distortions that may arise during transmission over lossy channels.
Visual Editing. Visual editing can encompass color correction on a single image; the deletion, addition, or alteration of objects within an image; or even the merging of two photos to create an entirely new scene. In videos, visual editing might involve adding effects to specific frames, removing elements to alter the scene, or replacing one person’s face with another [26], which is also called face swapping.
This paper is available on arXiv under a CC 4.0 license.