Authors:
(1) Xueying Mao, School of Computer Science, Fudan University, China (xymao22@[email protected]);
(2) Xiaoxiao Hu, School of Computer Science, Fudan University, China ([email protected]);
(3) Wanli Peng, School of Computer Science, Fudan University, China ([email protected]);
(4) Zhenliang Gan, School of Computer Science, Fudan University, China (zlgan23@[email protected]);
(5) Qichao Ying, School of Computer Science, Fudan University, China ([email protected]);
(6) Zhenxing Qian, School of Computer Science, Fudan University, China, corresponding author ([email protected]);
(7) Sheng Li, School of Computer Science, Fudan University, China ([email protected]);
(8) Xinpeng Zhang, School of Computer Science, Fudan University, China ([email protected]).
Editor's note: This is Part 1 of 7 of a study describing the development of a new method to hide secret messages in semantic features of videos, making it more secure and resistant to distortion during online sharing. Read the rest below.
Traditional video steganography methods embed secret messages by modifying the covert space, whereas we propose an approach that embeds the secret message within semantic features during video editing. Although existing video steganography methods offer a certain level of security and embedding capacity, they lack adequate robustness against common distortions in online social networks (OSNs). In this paper, we introduce an end-to-end robust generative video steganography network (RoGVS), which achieves visual editing by modifying the semantic features of videos to embed the secret message. We use a face-swapping scenario to showcase the visual editing effects. We design a secret message embedding module that adaptively hides the secret message in the semantic features of videos. Extensive experiments on facial video datasets show that the proposed RoGVS method outperforms existing video and image steganography techniques in both robustness and capacity.
Index Terms— Generative video steganography, Robust steganography, Semantic modification
Steganography is the science and technology of embedding secret messages into natural digital carriers such as images, videos, and text. Generally, the natural digital carriers are called “cover” and the digital media carrying the secret message are called “stego”. Conventional image steganography methods [49, 12, 31] primarily modify high-frequency components to embed the secret message. They commonly manipulate pixel values directly or integrate the secret message into the cover image before feeding it into an encoder.
In the past few years, with the rise of short-video applications such as TikTok, YouTube, and Snapchat, video has become a suitable carrier for steganography.
Traditional video steganographic methods, which rely on direct pixel value manipulation [32], coding mapping [34], or adaptive distortion functions [36], exploit video data redundancy for information hiding. Nevertheless, while these methods succeed in security and embedding capacity, the modifications they make to the covert space can easily be erased by common post-processing operations. They are therefore vulnerable to the diverse distortions that may occur in lossy channel transmission.
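The fragility of direct pixel value manipulation can be illustrated with a minimal sketch. It assumes a least-significant-bit (LSB) scheme on 8-bit pixel values and a coarse re-quantization step as a crude stand-in for lossy post-processing. Neither the function names nor the quantization model come from the paper; they are chosen here purely for illustration.

```python
import random

def embed_lsb(pixels, bits):
    """Hide one bit in the least significant bit of each 8-bit pixel value."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)]

def extract_lsb(pixels):
    """Read back the least significant bit of each pixel."""
    return [p & 1 for p in pixels]

def requantize(pixels, step=4):
    """Toy stand-in for lossy post-processing (an assumption, not a real codec):
    floor each pixel to a multiple of `step`, which zeroes the low-order bits."""
    return [step * (p // step) for p in pixels]

rng = random.Random(0)
cover = [rng.randrange(256) for _ in range(64)]   # synthetic "cover" pixels
bits = [rng.randrange(2) for _ in range(64)]      # synthetic secret message

stego = embed_lsb(cover, bits)
recovered_clean = extract_lsb(stego)               # lossless channel
recovered_lossy = extract_lsb(requantize(stego))   # after toy post-processing
bit_errors = sum(a != b for a, b in zip(bits, recovered_lossy))

print("recovered without distortion:", recovered_clean == bits)
print("bit errors after re-quantization:", bit_errors, "of", len(bits))
```

On a lossless channel the message is recovered exactly, but after re-quantization every extracted bit is 0, so every message bit that was 1 is lost. Real codecs and OSN pipelines behave differently in detail, but the example shows why low-order pixel modifications do not survive such processing.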
Visual editing on videos can be seen as the process of modifying the semantic information of objects within them. Instead of hiding the secret message in the covert space, we embed it within the semantic features of videos during visual editing. High-level semantic features are less susceptible to distortions, making this method inherently robust. To improve the robustness of video steganography, we propose an end-to-end robust generative video steganography network (RoGVS), which consists of four modules: an information encoding module, a secret message embedding module, an attacking layer, and a secret message extraction module. For evaluation, we use face-swapping technology as an example to show the effectiveness of our method, although it can be easily extended to other applications. Comprehensive experiments show that our method surpasses state-of-the-art techniques, attaining commendable robustness and generalization capabilities.
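The four-stage data flow (encode, embed, attack, extract) can be sketched with a toy example. This is not the RoGVS network: the learned modules are replaced by fixed arithmetic on a random vector that merely stands in for a semantic feature. All function names, the repetition code, the embedding strength, and the bounded noise model are assumptions made for illustration.

```python
import random

def encode_message(bits, repeat=8):
    """Information encoding (toy): map bits to +/-1 symbols and repeat each one."""
    return [1.0 if b else -1.0 for b in bits for _ in range(repeat)]

def embed(feature, symbols, strength=2.0):
    """Secret message embedding (toy): shift each feature coordinate by a symbol."""
    return [f + strength * s for f, s in zip(feature, symbols)]

def attack(feature, rng, amplitude=0.5):
    """Attacking layer (toy): bounded additive noise standing in for OSN distortions."""
    return [f + rng.uniform(-amplitude, amplitude) for f in feature]

def extract(feature, repeat=8):
    """Secret message extraction (toy): sum each block of coordinates, take the sign."""
    blocks = [feature[i:i + repeat] for i in range(0, len(feature), repeat)]
    return [1 if sum(block) > 0 else 0 for block in blocks]

rng = random.Random(0)
message = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical placeholder for a semantic feature vector, values in [-1, 1].
cover_feature = [rng.uniform(-1.0, 1.0) for _ in range(len(message) * 8)]

stego_feature = embed(cover_feature, encode_message(message))
received = attack(stego_feature, rng)
decoded = extract(received)

print("message recovered after attack:", decoded == message)
```

In this toy setting the embedding strength (2.0) exceeds the combined magnitude of the cover feature (at most 1.0) and the noise (at most 0.5), so extraction cannot fail. A learned system has no such guarantee, which is presumably why an attacking layer is placed between embedding and extraction in an end-to-end design.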
The main contributions of our work are as follows: 1) We are the first to explore a novel generative video steganography method, which modifies semantic features to embed the secret message during visual editing instead of modifying the covert space. This framework exhibits strong extensibility and opens a new direction for the future development of the steganography field. 2) The proposed method is robust against common distortions on social network platforms, and the secret message can be extracted with high accuracy. 3) Our method achieves better security against steganalysis than other state-of-the-art methods and can effectively evade detection by steganalysis systems.
This paper is available on arXiv under a CC 4.0 license.