Authors:
(1) Xueying Mao, School of Computer Science, Fudan University, China (xymao22@[email protected]);
(2) Xiaoxiao Hu, School of Computer Science, Fudan University, China ([email protected]);
(3) Wanli Peng, School of Computer Science, Fudan University, China ([email protected]);
(4) Zhenliang Gan, School of Computer Science, Fudan University, China (zlgan23@[email protected]);
(5) Qichao Ying, School of Computer Science, Fudan University, China ([email protected]);
(6) Zhenxing Qian, School of Computer Science, Fudan University, China, corresponding author ([email protected]);
(7) Sheng Li, School of Computer Science, Fudan University, China ([email protected]);
(8) Xinpeng Zhang, School of Computer Science, Fudan University, China ([email protected]).
Editor's note: This is Part 3 of 7 of a study describing the development of a new method to hide secret messages in semantic features of videos, making it more secure and resistant to distortion during online sharing. Read the rest below.
This module aims to embed the secret message during face swapping. The key problem is how to perform face swapping under the guidance of the secret message. To our understanding, the latent features of the cover video encompass both identity and attribute features. Face swapping essentially replaces the cover video’s identity with that of the reference image. Consequently, we embed the secret message into the identity feature of the reference image, formulated as follows:
where λ is a hyper-parameter that controls the influence of the secret message on the identity feature.
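To make the role of λ concrete, the following is a minimal sketch of one way a message could perturb an identity vector. The additive form, the mapping of bits to ±1, and the names `embed_message` and `lam` are assumptions made for illustration; the paper's own formula is not reproduced in this excerpt and may differ (for example, by first passing the message through a learned encoder).

```python
def embed_message(identity, bits, lam=0.1):
    """Hypothetical additive embedding (not the paper's formula):
    shift each identity component by lam * (+1 or -1) per message bit."""
    if len(identity) != len(bits):
        raise ValueError("identity and bits must have the same length")
    # Map bits {0, 1} to {-1, +1} so that both bit values move the feature.
    signed = [2 * b - 1 for b in bits]
    return [f + lam * s for f, s in zip(identity, signed)]


identity = [0.5, -0.2, 0.1, 0.9]          # toy identity feature
stego_identity = embed_message(identity, [1, 0, 0, 1], lam=0.1)
```

In this toy form, a larger `lam` moves the identity feature further from the original, which would make the message easier to recover but the swap less faithful to the reference identity.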
To bolster the robustness of our method for face-swapping videos in real-world scenarios, we design an attacking layer. This module simulates prevalent distortions encountered across social network platforms.
JPEG Compression. JPEG compression involves a non-differentiable quantization step due to rounding. To work around this, we follow Shin et al. [53] and approximate the quantization step near zero using the function in Eq. (6):
where x denotes pixels of the input image. We uniformly sample the JPEG quality from within the range of [50, 100].
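A differentiable stand-in for rounding can be sketched as below. The piecewise form (cubic for |x| < 0.5, identity otherwise) is the approximation commonly used in this line of work; that Eq. (6) matches it exactly is an assumption, and the function names are illustrative only.

```python
import random


def soft_round(x):
    """Surrogate for rounding near zero (assumed form of Eq. (6)):
    x**3 when |x| < 0.5, identity otherwise. Unlike true rounding,
    its derivative is nonzero almost everywhere."""
    return x ** 3 if abs(x) < 0.5 else x


def sample_jpeg_quality(rng=random):
    """Draw a JPEG quality factor uniformly from [50, 100]."""
    return rng.uniform(50.0, 100.0)


values = [-0.7, -0.3, 0.0, 0.2, 0.5, 1.4]
approx = [soft_round(v) for v in values]
quality = sample_jpeg_quality()
```

Small coefficients are pushed toward zero, as rounding would do, while gradients can still flow through the simulated compression during training.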
Color Distortions. We consider two general color distortions: brightness and contrast. We apply a linear transformation to the pixels of each channel, as in Eq. (7):
p(x) = a · f(x) + c (7)
where p(x) and f(x) refer to the distorted and the original image, respectively. The parameters a and c regulate contrast and brightness, respectively.
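The linear transformation can be sketched per channel as follows. Pixel values in [0, 1], the clipping step, and the example values of `a` and `c` are assumptions; the sampling ranges for the two parameters are not given in this excerpt.

```python
def adjust_contrast_brightness(pixels, a, c):
    """Scale each pixel by a (contrast) and shift it by c (brightness),
    then clip to [0, 1]. Clipping and the value range are assumptions."""
    return [min(1.0, max(0.0, a * f + c)) for f in pixels]


channel = [0.0, 0.25, 0.5, 1.0]           # one toy color channel
distorted = adjust_contrast_brightness(channel, a=1.2, c=-0.1)
```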
Color Saturation. We perform random linear interpolation between the RGB image and its grayscale equivalent to simulate this distortion.
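A single-pixel sketch of this interpolation is given below. The BT.601 luma weights used for the gray value and the U[0, 1] sampling range for the blend factor are assumptions, since the excerpt does not specify either.

```python
import random


def desaturate(rgb, alpha):
    """Blend an RGB pixel with its gray value.
    alpha = 1 keeps the original color; alpha = 0 gives full grayscale."""
    r, g, b = rgb
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # BT.601 luma weights (assumed)
    return tuple(alpha * ch + (1.0 - alpha) * gray for ch in rgb)


alpha = random.uniform(0.0, 1.0)          # sampling range is an assumption
pixel = desaturate((0.8, 0.4, 0.2), alpha)
```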
Additive Noise. We use a Gaussian noise model, sampling the standard deviation δ ∼ U[0, 0.2], to simulate imaging noise and any other distortions that are not explicitly considered in the attacking layer.
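The noise model can be sketched as follows, assuming zero-mean noise and pixel values normalized to [0, 1]; the function name and the seeded generator are illustrative choices.

```python
import random


def add_gaussian_noise(pixels, rng):
    """Sample a standard deviation delta from U[0, 0.2], then add
    zero-mean Gaussian noise with that deviation to every pixel."""
    delta = rng.uniform(0.0, 0.2)
    noisy = [p + rng.gauss(0.0, delta) for p in pixels]
    return noisy, delta


rng = random.Random(0)                    # seeded for reproducibility
noisy, delta = add_gaussian_noise([0.5] * 1000, rng)
```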
The proposed method is designed to ensure both high stego video quality and precise extraction of the secret message. We achieve this by training the modules with the following losses.
Attribute Loss. We use the weak feature matching loss [26] to constrain the difference in attributes before and after embedding the secret message. The loss function is defined as follows:
where Dj refers to the feature extractor of the discriminator D at the j-th layer, Nj is the number of elements in the j-th layer, and H is the total number of layers. Additionally, h denotes the layer at which computation of the weak feature matching loss starts.
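Based on the symbol definitions above, the loss can be sketched as a sum over layers h through H of a per-layer distance normalized by Nj. The use of an L1 distance and the comparison of cover features against stego features are assumptions; per-layer features are represented here as flat lists of toy numbers.

```python
def weak_feature_matching_loss(feats_a, feats_b, h):
    """Sum over layers j = h..H of the mean absolute difference between
    two sets of discriminator features. Layers are 1-indexed as in the
    text; the L1 distance is an assumption."""
    total = 0.0
    for fa, fb in zip(feats_a[h - 1:], feats_b[h - 1:]):
        n_j = len(fa)                     # N_j: number of elements in layer j
        total += sum(abs(x - y) for x, y in zip(fa, fb)) / n_j
    return total


cover_feats = [[0.0, 1.0], [1.0, 2.0, 3.0], [0.5, 0.5]]
stego_feats = [[9.0, 9.0], [1.0, 2.0, 6.0], [0.5, 1.5]]
loss = weak_feature_matching_loss(cover_feats, stego_feats, h=2)
```

With `h=2`, the first layer is ignored even though its features differ strongly, which is what makes the matching "weak": only the deeper layers are constrained.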
Adversarial Loss. To enhance performance, we use a multi-scale discriminator with gradient penalty. We adopt the hinge version of the adversarial loss, defined as follows:
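For reference, the standard hinge formulation of the adversarial loss is sketched below on lists of raw discriminator scores. The multi-scale structure and the gradient penalty are omitted, and whether the paper's equation uses exactly these terms is an assumption.

```python
def hinge_d_loss(real_scores, fake_scores):
    """Discriminator hinge loss: push real scores above +1
    and fake scores below -1."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def hinge_g_loss(fake_scores):
    """Generator hinge loss: raise the discriminator's score on fakes."""
    return -sum(fake_scores) / len(fake_scores)


d_loss = hinge_d_loss([1.5, 0.5], [-2.0, 0.0])
g_loss = hinge_g_loss([-2.0, 0.0])
```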
Secret Loss. To ensure precise extraction of the secret message, we use the Binary Cross-Entropy (BCE) loss, as defined in Eq. (11).
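The BCE loss between the embedded bits and the extractor's predicted probabilities can be sketched as follows. Averaging over bits and the clamping constant `eps` are conventional choices assumed here, not details taken from the paper.

```python
import math


def bce_loss(bits, probs, eps=1e-7):
    """Mean binary cross-entropy between message bits (0 or 1) and
    predicted probabilities; probabilities are clamped to avoid log(0)."""
    total = 0.0
    for m, p in zip(bits, probs):
        p = min(1.0 - eps, max(eps, p))
        total += -(m * math.log(p) + (1 - m) * math.log(1.0 - p))
    return total / len(bits)


loss = bce_loss([1, 0, 1], [0.9, 0.1, 0.5])
```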
Total loss. The total loss is defined as follows:
This paper is available on arXiv under a CC 4.0 license.