Can Privacy Issues Be Resolved with Secure Source Coding?

Authors:

(1) Onur Günlü, Chair of Communications Engineering and Security, University of Siegen and Information Coding Division, Department of Electrical Engineering, Linköping University;

(2) Rafael F. Schaefer, Chair of Communications Engineering and Security, University of Siegen;

(3) Holger Boche, Chair of Theoretical Information Technology, Technical University of Munich, CASA: Cyber Security in the Age of Large-Scale Adversaries Exzellenzcluster, Ruhr-Universität Bochum, and BMBF Research Hub 6G-Life, Technical University of Munich;

(4) H. Vincent Poor, Department of Electrical and Computer Engineering, Princeton University.

Table of Links

Abstract and I. Introduction

II. System Model

III. Secure and Private Source Coding Regions

IV. Gaussian Sources and Channels

V. Proof for Theorem 1

Acknowledgment and References

Abstract—The problem of secure source coding with multiple terminals is extended by considering a remote source whose noisy measurements are the correlated random variables used for secure source reconstruction. The main extensions are that 1) all terminals noncausally observe a noisy measurement of the remote source; 2) a private key is available to all legitimate terminals; 3) the public communication link between the encoder and decoder is rate-limited; and 4) the secrecy leakage to the eavesdropper is measured with respect to the encoder input, whereas the privacy leakage is measured with respect to the remote source. Exact rate regions are characterized for a lossy source coding problem with a private key, remote source, and decoder side information under security, privacy, communication, and distortion constraints. By replacing the distortion constraint with a reliability constraint, we also obtain the exact rate region for the lossless case. Furthermore, the lossy rate region for scalar discrete-time Gaussian sources and measurement channels is established.

I. INTRODUCTION

Consider multiple terminals that observe correlated random sequences and wish to reconstruct these sequences at another terminal, called a decoder, by sending messages through noiseless communication links, i.e., the distributed source coding problem [1]. A sensor network, in which each node observes a correlated random sequence that should be reconstructed at a distant node, is a classic example of this problem [2, p. 258]. Similarly, function computation problems, in which a fusion center observes the messages sent by other nodes in order to compute a function, are closely related and can model various recent applications [3], [4]. Since the messages sent over the communication links can be public, security constraints are imposed on these messages against an eavesdropper in the same network [5]. If all sent messages are available to the eavesdropper, the decoder must be given some advantage over the eavesdropper to enable secure source coding. Providing the decoder with side information that is correlated with the sequences to be reconstructed can create such an advantage, even when the eavesdropper also has side information, as in [6]–[8]. Allowing the eavesdropper to access only a strict subset of all messages is another way to enable secure distributed source coding, considered in [9]–[11]; see also [12], in which a similar method is applied to enable secure remote source reconstruction. Similarly, a private key that is shared by the legitimate terminals and hidden from the eavesdropper can provide such an advantage, as in [13], [14].
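The advantage that a shared private key gives the legitimate terminals can be illustrated with a one-time pad on a single bit. The following Python snippet is a minimal sketch of our own, not the scheme of [13], [14] or of this paper; the function names and the Bernoulli parameters are illustrative assumptions. It computes the leakage I(M; M xor K) about a message bit M from the public message when the key bit K is independent of M: a uniform key gives zero leakage, while a biased key leaks information, and a terminal that holds K recovers M exactly by a second XOR.

```python
import math
from itertools import product


def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def otp_leakage(p_msg_one, p_key_one):
    """Leakage I(M; M xor K) for independent Bernoulli message and key bits.

    p_msg_one and p_key_one are P(M = 1) and P(K = 1) (hypothetical values).
    """
    joint = {}
    for m, k in product((0, 1), repeat=2):
        pm = p_msg_one if m else 1 - p_msg_one
        pk = p_key_one if k else 1 - p_key_one
        c = m ^ k  # public (encrypted) bit seen by the eavesdropper
        joint[(m, c)] = joint.get((m, c), 0.0) + pm * pk
    return mutual_information(joint)


print(otp_leakage(0.3, 0.5))  # uniform key: approximately 0
print(otp_leakage(0.3, 0.9))  # biased key: strictly positive leakage
```

The exact-pmf computation is used instead of sampling so that the zero-leakage case is visible without statistical noise.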

Source coding models in the literature commonly assume that dependent multi-letter random variables are available and should be compressed. For secret-key agreement [15], [16] and secure function computation problems [17], [18], which are instances of the source coding with side information problem [19, Section IV-B], the correlation between these multi-letter random variables is posited in [20], [21] to stem from an underlying ground truth, i.e., a remote source whose noisy measurements are these dependent random variables. Such a remote source makes it possible to model the cause of correlation in a network, so we also posit that there is a remote source whose noisy measurements are used in the source coding problems discussed below, similar to the models in [22, p. 78] and [23, Fig. 9]. In the chief executive officer (CEO) problem [24], there is also a remote source whose noisy measurements are encoded such that a decoder can reconstruct the remote source from the encoder outputs. Our model differs from the CEO problem because our decoder aims to recover the encoder observations rather than the remote source, which serves mainly to describe the cause of correlation between encoder observations. Thus, we define the secrecy leakage as the amount of information about the encoder observations leaked to an eavesdropper. Moreover, since the remote source is common to all observations in the same network, each encoder output observed by an eavesdropper also leaks information about unused encoder observations, which might later cause secrecy leakage when these unused observations are employed [25]–[27]. We therefore impose a privacy leakage constraint on the remote source; see [28]–[30] for joint secrecy and joint privacy constraints imposed due to multiple uses of the same source.
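The remote-source view of correlation can be made concrete with a toy binary example. In the Python sketch below, the uniform binary remote source, the binary symmetric measurement channels, and all function names are our illustrative assumptions; the model in the paper is more general. Two measurements of the same remote source X, taken through independent BSC(p1) and BSC(p2) channels, are conditionally independent given X, yet they are correlated because they share X. Marginalizing out X gives a cascaded channel with crossover p1(1 - p2) + p2(1 - p1), so their mutual information is 1 - h2 of that crossover.

```python
import math
from itertools import product


def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc(out, inp, p):
    """Transition probability of a binary symmetric channel with crossover p."""
    return p if out != inp else 1 - p


def measurement_mi(p1, p2):
    """I(Xt; Y) in bits when a uniform binary remote source X is observed
    through independent BSC(p1) and BSC(p2) measurement channels."""
    joint = {}  # pmf of the two measurements (Xt, Y), with X marginalized out
    for x, xt, y in product((0, 1), repeat=3):
        p = 0.5 * bsc(xt, x, p1) * bsc(y, x, p2)
        joint[(xt, y)] = joint.get((xt, y), 0.0) + p
    pxt = {a: sum(v for (u, _), v in joint.items() if u == a) for a in (0, 1)}
    py = {b: sum(v for (_, w), v in joint.items() if w == b) for b in (0, 1)}
    return sum(v * math.log2(v / (pxt[a] * py[b]))
               for (a, b), v in joint.items() if v > 0)


p1, p2 = 0.1, 0.2
cascade = p1 * (1 - p2) + p2 * (1 - p1)  # crossover of the cascaded BSCs
print(measurement_mi(p1, p2), 1 - h2(cascade))  # the two values agree
```

If either measurement channel is completely noisy (crossover 0.5), the two measurements become independent, since no common cause remains visible in both.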

We characterize the rate region for a lossy secure and private source coding problem with one private key, remote source, encoder, decoder, and eavesdropper, and with side information at both the eavesdropper and the decoder. By requiring reliable source reconstruction instead, we also characterize the rate region for the lossless case. Finally, for a Gaussian remote source observed through independent additive Gaussian noise measurement channels, we establish the lossy rate region under squared-error distortion.
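As a minimal sketch of the Gaussian measurement model (the variance values and function names below are illustrative assumptions, and the code does not compute the rate region itself, which is not stated in this section), the following Python snippet assumes a remote source X ~ N(0, var_x) observed as Xt = X + N1 and Y = X + N2 with independent zero-mean Gaussian noises. It computes the resulting correlation coefficient rho = var_x / sqrt((var_x + var_n1)(var_x + var_n2)), checks it by simulation, and evaluates the mutual information -0.5 log2(1 - rho^2) of two jointly Gaussian scalars.

```python
import math
import random


def measurement_correlation(var_x, var_n1, var_n2):
    """Correlation coefficient of Xt = X + N1 and Y = X + N2, where the remote
    source X ~ N(0, var_x) and the noises are mutually independent and
    independent of X."""
    return var_x / math.sqrt((var_x + var_n1) * (var_x + var_n2))


def gaussian_mi_bits(rho):
    """Mutual information in bits of two jointly Gaussian scalars with correlation rho."""
    return -0.5 * math.log2(1 - rho ** 2)


def empirical_correlation(var_x, var_n1, var_n2, n=100_000, seed=0):
    """Monte Carlo estimate of the same correlation from simulated measurements."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, math.sqrt(var_x)) for _ in range(n)]
    a = [x + rng.gauss(0.0, math.sqrt(var_n1)) for x in xs]  # encoder measurement
    b = [x + rng.gauss(0.0, math.sqrt(var_n2)) for x in xs]  # second measurement
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / math.sqrt(va * vb)


rho = measurement_correlation(1.0, 0.5, 1.0)
print(rho, empirical_correlation(1.0, 0.5, 1.0), gaussian_mi_bits(rho))
```

Increasing either noise variance lowers rho and hence the mutual information between the two measurements, which illustrates how the measurement channels govern the correlation available to the terminals.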

II. SYSTEM MODEL

We next define the rate region for the lossy secure and private source coding problem described above.

Note that in (2) and (3) we consider conditional mutual information terms to account for the unavoidable privacy and secrecy leakages due to Eve’s side information; see also [21], [31]. Furthermore, characterizing the secrecy and privacy leakages with conditional mutual information terms, rather than with the corresponding conditional entropy terms used in [6], [14], [32]–[34], simplifies our analysis.
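To illustrate the difference between the two kinds of leakage terms, the following Python sketch evaluates both on a single-letter toy example; the n-letter definitions (2) and (3) are not reproduced here. The setup and names are hypothetical and our own: a uniform bit X plays the role of the protected observation, Z is Eve's side information obtained through a binary symmetric channel, and the public message W is X masked by a key bit K of chosen bias. The code computes the conditional mutual information I(X;W|Z) and the equivocation H(X|W,Z).

```python
import math
from collections import defaultdict
from itertools import product


def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[i] for i in idx)] += p
    return out


def cond_mi(joint):
    """I(X;W|Z) = H(X,Z) + H(W,Z) - H(X,W,Z) - H(Z) for a pmf {(x, w, z): p}."""
    return (entropy(marginal(joint, (0, 2))) + entropy(marginal(joint, (1, 2)))
            - entropy(joint) - entropy(marginal(joint, (2,))))


def cond_entropy_x_given_wz(joint):
    """Equivocation H(X|W,Z) = H(X,W,Z) - H(W,Z)."""
    return entropy(joint) - entropy(marginal(joint, (1, 2)))


def toy_joint(eve_crossover, key_bias):
    """Hypothetical toy: X uniform bit, Z = X through BSC(eve_crossover),
    public message W = X xor K with K ~ Bernoulli(key_bias)."""
    joint = defaultdict(float)
    for x, k, e in product((0, 1), repeat=3):
        p = 0.5
        p *= key_bias if k else 1 - key_bias
        p *= eve_crossover if e else 1 - eve_crossover
        joint[(x, x ^ k, x ^ e)] += p
    return dict(joint)


for q in (0.0, 0.2, 0.5):
    j = toy_joint(0.25, q)
    print(q, cond_mi(j), cond_entropy_x_given_wz(j))
```

Since I(X;W|Z) = H(X|Z) - H(X|W,Z), the mutual-information term is zero for a uniform key, when W adds nothing beyond what Z already reveals. In that same case the equivocation H(X|W,Z) is still below H(X) = 1 bit, because of Eve's side information. The conditional mutual information thus isolates the leakage caused by the public message from the part that is unavoidable due to Z.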

We next define the rate region for the lossless secure and private source coding problem.

The lossless secure and private source coding region R is the closure of the set of all achievable lossless tuples.

This paper is available on arXiv under the CC BY 4.0 DEED license.