INSA de Lyon, N° 07 ISAL 0093, 2007

THESIS presented to obtain the degree of Doctor of Philosophy in Computer Science

Informed Watermarking and Compression of Multi-Sources

A dissertation presented by Çağatay Dikici, December 3, 2007, prepared at LIRIS under the supervision of Atilla Baskurt and Khalid Idrissi.

The thesis jury is composed of:

Reviewers:
  M. Jean-Marc Chassery (Directeur de Recherche CNRS)
  M. Bülent Sankur (Professeur)

Examiners:
  Mme. Christine Guillemot (Directrice de Recherche INRIA)
  M. Fabrice Meriaudeau (Professeur)
  M. William Puech (Maître de conférences, HDR)
  M. Florent Dupont (Maître de conférences, HDR)
  M. Khalid Idrissi (Maître de conférences)
  M. Atilla Baskurt (Professeur)

Abstract

Informed Watermarking and Compression of Multi-Sources (December 2007)
Çağatay Dikici, B.S., Bogazici University; M.S., Bogazici University

Technological advances in telecommunications, multimedia and portable handheld devices over the last decade have driven the creation of novel services such as multimedia content sharing, video-conferencing and content protection, all running on low-power devices. Alternative low-complexity coding techniques therefore need to be developed to replace conventional ones. Coding with state information, a potential solution for shifting encoder complexity to the decoder, has two main applications: 1) Distributed Source Coding (DSC), for compressing a source when a correlated version of it is available only to the decoder; 2) Informed Data Hiding (IDH), for embedding a watermark into a host signal when the host signal is available only to the encoder. For each of these problems, practical code designs that operate close to the theoretical limits are proposed. The proposed capacity-approaching designs combine good error correcting codes, such as Low Density Parity-Check (LDPC) codes, with good quantization codes, such as Trellis Coded Quantization (TCQ).
Moreover, the theoretical achievable rate limits are derived for a relaxed IDH setup in which a noisy observation of the host signal is available to the decoder. Finally, motivated by the strong duality between DSC and IDH, a hybrid scheme that uses both data hiding and compression is proposed. In addition to the derivation of the theoretical channel capacity and rate-distortion function, a complete framework is proposed.

Keywords: Coding with State Information, Compression, Watermarking, Distributed Source Coding, Writing on Dirty Paper, Low Density Parity Check Codes, Trellis Coded Quantization.

Résumé (French abstract, translated)

Informed Watermarking and Multi-Source Compression (December 2007)
Çağatay Dikici, B.S., Bogazici University; M.S., Bogazici University

Technological advances in telecommunications, multimedia and mobile systems have opened the door to the emergence and development of new services such as the sharing of multimedia databases, video-conferencing and content protection, all running on low-power systems. Hence the need for new coding techniques of reduced complexity. Coding techniques that exploit the presence of side information are a potential solution for shifting coding complexity to the decoder. They apply in particular to two coding principles: 1) Distributed Source Coding (DSC), to compress a given signal knowing that another signal correlated with the original is available at the decoder; 2) Informed Data Hiding (IDH), to insert a message into a host signal that is known only to the encoder. For each of these two techniques, we propose solutions that approach the theoretical limits.
We do so by combining high-performance channel codes of LDPC type with trellis-based quantization (TCQ). We also study the theoretical limits achievable by IDH when a noisy version of the host signal is available at the decoder. Finally, exploiting the strong duality between DSC and IDH, we propose a complete practical hybrid scheme combining the two techniques, together with a theoretical study of the rate-distortion function and the capacity of such a system.

Keywords: coding with side information, compression, watermarking, distributed source coding, LDPC, TCQ.

Acknowledgements

To my family.

First of all, I would like to express my deepest gratitude to my supervisors Prof. Atilla Baskurt and Khalid Idrissi for their patient guidance, encouragement and excellent advice throughout this study. I am grateful to Prof. Christine Guillemot for her enthusiasm, for sharing her fruitful ideas on information theory, for her valuable assistance in maturing my theoretical foundation in source-channel coding, and for her hospitality during our collaboration. Special thanks to Caroline Fontaine for her advice and valuable discussions. I am thankful to my thesis reviewers Prof. Bülent Sankur and Prof. Jean-Marc Chassery: they provided a critical reading, valuable suggestions and constructive remarks that were very important for the improvement of this dissertation. I would also like to thank my other committee members, Mr. Fabrice Meriaudeau, Mr. William Puech and Mr. Florent Dupont. Thanks to all LIRIS members, especially the three stimulators: the judo zen Guillaume Lavoué, the sailor Julien Ricard, and the theater boy Nicolas Zlatoff. I would like to thank my interns Benoît, Damien, David and Stephane, and also the Migraine team: Rémi, Greg, Elise, Fab, Antho and Claris for their weekly motivation.
Finally, great thanks to Laurent, Eléonore and my family, who encouraged me to finalize this dissertation.

Contents

Part I: Problem Statement and Preliminaries

Introduction

1 Preliminaries
  1.1 Notations and Conventions
  1.2 Entropy and Mutual Information
  1.3 Causality
  1.4 Source Coding
  1.5 Channel Coding
  1.6 Distributed Source Coding
  1.7 Writing on Dirty Paper
  1.8 Message Passing Algorithm
  1.9 Trellis Coded Quantization (TCQ)
  1.10 Low Density Parity Check (LDPC) Codes
  1.11 Conclusion

Part II: Contributions

2 Distributed Source Coding
  2.1 Introduction
  2.2 Theoretical Background
  2.3 Related Works
  2.4 Practical Code Design
  2.5 Practical Application for Still-Image Coding
  2.6 Conclusion

3 Informed Data Hiding
  3.1 Introduction
  3.2 Theoretical Background
  3.3 Prior Work
  3.4 Proposed Scheme-1: Extension to Cox-Miller
  3.5 Proposed Scheme-2: Superposition Coding
  3.6 Conclusion

4 Dirty Paper Coding with Partial State Information
  4.1 Introduction
  4.2 Problem Statement
  4.3 Achievable Rate
  4.4 Capacity/Rate Gain/Loss Analysis
  4.5 Conclusion

5 Data Hiding and Distributed Source Coding
  5.1 Introduction
  5.2 Theoretical Background
  5.3 Contribution 1: Capacity Analysis for Multivariate Gaussian Source
  5.4 Contribution 2: Practical Code Design
  5.5 Conclusion

Conclusion

A Achievable Rate Region Calculations for Two Partial Side Informations Known to the Encoder and Decoder Respectively
  A.1 Derivation of the Achievable Rate Region
  A.2 Maximization of the Rate
  A.3 Entropy of Multivariate Gaussian Distribution

B Codes and Degree Distributions for Generating LDPC Matrices
  B.1 Degree Distributions of the rate-2/3 code, for 2:1 compression rate in DSC
  B.2 Degree Distribution of the rate-1/2 code, for Informed Data Hiding

C Publications of the Author

D Cited Author Index

List of Figures

1 A multimedia communication setup for a low-power device which has data hiding and efficient compression capability.
2 A point-to-point source-channel coding setup.
3 Coding with state information.
4 Coding of two correlated sources.
5 Costa's "Writing on Dirty Paper" setup.
6 Channel coding with state informations.
7 Data Hiding + Source Coding scheme.
8 Chapter dependencies of this dissertation.
1.1 The Venn diagram of the relationship between entropy and mutual information.
1.2 Binary entropy function H(a) versus a.
1.3 Uniform density function p(x) versus x, where p(x) = 1/a for 0 ≤ x ≤ a.
1.4 A compression system.
1.5 A communication system.
1.6 Counting problem on a straight line.
1.7 A 1/2 recursive systematic convolutional code with memory 2 and generator matrix (3, 5) in octal digits.
1.8 State transitions of the recursive systematic convolutional code (1, 3) in octal digits.
1.9 Output points and corresponding partitions for 2 bits per sample.
1.10 Viterbi decoding of a vector with length 4.
1.11 Bipartite graph representation of the parity check matrix H.
1.12 Belief propagation on the bipartite graph of H.
1.13 Performance comparison of the error rates of a (3, 6) regular LDPC code, a turbo code and an optimized irregular LDPC code. The channel is binary-input additive white Gaussian noise.
1.14 LDPC coding example. Cartoon copyright © 2007 piyalemadra.com, used with permission. (a) Original binary cartoon of size 100 × 100 with 0s corresponding to white and 1s to black pixels; the ratio of black pixels to total pixels is 0.2445. (b) Visualization of the cartoon coded with a 1/2 systematic LDPC code, such that the encoder output contains the original image and its parity checks of size 100 × 100. (c) During transmission, both the cartoon and its parity check bits are exposed to bit errors with a per-bit error probability of 0.07.
1.15 LDPC decoding. (a) After 1 iteration. (b) After 5 iterations. (c) The original cartoon is decoded without any error after 10 iterations.
2.1 16 cases of correlated source coding.
2.2 Lines and points of Table 2.2.
2.3 Admissible Slepian-Wolf rate region R for the case {1011}.
2.4 Wyner-Ziv setup.
2.5 Admissible Slepian-Wolf rate region R for the cases {0011} and {0001}.
2.6 Graph of R*_X|Y(D), R_X|Y(D) and H(pz) − H(D) versus D, for pz = 0.28. For the binary symmetric case, R*_X|Y(D) has a rate loss with respect to R_X|Y(D), except at the points (H(pz), 0) and (0, pz), where there is no rate loss.
2.7 Wyner-Ziv setup for the Gaussian case.
2.8 2:1 rate DSC compression using a 2/3 convolutional code.
2.9 2:1 rate DSC compression code design using two systematic 4/5 convolutional codes with an interleaver and iterative MAP decoding. Blocks π correspond to a pseudo-random interleaver, and block π⁻¹ to the corresponding deinterleaver. The log-likelihood ratio (LLR) calculations log(p(x=1|y)/p(x=0|y)) use the correlation noise level and the received side information Y. Iterative decoding is done with a Soft-Input Soft-Output (SISO) decoder.
2.10 2:1 rate DSC compression code design using two systematic 2/3 rate parallel concatenated convolutional codes and 1/2 rate puncturing matrices P.
2.11 2:1 rate DSC compression using a systematic 2/3 rate LDPC code.
2.12 Eight output points and corresponding partitions for 4 subsets.
2.13 Wyner-Ziv coding as the concatenation of a good quantization code and a Slepian-Wolf coder.
2.14 Our proposed 2:1 rate DSC compression code design using LDPC codes.
2.15 Decoding bit error rate versus entropy of the correlation noise H(p1) for the 2:1 rate Slepian-Wolf compression comparison. The LDPC simulations use a regular LDPC matrix of input length 4000 and an irregular LDPC matrix of length 10⁴. The graph also shows the S-W limit and the best performances achieved using a convolutional code (Aaron and Girod, 2002), a punctured turbo code (Lajnef, 2006), and an irregular LDPC code of length 10⁵ (Liveris et al., 2002a).
2.16 Encoder and decoder structure. The source is compressed using LDPC binning; the side information Y available to the decoder is the image reconstructed from the low-frequency (LL2) wavelet decomposition; the two received signals are decoded jointly.
2.17 Construction of the side information. Only the low-low wavelet subband of the second level is transmitted; the decoder reconstructs the side information by setting all other coefficients to 0.
2.18 Left: side information at the receiver. Center: first-iteration output of the decoded image. Right: decoding output after 5 iterations.
3.1 Channel coding with state information setup.
3.2 Watermarked image.
3.3 Costa setup.
3.4 Informed embedding of Miller et al. on DCT coefficients of still images.
3.5 Proposed informed embedding setup on DWT coefficients of still images.
3.6 Analysis and synthesis steps of the Le Gall DWT.
3.7 Wavelet decomposition of the Lena image.
3.8 A 100-bit message M is inserted into the Lena image using the LH2, HL2 and HH2 DWT coefficients. No perceptual shaping is applied.
3.9 The same 100-bit message M is inserted into the Lena image using perceptual shaping.
3.10 Comparison of embedding a 40-bit message M into the asia image with and without perceptual shaping.
3.11 Superposition of 2 codes.
3.12 Embedding process of the message M into the work s using superposition coding. LDPC coding of M to find the channel code c1 is followed by TCQ coding of αs − c1 to find the source code c0. The watermarked signal c0 + c1 + (1 − α)s is sent through the attack channel.
3.13 Superposition watermarking extraction by BCJR and LDPC decoding iterations.
3.14 Embedding a 40-bit payload into the Cameraman image.
3.15 Maximum level of attack for which the secret message can still be decoded perfectly.
4.1 Channel coding with state informations.
4.2 P = Q = N = 1; graphs of R(α) for the {L, K} pairs {0, 0}, {0, 1}, {1, 0}, {1, 1}, {0, ∞} and {1, ∞}. The transmission rate R(α) is given in nats per unit transmission (the maximum value 0.3466 nats/transmission corresponds to 1 bit/transmission).
4.3 Capacity gain (between R_Case-B(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB, and various 10 log(Q/K) values, with perfect knowledge of the channel state information at the encoder (L = 0).
4.4 Maximum achievable rate loss (between R_Case-E(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB, and various 10 log(Q/L) values.
4.5 Maximum achievable rate gain or loss (between R(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB, and various 10 log(Q/K) values, with partial knowledge of the channel state information at the encoder (L = 1).
5.1 A communication system between Alice and Bob via a non-secure carrier.
5.2 Data Hiding + Source Coding scheme.
5.3 Channel coding with two-sided state information scheme.
5.4 Rate-distortion theory with side information at the decoder: Wyner-Ziv setup.
5.5 Multivariate Gaussian channel of the IDH-DSC scheme.
5.6 Multivariate Gaussian case: carrier point of view.
5.7 Gaussian test channel that achieves the lower bound found in Equation 5.21. Input: Ŵ ∼ N(0, Q + D1 − D2); output: Ŵ ∼ N(0, Q + D1).
5.8 Equivalent setup of the test channel in Figure 5.7 using an addition and a multiplication operator.
5.9 Equivalent scheme of the Gaussian channel.
5.10 Embedding performance for 1/200 bit per sample with 2:1 compression of the watermarked string, using a 2/3 rate LDPC code with block length 4000. Minimum 0.02 bit per sample entropy rate loss with respect to the no-embedding case.

List of Tables

2.2 Achievable rate regions according to the Slepian-Wolf theorem.
3.2 Robustness test of the proposed algorithm for the image "asia.pgm". A 40-bit message is embedded into the asia image with DWT perceptual shaping. For each attack listed, the maximum attack strength for which the secret message M can be decoded without any error is given.
4.2 Special cases of the proposed channel coding setup.
5.2 Channel coding with state information problems.
5.3 Source coding with state information problems.

List of Abbreviations

AWGN: Additive White Gaussian Noise
BCJR: Bahl, Cocke, Jelinek and Raviv algorithm
BSC: Binary Symmetric Channel
CCSI: Channel Coding with State Information
C-SNR: Correlation Signal-to-Noise Ratio
DCT: Discrete Cosine Transform
DISCUS: DIstributed Source Coding Using Syndromes
DSC: Distributed Source Coding
DRM: Digital Rights Management
DWT: Discrete Wavelet Transform
ECC: Error Correcting Codes
G-P: Gel'fand-Pinsker
IDCT: Inverse Discrete Cosine Transform
IDH: Informed Data Hiding
IDWT: Inverse Discrete Wavelet Transform
i.i.d.: independent identically distributed
l.c.e.: lower convex envelope
LDPC: Low Density Parity Check
LDPCA: Low Density Parity Check Accumulate
LLR: Log-Likelihood Ratio
LR: Likelihood Ratio
MAC: Multiple-Access Channel
MAP: Maximum A Posteriori
ML: Maximum Likelihood
MSE: Mean Squared Error
PAM: Pulse Amplitude Modulation
QIM: Quantization Index Modulation
RSC: Reed-Solomon Codes
r.v.: random variable
SCSI: Source Coding with State Information
SISO: Soft-Input Soft-Output
SLDPCA: Sum Low Density Parity Check Accumulate
SNR: Signal-to-Noise Ratio
S-W: Slepian-Wolf
TCM: Trellis Coded Modulation
TCQ: Trellis Coded Quantization
TTCQ: Turbo Trellis Coded Quantization
W-Z: Wyner-Ziv

Part I: Problem Statement and Preliminaries

Introduction

Consider the communication setup in Figure 1, which strongly motivates this dissertation. A low-power multimedia device such as a mobile phone offers various functionalities, e.g. an embedded camera and a WiFi or 3G network connection. Despite the limited power and bandwidth, telecommunications operators want to deploy multimedia applications such as video-conferencing, in which the mobile device has to handle several tasks: capturing sound and video, compressing them under some fidelity criteria, and sending them over the network on the uplink side; and receiving the stream, decompressing the content and displaying it on the downlink side.
During multimedia content transmission, one would also like to hide extra information seamlessly (in the sense of not being easily detectable by the human audio-visual system), either to enhance the multimedia content or simply for Digital Rights Management (DRM) reasons.

Figure 1: A multimedia communication setup for a low-power device which has data hiding and efficient compression capability.

State-of-the-art audio-video compression standards exploit the redundancy of the data only at the encoder and, with the help of entropy coding, can compress it close to the theoretical limits. Hence, in classical compression techniques the encoder is more complex than the decoder. One objective of our work is to shift the encoder complexity to a powerful intermediate server, which transcodes the data into a conventional compression stream and sends it to the receiver for simple decoding. The second problem is to hide information in the multimedia data at the sender side under a fidelity criterion: the host multimedia content may be modified by no more than an acceptable noise level, the modified version is transmitted to the receiver, and the hidden information must be extractable at the receiver without access to the original multimedia data. Since the original multimedia is accessible only to the sender, this setup is also known as "blind watermarking". To formalize the blind watermarking and compression problems stated above in a rigorous manner, we briefly review the coding concept introduced by Shannon, and coding with state information.

Coding Concept

Figure 2: A point-to-point source-channel coding setup.

Consider the communication system in Figure 2. A signal generated by an information source needs to be transmitted to a receiver through a channel; the channel is generally imperfect and hence creates errors during transmission.
The aim of the transmitter-receiver pair is to minimize its resources, such as transmission power and the number of channel uses, while guaranteeing signal reconstruction with a given fidelity. One can try to minimize the number of bits needed to represent the input source, which corresponds to compression. On the other hand, redundant data must be added to recover from errors introduced during transmission. In the literature, the compression process is called source coding, while error correction is called channel coding. The duality between source and channel coding has been studied since Shannon (1959): in source coding, the redundancy of the input source is removed; in channel coding, controlled redundancy is added in order to correct transmission errors.

Coding with State Information

Figure 3: Coding with state information.

In this section, we extend the basic source coding and channel coding setup by introducing a state information S that determines the output of the channel. Depending on the setup, this state information can be accessible, perfectly or partially, to the transmitter, to the receiver, or to both. In this dissertation we are mainly interested in two communication problems with state information, in order to solve the data hiding and source coding problem for low-power devices. The first setup is "channel coding with state information (CCSI) known to the transmitter". Since only the transmitter, and not the receiver, has access to the state information in this setup, the blind watermarking problem can be posed as CCSI known to the transmitter. Gel'fand and Pinsker (1980) and Costa (1983) have made valuable contributions in this field. The second setup is "source coding with state information (SCSI) known to the receiver".
This setup considers the theoretical compression rate limits of a source when state information correlated with the input source is accessible to the receiver. Even though the theoretical foundations of SCSI known to the receiver date back to the 1970s, with Slepian and Wolf (1973) and Wyner and Ziv (1976), practical applications for efficient source coding on low-power devices and sensors had to wait until the 2000s (Puri and Ramchandran, 2002; Aaron and Girod, 2002). Although the random binning argument and coset construction used by Slepian and Wolf (1973) to prove the achievable rate limits are not practically applicable, good error correcting codes can be employed for a sub-optimal solution. Moreover, in analogy with the classical source-channel coding duality, Cover and Chiang (2002), Pradhan et al. (2003) and Su et al. (2000) have shown a strong duality between source and channel coding with state information. We employ error correcting coding techniques to tackle the two problems stated above. The main idea of Error Correcting Codes (ECC) is to add redundancy to the data to be transmitted in an appropriate manner, so that the erroneous parts introduced by the channel can be detected and corrected. There exist two classes of ECC: "convolutional codes" and "block codes". Examples of block codes are Hamming codes, Reed-Solomon codes (RSC) and Low Density Parity Check (LDPC) codes (Gallager, 1963). They all use a parity check matrix to create the redundancy and have good error correcting capabilities; however, considering the performance on larger blocks of data and the possibility of soft decoding, LDPC codes have more advantages (MacKay, 2003). Convolutional codes are built on a finite state machine whose output depends on the current sample and the current state. A trellis path is a sequence of state transitions.
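As a toy illustration of this finite-state-machine view, the sketch below implements a minimal rate-1/2 feedforward convolutional encoder with memory 2 and generators (7, 5) in octal. Note this is an illustrative choice, a feedforward variant for simplicity, not the recursive systematic (3, 5) code of Figure 1.7:

```python
# Minimal rate-1/2 feedforward convolutional encoder, memory 2,
# generators (7, 5) in octal -- an illustrative toy, not a code from this thesis.
def conv_encode(bits):
    s1 = s2 = 0                    # two memory elements: the encoder "state"
    out = []
    for b in bits:
        out.append(b ^ s1 ^ s2)    # generator 111 (octal 7)
        out.append(b ^ s2)         # generator 101 (octal 5)
        s1, s2 = b, s1             # shift-register update
    return out

# The sequence of (s1, s2) states visited during encoding is a valid
# trellis path; a Viterbi or BCJR decoder searches for the most probable one.
assert conv_encode([1, 0, 1, 1]) == [1, 1, 1, 0, 0, 0, 0, 1]
```

Each input bit yields two output bits, so the encoder adds exactly the controlled redundancy that the channel decoder later exploits.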
Since convolutional codes do not permit all possible state transitions, a sequence of state transitions produced by a convolutional code is a valid trellis path. The decoder computes the most probable valid trellis path, in either the Maximum Likelihood (ML) or the Maximum A Posteriori (MAP) sense. Decoding can also be done with hard decisions or soft decisions, where soft decisions enable iterative decoding. Berrou and Glavieux (1996) and Berrou et al. (1993) proposed a coding algorithm (the turbo code) based on the concatenation of convolutional codes with an interleaver, using an optimal decoding algorithm for linear codes known as BCJR in an iterative manner, which operates close to the theoretical limits. Other techniques have been proposed to improve the performance of turbo codes, such as puncturing (Acikel and Ryan, 1997) and interleaving (Benedetto et al., 1998; Tepe and Anderson, 1998). After the invention of turbo codes, LDPC codes were rediscovered by MacKay and Neal (1997) using the belief propagation algorithm. An LDPC code can be represented as a bipartite graph with variable nodes and check nodes connected by edges. The variable nodes must satisfy all the check node equations: the modulo-2 sum of the variable nodes connected to each check node must be 0. The bipartite graph is regular or irregular depending on whether the number of edges connected to each variable node or check node is the same or not. Regular and irregular code performances are studied in MacKay and Neal (1997) and in Richardson and Urbanke (2001a), Chung et al. (2001a), Chung (2000) and Chung et al. (2001b), respectively. With a careful design of the bipartite graph, irregular codes outperform the regular ones.
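To make the bipartite-graph constraint concrete, the following sketch checks the parity equations of a toy 6-bit code and runs a hard-decision bit-flipping decoder, a much-simplified stand-in for belief propagation. The parity-check matrix here is an illustrative toy, not one of the codes used in this thesis:

```python
# Toy parity-check matrix: each row is a check node, each column a variable node.
H = [[1, 1, 0, 1, 0, 0],   # check 1: v1 + v2 + v4 = 0 (mod 2)
     [0, 1, 1, 0, 1, 0],   # check 2: v2 + v3 + v5 = 0
     [1, 0, 1, 0, 0, 1]]   # check 3: v1 + v3 + v6 = 0

def is_codeword(c):
    """A valid codeword satisfies every check-node equation."""
    return all(sum(h * b for h, b in zip(row, c)) % 2 == 0 for row in H)

def bit_flip_decode(r, max_iters=10):
    """Hard-decision decoding: flip the bit touching the most failed checks."""
    c = list(r)
    for _ in range(max_iters):
        failed = [row for row in H
                  if sum(h * b for h, b in zip(row, c)) % 2 != 0]
        if not failed:
            return c                      # all parity checks satisfied
        votes = [sum(row[i] for row in failed) for i in range(len(c))]
        c[votes.index(max(votes))] ^= 1   # flip the most suspect bit
    return c

assert is_codeword([1, 1, 0, 0, 1, 1])
assert not is_codeword([1, 1, 0, 0, 1, 0])
# A single error on v1 (attached to two checks) is corrected:
assert bit_flip_decode([0, 1, 0, 0, 1, 1]) == [1, 1, 0, 0, 1, 1]
```

Belief propagation replaces these hard votes with soft probability messages exchanged along the graph edges, which is what makes the iterative decoding of Chapter 1.10 so effective.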
Richardson et al. (2001) have proposed a density evolution method for the design of LDPC codes which performs 0.13 dB away from the theoretical limits, surpassing the best codes known so far (turbo codes). (The name BCJR comes from the initials of Bahl, Cocke, Jelinek and Raviv, who proposed the algorithm in Bahl et al. (1974).)

In this dissertation, we propose a complete system that performs combined distributed source coding and data hiding. After a detailed study of state-of-the-art DSC and data hiding schemes, we apply a high-performance DSC method (based on LDPC) and a high-performance data hiding method (based on Trellis Coded Quantization (TCQ) and LDPC). For the derivation of the theoretical bounds of the proposed system, we extend Costa's (1983) work on "Writing on Dirty Paper" by introducing one partial state information at the encoder and another at the decoder, and we analyze the maximum achievable rates of this setup. This extension reduces to 6 different cases, of which 4 are already known and 2 are novel with interesting application areas. Afterwards, the combination of data hiding and distributed source coding is studied. Based on a practical application scenario, the theoretical rate-distortion and channel capacity expressions of this setup are derived. The rate-distortion function of our setup is an extension of the Wyner-Ziv theorem with an appropriate correlation relation between the state information and the input source. For the channel capacity, we use one of the special cases of our findings on "Dirty Paper Coding with Partial State Information". A practical code design is given, applying LDPC and BCJR decoding on TCQ.

Summary of Contributions

The contributions of this dissertation can be summarized as follows. Our first major contribution is in the field of combined Data Hiding and Distributed Source Coding.
We derive the rate-distortion function and the capacity formula of the embedding process for the Gaussian input case, and we propose a practical code design using LDPC and TCQ which operates close to the theoretical limits. Our second major contribution is in the area of channel coding with partial side information in Gaussian input channels. The maximum achievable rates are derived for channel coding with side information partially or perfectly available to the encoder and partially or perfectly available to the decoder. This is an extension of Costa's "writing on dirty paper" setup, and it is employed for the calculation of the channel capacity in the combined Data Hiding and Distributed Source Coding problem. Moreover, we propose a Slepian-Wolf coding scheme based on LDPC codes which operates 0.08 bit per channel use away from the theoretical limits. This system is applied to a still-image coding setup in which the image is coded such that the low-pass DWT coefficients are available to the decoder. Finally, our contributions in the Informed Data Hiding field consist of two embedding methods: the first is a low-rate embedding method for DWT coefficients of still images using perceptual shaping; the second is a high-rate embedding method for continuous synthetic data using the superposition of a good source code C0 based on TCQ and a good channel code C1 based on LDPC. By applying iterative decoding between the BCJR algorithm and belief propagation, the embedded message can be decoded with an error rate of Pe ≤ 10⁻⁵, even for an AWGN attack noise 1.5 dB away from the theoretical limits. We now briefly explain our contributions in their order of appearance in this dissertation.

Distributed Source Coding

Slepian and Wolf (1973) derived the compression rate limits of separate encoding and joint decoding of correlated sources drawn from a discrete alphabet (see Figure 4).
After the extension of this theorem with a distortion constraint by Wyner and Ziv (1976), the first practical code designs appeared in the early 2000s, driven by the idea of encoding on low-power devices such as sensors.

Figure 4: Coding of two correlated sources.

According to distributed source coding, the statistical dependency of the two correlated sources can be exploited at the decoder. For instance, one of the sources is coded with a low-rate error-correcting code, and only the parity checks are sent through the channel. The second source is assumed to be a noisy version of the first source available at the decoder; hence the decoder tries to correct the noisy part of the second source using the parity checks of the first source.

In this dissertation, we propose a practical code design for the Slepian-Wolf problem which is based on LDPC codes for discrete alphabet input. We use rate-2/3 LDPC codes which operate close to the theoretical limits. A rate-2/3 LDPC code corresponds to a 2:1 compression because of the ratio between (input source):(parity checks). The LDPC encoding and decoding used for DSC is as in the traditional LDPC codes described in Chapter-1.10, and some decoding modifications are applied for DSC as described in Chapter-2.4. Our system operates on a correlation noise entropy which is 0.08 bit per channel use away from the Slepian-Wolf limit for a 2:1 compression rate. We also compare the performance of our method with existing systems. Please note that the system developed in this part is employed for the joint data-hiding compression system in Chapter-5. We also apply our proposed coding scheme to still-image compression applications: in this setup a refinement coding is made using LDPC codes while the low-frequency Discrete Wavelet Transform (DWT) component of the image is accessible to the decoder.
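The parity-check idea described above can be sketched with a toy syndrome-based Slepian-Wolf coder. The (7,4) Hamming parity-check matrix below is only a small stand-in for the large rate-2/3 LDPC codes of the dissertation; the function names and block sizes are illustrative assumptions.

```python
import numpy as np

# Toy syndrome-based Slepian-Wolf coder.  The (7,4) Hamming parity-check
# matrix stands in for the large rate-2/3 LDPC codes of the thesis; it can
# correct a single bit of "correlation noise" between the two sources.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def sw_encode(x):
    """Compress a 7-bit block to its 3-bit syndrome."""
    return H.dot(x) % 2

def sw_decode(s, y):
    """Correct the decoder's correlated copy y using only the syndrome s."""
    e = (H.dot(y) + s) % 2          # syndrome of the error pattern x XOR y
    if e.any():
        for n in range(7):          # single-error patterns map to H's columns
            if np.array_equal(H[:, n], e):
                y = y.copy()
                y[n] ^= 1
                break
    return y

x = np.array([1, 0, 1, 1, 0, 0, 1])
y = x.copy(); y[4] ^= 1             # side information: x with one bit flipped
print(np.array_equal(sw_decode(sw_encode(x), y), x))  # True
```

The encoder never sees y: it sends 3 bits instead of 7, and the decoder exploits the correlation exactly as described above.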
Blind Watermarking

Suppose that the secret message that we embed within a cover signal or image is the information that we want to transmit; then the blind watermarking problem can be viewed as Channel Coding with Side Information known to the encoder (see Figure-5). Hence the theoretical limits of the secret message embedding rate can be calculated for noncausal memoryless systems. Costa (1983) found an even more striking result: for the gaussian input case, the capacity of the system is independent of the cover data S, and there is no capacity loss due to the unavailability of the cover data at the decoder side.

Figure 5: Costa's “Writing on Dirty Paper” setup.

We develop two data hiding schemes. The first one achieves a low embedding rate: it modifies the DWT coefficients of a still-image using trellis coding and controls the embedding strength based on the perceptual sensitivity of the DWT coefficients. The performance of our proposed method in terms of the error probability of message extraction under several attacks is given. The second proposed scheme is focused on high-rate embedding performance, combining a good source code and a good channel code. During the embedding process of the secret message, we employ LDPC encoding. Moreover, in order to respect the embedding criterion, a 6-level output TCQ is used. Hence an LDPC code and a TCQ code are concatenated for the data hiding process. During transmission, the watermarked signal is exposed to an AWGN attack channel. At the decoder side, the received signal is decoded with belief propagation decoding for the LDPC side and BCJR decoding for the TCQ side. Since both decoding methods give soft output probabilities, the decoding process is done in an iterative manner. For low SNR values, our system operates 1.5 dB away from Costa's limit.
As in the DSC case, the blind watermarking system using superposition coding is one of the main blocks of the overall data-hiding compression system that we propose in Chapter-5.

Dirty Paper Coding with Partial State Information

As described in the previous section, Costa derives the capacity of a channel coding with state information problem where the state is known only to the encoder. However, in some settings, partial information on the state of the channel can be available to the encoder or to the decoder (and the two need not be the same). Hence we derive the capacity of the gaussian channel with state information, where the state information is partially known to the encoder and to the decoder, as in Figure-6.

Figure 6: Channel Coding with state information.

Unlike Costa's case, the maximum achievable rate of this system also depends on the state S. Our contributions can be listed as:

• The analytic expression of the maximum achievable rate is found to be
$$ R(\alpha^*) = \max_{\alpha} R(\alpha) = \frac{1}{2}\ln\!\left(1 + \frac{P(QK + QL + KL)}{N(QK + QL + KL) + QLK}\right), \qquad (1) $$
which is obtained for $\alpha^* = PQK/(PQK + QNK + L(PQ + PK + QK + NQ + NK))$.

• The general setup can be reduced to 6 different cases, where 4 of them are well known (Cases A, C, D, F) and 2 of them are new (Cases B, E). The maximum achievable rates are calculated for all 6 cases. The two new cases are compared with Costa's setup.

• In order to achieve the maximum achievable rate, the encoder needs to know the channel variance parameters. However, in real-world applications, the exact parameters are not always known to the encoder. We analyze the rate gains/losses for the 6 cases and the general setup when the encoding is done at a non-optimum operating point. We also compare the gain/loss analysis with respect to Costa's setup.
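Equation (1) can be evaluated numerically. The sketch below uses assumed variance values (P, Q, N, K, L are all illustrative) and sanity-checks one limit: when the encoder-side observation noise K vanishes, the rate collapses to Costa's dirty-paper capacity $\frac{1}{2}\ln(1 + P/N)$.

```python
import math

# Numerical sketch of Eq. (1).  P: input power, Q: state variance,
# N: channel noise variance, K, L: variances of the noises on the
# encoder-side and decoder-side partial state observations.
def achievable_rate(P, Q, N, K, L):
    """Maximum achievable rate of Eq. (1), in nats per channel use."""
    s = Q*K + Q*L + K*L
    return 0.5 * math.log(1 + P*s / (N*s + Q*L*K))

def optimal_alpha(P, Q, N, K, L):
    """The maximizing scaling factor alpha* given below Eq. (1)."""
    return P*Q*K / (P*Q*K + Q*N*K + L*(P*Q + P*K + Q*K + N*Q + N*K))

# Sanity check: as K -> 0 (encoder knows the state perfectly) the rate
# collapses to Costa's dirty-paper capacity (1/2) ln(1 + P/N).
P, Q, N = 1.0, 10.0, 0.5
costa = 0.5 * math.log(1 + P / N)
print(abs(achievable_rate(P, Q, N, K=1e-12, L=3.0) - costa) < 1e-9)  # True
```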
This general setup is relevant for diverse practical applications such as watermarking under desynchronization attacks and point-to-point communication over a fading channel where the receiver has an estimate of the channel state.

Informed Data Hiding and Distributed Source Coding

We employ all the contributions up to this point in order to build a system combining Informed Data Hiding (IDH) and Distributed Source Coding (DSC). Motivated by the application scenario in Figure-5.1 on page 102, we derive the channel capacity and the rate distortion function of a point-to-point communication system between Alice and Bob supplied by a non-trusted Carrier. Alice sends a secret message by inserting it into a cover data under a power constraint, knowing that a correlated version of the cover data is accessible to Bob. Because of the non-secure transmission, Alice does not share her original copy, and transmits only the watermarked signal. From the Carrier's point of view, he wants to minimize his bandwidth while respecting a quality of service, so he wants to compress Alice's message given that Bob shares his noisy copy at the decoding end. Our main contributions can be listed as:

• The analytic expression of the rate distortion function of the Carrier in nats per channel use is found to be
$$ R_{W|\hat{S}}(D_2) = \begin{cases} \dfrac{1}{2}\ln\!\left(\dfrac{D_1}{D_2} + \dfrac{QK}{(Q+K)D_2}\right), & 0 < D_2 < D_1 + \dfrac{QK}{Q+K}, \\ 0, & D_2 \ge D_1 + \dfrac{QK}{Q+K}. \end{cases} \qquad (2) $$

• The embedding capacity of the overall system in nats per channel use is found to be
$$ C = \frac{1}{2}\ln\!\left(1 + \frac{D_1(D_1 + Q - D_2)}{D_2(D_1 + Q)}\right). \qquad (3) $$

Figure 7: Data Hiding + Source Coding Scheme.

• A practical code design for the gaussian case is proposed using the concatenation of the systems proposed in Chapter-2 and Chapter-3; hence a data hiding method using superposition of a source code and a channel code, and a compression method using DSC principles, are combined in a single system.
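Equations (2) and (3) can likewise be evaluated for assumed parameter values. In the sketch below, D1 is the embedding distortion, D2 the Carrier's compression distortion, Q the cover variance and K the variance of Bob's correlation noise; all numeric values are illustrative only.

```python
import math

def carrier_rate(D2, D1, Q, K):
    """Rate-distortion function of Eq. (2), in nats per channel use."""
    thresh = D1 + Q*K / (Q + K)
    return 0.0 if D2 >= thresh else 0.5 * math.log(thresh / D2)

def embedding_capacity(D1, D2, Q):
    """Embedding capacity of Eq. (3), in nats per channel use."""
    return 0.5 * math.log(1 + D1*(D1 + Q - D2) / (D2*(D1 + Q)))

D1, Q, K = 1.0, 10.0, 5.0
thresh = D1 + Q*K / (Q + K)
print(carrier_rate(thresh, D1, Q, K))        # 0.0: the rate vanishes at the boundary
print(carrier_rate(0.5, D1, Q, K) > 0.0)     # True: positive rate below it
print(embedding_capacity(D1, 0.5, Q) > 0.0)  # True
```

Note that the two branches of (2) meet continuously at $D_2 = D_1 + QK/(Q+K)$, where the logarithm evaluates to zero.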
The decoding is done via belief propagation iterations for decompressing the watermarked signal, and BCJR-belief propagation iterations for extracting the hidden mark. The performance of the system is compared with the theoretical bounds derived in this chapter.

• A toy example for the discrete case is proposed, and a performance analysis is given without analyzing the theoretical limits.

Organization of the Dissertation

This dissertation consists of two parts and five chapters. Part-I is titled “Problem Statement and Preliminaries”, and includes a general introduction and a chapter introducing preliminary notions such as the definition of information theoretical elements, basic source and channel coding concepts, and details of two powerful coding techniques that are used in this dissertation: Trellis Coded Quantization (TCQ) and Low Density Parity Check (LDPC) codes (Chapter-1).

Part-II is dedicated to the contributions of this dissertation and contains four chapters. In Chapter-2, we review the theoretical background for source coding with side information, such as the Slepian-Wolf theorem and the Wyner-Ziv theorem, and explain state-of-the-art Distributed Source Coding implementations in the literature. We then introduce our practical code design for the Slepian-Wolf problem, which is based on LDPC codes for discrete alphabet input, and we compare this scheme with existing systems. Finally, we extend our practical code design to still-image compression, using LDPC codes for binning and the low-frequency DWT coefficients as the side information available to the decoder.

In Chapter-3, we give the theoretical background of channel coding with side information, such as the Gel'fand-Pinsker theorem and Costa's “writing on dirty paper” setup, and depict existing informed data hiding implementations in the literature.
We then present our proposed informed data hiding methods: a low embedding rate method on DWT coefficients of still-images, and a high embedding rate method using the superposition of a good source code (TCQ) and a good channel code (LDPC).

In Chapter-4, we give our information theoretic contributions on channel coding with side information. We extend Costa's “channel coding with side information perfectly known to the encoder” setup to “channel coding with side information partially known to the encoder and partially known to the decoder (which need not be the same)”. The maximum achievable rate is calculated for this setup. This global setup can be reduced to 6 different sub-cases, where 4 of them are well-known setups and 2 of them are new. We analyze all 6 sub-cases and the general setup, and compare them with Costa's initial setup.

We then have all the ingredients to construct our final system in Chapter-5, in order to perform source-channel coding: to hide information within a host signal and to compress it using distributed compression techniques. The problem is formalized as point-to-point communication between Alice and Bob via a non-trusted carrier. Alice wants to hide some information in her original copy and send it through the carrier. The carrier wants to compress this watermarked data, and the only thing he has is a noisy copy of the original data shared by Bob at the receiver end. We derive the rate-distortion function of the Carrier, and the capacity of the embedding system for the gaussian input case. Surprisingly, the absence of Bob's noisy copy at the Carrier's encoder does not affect the rate-distortion function of the Carrier. Similarly, the absence of Alice's original copy at Bob's side does not affect the embedding capacity formula. After these theoretical findings, we propose a practical code design for the gaussian case by using the system proposed for DSC in Chapter-2 and the high-rate embedding system proposed for IDH in Chapter-3.
The chapter is finalized with a practical code proposition for the Binary Symmetric Channel. The dependencies between the chapters can be seen in Figure-8.

Figure 8: Chapter dependencies of this dissertation: 1 Introduction and Preliminaries; 2 Distributed Source Coding; 3 Informed Watermarking; 4 Dirty Paper Coding with Partial State Information; 5 Data Hiding and Distributed Source Coding.

Chapter 1

Preliminaries

Contents
1.1 Notations and Conventions
1.2 Entropy and Mutual Information
1.3 Causality
1.4 Source Coding
1.5 Channel Coding
1.6 Distributed Source Coding
1.7 Writing on Dirty Paper
1.8 Message Passing Algorithm
1.9 Trellis Coded Quantization (TCQ)
    1.9.1 Viterbi algorithm
    1.9.2 BCJR
1.10 Low Density Parity Check (LDPC) Codes
    1.10.1 Decoding with belief propagation
    1.10.2 Encoding
    1.10.3 Performance of 1/2 LDPC codes
    1.10.4 A visual example
1.11 Conclusion
In this Chapter, we introduce the notation used throughout this dissertation and define information theoretical quantities such as entropy, differential entropy and mutual information. After a brief explanation of source coding and channel coding limits with the help of these entropy-related quantities, we give a practical source coding example with Trellis Coded Quantization (TCQ) and a channel coding example with LDPC coding. We also explain decoding algorithms such as Viterbi decoding, BCJR decoding and belief propagation.

1.1 Notations and Conventions

Throughout this dissertation, we use standard concepts and results from information theory, which can be found, for example, in Cover and Thomas (1991). Random variables are denoted by capital letters, the specific values they may take by the corresponding lower-case letters, and sets by calligraphic font. Similarly, random vectors, their realizations, and alphabets are denoted, respectively, by boldface capital letters, boldface lowercase letters, and calligraphic letters subscripted by the corresponding dimension. Thus, for example, $X^n$ denotes a random $n$-vector $(X_1, ..., X_n)$, and $x^n = (x_1, ..., x_n)$ is a specific vector value in $\mathcal{X}^n$, the $n$-th Cartesian power of $\mathcal{X}$, whose components are drawn independent and identically distributed (i.i.d.). For a pair of discrete random variables $(X, Y)$ with a joint distribution $p(x, y)$, the entropy of $X$ is denoted by $H(X)$, the conditional entropy of $X$ given $Y$ by $H(X|Y)$, the joint entropy by $H(X, Y)$, and the mutual information by $I(X; Y)$. A more detailed description of the entropy-related quantities can be found in Chapter-1.2.

A distortion measure $d$ is a mapping from the set $\mathcal{X} \times \mathcal{Y}$ into the set of nonnegative reals, $d : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^+$.
Two distortion functions used in this chapter are:

• Hamming distortion, given by
$$ d(x, y) = \begin{cases} 0, & \text{if } x = y, \\ 1, & \text{if } x \neq y, \end{cases} \qquad (1.1) $$
which also corresponds to the probability of error distortion, since $E\,d(X, Y) = Pr(X \neq Y)$.

• Squared error distortion, given by
$$ d(x, y) = (x - y)^2. \qquad (1.2) $$

The distortion between two sequences $\mathbf{x}, \mathbf{y}$ of length $n$ is given by
$$ d(\mathbf{x}, \mathbf{y}) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, y_i). \qquad (1.3) $$

1.2 Entropy and Mutual Information

Entropy is one of the key elements of information theory. Borrowed from thermodynamics, it is known as the uncertainty of a random variable. It is measured in nats (natural log base) or in bits ($\log_2$ base). Before defining entropy, we introduce the Shannon information content. Assume a discrete random variable $X$ drawn from a finite set, $x \in \mathcal{X}$, with probability mass function $p(x) = Pr\{X = x\}$.

Definition 1.1 The information content of an outcome $x$ is defined to be
$$ i(x) = \log_2 \frac{1}{p(x)} = -\log_2(p(x)). \qquad (1.4) $$

Definition 1.2 Entropy is defined to be the average Shannon information content of an outcome:
$$ H(X) \equiv E\{i(x)\} = -\sum_{x \in \mathcal{X}} p(x) \log_2(p(x)), \qquad (1.5) $$
with the convention that for $p(x) = 0$, $p(x) \log_2(p(x)) \equiv 0$, since $\lim_{\theta \to 0^+} \theta \log_2 \theta = 0$.

We now introduce joint and conditional entropy, and mutual information.

Definition 1.3 The joint entropy $H(X, Y)$ of a pair of discrete random variables $X, Y$ drawn from $(x, y) \in \mathcal{X} \times \mathcal{Y}$, with joint probability mass function $p(x, y) = Pr\{X = x, Y = y\}$, is
$$ H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log_2(p(x, y)). \qquad (1.6) $$

Definition 1.4 If $(x, y) \sim p(x, y)$, then the conditional entropy $H(Y|X)$ is
$$ H(Y|X) = \sum_{x \in \mathcal{X}} p(x) H(Y|X = x) = -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y|x) \log_2(p(y|x)) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log_2(p(y|x)). \qquad (1.7) $$

The relation between the joint and conditional entropy can be expressed as:
$$ H(X, Y) = H(X) + H(Y|X) \qquad (1.8) $$
$$ H(X, Y) = H(Y) + H(X|Y). \qquad (1.9) $$
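The definitions above can be checked numerically on a small joint pmf; the distribution below is an arbitrary illustrative example, not one used elsewhere in the dissertation.

```python
import math

def H(p):
    """Entropy in bits of a distribution given as an iterable of probabilities,
    with the convention 0*log(0) = 0 (Definition 1.2)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# A toy joint pmf p(x, y) on {0,1} x {0,1}.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

Hxy = H(pxy.values())                                       # joint entropy (1.6)
px = [pxy[(0, 0)] + pxy[(0, 1)], pxy[(1, 0)] + pxy[(1, 1)]]  # marginal of X
# Conditional entropy H(Y|X) = sum_x p(x) H(Y|X=x)  (Definition 1.4)
Hy_x = sum(p * H([pxy[(x, 0)] / p, pxy[(x, 1)] / p]) for x, p in enumerate(px))

# Chain rule check: H(X,Y) = H(X) + H(Y|X)  (Equation 1.8)
print(abs(Hxy - (H(px) + Hy_x)) < 1e-9)  # True
```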
Definition 1.5 The relative entropy or Kullback-Leibler divergence between two probability mass functions $p(x)$ and $q(x)$ is
$$ D(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log_2 \frac{p(x)}{q(x)}. \qquad (1.10) $$

Definition 1.6 Given $(x, y) \sim p(x, y)$, the mutual information $I(X; Y)$ is the relative entropy between the joint distribution and the product distribution $p(x)p(y)$:
$$ I(X; Y) = D(p(x, y) \| p(x)p(y)) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log_2 \frac{p(x, y)}{p(x)p(y)}. \qquad (1.11) $$

Some of the relationships between entropy and mutual information are
$$ I(X; X) = H(X), \qquad (1.12) $$
$$ I(X; Y) = H(X) + H(Y) - H(X, Y), \qquad (1.13) $$
$$ I(X; Y) = H(X) - H(X|Y), \qquad (1.14) $$
$$ I(X; Y) = H(Y) - H(Y|X), \qquad (1.15) $$
$$ I(X; Y) = I(Y; X). \qquad (1.16) $$

Figure 1.1: The Venn diagram of the relationship between entropy and mutual information.

The Venn diagram shown in Figure-1.1 expresses the relationship between $H(X)$, $H(Y)$, $H(X, Y)$, $H(X|Y)$, $H(Y|X)$ and $I(X; Y)$. In this dissertation, we also use the entropy and mutual information of more than two random variables. We now define the chain rules in order to calculate entropy-related functions for more than two random variables.

Definition 1.7 (Chain rule for entropy) Let random variables $X_1, X_2, .., X_n$ be drawn according to $p(x_1, x_2, .., x_n)$. Then
$$ H(X_1, X_2, .., X_n) = \sum_{i=1}^{n} H(X_i | X_{i-1}, .., X_1). \qquad (1.17) $$

Definition 1.8 The conditional mutual information of random variables $X$ and $Y$ given $Z$ is
$$ I(X; Y|Z) = H(X|Z) - H(X|Y, Z). \qquad (1.18) $$

Definition 1.9 (Chain rule for mutual information)
$$ I(X_1, X_2, .., X_n; Y) = \sum_{i=1}^{n} I(X_i; Y | X_{i-1}, .., X_1). \qquad (1.19) $$

The definition of information content can be extended to continuous random variables drawn from an infinite set. Let $X$ be a continuous random variable with probability density function $f(x)$ on a support set $S$.

Definition 1.10 The differential entropy is defined to be
$$ h(X) = \int_S f(x) \log_2 \frac{1}{f(x)} \, dx. \qquad (1.20) $$

Below, we give numerical examples for the entropy and differential entropy of several probability distribution functions.
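Identity (1.13) can be verified numerically by computing the mutual information directly as the KL divergence of Definition 1.6; the joint pmf below is an arbitrary illustrative example.

```python
import math

pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # toy joint pmf
px = {x: sum(p for (a, b), p in pxy.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (a, b), p in pxy.items() if b == y) for y in (0, 1)}

# I(X;Y) as the KL divergence D(p(x,y) || p(x)p(y))  (Equation 1.11)
I = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

def H(dist):
    """Entropy in bits (Definition 1.2)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Identity (1.13): I(X;Y) = H(X) + H(Y) - H(X,Y)
print(abs(I - (H(px.values()) + H(py.values()) - H(pxy.values()))) < 1e-9)  # True
```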
Example 1.1 (Binary distribution) The entropy of a random variable $X$ from the finite set $\mathcal{X} = \{0, 1\}$, where $p(0) = a$ and $p(1) = 1 - a$, is
$$ H(X) = a \log_2(1/a) + (1 - a) \log_2(1/(1 - a)) \stackrel{\mathrm{def}}{=} H(a), \qquad (1.21) $$
where $0 \le a \le 1$. The graph of $H(a)$ versus $a$ is shown in Figure-1.2. Please note that $H(a)$ is maximized at $a = 1/2$.

Example 1.2 (Uniform distribution) Consider a random variable distributed uniformly between $0$ and $a$, as seen in Figure-1.3. Then its differential entropy is
$$ h(X) = \int_0^a \frac{1}{a} \log_2 a \, dx = \log_2 a. \qquad (1.22) $$

Figure 1.2: Binary entropy function H(a) versus a.

Figure 1.3: Uniform density function p(x) versus x, where p(x) = 1/a for 0 ≤ x ≤ a.

Example 1.3 (Gaussian distribution) Consider a random variable with a gaussian distribution $X \sim \mathcal{N}(0, P)$, hence having probability density function $f(x) = \frac{1}{\sqrt{2\pi P}} \exp(-x^2/2P)$. Then its differential entropy is
$$ h(X) = -\int_{-\infty}^{\infty} f(x) \log_2(f(x)) \, dx = \frac{1}{2} \log_2 2\pi e P. \qquad (1.23) $$

Remark 1.1 Among all probability density functions with variance $P$, the gaussian distribution has the greatest differential entropy.

1.3 Causality

A system is called causal if its output depends only on its past and present inputs. Otherwise, if the output depends also on future inputs, it is said to be non-causal. In this dissertation, we focus on non-causal systems.

1.4 Source Coding

Definition 1.11 We define the rate distortion function of the discrete memoryless system in Figure-1.4 with fidelity criterion $d(X, \hat{X}) \le D$ as
$$ R(D) = \min_{p(\hat{x}|x) : E\{d(X, \hat{X})\} \le D} I(X; \hat{X}), \qquad (1.24) $$
where the minimum is taken over all conditional distributions $p(\hat{x}|x)$ for which the joint distribution $p(x, \hat{x})$ satisfies the expected distortion constraint.

Figure 1.4: A compression system.

Thus, the rate-distortion function gives the minimum rate $R$ needed to compress the input with a maximum distortion level $D$.
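Examples 1.1 and 1.3 are easy to evaluate directly; the sketch below computes the binary entropy function and the gaussian differential entropy in bits.

```python
import math

def binary_entropy(a):
    """H(a) of Equation (1.21), in bits, with H(0) = H(1) = 0."""
    if a in (0.0, 1.0):
        return 0.0
    return a * math.log2(1 / a) + (1 - a) * math.log2(1 / (1 - a))

def gaussian_diff_entropy(P):
    """Differential entropy of N(0, P), Equation (1.23), in bits."""
    return 0.5 * math.log2(2 * math.pi * math.e * P)

print(binary_entropy(0.5))                    # 1.0 bit: maximized at a = 1/2
print(round(binary_entropy(0.25), 3))         # 0.811
print(round(gaussian_diff_entropy(1.0), 3))   # 2.047 bits for unit variance
```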
1.5 Channel Coding

Definition 1.12 We define the channel capacity of the discrete memoryless system in Figure-1.5 as
$$ C = \max_{p(x)} I(X; Y), \qquad (1.25) $$
where the maximum is taken over all possible input distributions.

Figure 1.5: A communication system.

In operational terms, the channel capacity is the highest rate, in bits per channel use, at which information can be sent with arbitrarily low error probability.

1.6 Distributed Source Coding

In the area of compression of correlated multi-sources, Slepian and Wolf (1973) have shown that separate encoding of each source with joint decoding at the receiving end incurs no rate loss with respect to a joint encoding and joint decoding system. Even with separate encoding, the joint decoder can exploit the correlation between the sources. The idea is that each separate encoder partitions the possible inputs into random subsets and sends only the index of the subset (known as the syndrome), and the decoder uses channel coding principles in order to estimate the sources from their syndromes. Motivated by the idea of developing low-complexity encoders for low-power handhelds, in this dissertation we propose to transmit the parity checks of a high-performance error-correcting code such as LDPC for the separate compression of correlated sources.

1.7 Writing on Dirty Paper

Costa (1983) has introduced the terminology “Writing on Dirty Paper” for the problem of coding with state information at the encoder. The encoder can communicate with the decoder using a signal X with a limited power P, and tries to send a message M given that the state information S of the channel is accessible only to the encoder. Costa has shown that the non-availability of the state information to the decoder does not affect the capacity of the system. The state information is regarded as dirt.
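As a small illustration of Definition 1.12, the maximization in (1.25) can be brute-forced over a grid of input distributions for a binary symmetric channel, whose capacity is known in closed form as $1 - H(\varepsilon)$. The crossover probability and grid resolution below are arbitrary choices.

```python
import math

def Hb(a):
    """Binary entropy in bits."""
    return 0.0 if a in (0.0, 1.0) else -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def mutual_info_bsc(q, eps):
    """I(X;Y) for a BSC(eps) with input P(X=1) = q: I = H(Y) - H(Y|X)."""
    py1 = q * (1 - eps) + (1 - q) * eps
    return Hb(py1) - Hb(eps)

eps = 0.1
# Brute-force the maximization over input distributions of Eq. (1.25).
C = max(mutual_info_bsc(q / 1000, eps) for q in range(1001))
print(abs(C - (1 - Hb(eps))) < 1e-6)  # True: capacity of a BSC is 1 - H(eps)
```

The maximum is attained at the uniform input $q = 1/2$, which makes the output $Y$ uniform as well.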
Instead of canceling out this dirt by using its limited power P, the encoder can use its power in the direction of the dirt, and can achieve the same capacity as if the state information were accessible to the decoder. An auxiliary variable U is used for encoding such that U = X + αS, where X is the output of the encoder and α is a multiplication constant between 0 and 1. If α is chosen as P/(P + N), the rate of the communication is maximized. It is then enough to send the appropriate X by shifting αS to the closest U value indexed by the message M. Chen and Wornell (1998) and Cox et al (1999) were the first to realize that this setup can be used to determine the capacity of a blind watermarking problem. In this dissertation, highly motivated by this setup, we study the capacity of a system where the state information is partially available to the decoder. Moreover, a practical code design for writing on dirty paper will be proposed.

1.8 Message Passing Algorithm

Message passing is a simple and powerful algorithm that is used to solve diverse research problems, from counting problems to marginalization problems. Since it is fundamental to belief propagation for LDPC decoding and to the BCJR algorithm for turbo codes, we introduce it with a simple counting problem on a straight line (see Figure-1.6). Instead of dedicating one person to count the whole group, the head and the tail of the line each send a message to their neighbor by simply saying 1. If a person receives a message from only one of his neighbors, he adds 1 to the message and transmits it to his other-side neighbor. If he receives the messages of both neighbors, the total count of the line can be found as the sum (left message + right message + 1). If there exists no loop, the message passing algorithm converges to the exact solution. The details of the BCJR algorithm (forward-backward) and belief propagation can be found in the following sections.
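The counting game above can be sketched in a few lines: the messages traveling left-to-right and right-to-left are the running counts, and every person who has heard from both sides agrees on the total.

```python
# Message passing on a line: each end says "1"; everyone who has heard from
# one neighbor adds 1 and passes the message on.  A node that has heard from
# both sides knows the total: left message + right message + 1.
def count_by_message_passing(n):
    left = list(range(1, n + 1))      # left[i]: message arriving from the left of i+1
    right = list(range(n, 0, -1))     # right[i]: message arriving from the right of i-1
    # person i combines the message from i-1, the message from i+1, and itself
    return [(left[i - 1] if i > 0 else 0)
            + (right[i + 1] if i < n - 1 else 0) + 1 for i in range(n)]

print(count_by_message_passing(5))   # [5, 5, 5, 5, 5]: everyone agrees on the total
```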
Figure 1.6: Counting problem on a straight line.

1.9 Trellis Coded Quantization (TCQ)

Trellis Coded Quantization (TCQ) is a limit-achieving vector quantization method proposed by Marcellin and Fischer (1990). It uses Ungerboeck (1982)'s set partitioning idea from Trellis Coded Modulation (TCM). Let $X^n$ be a random $n$-vector $(X_1, ..., X_n)$ whose elements are i.i.d. with probability density function $P(X)$. We want to quantize this vector at $m$ bits per sample, hence to transmit one of $2^m$ symbols per sample. The basic idea of TCQ is that the elements of the quantized data $Y^n$ constitute a Markov sequence, and while it is transmitted through a noisy channel, $X^n$ is the output of the channel. The aim is to find the sequence $Y^n$ that is most probable given $X^n$. First, the number of possible symbols is doubled to $2^{m+1}$, then partitioned into $2^{k+1}$ subsets, where $k \le m$. TCQ uses a rate $k/(k+1)$ convolutional code to expand $k$ input bits to $k+1$ bits in order to select one of the $2^{k+1}$ subsets, and uses the remaining $m - k$ bits to select one of the $2^{m-k}$ symbols in the selected subset. Then, by minimizing the MSE between $X^n$ and the possible sequences $\hat{Y}^n$, $Y^n$ is found. Using a convolutional code with set partitioning has better performance than conventional quantization techniques. The min-sum algorithm, also known as the Viterbi algorithm, can be applied to find the most probable sequence $Y^n$.

We now give a brief trellis example and a TCQ example with a 1/2 rate convolutional code with memory 2. A systematic recursive convolutional code with generator matrix (011, 101) in binary form, or (3, 5) in octal digits, can be seen in Figure-1.7. The blocks D are delay elements of unit time. The convolutional code in Figure-1.7 corresponds to the state diagram in Figure-1.8, where the states are the 2-bit memory and the state transitions are described by the arrows, marked with the corresponding input/output pairs $i_k / y_{1,k} y_{0,k}$.
We map the output sequences to 4 reconstruction subsets $D_0$, $D_1$, $D_2$ and $D_3$. For instance, for 2 bits per sample, the reconstruction levels are doubled, as seen in Figure-1.9.

Figure 1.7: A 1/2 recursive systematic convolutional code with memory 2 and generator matrix (3, 5) in octal digits.

For time instant $i$ and for each possible subset $D_k$, where $0 \le k \le 3$, the MSE cost of selecting the closest element within the selected subset is calculated as $(X_i - D_k)^2$. Starting from state 00 at time 0, the one-bit input chooses one of the 2 possible subsets. Then, using the Viterbi algorithm as explained in Chapter-1.9.1, the most probable path $p_1^n$ is found. At time $t$, $p_t$ chooses one of the four dictionaries $D_k$ via the convolutional code, and one extra bit is needed to choose the index of the sub-dictionary of $D_k$.

1.9.1 Viterbi algorithm

The Viterbi algorithm, also known as the min-sum algorithm (Viterbi, 1967), finds the most probable sequence among the valid codewords. For all time instants $t = 1, .., n$ and all possible output levels $k = 1, .., 4$, the MSE cost of each output, $(X_t - D_k)^2$, is calculated. Initializing the cost of state 0 at $t = 0$ as 0 and that of the other states as $\infty$, each node transmits the current state cost plus the cost of the chosen arc. In the next step, each node chooses the minimum-cost message among the messages it receives and sends it to the next time step. At the end, the minimum cost over all codewords is found. Finally, the most probable path, which minimizes the word error, can be found by back-tracing the minimum-sum path. For instance, Figure-1.10 shows all the possible paths of a trellis with length 4; the Viterbi algorithm searches for the minimum-cost path among all possible paths.
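The min-sum recursion above can be sketched on a small 4-state trellis. The transition table and reconstruction levels below are illustrative assumptions, not the exact trellis of Figures 1.8-1.9.

```python
# A minimal min-sum (Viterbi) sketch for trellis-constrained quantization.
# NEXT and LEVELS are assumed values, not the exact code of Figure 1.8.
NEXT = {0: [(0, 0), (1, 3)],     # state -> [(next_state, subset) for input 0, 1]
        1: [(2, 2), (3, 1)],
        2: [(0, 2), (1, 1)],
        3: [(2, 0), (3, 3)]}
LEVELS = [-1.5, -0.5, 0.5, 1.5]  # one representative point per subset D0..D3

def viterbi_cost(x):
    """Minimum total squared error of quantizing x along a valid trellis path."""
    INF = float('inf')
    cost = [0.0, INF, INF, INF]          # start in state 0 at time 0
    for sample in x:
        new = [INF] * 4
        for s in range(4):
            if cost[s] == INF:
                continue
            for nxt, d in NEXT[s]:       # each node forwards cost + arc cost
                new[nxt] = min(new[nxt], cost[s] + (sample - LEVELS[d]) ** 2)
        cost = new
    return min(cost)                     # minimum over all terminal states

x = [0.4, -1.2, 1.3, 0.1]
# The trellis-constrained cost can never beat free nearest-level quantization.
free = sum(min((s - l) ** 2 for l in LEVELS) for s in x)
print(viterbi_cost(x) >= free - 1e-12)   # True
```

Back-tracing the surviving transitions (omitted here for brevity) recovers the minimizing path $p_1^n$ itself.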
1.9.2 BCJR

While the Viterbi algorithm is a maximum-likelihood decoding method which minimizes the probability of sequence (word) error, Bahl et al (1974) proposed an algorithm, known as BCJR, which minimizes the symbol error probability. Borrowing from the message passing algorithm, BCJR calculates the probability of a symbol given the observed sequence. The state transitions of the Markov source are governed by the transition probabilities
$$ p_t(m|m') = Pr\{S_t = m | S_{t-1} = m'\}, $$
and the outputs by the probabilities
$$ q_t(X|m', m) = Pr\{x_t = X | S_t = m, S_{t-1} = m'\}, $$
for $1 \le t \le \tau$. Since the outputs are deterministic given the previous and current state, the term $q_t(X|m', m)$ only takes the value 0 or 1, depending on the possibility of that transition.

Figure 1.8: State transitions of the recursive systematic convolutional code (3, 5) in octal digits.

Figure 1.9: Output points and corresponding partitions for 2 bits per sample.

The decoder receives the sequence $Y_1^\tau$ and tries to estimate the a posteriori transition probabilities given the observation $Y_1^\tau$, i.e.
$$ Pr\{S_{t-1} = m'; S_t = m | Y_1^\tau\} = Pr\{S_{t-1} = m'; S_t = m; Y_1^\tau\} / Pr\{Y_1^\tau\}. \qquad (1.26) $$
For this purpose, it is more convenient to estimate the quantity $\sigma_t(m', m) = Pr\{S_{t-1} = m'; S_t = m; Y_1^\tau\}$.

Figure 1.10: Viterbi decoding of a vector with length 4.

Let us define the probability functions
$$ \alpha_t(m) = Pr\{S_t = m; Y_1^t\}, $$
$$ \beta_t(m) = Pr\{Y_{t+1}^\tau | S_t = m\}, $$
$$ \gamma_t(m', m) = Pr\{S_t = m; Y_t | S_{t-1} = m'\}, $$
so that $\sigma_t(m', m) = \alpha_{t-1}(m') \cdot \gamma_t(m', m) \cdot \beta_t(m)$.

Now,
$$ \gamma_t(m', m) = \sum_{U=0}^{1} Pr\{S_t = m | S_{t-1} = m'\} \cdot Pr\{u_t = U | S_t = m, S_{t-1} = m'\} \cdot Pr\{Y_t | U\} = \sum_{U=0}^{1} p_t(m|m') \cdot q_t(U|m', m) \cdot R(Y_t|U) \qquad (1.27) $$
is calculated for each possible transition and for $t = 1, 2, .., \tau$, where $R(Y_t|U)$ denotes the appropriate symbol transition probabilities of the channel.
Then, for $t = 1, 2, .., \tau$,
$$ \alpha_t(m) = \sum_{m'=0}^{M-1} Pr\{S_{t-1} = m'; S_t = m; Y_1^t\} = \sum_{m'=0}^{M-1} Pr\{S_{t-1} = m'; Y_1^{t-1}\} \cdot Pr\{S_t = m; Y_t | S_{t-1} = m'\} = \sum_{m'=0}^{M-1} \alpha_{t-1}(m') \cdot \gamma_t(m', m). \qquad (1.28) $$
The boundary conditions of $\alpha_0(m)$ at $t = 0$ are
$$ \alpha_0(0) = 1; \quad \alpha_0(m) = 0 \text{ for } m \neq 0. \qquad (1.29) $$
Similarly, for $t = 1, 2, .., \tau - 1$,
$$ \beta_t(m) = \sum_{m'=0}^{M-1} Pr\{S_{t+1} = m'; Y_{t+1}^\tau | S_t = m\} = \sum_{m'=0}^{M-1} Pr\{S_{t+1} = m'; Y_{t+1} | S_t = m\} \cdot Pr\{Y_{t+2}^\tau | S_{t+1} = m'\} = \sum_{m'=0}^{M-1} \beta_{t+1}(m') \cdot \gamma_{t+1}(m, m'). \qquad (1.30) $$
The boundary condition for $\beta_\tau$ is
$$ \beta_\tau(m) = 1/M, \qquad (1.31) $$
since the termination state probability is equally distributed over all $M$ possible states. Finally, $\sigma$ is calculated as
$$ \sigma_t(m', m) = Pr\{S_{t-1} = m'; Y_1^{t-1}\} \cdot Pr\{S_t = m; Y_t | S_{t-1} = m'\} \cdot Pr\{Y_{t+1}^\tau | S_t = m\} = \alpha_{t-1}(m') \cdot \gamma_t(m', m) \cdot \beta_t(m). \qquad (1.32) $$

The recursive calculation of $\sigma_t(m', m)$ can be done in the 4 steps given below.

1. Initialize $\alpha_0(m)$ and $\beta_\tau(m)$.
2. Calculate $\gamma_t(m', m)$ and $\alpha_t(m)$ for all $t = 1, 2, .., \tau$ and all possible transitions.
3. Recursively compute $\beta_t(m)$.
4. Compute $\sigma_t(m', m)$.

The pseudo-code of the BCJR algorithm can be found in Algorithm-1.

1.10 Low Density Parity Check (LDPC) Codes

Low Density Parity Check (LDPC) codes were first proposed by Gallager (1963) and reinvented by Mackay and Neal (1997). A rate-k/n linear binary (n, k) LDPC code is a block code defined by an $(n-k) \times n$ sparse parity check matrix H, which has a small number of 1s in each row and column (for instance, Equation 1.33). Another representation of the parity check code is its bipartite graph (see Figure 1.11), Mackay (2003).
Preliminaries Algorithm 1 BJCR Algorithm Require: The received vector Y τ = {Y1 , Y2 , .., Yτ } Ensure: σt (m′ , m) Initialize α0 (m) and βτ (m) according to Equations-1.29 and 1.31 while t : 1 ≤ t ≤ τ do calculate γt (m′ , m) as in Equation-1.27 calculate αt (m) as in Equation-1.28 end while while t : τ − 1 ≥ t ≥ 1 do calculate βt (m) as in Equation-1.30 end while while t : 1 ≤ t ≤ τ do calculate σt (m′ , m) as in Equation-1.32 end while Return σt (m′ , m). H= 1 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1 . (1.33) An ensemble of the LDPC codes is described by the degree distribution polynomials λ(x) and ρ(x) Richardson et al (2001); Chung (2000). λ(x) is given as X λ(x) = λi xi−1 , (1.34) i and ρ(x) is defined as ρ(x) = X ρj xj−1 , (1.35) j where λi is the fraction of edges that are incident on degree-i bit nodes and ρj is the fraction of edges that are incident on degree-j check nodes. A code is to be regular (wc ,wr ) if the degree polynomials are λ(x) = xwc −1 and ρ(x) = xwr −1 .The rate of the LDPC code for a given pair of degree profiles is bounded by R1 R ≥ 1 − R 01 0 ρ(x)dx λ(x)dx , (1.36) with equality if and only if the rows of the parity check matrix are linearly independent. - 26 - 1.10. Low Density Parity Check (LDPC) Codes Figure 1.11: Bipartite graph representation of the parity check matrix H. 1.10.1 Decoding with belief propagation The transmitter sends a codeword x such that Ht x = 0. The receiver receives the vector y with a transition probability p(y|x). The aim of the decoder is to find maximum likelihood codeword xM L = arg maxx p(y|x). If H does not include cycles, the sum product algorithm converges the exact solution Pearl (1988). You can find below the sum-product algorithm on bipartite graph.We use the following notation, also shown in Figure 1.12. 1.10.1.1 Definitions • The set of bits n that participates the check m is N (m) ≡ {n : Hmn = 1}. 
For example N (1) ≡ {1, 3, 5, 7} in Figure 1.11. • The set of checks in which bit n participates is M(n) ≡ {m : Hmn = 1}. For example M(1) ≡ {1, 2} in Figure 1.11. - 27 - Chapter 1. Preliminaries Figure 1.12: Belief propagation on bipartite graph H. • N (m)\n is the set N (m) with bit n excluded. x is the probability of the n’th bit of the vector x is x given the informations • qmn obtained via checks other than check m. x is the probability of check m satisfied if bit n of x is considered fixed at x • rmn and the other bits qmn′ : n′ ∈ N (m)\n. • δqmn is difference between the probabilities n’th bit of the vector x is 0 and 1 given the informations obtained via checks other than check m, δqmn = 0 − q1 . qmn mn • δrmn is the probability check m satisfied if bit n x is 0 given the informations obtained via checks other than check m minus that of bit n x is 1, δrmn = 0 − r1 . rmn mn 1.10.1.2 Initialization Depending on the vector y received from the channel and the channel model, the likelihood probability p(xn |y) for each bit n are calculated. For instance, for a memoryless binary symmetric channel with crossover probability ρ, p(x1 = 0|y1 = 0) = (1 − ρ) and p(x1 = 1|y1 = 0) = ρ. 0 and q 1 values are initialized with the corresponding likelihood probabilities qmn mn 0 1 received from the channel respectively, such that qmn = p(xn = 0|y) and qmn = - 28 - 1.10. Low Density Parity Check (LDPC) Codes p(xn = 1|y). Then each variable node sends the messages δqmn to its connected check. 
1.10.1.3 Check node iteration a , which is an approxiEach check node sends a message to the connecting bit j, rij mation to the probability that check i is satisfied given the symbol j is a: a rij = P r{check i satisfy|xj = a}, (1.37) The 0 rmn ≈ X p( xn′ :n′ ∈N (m)\n X xz = 0mod 2|xn = 0) Y x ′ n qmn ′ (1.38) N (m)\n xz :z∈N (m) a by first calculating δr Then there is a shortcut for calculating rij mn δrmn = Y (1.39) δqmn′ , n′ ∈N (m)\n 0 = 1/2(1 + δr 1 where rmn mn ) and rmn = 1/2(1 − δrmn ). The δrmn can be calculated efficiently by using backward-forward algorithm Bahl et al (1974). 1.10.1.4 Variable node iteration 1 0 values are calculated by using the output from the check and qmn In this step, qmn node iteration. Y 0 0 rm (1.40) qmn = αmn p(xn = 0|y) ′n, m′ ∈M(n)\m and 1 qmn = αmn p(xn = 1|y) Y 1 rm ′n, (1.41) m′ ∈M(n)\m 0 + q 1 = 1. where αmn is a normalization factor such that qmn mn 1.10.1.5 Final Guess Posterior probabilities of each bit can be calculated as Y 0 rmn , qn0 = αn p(xn = 0|y) (1.42) m∈M(n) and qn1 = αn p(xn = 1|y) Y m∈M(n) - 29 - 1 rmn . (1.43) Chapter 1. Preliminaries The estimate x̂ can be found by just thresholding the posterior probabilities x̂n = arg max qni . i (1.44) For the codeword decoding point, we can check if all the check nodes are satisfied Hx̂ = 0mod2. If it is not a codeword the check-node and variable-node iterations will be repeated respectively. The iterations halt even if a codeword is found or a maximum number of iteration is reached. 1.10.2 Encoding Assume an (n − k) × n sparse parity check matrix H in systematic form, such that H = [P|IM ]. Then, the corresponding generator matrix G is simply an n × k dense matrix with the form G = IK |Pt , where P has dimension k × m and I is the identity matrix. Hence from k input bit vector t a length n codeword vector x can be calculated by simply a matrix product operation G × t = x. The method of Richardson and Urbanke (2001b) can be used for fast encoding of LDPC codes. 
1.10.3 Performance of 1/2 LDPC codes In this part, we evaluate the error correcting capacity of rate 1/2 LDPC binary codes for various block lengths and degree distribution polynomials. Let xn/2 is a n/2 length binary string with Bernoulli(1/2). Using a 1/2 rate LDPC coder, x is coded as a n bit length vector r. then it is modulated to R using 2 level Pulse Amplitude Modulation (PAM) as Ri = √ −√Q, if ri = 0 , + Q, if ri = 1 (1.45) Then the AWGN channel outputs Y = R + Z where Z is i.i.d. r.v. ∼ N (0, N ). Decoder initializes the likelihood ratio as √ fN (Yi − Q) p(ri = 1|Yi ) √ = p(ri = 0|Yi ) fN (Yi + Q) √ √ −(Yi − Q)2 −(Yi + Q)2 − = exp 2N 2N √ 2Yi Q = exp , (1.46) N (1.47) where fN corresponds to probability density function of a gaussian distribution with 0 mean and variance N . Then belief propagation decoding is done as explained in Chapter-1.10.1. The performance comparisons of the decoding error rates are given in Richardson et al (2001) for (3, 6) regular LDPC code, turbo code and - 30 - 1.11. Conclusion optimized irregular LDPC code (See Figure 1.13). Please note that the comparison is made for a code length of 106 for all codes. Figure 1.13: Performance comparison of the error rates of (3, 6) regular LDPC code, turbo code and optimized irregular LDPC code. The channel is Binary input, additive white gaussian noise. 1.10.4 A visual example In this section we give a visual example of LDPC coding by using a black and white cartoon image as the input binary string to be coded. Let the image in Figure1.14(a) is a binary string need to be transmitted through a noisy channel. This 100 × 100 cartoon is composed from 1s and 0s that corresponds to black and white pixels respectively. We add redundancy in order to detect and correct the erroneous bits during the transmission. 
A 1/2 rate systematic regular LDPC code with the degree polynomials λ(x) = x2 and ρ(x) = x5 is used to code the original image (each information bit participates 3 checks and each check bit is calculated by sum of 6 information bits,). Encoded image with its redundancy bits can be seen in Figure1.14(b). Afterward the encoded bits are transmitted through a Binary Symmetric Channel with crossover probability p(BSC) = 0.07. The decoder receives the noisy image in Figure1.14(c) and uses the belief propagation method explained in Chapter1.10.1 by taking into account of the apriory probability of the systematic bits is known to be P (x = 1) = 0.2445 and the channel characteristic is p(BSC) = 0.07. You can see the output of the belief propagation in Figure-1.15 after 1 iteration (a), after 5 iterations (b), and after 10 iterations (c) in Figure1.15. 1.11 Conclusion This chapter has introduced both theoretical and practical tools that will be used in this dissertation. Entropy, mutual information definitions will be used to calculate - 31 - Chapter 1. Preliminaries (a) (b) (c) c 2007 piyalemadra.com, Figure 1.14: LDPC coding example. Cartoon copyright used with permission. (a) Original binary cartoon with size 100 × 100 with 0s correspond to white and 1s correspond to black pixels. The ratio between the number of black pixels and the number of total pixels is 0.2445. (b) Visualization of the cartoon coded with 1/2 systematic LDPC code such that the output of the encoder contains the original image and its parity checks with size 100×100. (c) Throughout the transmission, both the cartoon and its parity check bits are exposed to bit errors such that the error probability of a received bit is 0.07. the capacity calculations of the proposed systems in Chapter-4 and Chapter-5. 
Furthermore, high performance channel code LDPC and high performance source code TCQ will be utilized in order to design of our practical codes for data hiding and Slepian-Wolf source coding in the following Chapters. - 32 - 1.11. Conclusion (a) (b) (c) Figure 1.15: LDPC decoding. (a) After 1 iteration. (b) After 5 iterations. (c) The original cartoon is decoded without any error after 10 iterations. - 33 - Chapter 1. Preliminaries - 34 - Part II Contributions - 35 - Chapter 2 Distributed Source Coding Contents 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.1.1 List of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.2 Theoretical Background . . . . . . . . . . . . . . . . . . . 39 2.2.1 2.2.2 2.3 Slepian-Wolf Coding of Discrete Sources . . . . . . . . . . . 39 Wyner-Ziv Theorem . . . . . . . . . . . . . . . . . . . . . . 41 Related Works . . . . . . . . . . . . . . . . . . . . . . . . . 45 2.3.1 Code Design for Slepian-Wolf Coding . . . . . . . . . . . . 45 2.3.1.1 2.3.1.2 Convolutional Codes . . . . . . . . . . . . . . . . . 45 Turbo Codes . . . . . . . . . . . . . . . . . . . . . 47 2.3.1.3 LDPC Codes . . . . . . . . . . . . . . . . . . . . . 47 2.3.2 Code Design for Wyner-Ziv Coding . . . . . . . . . . . . . . 48 2.4 Practical Code Design . . . . . . . . . . . . . . . . . . . . 50 2.5 2.4.1 Input Constraints and Theoretical Correlation Noise Analysis for a Given Rate . . . . . . . . . . . . . . . . . . . . . . 50 2.4.2 2.4.3 2.4.4 LDPC Code Generation and Coset Index Calculation . . . 51 Modified Sum Product Algorithm . . . . . . . . . . . . . . 51 Experimental Setup and Performance Analysis . . . . . . . 53 Practical Application for Still-Image Coding . . . . . . . 54 2.5.1 Side Information . . . . . . . . . . . . . . . . . . . . . . . . 55 2.5.1.1 2.5.1.2 2.6 Coset Creation . . . . . . . . . . . . . . . . . . . . 56 Iterative Decoding . . . . . . . . . . . . . . . . . . 56 2.5.2 Experimental Results . . . . . . . . . . 
. . . . . . . . . . . 56 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 - 37 - Chapter 2. Distributed Source Coding A practical code design for Slepian-Wolf setup is proposed. Based on LDPC binning techniques, a performance of only 0.08 bits/channel use away from the theoretical limits is achieved. The system is applied to a still image coding scheme where the decoder has access to the low-pass wavelet coefficients of the image, a complementary coding is done based on DSC principles for the refinement of the side information at the decoder.1 2.1 Introduction Slepian and Wolf (1973) derived the achievable rate region for the problem of lossless source coding with side information. Wyner and Ziv (1976) later showed the rate distortion function for such a system. Early 2000s, a potential application of Slepian-Wolf and Wyner-Ziv theorems was realized such that compression complexity problems on low-power devices can be shifted to the decoder side, and the practical code designs have been proposed based on channel coding principles. In this chapter, we introduce the recent work on constructing practical codes for source coding with side information using the framework of LDPC codes. The orientation of the chapter is as follows. In Chapter-2.2, the details of the two stimulating theorems for the DSC are given, Slepian-Wolf theorem for lossless compression of correlated sources, and Wyner-Ziv theorem for the rate-distortion function of a source where a correlated version of it is available only to the decoder. Then the prior work in order to design practical codes on this area is given in Chapter-2.3. Chapter-2.4 gives the details of our code design based on LDPC and compares our setup with the existing DSC systems. 
Finally, the application of our code design applied on compression of stillimage is proposed in Chapter-2.5 where it includes distributed source coding of a still image given that low-pass wavelet coefficients are available to the decoder. 2.1.1 List of Symbols List of symbols that are used in this chapter can be find below. X, Y X̂, Ŷ H(X) H(X, Y ) H(X|Y ) RX , RY D Π, Π−1 Two i.i.d. correlated input sources. Estimations at the decoder. Entropy of X. Joint entropy of X and Y . Conditional entropy of X given Y . Achievable rates of X and Y . Distortion level. Interleaver and deinterleaver. 1 The contents of this chapter have been presented partially in Dikici et al (2005) and Dikici et al (2006b). - 38 - 2.2. Theoretical Background 2.2 2.2.1 Theoretical Background Slepian-Wolf Coding of Discrete Sources The Slepian-Wolf theorem states the admissible rate regions for coding two correlated i.i.d. sources X and Y which are drawn from a finite alphabet. The encoding and decoding of these two correlated sources depends upon the information available at the encoders and decoders. Figure-2.1 generalizes 16 different cases by simply switching on and off of 4 switches S1 , S2 , S3 and S4 (Slepian and Wolf, 1973). A state variable si is associated with switch Si taking the value 0 if the switch is opened, 1 if it is closed. The quadruple {s1 s2 s3 s4 } will be used to specify the settings of the switches. The cases vary in novelty and interest. For example case {1111} is already known since Shannon such that two correlated sources can be jointly compressed with a total rate of RX + RY ≥ H(X, Y ). However the admissible regions of the cases {0011} and {0001} are the most interesting results of the Slepian Wolf theorem. Figure 2.1: 16 Cases of correlated source coding. Table-2.2 lists twelve theorems whose implications in connection with Figure-2.2 give the admissible rate region R for the 16 cases. 
Certain lines and points on Figure2.2 are labeled with the names of the theorems in Table-2.2. The admissible region of a setup is determined immediately with these lines and points and the theorems f and g. Symbol x in the first column of the Table-2.2 states that the theorem holds both when the corresponding switch is open and when it is closed. For instance, in order to find the admissible region of the setting {1011}, Table2.2 states that the Theorems B E a c d e f g apply. The first two show that R can not extend below the line B and nor below the line E of Figure-2.2. The next fourr show that the points a c d and e lie in R. Theorem f shows that points above a on the RY axis and points on B to the right of c lie in R. Finally Theorem g shows the line segment ac is in R (See Figure-2.3 for the rate region of the case {1011}). For the setting {0011}, the theorems A B E c d e f g hold. According to the first three theorems, R can not extend left side of line A nor below the lines E and B. The points c, e lie in R. Theorem f then shows that every point above d on A and - 39 - Chapter 2. Distributed Source Coding Table 2.2: Achievable rate regions according to the Slepian-Wolf Theorem s1 , s2 , s3 , s4 Theorem name 0xxx x0xx xx0x xxx0 xxxx A B C D E 1xx1 x11x xx1x xxx1 xxxx a b c d e xxxx f xxxx g Theorem It is necessary that: RX ≥ H(X|Y ) RY ≥ H(Y |X) RY ≥ H(Y ) RX ≥ H(X) RX + RY ≥ H(X, Y ) It is sufficient that: RX = 0 RY = H(X, Y ) + ǫXY RX = H(X, Y ) + ǫXY RY = 0 RX = H(X) + ǫX RY = H(Y |X) + ǫY RX = H(X|Y ) + ǫX RY = H(Y ) + ǫY RX = H(X) + ǫX RY = H(Y ) + ǫY ǫX , ǫY , ǫXY > 0 Bit stuffing: (RX , RY ) ∈ R =⇒ (RX + δX , RY + δY ) ∈ R δ X , δY ≥ 0 Limited time sharing: If (RX , RY ) ∈ R =⇒ (RX ′ , RY ′ ) ∈ R and RX + RY = H(X, Y ) and RX ′ + RY ′ = H(X, Y ), then (RX ′′ + RY ′′ ) ∈ R, RX ′′ = λRX + (1 − λ) + RX ′ RY ′′ = λRY + (1 − λ) + RY ′ 0 ≤ λ ≤ 1 every point to the right of c on B are also in R. By Theorem g, the line segment dc is in R. 
The region R of Subfigure-2.4(a) is thus established. The novelty of the Slepian-Wolf setup is that the minimum admissible regions of 2 separate encodersunique decoder({0011}) and unique encoder- unique decoder({1111}) overlaps on the operating line segment dc. The admissible region of {0011} can be expressed as: RX ≥ H(X|Y ) (2.1) RY ≥ H(Y |X) (2.2) RX + RY ≥ H(X, Y ) (2.3) Moreover, the setting {0001} corresponds to the “Coding with Side Information at the decoder” or “Distributed Source Coding”, where a source X is compressed such that a correlated version Y is accessible at the decoder. Table-2.2 shows that Theorems A B C E d e f g all apply. Locating the lines ABCE on Figure-2.2, R can - 40 - 2.2. Theoretical Background Figure 2.2: Lines and points of Table-2.2. Figure 2.3: Admissible Slepian-Wolf rate region R for the case {1011}. not extend to the left of A nor below C. The point d is in R, then by Theorem f, all the points to the right of d on C and every point above d on A are in R (See Subfigure-2.4(b)). 2.2.2 Wyner-Ziv Theorem Wyner and Ziv (1976) find the rate-distortion function of a source X given that a correlated information Y available at the encoder, at the decoder or both where distortion is defined as a nonnegative function d(X, X̂) ≥ 0. As seen in Figure-2.5, two switches A and B controls the side information Y available to the encoder or the decoder. Wyner-Ziv analyze the rate distortion of three cases: • Switches A and B are open, i.e. no side information: - 41 - Chapter 2. Distributed Source Coding (a) Case {0011}. (b) Case {0001}. Figure 2.4: Admissible Slepian-Wolf rate region R for the cases {0011} and {0001}. Then the classical Shannon Theory yields RX (D) = min I(X; X̂). (2.4) p(x̂|x):E{d(X,X̂)}≤D • Switches A and B are closed, i.e. 
both the encoder and the decoder have access to the side information: In this case the rate distortion function is RX|Y (D) = min I(X; X̂|Y ) (2.5) p(x̂|x,y):E{d(X,X̂)}≤D • Switch A is open and B is closed, i.e. only the decoder has access to the side information: Then Wyner-Ziv show that the rate is ∗ RX|Y (D) = min p(z|x)p(x̂|s,z):E{d(X,X̂)}≤D I(X; Z) − I(Y ; Z) (2.6) Wyner and Ziv (1976) show that: ∗ RX|Y (D) ≤ RX|Y (D) ≤ RX (D) (2.7) For D = 0 the theorem is consistent with the Slepian Wolf Theorem such that ∗ RX|Y (0) = RX|Y (0) = H(X|Y ). Wyner-ziv derived the rate distortion function for binary symmetric case where they assumed that X is the unbiased input to a BSC channel with crossover probability pz with 0 ≤ pz ≤ 0.5 and Y is the corresponding output. Y can be expressed as Y = X ⊕Z where Z is bernoulli(pz ) distributed binary string and ⊕ is the addition in - 42 - 2.2. Theoretical Background Figure 2.5: Wyner-Ziv Setup. ∗ for the Hamming distance modulo 2 arithmetic. The rate distortion function RX|Y distortion measure is shown by Wyner and Ziv (1976) as: ∗ RX|Y (D) = l.c.e. {H(pz ∗ D) − H(D), (pz , 0)} , 0 ≤ D ≤ pz (2.8) where l.c.e. is the lower convex envelop and pz ∗ D = pz (1 − D) + D(1 − pz ), and H(λ) = −λ ln λ − (1 − λ) ln(1 − λ) is the entropy function for binary distribution ∗ ∗ which defined in Chapter-1.2. As seen from the graph of RX|Y in Figure2.6, RX|Y = ∗ H(pz ∗ D) − H(D) for 0 ≤ D ≤ dc and RX|Y is a straight line segment between (dc , H(pz ∗ dc ) − H(dc )) and (pz , 0). Hence, if we define g(D) = H(pz ∗ D) − H(D), then dc is the solution of the equation g(dc ) = g ′ (dc ), d c − p0 (2.9) where g ′ (dc ) is the derivative of g(D) with respect to D at point dc . You can also find the graph of RX|Y (D), the rate distortion curve, where Y is accessible both at the encoder and at the decoder. 
The analytic form of RX|Y (D) is known as Cover and Thomas (1991): H(pz ) − H(D), 0 < D ≤ pz , (2.10) RX|Y (D) = 0, D ≥ pz , ∗ (D) only at two Hence, for the binary symmetric source case, RX|Y (D) = RX|Y (R, D) points: (H(X|Y ), 0) and (0, pz ). Otherwise there exists a rate loss with ∗ (D). respect to the RX|Y (D), RX|Y (D) < RX|Y A more interesting result is found for coding when switch A is open and B is closed for the continuous Gaussian case, such that there is no rate loss with respect to RX|Y (D) for all D values (Wyner and Ziv, 1976). Let X has i.i.d. gaussian distribution with N (0, Q) and Y = X +Z where Z is gaussian i.i.d. with N (0, N ) and ∗ is independent of X (See in Figure2.7). Then, the rate distortion function RX|Y (D) is equal to the RX|Y (D) which has been calculated by Berger (1971): - 43 - Chapter 2. Distributed Source Coding ∗ (D), and H(pz ) − H(D) versus D curves for Figure 2.6: Graph of RX|Y (D), RX|Y ∗ pz = 0.28. For binary symmetric case, RX|Y (D) has a rate loss with respect to RX|Y (D) except the points (H(pz ), 0) and (0, pz ), and there is no rate loss for these two points. ∗ RX|Y (D) = RX|Y (D) = ( QN ln (Q+N )D , 0 < D < 0, D≥ 1 2 QN Q+N , QN Q+N , Figure 2.7: Wyner-Ziv Setup for Gaussian case. - 44 - (2.11) 2.3. Related Works because of the fact that the term I(X; Z|X̂, Y ) = 0 in the right hand-side of I(X; Z)− I(Y ; Z) = I(X; X̂|Y ) − I(X; Z|X̂, Y ) in Equation-2.6 for the gaussian case. 2.3 Related Works In this section, we give the existing practical code designs for the Slepian-Wolf coding problem. Starting from Wyner’s proposition based on using parity check codes, we will state the state of art techniques based on convolutional codes, turbo codes, puncturing turbo codes and finally LDPC codes. Furthermore, we will state out the Wyner-Ziv lossy compression design as a problem of quantization following with a Slepian-Wolf coding, and mention the existing practical code designs. 
2.3.1 Code Design for Slepian-Wolf Coding Slepian and Wolf (1973) have proposed a coding scheme based on random binning in their proofs. However, because of its non-constructive nature, it is not applicable in practical code design. Wyner (1974) first proposed a coding scheme based on good parity-check codes for the {0001} setup of the Slepian-Wolf Coding problem. The idea is to partition the codeword space into cosets using ”good” (in the sense that the codewords in the same coset are as far as possible) parity-check code H, and transmit only the coset index s to the receiver. Then receiver can have an estimate of the source by accessing its coset index s and the correlated input Y . Hence the receiver tries to estimate X̂ by assuming that Y is the noisy observation of X and try to eliminate the noise of Y by using the parity check information of X sent by the encoder. The two n-bit length binary source vectors X and Y can be modeled as Y = X ⊕ U where ⊕ is the modulo 2 operation, U is a n-length binary string with bernoulli distribution p1 . Assume that an n − k × n parity check matrix H partitions the n dimensional vector space into 2k disjoint subspaces. The code vectors of X must satisfy H · Xt = 0. Decoding is done by calculating the syndrome of Y , s = H · Yt = H · Ut . Then using a decoding function f (s), decoder finds the error sequence and estimate X̂ = Y ⊕ f (s). The probability of error P r{X 6= X̂} = P r{f (H · Ut ) 6= U} → 0 for n → ∞. For a practical code design using syndromes, we need to wait till early 2000s. 2.3.1.1 Convolutional Codes Pradhan and Ramchandran (1999) first used a channel coding technique known as DIstributed Source Coding Using Syndromes (DISCUS) for the slepian-wolf problem (setup {0001}). Borrowing from Ungerboeck (1982)’s Trellis Coded Modulation (TCM) method, Pradhan and Ramchandran (1999); Kusuma et al (2001) have proposed a 4-level and various number of state trellis-structured construction - 45 - Chapter 2. 
Distributed Source Coding based framework. In order to obtain a 2 : 1 compression rate, a 2/3 systematic convolutional code is used and for an n bit input X, the convolutional code outputs n bit X and n/2 bit coset index s, and only s is sent through the channel. The decoder finds the sequence which is closest element to the Y within the received coset index of X (See Figure-2.8). Figure 2.8: 2 : 1 rate DSC compression using a 2/3 convolutional code. Let us give an intuitive example for the binary case. Assume that X and Y are 3 bit long binary strings where the bits of X are drawn i.i.d. with P r{Xi = 0} = P r{Xi = 1} = 0.5. The correlated information Y is drawn such that the hamming distance between X and Y is at most 1. For instance, given X has the value 101 possible sequences of Y are 001, 100, 101 and 111. The entropy and the conditional entropy of X and Y are found as H(X) = 3, H(Y ) = 3, H(X|Y ) = 2. According to the S-W theorem, H(X|Y ) bit per channel use is enough to transmit X without loss. First of all, let us assume the case where Y is accessible both to the encoder and to the decoder. Since the encoder has access to Y , it can code just the error pattern between X and Y and the decoder can successfully decode X using the error pattern between X and Y . The possible sequences of X given Y is four, hence two bit is sufficient to communicate without any loss, which achieves the rate H(X|Y ). However, in Slepian-Wolf setup {0001}, Y is not accessible to the encoder. With a carefully design of a parity check code, X can be still sent with H(X|Y ) = 2 bits. Assume the 2/3 parity check matrix H= 1 1 0 1 0 1 , (2.12) where each row defines a parity check equation over modulo-2 summation of the input bits. Hence the syndrome bits are calculated as c1 = x1 ⊕ x2 and c2 = x1 ⊕ x3 . For the encoding of an X sequence 101 is done by calculating c1 c2 pairs c1 = 1 ⊕ 0 c2 = 1 ⊕ 1, hence 10. 
The decoder has access to the side information Y and the syndrome index of X, and it decodes the most probable sequence of X̂. For a received Y sequence, the decoder verifies if both of the check equations satisfy, and by changing at most 1 bit of Y , X̂ is estimated. For instance, for Y = 100 and the - 46 - 2.3. Related Works ? syndrome 10, the decoder verifies whether the checks are satisfied: c1 = y1 ⊕ y2 = 1 ? and c2 = y1 ⊕ y3 = 0. Since only the c2 is not satisfied, flipping the value of the third bit of Y is enough to satisfy both of the equations, so X̂ = 101 is estimated without any error. In Pradhan and Ramchandran (2000), a practical code design for the setup {0011} of slepian-wolf problem has been proposed using two convolutional codes for compressing X and Y separately, hence it can operate not only on the symmetric regions like the setup {0001} or {0010}, but also the intermediate rate regions as in the setup {0011}. 2.3.1.2 Turbo Codes Afterward, more powerful channel coding techniques were employed for the coset consctruction. First of all, Turbo code, which is invented by Berrou et al (1993) and improved by Benedetto et al (1998); Tepe and Anderson (1998); Berrou and Glavieux (1996), has been applied to the DSC problem by Garcia-Frias and Zhao (2001). Bajcsy and Mitran (2001a) have used the parallel concatenation of finite state machines using Latin squares proposed in (Bajcsy and Mitran, 2001b). Aaron and Girod (2002) have used two parallel 4/5 rate systematic convolutional code with an interleaver and transmit the parity bits to obtain a 2 : 1 rate compression. After the calculation of the likelihood ratios of the input-bits given the side information Y and the parity bits, the estimation of X̂ is done with an iterative manner by using MAP algorithm (See Figure-2.9). 
For a 2:1 compression rate, Aaron and Girod (2002) achieves lossless compression with a correlation noise entropy H(p1 ) ≤ 0.381 which corresponds a gap of 0.154 with respect to the S-W limit. Garcia-Frias and Zhao (2002) have employed the puncturing concept in Acikel and Ryan (1997). Several other systems have been proposed using turbo codes (Chou et al, 2003; Liveris et al, 2002b, 2003b). Lajnef (2006) proposed a turbo coding based on puncturing, while he has obtained a 2 : 1 rate compression by using two 2/3 rate parallel systematic convolutional codes with an interleaver. The overall system has n/2 + n/2 parity bits. By using a puncturing matrix, half of the parity bits are dropped and a compression rate 2 : 1 is obtained. By using the iterative SISO decoding, the system achieves a lossless transmission with a correlation noise entropy H(p1 ) ≤ 0.4233 which is 0.0767 far away from the S-W limit. 2.3.1.3 LDPC Codes As described in detail in Chapter-1.10, LDPC code is a powerful error correcting code which is invented by Gallager (1963), reinvented and improved by Mackay and Neal (1997); Richardson et al (2001); Chung et al (2001a). Because of its good distance properties, Liveris et al (2002a) first used the LDPC codes in DSC - 47 - Chapter 2. Distributed Source Coding Figure 2.9: 2 : 1 rate DSC compression code design using two systematic 4/5 convolutional codes with an interleaver and iterative MAP decoding. Blocks π correspond to a pseudo-random interleaver, and the block π −1 is the corresponding deinterleaver. p(x=1|y) For the Log-Likelihood Ratio(LLR) calculations log( p(x=0|y) ), the correlation noise level and the received side information Y is used. An iterative decoding is done using Soft-Input Soft-Output (SISO) decoder. field (See Figure-2.11). 
By using 2/3 rate irregular LDPC codes with long block lengths like 106 , and a compression rate of 2 : 1, Liveris et al (2002a) achieve lossless transmission with a correlation noise entropy H(p1 ) ≤ 0.466 which is 0.034 far away from the S-W limit, so far the best probability of error rates obtained for a given correlation noise in the literature. Then LDPC is used in Schonberg et al (2002) for coding the general SlepianWolf problem ({0011}) by replacing the convolutional codes of Pradhan and Ramchandran (2000) by LDPC codes. In Varodayan et al (2005, 2006) also proposed a S-W coding scheme based on LDPC Accumulate (LDPCA) and Sum LDPC Accumulate (SLDPCA) codes. 2.3.2 Code Design for Wyner-Ziv Coding Since for the gaussian input case, Wyner-Ziv theorem states out that there is no loss in rate whether the side information is accessible to the encoder or not, the researchers have focused their effort on the design of DSC codes close to the S-W limit. The state of art practical designs assume the wyner ziv problem as the concatenation of a good source code (quantization) which achieves good rate distortion performance, and a S-W coder, which achieves lossless compression with side information(See Figure-2.13). The input X is first quantized by a good source code such - 48 - 2.3. Related Works Figure 2.10: 2 : 1 rate DSC compression code design using two systematic 2/3 rate parallel concatenation convolution codes and 1/2 rate puncturing matrices P . Figure 2.11: 2 : 1 rate DSC compression using a systematic 2/3 rate LDPC code. as TCQ (Marcellin and Ficher, 1990), nested lattices as in Zamir and Shamai (1998) or lloyd-max based quantizer as in Rebollo-Monedero and Girod (2005). Then, the quantized stream is coded with a S-W lossless coder which is based on a systematic turbo code (Aaron et al, 2003) or a systematic LDPC code (Liu et al, 2006). 
In Pradhan and Ramchandran (1999), an 8-level max-lloyd quantization of the input source where the outputs are labelled into 4 subsets D0 , D1 , D2 and D3 (See Figure-2.12), their convolutional code based S-W coder described in Chapter-2.3.1.1 is performed 7 dB away from the 1 bit/sample Wyner-Ziv distortion limits for a Correlation-Signal to Noise Ratio (C-SNR) level of 12 dB between X and Y. Furthermore, the S-W and W-Z problems are extended to three sources in Liveris et al (2003a) and Lajnef et al (2006). Moreover the S-W and W-Z coding paradigms are applied on video coding (Puri - 49 - Chapter 2. Distributed Source Coding Figure 2.12: Eight output points and corresponding partitions for 4 subset. and Ramchandran, 2002; Puri et al, 2006; Girod et al, 2005; Aaron et al, 2003; Westerlaken et al, 2005; Liveris et al, 2002b), on sensor networks (Xiong et al, 2004; Pradhan and Ramchandran, 2000; Kusuma et al, 2001; Pradhan et al, 2002),multiple description coding (Stankovic et al, 2007) and on multiple-camera arrays (Zhu et al, 2003; Gehrigand and Dragotti, 2004). Figure 2.13: Wyner Ziv Coding as a concatenation of a good quantization code and a Slepian-Wolf Coder. 2.4 Practical Code Design In this section, we describe our proposed LDPC based S-W coding scheme in detail. Based on LDPC coding for the syndrome calculations, we used a modified productsum algorithm (or belief propagation) for the decoding. 2.4.1 Input Constraints and Theoretical Correlation Noise Analysis for a Given Rate Let X n = {X1 , X2 , ..Xn } be the sequence of n-length binary string with i.i.d. random variables P r{Xi = 1} = P r{Xi = 0} = 0.5 noncausally available to the encoder. Similarly U n = {U1 , U2 , ..Un } be the sequence of n-length binary string with i.i.d. random variables P r{Ui = 1} = (1 − P r{Ui = 0}) = p1 where 0 ≤ p1 ≤ 0.5 The side information Y n = {Y1 , Y2 , ..Yn } noncausally available to the decoder is modeled as Yi = Xi ⊕ Ui where ⊕ is the modulo-2 sum operation. 
The entropies of X and Y are H(X) = H(Y) = H(0.5) = 0.5 log2(2) + 0.5 log2(2) = 1 bit per channel use. The conditional entropy of X given Y is H(X|Y) = H(U) = H(p1) = p1 log2(1/p1) + (1 - p1) log2(1/(1 - p1)) bits per channel use. Hence, for a fixed compression rate of X^n of 1/2 bit per channel use, which corresponds to n/2 bits, the S-W theorem states that Pr{X̂ ≠ X} → 0 for any correlation probability p1 such that H(X|Y) = H(p1) ≤ 1/2. In our experiments, we fix the compression rate to 1/2 bit per channel use and find the maximum correlation level p1 for an arbitrarily small probability of error such as Pr{X̂ ≠ X} = 10^-5.

2.4.2 LDPC Code Generation and Coset Index Calculation

For these experiments, we generate the LDPC matrices using the degree polynomials found and distributed by the Communications Theory Lab (LTHC) at Ecole Polytechnique Fédérale de Lausanne (EPFL) (Amraoui et al, 2003). The degree distribution polynomials used in this dissertation can be found in Appendix-B. In order to obtain a 2:1 compression rate for X, we use 2/3-rate systematic LDPC codes where, for n input bits, n/2 parity check bits are calculated (see Figure-2.14). The encoder discards the systematic bits and transmits only the n/2 parity bits. The decoder calculates the likelihood ratios and runs a modified Sum-Product Algorithm as explained in the following section.

Figure 2.14: Our proposed 2:1 rate DSC compression code design using LDPC codes.

2.4.3 Modified Sum Product Algorithm

Classical LDPC decoding is done by belief propagation, also known as the Sum-Product Algorithm, as described in Chapter-1.10. The algorithm is designed for a channel where each transmitted bit is exposed to the same channel characteristics. However, in S-W coding using syndromes, not all of the received data are exposed to the same channel.
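The entropy calculation above, and the theoretical limit p1 it implies for a 1/2 bit per channel use rate, can be verified numerically. The sketch below (illustrative helper names) computes the binary entropy H(p) and finds, by bisection, the largest p1 with H(p1) ≤ 1/2, recovering the S-W limit p1 ≈ 0.11 quoted later in this chapter.

```python
import math

def h2(p):
    """Binary entropy H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_correlation(rate, tol=1e-9):
    """Largest p1 in [0, 0.5] with H(p1) <= rate, found by bisection
    (H is strictly increasing on [0, 0.5])."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h2(mid) <= rate:
            lo = mid
        else:
            hi = mid
    return lo

print(h2(0.5))                          # 1.0 bit: H(X) = H(Y)
print(round(max_correlation(0.5), 3))   # 0.11: the S-W limit for 2:1 rate
```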
While there is correlation noise between X and Y, the syndrome of X sent by the encoder does not contain any errors. Hence we modify the decoding algorithm for S-W, starting with the likelihood ratio calculations. For a rate-2/3 systematic LDPC code, the n-bit input is coded as a total of 3n/2 bits, where n bits are the systematic input bits and the remaining n/2 bits are the parity bits z, which satisfy the equation 0 = H · R^t, where R is the 3n/2-length vector R = {z_1, z_2, ..., z_{n/2}, X_1, X_2, ..., X_n}. The decoder receives the syndrome vector z and the side information Y = X ⊕ U as explained in Chapter-2.4.1. In Figure-2.14, the 3n/2 variable nodes correspond to the circles at the left-hand side of the decoder. We group the variable nodes into two sets: the check bits in blue and the systematic bits in pink. Since the check bits are not exposed to errors, we initialize the likelihood ratios of the blue circles as

p(R_i = 1|z_i) / p(R_i = 0|z_i) = { ∞, if z_i = 1; 0, if z_i = 0 }   (2.13)

for i = 1, 2, ..., n/2. The likelihood ratios of the systematic bits are calculated for i = n/2 + 1, n/2 + 2, ..., 3n/2 as

p(R_i = 1|Y_{i-n/2}) / p(R_i = 0|Y_{i-n/2}) = { (1 - p1)/p1, if Y_{i-n/2} = 1; p1/(1 - p1), if Y_{i-n/2} = 0 }   (2.14)

The next step is to modify the definitions given in Chapter-1.10.1.1 by grouping the variable bits into systematic variable bits and parity variable bits. The set N(m), which signifies the set of variable bits that participate in check m, is divided into two subsets N1(m) and N2(m), the sets of systematic bits and parity bits that participate in check m respectively: N(m) = N1(m) ∪ N2(m). Moreover, r^x_{mn} is redefined as the probability that check m is satisfied, calculated only for the systematic bit n, when the systematic bit n of X is considered fixed at X_n and the other systematic bits have distributions q_{mn'} : n' ∈ N1(m)\n. The check node iteration δr_{mn} in Equation-1.39 is modified as

δr_{mn} = (-1)^{Σ_{i∈N2(m)} R_i} ∏_{n'∈N1(m)\n} δq_{mn'}.   (2.15)

The variable node iteration equations are the same as in Equation-1.40 and Equation-1.41, and are calculated only for the systematic variable nodes. Similarly, the final guess step is calculated only for the systematic variable nodes. The decoding starts with the initialization of the q^0_{mn} and q^1_{mn} values using the ratios p(R_i = 1|z_i)/p(R_i = 0|z_i) and p(R_i = 1|Y_{i-n/2})/p(R_i = 0|Y_{i-n/2}) calculated as in Equations-2.13 and 2.14. Then the check-node and variable-node iterations are repeated until a valid codeword is found or the maximum number of iterations is reached. Finally, the decoder calculates

q^0_n = α_n p(x_n = 0|[z|y]) ∏_{m∈M(n)} r^0_{mn},   (2.16)

and

q^1_n = α_n p(x_n = 1|[z|y]) ∏_{m∈M(n)} r^1_{mn};   (2.17)

and outputs the estimation x̂ by thresholding the posterior probabilities:

x̂_i = arg max_j q^j_{i+n/2}.   (2.18)

The bit error rate Pe of the system is then calculated as

Pe = (Σ_{i=1}^{n} (x_i ⊕ x̂_i)) / n,   (2.19)

where Σ denotes summation over the real numbers while ⊕ is the modulo-2 summation. Hence Pe is the ratio of the number of erroneous bits to the total number of bits.

2.4.4 Experimental Setup and Performance Analysis

In this section, we compare our proposed S-W system with the existing ones. We code the input at several block lengths varying from 4 × 10^3 to 1 × 10^5. Note that LDPC decoding performs better at larger block lengths; however, the decoding complexity also grows with block length, so there is a trade-off between performance and decoding complexity. The length-n input binary string X, drawn from a Bernoulli(0.5) distribution, is coded with a 2/3 systematic LDPC code generated with a degree distribution as in Appendix-B. The noise binary string U, drawn from a Bernoulli(p1) distribution, is modulo-2 added to X to create Y = X ⊕ U, where 0 ≤ p1 ≤ 0.5.
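The initialization of Equations-2.13 and 2.14, and the error measure of Equation-2.19, can be sketched compactly in the log-likelihood-ratio domain, where the infinite ratios of Equation-2.13 become ±∞ LLRs. This is an illustrative sketch with hypothetical function names, not the thesis's decoder implementation.

```python
import numpy as np

def init_llrs(z, y, p1):
    """Initial log-likelihood ratios log p(R=1)/p(R=0) for the modified
    sum-product decoder: syndrome bits are error-free (Eq 2.13), while
    systematic bits follow the BSC(p1) correlation model (Eq 2.14)."""
    llr_parity = np.where(z == 1, np.inf, -np.inf)        # Eq 2.13
    lr = (1.0 - p1) / p1
    llr_sys = np.where(y == 1, np.log(lr), -np.log(lr))   # Eq 2.14
    return np.concatenate([llr_parity, llr_sys])          # length 3n/2

def bit_error_rate(x, x_hat):
    """Eq 2.19: fraction of positions where the estimate differs from x."""
    return np.mean((x ^ x_hat) != 0)

x = np.array([0, 1, 1, 0, 1])
x_hat = np.array([0, 1, 0, 0, 1])
print(bit_error_rate(x, x_hat))   # 0.2: one error in five bits
```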
Recall from the S-W theorem that X can be compressed at any rate R_X ≥ H(X|Y) = H(p1). In our experiments, we fix R_X = 0.5 and search for the maximum noise level p1 at which the decoder can extract X with a low probability of error (Pe(X ≠ X̂) ≈ 10^-5). Note that, according to the S-W theorem, the theoretical limit is p1 = 0.11, which corresponds to an entropy of H(p1) = 0.5. The simulation results can be seen in Figure-2.15. The best published performances of a convolutional code, a turbo code, a punctured turbo code and an irregular LDPC code are H(p1) = 0.35, 0.39, 0.42 and 0.466 respectively (Aaron and Girod, 2002; Lajnef, 2006; Liveris et al, 2002a). Our length-4000 and length-10^4 regular LDPC codes achieve a low probability of decoding error at H(p1) = 0.36 and H(p1) = 0.37, which lie between the best convolutional code and the best turbo code. Our irregular length-10^4 code achieves H(p1) = 0.42, a performance similar to the best punctured turbo code in Lajnef et al (2006). Xiong et al (2004) have achieved a better performance for a block length of 10^5, at a higher decoding complexity.

Figure 2.15: Decoding bit error rate versus entropy of the correlation noise H(p1) for 2:1 rate Slepian-Wolf compression. The LDPC simulations use a length-4000 regular LDPC matrix and a length-10^4 irregular LDPC matrix. The graph also contains the S-W limit and the best performances achieved using a convolutional code (Aaron and Girod, 2002), a punctured turbo code (Lajnef, 2006), and an irregular LDPC code of length 10^5 (Liveris et al, 2002a).

2.5 Practical Application for Still-Image Coding

In this section, we propose a compression scheme for still images that exploits the theory of distributed coding of correlated multi-sources. Two corrupted versions of an image are encoded separately but decoded jointly (Dikici et al, 2006a).
Our approach has two main components: i) the decomposition of low-pass wavelet coefficients to create the side information, and ii) LDPC-based coset creation using the quantized version of the original image in the pixel domain. In the case of coding for mobile terminals, the proposed codec exploits channel coding principles in order to obtain a simple encoder with a low transmission rate and high PSNR. The application of distributed source coding techniques to still images is not trivial, because the image must be divided into two sources X1 and X2 that are encoded separately. One solution to this problem is sub-sampling the image into two images (Ozonat, 2000). However, we are interested in distributed image compression given that a compressed version of the image is accessible at the decoder (see Figure-2.16): for instance, the low-frequency component of the image is accessible to the decoder as side information, and a low-power device wants to improve the quality of this side information using low-complexity coding techniques. We introduce an efficient distributed coding technique for still images, using the low-pass discrete wavelet transform as the side information and LDPC coding as the mapping of the cosets. In our setup, the low-pass component of the discrete wavelet decomposition of the image is assumed to be accessible to the decoder as side information X2. For X1, a uniformly quantized version of the original image is used. Instead of classical source encoding of X1, after LDPC coding, only the coset index of X1 is sent to the decoder. The decoder finds the value in the coset indexed by the syndrome that is closest to X2.

Figure 2.16: Encoder and decoder structure. The source is compressed using LDPC binning, the side information Y available to the decoder is the image reconstructed from the low-frequency (LL2) wavelet decomposition, and the two received signals are decoded jointly.
We explain the extraction of the side information (Section-2.5.1), the coset calculation using quantization and LDPC coding (Section-2.5.1.1), the iterative joint decoding (Section-2.5.1.2), and our experimental results (Section-2.5.2).

2.5.1 Side Information

Side information is the information available to the decoder that is correlated with the original signal X; it is used at the decoder to estimate X̂ with the help of the received coset index. We use the following setup for the side information. Let X(M, N) be an M × N gray-level image matrix with integer pixel values in the range 0 to 255. The image X is decomposed into its 2-level wavelet coefficients using the 5/3-tap filter set of Le Gall and Tabatabai (2000). The side information image Y is reconstructed by synthesizing the DWT using only the low-pass component LL2 and setting the rest of the coefficients to 0. A visualization of the SI computed at the encoder can be seen in Figure-2.17.

The correlation noise between the original image X and the side information Y can be modeled with a Laplacian distribution f(X - Y) = (α/2) e^(-α|X - Y|), where α can be estimated at the encoder using the residual error between the LL wavelet decompositions of the first and second levels. We observed that using this estimate of α instead of calculating the real distance values does not significantly degrade the performance of the system.

2.5.1.1 Coset Creation

The image X is quantized with an n-bit uniform quantizer and the quantized bits are coded with a 2/3-rate LDPC coder as explained in Section-2.4.2. After discarding the systematic output bits of the LDPC coder, only the parity bits (coset index) z are sent to the decoder.
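The Laplacian correlation model above can be sketched numerically. The snippet below uses the standard maximum-likelihood estimator α = 1/E|X - Y| for the Laplacian parameter; this is an illustrative assumption, since the thesis estimates α from the residual between wavelet levels rather than from the true pixel differences. Function names are hypothetical.

```python
import numpy as np

def estimate_alpha(x, y):
    """ML estimate of alpha for the correlation model
    f(x - y) = (alpha/2) * exp(-alpha*|x - y|): alpha = 1 / E|X - Y|."""
    return 1.0 / np.mean(np.abs(x.astype(float) - y.astype(float)))

def laplacian_likelihood(y, value, alpha):
    """p(X = value | Y = y) under the Laplacian model, used to initialize
    the belief-propagation decoder."""
    return 0.5 * alpha * np.exp(-alpha * np.abs(value - y))

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=10000)          # stand-in for image pixels
noise = rng.laplace(scale=4.0, size=10000)    # Laplace scale b = 1/alpha
y = x + noise                                 # stand-in for the LL2 SI
alpha = estimate_alpha(x, y)
print(1.0 / alpha)                            # close to the true scale 4.0
```

The model's variance 2/α² (used in Section-2.5.1.2) is the variance of a Laplacian with scale 1/α.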
2.5.1.2 Iterative Decoding

Under the assumption that the correlation noise between the side information Y and the quantized signal Xq follows a Laplacian distribution with variance 2/α², the likelihood function p(Xq|Y = y) is calculated at the decoder, using the estimated correlation parameter α, in order to initialize the LDPC belief propagation decoding. Then the modified sum-product algorithm is employed as explained in Chapter-2.4.3.

2.5.2 Experimental Results

The proposed algorithm is applied to the image 'Lena'. In our experimental setup, we examine the effects of quantization and calculate the rate-distortion operating points. The image Lena is processed in the following steps:

• The input image is first linearly quantized at 256, 128, 64, 32 and 16 levels respectively. Recall that 256-level quantization corresponds to lossless quantization, because the input image pixels have 8-bit depth.

• The quantized bits are coded with a 2/3 systematic LDPC coder generated pseudo-randomly with an appropriate length n.

• The decoder has access to the side information Y reconstructed from the LL2 DWT coefficients.

Figure 2.17: Construction of the side information. Only the Low-Low wavelet decomposition of the second level is transmitted; the decoder reconstructs the side information by setting all other coefficients to 0.

The effect of the decoding iterations can be seen in Figure-2.18. In this figure, a 130 × 160 pixel subset of the outputs with a compression rate of 16:5 is given. The leftmost picture is the side information, reconstructed by setting all of the wavelet coefficients except the received LL ones to 0; the decoding of the cosets after the first iteration is in the center, and the rightmost is the output of the decision based on the decoding after 5 iterations.
The quality improvement on the edges, such as the face, shoulder and hair regions, is visible. The PSNR values of these three images are 28.9 dB, 34.16 dB and 34.97 dB respectively.

Figure 2.18: Left: side information at the receiver; Center: output of the decoded image after the first iteration; Right: decoding output after 5 iterations.

2.6 Conclusion

This chapter has proposed a close-to-limit Slepian-Wolf lossless compression of an input source when a correlated side information is accessible only to the decoder. The parity bits of a systematic 2/3-rate LDPC code are used to achieve a 2:1 compression of the input. For the binary symmetric case, correlation noise entropies of H(p1) = 0.39 and H(p1) = 0.42 are achieved using a regular length-4000 and an irregular length-10^4 LDPC matrix respectively. Since the Slepian-Wolf limit at a 2:1 compression rate for a BSC corresponds to a correlation noise entropy of 0.5, the proposed system operates 0.08 bit per channel use away from the theoretical limit. Furthermore, this study shows the feasibility of such a multi-source coding scheme for still images, in which the low-pass wavelet coefficients and the LDPC binning of the image are encoded separately and decoded jointly at the receiver.

Chapter 3

Informed Data Hiding

Contents
3.1 Introduction . . . 60
  3.1.1 Types of watermark . . . 60
  3.1.2 Types of attack models . . . 61
  3.1.3 List of Symbols . . . 62
3.2 Theoretical Background . . . 63
3.3 Prior Work . . . 66
3.4 Proposed Scheme-1: Extension to Cox Miller . . . 68
  3.4.1 Embedding on Discrete Wavelet Transform Coefficients . . . 69
  3.4.2 Perceptual Shaping for DWT . . . 70
  3.4.3 Attack Channel . . . 72
  3.4.4 Simulation Results . . . 73
3.5 Proposed Scheme-2: Superposition Coding . . . 73
  3.5.1 Definition . . . 73
  3.5.2 Code Construction . . . 74
    3.5.2.1 Source Code C0 . . . 74
    3.5.2.2 Channel Code C1 . . . 75
  3.5.3 Encoder . . . 75
  3.5.4 Gaussian Attack Channel . . . 76
  3.5.5 Decoder . . . 76
  3.5.6 Details of Joint Iterative Decoding C0 and C1 . . . 77
  3.5.7 Simulation Results . . . 78
3.6 Conclusion . . . 79

We address the data hiding problem where the host signal is not accessible to the decoder. Exploiting Costa's theorem, which shows that the non-availability of the host signal at the decoder does not affect the capacity, we propose two practical code designs for the informed data hiding problem. The first embeds a low-rate message within the DWT coefficients of still images using trellis coded modulation; in conjunction with a perceptual shaping function during the embedding process, the robustness of this method to several types of attacks is evaluated. The second code design embeds a high-rate message within continuous signals using a combination of a good source code (TCQ) and a good channel code (LDPC). After an AWGN attack channel, the receiver decodes the hidden message by BCJR and belief propagation decoding in an iterative manner. For a 1/2 data embedding rate, the hidden message can be extracted with a low decoding error such as Pe ≤ 10^-5, even for an attack channel variance that is 1.5 dB away from the theoretical limits.
3.1 Introduction

In this chapter, we review existing informed data hiding techniques and propose a high-embedding-rate informed data hiding method with blind detection, to be used in the complete system that will be explained in Chapter-5. Before going into the theoretical and implementation details of informed data hiding, we define the basic notions of watermarking systems and explain where our work fits in.

Humans have been interested in hiding information (a message) within an innocent host signal (a cover) since medieval times (Hartung and Kutter, 1999). This hiding process is named differently depending on the application. For instance, steganography, from the Greek for "covered writing", stands for point-to-point secret communication whose existence is unknown to third parties; hence the secret information need not be robust to manipulations. Watermarking, on the other hand, must satisfy the desideratum of robustness to malicious attacks: even if third parties know of the existence of the mark, it must be hard to remove the hidden message. To meet this robustness requirement, the information embedding rate in watermarking is much lower than that of steganography. Data hiding, or data embedding, resides between steganography and watermarking: third parties know that a message is embedded in the signal, but there is no need to protect it. The idea is to embed complementary information into the host data.

3.1.1 Types of watermark

Watermarking can be grouped into robust and fragile schemes. In robust watermarking, the mark remains detectable even after severe processing; the attacker's goal is to make the detector unable to detect the mark while preserving perceptual quality. Example applications are inserting a mark for detecting illegal use of a copy,
or finding the distributor of an illegal copy by inserting a distinct message into each copy, which is known as fingerprinting. Fragile watermarking, on the other hand, is used for authentication, that is, the control of tampering. It can be used in DVD players to authenticate the data: with even a small alteration of the signal, the detector must fail. Here the third parties want to change the watermarked data while keeping the mark detectable, or to create a valid mark for new data.

3.1.2 Types of attack models

Because there exist different types of watermarking applications, the malicious attacks also vary. The overall system has been modeled as a game between the watermarker and the attacker (Moulin and Mihcak, 2004). Given some knowledge of the attacker, the watermarker tries to maximize the embedding capacity while the attacker tries to minimize it. This game-theoretical approach is used as a tool for calculating the capacity under a worst-case attack. Below, we give several possible assumptions about the attacker (Craver et al, 1998):

• The attacker knows nothing.

• The attacker knows the algorithm. This is the most widely used assumption in watermarking: the security depends on the key, not on the algorithm. This assumption is related to Kerckhoffs' principle in cryptography, which states that a cryptosystem should be secure even if everything about the system except the key is publicly known (Kerckhoffs, 1883).

• The attacker has access to several watermarked data sets (collusion attack). In this model, the attacker may access different host signals coded with the same mark, or the same host coded with different marks (Stone, 1996).

• The attacker has access to the detector as a black box (oracle attack). Several attacks can then be applied in order to remove the mark (gradient descent attack, sensitivity analysis attack, etc.).
There exist various types of attacks, classified into four main groups in Hartung et al (1999):

• Simple attacks add noise to the whole watermarked data without trying to identify or isolate the mark. Examples are linear or non-linear filtering, compression, addition of noise, and quantization.

• Synchronization attacks attempt to disable the detection of the mark by geometric distortion: spatio-temporal shifts, zooming, rotation, cropping (Petitcolas, 2000).

• Fake watermark attacks try to confuse the decoder by producing fake original data or fake watermarked data (Holliman and Memon, 2000).

• Removal attacks attempt to analyze the watermarked data, then estimate and remove the mark from the host data. Examples are denoising and collusion attacks (Stone, 1996).

Software tools such as Stirmark (Kuhn and Petitcolas, 2000) and Checkmark (Pereira, 2001) are publicly available for simulating various kinds of attacks on still images.

In this chapter, we focus solely on the blind watermarking problem, where the cover data or image S, in which the hidden information M will be embedded, is accessible only to the encoder and not to the decoder (see Figure-3.1). Since the original cover image is not accessible to the receiver, the decoding process is called blind decoding. We analyze this problem from an information-theoretic point of view. After introducing the prior work in this field, we propose two coding schemes. The first is for low-rate embedding of data in images: we use the Discrete Wavelet Transform coefficients of the host image for robust embedding, and up to 1000 bits can be efficiently embedded into 256 × 256 images with acceptable perceptual quality. Our second work is high-rate informed data hiding rather than digital watermarking, because our assumptions of a continuous input and an AWGN channel fit an IDH system better.
In this work, we embed the secret data at a rate of 1/2 bit per host sample, and the scheme performs close to the theoretical limits in low-SNR embedding regimes facing AWGN attacks.

Figure 3.1: Channel Coding with State Information Setup.

3.1.3 List of Symbols

The symbols used in this chapter are listed below.

M: Discrete message to be transmitted (watermark).
M (calligraphic): Alphabet of the watermark.
M̂: Decoded watermark.
Pe: Probability of decoding error.
S: State information.
X: Stegotext.
W: Watermarked data.
Z: Attack noise.
Y: Received signal.
U: Auxiliary variable.
α: A constant for coding with side information.
D: Distortion level.
P, Q, N: Variances of X, S and Z respectively.
C0: Source code.
C1: Channel code.

3.2 Theoretical Background

A simple watermark text "Art&Flowers" is inserted into the cover image, where it can easily be seen by the human eye in Figure-3.2. A malicious user can easily remove this watermark and use the image for his own purposes without legal permission. In this example the watermark data is actually independent of the picture itself, so that some of the inserted watermark resides within the white background, where it is plainly visible. Hence the watermarking process needs to satisfy three main constraints:

• Insertion strength, to guarantee imperceptibility,

• Robustness to malicious attacks,

• Capacity, to accommodate the secret message.

Figure 3.2: Watermarked image.

Embedding as a function of both the host data and the secret message is referred to as informed embedding, because of the participation of the host data in the embedding process (Cox and Miller, 2002). The watermarking problem was first recognized as channel coding with side information (see Figure-3.3) by Chen and Wornell (1998). Gel'fand and Pinsker (1980) derived the capacity formula for a class of discrete channels {X, p(y|x, s), Y, S} with a noncausal state S^n = {S_1, S_2, ..., S_n}, S_i i.i.d. ∼ p(s).
A discrete message M with finite cardinality |M|, where each possible value is equally probable, is encoded with a deterministic function f : M × S^n → X^n satisfying the distortion measure E{d(f(M, S^n), 0)} ≤ P, then transmitted through the channel with conditional probability function p(y|x, s). The decoding function g : Y^n → M̂ estimates M̂. The average probability of error is

Pe = (1/|M|) Σ_{k=1}^{|M|} Pr{g(Y^n) ≠ k | M = k} → 0 as n → ∞.   (3.1)

The greatest integer 2^{nC} less than or equal to |M| can be sent per channel use. The supremum of the achievable rates C is defined to be the capacity of the channel and can be calculated as

C = max_{p(x,u|s)} [I(U; Y) - I(U; S)],   (3.2)

where U is an auxiliary random variable with finite cardinality and the maximization is over p(x, u|s).

The Gel'fand-Pinsker setup was extended to the continuous-alphabet Gaussian channel in Costa (1983). In Costa's setup (see Figure-3.3(a)), a message M drawn from a discrete finite set M is sent through the channel by a power-limited signal X : (1/n) E{X²} ≤ P, and the channel output is modeled as Y = X + S + Z, where S is an interference signal known to the encoder, drawn from ∼ N(0, Q), and Z is a noise component drawn from ∼ N(0, N). The aim is to find the theoretical upper bound on the quantity of secret information M that can be transmitted through this channel with probability of decoding error P(M ≠ M̂) → 0. Surprisingly, Costa showed that the capacity of this channel is independent of the interference signal S and equals

R = (1/2) ln(1 + P/N),   (3.3)

which also equals the rate in the case where S is accessible to both the encoder and the decoder. The key to achieving this rate without the decoder having access to S is an auxiliary random variable U such that U = X + αS, where α is the constant α = P/(P + N).
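The two closed-form quantities above can be sketched directly; the helper names are illustrative. Note that Equation-3.3 is expressed in nats (natural logarithm); the sketch uses log2 to report the capacity in bits per channel use, and makes explicit that the interference power Q never appears.

```python
import math

def costa_capacity(P, N):
    """Costa's dirty-paper capacity in bits per channel use. It depends
    only on the signal power P and the noise power N, never on the
    interference power Q."""
    return 0.5 * math.log2(1.0 + P / N)

def costa_alpha(P, N):
    """Optimal scaling of the known interference in U = X + alpha*S."""
    return P / (P + N)

P, N = 1.0, 1.0
print(costa_capacity(P, N))   # 0.5 bit/use when P = N
print(costa_alpha(P, N))      # 0.5
```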
To send the index M, the encoder searches among the possible U's for the message M such that the difference between U and the scaled interference αS satisfies the power constraint (1/n)(U - αS)² ≤ P. It then sends X = U - αS over the channel. The channel outputs Y = X + S + Z, and the decoder finds the U closest to Y and estimates M̂ as the index of the bin in which that U resides. A more detailed derivation of Costa's capacity can be found in Chapter-4.

(a) Costa's writing on dirty paper setup. (b) Costa's setup applied to the watermarking problem.

Figure 3.3: Costa setup.

Cox et al (1999) realized, as in Figure-3.3(b), that if the channel state S is taken to be the host signal of the watermark, the work is defined to be S + X, and Z is the attack noise, then the blind watermarking problem can be modeled as Costa's "writing on dirty paper": even if the original host data is not accessible to the decoder, there is no loss in the capacity of the channel. Costa's work has been extended to arbitrarily distributed interference by (Cohen and Lapidoth, 2002; Erez et al, 2005). The theoretical limits of watermarking systems, taking into account the privacy of the watermark with a key, have been studied in Moulin and O'Sullivan (2003) and Chen and Wornell (1999).

3.3 Prior Work

Costa's work provides a theoretical solution using a random binning argument, but this solution could not be implemented practically because of its complexity. Quantization Index Modulation (QIM), proposed by Chen and Wornell (1998), uses lattice codes, where the message to be embedded divides the lattice into sublattices; given the host signal, the aim is to quantize it using the proper sublattice. They improved QIM using Costa's approach and named it Distortion-Compensated QIM (DC-QIM) (Chen and Wornell, 2001).
This system showed superior performance compared to spread-spectrum techniques; however, its drawback is that when the embedding rate is high, it is hard to efficiently subdivide the quantization lattice. Chou et al (2000) applied error correcting codes (ECC) to this coding concept. They used the distributed coding concept explained in Chapter-2 and the duality between DSC and IDH (Cover and Chiang, 2002; Pradhan et al, 2003); a trellis-based convolutional code was used to partition the space. Le Guelvouit (2005) proposed a system based on Turbo TCQ where the message forces the trellis to pass through certain paths. Bastug and Sankur (2004) proposed LDPC codes to improve the payload of the watermark. Afterward, the combination of good quantizer codes with good channel codes was proposed by several researchers. Eggers et al (2003) proposed a system called the "Scalar Costa Scheme", which is similar to QIM but differs from it by taking the watermark-to-noise ratio (WNR) into account: in the encoding process, Costa's α = P/(P + N) is employed for better performance, while QIM assumes infinite-length coding and hence fixes α = 1. Miller et al (2004) developed an informed coding method that guarantees a robustness level. A modified trellis path is utilized in order to find the embedding signal best correlated with the host signal. The coding process can be summarized in the following steps.

• Choice of the embedding region of the host signal: the Discrete Cosine Transform (DCT) coefficients of the host image are calculated for each 8×8 block. Discarding the DC coefficient, the first twelve AC coefficients are selected, as seen in Figure-3.4(a).

• Informed coding: a trellis with a length equal to the number of bits to be sent is created and, depending on the message bits M, all the arcs except those corresponding to the message M are deleted from the trellis.
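The sublattice idea of QIM can be illustrated in the simplest scalar case: one bit selects one of two interleaved uniform quantizers, offset by half a step. This is a minimal sketch of plain scalar QIM (without the distortion compensation of DC-QIM or SCS); the step size DELTA and the sample values are illustrative.

```python
import numpy as np

DELTA = 8.0   # quantization step: larger steps give more robustness
              # at the cost of more embedding distortion

def qim_embed(s, bit):
    """Quantize the host sample s onto the sublattice selected by the
    message bit (dither 0 for bit 0, DELTA/2 for bit 1)."""
    d = 0.0 if bit == 0 else DELTA / 2.0
    return DELTA * np.round((s - d) / DELTA) + d

def qim_extract(y):
    """Blind detection: decode to the nearest of the two sublattices."""
    d0 = np.abs(y - qim_embed(y, 0))
    d1 = np.abs(y - qim_embed(y, 1))
    return 0 if d0 <= d1 else 1

host = 37.3
w = qim_embed(host, 1)        # host moved onto the 'bit 1' sublattice
attacked = w + 1.5            # noise smaller than DELTA/4 is tolerated
print(qim_extract(attacked))  # 1: the bit survives the attack
```

The decoder never sees the host sample, which is exactly the blind-detection setting of this chapter.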
Using the selected DCT coefficients and a pseudorandom key shared by the encoder-decoder pair, the signal most correlated with the host image S is found.

(a) The 12 selected DCT coefficients to modify in the embedding process. (b) Geometric interpretation of the embedding process: the aim is to move the host image S to the closest point within the target region that corresponds to the index M to be sent.

Figure 3.4: Informed embedding of Miller et al. on DCT coefficients of still images.

• Embedding with perceptual shaping: in the embedding process, the cover image S must be modified such that the decoder can decode the correct embedded bits with high probability. The process can be interpreted geometrically as in Figure-3.4(b). In this Voronoi diagram, the space is divided into five regions, each corresponding to a message index. The region of the message M to be sent is labeled g (good index) while the other regions are labeled b. Assume that the host signal S resides in region b1; the embedding process modifies the image such that it falls into the good region while satisfying perceptual quality and robustness constraints. Watson's metric is used for the modification of the DCT coefficients (Watson, 1993), while the work image W must be decoded correctly under a fixed distortion level.

Nested lattice codes were proposed by Zamir et al (2002): there are two codes, a source code Λ1 and a channel code Λ2, such that the codewords of Λ2 are a subset of those of Λ1: Λ1 ⊃ Λ2. However, it is hard to generate nested lattice codes where both lattices have good distance properties. Bennatan et al (2006) then proposed a coding method using the superposition of a good source code C0 and a good channel code C1. Exploiting the duality between the Multiple Access Channel (MAC) and writing on dirty paper, they obtained performance 1.2 dB away from the limit at a 1/4 embedding rate using joint TCQ and LDPC coding.
3.4 Proposed Scheme-1: Extension to Cox Miller

The algorithm of Miller et al (2004) suffers from blocking visual artifacts because of the modification of DCT coefficients. Even after perceptual shaping using Watson's algorithm (Watson, 1993), the effect of embedding can be detected easily. In this section, we propose an informed embedding and coding technique similar to Miller et al (2004), but we employ the Discrete Wavelet Transform (DWT) for the embedding process in order to minimize block effects. Furthermore, a perceptual shaping based on DWT coefficients is applied to adjust the embedding strength according to the sensitivity of the human visual system to alterations of the DWT coefficients.

Figure 3.5: Proposed informed embedding setup on DWT coefficients of still images.

The block diagram of the proposed system can be seen in Figure-3.5. After the extraction of the DWT coefficients, the selected ones pass through the informed coder and embedder. The informed coder finds the most correlated signal on the modified trellis, where the trellis path is fixed by the message bits M. Then the embedder modifies the host image in the direction of the correlated signal so that the output signal can be decoded correctly with a robustness margin. The embedding is done by taking the perceptual effect of each coefficient into account. A detailed explanation of the blocks can be found in the following subsections.

3.4.1 Embedding on Discrete Wavelet Transform Coefficients

JPEG-2000 fixes two wavelet types in its standard: a reversible 5-3 tap Le Gall filter and an irreversible 9-7 tap Cohen-Daubechies-Feauveau filter (Marcellin et al, 2000).
Since the first is perfectly reconstructible, we use the Le Gall filter in our experiments, whose low-pass and high-pass z-transforms are given as H0(z) and H1(z) respectively (Le Gall and Tabatabai, 2000):

H0(z) = (1/8) z (1 + z^{-1})^2 (-z - z^{-1} + 4),   (3.4)
H1(z) = (1/2) z (1 - z^{-1})^2.   (3.5)

The analysis and synthesis steps for a 2-D image based on the 1-D Le Gall filter can be explained as follows. The wavelet decomposition is organized in levels, and at each level there exist four frequency components: Low-Low (LL), Low-High (LH), High-Low (HL) and High-High (HH). For each level, these four components are calculated by down-sampling and applying the analysis filters H0 and H1 in the horizontal and vertical directions (see Figure-3.6(a)). The LL component is then used to calculate the next higher level.

Figure 3.6: Analysis (a) and synthesis (b) steps of the Le Gall DWT.

The reconstruction of the image from the DWT components can be done using the synthesis filters given as

g0(n) = (-1)^n h1[n],   (3.6)
g1(n) = (-1)^n h0[n].   (3.7)

Similar to the analysis process, each component is up-sampled, followed by the application of the synthesis filters g0 and g1 in both vertical and horizontal directions (see Figure-3.6(b)). The two-level DWT coefficients of the Lena image are visualized in Figure-3.7.

In our work, we choose the LH2, HL2 and HH2 components of the DWT coefficients for the embedding process. The reason is that, as has been shown, maximum robustness is attained when watermarks are embedded into well-populated bands. Since the first-level coefficients contain many zeros, we selected all of the second-level coefficients except LL2, the low-pass one.
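The reversible 5-3 Le Gall transform above is usually implemented through its lifting steps, as in JPEG-2000. The sketch below is a minimal integer-lifting version; the periodic extension via `np.roll` is an assumption of this sketch (the standard uses symmetric extension), but the analysis/synthesis pair remains exactly invertible either way.

```python
import numpy as np

def legall53_analysis(x):
    """One level of the reversible 5-3 DWT via lifting (even length assumed)."""
    x = np.asarray(x, dtype=int)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Predict step: high-pass d[n] = odd[n] - floor((even[n] + even[n+1]) / 2)
    d = odd - ((even + np.roll(even, -1)) // 2)
    # Update step: low-pass a[n] = even[n] + floor((d[n-1] + d[n] + 2) / 4)
    a = even + ((np.roll(d, 1) + d + 2) // 4)
    return a, d

def legall53_synthesis(a, d):
    """Invert the lifting steps exactly (integer arithmetic, lossless)."""
    even = a - ((np.roll(d, 1) + d + 2) // 4)
    odd = d + ((even + np.roll(even, -1)) // 2)
    x = np.empty(even.size + odd.size, dtype=int)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([3, 7, 1, 8, 2, 9, 4, 6])
a, d = legall53_analysis(x)
assert np.array_equal(legall53_synthesis(a, d), x)  # perfect reconstruction
```

Applying the 1-D transform along rows and then columns, and recursing on the LL output, yields the LL/LH/HL/HH decomposition per level described above.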
Moreover, for an objective comparison with Miller's work on the DCT, we created a trellis of the same length; hence the same proportion of coefficients must participate in the embedding process. Miller et al. use a ratio of 12/64, the fraction of DCT coefficients used out of the total; the combination of the LH2, HL2 and HH2 subbands gives the same ratio, 3/16.

Figure 3.7: Wavelet decomposition of the Lena image.

In a first experiment, we use the same informed encoding and embedding process without any perceptual shaping, which means that the embedding strength on all three subbands is the same. As seen in Figure-3.8(a), even without any perceptual shaping, our embedding algorithm achieves a PSNR value of 39 dB for the Lena image. Compared with DCT-domain embedding, instead of block artifacts the embedding noise is distributed over the whole image. The difference between the host image and the coded image in the wavelet domain can be seen in Figure-3.8(b). Note that the components LH2, HL2 and HH2 are modified equally.

3.4.2 Perceptual Shaping for DWT

Watson (1993) proposed a contrast-masking method for perceptual quality shaping based on DCT coefficients. Using a weight matrix T, which contains the weight of each DCT coefficient, and the local features of the image (the low-pass component), a metric defining the visual effect of each DCT coefficient can be calculated. This metric is used to determine the perceptual shaping weights in Miller et al.'s method. The visual impact of the DWT components has also been studied in Watson et al (1997); Levický and Foriš (2004). The weights of LH and HL are the same because the calculation of these two components includes one low-pass and one high-pass filter. The third component, HH, however, has been shown to be less sensitive to perturbations. After our subjective tests, we defined a fixed weighting matrix T for the components: 2/7 for LH, 2/7 for HL and 2/7 for HH.
Then, as in the DCT case, a metric for each DWT coefficient is calculated to determine the embedding power for a better perceptual output.

Figure 3.8: A 100-bit message M is inserted into the Lena image using the LH2, HL2 and HH2 DWT coefficients; no perceptual shaping is applied. (a) Coded image, PSNR 39.0005 dB. (b) DWT coefficient differences; MSE: 6.157; dmin, dmax: -23, 26.

Figure-3.9(a) shows the embedding of the same number of bits as in Figure-3.8, but with the perceptual shaping described above. Because of the perceptual shaping, the modifications are concentrated at the contours of the image (see Figure-3.9(b)). Furthermore, the insertion into the HH2 component is 1/3 stronger than that into the LH2 and HL2 components. Compared with the embedding without perceptual shaping in Figure-3.8, a similar PSNR value is achieved in Figure-3.9 with the errors concentrated at the less sensitive DWT coefficients.

Figure 3.9: The same 100-bit message M inserted into the Lena image using perceptual shaping. (a) Coded image, PSNR 38.8 dB. (b) DWT coefficient differences; MSE: 6.21; dmin, dmax: -67, 75.

Another visual example of the effect of perceptual shaping can be seen in Figure-3.10, which compares the coded asia image with and without shaping.

Figure 3.10: Comparison of embedding a 40-bit message M into the asia image with and without perceptual shaping. (a) Coded with DWT embedding without perceptual shaping, PSNR 39.4 dB. (b) DWT coefficient differences of (a); MSE: 2.95; dmin, dmax: -16, 19. (c) Coded with DWT embedding using perceptual shaping, PSNR 40.237 dB. (d) DWT coefficient differences of (c); MSE: 6.157; dmin, dmax: -23, 26.

3.4.3 Attack Channel

For the attack channel, we simulate various attacks, from linear filtering to compression, using Stirmark (Petitcolas, 2000).
Since the proposed embedding method depends on the trellis length, and hence on the image dimensions, attacks that modify the image dimensions can easily de-synchronize the system. We therefore do not apply attacks such as cropping, geometric distortion, affine transform and rotation. The attacks that we apply to the watermarked images are: JPEG compression, convolution filtering, median filtering, additive noise, PSNR (all pixel values increased by the same quantity), rotation and scale, small random distortions, and auto-correlation.

3.4.4 Simulation Results

With the combination of coding on selected DWT coefficients and embedding with perceptual shaping, we obtain superior image quality with respect to Miller et al.'s work while preserving the same amount of robustness. For instance, Figure-3.14 on page 80 compares the outputs of embedding a 40-bit message M into the Cameraman image using Miller's algorithm (Figure-3.14(a)) and our algorithm (Figure-3.14(b)). Table-3.2 on page 80 shows the performance of the perceptually shaped asia image against several attacks. The right column indicates the maximum attack level at which the embedded message still survives decoding. Several attacked images from which the embedded message can still be decoded correctly can be found in Figure-3.15 at the end of this chapter (page 81).

3.5 Proposed Scheme-2: Superposition Coding

The Miller et al (2004) algorithm works quite well for moderate insertion rates, such as a thousand bits per 256 × 256 image, but it cannot embed at higher rates because there are not enough coefficients to fill the trellis. For this reason, for high-rate embedding such as 1 bit per 2 coefficients of the cover signal, we developed a system similar to that of Bennatan et al (2006). The coding is done by superposition of a good channel code C1 and a good source code C0.
The receiver performs iterative decoding between the channel-code estimation and the source-code estimation. We use an LDPC code as the channel code and TCQ as the source code.

3.5.1 Definition

Assume a source code C0 quantizes a continuous i.i.d. input source vector x = (x1, .., xn) with values in the range [-A, A], with a mean-square distortion (1/n) Σ_{i=1}^{n} x_i^2 ≤ P. Moreover, a length-n channel code C1 can be constructed according to a zero-mean distribution with variance Q (the value of Q is determined as a function of P and the attack noise power N, given in Section-3.5.2), where Q < P. The superposition code is defined as C = C0 + C1, the addition being the standard addition over the real-number field. C corresponds to Costa's auxiliary variable U. The aim is to find the vector c that is closest to the scaled host signal αs.

Figure 3.11: Superposition of the two codes. (a) Code C0 for time instant t. (b) Pulse Amplitude Modulation of code C1 for time instant t. (c) Code C0 + C1 for time instant t.

3.5.2 Code Construction

Code constructions close to the theoretical limits are proposed for C0 and C1. Detailed explanations of the two codes follow.

3.5.2.1 Source Code C0

C0 is designed to meet the fidelity criterion P between the host signal s and the watermarked signal w, such that (1/n) Σ_{i=1}^{n} (s_i - w_i)^2 ≤ P. We select the quantization code C0 as a Trellis Coded Quantization (TCQ) with a rate-1/2 convolutional code of feedback polynomials (671, 1631) in octal (please refer to Section-1.9 for more information on TCQ). For an input in the range [-A, A], the 6-level PAM output alphabet [-5A/4, -3A/4, -A/4, A/4, 3A/4, 5A/4] is used, labeled with the 4-level output of the convolutional code as [D3, D0, D1, D2, D3, D0] (see Figure-3.11(a)).
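The Viterbi search over subset-labeled branches that TCQ performs can be sketched with a toy 4-state trellis. The state machine and branch labeling below are illustrative stand-ins for the thesis's 256-state (671, 1631) code, but they reproduce the 6-level PAM codebook and its [D3, D0, D1, D2, D3, D0] labeling.

```python
import numpy as np

A = 1.0
LEVELS = np.array([-5*A/4, -3*A/4, -A/4, A/4, 3*A/4, 5*A/4])
LABELS = np.array([3, 0, 1, 2, 3, 0])            # subset index D_i of each level
SUBSETS = [LEVELS[LABELS == d] for d in range(4)]

def tcq_quantize(x):
    """Viterbi search over a toy 4-state trellis: from state s, input bit b
    moves to state ((s << 1) | b) & 3 and emits a point of subset D_{2b+(s&1)}."""
    n, INF = len(x), float("inf")
    cost = np.array([0.0, INF, INF, INF])        # start in state 0
    recon = [[None] * 4 for _ in range(n)]       # chosen level per (t, state)
    prev = [[None] * 4 for _ in range(n)]        # survivor's previous state
    for t in range(n):
        new_cost = np.full(4, INF)
        for s in range(4):
            if cost[s] == INF:
                continue
            for b in (0, 1):
                pts = SUBSETS[2 * b + (s & 1)]
                q = pts[np.argmin((pts - x[t]) ** 2)]  # best point in the subset
                c = cost[s] + (x[t] - q) ** 2
                ns = ((s << 1) | b) & 3
                if c < new_cost[ns]:
                    new_cost[ns], recon[t][ns], prev[t][ns] = c, q, s
        cost = new_cost
    s = int(np.argmin(cost))                     # trace back the cheapest path
    out = np.empty(n)
    for t in range(n - 1, -1, -1):
        out[t] = recon[t][s]
        s = prev[t][s]
    return out

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 2000)
mse = np.mean((x - tcq_quantize(x)) ** 2)
assert mse < 0.12                                # well below the input variance 1/3
```

The real code's longer memory and optimized labeling bring the distortion down to the P = 0.062 reported in the text; the toy trellis only illustrates the mechanism.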
The reason for not distributing the 6 PAM output levels strictly within [-A, A] is that for inputs at the boundary there would exist only one choice in the trellis, which leads to a performance loss (Marcellin and Fischer, 1990). Forney and Ungerboeck (1998) proposed several techniques, including replication of the output signal levels. According to our simulation results, our source code C0 can quantize an input x uniformly distributed in the range [-1, 1] to QC0(x) with a mean distortion of P = 0.062, where QC0(x) is the reconstruction of the quantized vector x. The rate-distortion limit is 0.0585, which can be calculated for R = 1 as

R(D) = H(X) - H(D) ≈ log2(2A) - (1/2) log2(2πeD).   (3.8)

Hence C0 is able to quantize the input source with a gap of 0.19 dB from the theoretical limit.

3.5.2.2 Channel Code C1

C1 is designed to spread the secret message M into a codeword such that 1 bit of the codeword is embedded into one sample of the host signal. Since we want to achieve a 1/2 embedding rate, we design an irregular LDPC code with rate 1/2 (please refer to Section-1.10 for more information on LDPC). The input of the LDPC code is the n/2-bit message M and the output is the n-bit codeword. The codeword is two-level PAM modulated with strength -√Q or +√Q depending on whether the codeword bit value is 0 or 1 (see Figure-3.11(b)). Exploiting the duality between the MAC channel and dirty paper coding, an optimum Q value can be calculated following Boutros and Caire (2002) as:

Q = αP,   (3.9)

where P is C0's quantization MSE distortion level, α corresponds to Costa's α = P/(P + N), and N is the noise variance of the attack channel. For the LDPC coding, we generate the LDPC matrices using the degree polynomials found and distributed by the Communications Theory Lab (LTHC) at Ecole Polytechnique Fédérale de Lausanne (EPFL), Amraoui et al (2003).
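The distortion limit quoted above for C0 can be reproduced from Equation 3.8 by solving for D at R = 1 bit per sample:

```python
import math

A = 1.0     # source uniform on [-A, A]
R = 1.0     # rate in bits per sample
# R(D) ≈ log2(2A) - 0.5 * log2(2*pi*e*D)  =>  D = (2A)^2 / (2*pi*e * 4^R)
D = (2 * A) ** 2 / (2 * math.pi * math.e * 4 ** R)
print(round(D, 4))      # 0.0585, the distortion limit quoted for C0
```

For A = 1 and R = 1 this reduces to D = 1/(2πe), the Shannon-lower-bound distortion against which the measured P = 0.062 is compared.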
The irregular rate-1/2 degree distribution polynomial used in this section can be found in Appendix-B. It achieves a performance 0.11 dB away from the Shannon limit. To visualize the superposition scheme for a time instant t, the possible combinations of c0,t + c1,t can be seen in Figure-3.11(c).

3.5.3 Encoder

Given the n/2-bit message M and the n-sample host s = {s1, s2, .., sn}, the encoder searches for the vector c = c0 + c1 that is closest to the scaled host vector αs, where α is a scaling constant equal to α = P/(P + N). The encoding process is shown in Figure-3.12. The encoder starts with the computation of c1. Rate-1/2 LDPC coding of the n/2-bit message M outputs the n-bit codeword k composed of 0's and 1's. Then the length-n vector c1 is found by 2-level PAM as

c1,i = -√Q if ki = 0,  +√Q if ki = 1,   (3.10)

where Q is the constant scalar Q = αP. Hence the variance of the vector c1 equals Q. The second step is to search for the length-n vector c0 such that c0 + c1 is closest to αs. Since our TCQ coder can quantize a vector with a variance P, the vector αs - c1 is given as input to the vector quantizer. The Viterbi algorithm searches all possible paths on the trellis to find the minimum-error sequence. The output vector of the quantizer is assigned to c0 as

c0 = QC0(αs - c1).   (3.11)

Figure 3.12: Embedding process of the message M into the work s using superposition coding. An LDPC coding of M to find the channel code c1 is followed by TCQ coding of αs - c1 to find the source code c0. The watermarked signal c0 + c1 + (1 - α)s is sent through the attack channel.

The superposition code c is then assigned as

c = c0 + c1 = QC0(αs - c1) + c1.   (3.12)

Since the quantization code QC0 ensures a quantization error limited by P, the encoder can find the embedding noise as:

x = c0 + c1 - αs.   (3.13)

The watermarked signal w is then w = s + x.
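The encoder of Equations 3.10-3.13 can be sketched end to end. The repetition code and the uniform quantizer below are toy stand-ins for the actual LDPC and TCQ codes (an assumption of this sketch), but the algebraic identities that the decoder relies on later already hold.

```python
import numpy as np

def encode_superposition(msg_bits, s, P, N, quantize):
    """Sketch of Eqs. 3.10-3.13: superpose a channel codeword c1 and a
    source codeword c0, then add the embedding noise x to the host s."""
    alpha = P / (P + N)
    Q = alpha * P
    # Stand-in channel code (rate-1/2 repetition instead of a real LDPC code),
    # PAM-modulated as in Eq. 3.10.
    k = np.repeat(msg_bits, 2)
    c1 = np.where(k == 0, -np.sqrt(Q), np.sqrt(Q))
    # Eq. 3.11: quantize alpha*s - c1 with the source code (toy TCQ stand-in).
    c0 = quantize(alpha * s - c1)
    c = c0 + c1                     # Eq. 3.12: superposition codeword
    x = c - alpha * s               # Eq. 3.13: embedding noise
    return s + x, c, alpha          # watermarked signal w = s + x

P, N = 0.062, 0.0439
rng = np.random.default_rng(2)
s = rng.uniform(-1, 1, 200)
msg = rng.integers(0, 2, 100)
quant = lambda v: np.round(v / 0.25) * 0.25     # toy uniform quantizer
w, c, alpha = encode_superposition(msg, s, P, N, quant)
x = w - s
# Sanity checks: the decoder-side identity alpha*w - c = -(1-alpha)*x holds
# for a noiseless channel (y = w), and (1-a)^2 P + a^2 N = a*N (cf. Eq. 3.18).
assert np.allclose(alpha * w - c, -(1 - alpha) * x)
assert abs((1 - alpha) ** 2 * P + alpha ** 2 * N - alpha * N) < 1e-12
```

The first assertion is exactly the relation ŷ = αy = c + ẑ used by the decoder, specialized to z = 0.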
3.5.4 Gaussian Attack Channel

The stego signal w is subjected to additive channel noise Z, which is i.i.d. N(0, N). Hence the attack channel outputs

y = w + z = x + s + z.   (3.14)

3.5.5 Decoder

The decoder searches for the pair ĉ0 and ĉ1 such that the conditional probability

P(y | (ĉ0 + ĉ1))   (3.15)

is maximized. Since the encoding is done by computing c1 followed by the search for c0, each decoding iteration first estimates ĉ0 and then estimates ĉ1. The main steps of the decoding process are shown in Figure-3.13. The receiver computes

ŷ = αy = αs + αx + αz = c0 + c1 - (1 - α)x + αz   (3.16)
  = c0 + c1 + ẑ,   (3.17)

where Equation-3.16 follows from Equation-3.13 and the effective noise ẑ is defined as ẑ = -(1 - α)x + αz, Gaussian distributed with mean 0 and variance

σẑ² = (1 - α)²P + α²N = αN,   (3.18)

because α = P/(P + N). The decoding alternates between a BCJR decoder and an LDPC belief-propagation decoder, which output soft-decision probabilities of ĉ0 and ĉ1 respectively. The decoding is done in an iterative manner and the final guess is made from P(ĉ0) after a certain number of iterations or once a codeword k is found.

Figure 3.13: Superposition watermarking extraction by BCJR and LDPC decoding iterations.

3.5.6 Details of Joint Iterative Decoding of C0 and C1

The joint iterative decoding can be described in three steps: the two update rules of the plain-likelihood calculations, the BCJR iteration and the LDPC iteration; the details of each step follow.

• Update rules of plain-likelihood calculations: Plain likelihood is the ratio between the probabilities of the possible outcomes given the observations. The likelihood calculations are done before each BCJR or LDPC iteration in order to initialize the cost function of every path in the trellis or in the LDPC bipartite graph.
There are two likelihood calculations. The first is the n-by-4 matrix v sent from the channel output Y toward the BCJR decoder, and the second is the n-by-2 matrix r sent from Y toward the LDPC decoder. The element vti in the t'th row of v corresponds to the likelihood that ĉ0,t = Di given yt and r, where Di is the i'th output level of the TCQ coder. Each element of v is calculated as

vti = [ Σ_{b=1}^{2} rtb · fσẑ(ŷt - Di + (-1)^b √Q) ] / [ Σ_{i=1}^{4} Σ_{b=1}^{2} rtb · fσẑ(ŷt - Di + (-1)^b √Q) ]   (3.19)

for t = 1, 2, .., n and i = 1, 2, 3, 4, where fσẑ is the probability density function of a Gaussian r.v. N(0, αN), and rt1, rt2 are the messages coming from the LDPC node iteration, which give the likelihood that the t'th element of ĉ1 is 0 or 1. At the beginning of the decoding, the LDPC decoder sends rt1 = rt2 = 1/2, which means there is no prior knowledge of c1,t.

The element rtb in the t'th row of r corresponds to the likelihood that ĉ1,t = (b - 1) given yt and v. Each element of r is calculated as

rtb = [ Σ_{i=1}^{4} vti · fσẑ(ŷt - Di + (-1)^b √Q) ] / [ Σ_{i=1}^{4} vti · fσẑ(ŷt - Di - √Q) + Σ_{i=1}^{4} vti · fσẑ(ŷt - Di + √Q) ]   (3.20)

for t = 1, 2, .., n and b = 1, 2. Similarly, fσẑ is the probability density function of a Gaussian r.v. N(0, αN), and Di is the i'th output level of the TCQ coder.

• BCJR iteration: The branch metrics of the trellis are initialized with the received messages r for each sample. A BCJR iteration is then performed as explained in Section-1.9.2, and the BCJR outputs the probability P(c0 | ŷ, r), which is mapped to the message matrix v.

• LDPC iteration: The variable-node likelihoods v are calculated as explained in the previous item. Then 10 LDPC iterations are executed between the variable nodes and the check nodes as explained in Section-1.10.
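One exchange of the likelihood messages of Equations 3.19 and 3.20 can be sketched as follows; the level set D, the noise parameters and the function name are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def update_messages(y_hat, D, Q, var):
    """One exchange of the plain-likelihood messages of Eqs. 3.19 and 3.20:
    v[t, i] weighs TCQ level D[i]; r[t, b] weighs the channel-code bit b."""
    f = lambda e: np.exp(-e ** 2 / (2 * var))        # unnormalized Gaussian pdf
    r = np.full((len(y_hat), 2), 0.5)                # no prior knowledge on c1
    # lik[t, i, b] = f(y_hat[t] - D[i] -/+ sqrt(Q)), the two PAM hypotheses.
    lik = np.stack([f(y_hat[:, None] - D[None, :] - np.sqrt(Q)),
                    f(y_hat[:, None] - D[None, :] + np.sqrt(Q))], axis=2)
    v = (lik * r[:, None, :]).sum(axis=2)            # Eq. 3.19 numerator
    v /= v.sum(axis=1, keepdims=True)                # normalize over the 4 levels
    r = (lik * v[:, :, None]).sum(axis=1)            # Eq. 3.20 numerator
    r /= r.sum(axis=1, keepdims=True)                # normalize over the 2 bits
    return v, r

D = np.array([-0.75, -0.25, 0.25, 0.75])             # illustrative TCQ levels
v, r = update_messages(np.array([0.3, -0.9, 0.05]), D, Q=0.04, var=0.03)
assert np.allclose(v.sum(axis=1), 1) and np.allclose(r.sum(axis=1), 1)
```

Each row of v and r sums to one, as the normalizing denominators of Equations 3.19 and 3.20 require; in the full decoder these messages seed the BCJR branch metrics and the LDPC variable nodes respectively.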
The LDPC decoder outputs the likelihood probability P(c1 | ŷ, v), which is mapped to the message vector r.

3.5.7 Simulation Results

In our simulations we embed a 10^5-bit message M within a host signal S of length 2 · 10^5, i.i.d. uniformly distributed in the range [-1/α, 1/α]. Since our embedding method can achieve an MSE performance of P = 0.062, the theoretical maximum AWGN variance can be calculated from Equation-3.3 with the values R = 1/2 and P = 0.062. Hence, in theory, the maximum attack variance is found to be N = 0.062. In our experiments, starting from N = 0.062 and decreasing N in small steps, we search for the maximum AWGN variance N at which the probability of message error is low enough (Pe ≤ 10^-5). For each N value, we created 20 random host signals and embedded a random message M with the appropriate α = P/(P + N) value. After a maximum of 100 decoding iterations, the error rate of the decoded M̂ is calculated. We achieve a decoding error rate of 3 · 10^-6 for N = 0.0439, which is a

10 log10(0.062/0.0439) = 1.5 dB   (3.21)

gap from the theoretical setup.

3.6 Conclusion

In this chapter, we have proposed two practical informed watermarking code designs, one for low-rate data embedding in the DWT coefficients of still images, and the other for high-rate data embedding using the superposition of a good source code (TCQ) and a good channel code (LDPC) under an AWGN attack channel. In the low-rate design, up to a 1000-bit message M is embedded into the LH2, HL2 and HH2 components of the DWT coefficients using a trellis in which the valid trellis path is driven by the message M. Based on the Watson perceptual metric, the sensitivity of each DWT coefficient of the host image is calculated and the embedding process takes this sensitivity into account. For the high-rate design, we use continuous-alphabet synthetic state information and embed the message M at a rate of 1/2 bit per channel use.
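The gap of Equation 3.21 is a one-line computation:

```python
import math

N_theory = 0.062     # theoretical maximum attack variance (equals P here)
N_reached = 0.0439   # largest attack variance at which decoding still succeeds
gap_dB = 10 * math.log10(N_theory / N_reached)
print(round(gap_dB, 1))   # 1.5, the gap of Equation 3.21
```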
Embedding is done by the superposition of a good source code based on TCQ and a good channel code based on LDPC codes. Using iterative decoding, with the BCJR algorithm for the source code and belief propagation for the channel code, the system withstands AWGN attack noise up to a level 1.5 dB away from the theoretical embedding limit. This high-rate embedding system can be used in conjunction with a compression system as in Chapter-2 to build a joint embedding and compression system.

Figure 3.14: Embedding a 40-bit payload into the Cameraman image. (a) Coded with Miller et al., PSNR 31.5 dB. (b) Coded with the proposed method and perceptual shaping, PSNR 32.2 dB.

Table 3.2: Robustness test of the proposed algorithm for the image "asia.pgm". A 40-bit message is embedded into the asia image with DWT perceptual shaping. For each Stirmark 4.0 attack listed below, the corresponding maximum attack level at which the secret message M can be decoded without any error.
JPEG compression: quality factor of 12%
Convolution filtering: gaussian filter
Median filtering: 3 × 3
Additive noise: 3%
Rotation and scale: ±0.25°
Auto-correlation: 3
PSNR: by 100

Figure 3.15: Maximum level of attacked images from which the secret message can still be decoded perfectly. (a) Convolution 1 (gaussian). (b) JPEG QF = 12%. (c) Median 3 × 3. (d) Noise 3%.

Chapter 4
Dirty Paper Coding with Partial State Information

Contents
4.1 Introduction . . . 84
  4.1.1 List of Symbols . . . 85
4.2 Problem statement . . . 85
4.3 Achievable Rate . . . 87
  4.3.0.1 Case A . . . 90
  4.3.0.2 Case B . . . 90
  4.3.0.3 Case C . . . 90
  4.3.0.4 Case D . . . 91
  4.3.0.5 Case E . . . 91
  4.3.0.6 Case F . . . 92
4.4 Capacity/rate gain/loss analysis . . . 92
  4.4.1 For optimum values of α . . . 92
  4.4.2 For non optimum values of α . . . 93
4.5 Conclusion . . . 94

A generalization of the problem of dirty paper coding is considered in which (possibly different) noisy versions of the state information, assumed to be i.i.d. Gaussian random variables, are available at the encoder and at the decoder. This chapter derives the maximum achievable rate formula for this general problem. This general setup encompasses the cases where the state information is perfectly known either at the encoder, at the decoder, or at both. Moreover, it generalizes the analysis to cases where the state information is known only partially. In addition, this chapter shows that in realistic situations where the AWGN noise power is not known at the encoder, partial information at the decoder can increase the maximum achievable rate with respect to Costa's coding setup.¹

4.1 Introduction

The problem of coding for communication over a channel whose conditional probability distribution is controlled by a random state parameter finds applications in diverse areas, ranging from coding for information storage in a memory with defective cells to data hiding and coding for multiple-input multiple-output communication. The particular case where the state is known causally at the encoder only (no channel state information at the receiver) was first considered by Shannon in 1958 (Shannon, 1958).
In Gel'fand and Pinsker (1980), Gel'fand and Pinsker considered the channel coding problem with non-causal state information available at the transmitter. In their setup, the transmitter wishes to send a message M ∈ {1, ..., |M|} over a memoryless channel defined by the transition probabilities p(y|x, s), where X and Y are the channel input and output and S is an i.i.d. random variable representing the sequence of states {S1, . . . , SN} of the channel, known non-causally at the encoder but unknown at the decoder. The general Gel'fand-Pinsker problem suffers some capacity loss when compared with channel coding with side information available at both encoder and decoder. In Costa (1983), Costa showed that there is no loss in capacity if the channel state is additive white Gaussian interference ("dirt"). The design of codes approaching Costa's capacity is known as the dirty paper coding problem. The capacity loss is derived in Zaidi and Duhamel (2005) for an additive white Gaussian channel state S partially available at the encoder but not at the decoder. The capacity for information storage in a memory where the channel state is perfectly available at the decoder but not at the encoder is derived in Heegard and Gamal (1983). The authors of Moulin and Wang (2007) consider a generalized Gel'fand-Pinsker coding problem and derive capacity formulas, as well as random coding and sphere-packing exponents. In this chapter, we focus on the particular problem of dirty paper coding with correlated partial state information at the encoder and at the decoder. The Gel'fand-Pinsker coding problem where (possibly different) noisy versions of the channel state sequence are available at both sides was actually first considered in Salehi (1992) for a binary-input binary-output channel. The targeted application was information storage in a memory with defective cells.

¹ This chapter corresponds to a paper that will soon be submitted.
Here, the problem we focus on can be regarded as a special case of the coding problem with two-sided state information examined in Cover and Chiang (2002). The state information, the channel input and the output are assumed to be i.i.d. Gaussian random variables. The maximum achievable rate formulas are derived for this general problem as a function of α by expressing, as in Costa (1983), U = X + αS, where U is an auxiliary random variable. This gives us the general capacity formula for the cases where there is only partial or no side information at the encoder side, while there is perfect side information at the decoder side. The analytic expressions of the capacity/maximum achievable rate gains and losses with respect to Costa's setup are given for six particular cases, with optimum and non-optimum values of the α parameter. It is shown that, in the general case, a capacity gain or loss can be obtained in the realistic situation where the optimum α is not known.

4.1.1 List of Symbols

The symbols used in this chapter are listed below.
M : Discrete message to be transmitted (watermark).
M : Alphabet of the watermark.
M̂ : Decoded watermark.
Pe : Probability of decoding error.
R : Communication rate.
X : Stegotext.
S : State information.
S1 : Partial state information available to the encoder.
S2 : Partial state information available to the decoder.
θ, T : Additive random noise of S1 and S2.
Z : Channel noise.
Y : Received signal.
U : Auxiliary variable.
α : A constant for coding with side information.
N : Gaussian distribution.
Σ : Covariance matrix.
P, Q : Variances of X and S respectively.
L, N, K : Variances of θ, T and Z respectively.

4.2 Problem statement

Consider the communication problem shown in Figure 4.1. We use the same notation as Costa (1983) throughout this chapter. An index M ∈ {1, ..., |M|} will be sent to
the receiver in n uses of the channel, where |M| is the greatest integer smaller than or equal to e^{nR}, and R is the rate in nats per transmission. Let S = (S1, S2, ..., Sn) be the sequence of non-causal states of the channel for n transmissions, assumed to be a sequence of independent identically distributed (i.i.d.) N(0, QI) random variables. We consider the cases where this sequence of states is partially known to the encoder and to the decoder non-causally, expressed throughout this chapter as S1 = (S1,1, S1,2, ..., S1,n) and S2 = (S2,1, S2,2, ..., S2,n) respectively. This problem can be cast into a two-sided state information setup close to the one considered in Cover and Chiang (2002), where S is defined by a pair of i.i.d. correlated state informations (S1, S2) available at the sender and at the receiver respectively. The state information available at the encoder and at the decoder is expressed in terms of the channel state as S1 = S + θ and S2 = S + T, where θ and T are i.i.d. random variables distributed according to N(0, LI) and N(0, KI), and I is the n × n identity matrix.

Figure 4.1: Channel coding with state information.

Based on M and S1, the encoder sends a codeword X, which must satisfy the power constraint (1/n) Σ_{i=1}^{n} X_i² ≤ P. The channel output is given by Y = X + S + Z, where the channel noise Z is i.i.d. according to N(0, NI). Upon receipt of Y and S2, the decoder creates an estimate M̂(Y, S2) of the index M. Under the assumption that the index M is uniformly distributed over {1, .., |M|}, the probability of error Pe is given by

Pe = (1/|M|) Σ_{k=1}^{|M|} Pr{ M̂(Y, S2) ≠ k | M = k }.   (4.1)
The general formula for the capacity of this setup in the case of finite alphabets is given by Cover and Chiang (2002):

C = max_{p(x,u|s1)} [I(U; Y, S2) - I(U; S1)],   (4.2)

where the maximum is over all joint distributions of the form p(u)p(s1, s2, x|u)p(y|x, s1, s2), and U is an auxiliary random variable with finite cardinality. In our case, however, the alphabets are continuous, and the only general capacity expression that has been stated is that of Moulin and Wang (2007):

C = sup_{p(x,u|s1)} min_{p(y|x,s)} [I(U; Y, S2) - I(U; S1)].   (4.3)

So here we will be interested in estimating the maximum achievable rate for particular distributions and constructions, and we will see that in some cases it can be identified with the capacity. Perfect codes can be created as in Cover and Chiang (2002) using the random binning argument. First, e^{n(I(U;Y,S2)-2ε)} i.i.d. sequences of U are generated according to the distribution p(u) and each of them is indexed as U(i), where i ∈ {1, 2, ..., e^{n(I(U;Y,S2)-2ε)}}. These sequences are then randomly distributed into e^{n(R-4ε)} bins, where R corresponds to the rate of the system. Given the state S1 = S + θ and the message M ∈ {1, ..., |M|}, the encoder searches for a codeword U(i) within the bin indexed by M such that the pair (U(i), S1) is jointly typical. It then sends the corresponding X, which is jointly typical with (U(i), S1). During the transmission, the signal is exposed to the additive interference S and the noise Z. The receiver receives Y = X + S + Z from the channel and observes the non-causal state information S2 = S + T. The decoder searches for the sequence U(i) such that (U(i), Y, S2) is strongly jointly typical and assigns M̂ as the index of the bin containing the sequence U(i). The probabilities of all possible error events go to 0 as n → ∞ (Cover and Chiang, 2002).

4.3 Achievable Rate

We assume that X, S, Z, θ and T are random variables with respective Gaussian distributions N(0, PI), N(0, QI), N(0, NI), N(0, LI), and N(0, KI).
Hence, the joint distribution f(X, S, Z, θ, T) is a multivariate Gaussian ∼ N(0, Σ), where the covariance matrix is block-diagonal:

  Σ = diag(PI, QI, NI, LI, KI).    (4.4)

We consider U = X + αS_1 = X + αS + αθ, where α is a parameter to be determined. The achievable rate is then a function of the parameter α and is given by R(α) = I(U; Y, S_2) − I(U; S_1), where [2]

  R(α) = (1/2) ln [ P((P+Q+N)(Q+K) − Q²) / (PQK(1−α)² + NK(P + α²(Q+L)) + α²L(PQ + PK + QK + NQ) + PNQ) ].    (4.5)

Similarly to Costa (1983), the graphs of R(α) versus α are presented in Figure 4.2, where P = Q = N = 1, for several {L, K} pairs: {0,0}, {0,1}, {1,0}, {1,1}, {0,∞} and {1,∞}. Maximizing R(α) over α, we get [3]

  max_α R(α) = R(α*) = (1/2) ln [ 1 + P(QK + QL + KL) / (N(QK + QL + KL) + QLK) ],    (4.6)

which is obtained for α* = PQK / (PQK + QNK + L(PQ + PK + QK + NQ + NK)). Therefore, if the noise powers Q, N, L, K are known at the encoder, we can obtain the maximum achievable rate given in Equation 4.6.

Table 4.2: Special cases of the proposed channel coding setup.

  CASES          Encoder state S_1   Decoder state S_2   Rate loss for α_opt   Citation
  General case   partial             partial             R_loss general        Dikici et al., Section 4.3
  Case A         perfect             perfect             0
  Case B         perfect             partial             0                     Dikici et al., Section 4.3.0.2
  Case C         perfect             ∅                   0                     Costa (1983)
  Case D         partial             perfect             0
  Case E         partial             ∅                   R_loss Case E         Zaidi and Duhamel (2005)
  Case F         ∅                   perfect             0                     Heegard and Gamal (1983)

The system can be further analyzed for six particular cases, as listed in Table 4.2. Let us first recall that the capacity in the most favorable case, where there is perfect knowledge of S both at the encoder and at the decoder, is equal to C* = (1/2) ln(1 + P/N). Costa showed that this capacity is achievable through Gaussian distributions and

[2] See Appendix A.1 for the derivation of the achievable rate.
[3] See Appendix A.2 for the method of derivation.
Figure 4.2: Graphs of R(α) for P = Q = N = 1 and {L, K} pairs {0,0}, {0,1}, {1,0}, {1,1}, {0,∞} and {1,∞}. The rate of transmission R(α) is given in nats per transmission (the maximum value 0.3466 nats/transmission corresponds to 0.5 bit/transmission).

the construction U = X + αS, and that, as long as we keep perfect knowledge of S at the encoder, the capacity is still reached even if there is no side information at the decoder side. Hence, this construction is particularly interesting, and our purpose here is to study the maximum achievable rates it reaches in several other cases.

We will first consider, as Costa did, perfect knowledge of S at the encoder side, deriving cases A, B and C to distinguish the different amounts of information at the decoder side. Cases A ([perfect, perfect]) and C ([perfect, ∅]) are not new, since they correspond to the ones explored by Costa. In both, the maximum achievable rate is equal to C*, and the capacity is therefore reached. The conclusion concerning the capacity for Case B ([perfect, partial]) could be derived from Case C, as it is weaker, but we give here the proper expression of the achievable rate, which was not stated by Costa, and we will see in Section 4.4 that for non-optimal values of α there is some possible gain.

4.3.0.1 Case A

S_1 = S, S_2 = S. This corresponds to the encoder-decoder state pair [perfect, perfect], where K → 0 and L → 0. The achievable rate is then

  R_Case-A = lim_{K→0, L→0} R(α) = (1/2) ln(1 + P/N),    (4.7)

which is independent of α and reaches C*, hence showing that it is in fact the capacity, and that the capacity is achieved by this construction. There is thus no need for an auxiliary variable U, and we simply have U = X. The graph of Case A is presented in Figure 4.2, where P = Q = N = 1 and K = L = 0.

4.3.0.2 Case B

S_1 = S, S_2 = S + T. This corresponds to the encoder-decoder state pair [perfect, partial], where L → 0.
The achievable rate of the system is given by

  R_Case-B(α) = lim_{L→0} R(α) = (1/2) ln [ P(K(P+Q+N) + Q(P+N)) / (PQK(1−α)² + NK(P + α²Q) + PNQ) ].    (4.8)

R_Case-B(α) is maximized for α⋄ = P/(P+N), which corresponds to a rate of R_Case-B(α⋄) = (1/2) ln(1 + P/N) = C*. Hence, here also the capacity can be reached by this construction. This is not really surprising, since Costa showed (as we recall in Case C below) that the capacity C* can be reached by this construction when there is perfect side information at the encoder, even if there is no side information at the decoder. The graph of Case B is presented in Figure 4.2, where P = Q = N = K = 1.

4.3.0.3 Case C

S_1 = S, S_2 = S + T, which corresponds to the encoder-decoder state pair [perfect, ∅], where L → 0 and K → ∞. The achievable rate becomes

  R_Case-C(α) = lim_{K→∞, L→0} R(α) = (1/2) ln [ P(P+Q+N) / (PQ(1−α)² + N(P + α²Q)) ].    (4.9)

This rate is maximized for α⋄ = P/(P+N), giving R_Case-C(α⋄) = (1/2) ln(1 + P/N) = C*. As Costa showed, the capacity is then reached. The graph of Costa's limit can be seen in Figure 4.2 for P = Q = N = 1.

The more interesting cases are now the ones where the knowledge at the encoder side is only partial. We will first consider in Case D the situation where S is perfectly known at the decoder side, and show that the maximum achievable rate still reaches C*. Then, we will consider in Case E the possibility for the decoder to access no side information at all, and see that there is a loss in terms of maximum achievable rate. At last, we will consider in Case F the situation where there is no knowledge at the encoder but perfect knowledge at the decoder side, showing that the maximum achievable rate again reaches C*.

4.3.0.4 Case D

S_1 = S + θ, S_2 = S. The encoder-decoder state pair is [partial, perfect], where K → 0. The achievable rate in this case is

  R_Case-D(α) = lim_{K→0} R(α) = (1/2) ln [ P(P+N) / (α²L(P+N) + PN) ].    (4.10)

The rate R_Case-D is independent of the state power Q. It is maximized for α∇ = 0, which corresponds to a maximum achievable rate of R_Case-D(α∇) = (1/2) ln(1 + P/N) = C*. Actually, if the state is perfectly known to the decoder but the encoder has only a noisy version of the state, the rate is maximized when we consider U = X, and the capacity is still reached with this construction. The graph of R_Case-D is given in Figure 4.2 for P = Q = N = L = 1.

4.3.0.5 Case E

S_1 = S + θ, S_2 = S + T. The encoder-decoder state pair is [partial, ∅], where K → ∞. For this setup the rate is

  R_Case-E(α) = lim_{K→∞} R(α) = (1/2) ln [ P(P+Q+N) / (PQ(1−α)² + N(P + α²(Q+L)) + α²L(P+Q)) ].    (4.11)

It is maximized for α† = PQ/(PQ + QN + LP + LQ + LN), which corresponds to a rate of

  R_Case-E(α†) = (1/2) ln [ 1 + P(Q+L) / (N(Q+L) + QL) ].    (4.12)

The graph of Case E can be seen in Figure 4.2 for P = Q = N = L = 1. Please note that there exists a loss in Case E with respect to Case A (R_Case-E(α†) < R_Case-A). Here, we cannot state that R_Case-E(α†) corresponds to a capacity: it is the maximum achievable rate for our construction. Zaidi and Duhamel (2005) analyze the capacity loss of a setup similar to Case E, in which the channel state S is not perfectly available to the encoder and is defined by S = S_1 + θ, whereas in our case S_1 = S + θ. A practical code construction technique for this setup can be found in Zamir et al. (2002).

4.3.0.6 Case F

S_1 = ∅, S_2 = S. The encoder-decoder state pair is [∅, perfect], where K → 0 and α = 0. For this setup the rate is

  R_Case-F = lim_{K→0} R(0) = (1/2) ln(1 + P/N).    (4.13)

Since there is no state information available at the encoder, the auxiliary variable U is simply U = X. Please note that the capacity is reached, which both states its value for this case and shows that this construction enables to achieve it.
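As a numerical sanity check (our own sketch, not part of the thesis derivations), the general rate of Equation 4.5 can be evaluated directly and compared with the limiting cases above, as well as with the loss expressions of Section 4.4; all names below are ours:

```python
import math

def R(alpha, P, Q, N, L, K):
    """General achievable rate of Equation 4.5, in nats per transmission."""
    num = P * ((P + Q + N) * (Q + K) - Q**2)
    den = (P*Q*K*(1 - alpha)**2 + N*K*(P + alpha**2*(Q + L))
           + alpha**2*L*(P*Q + P*K + Q*K + N*Q) + P*N*Q)
    return 0.5 * math.log(num / den)

P = Q = N = 1.0
C_star = 0.5 * math.log(1 + P / N)   # capacity with perfect state knowledge

# Case A [perfect, perfect]: K, L -> 0 gives C* for any alpha (Equation 4.7)
assert abs(R(0.3, P, Q, N, L=1e-12, K=1e-12) - C_star) < 1e-6

# Case C [perfect, empty]: L -> 0, K -> inf; Costa's alpha = P/(P+N) reaches C*
assert abs(R(P / (P + N), P, Q, N, L=1e-12, K=1e12) - C_star) < 1e-6

# Case E [partial, empty]: K -> inf; the optimum alpha of Eq. 4.12 falls short of C*
L = 1.0
alpha_dag = P*Q / (P*Q + Q*N + L*P + L*Q + L*N)
R_E = 0.5 * math.log(1 + P*(Q + L) / (N*(Q + L) + Q*L))
assert abs(R(alpha_dag, P, Q, N, L, K=1e12) - R_E) < 1e-6
assert R_E < C_star

# General case: alpha* of Eq. 4.6 and the rate loss of Eq. 4.16, for K = L = 1
K = 1.0
a_star = P*Q*K / (P*Q*K + Q*N*K + L*(P*Q + P*K + Q*K + N*Q + N*K))
loss = -0.5 * math.log(1 + P*Q*L*K / (N*((P + N)*(Q*K + Q*L + L*K) + Q*L*K)))
assert abs((R(a_star, P, Q, N, L, K) - C_star) - loss) < 1e-12
```

With P = Q = N = L = K = 1, this gives α* = 1/7 and a loss of (1/2) ln(7/8) ≈ −0.067 nats per transmission.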
4.4 Capacity/rate gain/loss analysis

In this section, we analyze the capacity of the dirty paper codes with partial state information at the encoder and decoder sides, given in Equation 4.5, together with the special cases of this setup given in Section 4.3. Moreover, we study the rate gain/loss for the special cases when the encoder does not know the optimum coding parameter α. Since the capacity/maximum achievable rate is non-negative, the gain is calculated by defining the capacity/maximum achievable rate of a system as max{0, R(α)}.

4.4.1 For optimum values of α

If the transmitter uses the optimum value of the parameter α for each setup, there is no capacity gain nor loss for the particular cases A, B, C, D and F. The achievable capacity in those cases is given by

  R_Case-A = R_Case-B(α⋄) = R_Case-C(α⋄) = R_Case-D(α∇) = R_Case-F = (1/2) ln(1 + P/N) = C*.    (4.14)

In Case E, even the optimum value of α yields a maximum achievable rate loss:

  R_loss Case-E = R_Case-E(α†) − R_Costa(α⋄) = −(1/2) ln [ 1 + PQL / (N((Q+L)(P+N) + QL)) ].    (4.15)

Similarly, for the optimum value of α, the maximum achievable rate loss for the general case is

  R_loss general = R(α*) − R_Costa(α⋄) = −(1/2) ln [ 1 + PQLK / (N((P+N)(QK + QL + LK) + QLK)) ].    (4.16)

4.4.2 For non-optimum values of α

In actual systems, however, the transmitter does not have perfect knowledge of the additive variances N, Q, L and K, and thus cannot always code with the optimum parameter α. Assuming that the coding is done with a non-optimum α, we analyze the rate gain or loss with respect to Costa's coding setup [perfect, ∅]. For instance, when using the same non-optimal α, there exists a rate gain in Case B [perfect, partial] with respect to Costa's setup, given by:

  C_gain CaseB-C(α) = max{0, R_Case-B(α)} − max{0, R_Costa(α)}
   = (1/2) ln [ (K(P+Q+N) + Q(P+N))(PQ(1−α)² + N(P+α²Q)) / ((P+Q+N)(PQK(1−α)² + NK(P+α²Q) + PNQ)) ]  if R_Costa(α) > 0,
   = (1/2) ln [ P(K(P+Q+N) + Q(P+N)) / (PQK(1−α)² + NK(P+α²Q) + PNQ) ]  else if R_Case-B(α) > 0,
   = 0  otherwise.    (4.17)

Let us define the Signal to State Ratio (SSR) and the Signal to Noise Ratio (SNR) as SSR = 10 log₁₀(P/Q) and SNR = 10 log₁₀(P/N).

The graphs showing the capacity gains between Case B and Costa's setup, for SNR values ranging between −15 dB and +15 dB, for different values of the parameter α and of 10 log(Q/K) (∞, 6 dB, 2.1 dB and −1 dB), are given in Figure 4.3. We fix P = 1, L = 0 and SSR = −6 dB. [4] We observe that, given the values of P, Q, K and fixing α, the capacity gain is zero for the particular SNR value at which R_Case-B(α) = R_Costa(α⋄). For other SNR values, however, there always exists a capacity gain with respect to R_Costa(α). It is also evident that, for fixed P, Q, N values and a given estimate of α, decreasing the 10 log(Q/K) value decreases the capacity gain.

Voloshynovskiy et al. (2004) assumed the statistics of the state, modeled as a mixture of Gaussian distributions, to be available at the decoder. When a noisy version of the state is available at the decoder (Case B) with 10 log(Q/K) ≅ 2 dB (see Figure 4.3(c)), the same rate gain with respect to Costa's setup is observed as in Voloshynovskiy et al. (2004). For higher values of 10 log(Q/K), a higher capacity gain can be obtained with respect to the sole knowledge at the decoder of the statistical distribution of the state information.

[4] Such low SSR values are relevant for practical applications such as watermarking.

In Case E [partial, ∅], without the optimum α at the transmitter, there is a maximum achievable rate loss with respect to Costa's setup, given by

  C_loss CaseE-C(α) = max{0, R_Case-E(α)} − max{0, R_Costa(α)}
   = (1/2) ln [ (PQ(1−α)² + N(P+α²Q)) / (PQ(1−α)² + N(P+α²(Q+L)) + α²L(P+Q)) ]  if R_Case-E(α) > 0,
   = (1/2) ln [ (PQ(1−α)² + N(P+α²Q)) / (P(P+Q+N)) ]  else if R_Costa(α) > 0,
   = 0  otherwise.    (4.18)

The maximum achievable rate loss versus SNR between Case E and Costa's setup can be found in Figure 4.4, where P = 1, SSR = −6 dB, and 10 log(Q/L) = 2.1 dB and 6 dB, for SNR values ranging between −15 dB and 15 dB.

Finally, without the optimum parameter α, there exists a maximum achievable rate gain or loss between the general case [partial, partial] and Costa's setup [perfect, ∅], expressed as a function of P, N, Q, L, K and α. Figure 4.5 shows the maximum achievable rate gain/loss versus SNR for SNR values ranging between −15 dB and 15 dB, with P = L = 1, SSR = −6 dB, and 10 log(Q/K) = 2.1 dB (Figure 4.5(a)) or 10 log(Q/K) = 6 dB (Figure 4.5(b)). The maximum achievable rate gain/loss is plotted for several α values: 0, 0.2, 0.4 and 0.6.

4.5 Conclusion

This chapter has analyzed the maximum achievable rate losses and gains for the general setup where partial state information is available at the encoder and at the decoder under Gaussian interference. In particular, we derived the capacity for the case [partial or ∅, perfect], showing that Costa's construction enables to reach it; this is not the case for [partial, partial or ∅], for which only a maximum achievable rate has been stated. We then analyzed the gain/loss in terms of achievable rates when the optimal coding parameter α is not accessible to the encoder. This general setup is relevant for practical applications such as watermarking under desynchronization attacks and point-to-point communication over a fading channel where the receiver has an estimate of the channel state.
Figure 4.3: Capacity gain (between R_Case-B(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/K) values ((a) ∞, (b) 6 dB, (c) 2.1 dB, (d) −1 dB), with perfect knowledge of the channel state information at the encoder (L = 0).

Figure 4.4: Maximum achievable rate loss (between R_Case-E(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/L) values ((a) 2.1 dB, (b) 6 dB).

Figure 4.5: Maximum achievable rate gain or loss (between R(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/K) values ((a) 2.1 dB, (b) 6 dB), with partial knowledge of the channel state information at the encoder (L = 1).

Chapter 5
Data Hiding and Distributed Source Coding

Contents
5.1 Introduction . . . 101
  5.1.1 List of Symbols . . . 102
  5.1.2 Formal Statement of Problem . . . 103
    5.1.2.1 Data Hiding (F1, G1) . . . 105
    5.1.2.2 Source Coding (F2, G2) . . . 105
    5.1.2.3 Summary of the overall setup . . . 106
  5.1.3 Summary of Results . . . 106
5.2 Theoretical Background . . . 107
  5.2.1 Channel Coding with Side Information (CCSI) . . . 107
  5.2.2 Source Coding with Side Information (SCSI) . . . 108
5.3 Contribution 1: Capacity Analysis for Multivariate Gaussian Source . . . 108
  5.3.1 Evaluation of the Rate Distortion Function of the Carrier . . . 109
  5.3.2 Capacity of the channel . . . 113
5.4 Contribution 2: Practical Code Design . . . 115
  5.4.1 Practical Code Design for the Multivariate Gaussian Case . . . 115
    5.4.1.1 Data Hiding Coder-Decoder Pair (F1 − G1) . . . 116
    5.4.1.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2) . . . 116
    5.4.1.3 Theoretical Limits and Performance Analysis of the Proposed System . . . 117
  5.4.2 Practical Code Design for Discrete Case . . . 117
    5.4.2.1 Data Hiding Coder-Decoder Pair (F1 − G1) . . . 117
    5.4.2.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2) . . . 118
    5.4.2.3 Experimental Setup . . . 118
5.5 Conclusion . . . 118

We address the problem of combining Informed Data Hiding and Distributed Source Coding within a single system. With the existing limited-power devices such as multi-sensor systems, PDAs, etc., researchers are attracted to low complexity data compression and watermarking applications. In this work, we provide an original framework based on Distributed Source Coding (DSC) and Informed Data Hiding (IDH) which uses the duality between source and channel coding with side information. A mark M is inserted into a host signal S with a fidelity criterion d(S, W) ≤ D1, and then the watermarked signal W is compressed, given that Ŝ, a noisy version of the host signal, is available only to the decoder.
The decoder estimates both the message M̂, with a low probability of error Pe(M ≠ M̂) ≤ 10⁻⁵, and the watermarked signal Ŵ, with a fidelity criterion d(W, Ŵ) ≤ D2. The rate-distortion function for the compression of the watermarked signal W and the capacity of the overall system are derived for the Gaussian case. Moreover, a practical code design based on Trellis Coded Quantization (TCQ) and Low Density Parity-Check (LDPC) codes is proposed and evaluated for both binary and Gaussian input cases. [1]

5.1 Introduction

Both the Gel'fand-Pinsker (G-P) model of channel coding with side information at the encoder (Gel'fand and Pinsker, 1980) and the Slepian-Wolf (S-W) model of lossless source coding with side information at the decoder (Slepian and Wolf, 1973) have various practical applications, such as blind watermarking, distributed video coding, and writing on defective cells. The G-P model was extended to continuous-alphabet Gaussian sources by Costa (1983), and the lossy version of S-W was developed by Wyner and Ziv (W-Z) (Wyner and Ziv, 1976). The duality between these channel coding and source coding problems is studied in Pradhan et al. (2003); Su et al. (2000), and a more general model, where the state information is partially available to the encoder and to the decoder (and need not be the same), is studied both for the source coding (Cover and Chiang, 2002) and channel coding (Moulin and Wang, 2007; Voloshynovskiy et al., 2004) cases. Recently, various combinations of these source-channel coding schemes have been investigated, for instance combined data hiding and lossy compression with state information available only to the encoder in Maor and Merhav (2005); Yang and Sun (2006), and joint source-channel coding for W-Z and G-P channels, where there exist two parallel channels, in Merhav and Shamai (2003).
In this chapter, we address two problems: i) data hiding with the state information available to the encoder and partial state information available to the decoder; ii) lossy compression with partial state information available only to the decoder. These two problems can be applied together within the scenario described in the following.

Consider the communication problem shown in Figure 5.1. Alice wants to send a message M to Bob through a non-secure Carrier. She uses a host signal S which is available only to her, while Ŝ, a noisy version of the host signal, is available to Bob. Alice shares her host signal with neither Bob nor the Carrier; however, Bob shares his noisy version with the Carrier at the decoding end. Alice embeds her secret message M within the host signal S with a fidelity criterion Ed(S, W) ≤ D1. The Carrier wants to compress the watermarked signal W while guaranteeing a quality of service (QoS) to Alice and Bob, such that his delivered copy Ŵ satisfies the constraint Ed(W, Ŵ) ≤ D2. The Carrier therefore compresses W knowing that Bob will share his noisy copy Ŝ at the decoding end. After the delivery of Ŵ, Bob extracts the hidden message M̂ using his noisy copy Ŝ with a low probability of error Pe.

Figure 5.1: A communication system between Alice and Bob via a non-secure Carrier.

The novelty of our work is that we analyze the theoretical limits of the system and then propose a practical code design that operates close to those limits. One application area of this system could be the development of a low complexity encoder for a mobile handheld that compresses the redundancy of the multimedia data while also carefully embedding hidden information such as meta-data.

[1] This chapter corresponds to a paper that will be submitted soon. It has been presented partially in Dikici et al. (2006b) and is also related to the work of Dikici et al. (2006c).
5.1.1 List of Symbols

The list of symbols used in this chapter can be found below.

  M          Discrete message to be transmitted (watermark).
  ℳ          Alphabet of the watermark.
  M̂          Decoded watermark.
  Pe         Probability of decoding error.
  R_C        Capacity of the data-hiding system.
  R_S        Compression rate.
  F1 − G1    Encoder-decoder pair of data hiding.
  F2 − G2    Encoder-decoder pair of the Wyner-Ziv compression.
  X          Stegotext.
  S          State information.
  Ŝ          Partial state information available to the decoder.
  W          Watermarked data.
  Ŵ          Decompressed watermarked data at the decoder.
  B, T, Z    Additive random noises.
  U          Auxiliary variable.
  α          A constant for coding with side information.
  D, D1, D2  Distortion levels.
  N          Gaussian distribution.
  Σ          Covariance matrix.
  Q, K, Ñ    Variances of S, T and Z respectively.
  h(X)       Differential entropy of X.
  h(X, Y, Z) Joint differential entropy of X, Y and Z.
  I(X; Y|Z)  Mutual information of X and Y given Z.
  E          Expectation operator.
  C0         Source code.
  C1         Channel code.

5.1.2 Formal Statement of Problem

In this section we give a precise statement of the problem stated informally in the previous section.

Here we consider a discrete-valued hidden message M, and continuous-valued host signal S and side information Ŝ. Specifically, the sequence {(S_n, Ŝ_n)}_{n=1}^{∞} represents independent samples of a pair of dependent random variables (S, Ŝ) with joint probability p(s, ŝ), taking values in the continuous infinite alphabet S × Ŝ; that is, for any n and sⁿ × ŝⁿ ∈ Sⁿ × Ŝⁿ, p(sⁿ, ŝⁿ) = Π_{i=1}^{n} p(s_i, ŝ_i). (S, Ŝ, W, Ŵ) has joint probability distribution p(S, Ŝ, W, Ŵ) and takes values in the set S × Ŝ × W × Ŵ.

Figure 5.2: Data hiding + source coding scheme.

An index M ∈ {1, ..., 2^{nR_C}} will be sent to the receiver in n uses of the channel, where R_C is the embedding capacity of the channel per transmission.
The sequence {X_n}_{n=1}^{∞}, which takes values in the infinite set X under the power constraint E(d(S, S + X)) ≤ D1, is used to transmit the index M, where X is independent of Ŝ given S. Furthermore, the coded signal W is compressed by sending an index V ∈ {1, ..., 2^{nR_S}} with a fidelity criterion E(d(W, Ŵ)) ≤ D2, where R_S is the rate of the Carrier per transmission for a distortion D2. The goal is to form the best estimate M̂ with probability of decoding error Pe → 0 while respecting the fidelity criteria E(d(S, W)) ≤ D1 and E(d(W, Ŵ)) ≤ D2, where S is available only to the embedding process and Ŝ is available to the decompression and extraction.

This problem involves an interplay between source coding and channel coding with side information. We consider the following system, involving the embedding-extraction and compression-decompression pairs, marked [F1 − G1] and [F2 − G2] respectively. Let us define the data hiding (F1, G1) and source coding (F2, G2) mappings in the following sections.

5.1.2.1 Data Hiding (F1, G1)

There is a mapping pair F1 and G1 given as

  F1 : ℳ × Sⁿ → Xⁿ,    (5.1)

where E(d(X, 0)) ≤ D1, and W is defined as W = X + S so that E(d(S, W)) ≤ D1; and

  G1 : Ŵⁿ × Ŝⁿ → ℳ,    (5.2)

where E(d(W, Ŵ)) ≤ D2. Given an encoder-decoder pair [F1 − G1], the error probability averaged over all possible messages M and all host signals Sⁿ is defined by p(F1, G1) = Pr{M̂ ≠ M}.

Definition 5.1 R_C is an achievable rate if there exists an encoder-decoder pair F1-G1 such that p(F1, G1) → 0. The capacity C is the supremum of the achievable rates.
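The overall chain described informally above composes the four mappings as M̂ = G1(G2(F2(S + F1(M, S)), Ŝ), Ŝ). The data flow can be sketched with trivial placeholder mappings (entirely our own illustration, not the actual TCQ/LDPC design of Section 5.4); it only demonstrates the composition order:

```python
# Illustrative data flow with toy placeholder mappings: F1 embeds the
# message as a small additive offset, F2/G2 "compress" by rounding.
def F1(M, s):                 # embedding: x^n = F1(M, s^n)
    return [0.01 * M for _ in s]

def F2(w):                    # Carrier compression: v = F2(w^n)
    return [round(wi, 2) for wi in w]

def G2(v, s_hat):             # Carrier reconstruction: w_hat^n = G2(v, s_hat^n)
    return v

def G1(w_hat, s_hat):         # extraction: M_hat = G1(w_hat^n, s_hat^n)
    diffs = [wi - si for wi, si in zip(w_hat, s_hat)]
    return round(sum(diffs) / len(diffs) / 0.01)

s = [0.2, -0.4, 0.1, 0.3]     # host signal, known only to the sender
s_hat = s                     # side information, noiseless here for simplicity
M = 3
w = [si + xi for si, xi in zip(s, F1(M, s))]      # w^n = s^n + x^n
M_hat = G1(G2(F2(w), s_hat), s_hat)
```

A real system replaces each toy mapping by a capacity-approaching code while keeping exactly this composition.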
5.1.2.2 Source Coding (F2, G2)

A source code (n, v, ∆) is defined by two mappings F2 and G2, an encoder and a decoder respectively, where

  F2 : Wⁿ → {1, 2, ..., v},    (5.3)

  G2 : {1, 2, ..., v} × Ŝⁿ → Ŵⁿ,    (5.4)

and

  d(W, Ŵ) = ∆.    (5.5)

Definition 5.2 A pair (R_S, D2) is said to be achievable if, for arbitrary ǫ > 0, there exists (for n sufficiently large) a code (n, v, ∆) with

  v ≤ 2^{n(R_S+ǫ)},  ∆ ≤ D2 + ǫ.    (5.6)

Definition 5.3 The rate distortion function R(D2) is

  R(D2) = min_{(R_S,D2)∈R} R_S,    (5.7)

where R is the set of achievable (R_S, D2) pairs.

5.1.2.3 Summary of the overall setup

The sender has access to the realization of the secret message M and to the noncausal host signal realization sⁿ. The encoder function F1 finds

  xⁿ = F1(M, sⁿ)    (5.8)

with a power criterion (1/n) Σ x_i² ≤ D1. Then, the sender passes the watermarked signal wⁿ = sⁿ + xⁿ to the unreliable Carrier. The Carrier compresses the watermarked signal as

  v = F2(wⁿ) = F2(sⁿ + F1(M, sⁿ)),    (5.9)

and transmits it to the receiver. The receiver shares its noisy version of the host signal with the Carrier, and the Carrier reconstructs the watermarked signal

  ŵⁿ = G2(v, ŝⁿ) = G2(F2(sⁿ + F1(M, sⁿ)), ŝⁿ),    (5.10)

with a fidelity criterion d(wⁿ, ŵⁿ) ≤ D2. At the final step, the receiver estimates the secret message

  M̂ = G1(ŵⁿ, ŝⁿ) = G1(G2(F2(sⁿ + F1(M, sⁿ)), ŝⁿ), ŝⁿ).    (5.11)

5.1.3 Summary of Results

In this chapter, we give the rate distortion function of the Carrier and the capacity formula of the system with continuous-alphabet, Gaussian distributed state information. Let f(S, X, T) have a multivariate Gaussian distribution ∼ N(0, Σ_{S,X,T}), where the covariance matrix is Σ_{S,X,T} = diag(Q, D1, K).
Defining the state information available to the decoders G1 and G2 as Ŝ = S + T and the watermarked signal as W = S + X, Theorem 5.2 states that the minimum rate of the Carrier for a mean distortion level E{d(W, Ŵ)} ≤ D2 is

  R_S(D2) = (1/2) ln( D1/D2 + QK/((Q+K)D2) )  if 0 < D2 < D1 + QK/(Q+K),
  R_S(D2) = 0                                 if D2 ≥ D1 + QK/(Q+K),    (5.12)

in nats per channel use. Moreover, according to Theorem 5.3, the capacity of the overall system is given by

  R_C = (1/2) ln( 1 + D1(D1 + Q − D2) / (D2(D1 + Q)) )    (5.13)

in nats per channel use. Some of our remarks can be found below:

• Remark 1: The rate distortion function R_S(D2) of the Carrier is the same as in the case where the state information Ŝ is accessible to both the compressor (F2) and the de-compressor (G2).

• Remark 2: The overall capacity R_C does not depend on whether Ŝ is accessible to the decoder G1 or not. In return, the accessibility of Ŝ to the de-compressor (G2) affects the capacity R_C indirectly, because R_C depends on D2 (Equation 5.13), and D2 depends on K (Equation 5.12).

• Remark 3: Unlike the capacity term found in Equation 4.8 on page 90, the overall capacity R_C depends on the variance Q of the host signal S.

Finally, a practical coding approach for the Gaussian case is proposed, using the superposition coding of Chapter 3.5 and the LDPC binning method of Chapter 2.4, and a similar coding scheme is given for the binary symmetric case.

The remainder of this chapter is organized as follows. After the theoretical background on source-channel coding in Chapter 5.2, Chapter 5.3 focuses on the rate-distortion function of the Carrier and the overall capacity analysis of the system; the proofs of the rate distortion function and of the capacity term can be found in that section. The practical code design for the Gaussian case is then given in Chapter 5.4.1, and the practical code design for the binary symmetric case in Chapter 5.4.2.
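To illustrate, Equations 5.12 and 5.13 can be evaluated directly; a minimal sketch with our own function names, checking the zero-rate threshold and the trade-off in D2:

```python
import math

def R_S(D2, Q, D1, K):
    """Carrier rate-distortion function of Equation 5.12, nats per channel use."""
    if D2 >= D1 + Q*K / (Q + K):
        return 0.0
    return 0.5 * math.log(D1/D2 + Q*K / ((Q + K)*D2))

def R_C(D2, Q, D1):
    """Overall embedding capacity of Equation 5.13, nats per channel use."""
    return 0.5 * math.log(1 + D1*(D1 + Q - D2) / (D2*(D1 + Q)))

Q, D1, K = 1.0, 0.1, 0.5
threshold = D1 + Q*K / (Q + K)
assert R_S(threshold, Q, D1, K) == 0.0           # rate vanishes at the threshold
assert R_S(0.05, Q, D1, K) > R_S(0.08, Q, D1, K) > 0.0
# relaxing the Carrier's distortion budget D2 also lowers the capacity R_C
assert R_C(0.05, Q, D1) > R_C(0.08, Q, D1)
```

This makes Remark 2 concrete: R_C is driven by D2, which in turn is constrained through K by the Carrier's rate budget.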
5.2 Theoretical Background

5.2.1 Channel Coding with Side Information (CCSI)

The capacity of the memoryless channel p(y|x, s, ŝ) with state information (S, Ŝ) i.i.d. ∼ p(s, ŝ), all taking values from a finite alphabet, with Sⁿ available to the sender and Ŝⁿ available to the receiver noncausally (see Figure 5.3), is given in Cover and Chiang (2002) as

  C = max_{p(x,u|s)} [ I(U; Y, Ŝ) − I(U; S) ],    (5.14)

where the maximum is over all joint distributions of the form p(u)p(s, ŝ, x|u)p(y|x, s, ŝ), and U is an auxiliary random variable with finite cardinality.

Figure 5.3: Channel coding with two-sided state information.

Moreover, the general capacity expression for the continuous alphabet case has been stated in Moulin and Wang (2007):

  C = sup_{p(x,u|s̃)} min_{p(y|x,s)} [ I(U; Y, Ŝ) − I(U; S̃) ],    (5.15)

where S̃ is the state information partially available to the encoder. The achievable rate region for the continuous-alphabet Gaussian case has been derived in Chapter 4. The reader can refer to Chapter 3.2 for a more detailed background on CCSI.

5.2.2 Source Coding with Side Information (SCSI)

The details of the two main theorems concerning SCSI are given in Chapter 2.2. While Slepian and Wolf (1973) derived the minimum achievable rate for the lossless compression of a discrete input source, Wyner and Ziv (1976) extended this theory to the lossy case and derived the rate distortion function for the binary symmetric case and the continuous-alphabet Gaussian input case.

Figure 5.4: Rate distortion theory with side information at the decoder: the Wyner-Ziv setup.

5.3 Contribution 1: Capacity Analysis for Multivariate Gaussian Source

In this section, we derive the capacity of the multivariate Gaussian IDH-DSC communication problem shown in Figure 5.5.
An index M ∈ {1, ..., m} will be sent to the receiver in n uses of the channel, where m is the greatest integer smaller than or equal to enRC , and RC is the rate in nats per transmission. Let S = (S1 , S2 , ..., Sn ) be the sequence of noncausal state of the channel for n transmissions perfectly known to the encoder, assumed to be a sequence of independent identically distributed (i.i.d.) N (0, Q) random variables. We consider the case where this sequence of state is partially known to the decoder Ŝ = (Ŝ1 , Ŝ2 , ..., Ŝn ) noncausally and is modeled as Ŝ = S + T where θ is i.i.d. random variable according to N (0, K). We use the squared error metric for the distortion measure of gaussian source. We first evaluate the rate distortion function of the Carrier RS (D2 ), and then find the capacity of the overall system RC . - 108 - 5.3. Contribution 1: Capacity Analysis for Multivariate Gaussian Source Figure 5.5: Multivariate Gaussian Channel of IDH DSC scheme. 5.3.1 Evaluation of the Rate Distortion Function of the Carrier Consider the communication channel for the Carrier point of view in Figure-5.6. The noisy version of the host signal sn , such that Ŝ = S + T , is available to the encoder when the Switch-A is closed, and it is not available to the encoder if the Switch-A is open. We are interested in the case where the Switch-A is open, however to derive the rate distortion function for this case, the case where the switch is closed is employed. We assume that the joint distribution f (S, X, T ) has a multivariate Gaussian distribution ∼ N (0, ΣS,X,T ) where the covariance matrix ΣS,X,T is ΣS,X,T = diag(Q, D1 , K). Definition 5.4 If switch-A is closed, the rate distortion function RW |Ŝ (D2 ) for compressing W given a noisy observation Ŝ available both to the encoder and to the decoder with a fidelity criterion d(W, Ŵ ) ≤ D2 is defined as RW |Ŝ (D2 ) = min p(ŵ|w,s):E{d(w,ŵ)}≤D2 I(W ; Ŵ |Ŝ). 
(5.16)

Definition 5.5 If Switch-A is open, the rate distortion function R*_{W|Ŝ}(D2) for compressing W given a noisy observation Ŝ available only to the decoder, with fidelity criterion d(W, Ŵ) ≤ D2 (see Figure 5.6), is

    R*_{W|Ŝ}(D2) = inf_{p(ŵ|w,s): E{d(w,ŵ)} ≤ D2} [ I(W; E) − I(Ŝ; E) ],    (5.17)

where E is an auxiliary variable.

Figure 5.6: Multivariate Gaussian case: Carrier point of view.

Theorem 5.1 The rate distortion function R_{W|Ŝ}(D2) is

    R_{W|Ŝ}(D2) = { (1/2) ln( D1/D2 + QK/((Q+K)D2) ),   0 < D2 < D1 + QK/(Q+K),
                  { 0,                                   D2 ≥ D1 + QK/(Q+K).      (5.18)

Proof: We first find a lower bound for the rate distortion function, then prove that it is achievable. Since E{d(w, ŵ)} ≤ D2, we observe

    I(W; Ŵ|Ŝ) = h(W|Ŝ) − h(W|Ŝ, Ŵ)
              = h(W, Ŝ) − h(Ŝ) − h(W − Ŵ|Ŝ, Ŵ)
              ≥ h(W, Ŝ) − h(Ŝ) − h(W − Ŵ)                                        (5.19)
              ≥ h(W, Ŝ) − h(Ŝ) − h(N(0, E d(W, Ŵ)))                              (5.20)
              = h(W, Ŝ) − (1/2) ln((2πe)(Q + K)) − (1/2) ln((2πe)D2)
              = (1/2) ln((2πe)²((Q + D1)(Q + K) − Q²)) − (1/2) ln((2πe)²(Q + K)D2)  (5.21)
              = (1/2) ln( (D1(Q + K) + QK) / ((Q + K)D2) )
              = (1/2) ln( D1/D2 + QK/((Q + K)D2) ),                               (5.22)

where h is the differential entropy defined in Chapter-1.2. Note that Equation 5.19 follows from the fact that conditioning reduces entropy, Equation 5.20 follows from the fact that the Gaussian distribution maximizes the entropy for a given variance, and Equation 5.21 follows from the fact that the joint probability p(w, ŝ) is a multivariate Gaussian distribution with mean 0 and covariance matrix²

    Σ_{(w,ŝ)} = [ Q + D1   Q
                  Q        Q + K ].    (5.23)

Hence

    R_{W|Ŝ}(D2) ≥ (1/2) ln( D1/D2 + QK/((Q + K)D2) )   in nats,    (5.24)

or

    R_{W|Ŝ}(D2) ≥ (1/2) log2( D1/D2 + QK/((Q + K)D2) )  in bits.    (5.25)

To find the conditional density f(ŵ|w) that achieves this lower bound, it is more convenient to look at the test channel (the conditional density f(w|ŵ)) and construct f(w|ŵ) to achieve equality in the bound.
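Both the closed form of Equation 5.18 and the equivalent-channel variance identity of Equation 5.27 below are easy to check numerically. The sketch below (illustrative only, with hypothetical parameter values) verifies that the rate vanishes exactly at the threshold D1 + QK/(Q+K) and that the multiplicative test channel reproduces the variance D1 + Q − D2.

```python
import math

def carrier_rate(d1, q, k, d2):
    """Rate distortion function R_{W|S^}(D2) of Equation 5.18, in nats."""
    if d2 >= d1 + q * k / (q + k):
        return 0.0
    return 0.5 * math.log(d1 / d2 + q * k / ((q + k) * d2))

def equivalent_output_variance(d1, q, d2):
    """Variance of W^ = a(W + Z) for the test channel of Figure 5.8:
    W ~ N(0, D1+Q), Z ~ N(0, D2(D1+Q)/(D1+Q-D2)), a = (D1+Q-D2)/(D1+Q).
    Equation 5.27 states this equals D1 + Q - D2."""
    n_tilde = d2 * (d1 + q) / (d1 + q - d2)
    a = (d1 + q - d2) / (d1 + q)
    return a * a * ((d1 + q) + n_tilde)

d1, q, k = 0.062, 1.0, 0.2082             # hypothetical operating point
threshold = d1 + q * k / (q + k)
assert carrier_rate(d1, q, k, threshold) == 0.0      # zero rate at the threshold
assert carrier_rate(d1, q, k, 0.0586) > 0.0          # positive rate below it
assert abs(equivalent_output_variance(d1, q, 0.0586) - (d1 + q - 0.0586)) < 1e-12
```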
We choose the joint distribution as shown in Figure 5.7. If D2 ≤ max{0, min{Q + D1, D1 + QK/(Q+K)}}, we choose

    W = Ŵ + B,   Ŵ ∼ N(0, Q + D1 − D2),   B ∼ N(0, D2).    (5.26)

For greater values of D2: if Q + D1 < D1 + QK/(Q+K), we choose Ŵ = 0 with probability 1, achieving R(D2) = 0; and if Q + D1 ≥ D1 + QK/(Q+K), we choose Ŵ = Ŝ with probability 1, achieving R(D2) = 0. This completes the proof. ∎

Figure 5.7: Gaussian test channel that achieves the lower bound found in Equation 5.21. Input: Ŵ ∼ N(0, Q + D1 − D2), output: W ∼ N(0, Q + D1).

² See Appendix A.3 for the formula of the joint differential entropy of multivariate Gaussian distributed random variables.

Note that the equivalent of this test channel can be constructed with W as the input and Ŵ as the output, using an addition and a multiplication operation (see Figure 5.8). The equivalent channel outputs Ŵ = (W + Z) · a, where Z is i.i.d. N(0, D2(D1+Q)/(D1+Q−D2)) and a is a constant multiplier defined as a = (D1+Q−D2)/(D1+Q). Then Ŵ has a Gaussian distribution with mean 0 and variance

    σ²_Ŵ = a²( (D1 + Q) + D2(D1 + Q)/(D1 + Q − D2) ) = D1 + Q − D2.    (5.27)

We will use this equivalent channel in our capacity calculations for the overall system.

Figure 5.8: Equivalent setup of the test channel in Figure 5.7 using an addition and a multiplication operator.

Theorem 5.2 For the independent multivariate Gaussian case, the rate distortion function R*_{W|Ŝ}(D2) has the value

    R*_{W|Ŝ}(D2) = R_{W|Ŝ}(D2) = { (1/2) ln( D1/D2 + QK/((Q+K)D2) ),   0 < D2 < D1 + QK/(Q+K),
                                 { 0,                                   D2 ≥ D1 + QK/(Q+K).      (5.28)

Proof: We give a proof similar to those of Wyner and Ziv (1976); Oohama (1997).
Let Ŝ and E be conditionally independent given W. Then the term I(W; E) − I(Ŝ; E) in Equation 5.17 satisfies

    I(W; E) − I(Ŝ; E) = h(E|Ŝ) − h(E|W)
                      = h(E|Ŝ) − h(E|W, Ŝ)
                      = I(W; E|Ŝ)       (5.29)
                      ≥ I(W; Ŵ|Ŝ),      (5.30)

where Equation 5.29 follows from the assumption that Ŝ and E are conditionally independent given W, and Equation 5.30 follows from the data processing inequality. The equality in Equation 5.30 holds if and only if

    h(W, Z|Ŵ, Ŝ) = 0.    (5.31)

For the independent Gaussian variables X, T and W in Figure 5.5, the equation h(W, Z|Ŵ, Ŝ) = 0 holds, and there is no rate loss with respect to the case where Switch-A is closed. Hence R*_{W|Ŝ}(D2) = R_{W|Ŝ}(D2), which equals the value given in Equation 5.28. ∎

5.3.2 Capacity of the Channel

In this section, we derive the achievable communication rate between Alice and Bob. With our findings on the rate distortion function of the Carrier in the previous section, the overall system can be sketched as in Figure 5.9 by replacing the Carrier step with its equivalent channel setup given in Figure 5.8. The setup in Figure 5.9 is closely related to Case-B of "Dirty Paper Coding with Partial State Information" (Chapter-4). The two differences between Figure 4.1 on page 86 and Figure 5.9 are: i) the absence of the random variable θ in Figure 5.9, and ii) a multiplication element added to the output of the channel in Figure 5.9, so that it outputs Ŵ = a · (X + S + Z) while the setup in Figure 4.1 outputs Y = X + S + Z. We follow the same methodology as in Chapter-4.3 in order to find the achievable rate region.

Figure 5.9: Equivalent scheme of the Gaussian channel.

Theorem 5.3 The capacity RC of the communication system given in Figure 5.9 is

    RC = (1/2) ln( 1 + D1(D1 + Q − D2) / (D2(D1 + Q)) ).    (5.32)

Proof: Let X, S, Z, and T be i.i.d.
random variables with respective Gaussian distributions N(0, D1), N(0, Q), N(0, D2(D1+Q)/(D1+Q−D2)), and N(0, K). We denote the variance of the r.v. Z by Ñ = D2(D1+Q)/(D1+Q−D2), and the multiplication constant by a = (D1+Q−D2)/(D1+Q). Then the joint distribution f(X, S, Z, T) is multivariate Gaussian ∼ N(0, Σ_{X,S,Z,T}) with covariance matrix Σ_{X,S,Z,T} = diag(D1, Q, Ñ, K). The channel outputs Ŵ = a · (X + S + Z). Setting U = X + αS, where α is a constant to be determined, the joint distribution f(U, Ŵ, Ŝ) is then multivariate Gaussian with mean 0 and covariance matrix Σ_{U,Ŵ,Ŝ} = B Σ_{X,S,Z,T} Bᵗ, where B is the matrix that satisfies

    [U, Ŵ, Ŝ]ᵗ = B · [X, S, Z, T]ᵗ.    (5.33)

The solution for the matrix B,

    B = [ 1  α  0  0
          a  a  a  0
          0  1  0  1 ],    (5.34)

yields the covariance matrix

    Σ_{U,Ŵ,Ŝ} = [ D1 + α²Q      a(D1 + αQ)       αQ
                  a(D1 + αQ)    a²(D1 + Q + Ñ)   aQ
                  αQ            aQ               Q + K ].    (5.35)

Then the relevant mutual informations can be calculated to yield

    I(U; Ŵ, Ŝ) = h(U) + h(Ŵ, Ŝ) − h(U, Ŵ, Ŝ)
               = h(X + αS) + h(a(X + S + Z), S + T) − h(U, Ŵ, Ŝ)                            (5.36)
               = (1/2) ln( (2πe)(D1 + α²Q) )
                 + (1/2) ln( (2πe)² a²((D1 + Q + Ñ)(Q + K) − Q²) )
                 − (1/2) ln( (2πe)³ a²(D1QK(1 − α)² + ÑK(D1 + α²Q) + D1ÑQ) )                (5.37)

and similarly

    I(U; S) = h(U) + h(S) − h(U, S) = (1/2) ln( (D1 + α²Q)/D1 ).    (5.38)

The term I(U; Ŵ, Ŝ) − I(U; S) can then be written as a function of α:

    R(α) = (1/2) ln( D1(K(D1 + Q + Ñ) + Q(D1 + Ñ)) / (D1QK(1 − α)² + ÑK(D1 + α²Q) + D1ÑQ) ).    (5.39)

Equation 5.39 has the same form as Equation 4.8 on page 90. Maximizing Equation 5.39 with respect to α in the same way, the maximum achievable rate is found to be

    R(α⋄) = (1/2) ln( 1 + D1/Ñ ) = (1/2) ln( 1 + D1(D1 + Q − D2) / (D2(D1 + Q)) )    (5.40)

for α⋄ = D1/(D1 + Ñ). Note that the maximum achievable rate does not depend on the correlation noise K.
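The claim just made, that R(α) of Equation 5.39 peaks at α⋄ = D1/(D1 + Ñ) with a value independent of K, can be verified numerically. The sketch below (hypothetical parameter values) sweeps α and compares the maximum against Equation 5.40 for two different values of K.

```python
import math

def rate(alpha, d1, q, k, n_tilde):
    """R(alpha) of Equation 5.39, in nats."""
    num = d1 * (k * (d1 + q + n_tilde) + q * (d1 + n_tilde))
    den = (d1 * q * k * (1 - alpha) ** 2
           + n_tilde * k * (d1 + alpha ** 2 * q)
           + d1 * n_tilde * q)
    return 0.5 * math.log(num / den)

d1, q, d2 = 0.062, 1.0, 0.0586
n_tilde = d2 * (d1 + q) / (d1 + q - d2)
alpha_star = d1 / (d1 + n_tilde)
capacity = 0.5 * math.log(1 + d1 / n_tilde)    # Equation 5.40

for k in (0.1, 0.2082):                        # the maximum must not depend on K
    r_star = rate(alpha_star, d1, q, k, n_tilde)
    assert abs(r_star - capacity) < 1e-9
    # alpha_star beats a coarse sweep of other alpha values
    assert all(rate(a / 100, d1, q, k, n_tilde) <= r_star + 1e-12
               for a in range(0, 100))
```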
Since the achievable rate cannot exceed the capacity of the case where the state information S is perfectly available both to the encoder and to the decoder, which is equal to R(α⋄), the capacity of this channel is

    RC = R(α⋄) = (1/2) ln( 1 + D1(D1 + Q − D2) / (D2(D1 + Q)) ).    (5.41)

This completes the proof. ∎

5.4 Contribution 2: Practical Code Design

In the following two sections, we give practical code designs for a hybrid scheme that utilizes both channel coding and rate distortion with state information at the encoder and the decoder, for the Data Hiding and Distributed Source Coding problem introduced in the previous sections. The first code design is intended to evaluate the theoretical limits calculated for the Gaussian side information case in Chapter-5.3. For the embedding part, the superposition data hiding code explained in Chapter-3.5 is applied; for the source coding part, we use the DSC coding mechanism explained in Chapter-2.4. The second practical design is for the side information with discrete alphabet case.

5.4.1 Practical Code Design for the Multivariate Gaussian Case

The theoretical rate distortion function of the Carrier and the overall capacity limit of the communication system given in Figure 5.5 for the Gaussian case are calculated as Equation 5.28 and Equation 5.32 in Chapter-5.3. In this section, we propose a hybrid scheme for the Gaussian case which utilizes both channel coding and rate distortion with state information at the encoder and decoder respectively. Briefly, Alice has an n-length host vector s where each element of the vector is i.i.d. with probability distribution ∼ N(0, Q). Bob has a noisy version of this host vector, ŝ = s + t, where each element of t is i.i.d. with probability distribution ∼ N(0, K). At the decoding end, Bob shares this noisy version with the Carrier.
Alice embeds an n/2 bit message M within s (which corresponds to an embedding rate of RC = 1/2 bit per channel use) such that the watermarked signal w satisfies the fidelity criterion (1/n) Σᵢ₌₁ⁿ (wᵢ − sᵢ)² ≤ D1. The Carrier then compresses the vector w at RS = 1 bit/channel use and decompresses it at the decoder side as ŵ, using the noisy version ŝ shared by Bob, such that the MSE distortion level satisfies (1/n) Σᵢ₌₁ⁿ (wᵢ − ŵᵢ)² ≤ D2. In the final stage, Bob extracts the hidden message M̂ with the help of ŵ and ŝ. The decoding error probability can be calculated as

    Pe = ( Σᵢ₌₁^{n/2} (Mᵢ ⊕ M̂ᵢ) ) / (n/2),    (5.42)

where Σ is defined as summation over the reals while ⊕ is modulo-2 summation. Up to this point, we have only fixed the embedding rate RC at 1/2 bit per channel use and the compression rate RS at 1 bit per channel use. The details of each block are given below.

5.4.1.1 Data Hiding Coder-Decoder Pair (F1 − G1)

For the F1 − G1 pair of Alice and Bob, we use the superposition embedding described in Chapter-3.5. F1 is composed of an LDPC coder and a TCQ coder. A 1/2 rate LDPC code C1 modulates the hidden message M with a variance of α⋄D1 as described in Chapter-3.5.3, where α⋄ = D1/(D1 + D2(D1+Q)/(D1+Q−D2)) is the constant that maximizes Equation 5.40. Then the quantization code C0 finds the embedding error signal x, which has a variance D1. Finally, F1 outputs the watermarked signal w = s + x. The decoder G1 receives the noisy observation ŵ from the Carrier and accesses the noisy state information ŝ. It then extracts the message M̂ using a joint LDPC-BCJR decoding algorithm as explained in Chapter-3.5.6. According to the performance of the data hiding system explained in Chapter-3.5, for Q = 1 the data can be embedded with an embedding noise variance D1 = 0.062. For an embedding rate of 1/2 bit per channel use, the hidden message can be decoded even after an AWGN noise which is 1.5 dB away from the theoretical AWGN noise level.
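Equation 5.42 is simply the bit error rate of the extracted message; a minimal sketch:

```python
def decoding_error_probability(m, m_hat):
    """P_e of Equation 5.42: the fraction of message bits where the
    extracted message differs from the embedded one (bitwise XOR,
    then a real-valued average)."""
    assert len(m) == len(m_hat)
    return sum(bi ^ bj for bi, bj in zip(m, m_hat)) / len(m)

print(decoding_error_probability([0, 1, 1, 0], [0, 1, 0, 0]))  # → 0.25
```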
5.4.1.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2)

We now explain our code design for the Carrier's F2 − G2 pair. For F2, a 4-level Lloyd-Max quantizer is used to quantize w into a 2 bit per sample vector w_q. The Carrier then codes these 2n quantized bits with a 2/3 rate LDPC code as explained in Chapter-2.4, and only the n bit parity vector z is transmitted to Bob. At the decoder end G2, with the help of the noisy state information ŝ shared by Bob, the Carrier applies an iterative belief propagation decoding process (see Chapter-2.4.3 for details).

5.4.1.3 Theoretical Limits and Performance Analysis of the Proposed System

The theoretical limits of the rate distortion function and the channel capacity are calculated as Equation 5.12 and Equation 5.13. Let us fix the embedding capacity RC at 1/2 bit/channel use, the quantization rate of the Carrier at 2 bits/channel use (so that the compressed rate is RS = 1 bit/channel use), the embedding power D1 at 0.062, and the variance of the host signal Q at 1. The theoretical D2 value needed to achieve this capacity can be found by evaluating Equation 5.13:

    1/2 = (1/2) log2( 1 + 0.062(0.062 + 1 − D2) / (D2(0.062 + 1)) ),    (5.43)

which yields D2 = 0.0586. If we substitute this theoretical D2 value into the rate distortion function of Equation 5.12 to find the corresponding K value that achieves a rate of 1 bit/channel use, we end up with

    1 = (1/2) log2( 0.062/0.0586 + K/((1 + K) · 0.0586) ),    (5.44)

which corresponds to K = 0.2082. In our system the embedding process can be perfectly reconstructed up to an MSE level D2 = 0.0422, which corresponds to a gap of

    10 log10( 0.0586/0.0422 ) = 1.43 dB    (5.45)

from the theoretical setup.

5.4.2 Practical Code Design for the Discrete Case

In this section, we develop a toy example of the combined IDH-DSC setup in the binary symmetric case. A simple embedding process is followed by DSC coding based on LDPC binning. The aim is to achieve a low embedding rate with a fidelity criterion based on the Hamming distance.
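Returning to the Gaussian design of Section 5.4.1.3 above, the quoted operating point can be reproduced by solving Equations 5.43 and 5.44 in closed form (a verification sketch; 0.0422 is the measured MSE from the text, and the exact solution gives a gap of about 1.42-1.43 dB depending on rounding):

```python
import math

d1, q = 0.062, 1.0

# Equation 5.43: 1/2 = 1/2 log2(1 + d1(d1 + q - d2)/(d2(d1 + q)))
# reduces to d1(d1 + q) = d2(2*d1 + q):
d2 = d1 * (d1 + q) / (2 * d1 + q)

# Equation 5.44: 1 = 1/2 log2(d1/d2 + k/((1 + k) d2))
# reduces to k/(1 + k) = 4*d2 - d1:
ratio = 4 * d2 - d1
k = ratio / (1 - ratio)

# Gap between the theoretical d2 and the measured MSE 0.0422:
gap_db = 10 * math.log10(d2 / 0.0422)

print(round(d2, 4), round(k, 4), round(gap_db, 2))
```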
The watermarked signal is then compressed using Slepian-Wolf coding.

5.4.2.1 Data Hiding Coder-Decoder Pair (F1 − G1)

For the informed data hiding of M within S, we use basic quantization based on a memoryless coset construction. The algorithm is described as follows: the 3-bit binary words are partitioned into 4 cosets such that the two elements of each coset are at Hamming distance 3 from each other. According to the two data bits of M, the coset with that index is chosen:

    Coset 00 = {000, 111},   Coset 01 = {001, 110},
    Coset 10 = {010, 101},   Coset 11 = {011, 100}.

After creating the codebook, a 2-bit chunk of M and an R-bit chunk of S are taken, and the least significant 3 bits of the sub-block of the host signal S are selected for embedding. The 3-bit value of S is quantized to W: W(S, M) = arg min_{Z ∈ Coset M} ||Z − S||, so that W differs from S in at most one bit. The distance metric is the Hamming distance. This insertion of 2 bits per block of length R continues until all the data is embedded. As an example, assume the 2-bit message 01 is embedded into the least significant 3 bits of S, which are 010. The element of Coset 01 with minimum Hamming distance to 010 is chosen as the quantization output, W = 110 in this case. At the decoder side, the extraction of the watermark is straightforward: knowing the codebook and the insertion frequency R, the coset index in which the received block resides is decoded as the embedded data.

5.4.2.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2)

For the F2 − G2 pair, we use syndrome coding with LDPC codes. The Carrier codes the watermarked signal bits W using a 2/3 rate LDPC code as explained in Chapter-2.4, and only the parity vector z is transmitted to Bob.
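The coset embedding and extraction of Section 5.4.2.1 above reduce to a few lines of code. The sketch below reproduces the worked example (message bits 01 embedded into host bits 010 give W = 110, from which the decoder recovers 01):

```python
# Coset codebook from the text: each coset holds two 3-bit words at
# Hamming distance 3, so every 3-bit word is within distance 1 of
# exactly one member of each coset.
COSETS = {
    (0, 0): [(0, 0, 0), (1, 1, 1)],
    (0, 1): [(0, 0, 1), (1, 1, 0)],
    (1, 0): [(0, 1, 0), (1, 0, 1)],
    (1, 1): [(0, 1, 1), (1, 0, 0)],
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def embed(message_bits, host_bits):
    """Quantize the 3 host bits to the nearest member of the coset
    indexed by the 2 message bits (changes at most one host bit)."""
    return min(COSETS[message_bits], key=lambda z: hamming(z, host_bits))

def extract(watermarked_bits):
    """Decode the coset index in which the received 3-bit block resides."""
    for index, members in COSETS.items():
        if tuple(watermarked_bits) in members:
            return index

w = embed((0, 1), (0, 1, 0))
assert w == (1, 1, 0)                # the worked example from the text
assert extract(w) == (0, 1)
assert hamming(w, (0, 1, 0)) <= 1    # fidelity: at most one bit changed
```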
At the decoder end G2, with the help of the noisy state information ŝ shared by Bob, the Carrier applies an iterative belief propagation decoding process (see Chapter-2.4.3 for details).

5.4.2.3 Experimental Setup

In our experiments, we fix R = 20 and embed a 50-bit message M into a 4000-bit signal S distributed Bernoulli(1/2). Then, using the 2/3 LDPC binning scheme explained in Chapter-2.4, W is compressed to a 2000-bit signal and transmitted to the decoder. The decoder performs a modified belief propagation decoding using the parity bits of W and the side information Ŝ = S ⊕ T, where T is a binary string with Bernoulli(p1) distribution. The performance of the system for a block length of 4000 is compared with the performance of the DSC system without any embedding explained in Chapter-2.4.4. In Figure 5.10, the decoding bit error rate of the LDPC decoder is plotted versus the entropy of the correlation noise, H(p1). The dashed curve corresponds to the case where there is no embedding into S, while the other corresponds to the compression performance of W after embedding at a rate of 1/20 bit per sample. The embedding process has a performance loss of 0.02 bit per sample compared with the no-embedding case, which is acceptable.

5.5 Conclusion

In this chapter, both theoretical and practical analyses of the IDH and DSC system are carried out. In the theoretical part, strong information theoretic results are obtained, such as the derivation of the rate distortion function for the non-trusted Carrier and the capacity formula of the overall embedding system. We also draw interesting remarks from these theoretical findings, for instance that the absence of Bob's noisy state information at the Carrier encoding stage does not change the rate distortion curve. Similarly, the absence of the original host signal at Bob's side does not change the capacity of
the embedding system.

Figure 5.10: Embedding performance for 1/200 bit per sample with a 2:1 compression of the watermarked string using a 2/3 rate LDPC code with block length 4000. There is a minimum 0.02 bit per sample entropy rate loss with respect to the no-embedding case.

Moreover, practical code designs for the Gaussian case and the BSC case are proposed with the help of our proposed DSC method in Chapter-2 and IDH method in Chapter-3.

Conclusion

Strongly motivated by the duality between source coding and channel coding with state information, we set out to propose a system that combines data hiding and efficient compression functionalities, to study the theoretical limits of the proposed system, and to evaluate the proposed practical designs with respect to those limits. This subject intersects a wide range of signal processing fundamentals, such as error correcting codes, vector quantization, likelihood marginalization and iterative decoding, while the analysis of the system limits is strongly related to information theory. The contributions of this dissertation can be grouped into theoretical findings and practical code designs.

Information Theoretical Contributions

In this dissertation, the theoretical rate distortion function and embedding capacity bounds are derived for the infinite alphabet Gaussian case. Our theoretical contributions can be itemized as follows:

1. The maximum achievable rate of the communication system in Figure 4.1 on page 86, where the channel state information S is partially available to the encoder as S1 and partially available to the decoder as S2, is derived in Chapter-4. This general setup is reduced to simpler cases and each case is analyzed in detail.

2.
The capacity of the communication system in Figure 5.9 on page 113 is evaluated, where the state information S is perfectly available to the encoder and partially available to the decoder as Ŝ, and the channel outputs the compressed signal Ŵ.

3. The rate distortion function of the communication system in Figure 5.6 on page 110 is derived, where the compression of S + X is performed while a noisy version S + T is accessible to the decoder.

Table 5.2 briefly lists the existing theoretical studies in the field of channel coding with side information and compares them with our theoretical contributions in this area. Each problem is defined by the channel state Sa, its availability to the encoder and to the decoder, and the type of the state, which can be drawn from discrete or continuous alphabet sets. Rows four and five correspond to our information theoretical contributions no. 1 and no. 2 respectively.

Table 5.2: Channel Coding with State Information Problems

Problem                                  | Encoder state | Channel state | Decoder state | Type of source
Gel'fand and Pinsker (1980)              | S             | S             | -             | Discrete
Costa (1983)                             | S             | S             | -             | Gaussian
Cover and Chiang (2002)                  | S1            | S             | S2            | Discrete
Dikici et al., Chapter-4 (General Case)  | S1            | S             | S2            | Gaussian
Dikici et al., Chapter-4 (Case-B)        | S             | S             | S2            | Gaussian
Public Watermarking                      | S             | S             | -             | Discrete or Gaussian

Similarly, Table 5.3 positions our contribution no. 3 with respect to the source coding with side information problems. The encoder input, the decoder's side information, and the type of sources investigated in each problem are given.

Table 5.3: Source Coding with State Information Problems

Problem                    | Encoder access | Decoder access | Type of source
Slepian and Wolf (1973)    | S              | S + T          | Lossless rate, discrete case
Wyner and Ziv (1976)       | S              | S + T          | R(D) function, BSC and Gaussian
Dikici et al. (Chapter-5)  | S + X          | S + T          | R(D) function, Gaussian

Proposed Practical Code Designs

Our proposed practical code designs can be grouped into two categories: distributed source coding and data hiding.
In DSC, we proposed a Slepian-Wolf coder based on LDPC binning which has a performance gap of 0.08 bits per channel use with respect to the maximum correlation noise variance for 2:1 rate compression. Moreover, this coding method is applied to an image compression system where the low-pass DWT coefficients are assumed to be known to the decoder as side information. In data hiding, we proposed a low embedding rate robust image watermarking scheme based on the system of Miller et al. (2004), using the DWT coefficients of the image and a perceptual shaping for the embedding process. Furthermore, a high embedding rate system is proposed by concatenating a good source code based on TCQ and a good channel code based on LDPC. The system operates at an AWGN variance 1.5 dB away from the theoretical limits for an embedding rate of 1/2 bit per channel use. Finally, the combination of our Slepian-Wolf coding scheme and the superposition data hiding scheme is used to evaluate our theoretical findings in Chapter-5.

Perspectives

The perspectives can be grouped into an application point of view and a theoretical point of view. On the application side, several improvements of the proposed schemes are possible. For instance, the Lloyd-Max quantizer of the Slepian-Wolf coder in Chapter-5.4.1.2 could be replaced by a more effective quantization code. Moreover, in the high embedding rate practical design of Chapter-3.5.2.2, the 2-level PAM coding of the channel code C1 could be done more efficiently by also considering the side information available to the encoder. Furthermore, a practical code design for the general case given in Chapter-4 could be proposed using a modified version of the informed high embedding rate code in Chapter-3.5. One practical application based on the proposed schemes in this dissertation could be the transmission of a high resolution image or video given that a coarse version is publicly and freely available.
The second stream enhances the coarse version if the receiver has purchased the key embedded in the second stream. Another application could be the embedding of meta-data into images for indexing purposes. Finally, from the theoretical point of view, the two main directions that we will continue to study are:

• Our information theoretical contributions can be extended to the case where the state information is not i.i.d. but drawn from a Gauss-Markov source. By doing so, more realistic theoretical limits for image and video signals can be found.

• The theoretical setup of communicating with a non-trusted Carrier in Chapter-5 can be extended to the encrypted domain, such that Alice transmits her signal to Bob using encryption and the Carrier tries to compress the encrypted-domain signal under a fidelity criterion.

Appendix A

Achievable Rate Region Calculations for Two Partial Side Informations Known to the Encoder and Decoder Respectively

A.1 Derivation of the Achievable Rate Region

Recalling that Y = X + S + Z and S2 = S + T, the joint distribution of (U, Y, S2) is a multivariate Gaussian distribution f(U, Y, S2) ∼ N(0, BΣBᵗ), where

    [U, Y, S2]ᵗ = B · [X, S, Z, θ, T]ᵗ    (A.1)

and where

    B = [ I  αI  0  αI  0
          I  I   I  0   0
          0  I   0  0   I ].    (A.2)

Then,

    B · Σ · Bᵗ = [ (P + α²(Q + L))I   (P + αQ)I      αQI
                   (P + αQ)I          (P + Q + N)I   QI
                   αQI                QI             (Q + K)I ].    (A.3)

Hence, the joint entropy¹ of the random variables (U, Y, S2) is

    h(U, Y, S2) = (1/2) ln( (2πe)³ |BΣBᵗ| ).
(A.4)

The relevant mutual informations can be calculated to yield

    I(U; Y, S2) = h(U) + h(Y, S2) − h(U, Y, S2)
                = h(X + αS + αθ) + h(X + S + Z, S + T) − h(U, Y, S2)
                = (1/2) ln( (2πe)(P + α²(Q + L)) )
                  + (1/2) ln( (2πe)²((P + Q + N)(Q + K) − Q²) )
                  − (1/2) ln( (2πe)³(PQK(1 − α)² + NK(P + α²(Q + L)) + α²L(PQ + PK + QK + NQ) + PNQ) )    (A.5)

and similarly

    I(U; S1) = h(U) + h(S + θ) − h(U, S + θ) = (1/2) ln( (P + α²(Q + L))/P ).    (A.6)

A.2 Maximization of the Rate

The rate function in Equation 4.5 has the form

    R(α) = (1/2) ln( D / (Aα² + Bα + C) ),    (A.7)

where A, B, C and D are constants depending on the values P, Q, K and L. The denominator of the ln term is a quadratic polynomial and is minimized at α = −B/2A. Then the maximization of R(α) with respect to α has the form

    R(−B/2A) = (1/2) ln( 4AD / (4AC − B²) ).    (A.8)

Since the term D can be expressed as C + [...], the rate can be written as

    R(−B/2A) = (1/2) ln( (4A(C + [...]) − B² + B²) / (4AC − B²) ) = (1/2) ln( 1 + (4A[...] + B²) / (4AC − B²) ),    (A.9)

and it is then straightforward to obtain the rate by replacing A, B, C and [...] by their values.

A.3 Entropy of a Multivariate Gaussian Distribution

It is well known that for X with multivariate Gaussian distribution X ∼ N(µ, Σ), with mean µ and covariance matrix Σ,

    f_X(x1, x2, ..., xn) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp( −(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ) ),    (A.10)

where |Σ| is the determinant of the covariance matrix. The joint entropy of f is

    h(f) = −∫ ... ∫ f(x) ln(f(x)) dx
         = (1/2)( n + n ln(2π) + ln|Σ| )
         = (1/2) ln( (2πe)ⁿ |Σ| ).    (A.11)

Moreover, if Y is a linear transformation of X such that Y = BX, then Y also has a multivariate Gaussian distribution, Y ∼ N(Bµ, BΣBᵗ).

¹ See Appendix A.3.
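Equation A.11 can be sanity-checked against the independent case, where the joint entropy must equal the sum of the univariate Gaussian entropies (1/2) ln(2πe σᵢ²). A sketch for a diagonal covariance matrix, with illustrative variances:

```python
import math

def gaussian_entropy_diag(variances):
    """Joint differential entropy (nats) of independent Gaussians,
    via Equation A.11 with a diagonal covariance matrix."""
    n = len(variances)
    log_det = sum(math.log(v) for v in variances)  # ln|Sigma| for diagonal Sigma
    return 0.5 * (n * math.log(2 * math.pi * math.e) + log_det)

variances = [0.062, 1.0, 0.2082]    # e.g. diag(D1, Q, K)
joint = gaussian_entropy_diag(variances)
marginals = sum(0.5 * math.log(2 * math.pi * math.e * v) for v in variances)
assert abs(joint - marginals) < 1e-12
```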
Appendix B

Codes and Degree Distributions for Generating LDPC Matrices

Below are the LDPC degree distributions used for Distributed Source Coding in Chapter-2 and Informed Data Hiding in Chapter-3.

B.1 Degree Distributions of the rate 2/3 code, for 2:1 compression in DSC

• Regular code: λ(x) and ρ(x) are given as

    λ(x) = x²,    (B.1)
and
    ρ(x) = x⁵.    (B.2)

• Irregular code: λ(x) and ρ(x) are given as

    λ(x) = 0.41584493083218x + 0.32456702571975x² + 0.17761981591744x⁶
         + 0.0025725519244473x⁸ + 0.0046654731946759x¹⁸ + 0.039272974694212x²⁰
         + 0.015612811744969x²¹ + 0.0017256946022807x²⁶ + 0.01811872137005x⁹⁹,    (B.3)
and
    ρ(x) = 0.80851063829787x¹⁷ + 0.19148936170213x¹⁸.    (B.4)

B.2 Degree Distribution of the rate 1/2 code, for Informed Data Hiding

λ(x) and ρ(x) are given as

    λ(x) = 0.4811081282955x + 0.31433341715558x² + 0.15356804095148x⁶ + 0.050990413597444x¹⁹,    (B.5)
and
    ρ(x) = x⁷.    (B.6)

Appendix C

Publications of the author

Publications Related to the Thesis

In Preparation

• «Dirty Paper Coding with Partial State Information».
• «Joint Data Hiding and Wyner-Ziv Coding, Theory and Practice».

International Conferences and Workshops

• Dikici, C., Idrissi, K. and Baskurt, A. «Dirty-paper writing based on LDPC codes for Data Hiding». International Workshop on Multimedia Content Representation, Classification and Security (MRCS), pages 114–120, LNCS, September 2006.
• Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source Coding of Still Images». European Signal Processing Conference (EUSIPCO), September 2006.
• Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source Coding with Partially Available Side Information». SPIE Security, Steganography, and Watermarking of Multimedia Contents VIII, volume 6072, 60721E, February 2006.
• Dikici, C., Guermazi, R., Idrissi, K. and Baskurt, A. «Distributed Source Coding of Still Images». European Signal Processing Conference (EUSIPCO), September 2005.

National Conferences

• Dikici, C., Idrissi, K. and Baskurt, A. «Tatouage informé pour le codage distribué» [Informed watermarking for distributed coding]. CORESA, September 2006.

National Plenary

• Dikici, C. «Codage et tatouage avec information adjacente» [Coding and watermarking with side information]. GDR ISIS, Thème D: Télécommunications: Journée Plénière, Paris, December 2006.

Other Publications

• Dikici, C. and Bozma, I. «Video Coding Based on Pre-attentive Processing». SPIE Real-Time Imaging, volume 5671, pages 212–220, January 2005.
• Dikici, C., Civanlar, R. and Bozma, I. «Fovea based Coding for Video Streaming». International Conference on Image Analysis and Recognition (ICIAR), LNCS, volume 3211, pages 285–294, Porto, September 2004.
• Dikici, C., Alp, U., Ayaz, H., Karadeniz, M., Civanlar, R. and Bozma, I. «Fovea based Real-Time Video Processing and Streaming». Proc. of Signal Processing and Applications Conference (SIU), Istanbul, 2003 [in Turkish].
• Alp, U., Ayaz, H., Karadeniz, M., Dikici, C. and Bozma, I. «Remote Control of a Robot over the Internet». Proc. of Signal Processing and Applications Conference (SIU), Istanbul, 2003 [in Turkish].
• Sarac, I., Dikici, C. and Sankur, B. «New framing protocol for IP over SONET/SDH». Proc. of 1st Communication Conference, Ankara, 2001 [in Turkish].

Bibliography

Aaron, A. M. and Girod, B. «Compression with Side Information Using Turbo Codes». In DCC '02: Proceedings of the Data Compression Conference (DCC '02), page 252. IEEE Computer Society, Washington, DC, USA. 2002.

Aaron, A. M., Setton, E. and Girod, B. «Towards practical Wyner-Ziv coding of video». In Proceedings of the IEEE International Conference on Image Processing (ICIP), volume 2, pages 869–872. 2003.

Acikel, O. F. and Ryan, W. E. «Punctured turbo-codes for BPSK/QPSK channels». IEEE Trans. Commun., 47(9):1315–1323. 1997.

Amraoui, A., Chung, S. Y. and Urbanke, R. L. «LTHC: Ldpcopt». http://lthcwww.epfl.ch/research/ldpcopt/. Access date: Oct 2007. 2003.

Bahl, L., Cocke, J., Jelinek, F. and Raviv, J. «Optimal decoding of linear codes for minimizing symbol error rate (Corresp.)». IEEE Trans.
Inform. Theory, 20(2):284–287. 1974.
Bajcsy, J. and Mitran, P. «Coding for the Slepian-Wolf problem with turbo codes». In GlobeCom’01, San Antonio. 2001a.
Bajcsy, J. and Mitran, P. «Design of fractional rate FSM encoders using Latin squares». In IEEE Int. Symp. Inform. Theory - Recent Results Session, Washington. 2001b.
Bastug, A. and Sankur, B. «Improving the payload of watermarking channels via LDPC coding». IEEE Signal Processing Lett., 11(2):90–92. 2004.
Benedetto, S., Divsalar, D., Montorsi, G. and Pollara, F. «Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding». IEEE Trans. Inform. Theory, 44(5):909–926. 1998.
Bennatan, A., Burshtein, D., Caire, G. and Shamai, S. «Superposition coding for side-information channels». IEEE Trans. Inform. Theory, 52(5):1872–1889. 2006.
Berger, T. Rate-Distortion Theory: A Mathematical Basis for Data Compression. Prentice-Hall. 1971.
Berrou, C. and Glavieux, A. «Near optimum error correcting coding and decoding: turbo-codes». IEEE Trans. Commun., 44(6):1261–1271. 1996.
Berrou, C., Glavieux, A. and Thitimajshima, P. «Near Shannon limit error-correcting coding and decoding: Turbo-Codes». In IEEE International Conference on Communications, Geneva. 1993.
Boutros, J. and Caire, G. «Iterative multiuser joint decoding: Unified framework and asymptotic analysis». IEEE Trans. Inform. Theory, 48(7):1772–1793. 2002.
Chen, B. and Wornell, G. W. «Digital watermarking and information embedding using dither modulation». In IEEE Second Workshop on Multimedia Signal Processing, pages 273–278. 1998.
Chen, B. and Wornell, G. W. «Provably robust digital watermarking». In SPIE: Multimedia Systems and Applications II (part of Photonics East 99), Boston, volume 3845, pages 43–54. 1999.
Chen, B. and Wornell, G. W. «Quantization index modulation: A class of provably good methods for digital watermarking and information embedding». IEEE Trans. Inform. Theory, 47(5):1423–1443. 2001.
Chou, J., Pradhan, S. S. and Ramchandran, K. «A robust blind watermarking scheme based on distributed source coding principles». In ACM Multimedia, pages 49–56. 2000.
Chou, J., Pradhan, S. S. and Ramchandran, K. «Turbo and trellis-based constructions for source coding with side information». In IEEE Data Compression Conf. (DCC), Snowbird, UT. 2003.
Chung, S. Y. On the Construction of Some Capacity-Approaching Coding Schemes. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA. 2000.
Chung, S. Y., Forney, G. D. J., Richardson, T. J. and Urbanke, R. L. «On the design of Low-Density Parity-Check codes within 0.0045 dB of the Shannon limit». IEEE Commun. Lett., 5(2):58–60. 2001a.
Chung, S. Y., Richardson, T. J. and Urbanke, R. L. «Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation». IEEE Trans. Inform. Theory, 47(2):657–670. 2001b.
Cohen, A. S. and Lapidoth, A. «The Gaussian watermarking game». IEEE Trans. Inform. Theory, 48(6):1639–1667. 2002.
Costa, M. «Writing on dirty paper (Corresp.)». IEEE Trans. Inform. Theory, 29(3):439–441. 1983.
Cover, T. M. and Chiang, M. «Duality between channel capacity and rate distortion with two-sided state information». IEEE Trans. Inform. Theory, 48(6):1629–1638. 2002.
Cover, T. M. and Thomas, J. A. Elements of Information Theory. Wiley-Interscience, New York, NY, USA. 1991.
Cox, I. J. and Miller, M. L. «The First 50 Years of Electronic Watermarking». EURASIP Journal on Applied Signal Processing, 2002(2):126–132. doi:10.1155/S1110865702000525. 2002.
Cox, I. J., Miller, M. L. and McKellips, A. L. «Watermarking as communications with side information». Proceedings of the IEEE (USA), 87(7):1127–1141. 1999.
Craver, S., Memon, N., Yeo, B. L. and Yeung, M. M. «Resolving Rightful Ownerships with Invisible Watermarking Techniques: Limitations, Attacks, and Implications». IEEE Journal on Selected Areas in Communications, 16(4):573–586. 1998.
Dikici, C., Guermazi, R., Idrissi, K. and Baskurt, A.
«Distributed Source Coding of Still Images». In Proc. of European Signal Processing Conf. EUSIPCO, Antalya. 2005.
Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source Coding of Still Images». In Proc. of European Signal Processing Conf. EUSIPCO, Florence. 2006a.
Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source Coding with Partially Available Side Information». In Proc. of SPIE Electronic Imaging, volume 6072. 2006b.
Dikici, C., Idrissi, K. and Baskurt, A. «Tatouage Informé pour le Codage Distribué». In Proc. of CORESA. 2006c.
Eggers, J. J., Bauml, R., Tzschoppe, R. and Girod, B. «Scalar Costa scheme for information embedding». IEEE Trans. Signal Processing, 51(4):1003–1019. 2003.
Erez, U., Shamai, S. and Zamir, R. «Capacity and lattice strategies for canceling known interference». IEEE Trans. Inform. Theory, 51(11):3820–3833. 2005.
Forney, G. D. J. and Ungerboeck, G. «Modulation and coding for linear Gaussian channels». IEEE Trans. Inform. Theory, 44(6):2384–2415. 1998.
Gallager, R. G. Low-Density Parity-Check Codes. MIT Press, Cambridge, MA. 1963.
Garcia-Frias, J. and Zhao, Y. «Compression of correlated binary sources using turbo codes». IEEE Commun. Lett., 5(10):417–419. 2001.
Garcia-Frias, J. and Zhao, Y. «Compression of binary memoryless sources using punctured turbo codes». IEEE Commun. Lett., 6(9):394–396. 2002.
Gehrig, N. and Dragotti, P. L. «Distributed Compression in Camera Sensor Network». In IEEE International Workshop on Multimedia Signal Processing, Siena, Italy. 2004.
Gel’fand, S. I. and Pinsker, M. S. «Coding for Channel with Random Parameters». Prob. Contr. Inform. Theory, 9(1):19–31. 1980.
Girod, B., Aaron, A. M., Rane, S. and Rebollo-Monedero, D. «Distributed video coding». In Special Issue on Video Coding and Delivery, Proceedings of the IEEE, volume 93, pages 71–83. 2005.
Hartung, F. and Kutter, M. «Multimedia watermarking techniques». Proc. IEEE, 87(7):1079–1107. 1999.
Hartung, F., Su, J. K. and Girod, B. «Spread Spectrum Watermarking: Malicious Attacks and Counterattacks». In SPIE Electronic Imaging, Security and Watermarking of Multimedia Contents, pages 147–158. 1999.
Heegard, C. and Gamal, A. E. «On the capacity of computer memory with defects». IEEE Trans. Inform. Theory, 29(5):731–739. 1983.
Holliman, M. and Memon, N. «Counterfeiting attacks on oblivious block-wise independent invisible watermarking schemes». IEEE Trans. Image Processing, 9(3):432–441. 2000.
Kerckhoffs, A. «La cryptographie militaire». Journal des sciences militaires, 9(1):5–38. 1883.
Kuhn, M. and Petitcolas, F. A. P. «Stirmark». http://www.petitcolas.net/fabien/watermarking/stirmark/. Access Date: Oct 2007. 2000.
Kusuma, J., Doherty, L. and Ramchandran, K. «Distributed compression for sensor networks». In IEEE Intl. Conf. on Image Processing (ICIP), Thessaloniki, Greece, volume 1, pages 82–85. 2001.
Lajnef, K. Etude du codage de sources distribuées pour de nouveaux concepts en compression vidéo. Ph.D. thesis in Signal Processing, Université de Rennes 1. 2006.
Lajnef, K., Guillemot, C. and Siohan, P. «Distributed coding of three binary and Gaussian correlated sources using punctured turbo codes». Signal Processing, 86(11):3131–3149. ISSN 0165-1684. 2006.
Le Gall, D. and Tabatabai, A. «Sub-band coding of digital images using symmetric short kernel filters and arithmetic coding techniques». In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages 761–764. 1988.
Le Guelvouit, G. «Trellis-coded quantization for public-key watermarking». In IEEE Int. Conf. on Acoustics, Speech and Signal Processing. 2005.
Levický, D. and Foriš, P. «Human Visual System Models in Digital Image Watermarking». In Radioengineering, volume 13, pages 38–43. 2004.
Liu, Z., Cheng, S., Liveris, A. D. and Xiong, Z.
«Slepian-Wolf Coded Nested Lattice Quantization for Wyner-Ziv Coding: High-Rate Performance Analysis and Code Design». IEEE Trans. Inform. Theory, 52(10):4358–4379. 2006.
Liveris, A. D., Lan, C. F., Narayanan, K., Xiong, Z. and Georghiades, C. N. «Slepian-Wolf coding of three binary sources using LDPC codes». In International Symposium on Turbo Codes and Related Topics. 2003a.
Liveris, A. D., Xiong, Z. and Georghiades, C. N. «Compression of binary sources with side information at the decoder using LDPC codes». IEEE Commun. Lett., 6(10):440–442. 2002a.
Liveris, A. D., Xiong, Z. and Georghiades, C. N. «A Distributed Source Coding Technique For Highly Correlated Images Using Turbo-Codes». In IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), Orlando. 2002b.
Liveris, A. D., Xiong, Z. and Georghiades, C. N. «Distributed compression of binary sources using conventional parallel and serial concatenated convolutional codes». In IEEE Data Compression Conf. (DCC), Snowbird, UT. 2003b.
MacKay, D. J. C. Information Theory, Inference, and Learning Algorithms. Cambridge University Press. 2003.
MacKay, D. J. C. and Neal, R. M. «Near Shannon limit performance of low density parity check codes». Electronics Letters, 33(6):457–458. 1997.
Maor, A. and Merhav, N. «On Joint Information Embedding and Lossy Compression». IEEE Trans. Inform. Theory, 51(8):2998–3008. 2005.
Marcellin, M. W. and Fischer, T. R. «Trellis Coded Quantization of Memoryless and Gauss-Markov Sources». IEEE Trans. Commun., 38(1):82–93. 1990.
Marcellin, M. W., Gormish, M. J., Bilgin, A. and Boliek, M. P. «An Overview of JPEG-2000». In Data Compression Conference, pages 523–544. 2000.
Merhav, N. and Shamai, S. «On joint source-channel coding for the Wyner-Ziv source and the Gel’fand-Pinsker channel». IEEE Trans. Inform. Theory, 49(11):2844–2855. 2003.
Miller, M. L., Doërr, G. J. and Cox, I. J. «Applying informed coding and embedding to design a robust high-capacity watermark». IEEE Trans.
Image Processing, 13(6):792–807. 2004.
Moulin, P. and Mihcak, M. K. «The parallel-Gaussian watermarking game». IEEE Trans. Inform. Theory, 50(2):272–289. 2004.
Moulin, P. and O’Sullivan, J. A. «Information-Theoretic Analysis of Information Hiding». IEEE Trans. Inform. Theory, 49(3):563–593. 2003.
Moulin, P. and Wang, Y. «Capacity and Random-Coding Exponents for Channel Coding With Side Information». IEEE Trans. Inform. Theory, 53(4):1326–1347. 2007.
Oohama, Y. «Gaussian Multiterminal Source Coding». IEEE Trans. Inform. Theory, 43(6):1912–1923. 1997.
Ozonat, K. «Lossless distributed source coding for highly correlated still images». 2000.
Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. 1988.
Pereira, S. «Checkmark». http://watermarking.unige.ch/Checkmark. Access Date: Oct 2007. 2001.
Petitcolas, F. A. P. «Watermarking schemes evaluation». IEEE Signal Processing Mag., 17(5):58–64. 2000.
Pradhan, S. S., Chou, J. and Ramchandran, K. «Duality between source coding and channel coding and its extension to the side information case». IEEE Trans. Inform. Theory, 49(5):1181–1203. 2003.
Pradhan, S. S., Kusuma, J. and Ramchandran, K. «Distributed compression in a dense micro-sensor network». IEEE Signal Processing Mag., 19(3):51–60. 2002.
Pradhan, S. S. and Ramchandran, K. «Distributed Source Coding Using Syndromes (DISCUS): Design and Construction». In DCC ’99: Proceedings of the Conference on Data Compression, page 158. IEEE Computer Society, Washington, DC, USA. 1999.
Pradhan, S. S. and Ramchandran, K. «Distributed source coding: Symmetric rates and applications to sensor networks». In IEEE Data Compression Conf. (DCC), Snowbird, UT. 2000.
Puri, R., Majumdar, A., Ishwar, P. and Ramchandran, K. «Distributed video coding in wireless sensor networks». IEEE Signal Processing Mag., 23(4):94–106. 2006.
Puri, R. and Ramchandran, K.
«PRISM: A new robust video architecture based on distributed compression principles». In Allerton Conf. Communication, Control, and Computing, Allerton, IL. 2002.
Rebollo-Monedero, D. and Girod, B. «Design of optimal quantizers for distributed coding of noisy sources». In IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), Philadelphia. 2005.
Richardson, T. J., Shokrollahi, M. A. and Urbanke, R. L. «Design of capacity-approaching irregular low-density parity-check codes». IEEE Trans. Inform. Theory, 47(2):619–637. 2001.
Richardson, T. J. and Urbanke, R. L. «The capacity of Low-Density Parity-Check codes under message-passing decoding». IEEE Trans. Inform. Theory, 47(2):599–618. 2001a.
Richardson, T. J. and Urbanke, R. L. «Efficient encoding of low-density parity-check codes». IEEE Trans. Inform. Theory, 47(2):638–656. 2001b.
Salehi, M. «Capacity and Coding for Memories with Real-Time Noisy Defect Information at Encoder and Decoder». In IEE Proceedings I (Communications, Speech and Vision), volume 139, pages 113–117. 1992.
Schonberg, D., Pradhan, S. S. and Ramchandran, K. «LDPC Codes Can Approach the Slepian-Wolf Bound for General Binary Sources». In 40th Allerton Conf. Communication, Control, and Computing, Allerton, IL, pages 576–585. 2002.
Shannon, C. E. «Channels with side information at the transmitter». In IBM J. of Research and Development, volume 2, pages 289–293. 1958.
Shannon, C. E. «Coding theorems for a discrete source with a fidelity criterion». In IRE Nat. Conv. Rec., Pt. 4, pages 142–163. 1959.
Slepian, D. and Wolf, J. «Noiseless coding of correlated information sources». IEEE Trans. Inform. Theory, 19(4):471–480. 1973.
Stankovic, V., Yang, Y. and Xiong, Z. «Distributed Source Coding for Multimedia Multicast Over Heterogeneous Networks». IEEE Journal of Selected Topics in Signal Processing, 1(2):220–230. 2007.
Stone, H. S. «Analysis of attacks on image watermarks with randomized coefficients». 1996.
Su, J. K., Eggers, J. J. and Girod, B.
«Illustration of the duality between channel coding and rate distortion with side information». In 34th Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, USA, Oct. 29-Nov. 1. 2000.
Tepe, K. E. and Anderson, J. B. «Turbo codes for binary symmetric and binary erasure channels». In IEEE International Symposium on Information Theory, page 59. 1998.
Ungerboeck, G. «Channel Coding with Multilevel/Phase Signals». IEEE Trans. Inform. Theory, 28(1):55–67. 1982.
Varodayan, D., Aaron, A. M. and Girod, B. «Rate-adaptive distributed source coding using low-density parity-check codes». In 39th Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, USA. 2005.
Varodayan, D., Aaron, A. M. and Girod, B. «Rate-adaptive codes for distributed source coding». Signal Processing, 86(11):3123–3130. 2006.
Viterbi, A. «Error bounds for convolutional codes and an asymptotically optimum decoding algorithm». IEEE Trans. Inform. Theory, 13(2):260–269. 1967.
Voloshynovskiy, S., Koval, O., Pérez-González, F., Mihcak, M. K. and Pun, T. «Data-hiding with host state at the encoder and partial side information at the decoder». URL http://vision.unige.ch/publications/postscript/2005/VoloshynovskiyKovalPerezGonzalezMihcakPun_SP2005.pdf (preprint). 2004.
Watson, A. B. «DCT quantization matrices visually optimized for individual images». In SPIE Human Vision, Visual Processing, and Digital Display IV, volume 1913, pages 202–216. 1993.
Watson, A. B., Yang, G. Y., Solomon, J. A. and Villasenor, J. «Visibility of wavelet quantization noise». IEEE Trans. Image Processing, 6(8):1164–1175. 1997.
Westerlaken, R. P., Klein Gunnewiek, R. and Lagendijk, R. L. «Turbo-Code Based Wyner-Ziv Video Compression». In Twenty-sixth Symposium on Information Theory in the Benelux, pages 113–120. 2005.
Wyner, A. «Recent results in the Shannon theory». IEEE Trans. Inform. Theory, 20(1):2–10. 1974.
Wyner, A. and Ziv, J.
«The rate-distortion function for source coding with side information at the decoder». IEEE Trans. Inform. Theory, 22(1):1–10. 1976.
Xiong, Z., Liveris, A. D. and Cheng, S. «Distributed source coding for sensor networks». IEEE Signal Processing Mag., 21(5):80–94. 2004.
Yang, E. H. and Sun, W. «Combined Source Coding and Watermarking». In Information Theory Workshop, Proceedings of the IEEE, pages 322–326. 2006.
Zaidi, A. and Duhamel, P. «On coding with a partial knowledge of the state information». In Proceedings of the IEEE 39th Asilomar Conference on Signals, Systems and Computers, pages 657–661. 2005.
Zamir, R. and Shamai, S. «Nested linear/lattice codes for Wyner-Ziv encoding». In IEEE Information Theory Workshop, Killarney, Ireland, pages 92–93. 1998.
Zamir, R., Shamai, S. and Erez, U. «Nested linear/lattice codes for structured multiterminal binning». IEEE Trans. Inform. Theory, 48(6):1250–1276. 2002.
Zhu, X., Aaron, A. M. and Girod, B. «Distributed compression for large camera arrays». In IEEE Workshop on Statistical Signal Processing, St Louis, Missouri. 2003.

INSA de LYON

Informed Watermarking and Compression of Multi-Sources

Technological advances in telecommunications, multimedia and portable handheld devices over the last decade have driven the creation of novel services such as multimedia content sharing, video-conferencing and content protection, all running on low-power devices. Alternative low-complexity coding techniques therefore need to be developed to replace conventional ones. Coding with state information, a potential solution for shifting encoder complexity to the decoder, has two main applications: 1) Distributed Source Coding (DSC), for compressing a source when a correlated version of it is available only to the decoder; 2) Informed Data Hiding (IDH), for embedding a watermark into a host signal that is available only to the encoder.
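The DSC setting can be sketched with a toy syndrome-based code in the spirit of DISCUS (Pradhan and Ramchandran, cited above): the encoder transmits only the syndrome of its input under a (7,4) Hamming parity-check matrix, and the decoder combines that syndrome with its correlated side information. This is a minimal illustrative sketch, not the construction used in the thesis; the function names and the one-bit correlation model are assumptions made for the example.

```python
# Toy syndrome-based distributed source coding (illustrative only):
# the encoder compresses a 7-bit source x to its 3-bit Hamming syndrome;
# the decoder recovers x from the syndrome plus side information y that
# differs from x in at most one bit.

# Parity-check matrix of the (7,4) Hamming code (3 x 7, over GF(2)).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(bits):
    """Compute H * bits over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)

def dsc_encode(x):
    """Encoder: transmit only the 3-bit syndrome of the 7-bit source."""
    return syndrome(x)

def dsc_decode(s, y):
    """Decoder: x and y differ in at most one bit, so the syndrome of
    their difference identifies the flipped position (if any)."""
    d = tuple((si + yi) % 2 for si, yi in zip(s, syndrome(y)))
    if d == (0, 0, 0):
        return list(y)                      # y already matches x
    # For Hamming codes, the syndrome of a single-bit error at position i
    # equals column i of H; find that column and flip the corresponding bit.
    for i in range(7):
        if tuple(row[i] for row in H) == d:
            return [b ^ (1 if j == i else 0) for j, b in enumerate(y)]
    raise ValueError("more than one bit of mismatch")

x = [1, 0, 1, 1, 0, 0, 1]                   # source seen by the encoder
y = [1, 0, 1, 0, 0, 0, 1]                   # side info: one bit flipped
s = dsc_encode(x)                           # only 3 of 7 bits are sent
assert dsc_decode(s, y) == x
```

The compression gain (7 bits to 3) comes entirely from the decoder-side correlation, mirroring the Slepian-Wolf result: the encoder never sees y.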
For each of these problems, practical code designs that operate close to the theoretical limits are proposed. Good error-correcting codes, namely Low-Density Parity-Check (LDPC) codes, are combined with good quantization codes, namely Trellis Coded Quantization (TCQ), in the design of the proposed capacity-approaching codes. Moreover, the theoretical achievable rate limits are derived for a relaxed IDH setup in which a noisy observation of the host signal is available to the decoder. Finally, motivated by the strong duality between DSC and IDH, a hybrid scheme that performs both data hiding and compression is proposed. In addition to the derivation of the theoretical channel capacity and rate-distortion function, a complete practical framework is proposed.

Keywords: Coding with State Information, Compression, Watermarking, Distributed Source Coding, Writing on Dirty Paper, Low Density Parity Check Codes, Trellis Coded Quantization.

Tatouage informé et Compression Multi-sources

Technological advances in telecommunications, multimedia and mobile systems have opened the door to the emergence and development of new services such as the sharing of multimedia databases, video-conferencing and content protection, all on low-power systems. Hence the need for new coding techniques of reduced complexity. Coding techniques that exploit the presence of side information are a potential solution for shifting the coding complexity to the decoder. They apply in particular to two coding principles: 1) Distributed Source Coding (DSC), which compresses a given signal knowing that another signal, correlated with the original, is available at the decoder.
2) Informed Data Hiding (IDH), which embeds a message into a host signal known only to the encoder. For each of these two techniques, we propose solutions that approach the theoretical limits, combining high-performance channel codes of the LDPC type with trellis-coded quantization (TCQ). We also study the theoretical limits achievable by IDH when a noisy version of the host signal is available at the decoder. Finally, exploiting the strong duality between DSC and IDH, we propose a complete practical hybrid scheme implementing both techniques, together with a theoretical study of the rate-distortion function and the capacity of such a system.

Keywords: coding with side information, compression, watermarking, distributed source coding, LDPC, TCQ.

Laboratoire d’InfoRmatique en Images et Systèmes d’information, UMR 5205 CNRS
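The IDH setting can likewise be sketched with scalar quantization index modulation (QIM), the technique of Chen and Wornell cited in the bibliography: the encoder, which knows the host, quantizes each host sample onto one of two interleaved lattices selected by the message bit; the decoder sees only the watermarked sample. The step size and sample values below are arbitrary illustrative choices, not parameters from the thesis.

```python
# Toy scalar QIM (quantization index modulation) sketch for informed data
# hiding: the encoder knows the host sample and quantizes it onto one of
# two interleaved lattices, chosen by the message bit; the decoder never
# sees the host, only the (possibly noisy) watermarked sample.

DELTA = 1.0  # quantization step; controls the embedding distortion

def embed(host, bit):
    """Quantize the host sample to the lattice {k*DELTA + bit*DELTA/2}."""
    offset = bit * DELTA / 2
    return round((host - offset) / DELTA) * DELTA + offset

def extract(sample):
    """Decide which of the two lattices the received sample is closer to."""
    d0 = abs(sample - embed(sample, 0))
    d1 = abs(sample - embed(sample, 1))
    return 0 if d0 <= d1 else 1

host = 3.3
for bit in (0, 1):
    wm = embed(host, bit)
    assert abs(wm - host) <= DELTA / 2      # bounded embedding distortion
    assert extract(wm + 0.2) == bit         # survives noise below DELTA/4
```

Because the host acts as known interference at the encoder, the embedding distortion stays bounded by DELTA/2 regardless of the host value, which is the essence of the dirty-paper/informed-embedding idea the thesis builds on.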