Informed Watermarking and Compression of Multi-Sources

INSA de Lyon
N° 07 ISAL 0093
2007

THESIS
presented to obtain the degree of
Doctor of Philosophy
in Computer Science

A dissertation presented by
Çağatay Dikici
December 3, 2007

prepared at LIRIS
under the supervision of
Atilla Baskurt and Khalid Idrissi
The thesis jury is composed of:

Reviewers:
M. Jean-Marc Chassery (Directeur de Recherche CNRS)
M. Bülent Sankur (Professeur)

Examiners:
Mme. Christine Guillemot (Directrice de Recherche INRIA)
M. Fabrice Meriaudeau (Professeur)
M. William Puech (Maître de conférences, HDR)
M. Florent Dupont (Maître de conférences, HDR)
M. Khalid Idrissi (Maître de conférences)
M. Atilla Baskurt (Professeur)
Abstract
Informed Watermarking and Compression of Multi-Sources, (December 2007)
Çağatay Dikici, B.S., Bogazici University;
M.S., Bogazici University;
Technological advances in telecommunications and multimedia, together with the diverse choice of portable handhelds over the last decade, have driven the creation of novel services such as multimedia content sharing, video-conferencing and content protection, all running on low-power devices. Hence, alternative low-complexity coding techniques need to be developed to replace conventional ones. Coding with state information, a potential solution for shifting the encoder complexity to the decoder, has two main applications:
1) Distributed Source Coding (DSC), for compressing a source given that a correlated version of it is available only to the decoder.
2) Informed Data Hiding (IDH), for embedding a watermark into a host signal where the host signal is available only to the encoder.
For each problem stated above, practical code designs that operate close to the theoretical limits are proposed. The combination of good error correcting codes such as Low Density Parity-Check (LDPC) codes and good quantization codes such as Trellis Coded Quantization (TCQ) is used in the design of the proposed capacity-approaching codes.
Moreover, the theoretical achievable rate limits are derived for a relaxed IDH setup in which a noisy observation of the host signal is available to the decoder.
Finally, motivated by the strong duality between DSC and IDH, a hybrid scheme that uses both data hiding and compression is proposed. In addition to the derivation of the theoretical channel capacity and rate-distortion function, a complete framework is proposed.
Keywords: Coding with State Information, Compression, Watermarking, Distributed Source Coding, Writing on Dirty Paper, Low Density Parity Check Codes,
Trellis Coded Quantization.
Résumé

Informed Watermarking and Multi-Source Compression, (December 2007)
Çağatay Dikici, B.S., Bogazici University;
M.S., Bogazici University;

Technological advances in telecommunications, multimedia and mobile systems have opened the door to the emergence and then the development of new services such as the sharing of multimedia databases, video-conferencing and content protection, all on low-power systems. Hence the need for new reduced-complexity coding techniques. Coding techniques exploiting the presence of side information constitute a potential solution for shifting the coding complexity to the decoder. They apply in particular to two coding principles:

1) Distributed Source Coding (DSC), to compress a given signal knowing that another signal correlated with the original one is available at the decoder.

2) Informed Data Hiding (IDH), to insert a message into a host signal, the latter being known only to the encoder.

For each of these two techniques, we propose solutions that approach the theoretical limits. To this end, we combine high-performance channel codes of LDPC type with trellis-based quantization (TCQ). We also study the theoretical limits that can be reached by IDH when a noisy version of the host signal is available at the decoder. Finally, exploiting the strong duality between DSC and IDH, we propose a complete practical hybrid scheme implementing both techniques, together with a theoretical study of the rate-distortion function and of the capacity of such a system.

Keywords: coding with side information, compression, watermarking, distributed source coding, LDPC, TCQ.
Acknowledgements
to my family
First of all, I would like to express my deepest sense of gratitude to my supervisors Prof. Atilla Baskurt and Khalid Idrissi for their patient guidance, encouragement and excellent advice throughout this study.
I am grateful to Prof. Christine Guillemot for her enthusiasm, for sharing her fruitful ideas on information theory, for her valuable assistance in maturing my theoretical foundation in source-channel coding, and for her hospitality during our collaboration. Special thanks to Caroline Fontaine for her advice and valuable discussions.
I am thankful to my thesis reviewers Prof. Bülent Sankur and Prof. Jean-Marc Chassery. They provided a critical reading, valuable suggestions and constructive remarks that have been very important for the improvement of this dissertation.
I would like to thank my other committee members Mr. Fabrice Meriaudeau, Mr.
William Puech and Mr. Florent Dupont.
I thank all LIRIS members, especially the three stimulators: the judo zen Guillaume Lavoué, the sailor Julien Ricard, and the theater boy Nicolas Zlatoff. I would like to thank my interns Benoît, Damien, David and Stephane, and also the Migraine team: Rémi, Greg, Elise, Fab, Antho and Claris for their weekly motivation.
Finally, great thanks to Laurent, Eléonore and my family, who encouraged me to finalize this dissertation.
Contents

Part I: Problem Statement and Preliminaries

Introduction

1 Preliminaries
  1.1 Notations and Conventions
  1.2 Entropy and Mutual Information
  1.3 Causality
  1.4 Source Coding
  1.5 Channel Coding
  1.6 Distributed Source Coding
  1.7 Writing on Dirty Paper
  1.8 Message Passing Algorithm
  1.9 Trellis Coded Quantization (TCQ)
  1.10 Low Density Parity Check (LDPC) Codes
  1.11 Conclusion

Part II: Contributions

2 Distributed Source Coding
  2.1 Introduction
  2.2 Theoretical Background
  2.3 Related Works
  2.4 Practical Code Design
  2.5 Practical Application for Still-Image Coding
  2.6 Conclusion

3 Informed Data Hiding
  3.1 Introduction
  3.2 Theoretical Background
  3.3 Prior Work
  3.4 Proposed Scheme-1: Extension to Cox Miller
  3.5 Proposed Scheme-2: Superposition Coding
  3.6 Conclusion

4 Dirty Paper Coding with Partial State Information
  4.1 Introduction
  4.2 Problem Statement
  4.3 Achievable Rate
  4.4 Capacity/Rate Gain/Loss Analysis
  4.5 Conclusion

5 Data Hiding and Distributed Source Coding
  5.1 Introduction
  5.2 Theoretical Background
  5.3 Contribution 1: Capacity Analysis for Multivariate Gaussian Source
  5.4 Contribution 2: Practical Code Design
  5.5 Conclusion

Conclusion

A Achievable Rate Region Calculations for Two Partial Side Informations Known to the Encoder and Decoder Respectively
  A.1 Derivation of the Achievable Rate Region
  A.2 Maximization of the Rate
  A.3 Entropy of Multivariate Gaussian Distribution

B Codes and Degree Distributions for Generating LDPC Matrices
  B.1 Degree Distributions of rate 2/3 code, for 2:1 compression rate in DSC
  B.2 Degree Distribution of rate 1/2 code, for Informed Data Hiding

C Publications of the author

D Cited Author Index
List of Figures

1 A multimedia communication setup for a low-power device which has data hiding and efficient compression capability.
2 A point to point source-channel coding setup.
3 Coding with state information.
4 Coding of two correlated sources.
5 Costa's "Writing on Dirty Paper" setup.
6 Channel coding with state information.
7 Data Hiding + Source Coding scheme.
8 Chapter dependencies of this dissertation.
1.1 The Venn diagram of the relationship between entropy and mutual information.
1.2 Binary entropy function H(a) versus a.
1.3 Uniform density function p(x) versus x, where p(x) = 1/a for 0 ≤ x ≤ a.
1.4 A compression system.
1.5 A communication system.
1.6 Counting problem on a straight line.
1.7 A 1/2 recursive systematic convolutional code with memory 2 and generator matrix (3, 5) in octal digits.
1.8 State transitions of the recursive systematic convolutional code (1, 3) in octal digits.
1.9 Output points and corresponding partitions for 2 bits per sample.
1.10 Viterbi decoding of a vector with length 4.
1.11 Bipartite graph representation of the parity check matrix H.
1.12 Belief propagation on the bipartite graph of H.
1.13 Performance comparison of the error rates of a (3, 6) regular LDPC code, a turbo code and an optimized irregular LDPC code. The channel is binary-input additive white Gaussian noise.
1.14 LDPC coding example. Cartoon copyright © 2007 piyalemadra.com, used with permission. (a) Original binary cartoon of size 100 × 100, with 0s corresponding to white and 1s to black pixels. The ratio between the number of black pixels and the total number of pixels is 0.2445. (b) Visualization of the cartoon coded with a 1/2 systematic LDPC code, such that the output of the encoder contains the original image and its parity checks of size 100 × 100. (c) Throughout the transmission, both the cartoon and its parity check bits are exposed to bit errors such that the error probability of a received bit is 0.07.
1.15 LDPC decoding. (a) After 1 iteration. (b) After 5 iterations. (c) The original cartoon is decoded without any error after 10 iterations.
2.1 16 cases of correlated source coding.
2.2 Lines and points of Table-2.2.
2.3 Admissible Slepian-Wolf rate region R for the case {1011}.
2.4 Admissible Slepian-Wolf rate region R for the cases {0011} and {0001}.
2.5 Wyner-Ziv setup.
2.6 Graph of R_{X|Y}(D), R*_{X|Y}(D) and H(p_z) − H(D) versus D, for p_z = 0.28. For the binary symmetric case, R*_{X|Y}(D) has a rate loss with respect to R_{X|Y}(D) except at the points (H(p_z), 0) and (0, p_z), where there is no rate loss.
2.7 Wyner-Ziv setup for the Gaussian case.
2.8 2:1 rate DSC compression using a 2/3 convolutional code.
2.9 2:1 rate DSC compression code design using two systematic 4/5 convolutional codes with an interleaver and iterative MAP decoding. Blocks π correspond to a pseudo-random interleaver, and the block π⁻¹ is the corresponding deinterleaver. For the Log-Likelihood Ratio (LLR) calculations log(p(x = 1|y)/p(x = 0|y)), the correlation noise level and the received side information Y are used. Iterative decoding is done using a Soft-Input Soft-Output (SISO) decoder.
2.10 2:1 rate DSC compression code design using two systematic 2/3 rate parallel concatenated convolutional codes and 1/2 rate puncturing matrices P.
2.11 2:1 rate DSC compression using a systematic 2/3 rate LDPC code.
2.12 Eight output points and corresponding partitions for 4 subsets.
2.13 Wyner-Ziv coding as a concatenation of a good quantization code and a Slepian-Wolf coder.
2.14 Our proposed 2:1 rate DSC compression code design using LDPC codes.
2.15 Decoding bit error rate versus the entropy rate of the correlation noise power H(p1), for the 2:1 rate Slepian-Wolf compression comparison. The LDPC simulations are made for an input length 4000 regular LDPC matrix and a length 10^4 irregular LDPC matrix. The graph also contains the S-W limit and the best performances achieved using a convolutional code (Aaron and Girod, 2002), a punctured turbo code (Lajnef, 2006), and an irregular LDPC code of length 10^5 (Liveris et al, 2002a).
2.16 Encoder and decoder structure. The source is compressed using LDPC binning, the side information Y available to the decoder is the image reconstructed from the low frequency (LL2) wavelet composition, and the two received signals are decoded jointly.
2.17 Construction of the side information. Only the second-level Low-Low wavelet composition is transmitted. The decoder reconstructs the side information by setting all other coefficients to 0.
2.18 Left: side information at the receiver; center: output of the first decoding iteration; right: decoding output after 5 iterations.
3.1 Channel coding with state information setup.
3.2 Watermarked image.
3.3 Costa setup.
3.4 Informed embedding of Miller et al. on DCT coefficients of still images.
3.5 Proposed informed embedding setup on DWT coefficients of still images.
3.6 Analysis and synthesis steps of the Le Gall DWT.
3.7 Wavelet composition of the Lena image.
3.8 A 100 bit message M is inserted into the Lena image using the LH2, HL2 and HH2 DWT coefficients. No perceptual shaping is applied.
3.9 The same 100 bit message M is inserted into the Lena image using perceptual shaping.
3.10 Comparison of embedding a 40 bit message M into the asia image with and without perceptual shaping.
3.11 Superposition of 2 codes.
3.12 Embedding process of the message M into the work s using superposition coding. LDPC coding of M to find the channel code c1 is followed by TCQ coding of αs − c1 to find the source code c0. The watermarked signal c0 + c1 + (1 − α)s is sent through the attack channel.
3.13 Superposition watermarking extraction by BCJR and LDPC decoding iterations.
3.14 Embedding a 40 bit payload into the Cameraman image.
3.15 Maximum level of attack on images such that the secret message can still be decoded perfectly.
4.1 Channel coding with state information.
4.2 P = Q = N = 1: graphs of R(α) for the {L, K} pairs {0, 0}, {0, 1}, {1, 0}, {1, 1}, {0, ∞} and {1, ∞}. The transmission rate R(α) is calculated in nats per unit transmission (the maximum value 0.3466 nats/transmission corresponds to 1 bit/transmission).
4.3 Capacity gain (between R_{Case-B}(α) and R_{Costa}(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/K) values, with perfect knowledge of the channel state information at the encoder (L = 0).
4.4 Maximum achievable rate loss (between R_{Case-E}(α) and R_{Costa}(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/L) values.
4.5 Maximum achievable rate gain or loss (between R(α) and R_{Costa}(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/K) values, with partial knowledge of the channel state information at the encoder (L = 1).
5.1 A communication system between Alice and Bob via a non-secure Carrier.
5.2 Data Hiding + Source Coding scheme.
5.3 Channel coding with two-sided state information scheme.
5.4 Rate distortion theory with side information at the decoder: Wyner-Ziv setup.
5.5 Multivariate Gaussian channel of the IDH-DSC scheme.
5.6 Multivariate Gaussian case: Carrier point of view.
5.7 Gaussian test channel that achieves the lower bound found in Equation 5.21. Input: Ŵ ∼ N(0, Q + D1 − D2); output: Ŵ ∼ N(0, Q + D1).
5.8 Equivalent setup of the test channel in Figure-5.7 using an addition and a multiplication operator.
5.9 Equivalent scheme of the Gaussian channel.
5.10 Embedding performance for 1/200 bit per sample with a 2:1 compression of the watermarked string, using a 2/3 rate LDPC code with block length 4000. There is a minimum 0.02 bit per sample entropy rate loss with respect to the no-embedding case.
List of Tables

2.2 Achievable rate regions according to the Slepian-Wolf theorem.
3.2 Robustness test of the proposed algorithm for the image "asia.pgm". A 40 bit message is embedded into the asia image with DWT perceptual shaping. For each attack listed, the maximum attack level at which the secret message M can still be decoded without any error is given.
4.2 Special cases of the proposed channel coding setup.
5.2 Channel coding with state information problems.
5.3 Source coding with state information problems.
List of Abbreviations

Notation    Description
AWGN        Additive White Gaussian Noise
BCJR        Bahl, Cocke, Jelinek and Raviv algorithm
BSC         Binary Symmetric Channel
CCSI        Channel Coding with State Information
C-SNR       Correlation-Signal to Noise Ratio
DCT         Discrete Cosine Transform
DISCUS      DIstributed Source Coding Using Syndromes
DSC         Distributed Source Coding
DRM         Digital Rights Management
DWT         Discrete Wavelet Transform
ECC         Error Correcting Codes
G-P         Gel'fand-Pinsker
IDCT        Inverse Discrete Cosine Transform
IDH         Informed Data Hiding
IDWT        Inverse Discrete Wavelet Transform
i.i.d.      independent identically distributed
l.c.e.      lower complex envelop
LDPC        Low Density Parity Check
LDPCA       Low Density Parity Check Accumulate
LLR         Log-Likelihood Ratio
LR          Likelihood Ratio
MAC         Multiple-Access Channel
MAP         Maximum A Posteriori
ML          Maximum Likelihood
MSE         Mean Squared Error
PAM         Pulse Amplitude Modulation
QIM         Quantized Index Modulation
RSC         Reed Solomon Codes
r.v.        random variable
SCSI        Source Coding with State Information
SISO        Soft-Input Soft-Output
SLDPCA      Sum Low Density Parity Check Accumulate
SNR         Signal to Noise Ratio
S-W         Slepian-Wolf
TCM         Trellis Coded Modulation
TCQ         Trellis Coded Quantization
TTCQ        Turbo Trellis Coded Quantization
W-Z         Wyner-Ziv
Part I

Problem Statement and Preliminaries
Introduction
Consider the communication setup in Figure-1, which strongly motivates this dissertation. A low-power multimedia device such as a mobile phone has various functionalities, e.g., an embedded camera and a WiFi or 3G network connection. Despite the limited power and bandwidth, telecommunications operators want to deploy multimedia applications like video-conferencing, in which the mobile device has to handle several tasks: capturing sound and video, compressing them under some fidelity criterion, and sending them over the network on the uplink side; and receiving the stream, decompressing the content and displaying it on the downlink side. During multimedia content transmission, one would also like to hide extra information seamlessly (in the sense of not being easily detectable by the human audio-visual system), either for enhancing the multimedia content or simply for Digital Rights Management (DRM) reasons.
Figure 1: A multimedia communication setup for a low-power device which has data
hiding and efficient compression capability.
The state-of-the-art conventional audio-video compression standards exploit the redundancy of the data only at the encoder and, with the help of entropy coding, can compress it close to the theoretical limits. Hence, in classical compression techniques the encoder is more complex than the decoder. One of the objectives of our work is to shift the encoder complexity to an intermediate powerful server, which transcodes the data to a conventional compression stream and sends it to the receiver for simple decoding.
The second problem is to hide information in the multimedia data at the sender side under a fidelity criterion: one may modify the host multimedia content by no more than an acceptable noise level, transmit the modified version of the content to the receiver, and extract the hidden information at the receiver end without access to the original multimedia data. Since the original multimedia is accessible only to the sender, this setup is also known as "blind watermarking".
To formalize the blind watermarking and compression problems stated above in a rigorous manner, we briefly explain the coding concept expressed by Shannon and coding with state information.
Coding Concept
Figure 2: A point to point source-channel coding setup.
Consider the communication system in Figure-2. A signal generated by an information source needs to be transmitted to a receiver through a channel; the channel is generally imperfect and hence creates errors during transmission. The aim of the transmitter and receiver pair is to minimize its resources, such as transmitting power and the number of channel uses, while guaranteeing signal reconstruction with a given fidelity. One can try to minimize the number of bits for representing the input source, which corresponds to compression. On the other hand, redundant data needs to be added for recovering from the errors during transmission. Hence, in the literature the compression process is called source coding, while error correction is called channel coding. The duality between source and channel coding has been studied since Shannon (1959): in source coding, the redundancy of the input source is removed; in channel coding, controlled redundant data is added in order to correct the transmission errors.
Coding with State Information
Figure 3: Coding with state information.
In this section, we extend the basic source coding and channel coding setups by introducing a state information S that determines the output of the channel. Depending on the setup, this state information can be accessible perfectly or partially to the transmitter, to the receiver, or to both. In this dissertation we are mainly interested in two communication problems with state information, in order to solve the data hiding and source coding problems for low-power devices.
The first setup is "channel coding with state information (CCSI) known to the transmitter". Since only the transmitter, and not the receiver, has access to the state information in this setup, the blind watermarking problem can be posed as a CCSI-known-to-the-transmitter problem. Gel'fand and Pinsker (1980) and Costa (1983) made valuable contributions in this field.
The second setup is "source coding with state information (SCSI) known to the receiver". This setup considers the theoretical compression rate limits of a source when state information correlated with the input source is accessible to the receiver. Even though the theoretical foundation of SCSI known to the receiver dates back to the 1970s with Slepian and Wolf (1973) and Wyner and Ziv (1976), it took until the 2000s for practical applications to emerge for efficient source coding on low-power devices and sensors (Puri and Ramchandran, 2002; Aaron and Girod, 2002). Although the random binning argument and coset construction used by Slepian and Wolf (1973) in the proof of the achievable rate limits are not practically applicable, good error correcting codes can be employed for a sub-optimal solution.
Moreover, in the spirit of the classical source-channel coding duality, Cover and Chiang (2002); Pradhan et al (2003); Su et al (2000) have shown the strong duality between source and channel coding with state information.
We employ error correcting coding techniques to tackle the two problems stated above. The main idea of Error Correcting Codes (ECC) is to add redundancy to the data to be transmitted in an appropriate manner, which serves to detect and correct the erroneous parts introduced by the channel. There exist two classes of ECC: "convolutional codes" and "block codes". Some examples of block codes are Hamming codes, Reed Solomon Codes (RSC) and Low Density Parity Check (LDPC) codes (Gallager, 1963). They all use a parity check matrix to create the redundancy and have good error correcting capabilities. However, taking into consideration the performance on larger blocks of data and the possibility of soft decoding, LDPC codes have more advantages (Mackay, 2003).
Convolutional codes are constituted by a finite state machine whose output depends on the current sample and the current state. A trellis path is a sequence of state transitions. Since convolutional codes do not permit all possible state transitions, a sequence of state transitions derived by a convolutional code is a valid trellis path. The decoder wants to calculate the most probable valid trellis path, in either the Maximum Likelihood (ML) sense or the Maximum A Posteriori (MAP) sense. The decoding can be done with hard or soft decisions, where soft decisions can be used for iterative decoding. Berrou and Glavieux (1996); Berrou et al (1993) proposed a coding algorithm (the turbo code) based on the concatenation of convolutional codes with an interleaver, together with an optimal decoding algorithm for linear codes known as BCJR¹, applied in an iterative manner, which operates close to the theoretical limits. Other techniques have been proposed for improving the performance of turbo codes, such as puncturing (Acikel and Ryan (1997)) and interleaving (Benedetto et al (1998); Tepe and Anderson (1998)).

¹The name of the BCJR algorithm comes from the initials of Bahl, Cocke, Jelinek and Raviv, who proposed it in Bahl et al (1974).
After the invention of turbo codes, LDPC codes were reinvented by Mackay and Neal (1997) using the belief propagation algorithm. An LDPC code can be represented as a bipartite graph with variable nodes and check nodes connected by edges. The variable nodes need to satisfy all the check node equations: the modulo-2 sum of the variable nodes connected to each check node must be 0. The bipartite graph is regular or irregular depending on whether the number of edges connected to each variable node or check node is the same or not. Studies of regular and irregular code performance exist in Mackay and Neal (1997) and in Richardson and Urbanke (2001a); Chung et al (2001a); Chung (2000); Chung et al (2001b), respectively. With a careful design of the bipartite graph, irregular codes outperform regular ones. Richardson et al (2001) proposed a density evolution method for the design of LDPC codes which performs 0.13 dB away from the theoretical limits, surpassing the best codes known so far (turbo codes).
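The check-node constraint just mentioned can be stated compactly as Hx ≡ 0 (mod 2). The following minimal Python sketch uses a small, hypothetical parity-check matrix for illustration; real LDPC matrices are large and sparse:

    import numpy as np

    # Codeword test for a toy (hypothetical) parity-check matrix H: x is a
    # valid codeword iff every check-node equation holds, i.e. Hx = 0 mod 2.
    H = np.array([[1, 1, 0, 1, 0, 0],
                  [0, 1, 1, 0, 1, 0],
                  [1, 0, 1, 0, 0, 1]])

    def is_codeword(x):
        return not (H @ x % 2).any()

    print(is_codeword(np.array([1, 0, 1, 1, 1, 0])))  # True: all checks are 0
    print(is_codeword(np.array([1, 1, 1, 1, 1, 1])))  # False: a check fails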
In this dissertation, we propose a complete system which performs combined distributed source coding and data hiding. After a detailed study of state-of-the-art DSC and data hiding schemes, we apply a high-performance DSC method (based on LDPC) and a high-performance data hiding method (based on Trellis Coded Quantization (TCQ) and LDPC). For the derivation of the theoretical bounds of the proposed system, we extend Costa (1983)'s work on "Writing on Dirty Paper" by allowing one partial state information at the encoder and another partial state information at the decoder, and we analyze the maximum achievable rates of this setup. This extension reduces to 6 different cases, 4 of which are already known; the other 2 are novel and have interesting application areas.
Afterward, the combination of data hiding and Distributed Source Coding is studied. Based on a practical application scenario, the theoretical rate distortion and channel capacity expressions of this setup are derived. The rate-distortion function of our setup is in fact an extension of the Wyner-Ziv theorem with an appropriate correlation relation between the state information and the input source. Moreover, for the channel capacity, we use one of the special cases of our findings on "Dirty Paper Coding with Partial State Information". A practical code design is given, applying LDPC and BCJR decoding on TCQ.
Summary of Contributions
The contributions of this dissertation can be summarized as follows. Our first major contribution is in the field of combined Data Hiding and Distributed Source Coding. We derive the rate distortion function and the capacity formula of the embedding process for the Gaussian input case, and we propose a practical code design using LDPC and TCQ which operates close to the theoretical limits.
Our second major contribution is in the area of channel coding with partial side information in Gaussian input channels. The maximum achievable rates are derived for channel coding with side information partially/perfectly available to the encoder and partially/perfectly available to the decoder. Hence it is an extension of Costa's "writing on dirty paper" setup, and this contribution is employed for the calculation of the channel capacity in the combined Data Hiding and Distributed Source Coding problem.
Moreover, we propose a Slepian-Wolf coding scheme based on an LDPC code which operates 0.08 bit per channel use away from the theoretical limit. This system is applied to a still-image coding setup where the image is coded such that the low pass DWT coefficients are available to the decoder.
Finally, our contributions in the Informed Data Hiding field can be summarized as the proposition of two embedding methods: the first is a low-rate embedding method for DWT coefficients of still images using perceptual shaping, and the second is a high-rate embedding method for continuous synthetic data using the superposition of a good source code C0 based on TCQ and a good channel code C1 based on LDPC. By applying iterative decoding between the BCJR algorithm and belief propagation, even under AWGN attack noise 1.5 dB away from the theoretical limit, the embedded message can be decoded with an error rate of Pe ≤ 10^{-5}.
We now briefly explain our contributions in their order of appearance in this dissertation.
Distributed Source Coding
Slepian and Wolf (1973) derived the compression rate limits for separate encoding and joint decoding of correlated sources drawn from a discrete alphabet (see Figure-4). After the extension of this theorem with a distortion constraint by Wyner and Ziv (1976), the first practical code designs appeared in the early 2000s, with the idea of encoding on low-power devices such as sensors.
Figure 4: Coding of two correlated sources.
According to distributed source coding, the statistical dependency of the two correlated sources can be exploited at the decoder. For instance, one of the sources is coded with a low-rate error correcting code, and only the parity checks are sent through the channel. The second source, assumed to be a noisy version of the first, is available at the decoder; the decoder then tries to correct the noisy parts of the second source using the parity checks of the first source.
In this dissertation, we propose a practical code design for the Slepian-Wolf problem, based on LDPC codes for discrete alphabet input. We use a 2/3 rate LDPC code which operates close to the theoretical limits; a 2/3 rate LDPC code corresponds to a 2:1 compression because of the (input source):(parity checks) ratio. The LDPC encoding and decoding used for DSC are as in the traditional LDPC codes described in Chapter-1.10, with some decoding modifications applied for DSC as described in Chapter-2.4.
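As a toy illustration of this binning idea, here is a minimal, hypothetical Python sketch using a Hamming (7,4) code in syndrome (DISCUS-style) form, not the LDPC design proposed in this dissertation: 7 correlated bits are compressed to a 3-bit syndrome, and the decoder recovers them from side information differing in at most one position.

    import numpy as np

    # Syndrome-based Slepian-Wolf compression with a Hamming(7,4) code.
    H = np.array([[1, 0, 1, 0, 1, 0, 1],
                  [0, 1, 1, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1, 1, 1]])  # column j is the binary code of j+1

    def compress(x):
        """7 bits -> 3 bits: transmit only the syndrome s = Hx mod 2."""
        return H @ x % 2

    def decompress(s, y):
        """Recover x from its syndrome s and the side information y."""
        e = (H @ y + s) % 2                    # = H(x XOR y), error syndrome
        x_hat = y.copy()
        if e.any():
            pos = e[0] + 2*e[1] + 4*e[2] - 1   # column index of the single flip
            x_hat[pos] ^= 1
        return x_hat

    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, 7)
    y = x.copy()
    y[rng.integers(7)] ^= 1                    # correlated side information
    assert np.array_equal(decompress(compress(x), y), x)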
Our system operates at a correlation noise entropy 0.08 bit per channel use away from the Slepian-Wolf limit for a 2:1 compression rate. We also compare the performance of our method with existing systems. Please note that the system developed in this part is employed in the joint data-hiding compression system of Chapter-5.
We also apply our proposed coding scheme to still-image compression applications: a refinement coding is made using LDPC codes while the low frequency Discrete Wavelet Transform (DWT) component of the image is accessible to the decoder.
Blind Watermarking
Suppose that the secret message that we embed within a cover signal or image is the information that we want to transmit; then the blind watermarking problem can be viewed as Channel Coding with Side Information known to the encoder (see Figure-5). Hence the theoretical limits of the secret message embedding rate can be calculated for noncausal memoryless systems. More interesting results were found by Costa (1983): for the Gaussian input case, the capacity of the system is independent of the cover data S, and there is no capacity loss due to the unavailability of the cover data at the decoder side.
Figure 5: Costa’s “Writing on Dirty Paper” setup.
We develop two data hiding schemes. The first one achieves a low embedding rate: it modifies the DWT coefficients of a still image using trellis coding, controlling the embedding strength based on the perceptual sensitivity of the DWT coefficients. The performance of this method, in terms of the error probability of the message extraction under several attacks, is given.
The second proposed scheme focuses on high-rate embedding performance by combining a good source code and a good channel code. During the embedding process of the secret message, we employ LDPC encoding. Moreover, in order to respect the embedding criterion, a 6-level output TCQ is used. Hence an LDPC code and a TCQ code are concatenated for the data hiding process. During transmission, the watermarked signal is exposed to an AWGN attack channel. At the decoder side, the received signal is decoded with belief propagation for the LDPC side and BCJR decoding for the TCQ side. Since both decoding methods produce soft output probabilities, the decoding is done in an iterative manner. For low SNR values, our system operates 1.5 dB away from Costa's limit. As in the DSC case, the blind watermarking system using superposition coding is one of the main blocks of the overall data-hiding compression system proposed in Chapter-5.
Dirty Paper Coding with Partial State Information
As described in the previous section, Costa derived the capacity of the channel coding with state information problem in which the state is known only to the encoder. However, partial information on the state of the channel may be available to the encoder or to the decoder (and it need not be the same on both sides). Hence we derive the capacity of the Gaussian channel with state information, where the state information is partially known to the encoder and to the decoder, as in Figure-6.

Figure 6: Channel coding with state information.

Unlike Costa's case, the maximum achievable rate of this system also depends on the state S. Our contributions can be listed as follows:
• The analytic expression of the maximum achievable rate is found to be

  max_α R(α) = R(α*) = (1/2) ln( 1 + P(QK + QL + KL) / (N(QK + QL + KL) + QLK) ),   (1)

  which is obtained for α* = PQK / (PQK + QNK + L(PQ + PK + QK + NQ + NK)). (This expression is evaluated numerically in the sketch after this list.)
• The general setup can be reduced to 6 different cases, 4 of which are well known (Cases A, C, D, F) and 2 of which are new (Cases B, E). The maximum achievable rates are calculated for all 6 cases. The two new cases are compared with Costa's setup.

• In order to achieve the maximum achievable rate, the encoder needs to know the channel variance parameters. However, in real-world applications the exact parameters are not always known to the encoder. We analyze the rate gains/losses for the 6 cases and for the general setup when the encoding is done at a non-optimal operating point. We also compare the gain/loss analysis with respect to Costa's setup.
This general setup is relevant for diverse practical applications, such as watermarking under desynchronization attacks and point-to-point communication over a fading channel where the receiver has an estimate of the channel state.
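As a quick numerical sanity check (a sketch with hypothetical parameter values; P, N, Q, K, L as in Equation (1), with L = 0 denoting perfect state knowledge at the encoder, as in Figure 4.3), letting L → 0 recovers Costa's rate (1/2) ln(1 + P/N) and α* = P/(P + N), independently of Q:

    from math import log

    def rate_max(P, N, Q, K, L):
        """Maximum achievable rate of Equation (1), in nats per channel use."""
        common = Q*K + Q*L + K*L
        return 0.5 * log(1 + P*common / (N*common + Q*L*K))

    def alpha_star(P, N, Q, K, L):
        """The optimal scaling factor given below Equation (1)."""
        return P*Q*K / (P*Q*K + Q*N*K + L*(P*Q + P*K + Q*K + N*Q + N*K))

    P, N, Q, K = 1.0, 1.0, 10.0, 1.0
    print(rate_max(P, N, Q, K, L=0.0))    # 0.3466 nats = 0.5*ln(2), Costa's rate
    print(alpha_star(P, N, Q, K, L=0.0))  # 0.5 = P/(P+N), Costa's alpha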
Informed Data Hiding and Distributed Source Coding
We employ all the contributions up to this point in order to build a system with both Informed Data Hiding (IDH) and Distributed Source Coding (DSC). Motivated by the application scenario in Figure-5.1, we derive the channel capacity and the rate distortion function of a point-to-point communication system between Alice and Bob supplied by a non-trusted Carrier. Alice sends a secret message by inserting it into a cover data under a power constraint, knowing that a correlated version of the cover data is accessible to Bob. Because of the non-secure transmission, Alice does not share her original copy and transmits only the watermarked signal. From the Carrier's point of view, he wants to minimize his bandwidth while respecting a quality of service; he therefore wants to compress Alice's message, given that Bob shares his noisy copy at the decoding end. Our main contributions can be listed as follows:
• The analytic expression of the rate distortion function of the Carrier, in nats per channel use, is found to be

  R_{W|Ŝ}(D2) = (1/2) ln( D1/D2 + QK / ((Q+K) D2) )   for 0 < D2 < D1 + QK/(Q+K),
  R_{W|Ŝ}(D2) = 0                                     for D2 ≥ D1 + QK/(Q+K).   (2)

  (Both this expression and the capacity (3) below are evaluated in the sketch after this list.)
• The embedding capacity of the overall system, in nats per channel use, is found to be

  C = (1/2) ln( 1 + D1(D1 + Q − D2) / (D2(D1 + Q)) ).   (3)
Figure 7: Data Hiding + Source Coding Scheme.
• A practical code design for the Gaussian case is proposed using the concatenation of the systems proposed in Chapter-2 and Chapter-3: a data hiding method using the superposition of a source code and a channel code, and a compression method using DSC principles, are combined in a single system. The decoding is done with belief propagation iterations for decompressing the watermarked signal, and with BCJR-belief propagation iterations for extracting the hidden mark. The performance of the system is compared with the theoretical bounds derived in this chapter.

• A toy example for the discrete case is proposed, and a performance analysis is given without analyzing the theoretical limits.
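The closed forms (2) and (3) are straightforward to evaluate; the following minimal sketch (hypothetical parameter values) also checks that the Carrier's rate in (2) vanishes continuously at D2 = D1 + QK/(Q + K):

    from math import log

    def carrier_rate(D2, D1, Q, K):
        """Rate-distortion function of Equation (2), nats per channel use."""
        ceiling = D1 + Q*K / (Q + K)        # distortion beyond which R = 0
        if D2 >= ceiling:
            return 0.0
        return 0.5 * log(D1/D2 + Q*K / ((Q + K) * D2))

    def embedding_capacity(D1, D2, Q):
        """Embedding capacity of Equation (3), nats per channel use."""
        return 0.5 * log(1 + D1*(D1 + Q - D2) / (D2*(D1 + Q)))

    D1, Q, K = 0.1, 1.0, 1.0                # hypothetical operating point
    ceiling = D1 + Q*K / (Q + K)
    print(carrier_rate(0.05, D1, Q, K))     # positive rate for D2 < ceiling
    print(carrier_rate(0.999 * ceiling, D1, Q, K))  # ~0: continuous at ceiling
    print(embedding_capacity(D1, 0.05, Q))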
Organization of the Dissertation
This dissertation consists of two parts and five chapters. Part-I, titled "Problem Statement and Preliminaries", includes a general introduction and a chapter introducing preliminary notions such as the definitions of information theoretic elements, basic source and channel coding concepts, and the details of the two powerful coding techniques used in this dissertation: Trellis Coded Quantization (TCQ) and Low Density Parity Check (LDPC) codes (Chapter-1).
Part-II is dedicated to the contributions of this dissertation and contains four chapters. In Chapter-2, we review the theoretical background of source coding with side information, such as the Slepian-Wolf and Wyner-Ziv theorems, and survey state-of-the-art Distributed Source Coding implementations in the literature. We then introduce our practical code design for the Slepian-Wolf problem, based on LDPC codes for discrete alphabet input, and compare this scheme with existing systems. Finally, we present the extension of our practical code design to still-image compression, using LDPC codes for binning and the low frequency DWT coefficients as the side information available to the decoder.
In Chapter-3, we give the theoretical background of channel coding with side information, such as the Gel'fand-Pinsker theorem and Costa's "writing on dirty paper" setup, and describe existing informed data hiding implementations in the literature. We then present our proposed informed data hiding methods: a low embedding rate method on DWT coefficients of still images, and a high embedding rate method using the superposition of a good source code (TCQ) and a good channel code (LDPC).
In Chapter-4, we give our information theoretic contributions on channel coding with side information. We extend Costa's "channel coding with side information perfectly known to the encoder" setup to "channel coding with side information partially known to the encoder and partially known to the decoder (which need not be the same)". The maximum achievable rate is calculated for this setup. This global setup can be reduced to 6 different sub-cases, 4 of which are well-known setups and 2 of which are new. We analyze all 6 sub-cases and the general setup, and compare them with Costa's initial setup.
We then have all the ingredients to construct our final system in Chapter-5, which performs source-channel coding: hiding information within a host signal and compressing it using distributed compression techniques. The problem is formalized as point-to-point communication between Alice and Bob via a non-trusted carrier. Alice wants to hide some information into her original copy and send it through the carrier. The carrier wants to compress this watermarked data, and the only thing he has is a noisy copy of the original data shared by Bob at the receiver end. We derive the rate-distortion function of the Carrier and the capacity of the embedding system for the Gaussian input case. Surprisingly, the absence of Bob's noisy copy at the Carrier's encoder does not affect the rate-distortion function of the Carrier. Similarly, the absence of Alice's original copy at Bob's side does not affect the embedding capacity formula. After these theoretical findings, we propose a practical code design for the Gaussian case using the system proposed for DSC in Chapter-2 and the high rate embedding system proposed for IDH in Chapter-3. The chapter ends with a practical code proposition for the Binary Symmetric Channel.
The dependencies between the chapters can be seen in Figure-8.
Figure 8: Chapter dependencies of this dissertation. (Chapters: 1 Introduction and Preliminaries; 2 Distributed Source Coding; 3 Informed Watermarking; 4 Dirty Paper Coding with Partial State Information; 5 Data Hiding and Distributed Source Coding.)
Chapter 1
Preliminaries
Contents

1.1 Notations and Conventions
1.2 Entropy and Mutual Information
1.3 Causality
1.4 Source Coding
1.5 Channel Coding
1.6 Distributed Source Coding
1.7 Writing on Dirty Paper
1.8 Message Passing Algorithm
1.9 Trellis Coded Quantization (TCQ)
    1.9.1 Viterbi algorithm
    1.9.2 BCJR
1.10 Low Density Parity Check (LDPC) Codes
    1.10.1 Decoding with belief propagation
        1.10.1.1 Definitions
        1.10.1.2 Initialization
        1.10.1.3 Check node iteration
        1.10.1.4 Variable node iteration
        1.10.1.5 Final Guess
    1.10.2 Encoding
    1.10.3 Performance of 1/2 LDPC codes
    1.10.4 A visual example
1.11 Conclusion
In this chapter, we introduce the notation used throughout this dissertation and define information theoretic quantities such as entropy, differential entropy and mutual information. After a brief explanation of source coding and channel coding limits in terms of these entropy-related quantities, we give a practical source coding example with Trellis Coded Quantization (TCQ) and a channel coding example with LDPC coding. The decoding algorithms Viterbi decoding, BCJR decoding and belief propagation are also explained.
1.1 Notations and Conventions
Throughout this dissertation, we use standard concepts and results from information theory, which can be found, for example, in Cover and Thomas (1991). Random variables are denoted by capital letters, the specific values they may take by the corresponding lower case letters, and sets by calligraphic letters. Similarly, random vectors, their realizations, and their alphabets are denoted, respectively, by boldface capital letters, boldface lowercase letters, and calligraphic letters subscripted by the corresponding dimension.
Thus, for example, X^n denotes a random n-vector (X1, ..., Xn), and x^n = (x1, ..., xn) is a specific vector value in X^n, the n-th Cartesian power of X, drawn independent and identically distributed (i.i.d.). For a pair of discrete random variables (X, Y) with a joint distribution p(x, y), the entropy of X is denoted by H(X), the conditional entropy of X given Y by H(X|Y), the joint entropy by H(X, Y), and the mutual information by I(X; Y). A more detailed description of the entropy-related quantities can be found in Chapter-1.2.
A distortion measure d is a mapping from the set X × Y into the set of nonnegative reals, d : X × Y → R+. Two distortion functions used in this chapter are:

• The Hamming (probability of error) distortion, given by

  d(x, y) = 0 if x = y,  and  d(x, y) = 1 if x ≠ y,   (1.1)

  which corresponds to the probability of error distortion, since E d(X, Y) = Pr(X ≠ Y).

• The squared error distortion, given by

  d(x, y) = (x − y)².   (1.2)
The distortion between two sequences x, y of length n is given by:

  d(x, y) = (1/n) Σ_{i=1}^{n} d(x_i, y_i)   (1.3)
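For illustration, Equations (1.1)-(1.3) translate directly into a minimal Python sketch (the function names are ours):

    import numpy as np

    # Per-sequence distortion, Eq. (1.3): the average per-sample distortion.
    def hamming_distortion(x, y):
        x, y = np.asarray(x), np.asarray(y)
        return np.mean(x != y)              # average of Eq. (1.1); = Pr(X != Y)

    def squared_error_distortion(x, y):
        x, y = np.asarray(x), np.asarray(y)
        return np.mean((x - y) ** 2)        # average of Eq. (1.2), i.e. the MSE

    print(hamming_distortion([0, 1, 1, 0], [0, 1, 0, 0]))    # 0.25
    print(squared_error_distortion([0.0, 1.0], [0.5, 1.0]))  # 0.125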
1.2 Entropy and Mutual Information
Entropy is one of the key elements of information theory. Borrowed from thermodynamics, it is known as the uncertainty of a random variable. It is measured in nats (natural log base) or in bits (log2 base). Before defining entropy, we introduce the Shannon information content.
Assume a discrete random variable X drawn from a finite set, x ∈ X, with probability mass function p(x) = Pr{X = x}.
Definition 1.1 The information content of an outcome x is defined to be

  i(x) = log2(1/p(x)) = − log2(p(x)).   (1.4)
Definition 1.2 Entropy is defined to be the average Shannon information content of an outcome:

  H(X) ≡ E{i(x)} = − Σ_{x∈X} p(x) log2(p(x)),   (1.5)

with the convention that for p(x) = 0, p(x) log2(p(x)) ≡ 0, since lim_{θ→0+}(θ log2(θ)) = 0.
Now we introduce joint and conditional entropy, and mutual information.
Definition 1.3 The joint entropy H(X, Y) of a pair of discrete r.v. X, Y drawn from (x, y) ∈ X × Y, with joint probability mass function p(x, y) = Pr{X = x, Y = y}, is

  H(X, Y) = − Σ_{x∈X} Σ_{y∈Y} p(x, y) log2(p(x, y)).   (1.6)
Definition 1.4 If (x, y) ∼ p(x, y), then the conditional entropy H(Y|X) is

  H(Y|X) = − Σ_{x∈X} p(x) H(Y|X = x)
         = − Σ_{x∈X} p(x) Σ_{y∈Y} p(y|x) log2(p(y|x))
         = − Σ_{x∈X} Σ_{y∈Y} p(x, y) log2(p(y|x)).   (1.7)
The relation between the joint and conditional entropy can be expressed as:

  H(X, Y) = H(X) + H(Y|X)   (1.8)
          = H(Y) + H(X|Y).   (1.9)
Definition 1.5 The relative entropy or Kullback-Leibler divergence between two probability mass functions p(x) and q(x) is

  D(p‖q) = Σ_{x∈X} p(x) log2( p(x) / q(x) ).   (1.10)
Definition 1.6 Given (x, y) ∼ p(x, y), the mutual information I(X; Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y):

  I(X; Y) = D(p(x, y) ‖ p(x)p(y)) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log2( p(x, y) / (p(x)p(y)) ).   (1.11)
Some of the relationships between entropy and mutual information are:

  I(X; X) = H(X),   (1.12)
  I(X; Y) = H(X) + H(Y) − H(X, Y),   (1.13)
  I(X; Y) = H(X) − H(X|Y),   (1.14)
  I(X; Y) = H(Y) − H(Y|X),   (1.15)
  I(X; Y) = I(Y; X).   (1.16)
Figure 1.1: The Venn diagram of the relationship between entropy and mutual information.
The Venn diagram shown in Figure-1.1 expresses the relationship between H(X),
H(Y ), H(X, Y ), H(X|Y ), H(Y |X) and I(X; Y ).
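The identities (1.12)-(1.16) are easy to verify numerically; the following minimal sketch uses a small, hypothetical joint pmf p(x, y):

    import numpy as np

    p_xy = np.array([[0.3, 0.2],
                     [0.1, 0.4]])           # p(x, y); rows: x, columns: y

    def H(p):
        """Entropy in bits of a pmf array; zero entries contribute 0."""
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    H_X = H(p_xy.sum(axis=1))               # marginal entropy H(X)
    H_Y = H(p_xy.sum(axis=0))               # marginal entropy H(Y)
    H_XY = H(p_xy)                          # joint entropy H(X, Y)
    I_XY = H_X + H_Y - H_XY                 # Eq. (1.13)

    assert np.isclose(I_XY, H_X - (H_XY - H_Y))  # Eq. (1.14), via Eq. (1.9)
    assert np.isclose(I_XY, H_Y - (H_XY - H_X))  # Eq. (1.15)
    print(f"H(X)={H_X:.4f}  H(Y)={H_Y:.4f}  I(X;Y)={I_XY:.4f} bits")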
In this dissertation, we also use the entropy and mutual information of more than two random variables. We now define the chain rules needed to calculate entropy-related functions for more than two random variables.
Definition 1.7 (Chain rule for entropy) Let the random variables X1, X2, ..., Xn be drawn according to p(x1, x2, ..., xn). Then

  H(X1, X2, ..., Xn) = Σ_{i=1}^{n} H(Xi | Xi−1, ..., X1).   (1.17)
Definition 1.8 The conditional mutual information of random variables X and Y given Z is

  I(X; Y|Z) = H(X|Z) − H(X|Y, Z).   (1.18)
Definition 1.9 (Chain rule for mutual information):

  I(X1, X2, ..., Xn; Y) = Σ_{i=1}^{n} I(Xi; Y | Xi−1, ..., X1).   (1.19)
The information content definition can be extended to continuous random variables drawn from an infinite set. Let X be a continuous r.v. with probability density function f(x) and support set S.

Definition 1.10 The differential entropy is defined to be

  h(X) = ∫_S f(x) log2( 1/f(x) ) dx.   (1.20)
Below, we give numerical examples for entropy and differential entropy of several
probability distribution functions.
Example 1.1 (Binary distribution) The entropy of a r.v. X drawn from the finite set {0, 1} with p(0) = a and p(1) = 1 − a is:

  H(X) = a log2(1/a) + (1 − a) log2(1/(1 − a)) ≜ H(a),   (1.21)

where 0 ≤ a ≤ 1.
The graph of H(a) versus a is shown in Figure-1.2. Note that H(a) is maximized for a = 1/2.
Example 1.2 (Uniform distribution) Consider a random variable distributed uniformly between 0 and a, as seen in Figure-1.3. Then its differential entropy is:

  h(X) = ∫_0^a (1/a) log2(a) dx = log2(a).   (1.22)
Figure 1.2: Binary entropy function H(a) versus a.
Figure 1.3: Uniform density function p(x) versus x where p(x) = 1/a for 0 ≤ x ≤ a.
Example 1.3 (Gaussian distribution) Consider a random variable with a Gaussian distribution X ∼ N(0, P), hence having probability density function f(x) = (1/√(2πP)) exp(−x²/(2P)). Then its differential entropy is:

  h(X) = − ∫_{−∞}^{∞} f(x) log2(f(x)) dx = (1/2) log2(2πeP).   (1.23)
Remark 1.1 Among all probability density functions with variance P, the Gaussian distribution has the greatest differential entropy.
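Equation (1.23) can be checked by numerical integration; a minimal sketch (the grid bounds and the value of P below are arbitrary choices):

    import numpy as np

    # h(X) for X ~ N(0, P) should equal 0.5*log2(2*pi*e*P) bits.
    P = 2.0
    x = np.linspace(-20.0, 20.0, 200001)
    f = np.exp(-x**2 / (2*P)) / np.sqrt(2*np.pi*P)
    numeric = -np.sum(f * np.log2(f)) * (x[1] - x[0])  # Riemann sum of -f log2 f
    closed_form = 0.5 * np.log2(2*np.pi*np.e*P)
    print(numeric, closed_form)                        # both ~2.547 bits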
1.3 Causality

A system is called causal if its output depends only on its past and present inputs. Otherwise, if the output also depends on future inputs, the system is defined to be noncausal. In this dissertation, we focus on noncausal systems.
1.4 Source Coding
Definition 1.11 We define the rate distortion function of a discrete memoryless system in Figure-1.4 with a fidelity criterion d(X, X̂) ≤ D as

  R(D) = min_{p(x̂|x): E{d(X,X̂)} ≤ D} I(X; X̂),   (1.24)

where the minimum is taken over all conditional distributions p(x̂|x) for which the joint distribution p(x, x̂) satisfies the expected distortion constraint.
Figure 1.4: A compression system.
Thus, the rate-distortion function gives the minimum rate R needed to compress the input with a maximum distortion level D.
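As a standard worked instance of Definition 1.11 (a textbook result, e.g. in Cover and Thomas (1991), not specific to this dissertation), the rate distortion function of a Bernoulli(p) source under the Hamming distortion of Equation (1.1) is R(D) = H(p) − H(D) for 0 ≤ D ≤ min(p, 1 − p):

    from math import log2

    def Hb(p):
        """Binary entropy function of Eq. (1.21), in bits."""
        return 0.0 if p in (0.0, 1.0) else -p*log2(p) - (1 - p)*log2(1 - p)

    def rate_distortion_bernoulli(p, D):
        """R(D) = H(p) - H(D) for a Bernoulli(p) source, Hamming distortion."""
        return 0.0 if D >= min(p, 1 - p) else Hb(p) - Hb(D)

    print(rate_distortion_bernoulli(0.5, 0.0))  # 1.0 bit: lossless needs H(p)
    print(rate_distortion_bernoulli(0.5, 0.1))  # ~0.531 bits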
1.5 Channel Coding
Definition 1.12 We define the channel capacity of a discrete memoryless system in Figure-1.5 as

  C = max_{p(x)} I(X; Y),   (1.25)
where the maximum is taken over all possible input distributions.
In operational terms, the channel capacity is the highest rate, in bits per channel use, at which information can be sent with arbitrarily low error probability.
Figure 1.5: A communication system.
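Two textbook instances of Definition 1.12 (standard results, given here only as illustrations) are the binary symmetric channel, C = 1 − H(ε), and the power-constrained AWGN channel, C = (1/2) log2(1 + P/N):

    from math import log2

    def Hb(p):
        """Binary entropy function, in bits."""
        return 0.0 if p in (0.0, 1.0) else -p*log2(p) - (1 - p)*log2(1 - p)

    def bsc_capacity(eps):
        """BSC with crossover eps: the uniform input maximizes I(X;Y)."""
        return 1.0 - Hb(eps)

    def awgn_capacity(P, N):
        """AWGN channel with signal power P and noise variance N."""
        return 0.5 * log2(1 + P / N)

    print(bsc_capacity(0.07))   # ~0.634 bits/channel use
    print(awgn_capacity(1, 1))  # 0.5 bits/channel use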
1.6 Distributed Source Coding
In the area of compression of correlated multi-sources, Slepian and Wolf (1973) showed that separate encoding of each source with joint decoding at the receiving end incurs no rate loss with respect to a joint encoding and joint decoding system. Even with separate encoding, the joint decoder can exploit the correlation between the sources. The idea is that each separate encoder partitions the possible inputs into random subsets and sends only the index of the subset (known as the syndrome), and the decoder uses channel coding principles in order to estimate the sources from their syndromes. Motivated by the idea of developing low complexity encoders for low-power handhelds, in this dissertation we propose to transmit the parity checks of a high performance error correcting code such as LDPC for the separate compression of correlated sources.
1.7 Writing on Dirty Paper
Costa (1983) introduced the terminology “Writing on Dirty Paper” for the problem of coding with state information at the encoder. The encoder can communicate with the decoder using a signal X with a limited power P, and tries to send a message M given that the state information S of the channel is accessible only to the encoder. Costa showed that the non-availability of the state information to the decoder does not affect the capacity of the system. The state information is viewed as dirt. Instead of canceling out this dirt with its limited power P, the encoder can spend its power in the direction of the dirt, and can achieve the same capacity as when the state information is accessible to the decoder. An auxiliary variable U is used for encoding, such that U = X + αS, where X is the output of the encoder and α is a multiplicative constant between 0 and 1. If α is chosen as P/(P + N), the rate of the communication is maximized. It is then enough to send the appropriate X by moving αS to the closest U value that is indexed by the message M.
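The following toy sketch illustrates the role of α = P/(P + N) with a scalar lattice codebook. This is our own simplified, quantization-based illustration (in the spirit of scalar Costa schemes), not Costa's random-coding construction, and every name and constant in it is an assumption.

```python
import numpy as np

# Toy scalar "dirty paper" sketch (our own illustration): one bit per
# sample is hidden in a scalar lattice; the encoder pre-cancels only the
# fraction alpha = P/(P+N) of the known interference S, and the decoder
# never sees S.
rng = np.random.default_rng(1)
n = 100_000
P, N = 1.0, 0.1                       # embedding power, attack noise power
S = rng.normal(0, 10.0, n)            # strong interference known to encoder
alpha = P / (P + N)                   # Costa's optimal scaling
DELTA = np.sqrt(12 * P)               # lattice step so that E[X^2] ~ P

m = rng.integers(0, 2, n)             # message bits; coset shift DELTA/2
shift = m * (DELTA / 2)
U = np.round((alpha * S - shift) / DELTA) * DELTA + shift  # closest U to alpha*S
X = U - alpha * S                     # transmitted signal, |X| <= DELTA/2
Y = X + S + rng.normal(0, np.sqrt(N), n)                   # channel output

def coset_distance(v, s):             # distance of v to the coset with shift s
    return np.abs(v - (np.round((v - s) / DELTA) * DELTA + s))

d0 = coset_distance(alpha * Y, 0.0)
d1 = coset_distance(alpha * Y, DELTA / 2)
m_hat = (d1 < d0).astype(int)
print("E[X^2] ~", X.var().round(3), " bit error rate:", (m_hat != m).mean())
```

Even though the dirt S is ten times stronger than the embedding signal X, the message survives; in practice such per-sample decisions would be further protected by an outer channel code.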
Chen and Wornell (1998) and Cox et al (1999) were the first to realize that this setup can be used to determine the capacity of the blind watermarking problem. In this dissertation, highly motivated by this setup, we investigate the capacity of a system where the state information is partially available to the decoder. Moreover, practical code designs for writing on dirty paper will be proposed.
1.8 Message Passing Algorithm
Message Passing is a simple and powerful algorithm that is used to resolve diverse research problems, from counting problems to marginalization problems. Since it is fundamental to belief propagation for LDPC decoding and to the BCJR algorithm for turbo codes, we illustrate it with a simple counting problem on a straight line (See Figure-1.6). Instead of dedicating one person to count the whole group, the head and the tail of the line each send a message to their neighbors by simply saying 1. If a person receives a message from only one of his neighbors, he adds 1 to the message and transmits it to his other-side neighbor. If he receives the messages of both of his two neighbors, the total count of the line can be found as (left message + right message + 1). If there exists no loop, the message passing algorithm converges to the exact solution. The details of the BCJR algorithm (backward-forward) and of belief propagation are given in the following sections.
Figure 1.6: Counting problem on a straight line.
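The counting example can be written out in a few lines; the sketch below is only an illustration of the two messages travelling along the line.

```python
# Message passing count on a line of n people (illustrative sketch).
n = 7
left = [0] * n     # left[i]: message person i receives from the left
right = [0] * n    # right[i]: message person i receives from the right
for i in range(1, n):
    left[i] = left[i - 1] + 1        # "everyone left of you, plus one"
for i in range(n - 2, -1, -1):
    right[i] = right[i + 1] + 1
# every person can now compute the total head count locally:
print([left[i] + right[i] + 1 for i in range(n)])   # [7, 7, 7, 7, 7, 7, 7]
```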
1.9 Trellis Coded Quantization (TCQ)
Trellis Coded Quantization (TCQ) is a limit-achieving vector quantization method proposed by Marcellin and Fischer (1990). It uses Ungerboeck (1982)'s set partitioning idea from Trellis Coded Modulation (TCM). Let X^n be a random n-vector (X1, ..., Xn) whose elements are i.i.d. with probability density function p(X). We want to quantize this vector at m bits per sample, hence to transmit one of 2^m symbols per sample. The basic idea of TCQ is that the quantized data Y^n constitutes a Markov sequence which is viewed as if it were transmitted through a noisy channel, with X^n as the channel output. The aim is to find the sequence Y^n that is most probable given X^n. First, the possible symbols are doubled to 2^(m+1), then partitioned into 2^(k+1) subsets, where k ≤ m. TCQ uses a rate k/(k+1) convolutional code to expand the k input bits to k+1 bits that select one of the 2^(k+1) subsets, and uses the remaining m − k bits to select one of the 2^(m−k) symbols in the selected subset. Then, by minimizing the MSE between X^n and the possible sequences Ŷ^n, Y^n is found. Using a convolutional code together with set partitioning gives better performance than conventional techniques. The min-sum algorithm, also known as the Viterbi algorithm, can be applied to find the most probable sequence Y^n.
Now we will give a brief example of a trellis and a TCQ example using a rate-1/2 convolutional code with memory 2. A systematic recursive convolutional code with generator matrix (011, 101) in binary form, or (3, 5) in octal digits, can be seen in Figure-1.7. The blocks D are unit-time delay elements.
The convolutional code in Figure-1.7 corresponds to the state diagram in Figure-1.8, where the states are the 2-bit memory contents and the state transitions are described by arrows marked with the corresponding input/output pairs ik/y1,k y0,k. The output sequences are mapped to 4 reconstruction levels D0, D1, D2 and D3.
For instance, for 2 bits per sample, the reconstruction levels are doubled as seen in Figure-1.9. For time instant i and each possible subset Dk, where 0 ≤ k ≤ 3, the MSE cost (Xi − Dk)² of selecting the closest element within the subset is calculated. Starting from state 00 at time 0, the one-bit input chooses one of the 2 possible subsets. Then, using the Viterbi algorithm as explained in Chapter-1.9.1, the most probable path p1^n is found. When at time t the path pt selects one of the four dictionaries Dk through the convolutional code, one extra bit is needed to choose the index within the sub-dictionary of Dk.
Figure 1.7: A rate-1/2 recursive systematic convolutional code with memory 2 and generator matrix (3, 5) in octal digits.
1.9.1 Viterbi algorithm
The Viterbi algorithm, also known as the min-sum algorithm (Viterbi, 1967), finds the most probable sequence among the valid codewords. For all time instants t = 1, .., n and all possible output levels k = 1, .., 4, the MSE cost (Xt − Dk)² of each output is calculated. Initializing the cost of state 0 at t = 0 as 0 and the costs of the other states as ∞, each node transmits the current state cost plus the cost of the chosen arc. In the next step, each node chooses the minimum cost message among the messages it receives and sends it to the next time step. At the end, the minimum cost over all codewords is found. Finally, the most probable path, which minimizes the word error, can be found by back-tracing the minimum-sum path. For instance, Figure-1.10 shows all the possible paths of a trellis with length 4. The Viterbi algorithm searches the minimum cost path among all possible paths.
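A compact min-sum search over a small trellis is sketched below. The 4-state transition table is a hypothetical stand-in (it is not the exact machine of Figure-1.7), and each subset Dk is reduced to a single reconstruction level; only the dynamic-programming structure matters here.

```python
import numpy as np

# Min-sum (Viterbi) search over a small trellis, as described above.
# trans[state][input] = (next_state, subset_index); this table is made up.
trans = {0: {0: (0, 0), 1: (2, 3)},
         1: {0: (0, 2), 1: (2, 1)},
         2: {0: (1, 0), 1: (3, 3)},
         3: {0: (1, 2), 1: (3, 1)}}
levels = np.array([-1.5, -0.5, 0.5, 1.5])  # one reconstruction level per subset

def viterbi_tcq(x):
    INF = float("inf")
    cost = {s: (0.0 if s == 0 else INF) for s in trans}   # start in state 0
    back = []
    for t in range(len(x)):
        new_cost = {s: INF for s in trans}
        choice = {}
        for s, c in cost.items():
            if c == INF:
                continue
            for u, (s2, d) in trans[s].items():
                c2 = c + (x[t] - levels[d]) ** 2          # MSE branch metric
                if c2 < new_cost[s2]:
                    new_cost[s2] = c2
                    choice[s2] = (s, d)
        back.append(choice)
        cost = new_cost
    s = min(cost, key=cost.get)        # cheapest final state
    path = []
    for choice in reversed(back):      # back-trace the minimum-sum path
        s, d = choice[s]
        path.append(d)
    return list(reversed(path))

print(viterbi_tcq(np.array([0.9, -1.2, 0.1, 1.4])))  # winning subset indices
```

In a full TCQ, each Dk would contain 2^(m−k) codewords, and the extra bits selecting the codeword inside the winning subset would be recorded along the surviving path.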
1.9.2 BCJR
While the Viterbi algorithm is a maximum-likelihood decoding method which minimizes the probability of the sequence (word) error, Bahl et al (1974) proposed an algorithm, also known as BCJR, which minimizes the symbol error probability. Borrowing from the message passing algorithm, BCJR calculates the probability of a symbol given the observed sequence.
The state transitions of the Markov source are governed by the transition probabilities

    p_t(m|m′) = Pr{S_t = m | S_{t−1} = m′},
Figure 1.8: State transitions of the recursive systematic convolutional code (3, 5) in octal digits.
Figure 1.9: Output points and corresponding partitions for 2 bits per sample.
and the outputs by the probabilities

    q_t(X|m′, m) = Pr{x_t = X | S_t = m, S_{t−1} = m′},

for 1 ≤ t ≤ τ. Since the output is deterministic given the previous and the current state, the term q_t(X|m′, m) takes only the values 0 or 1, depending on whether that transition is possible. The decoder receives the sequence Y_1^τ and tries to estimate the a posteriori transition probabilities given the observation Y_1^τ, i.e.

    Pr{S_{t−1} = m′; S_t = m | Y_1^τ} = Pr{S_{t−1} = m′; S_t = m; Y_1^τ} / Pr{Y_1^τ}.   (1.26)

For this purpose, it is more convenient to estimate the quantity σ_t(m′, m) = Pr{S_{t−1} = m′; S_t = m; Y_1^τ}.
Figure 1.10: Viterbi decoding of a vector with length 4.
Let us define the probability functions

    α_t(m) = Pr{S_t = m; Y_1^t}
    β_t(m) = Pr{Y_{t+1}^τ | S_t = m}
    γ_t(m′, m) = Pr{S_t = m; Y_t | S_{t−1} = m′}
    σ_t(m′, m) = Pr{S_{t−1} = m′; S_t = m; Y_1^τ} = α_{t−1}(m′) · γ_t(m′, m) · β_t(m).
Now

    γ_t(m′, m) = Σ_{U=0}^{1} Pr{S_t = m | S_{t−1} = m′} · Pr{u_t = U | S_t = m, S_{t−1} = m′} · Pr{Y_t | U}
               = Σ_{U=0}^{1} p_t(m|m′) · q_t(U|m′, m) · R(Y_t|U)   (1.27)

is calculated for each possible transition and for t = 1, 2, .., τ, where R(Y_t|U) is the appropriate symbol transition probability of the channel.
Then, for t = 1, 2, .., τ,

    α_t(m) = Σ_{m′=0}^{M−1} Pr{S_{t−1} = m′; S_t = m; Y_1^t}
           = Σ_{m′=0}^{M−1} Pr{S_{t−1} = m′; Y_1^{t−1}} · Pr{S_t = m; Y_t | S_{t−1} = m′, Y_1^{t−1}}
           = Σ_{m′=0}^{M−1} Pr{S_{t−1} = m′; Y_1^{t−1}} · Pr{S_t = m; Y_t | S_{t−1} = m′}
           = Σ_{m′=0}^{M−1} α_{t−1}(m′) · γ_t(m′, m).   (1.28)
The boundary conditions of α_0(m) at t = 0 are

    α_0(0) = 1;   α_0(m) = 0 for m ≠ 0.   (1.29)
Similarly, for t = 1, 2, .., τ − 1,

    β_t(m) = Σ_{m′=0}^{M−1} Pr{S_{t+1} = m′; Y_{t+1}^τ | S_t = m}
           = Σ_{m′=0}^{M−1} Pr{S_{t+1} = m′; Y_{t+1} | S_t = m} · Pr{Y_{t+2}^τ | S_{t+1} = m′}
           = Σ_{m′=0}^{M−1} γ_{t+1}(m, m′) · β_{t+1}(m′).   (1.30)
The boundary condition for β_τ is

    β_τ(m) = 1/M,   (1.31)

since the termination state probability is equally distributed over all possible M states.
Finally, σ is calculated as

    σ_t(m′, m) = Pr{S_{t−1} = m′; Y_1^{t−1}} · Pr{S_t = m; Y_t | S_{t−1} = m′} · Pr{Y_{t+1}^τ | S_t = m}
               = α_{t−1}(m′) · γ_t(m′, m) · β_t(m).   (1.32)
The recursive calculation of σ_t(m′, m) can be done in the 4 steps given below.
1. Initialization of α_0(m) and β_τ(m).
2. Calculation of γ_t(m′, m) and α_t(m) for all t = 1, 2, .., τ and for all possible transitions.
3. Recursive computation of β_t(m).
4. Computation of σ_t(m′, m).
The pseudo-code of the BCJR algorithm can be found in Algorithm-1.
1.10 Low Density Parity Check (LDPC) Codes
Low Density Parity Check (LDPC) codes were first proposed by Gallager (1963) and reinvented by Mackay and Neal (1997). A rate-k/n linear binary (n, k) LDPC code is a block code that is defined by an (n − k) × n sparse parity check matrix H, which has a small number of 1s in each row and column (see for instance Equation 1.33). Another representation of a parity check code is its bipartite graph (See Figure 1.11) Mackay (2003).
Algorithm 1 BCJR Algorithm
Require: The received vector Y τ = {Y1 , Y2 , .., Yτ }
Ensure: σt (m′ , m)
Initialize α0 (m) and βτ (m) according to Equations-1.29 and 1.31
while t : 1 ≤ t ≤ τ do
calculate γt (m′ , m) as in Equation-1.27
calculate αt (m) as in Equation-1.28
end while
while t : τ − 1 ≥ t ≥ 1 do
calculate βt (m) as in Equation-1.30
end while
while t : 1 ≤ t ≤ τ do
calculate σt (m′ , m) as in Equation-1.32
end while
Return σt (m′ , m).
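A direct translation of Algorithm-1 into code is sketched below, assuming the γ_t(m′, m) values of Equation-1.27 have already been computed from the trellis and the channel; the per-step rescaling is an implementation detail for numerical stability, not part of the algorithm above.

```python
import numpy as np

# Compact BCJR (forward-backward) sketch following Equations 1.27-1.32.
# gamma[t, m_prev, m] is assumed given, since it depends on the trellis
# and the channel likelihoods R(Y_t|U).
def bcjr_sigma(gamma):
    tau, M, _ = gamma.shape
    alpha = np.zeros((tau + 1, M))
    beta = np.zeros((tau + 1, M))
    alpha[0, 0] = 1.0                    # boundary condition (1.29)
    beta[tau, :] = 1.0 / M               # boundary condition (1.31)
    for t in range(1, tau + 1):          # forward recursion (1.28)
        alpha[t] = alpha[t - 1] @ gamma[t - 1]
        alpha[t] /= alpha[t].sum()       # rescaling for numerical stability
    for t in range(tau - 1, -1, -1):     # backward recursion (1.30)
        beta[t] = gamma[t] @ beta[t + 1]
        beta[t] /= beta[t].sum()
    # sigma_t(m', m) = alpha_{t-1}(m') * gamma_t(m', m) * beta_t(m)  (1.32)
    sigma = alpha[:-1, :, None] * gamma * beta[1:, None, :]
    return sigma / sigma.sum(axis=(1, 2), keepdims=True)

# Tiny 2-state toy run with arbitrary (made-up) gamma values:
g = np.array([[[0.6, 0.1], [0.2, 0.1]],
              [[0.3, 0.3], [0.1, 0.3]]])
print(bcjr_sigma(g).shape)   # (2, 2, 2): per-step transition posteriors
```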
    H = [ 1 0 1 0 1 0 1 0 0 0
          1 1 0 1 0 1 0 0 0 0
          0 1 0 0 1 0 0 0 1 1
          0 0 1 1 0 0 0 1 1 0
          0 0 0 0 0 1 1 1 0 1 ].   (1.33)
An ensemble of LDPC codes is described by the degree distribution polynomials λ(x) and ρ(x) Richardson et al (2001); Chung (2000). λ(x) is given as

    λ(x) = Σ_i λ_i x^(i−1),   (1.34)

and ρ(x) is defined as

    ρ(x) = Σ_j ρ_j x^(j−1),   (1.35)

where λ_i is the fraction of edges that are incident on degree-i bit nodes and ρ_j is the fraction of edges that are incident on degree-j check nodes. A code is said to be regular (wc, wr) if the degree polynomials are λ(x) = x^(wc−1) and ρ(x) = x^(wr−1). The rate of the LDPC code for a given pair of degree profiles is bounded by

    R ≥ 1 − ( ∫_0^1 ρ(x) dx ) / ( ∫_0^1 λ(x) dx ),   (1.36)

with equality if and only if the rows of the parity check matrix are linearly independent.
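The bound of Equation-1.36 is easy to evaluate numerically, since ∫_0^1 x^(i−1) dx = 1/i. A minimal sketch follows; the dictionary representation of the degree profiles is our own choice.

```python
# Rate bound (1.36) from the degree profiles: R >= 1 - (int rho)/(int lam).
# lam[i], rho[j] hold the edge fractions lambda_i and rho_j.
def ldpc_rate_bound(lam, rho):
    int_lam = sum(l / i for i, l in lam.items())   # integral of lambda(x)
    int_rho = sum(r / j for j, r in rho.items())   # integral of rho(x)
    return 1.0 - int_rho / int_lam

# Regular (3, 6) code: lam(x) = x^2, rho(x) = x^5 -> R >= 1 - (1/6)/(1/3)
print(ldpc_rate_bound({3: 1.0}, {6: 1.0}))   # 0.5
```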
Figure 1.11: Bipartite graph representation of the parity check matrix H.
1.10.1 Decoding with belief propagation
The transmitter sends a codeword x such that H · x^t = 0. The receiver receives the vector y with a transition probability p(y|x). The aim of the decoder is to find the maximum likelihood codeword x_ML = arg max_x p(y|x).
If the graph of H does not include cycles, the sum-product algorithm converges to the exact solution Pearl (1988). The sum-product algorithm on the bipartite graph is given below. We use the following notation, also shown in Figure 1.12.
1.10.1.1 Definitions
• The set of bits n that participate in check m is N(m) ≡ {n : Hmn = 1}. For example, N(1) ≡ {1, 3, 5, 7} in Figure 1.11.
• The set of checks in which bit n participates is M(n) ≡ {m : Hmn = 1}. For example, M(1) ≡ {1, 2} in Figure 1.11.
Figure 1.12: Belief propagation on bipartite graph H.
• N(m)\n is the set N(m) with bit n excluded.
• q^x_mn is the probability that the n-th bit of the vector x equals x, given the information obtained via checks other than check m.
• r^x_mn is the probability that check m is satisfied if bit n of x is considered fixed at x and the other bits follow q_mn′ : n′ ∈ N(m)\n.
• δq_mn = q^0_mn − q^1_mn is the difference between the probabilities that the n-th bit of x is 0 and that it is 1, given the information obtained via checks other than check m.
• δr_mn = r^0_mn − r^1_mn is the probability that check m is satisfied if bit n is 0, minus that probability if bit n is 1, given the same information.
1.10.1.2 Initialization
Depending on the vector y received from the channel and on the channel model, the likelihood probability p(x_n|y) of each bit n is calculated. For instance, for a memoryless binary symmetric channel with crossover probability ρ, p(x_1 = 0|y_1 = 0) = (1 − ρ) and p(x_1 = 1|y_1 = 0) = ρ.
The q^0_mn and q^1_mn values are initialized with the corresponding likelihood probabilities received from the channel, such that q^0_mn = p(x_n = 0|y) and q^1_mn = p(x_n = 1|y). Then each variable node sends the message δq_mn to its connected checks.
1.10.1.3 Check node iteration
Each check node sends a message r^a_ij to the connecting bit j, which is an approximation to the probability that check i is satisfied given that symbol j equals a:

    r^a_ij = Pr{check i is satisfied | x_j = a},   (1.37)

The r^0_mn value is

    r^0_mn ≈ Σ_{x_n′: n′∈N(m)\n} p( Σ_{z∈N(m)} x_z = 0 mod 2 | x_n = 0 ) · Π_{n′∈N(m)\n} q^{x_n′}_mn′.   (1.38)

There is a shortcut for calculating r^a_ij by first calculating δr_mn:

    δr_mn = Π_{n′∈N(m)\n} δq_mn′,   (1.39)

where r^0_mn = (1/2)(1 + δr_mn) and r^1_mn = (1/2)(1 − δr_mn). The δr_mn can be calculated efficiently by using the backward-forward algorithm Bahl et al (1974).
1.10.1.4 Variable node iteration
In this step, the q^0_mn and q^1_mn values are calculated using the output of the check node iteration:

    q^0_mn = α_mn p(x_n = 0|y) Π_{m′∈M(n)\m} r^0_m′n,   (1.40)

and

    q^1_mn = α_mn p(x_n = 1|y) Π_{m′∈M(n)\m} r^1_m′n,   (1.41)

where α_mn is a normalization factor such that q^0_mn + q^1_mn = 1.
1.10.1.5 Final Guess
The posterior probabilities of each bit can be calculated as

    q^0_n = α_n p(x_n = 0|y) Π_{m∈M(n)} r^0_mn,   (1.42)

and

    q^1_n = α_n p(x_n = 1|y) Π_{m∈M(n)} r^1_mn.   (1.43)
The estimate x̂ can be found by thresholding the posterior probabilities:

    x̂_n = arg max_i q^i_n.   (1.44)

For codeword decoding, we can check whether all the check nodes are satisfied, i.e. Hx̂ = 0 mod 2. If x̂ is not a codeword, the check-node and variable-node iterations are repeated. The iterations halt either when a codeword is found or when a maximum number of iterations is reached.
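The full iteration loop of Chapters-1.10.1.2 to 1.10.1.5 is summarized in the sketch below for a binary symmetric channel, written in the δ (difference) form of the messages. It is a minimal, unoptimized illustration for small dense H matrices, not a production decoder.

```python
import numpy as np

# Sum-product decoding over a BSC, following Equations 1.39-1.44.
def bp_decode(H, y, rho, max_iter=50):
    m, n = H.shape
    p1 = np.where(y == 1, 1 - rho, rho)        # p(x_n = 1 | y_n) for a BSC
    dq = np.where(H == 1, 1 - 2 * p1, 0.0)     # delta-q init from the channel
    for _ in range(max_iter):
        # Check-node update (1.39): leave-one-out products of delta-q
        dr = np.zeros_like(dq)
        for mm in range(m):
            idx = np.flatnonzero(H[mm])
            for nn in idx:
                dr[mm, nn] = np.prod(dq[mm, idx[idx != nn]])
        r0, r1 = (1 + dr) / 2, (1 - dr) / 2
        # Variable-node update (1.40)-(1.41), re-expressed as delta-q
        for nn in range(n):
            idx = np.flatnonzero(H[:, nn])
            for mm in idx:
                others = idx[idx != mm]
                q0 = (1 - p1[nn]) * np.prod(r0[others, nn])
                q1 = p1[nn] * np.prod(r1[others, nn])
                dq[mm, nn] = (q0 - q1) / (q0 + q1)
        # Final guess (1.42)-(1.44) and stopping rule
        Q0 = (1 - p1) * np.prod(np.where(H == 1, r0, 1.0), axis=0)
        Q1 = p1 * np.prod(np.where(H == 1, r1, 1.0), axis=0)
        x_hat = (Q1 > Q0).astype(int)
        if not np.any(H @ x_hat % 2):          # all checks satisfied
            break
    return x_hat

H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])
y = np.array([1, 0, 0, 1, 1, 0])   # received word with a possible bit flip
print(bp_decode(H, y, rho=0.07))   # expected to correct the flipped bit
```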
1.10.2 Encoding
Assume an (n − k) × n sparse parity check matrix H in systematic form, such that H = [P | I_(n−k)], where P has dimension (n − k) × k and I is the identity matrix. Then the corresponding generator matrix G is simply an n × k dense matrix of the form G = [I_k | P^t]^t. Hence, from a k-bit input vector t, a length-n codeword vector x can be calculated by a simple matrix product, x = G · t (mod 2). The method of Richardson and Urbanke (2001b) can be used for fast encoding of LDPC codes.
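For the systematic form, encoding reduces to one sparse product; a minimal sketch with an arbitrary toy P follows (the fast-encoding method of Richardson and Urbanke is not reproduced here).

```python
import numpy as np

# Systematic encoding sketch: H = [P | I] implies the codeword [t ; P t]
# (all arithmetic mod 2), so x satisfies H x = 0. P below is arbitrary.
P = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])            # (n-k) x k, here k = 3, n = 6
H = np.hstack([P, np.eye(3, dtype=int)])
t = np.array([1, 0, 1])              # k information bits
x = np.concatenate([t, P @ t % 2])   # systematic bits followed by parities
print(H @ x % 2)                     # [0 0 0]: a valid codeword
```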
1.10.3 Performance of rate-1/2 LDPC codes
In this part, we evaluate the error correcting capacity of rate-1/2 binary LDPC codes for various block lengths and degree distribution polynomials.
Let x be an n/2-length binary string drawn Bernoulli(1/2). Using a rate-1/2 LDPC coder, x is coded as an n-bit vector r, which is then modulated to R using 2-level Pulse Amplitude Modulation (PAM) as

    Ri = { −√Q, if ri = 0
           +√Q, if ri = 1.     (1.45)
Then the AWGN channel outputs Y = R + Z where Z is i.i.d. r.v. ∼ N (0, N ).
The decoder initializes the likelihood ratio as

    p(ri = 1|Yi) / p(ri = 0|Yi) = fN(Yi − √Q) / fN(Yi + √Q)   (1.46)
        = exp( −(Yi − √Q)²/(2N) + (Yi + √Q)²/(2N) )
        = exp( 2 Yi √Q / N ),   (1.47)
where fN is the probability density function of a Gaussian distribution with 0 mean and variance N. Then belief propagation decoding is performed as explained in Chapter-1.10.1. The performance comparisons of the decoding error rates are given in Richardson et al (2001) for a (3, 6) regular LDPC code, a turbo code and an optimized irregular LDPC code (See Figure 1.13). Please note that the comparison is made for a code length of 10^6 for all codes.
Figure 1.13: Performance comparison of the error rates of a (3, 6) regular LDPC code, a turbo code and an optimized irregular LDPC code. The channel is binary-input additive white Gaussian noise.
1.10.4 A visual example
In this section we give a visual example of LDPC coding, using a black and white cartoon image as the input binary string to be coded. Let the image in Figure-1.14(a) be a binary string that needs to be transmitted through a noisy channel. This 100 × 100 cartoon is composed of 1s and 0s corresponding to black and white pixels respectively. We add redundancy in order to detect and correct the bits corrupted during the transmission. A rate-1/2 systematic regular LDPC code with the degree polynomials λ(x) = x² and ρ(x) = x⁵ is used to code the original image (each information bit participates in 3 checks, and each check bit is calculated as the sum of 6 information bits). The encoded image with its redundancy bits can be seen in Figure-1.14(b). Afterward, the encoded bits are transmitted through a Binary Symmetric Channel with crossover probability p(BSC) = 0.07. The decoder receives the noisy image in Figure-1.14(c) and uses the belief propagation method explained in Chapter-1.10.1, taking into account that the a priori probability of the systematic bits is known to be P(x = 1) = 0.2445 and that the channel characteristic is p(BSC) = 0.07.
The output of the belief propagation can be seen in Figure-1.15 after 1 iteration (a), after 5 iterations (b), and after 10 iterations (c).
1.11 Conclusion
This chapter has introduced both the theoretical and the practical tools that will be used in this dissertation.
Figure 1.14: LDPC coding example. Cartoon copyright © 2007 piyalemadra.com, used with permission. (a) Original binary cartoon of size 100 × 100, with 0s corresponding to white and 1s to black pixels. The ratio between the number of black pixels and the total number of pixels is 0.2445. (b) Visualization of the cartoon coded with the rate-1/2 systematic LDPC code: the output of the encoder contains the original image and its parity checks of size 100 × 100. (c) During the transmission, both the cartoon and its parity check bits are exposed to bit errors, such that the error probability of a received bit is 0.07.
Entropy and mutual information definitions will be used in the capacity calculations of the proposed systems in Chapter-4 and Chapter-5. Furthermore, the high-performance channel code LDPC and the high-performance source code TCQ will be utilized in the design of our practical codes for data hiding and Slepian-Wolf source coding in the following chapters.
Figure 1.15: LDPC decoding. (a) After 1 iteration. (b) After 5 iterations. (c) The
original cartoon is decoded without any error after 10 iterations.
Part II
Contributions
Chapter 2
Distributed Source Coding
Contents
2.1 Introduction
  2.1.1 List of Symbols
2.2 Theoretical Background
  2.2.1 Slepian-Wolf Coding of Discrete Sources
  2.2.2 Wyner-Ziv Theorem
2.3 Related Works
  2.3.1 Code Design for Slepian-Wolf Coding
    2.3.1.1 Convolutional Codes
    2.3.1.2 Turbo Codes
    2.3.1.3 LDPC Codes
  2.3.2 Code Design for Wyner-Ziv Coding
2.4 Practical Code Design
  2.4.1 Input Constraints and Theoretical Correlation Noise Analysis for a Given Rate
  2.4.2 LDPC Code Generation and Coset Index Calculation
  2.4.3 Modified Sum Product Algorithm
  2.4.4 Experimental Setup and Performance Analysis
2.5 Practical Application for Still-Image Coding
  2.5.1 Side Information
    2.5.1.1 Coset Creation
    2.5.1.2 Iterative Decoding
  2.5.2 Experimental Results
2.6 Conclusion
A practical code design for the Slepian-Wolf setup is proposed. Based on LDPC binning techniques, a performance of only 0.08 bits/channel use away from the theoretical limits is achieved. The system is applied to a still-image coding scheme in which the decoder has access to the low-pass wavelet coefficients of the image; a complementary coding based on DSC principles is performed for the refinement of the side information at the decoder.¹
2.1 Introduction
Slepian and Wolf (1973) derived the achievable rate region for the problem of lossless source coding with side information. Wyner and Ziv (1976) later derived the rate distortion function for such a system. In the early 2000s, a potential application of the Slepian-Wolf and Wyner-Ziv theorems was realized: compression complexity on low-power devices can be shifted to the decoder side, and practical code designs were proposed based on channel coding principles. In this chapter, we introduce the recent work on constructing practical codes for source coding with side information using the framework of LDPC codes. The organization of the chapter is as follows. In Chapter-2.2, the details of the two stimulating theorems for DSC are given: the Slepian-Wolf theorem for lossless compression of correlated sources, and the Wyner-Ziv theorem for the rate-distortion function of a source when a correlated version of it is available only to the decoder. Prior work on designing practical codes in this area is given in Chapter-2.3. Chapter-2.4 gives the details of our LDPC-based code design and compares our setup with the existing DSC systems. Finally, the application of our code design to the compression of still images is proposed in Chapter-2.5, which covers distributed source coding of a still image given that the low-pass wavelet coefficients are available to the decoder.
2.1.1 List of Symbols
The list of symbols that are used in this chapter can be found below.

X, Y : Two i.i.d. correlated input sources.
X̂, Ŷ : Estimations at the decoder.
H(X) : Entropy of X.
H(X, Y) : Joint entropy of X and Y.
H(X|Y) : Conditional entropy of X given Y.
RX, RY : Achievable rates of X and Y.
D : Distortion level.
Π, Π−1 : Interleaver and deinterleaver.
¹ The contents of this chapter have been presented partially in Dikici et al (2005) and Dikici et al (2006b).
2.2 Theoretical Background
2.2.1 Slepian-Wolf Coding of Discrete Sources
The Slepian-Wolf theorem states the admissible rate regions for coding two correlated i.i.d. sources X and Y drawn from a finite alphabet. The encoding and decoding of these two correlated sources depend upon the information available at the encoders and decoders. Figure-2.1 generalizes 16 different cases by simply switching on and off 4 switches S1, S2, S3 and S4 (Slepian and Wolf, 1973). A state variable si is associated with switch Si, taking the value 0 if the switch is open and 1 if it is closed. The quadruple {s1 s2 s3 s4} will be used to specify the settings of the switches. The cases vary in novelty and interest. For example, case {1111} has been known since Shannon: two correlated sources can be jointly compressed with a total rate RX + RY ≥ H(X, Y). The admissible regions of the cases {0011} and {0001}, however, are the most interesting results of the Slepian-Wolf theorem.
Figure 2.1: 16 Cases of correlated source coding.
Table-2.2 lists twelve theorems whose implications, in connection with Figure-2.2, give the admissible rate region R for the 16 cases. Certain lines and points on Figure-2.2 are labeled with the names of the theorems in Table-2.2. The admissible region of a setup is determined immediately from these lines and points together with the theorems f and g. The symbol x in the first column of Table-2.2 states that the theorem holds both when the corresponding switch is open and when it is closed.
For instance, in order to find the admissible region of the setting {1011}, Table-2.2 states that Theorems B E a c d e f g apply. The first two show that R can extend neither below the line B nor below the line E of Figure-2.2. The next four show that the points a, c, d and e lie in R. Theorem f shows that points above a on the RY axis and points on B to the right of c lie in R. Finally, Theorem g shows that the line segment ac is in R (See Figure-2.3 for the rate region of the case {1011}).
For the setting {0011}, the theorems A B E c d e f g hold. According to the first three theorems, R can not extend to the left of line A nor below the lines E and B. The points c, d and e lie in R.
Table 2.2: Achievable rate regions according to the Slepian-Wolf Theorem

s1 s2 s3 s4   Name   Theorem
                     It is necessary that:
0xxx          A      RX ≥ H(X|Y)
x0xx          B      RY ≥ H(Y|X)
xx0x          C      RY ≥ H(Y)
xxx0          D      RX ≥ H(X)
xxxx          E      RX + RY ≥ H(X, Y)
                     It is sufficient that (εX, εY, εXY > 0):
1xx1          a      RX = 0,             RY = H(X, Y) + εXY
x11x          b      RX = H(X, Y) + εXY, RY = 0
xx1x          c      RX = H(X) + εX,     RY = H(Y|X) + εY
xxx1          d      RX = H(X|Y) + εX,   RY = H(Y) + εY
xxxx          e      RX = H(X) + εX,     RY = H(Y) + εY
xxxx          f      Bit stuffing: (RX, RY) ∈ R implies (RX + δX, RY + δY) ∈ R for δX, δY ≥ 0
xxxx          g      Limited time sharing: if (RX, RY) ∈ R and (RX′, RY′) ∈ R with
                     RX + RY = H(X, Y) and RX′ + RY′ = H(X, Y), then (RX″, RY″) ∈ R, where
                     RX″ = λRX + (1 − λ)RX′ and RY″ = λRY + (1 − λ)RY′, 0 ≤ λ ≤ 1.
Theorem f then shows that every point above d on A and every point to the right of c on B are also in R. By Theorem g, the line segment dc is in R. The region R of Subfigure-2.4(a) is thus established. The novelty of the Slepian-Wolf setup is that the minimum admissible regions of the 2-separate-encoders, unique-decoder case ({0011}) and of the unique-encoder, unique-decoder case ({1111}) overlap on the operating line segment dc. The admissible region of {0011} can be expressed as:
    RX ≥ H(X|Y)   (2.1)
    RY ≥ H(Y|X)   (2.2)
    RX + RY ≥ H(X, Y)   (2.3)
Moreover, the setting {0001} corresponds to “Coding with Side Information at the Decoder” or “Distributed Source Coding”, where a source X is compressed such that a correlated version Y is accessible at the decoder. Table-2.2 shows that Theorems A B C E d e f g all apply. Locating the lines A, B, C and E on Figure-2.2, R can not extend to the left of A nor below C. The point d is in R; then, by Theorem f, all the points to the right of d on C and every point above d on A are in R (See Subfigure-2.4(b)).
Figure 2.2: Lines and points of Table-2.2.
Figure 2.3: Admissible Slepian-Wolf rate region R for the case {1011}.
Figure 2.4: Admissible Slepian-Wolf rate region R for the cases {0011} (a) and {0001} (b).
2.2.2 Wyner-Ziv Theorem
Wyner and Ziv (1976) found the rate-distortion function of a source X given that a correlated information Y is available at the encoder, at the decoder, or at both, where the distortion is defined as a nonnegative function d(X, X̂) ≥ 0. As seen in Figure-2.5, two switches A and B control whether the side information Y is available to the encoder and to the decoder. Wyner and Ziv analyzed the rate distortion of three cases:
• Switches A and B are open, i.e. no side information:
Then the classical Shannon theory yields

    RX(D) = min_{p(x̂|x): E{d(X,X̂)} ≤ D} I(X; X̂).   (2.4)

• Switches A and B are closed, i.e. both the encoder and the decoder have access to the side information:
In this case the rate distortion function is

    RX|Y(D) = min_{p(x̂|x,y): E{d(X,X̂)} ≤ D} I(X; X̂|Y).   (2.5)
• Switch A is open and B is closed, i.e. only the decoder has access to the side information:
Then Wyner and Ziv show that the rate is

    R*X|Y(D) = min_{p(z|x) p(x̂|y,z): E{d(X,X̂)} ≤ D} [ I(X; Z) − I(Y; Z) ].   (2.6)

Wyner and Ziv (1976) show that:

    RX|Y(D) ≤ R*X|Y(D) ≤ RX(D).   (2.7)
For D = 0 the theorem is consistent with the Slepian-Wolf theorem, such that RX|Y(0) = R*X|Y(0) = H(X|Y).
Figure 2.5: Wyner-Ziv Setup.

Wyner and Ziv derived the rate distortion function for the binary symmetric case, where they assumed that X is the unbiased input to a BSC with crossover probability pz, 0 ≤ pz ≤ 0.5, and Y is the corresponding output. Y can be expressed as Y = X ⊕ Z, where Z is a Bernoulli(pz) distributed binary string and ⊕ is addition in modulo-2 arithmetic. The rate distortion function R*X|Y for the Hamming distance distortion measure is shown by Wyner and Ziv (1976) to be:

    R*X|Y(D) = l.c.e.{ H(pz ∗ D) − H(D), (pz, 0) },  0 ≤ D ≤ pz,   (2.8)
where l.c.e. is the lower convex envelope, pz ∗ D = pz(1 − D) + D(1 − pz), and H(λ) = −λ ln λ − (1 − λ) ln(1 − λ) is the binary entropy function defined in Chapter-1.2. As seen from the graph of R*X|Y in Figure-2.6, R*X|Y = H(pz ∗ D) − H(D) for 0 ≤ D ≤ dc, and R*X|Y is a straight line segment between (dc, H(pz ∗ dc) − H(dc)) and (pz, 0). Hence, if we define g(D) = H(pz ∗ D) − H(D), then dc is the solution of the equation

    g(dc) / (dc − pz) = g′(dc),   (2.9)
where g′(dc) is the derivative of g(D) with respect to D at the point dc. Figure-2.6 also shows the graph of RX|Y(D), the rate distortion curve when Y is accessible both at the encoder and at the decoder. The analytic form of RX|Y(D) is known from Cover and Thomas (1991):

    RX|Y(D) = { H(pz) − H(D),  0 < D ≤ pz,
                0,             D ≥ pz,       (2.10)

Hence, for the binary symmetric source case, RX|Y(D) = R*X|Y(D) at only two (R, D) points: (H(X|Y), 0) and (0, pz). Otherwise there exists a rate loss with respect to RX|Y(D), i.e. RX|Y(D) < R*X|Y(D).
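The binary Wyner-Ziv function of Equation-2.8 is easy to trace numerically. The sketch below uses base-2 entropies (the formula above is written with natural logarithms; the choice of base only rescales the rate axis) and locates dc by a crude grid search on Equation-2.9.

```python
import numpy as np

# Trace g(D) = H(pz * D) - H(D) and find d_c for the binary W-Z function.
def Hb(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

pz = 0.28
D = np.linspace(1e-4, pz - 1e-4, 4000)
g = Hb(pz * (1 - D) + D * (1 - pz)) - Hb(D)   # pz * D is binary convolution
gp = np.gradient(g, D)                         # numerical derivative g'(D)
dc = D[np.argmin(np.abs(g / (D - pz) - gp))]   # Equation (2.9) by grid search
print("d_c ~", float(dc))
# R*(D) then follows g(D) for D <= d_c and the straight segment to (pz, 0).
```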
A more interesting result is found, for coding when switch A is open and B is closed, in the continuous Gaussian case: there is no rate loss with respect to RX|Y(D) for any value of D (Wyner and Ziv, 1976). Let X have an i.i.d. Gaussian distribution N(0, Q) and Y = X + Z, where Z is i.i.d. Gaussian with N(0, N) and independent of X (See Figure-2.7). Then the rate distortion function R*X|Y(D) is equal to RX|Y(D), which was calculated by Berger (1971):
Figure 2.6: Graph of RX|Y(D), R*X|Y(D), and H(pz) − H(D) versus D for pz = 0.28. For the binary symmetric case, R*X|Y(D) has a rate loss with respect to RX|Y(D) except at the points (H(pz), 0) and (0, pz); there is no rate loss at these two points.

Figure 2.7: Wyner-Ziv Setup for the Gaussian case.

    R*X|Y(D) = RX|Y(D) = { (1/2) ln( QN / ((Q+N)D) ),  0 < D < QN/(Q+N),
                           0,                          D ≥ QN/(Q+N),      (2.11)
because of the fact that the term I(X; Z|X̂, Y) = 0 on the right-hand side of I(X; Z) − I(Y; Z) = I(X; X̂|Y) − I(X; Z|X̂, Y) in Equation-2.6 for the Gaussian case.
2.3 Related Works
In this section, we review the existing practical code designs for the Slepian-Wolf coding problem. Starting from Wyner's proposition based on parity check codes, we present the state-of-the-art techniques based on convolutional codes, turbo codes, punctured turbo codes and finally LDPC codes. Furthermore, we present the Wyner-Ziv lossy compression design as a quantization problem followed by Slepian-Wolf coding, and mention the existing practical code designs.
2.3.1 Code Design for Slepian-Wolf Coding
Slepian and Wolf (1973) proposed a coding scheme based on random binning in their proofs. However, because of its non-constructive nature, it is not applicable to practical code design. Wyner (1974) first proposed a coding scheme based on good parity-check codes for the {0001} setup of the Slepian-Wolf coding problem. The idea is to partition the codeword space into cosets using a “good” parity-check code H (good in the sense that the codewords in the same coset are as far apart as possible), and to transmit only the coset index s to the receiver. The receiver can then estimate the source from its coset index s and the correlated input Y. Hence the receiver tries to estimate X̂ by assuming that Y is a noisy observation of X, and tries to eliminate the noise of Y by using the parity check information of X sent by the encoder.
The two n-bit binary source vectors X and Y can be modeled as Y = X ⊕ U, where ⊕ is the modulo-2 operation and U is an n-length binary string with Bernoulli(p1) distribution. Assume that an (n − k) × n parity check matrix H partitions the n-dimensional vector space into 2^(n−k) disjoint cosets. The code vectors of X must satisfy H · X^t = 0. Decoding is done by calculating the syndrome of Y, s = H · Y^t = H · U^t. Then, using a decoding function f(s), the decoder finds the error sequence and estimates X̂ = Y ⊕ f(s). The probability of error Pr{X ≠ X̂} = Pr{f(H · U^t) ≠ U} → 0 as n → ∞. Practical code designs using syndromes, however, had to wait until the early 2000s.
2.3.1.1 Convolutional Codes
Pradhan and Ramchandran (1999) first used a channel coding technique known as DIstributed Source Coding Using Syndromes (DISCUS) for the Slepian-Wolf problem (setup {0001}). Borrowing from Ungerboeck (1982)'s Trellis Coded Modulation (TCM) method, Pradhan and Ramchandran (1999); Kusuma et al (2001) proposed a trellis-structured construction framework with 4 levels and various numbers of states. In order to obtain a 2:1 compression rate, a rate-2/3 systematic convolutional code is used: for an n-bit input X, the convolutional code outputs the n bits of X and an n/2-bit coset index s, and only s is sent through the channel. The decoder finds, within the received coset of X, the sequence that is the closest element to Y (See Figure-2.8).
Figure 2.8: 2 : 1 rate DSC compression using a 2/3 convolutional code.
Let us give an intuitive example for the binary case. Assume that X and Y are 3-bit binary strings where the bits of X are drawn i.i.d. with Pr{Xi = 0} = Pr{Xi = 1} = 0.5. The correlated information Y is drawn such that the Hamming distance between X and Y is at most 1. For instance, given that X has the value 101, the possible sequences of Y are 001, 100, 101 and 111.
The entropy and the conditional entropy of X and Y are found as H(X) = 3, H(Y) = 3 and H(X|Y) = 2. According to the S-W theorem, H(X|Y) bits per channel use are enough to transmit X without loss. First, let us assume the case where Y is accessible both to the encoder and to the decoder. Since the encoder has access to Y, it can code just the error pattern between X and Y, and the decoder can successfully decode X using this error pattern. There are four possible sequences of X given Y, hence two bits are sufficient to communicate without any loss, which achieves the rate H(X|Y).
However, in the Slepian-Wolf setup {0001}, Y is not accessible to the encoder. With a careful design of a parity check code, X can still be sent with H(X|Y) = 2 bits. Assume the 2/3 parity check matrix

    H = [ 1 1 0
          1 0 1 ],   (2.12)

where each row defines a parity check equation over the modulo-2 summation of the input bits. Hence the syndrome bits are calculated as c1 = x1 ⊕ x2 and c2 = x1 ⊕ x3. The encoding of the sequence X = 101 is done by calculating the pair c1 c2: c1 = 1 ⊕ 0 = 1 and c2 = 1 ⊕ 1 = 0, hence 10. The decoder has access to the side information Y and to the syndrome of X, and it decodes the most probable sequence X̂. For a received Y sequence, the decoder verifies whether both of the check equations are satisfied and, by changing at most 1 bit of Y, X̂ is estimated. For instance, for Y = 100 and the syndrome 10, the decoder verifies whether the checks c1 = y1 ⊕ y2 and c2 = y1 ⊕ y3 hold. Since only the check on c2 fails, flipping the value of the third bit of Y is enough to satisfy both equations, so X̂ = 101 is estimated without any error.
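The toy example above can be reproduced in a few lines; the helper names below are our own.

```python
import numpy as np

# The 3-bit syndrome example above, written out.
H = np.array([[1, 1, 0],
              [1, 0, 1]])

def encode(x):                       # 2-bit syndrome of the 3-bit source
    return H @ x % 2

def decode(s, y):                    # flip at most one bit of y to match s
    for flip in [np.zeros(3, int), *np.eye(3, dtype=int)]:
        cand = (y + flip) % 2
        if np.array_equal(H @ cand % 2, s):
            return cand
    raise ValueError("no candidate within Hamming distance 1")

x = np.array([1, 0, 1])
y = np.array([1, 0, 0])              # side information, d_H(x, y) <= 1
s = encode(x)                        # syndrome 10, only 2 bits are sent
print(s, decode(s, y))               # [1 0] [1 0 1]: x recovered exactly
```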
In Pradhan and Ramchandran (2000), a practical code design for the setup {0011} of the Slepian-Wolf problem was proposed using two convolutional codes for compressing X and Y separately; hence it can operate not only at the corner regions, as in the setups {0001} or {0010}, but also in the intermediate rate regions of the setup {0011}.
2.3.1.2 Turbo Codes
Afterward, more powerful channel coding techniques were employed for the coset construction. The turbo code, invented by Berrou et al (1993) and improved by Benedetto et al (1998); Tepe and Anderson (1998); Berrou and Glavieux (1996), was first applied to the DSC problem by Garcia-Frias and Zhao (2001). Bajcsy and Mitran (2001a) used the parallel concatenation of finite state machines using the Latin squares proposed in (Bajcsy and Mitran, 2001b). Aaron and Girod (2002) used two parallel rate-4/5 systematic convolutional codes with an interleaver and transmitted the parity bits to obtain a 2:1 compression rate. After the calculation of the likelihood ratios of the input bits given the side information Y and the parity bits, the estimation of X̂ is done in an iterative manner using the MAP algorithm (See Figure-2.9). For a 2:1 compression rate, Aaron and Girod (2002) achieve lossless compression with a correlation noise entropy H(p1) ≤ 0.381, which corresponds to a gap of 0.154 with respect to the S-W limit.
Garcia-Frias and Zhao (2002) employed the puncturing concept of Acikel and Ryan (1997). Several other systems have been proposed using turbo codes (Chou et al, 2003; Liveris et al, 2002b, 2003b). Lajnef (2006) proposed a turbo coding scheme based on puncturing, obtaining a 2:1 compression rate by using two rate-2/3 parallel systematic convolutional codes with an interleaver. The overall system has n/2 + n/2 parity bits; by using a puncturing matrix, half of the parity bits are dropped and a compression rate of 2:1 is obtained. Using iterative SISO decoding, the system achieves lossless transmission with a correlation noise entropy H(p1) ≤ 0.4233, which is 0.0767 away from the S-W limit.
2.3.1.3 LDPC Codes
As described in detail in Chapter-1.10, the LDPC code is a powerful error correcting code invented by Gallager (1963), and reinvented and improved by Mackay and Neal (1997); Richardson et al (2001); Chung et al (2001a). Because of its good distance properties, Liveris et al (2002a) first used LDPC codes in the DSC field (See Figure-2.11). By using rate-2/3 irregular LDPC codes with long block lengths such as 10^6 and a compression rate of 2:1, Liveris et al (2002a) achieve lossless transmission with a correlation noise entropy H(p1) ≤ 0.466, which is 0.034 away from the S-W limit; this is so far the best probability of error obtained for a given correlation noise in the literature.
LDPC codes were then used in Schonberg et al (2002) for coding the general Slepian-Wolf problem ({0011}), replacing the convolutional codes of Pradhan and Ramchandran (2000) with LDPC codes.
Figure 2.9: 2:1 rate DSC compression code design using two systematic 4/5 convolutional codes with an interleaver and iterative MAP decoding. Blocks π correspond to a pseudo-random interleaver, and the block π−1 is the corresponding deinterleaver. For the Log-Likelihood Ratio (LLR) calculations log( p(x=1|y) / p(x=0|y) ), the correlation noise level and the received side information Y are used. The iterative decoding is done using a Soft-Input Soft-Output (SISO) decoder.
Varodayan et al (2005, 2006) also proposed S-W coding schemes based on LDPC Accumulate (LDPCA) and Sum LDPC Accumulate (SLDPCA) codes.
2.3.2 Code Design for Wyner-Ziv Coding
Since, for the Gaussian input case, the Wyner-Ziv theorem states that there is no rate loss whether or not the side information is accessible to the encoder, researchers have focused their efforts on the design of DSC codes close to the S-W limit. The state-of-the-art practical designs treat the Wyner-Ziv problem as the concatenation of a good source code (quantization), which achieves good rate-distortion performance, and a S-W coder, which achieves lossless compression with side information (See Figure-2.13). The input X is first quantized by a good source code such as TCQ (Marcellin and Fischer, 1990), nested lattices as in Zamir and Shamai (1998), or a Lloyd-Max based quantizer as in Rebollo-Monedero and Girod (2005). Then the quantized stream is coded with a S-W lossless coder based on a systematic turbo code (Aaron et al, 2003) or a systematic LDPC code (Liu et al, 2006). In Pradhan and Ramchandran (1999), using an 8-level Lloyd-Max quantization of the input source where the outputs are labelled into 4 subsets D0, D1, D2 and D3 (See Figure-2.12), their convolutional code based S-W coder described in Chapter-2.3.1.1 performs 7 dB away from the 1 bit/sample Wyner-Ziv distortion limit for a Correlation-Signal to Noise Ratio (C-SNR) level of 12 dB between X and Y.
Figure 2.10: 2:1 rate DSC compression code design using two systematic rate-2/3 parallel concatenated convolutional codes and rate-1/2 puncturing matrices P.
Figure 2.11: 2:1 rate DSC compression using a systematic rate-2/3 LDPC code.
Furthermore, the S-W and W-Z problems have been extended to three sources in Liveris et al (2003a) and Lajnef et al (2006). Moreover, the S-W and W-Z coding paradigms have been applied to video coding (Puri and Ramchandran, 2002; Puri et al, 2006; Girod et al, 2005; Aaron et al, 2003; Westerlaken et al, 2005; Liveris et al, 2002b), to sensor networks (Xiong et al, 2004; Pradhan and Ramchandran, 2000; Kusuma et al, 2001; Pradhan et al, 2002), to multiple description coding (Stankovic et al, 2007) and to multiple-camera arrays (Zhu et al, 2003; Gehrig and Dragotti, 2004).
Figure 2.12: Eight output points and corresponding partitions for 4 subsets.
Figure 2.13: Wyner Ziv Coding as a concatenation of a good quantization code and
a Slepian-Wolf Coder.
2.4 Practical Code Design
In this section, we describe our proposed LDPC-based S-W coding scheme in detail. Based on LDPC coding for the syndrome calculations, we use a modified sum-product algorithm (belief propagation) for the decoding.
2.4.1 Input Constraints and Theoretical Correlation Noise Analysis for a Given Rate
Let X^n = {X1, X2, .., Xn} be an n-length binary string of i.i.d. random variables with Pr{Xi = 1} = Pr{Xi = 0} = 0.5, noncausally available to the encoder. Similarly, let U^n = {U1, U2, .., Un} be an n-length binary string of i.i.d. random variables with Pr{Ui = 1} = 1 − Pr{Ui = 0} = p1, where 0 ≤ p1 ≤ 0.5. The side information Y^n = {Y1, Y2, .., Yn}, noncausally available to the decoder, is modeled as Yi = Xi ⊕ Ui, where ⊕ is the modulo-2 sum operation. The entropies of X and Y are H(X) = H(Y) = H(0.5) = 0.5 log2(2) + 0.5 log2(2) = 1 bit per channel use. The conditional entropy of X given Y is H(X|Y) = H(U) = H(p1) = p1 log2(1/p1) + (1 − p1) log2(1/(1 − p1)) bits per channel use. Hence, for a fixed compression rate of X^n of 1/2 bit per channel use, which corresponds to n/2 bits, the S-W theorem states that Pr{X̂ ≠ X} → 0 for any correlation probability p1 such that H(X|Y) = H(p1) ≤ 1/2. In our experiments, we fix the compression rate to 1/2 bit per channel use and find the maximum correlation level p1 for an arbitrarily small probability of error such as Pr{X̂ ≠ X} = 10^−5.
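The quoted limit can be recovered numerically by inverting the binary entropy function on [0, 1/2]; a small bisection sketch follows.

```python
from math import log2

# Find the Slepian-Wolf correlation limit p1 with H(p1) = 1/2 by bisection.
def Hb(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)

lo, hi = 1e-9, 0.5
while hi - lo > 1e-12:
    mid = (lo + hi) / 2
    if Hb(mid) < 0.5:
        lo = mid
    else:
        hi = mid
print(round(lo, 4))   # ~0.11, the theoretical limit for rate 1/2
```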
2.4.2 LDPC Code Generation and Coset Index Calculation
For these experiments, we generate the LDPC matrices using the degree polynomials found and distributed by the Communications Theory Lab (LTHC) at Ecole Polytechnique Fédérale de Lausanne (EPFL) Amraoui et al (2003). The degree distribution polynomials used in this dissertation can be found in Appendix-B.
In order to obtain a 2:1 compression rate of X, we use rate-2/3 systematic LDPC codes where, for n input bits, n/2 parity check bits are calculated (See Figure-2.14). The encoder discards the systematic bits and transmits only the n/2 parity bits. The decoder calculates the likelihood ratios and runs a modified Sum-Product Algorithm as explained in the following section.
Figure 2.14: Our proposed 2 : 1 rate DSC compression code design using LDPC
codes.
2.4.3 Modified Sum Product Algorithm
The classical LDPC decoding is done by belief propagation, also known as the Sum-Product Algorithm, as described in Chapter-1.10. The algorithm is designed for a channel where every transmitted symbol is exposed to the same channel characteristics. However, in S-W coding using syndromes, the received data are not all exposed to the same channel. While there is correlation noise between X and Y, the syndrome of X sent by the encoder does not contain any error. Hence we modify the decoding algorithm for S-W, starting with the likelihood ratio calculations.
For a rate-2/3 systematic LDPC code, an n-bit input is coded as a total of 3n/2 bits, where n bits are the systematic input bits and the remaining n/2 bits are the parity bits z, which satisfy the equation 0 = H · R^t for the 3n/2-length vector R = {z1, z2, .., zn/2, X1, X2, .., Xn}. The decoder receives the syndrome vector z and the side information Y = X ⊕ U as explained in Chapter-2.4.1. In Figure-2.14, the 3n/2 variable nodes correspond to the circles at the left-hand side of the decoder. We group the variable nodes into two: the check bits in blue and the systematic bits in pink. Since the check bits are not exposed to error, we initialize the likelihood ratios of the blue circles as

    p(Ri = 1|zi) / p(Ri = 0|zi) = { ∞, if zi = 1
                                    0, if zi = 0     (2.13)
for i = 1, 2, .., n/2. The likelihood ratios of the systematic bits are calculated for i = n/2 + 1, n/2 + 2, .., 3n/2 as

    p(Ri = 1|Y_{i−n/2}) / p(Ri = 0|Y_{i−n/2}) = { (1 − p1)/p1, if Y_{i−n/2} = 1
                                                  p1/(1 − p1), if Y_{i−n/2} = 0.     (2.14)
The next step is to modify the definitions given in Chapter-1.10.1.1 by grouping the variable bits into systematic variable bits and parity variable bits. The set N(m), which signifies the set of variable bits that participate in the checksum m, is divided into two subsets N1(m) and N2(m), the sets of systematic bits and of parity bits that participate in check m respectively, with N(m) = N1(m) ∪ N2(m). Moreover, r^x_mn is redefined as the probability that check m is satisfied, computed only for the systematic bit n, if the systematic bit n of X is considered fixed at Xn and the other systematic bits follow q_mn′ : n′ ∈ N1(m)\n.
The check node iteration δr_mn in Equation-1.39 is modified as

    δr_mn = (−1)^( Σ_{i∈N2(m)} Ri ) · Π_{n′∈N1(m)\n} δq_mn′.   (2.15)
The variable node iteration equations are the same as in Equation-1.40 and Equation-1.41, and are calculated only for the systematic variable nodes. Similarly, the final guess step is calculated only for the systematic variable nodes.
The decoding starts with the initialization of the q^0_mn and q^1_mn values, using the ratios p(Ri = 1|zi)/p(Ri = 0|zi) and p(Ri = 1|Y_{i−n/2})/p(Ri = 0|Y_{i−n/2}) calculated as in Equations-2.13 and 2.14. Then the check-node and variable-node iterations are repeated until a valid codeword is found or the maximum number of iterations is reached. Finally, the decoder calculates
    q^0_n = α_n p(xn = 0|[z|y]) Π_{m∈M(n)} r^0_mn,   (2.16)

and

    q^1_n = α_n p(xn = 1|[z|y]) Π_{m∈M(n)} r^1_mn;   (2.17)

and outputs the estimation x̂ by thresholding the posterior probabilities

    x̂_i = arg max_j q^j_{i+n/2}.   (2.18)
The bit error rate Pe of the system is then calculated as

    Pe = (1/n) Σ_{i=1}^{n} (xi ⊕ x̂i),   (2.19)

where Σ denotes summation over the real numbers while ⊕ is the modulo-2 summation. Hence Pe is the ratio of the number of erroneous bits to the total number of bits.
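The only structural change with respect to the standard sum-product algorithm is the check-node rule of Equation-2.15, where the error-free syndrome bits contribute a fixed sign. A minimal sketch of that rule for a single check node is given below; the function name and its calling convention are our own.

```python
import numpy as np

# Modified check-node rule (2.15): the known syndrome bits of the check
# contribute only a deterministic sign to the outgoing delta-r messages.
def modified_check_messages(dq_systematic, syndrome_bits):
    sign = (-1) ** int(np.sum(syndrome_bits))    # (-1)^(sum R_i), i in N2(m)
    # one outgoing delta-r per participating systematic bit (leave-one-out)
    return [sign * np.prod(np.delete(dq_systematic, k))
            for k in range(len(dq_systematic))]

print(modified_check_messages(np.array([0.8, -0.5, 0.9]), np.array([1, 0])))
```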
2.4.4 Experimental Setup and Performance Analysis
In this section, we compare our proposed S-W system with the existing ones. We code the input at several block lengths varying from 4 × 10^3 to 1 × 10^5. Please note that LDPC decoding performs better for larger block lengths; however, the decoding complexity also increases for longer block lengths, so there is a trade-off between performance and decoding complexity.
The length-n input binary string X, drawn from a Bernoulli(0.5) distribution, is coded with a rate-2/3 systematic LDPC code generated with a degree distribution as in Appendix-B. The noise binary string U, drawn from a Bernoulli(p1), is modulo-2 added to X to create Y = X ⊕ U, where 0 ≤ H(p1) ≤ 0.5. Recall from the S-W theorem that X can be compressed with a rate RX ≥ H(X|Y) = H(p1). In our experiments, we fix RX = 0.5 and search for the maximum correlation noise p1 at which the decoder can extract X with a low probability of error (Pe(X ≠ X̂) ≈ 10^−5). Please note that, according to the S-W theorem, the theoretical limit is p1 = 0.11, which corresponds to an entropy of H(p1) = 0.5.
The simulation results can be seen in Figure-2.15. The best published performances of the convolutional code, the turbo code and the punctured turbo code are H(p1) = 0.35, H(p1) = 0.39 and H(p1) = 0.42 respectively (Aaron and Girod, 2002; Lajnef, 2006; Liveris et al, 2002a). Our length-4000 and length-10^4 regular LDPC codes perform with a low probability of decoding error at H(p1) = 0.36 and H(p1) = 0.37, which reside between the best convolutional code and the best turbo code. Our irregular length-10^4 code achieves H(p1) = 0.42, a performance similar to the best punctured turbo code in Lajnef et al (2006). Xiong et al (2004) have achieved a better performance for a block length of 10^5, at a higher decoding complexity.
Figure 2.15: Decoding bit error rate versus entropy of the correlation noise H(p1) for the 2:1 rate Slepian-Wolf compression comparison. The LDPC simulations are made for a length-4000 regular LDPC matrix and a length-10^4 irregular LDPC matrix. The graph also contains the S-W limit and the best performances achieved using a convolutional code (Aaron and Girod, 2002), a punctured turbo code (Lajnef, 2006), and an irregular LDPC code with length 10^5 (Liveris et al, 2002a).
2.5 Practical Application for Still-Image Coding
In this section, we propose a compression scheme for still images that exploits the theory of distributed coding of correlated multi-sources. Two corrupted versions of an image are encoded separately but decoded jointly (Dikici et al, 2006a). Our approach has two main components: i) the use of the low-pass wavelet decomposition coefficients for creating the side information, and ii) LDPC-based coset creation using the quantized version of the original image in the pixel domain. In the case of coding for mobile terminals, the proposed codec exploits channel coding principles in order to have a simple encoder with a low transmission rate and high PSNR.
The application of distributed source coding techniques to still images is not trivial, because the image should be divided into two sources X1 and X2 that will be encoded separately. One solution to that problem is sub-sampling the image into two images (Ozonat, 2000). However, we are interested in distributed image compression given that a compressed version of the image is accessible at the decoder (See Figure-2.16). For instance, the low-frequency component of the image is accessible to the decoder as side information, and a low-power device wants to improve the quality of this side information by using low-complexity coding techniques. We introduce an efficient distributed coding technique for still images, using the low-pass discrete wavelet transformation as the side information and LDPC coding as the mapping of the cosets. In our setup, the low-pass component of the discrete wavelet decomposition of the image is assumed to be accessible to the decoder as the side information X2. For X1, a uniformly quantized version of the original image is used. Instead of classical source encoding of X1, after LDPC coding, only the coset index of X1 is sent to the decoder. The decoder finds, within the received coset, the value that is closest to X2.
Figure 2.16: Encoder and decoder structure. The source is compressed using LDPC binning, the side information Y available to the decoder is the image reconstructed from the low-frequency (LL2) wavelet decomposition, and the two received signals are decoded jointly.
We will explain the extraction of the side information (Section-2.5.1), the coset calculation using quantization and LDPC coding (Section-2.5.1.1), the iterative joint decoding (Section-2.5.1.2) and our experimental results (Section-2.5.2).
2.5.1 Side Information
Side information is the information available to the decoder which is correlated with the original signal X; it is used at the decoder in order to estimate X̂ with the help of the received coset index. We use the following assumptions for the side information:
Let X(M, N) be an M × N gray level image matrix with integer pixel values in the range 0 to 255. The image X is decomposed into its 2-level wavelet coefficients employing the 5/3 tap filter set of Le Gall and Tabatabai (2000). The side information image Y is reconstructed from the DWT synthesis using only the low-pass component LL2 and setting the rest of the coefficients to 0. The visualization of the SI that is computed in the encoder can be seen in Figure-2.17.
The correlation noise between the original image X and the side information Y can be modeled by a Laplacian distribution f(X, Y) = (α/2) e^(−α|X−Y|), where α can be estimated at the encoder using the residual error between the LL wavelet decomposition of the first level and that of the second level. We observed that using the estimate of α, instead of calculating the real distance values, does not significantly degrade the performance of the system.
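Used later for the decoder initialization (Section-2.5.1.2), the Laplacian model translates into per-pixel likelihoods over the candidate quantizer outputs. A small illustrative sketch follows; the parameter values in it are made up.

```python
import numpy as np

# Laplacian correlation model f(x, y) = (alpha/2) * exp(-alpha * |x - y|):
# per-pixel likelihoods used to seed the belief propagation. In practice
# alpha would be estimated from the wavelet residuals as described above.
def laplacian_likelihood(xq_candidates, y_pixel, alpha):
    like = (alpha / 2) * np.exp(-alpha * np.abs(xq_candidates - y_pixel))
    return like / like.sum()          # normalized over the candidate levels

levels = np.arange(0, 256, 8)         # e.g. 32-level uniform quantizer outputs
print(laplacian_likelihood(levels, y_pixel=77.0, alpha=0.1).round(3))
```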
2.5.1.1 Coset Creation
The image X is quantized with an n-bit uniform quantizer and the quantized bits are coded with a rate-2/3 LDPC coder as explained in Section-2.4.2. After discarding the systematic output bits of the LDPC coder, only the parity bits (the coset index) z are sent to the decoder.
2.5.1.2 Iterative Decoding
Under the assumption that the correlation noise between the side information Y and the quantized signal Xq has a Laplacian distribution with variance 2/α², the likelihood function p(Xq|Y = y) is calculated at the decoder, using the appropriate correlation noise parameter α, in order to initialize the LDPC belief propagation decoding. Then the modified sum-product algorithm is employed as explained in Chapter-2.4.3.
2.5.2 Experimental Results
The proposed algorithm is applied to the image 'Lena'. In our experimental setup, we examine the effects of the quantization and calculate the rate-distortion operating points. The image Lena is processed in the following steps:
• The input image is first linearly quantized at 256, 128, 64, 32 and 16 levels respectively. Please recall that 256-level quantization corresponds to lossless quantization, because the input image pixels have 8-bit depth.
• The quantized bits are coded with a systematic rate-2/3 LDPC coder which is generated pseudo-randomly with an appropriate length n.
• The decoder has access to the side information Y reconstructed from the LL2 DWT coefficients.
Figure 2.17: Construction of the Side Information. Only the low-low wavelet composition of the second level is transmitted. The decoder reconstructs the side information by setting all other coefficients to 0.
The effects of the decoding iterations can be seen in Figure-2.18. In this figure, a 130 × 160 pixel subset of the outputs with a compression rate of 16:5 is given. The leftmost picture is the side information, reconstructed by setting to 0 all of the wavelet coefficients except the LL ones received from the encoder. The center picture shows the decoding of the cosets after the first iteration, and the rightmost one is the output of the decision based on the decoding after 5 iterations. The quality improvement on the edges, such as the face, shoulder and hair regions, can be seen. The PSNR values of these three images are 28.9 dB, 34.16 dB and 34.97 dB respectively.
Figure 2.18: Left: Side Info at the receiver; Center: First iteration output of the
decoded image; Right: decoding output after 5 iterations.
2.6 Conclusion
This chapter has proposed a close-to-limit Slepian-Wolf lossless compression of an input source when a correlated side information is accessible only to the decoder. The parity bits of a systematic rate-2/3 LDPC code are used to achieve a 2:1 rate compression of the input. For the binary symmetric case, correlation noise entropies of H(pz) = 0.39 and H(pz) = 0.42 are achieved using a regular length-4000 and an irregular length-10^4 LDPC matrix respectively. Since the Slepian-Wolf limit of the 2:1 compression rate for the BSC channel corresponds to a correlation noise entropy of 0.5, the proposed system operates 0.08 bit per channel use away from the theoretical limits. Furthermore, this study shows the feasibility of such a multi-source coding scheme for still images, in which the low-pass wavelet coefficients and the LDPC binning of the image are encoded separately and decoded jointly at the receiver.
Chapter 3
Informed Data Hiding
Contents
3.1 Introduction . . . 60
    3.1.1 Types of watermark . . . 60
    3.1.2 Types of attack models . . . 61
    3.1.3 List of Symbols . . . 62
3.2 Theoretical Background . . . 63
3.3 Prior Work . . . 66
3.4 Proposed Scheme-1: Extension to Cox Miller . . . 68
    3.4.1 Embedding on Discrete Wavelet Transform Coefficients . . . 69
    3.4.2 Perceptual Shaping for DWT . . . 70
    3.4.3 Attack Channel . . . 72
    3.4.4 Simulation Results . . . 73
3.5 Proposed Scheme-2: Superposition Coding . . . 73
    3.5.1 Definition . . . 73
    3.5.2 Code Construction . . . 74
        3.5.2.1 Source Code C0 . . . 74
        3.5.2.2 Channel Code C1 . . . 75
    3.5.3 Encoder . . . 75
    3.5.4 Gaussian Attack Channel . . . 76
    3.5.5 Decoder . . . 76
    3.5.6 Details of Joint Iterative Decoding C0 and C1 . . . 77
    3.5.7 Simulation Results . . . 78
3.6 Conclusion . . . 79
We address the data hiding problem where the host signal is not accessible to the decoder. Exploiting Costa's theorem, which states that the non-availability of the host signal at the decoder does not affect the capacity, we propose two practical code designs for the informed data hiding problem. The first is for embedding a low-rate message within the DWT coefficients of still images using trellis coded modulation; in conjunction with a perceptual shaping function applied during the embedding process, the robustness of this method to several types of attacks is demonstrated. The second code design is for embedding a high-rate message within continuous signals using the combination of a good source code (TCQ) and a good channel code (LDPC). After an AWGN attack channel, the receiver decodes the hidden message by BCJR and belief-propagation decoding in an iterative manner. For a 1/2 data embedding rate, the hidden message can be extracted with a low decoding error (Pe ≤ 10⁻⁵), even for an attack-channel variance that is 1.5 dB away from the theoretical limits.
3.1 Introduction
In this chapter, we compare the existing informed data hiding techniques and propose a high-embedding-rate informed data hiding method with blind detection, to be used in our complete system, which will be explained in Chapter-5. Before going into the details of the theoretical and implementation issues of informed data hiding, we define the basic notions of watermarking systems and explain where our work fits in.

Humans have been interested in hiding information (a message) within an innocent host signal (a cover) since medieval times (Hartung and Kutter, 1999). This hiding process is named differently depending on the application. For instance, Steganography, originating from a Greek word meaning "covert communication", stands for point-to-point secret communication whose existence is not known to third parties; hence the secret information need not be robust to manipulations. Watermarking, on the other hand, must satisfy the desideratum of robustness to malicious attacks: even if third parties know of the existence of the mark, it is hard to remove the hidden message. To meet this robustness requirement, the information embedding rate in watermarking is much lower than that of steganography. Data Hiding or Data Embedding resides between steganography and watermarking: third parties know that a message is embedded in the signal, but there is no need to protect it. The idea is to embed complementary information into the host data.
3.1.1 Types of watermark
Watermarking schemes can be grouped as robust or fragile. In robust watermarking, the mark remains detectable even after severe processing. The attacker's goal is to make the detector unable to detect the mark while keeping an acceptable perceptual quality. Example applications are inserting a mark for detecting illegal use of a copy, or finding the distributor of an illegal copy by inserting a distinct message in each copy, which is known as fingerprinting. On the other hand, fragile watermarking is used for authentication, that is, the control of tampering. It can be used in DVD players to authenticate the data by a detector; with even small variations of the signal, the detector has to fail. Here the third parties' aim is to change the watermarked data while the detector still extracts the message, or to create a valid work from new data.
3.1.2 Types of attack models
Because there exist different types of watermarking applications, the malicious attacks also vary. The overall system has been modeled as a game between the watermarker and the attacker (Moulin and Mihcak, 2004). Depending on the amount of knowledge the watermarker has about his attacker, the watermarker tries to maximize his embedding capacity while the attacker tries to minimize it. This game-theoretical approach is used as a tool for calculating the capacity under a worst-case attack.
Below, we give several possible assumptions about the attacker (Craver et al, 1998):

• The attacker knows nothing.

• The attacker knows the algorithm. This is the most widely used assumption in watermarking: the security depends on the key, not on the algorithm. This assumption is related to Kerckhoffs' law in cryptography, which states that a cryptosystem should be secure even if everything about the system, except the key, is publicly available (Kerckhoffs, 1883).

• The attacker has access to several watermarked data (collusion attack). In this model, the access can be to different host signals coded with the same mark, or to the same host coded with different marks (Stone, 1996).

• The attacker has access to the detector as a black box (oracle attack). Several attacks can be applied in order to remove the mark (gradient descent attack, sensitivity analysis attack, etc.).
There exist various types of attacks, mainly classified into four categories in Hartung et al (1999):

• Simple Attacks add noise to the whole watermarked data without trying to identify and isolate the mark. Some examples of this type are linear or non-linear filtering, compression, addition of noise and quantization.

• Synchronization Attacks attempt to disable the detection of the mark by geometric distortion, spatio-temporal shifts, zooming, rotation or cropping (Petitcolas, 2000).
• Fake Watermark Attacks try to confuse the decoder by producing fake original data or fake watermarked data (Holliman and Memon, 2000).

• Removal Attacks attempt to analyze the watermarked data, estimate the mark and remove it from the host data. Examples are denoising and collusion attacks (Stone, 1996).
Software packages such as Stirmark (Kuhn and Petitcolas, 2000) and Checkmark (Pereira, 2001) are publicly available for simulating various kinds of attacks on still images.
In this chapter, we focus solely on the blind watermarking problem, where the cover data or image S, in which the hidden information M will be embedded, is accessible only to the encoder but not to the decoder (see Figure-3.1). Since the original cover image is not accessible to the receiver, the decoding process is called blind decoding. We analyze this problem in an information-theoretic way. After introducing the prior work in this field, we propose two coding schemes. The first one is for embedding data at a low rate in images: we use the Discrete Wavelet Transform coefficients of the host image for robust embedding, and up to 1000 bits can be efficiently embedded into 256 × 256 images with acceptable perceptual quality. Our second work is high-rate informed data hiding rather than digital watermarking, because our assumptions of a continuous input and an AWGN channel fit an IDH system better. In this work, we embed the secret data at a rate of 1/2 bit per host sample, and the scheme performs close to the theoretical limits in low-SNR embedding regimes facing AWGN attacks.
Figure 3.1: Channel Coding with State Information Setup.
3.1.3 List of Symbols
The symbols used in this chapter are listed below.

M : Discrete message to be transmitted (watermark).
𝓜 : Alphabet of the watermark.
M̂ : Decoded watermark.
Pe : Probability of decoding error.
S : State information.
X : Stegotext.
W : Watermarked data.
Z : Attack noise.
Y : Received signal.
U : Auxiliary variable.
α : A constant for coding with side information.
D : Distortion level.
P, Q, N : Variances of X, S and Z respectively.
C0 : Source code.
C1 : Channel code.

3.2 Theoretical Background
A simple watermark text "Art&Flowers" is inserted into the cover image of Figure-3.2, where it can easily be seen by the human eye. A malicious user can easily remove this watermark and use the image for his own purposes without legal permission. Actually, in this example the watermark data is independent of the picture itself, such that part of the inserted watermark resides within the white background, where it is plainly visible. Hence the watermarking process needs to satisfy three main constraints:
• Insertion strength to guarantee imperceptibility,
• Robustness to malicious attacks,
• Capacity to accommodate the secret message.
Figure 3.2: Watermarked image.
Embedding as a function of both the host data and the secret message is referred to as informed embedding, because of the participation of the host data in the embedding process (Cox and Miller, 2002). The watermarking problem was first recognized as channel coding with side information (see Figure-3.3) by Chen and Wornell (1998).
Gel'fand and Pinsker (1980) derived the capacity formula for a class of discrete channels {X, p(y|x, s), Y, S} with a noncausal state Sⁿ = {S1, S2, .., Sn}, Si i.i.d. ∼ p(s). A discrete message M with finite cardinality |M|, where each possible value is equally probable, is encoded with a deterministic function f : M × Sⁿ → Xⁿ satisfying a distortion measure E{d(f(M, Sⁿ), 0)} ≤ P, then transmitted through the channel with conditional probability function p(y|x, s). The decoding function g : Yⁿ → M̂ estimates M̂. The average probability of error is

   Pe = (1/|M|) Σ_{k=1}^{|M|} Pr{g(Yⁿ) ≠ k | M = k} → 0 as n → ∞.   (3.1)

The number of messages that can be reliably distinguished in n channel uses is the greatest integer 2^(nC) less than or equal to |M|; the supremum of the rates C is defined to be the capacity of the channel and can be calculated as

   C = max_{p(x,u|s)} [I(U; Y) − I(U; S)],   (3.2)

where U is an auxiliary random variable with finite cardinality and the maximization is over p(x, u|s).
The Gel'fand-Pinsker setup was extended to the continuous-alphabet Gaussian channel in Costa (1983). According to Costa's setup (see Figure-3.3(a)), a message M drawn from the discrete finite set 𝓜 is sent through a channel with a power-limited signal X : (1/n) E{‖X‖²} ≤ P, and the channel output is modeled as Y = X + S + Z, where S is an interference signal known by the encoder, drawn from ∼ N(0, Q), and Z is a noise component drawn from ∼ N(0, N). The aim is to find the theoretical upper bound on the quantity of secret information M that can be transmitted through this channel with probability of decoding error P(M ≠ M̂) → 0. Surprisingly, Costa shows that the capacity of this channel is independent of the interference signal S, and equals

   R = (1/2) ln(1 + P/N),   (3.3)

which also equals the rate in the case where S is accessible to both the encoder and the decoder. The key point in achieving this rate without access to S at the decoder is to use an auxiliary random variable U such that U = X + αS, where α is the constant α = P/(P + N). For sending the index M, the encoder searches within the possible U's for that message M such that the difference between U and the scaled interference αS satisfies the power constraint (1/n)‖U − αS‖² ≤ P. Then it sends X = U − αS over the channel. The channel outputs Y = X + S + Z, and the decoder finds the closest U to Y and estimates M̂ as the index of the bin that U resides in. A more detailed derivation of Costa's capacity can be found in Chapter-4.
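A small Monte Carlo sanity check of this construction is sketched below: after the decoder scales Y by α, the residual disturbance on U is −(1 − α)X + αZ, whose variance αN does not involve the interference power Q; the numbers used are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
P, N = 1.0, 0.5
alpha = P / (P + N)                 # Costa's constant
n = 10**6
x = rng.normal(0, np.sqrt(P), n)    # embedding signal, power P
z = rng.normal(0, np.sqrt(N), n)    # channel noise
eff = -(1 - alpha) * x + alpha * z  # alpha*Y - U when U = X + alpha*S
print(eff.var(), alpha * N)         # empirical vs. predicted variance
print(0.5 * np.log(1 + P / N))      # capacity in nats, independent of Q
```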
(a) Costa’s writing on dirty paper setup.
(b) Costa’s setup applying to watermarking problem.
Figure 3.3: Costa setup.
Cox et al (1999) realized, as in Figure-3.3(b), that if the channel state S is taken to be the host signal of the watermark, the work is defined to be S + X, and Z is defined to be the attack noise, then this blind watermarking problem can be modeled as Costa's "writing on dirty paper": even if the original host data is not accessible to the decoder, there is no loss in the capacity of the channel.

Costa's work has been extended to arbitrarily distributed interference by Cohen and Lapidoth (2002); Erez et al (2005). The theoretical limits of watermarking systems have been studied in Moulin and O'Sullivan (2003); Chen and Wornell (1999), taking into account the notion of privacy of the watermark with a key.
3.3 Prior Work
Costa's research provides a theoretical solution using a random binning argument, but this solution could not be implemented practically because of complexity issues. Quantization Index Modulation (QIM), proposed by Chen and Wornell (1998), uses lattice codes, where the message to be embedded divides the lattice into sub-lattices; given the host signal, the aim is to quantize it using the proper sub-lattice. They improved QIM using Costa's approach and named it Distortion-Constrained QIM (DC-QIM) (Chen and Wornell, 2001). This system had superior performance compared with spread-spectrum techniques; its drawback, however, is that when the embedding rate is high, it is hard to efficiently sub-divide the quantization lattice.

Chou et al (2000) have applied error correcting codes (ECC) to this coding concept. They used the distributed coding concept explained in Chapter-2 and the duality between DSC and IDH (Cover and Chiang, 2002; Pradhan et al, 2003). A trellis-based convolutional code has been used in order to partition the space. Le Guelvouit (2005) has proposed a system based on Turbo TCQ where the message forces the trellis to pass through a certain path. Bastug and Sankur (2004) have proposed LDPC codes to improve the quality of the payload of the watermark. Afterwards, the combination of good quantizer codes with good channel codes has been proposed by several researchers.

Eggers et al (2003) have proposed a system called "Scalar Costa Scheme", which is similar to QIM but differs from it by taking the watermark-to-noise ratio (WNR) into account. In the encoding process, Costa's α = P/(P + N) is employed for better performance, while QIM supposes infinite-length coding, hence fixing α = 1.
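To make the idea concrete, here is a minimal one-bit-per-sample sketch in the spirit of the Scalar Costa Scheme: the host is quantized on a per-bit dithered grid and moved only a fraction α of the way toward it. The step Δ and the minimum-distance detector are illustrative choices of ours, not Eggers et al.'s exact parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
Delta, P, N = 4.0, 1.0, 0.5
alpha = P / (P + N)              # the alpha = P/(P+N) used in the text

def embed(s, bit):
    d = bit * Delta / 2          # coset offset for this bit
    q = Delta * np.round((s - d) / Delta) + d
    return s + alpha * (q - s)   # move a fraction alpha toward the coset

def detect(y):
    residuals = []
    for bit in (0, 1):
        d = bit * Delta / 2
        q = Delta * np.round((y - d) / Delta) + d
        residuals.append(abs(q - y))
    return int(np.argmin(residuals))

s = rng.normal(0, 10.0)          # one host sample
for b in (0, 1):
    y = embed(s, b) + rng.normal(0, 0.2)   # mild AWGN attack
    print(b, detect(y))          # detected bit should match b
```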
Miller et al (2004) have developed an informed coding scheme that guarantees a robustness level. A modified trellis path is utilized in order to find the best embedding noise correlated with the host signal. The coding process can be explained briefly in the following steps.

• Choice of the embedding region of the host signal:
The Discrete Cosine Transform (DCT) coefficients of the host image are calculated for each 8×8 block. Discarding the DC coefficient, the first twelve AC coefficients are selected, as seen in Figure-3.4(a).

• Informed Coding:
A trellis with a length equal to the number of bits to be sent is created, and depending on the message bits M, all the arcs except those corresponding to the message M are deleted from the trellis. Using the selected DCT coefficients and a pseudo-random key shared by the encoder-decoder pair, the signal most correlated with the host image S is found.
(a) The 12 selected DCT coefficients to modify in the embedding process. (b) Geometric interpretation of the embedding process: the aim is to move the host image S to the closest point within the target region that corresponds to the index M to be sent.
Figure 3.4: Informed embedding of Miller et al. on DCT coefficients of still images.
• Embedding with Perceptual Shaping:
In the embedding process, the cover image S needs to be modified such that the decoder can decode the correct embedded bits with high probability. The process can be interpreted geometrically as in Figure-3.4(b). In this Voronoi diagram, the space is divided into five regions, each corresponding to a message index. The region of the message M to be sent is labeled g (good index) while the other regions are labeled b. Assume that the host signal S resides in region b1; the embedding process modifies the image such that it falls into the good region, satisfying both a perceptual quality and a robustness constraint. Watson's metric is used for the modification of the DCT coefficients (Watson, 1993), while the work image W must be decoded correctly under a fixed distortion level.
Nested lattice codes have been proposed by Zamir et al (2002), where there exist two codes, a source code Λ1 and a channel code Λ2, such that the codewords of Λ2 are a subset of the codewords of Λ1: Λ1 ⊃ Λ2. However, it is hard to generate nested lattice codes where both have good distance properties. Bennatan et al (2006) then proposed a coding method using the superposition of a good source code C0 with a good channel code C1. Exploiting the duality between the Multiple Access Channel (MAC) and Writing on Dirty Paper, they obtained performance 1.2 dB away from the limit for a 1/4 embedding rate using joint TCQ and LDPC coding.
3.4 Proposed Scheme-1: Extension to Cox Miller
The algorithm of Miller et al (2004) suffers from block visual artifacts because of the modification of DCT coefficients. Even after perceptual shaping using Watson's algorithm (Watson, 1993), the effect of embedding can easily be detected. In this section, we propose an informed embedding and coding technique similar to Miller et al (2004), but we employ the Discrete Wavelet Transform (DWT) for the embedding process in order to minimize block effects. Furthermore, a perceptual shaping based on DWT coefficients is applied, adjusting the embedding strength according to the sensitivity of the human visual system to alterations of the DWT coefficients.
Figure 3.5: Proposed informed embedding setup on DWT coefficients of still images.
The block diagram of the proposed system can be seen in Figure-3.5. After the extraction of the DWT coefficients, the selected ones pass through the informed coder and embedder. The informed coder finds the most correlated signal on the modified trellis, where the trellis path is fixed by the message bits M. Then the embedder modifies the host image in the direction of the correlated signal, so that the output signal can be decoded correctly with a robustness margin. The embedding is done by taking the perceptual effect of each coefficient into account. Detailed explanations of the blocks can be found in the following subsections.
3.4.1 Embedding on Discrete Wavelet Transform Coefficients
JPEG-2000 fixed two wavelet types in its standard: the reversible 5-3 tap Le Gall filter and the irreversible 9-7 tap Cohen-Daubechies-Feauveau filter (Marcellin et al, 2000). Since the first one is perfectly reconstructible, we used the Le Gall filter in our experiments, where the low-pass and high-pass z-transforms are given as H0(z) and H1(z) respectively (Le Gall and Tabatabai, 2000):

   H0(z) = (1/8) z (1 + z⁻¹)² (−z − z⁻¹ + 4),   (3.4)
   H1(z) = (1/2) z (1 + z⁻¹)².   (3.5)
The analysis and the synthesis steps for a 2-D image based on the 1-D Le Gall filter can be explained as follows. The composition of wavelet coefficients is organized in levels, and at each level there exist four frequency components: Low-Low (LL), Low-High (LH), High-Low (HL) and High-High (HH). For each level, these four components can be calculated by down-sampling and applying the analysis filters H0 and H1 in the horizontal and vertical directions (see Figure-3.6(a)). The LL component can then be used to calculate the next higher level.
(a) Analysis.
(b) Synthesis.
Figure 3.6: Analysis and Synthesis steps of Le Gall DWT.
The reconstruction of the image from the DWT components can be done using the synthesis filters given as

   g0(n) = (−1)ⁿ h1[n],   (3.6)
   g1(n) = (−1)ⁿ h0[n].   (3.7)

Similar to the analysis process, each component is up-sampled, followed by the application of the synthesis filters g0 and g1 in both vertical and horizontal directions (see Figure-3.6(b)). The two-level DWT coefficients of the Lena image are visualized in Figure-3.7.
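A compact way to implement the 5/3 Le Gall analysis and synthesis is the lifting factorization; the sketch below is a 1-D version with periodic boundary handling (np.roll) for brevity, whereas JPEG-2000 uses symmetric extension, and it assumes an even-length input.

```python
import numpy as np

def legall53_forward(x):
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    odd = odd - 0.5 * (even + np.roll(even, -1))   # predict: high-pass band
    even = even + 0.25 * (odd + np.roll(odd, 1))   # update: low-pass band
    return even, odd

def legall53_inverse(lo, hi):
    even = lo - 0.25 * (hi + np.roll(hi, 1))       # undo update
    odd = hi + 0.5 * (even + np.roll(even, -1))    # undo predict
    x = np.empty(lo.size + hi.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.arange(16, dtype=float)
lo, hi = legall53_forward(x)
print(np.allclose(legall53_inverse(lo, hi), x))    # perfect reconstruction
```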
In our work, we choose the LH2, HL2 and HH2 components of the DWT coefficients for the embedding process. The reason is that, as has been shown, maximum robustness is attained when watermarks are embedded into well-populated bands. Since the first-level coefficients contain many zeros, we selected all of the second-level coefficients except LL2, the low-pass one.

Moreover, for an objective comparison with Miller's work on DCT, we created a trellis of the same length; hence the same number of coefficients must participate in the embedding process. The ratio 12/64 is the number of DCT coefficients over the total number of coefficients used in Miller et al.'s work; the combination of the LH2, HL2 and HH2 subbands yields the same ratio, 3/16.

Figure 3.7: Wavelet composition of the Lena image.
In a first experiment, we use the same informed encoding and embedding process without any perceptual shaping, which means that the embedding strength is the same on all three subbands. As seen in Figure-3.8(a), even without any perceptual shaping, our embedding algorithm achieves a PSNR value of 39 dB for the image Lena. Compared with DCT-domain embedding, instead of the block artifacts of DCT, the embedding noise is distributed over the whole image. The difference between the host image and the coded image in the wavelet domain can be seen in Figure-3.8(b). Note that the components LH2, HL2 and HH2 are modified equally.
3.4.2 Perceptual Shaping for DWT
Watson (1993) has proposed a contrast-masking method for perceptual quality shaping based on DCT coefficients. Using a weight matrix T which contains the weight of each DCT coefficient, together with the local features of the image (the low-pass component), a metric defining the effect of each DCT coefficient can be calculated. This metric is used for determining the perceptual shaping weights in Miller et al.'s method. Moreover, the visual impact of the DWT components has been studied in Watson et al (1997); Levický and Foriš (2004). The weights of LH and HL are the same, because the calculation of these two components includes one low-pass and one high-pass filter; the third component, HH, has however been shown to be less sensitive to perturbations. After our subjective tests, we defined a fixed weighting-ratio matrix T for the components: 2/7 for LH, 2/7 for HL and 2/7 for HH. Then, as in the DCT case, a metric for each DWT coefficient is calculated for determining the embedding power for a better perceptual output.
(a) Coded. PSNR value of 39.0005 dB. (b) DWT coefficient differences. MSE: 6.157; dmin, dmax: −23, 26.
Figure 3.8: A 100-bit message M is inserted into the Lena image using the LH2, HL2 and HH2 DWT coefficients. No perceptual shaping is applied.
Figure-3.9(a) shows the embedding of the same number of bits as in Figure-3.8, but with the perceptual shaping described above. Because of the perceptual shaping, the modifications are concentrated at the contours of the image (see Figure-3.9(b)). Furthermore, the insertion into the HH2 component is 1/3 stronger than that into the LH2 and HL2 components. Compared with the embedding without perceptual shaping in Figure-3.8, a similar PSNR value is achieved in Figure-3.9, with the errors concentrated at the less-sensitive DWT coefficients.

(a) Coded. PSNR value of 38.8 dB. (b) DWT coefficient differences. MSE: 6.21; dmin, dmax: −67, 75.
Figure 3.9: The same 100-bit message M is inserted into the Lena image using perceptual shaping.
Another visual example of the effect of perceptual shaping can be seen in Figure-3.10, where the asia image is coded with and without perceptual shaping.

(a) Coded with DWT embedding without any perceptual shaping. PSNR value of 39.4 dB. (b) DWT coefficient differences of (a): MSE 2.95; dmin, dmax: −16, 19. (c) Coded with DWT embedding using perceptual shaping. PSNR value of 40.237 dB. (d) DWT coefficient differences of (c): MSE 6.157; dmin, dmax: −23, 26.
Figure 3.10: Comparison of embedding a 40-bit message M into the asia image with and without perceptual shaping.
3.4.3 Attack Channel
For the attack channel, we simulate various attacks, from linear filtering to compression, using Stirmark (Petitcolas, 2000). Since the proposed embedding method depends on the trellis length, hence on the image dimensions, attacks that modify the image dimensions can easily de-synchronize the system. We therefore do not apply attacks such as cropping, geometric distortion, affine transform and rotation. The attacks that we apply on the watermarked images are: JPEG compression, convolution filtering, median filtering, additive noise, PSNR (all pixel values increased by the same quantity), rotation and scale, small random distortions, and auto-correlation.
3.4.4 Simulation Results
With the combination of coding on selected DWT coefficients and embedding with perceptual shaping, we obtain superior image quality with respect to Miller et al.'s work while preserving the same amount of robustness. For instance, Figure-3.14 compares the outputs of embedding a 40-bit message M into the Cameraman image using Miller's algorithm (Figure-3.14(a)) and our algorithm (Figure-3.14(b)).

Table-3.2 shows the performance of the perceptually shaped asia image in the face of several attacks. The right column indicates the maximum attack level that the embedded message still survives at decoding. Several attacked images from which the embedded message can still be decoded correctly can be found in Figure-3.15 at the end of this chapter.
3.5 Proposed Scheme-2: Superposition Coding
The Miller et al (2004) algorithm works quite well for insertion rates of around a thousand bits per 256 × 256 image; however, it cannot embed at higher rates, because there are insufficient coefficients to fill the trellis. For this reason, for high-rate embedding systems such as 1 bit per 2 coefficients of the cover signal, we developed a system similar to Bennatan et al (2006). The coding is done by superposition of a good channel code C1 and a good source code C0. The receiver performs iterative decoding between the channel-code estimation and the source-code estimation. We use an LDPC coder as the channel code and TCQ as the source code.
3.5.1 Definition
Assume a source code C0 quantizes a continuous i.i.d. input source vector x = x₁ⁿ taking values in the range [−A, A] with a mean-square distortion (1/n) Σ_{i=1}^{n} x_i² ≤ P. Moreover, a length-n channel code C1 can be constructed according to a zero-mean distribution with a variance Q (the value of Q is determined as a function of P and the attack noise power N, given in Section-3.5.2), where Q < P.

The superposition code is defined as C = C0 + C1, the addition being the standard addition over the real-number field. C corresponds to the auxiliary variable U of Costa. The aim is to find the vector c that is closest to the scaled host signal αs.
(a) Code C0 for time instant t. (b) Pulse Amplitude Modulation of code C1 for time instant t. (c) Code C0 + C1 for time instant t.
Figure 3.11: Superposition of 2 codes.
3.5.2 Code Construction
Code constructions close to theoretical limits are proposed for C0 and C1 . Here are
the detailed explanations of the two codes.
3.5.2.1 Source Code C0
C0 is designed to meet the fidelity criterion between the host signal s and the watermarked signal w such that (1/n) Σ_{i=1}^{n} (s_i − w_i)² ≤ P. We select the quantization code C0 as a Trellis Coded Quantization (TCQ) based on a rate-1/2 convolutional code with feedback polynomials (671, 1631) in octal (please refer to Section-1.9 for more information on TCQ). For an input in the range [−A, A], the 6-level PAM output signals [−5A/4, −3A/4, −A/4, A/4, 3A/4, 5A/4] are used, labeled with the 4-level output of the convolutional code as [D3, D0, D1, D2, D3, D0] (see Figure-3.11(a)). The reason for not confining the 6 PAM output signals to [−A, A] is that for the end points of the input range there would exist only one choice in the trellis, which leads to a performance loss (Marcellin and Fischer, 1990). Forney and Ungerboeck (1998) have proposed several techniques, including replication of the output signal levels. According to our simulation results, our source code C0 can quantize an input x uniformly distributed in the range [−1, 1] to QC0(x) with a mean distortion of P = 0.062, where QC0(x) is the reconstruction of the quantized vector x. The rate-distortion limit is 0.0585, which can be calculated from

   R(D) = H(X) − H(D) ≈ log₂(2A) − (1/2) log₂(2πeD)   (3.8)

for R = 1. Hence C0 is able to quantize the input source with a gap of 0.19 dB from the theoretical limits.
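The following toy sketch illustrates the TCQ search: a Viterbi pass over a small 4-state trellis whose branches are labeled with the subsets D0..D3 of the 6-level PAM alphabet above. The thesis uses the much larger trellis defined by the octal polynomials (671, 1631); the transition table here is an illustrative stand-in.

```python
import numpy as np

A = 1.0
# 6 PAM levels labeled [D3, D0, D1, D2, D3, D0], grouped by subset:
SUBSET = {0: [-3*A/4, 5*A/4], 1: [-A/4], 2: [A/4], 3: [-5*A/4, 3*A/4]}
# From each state, the two branches give (next_state, subset_label) -- toy table.
TRELLIS = {0: [(0, 0), (1, 2)], 1: [(2, 1), (3, 3)],
           2: [(0, 2), (1, 0)], 3: [(2, 3), (3, 1)]}

def tcq_quantize(x):
    cost, path = {0: 0.0}, {0: []}           # start from state 0
    for sample in x:
        new_cost, new_path = {}, {}
        for s, c in cost.items():
            for ns, lab in TRELLIS[s]:
                level = min(SUBSET[lab], key=lambda v: (v - sample)**2)
                nc = c + (level - sample)**2
                if ns not in new_cost or nc < new_cost[ns]:
                    new_cost[ns], new_path[ns] = nc, path[s] + [level]
        cost, path = new_cost, new_path
    best = min(cost, key=cost.get)
    return np.array(path[best]), cost[best] / len(x)

x = np.random.default_rng(3).uniform(-1, 1, 1000)
xq, mse = tcq_quantize(x)
print(mse)   # per-sample distortion of this toy trellis
```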
3.5.2.2 Channel Code C1
C1 is designed to spread the secret message M into a codeword such that 1 bit of the codeword is embedded into one sample of the host signal. Since we want to achieve a 1/2 embedding rate, we design an irregular LDPC code with rate 1/2 (please refer to Section-1.10 for more information on LDPC). The input of the LDPC code is the n/2-bit message M and the output is the n-bit codeword. The codeword is two-level PAM modulated with strength −√Q or +√Q, depending on whether the codeword bit value is 0 or 1 (see Figure-3.11(b)). Exploiting the duality between the MAC channel and dirty paper coding, an optimum Q value can be calculated following Boutros and Caire (2002) as:

   Q = αP,   (3.9)

where P is C0's quantization MSE distortion level, α corresponds to Costa's α = P/(P + N), and N is the noise variance of the attack channel.

For the LDPC coding, we generate the LDPC matrices using the degree polynomials found and distributed by the Communications Theory Lab (LTHC) at Ecole Polytechnique Fédérale de Lausanne (EPFL), Amraoui et al (2003). The irregular rate-1/2 degree distribution polynomial used in this section can be found in Appendix-B. It achieves a performance 0.11 dB away from the Shannon limit.

In order to visualize the superposition scheme for a time instant t, the possible combinations of c0,t + c1,t can be seen in Figure-3.11(c).
3.5.3 Encoder
Given the n/2-bit message M and the n-sample host s = {s1, s2, .., sn}, the encoder searches for the vector c = c0 + c1 that is closest to the scaled host vector αs, where α is a scaling constant equal to α = P/(P + N).

The encoding process can be seen in Figure-3.12. The encoder starts with the computation of c1. Rate-1/2 LDPC coding of the n/2-bit message M outputs the n-bit codeword k composed of 0's and 1's. Then the length-n vector c1 is found by 2-level PAM with the equation

   c1,i = −√Q if ki = 0,  +√Q if ki = 1,   (3.10)

where Q is a constant scalar Q = αP. Hence the variance of the c1 vector is equal to Q.

The second step is to search for the length-n vector c0 such that c0 + c1 is closest to αs. Since our TCQ coder can quantize a vector with a variance P, the vector αs − c1 is given as input to the vector quantizer. The Viterbi algorithm searches all possible paths on the trellis to find the minimum-error sequence. The output vector of the quantizer is assigned to c0 as

   c0 = QC0(αs − c1).   (3.11)
Figure 3.12: Embedding process of the message M into the work s using superposition coding. LDPC coding of M to find the channel code c1 is followed by TCQ coding of αs − c1 to find the source code c0. The watermarked signal c0 + c1 + (1 − α)s is sent through the attack channel.
The superposition code c is then assigned to

   c = c0 + c1 = QC0(αs − c1) + c1.   (3.12)

Since the quantization code QC0 assures a quantization error limited by P, the encoder can find the embedding noise as

   x = c0 + c1 − αs.   (3.13)

The watermarked signal w is then w = s + x.
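Putting Equations 3.10-3.13 together, the encoder flow can be sketched as below. ldpc_encode and tcq_quantize are stubs standing in for the real rate-1/2 LDPC coder and the TCQ quantizer of Section-3.5.2, so only the data flow, not the performance, is meaningful.

```python
import numpy as np

rng = np.random.default_rng(4)
P, N = 0.062, 0.0439                 # values from Section-3.5.7
alpha = P / (P + N)
Q = alpha * P                        # Equation 3.9

def ldpc_encode(m):                  # stub: duplicate bits just to get length n
    return np.concatenate([m, m])

def tcq_quantize(v):                 # stub: scalar rounding on a coarse grid
    return np.round(v / 0.5) * 0.5

m = rng.integers(0, 2, 500)                      # n/2-bit message
s = rng.uniform(-1/alpha, 1/alpha, 1000)         # host signal

k = ldpc_encode(m)                               # n-bit codeword
c1 = np.where(k == 0, -np.sqrt(Q), np.sqrt(Q))   # Equation 3.10
c0 = tcq_quantize(alpha * s - c1)                # Equation 3.11
x = c0 + c1 - alpha * s                          # embedding noise, Eq. 3.13
w = s + x                                        # watermarked signal
print(x.var())    # embedding-noise power (for the real TCQ this is ~P)
```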
3.5.4 Gaussian Attack Channel
The stego signal w is subjected to additive channel noise Z, which is i.i.d. N(0, N). Hence the attack channel outputs

   y = w + z = x + s + z.   (3.14)

3.5.5 Decoder
The decoder searches for the pair (ĉ0, ĉ1) such that the conditional probability

   P(y | ĉ0 + ĉ1)   (3.15)

is maximized. Since the encoding is done by computing c1 followed by the search for c0, the decoding iteration first starts with the estimation of ĉ0 and terminates with the estimation of ĉ1. The main steps of the decoding process can be seen in Figure-3.13. The receiver computes

   ŷ = αy = αx + αs + αz = c0 + c1 − (1 − α)x + αz   (3.16)
     = c0 + c1 + ẑ,   (3.17)

where Equation-3.16 follows from Equation-3.13 and the effective noise ẑ is defined as ẑ = −(1 − α)x + αz, Gaussian distributed with mean 0 and variance

   σẑ² = (1 − α)² P + α² N = αN,   (3.18)

because α = P/(P + N).

The decoding alternates between a BCJR decoder and an LDPC belief-propagation decoder, which output soft decision probabilities for ĉ0 and ĉ1 respectively. The decoding is done in an iterative manner, and the final guess is made from P(ĉ0) after a certain number of iterations or once a codeword k is found.
Figure 3.13: Superposition watermarking extraction by BCJR and LDPC decoding
iterations.
3.5.6 Details of Joint Iterative Decoding C0 and C1
The joint iterative decoding can be described in three steps: the two update rules of the plain-likelihood calculations, the BCJR iteration, and the LDPC iteration. The details of each step can be found below; a small numerical sketch of the two update rules follows this list.

• Update rules of plain-likelihood calculations:
The plain likelihood is the ratio between the probabilities of the possible outcomes given the observations. The likelihood calculations are done before starting each BCJR or LDPC iteration, in order to initialize the cost function of every path in the trellis or of the LDPC bipartite graph. There exist two likelihood calculations: the first one is the n-by-4 matrix v sent from Y towards the BCJR decoder, and the second one is the n-by-2 matrix r sent from Y towards the LDPC decoder. The i'th element of the t'th row of v, vti, corresponds to the likelihood of ĉ0,t = Di given yt and r, where Di is the i'th output level of the TCQ coder closest to the channel output yt. Each element of v can be calculated as

   vti = [ Σ_{b=1}^{2} rtb · fσẑ(ŷt − Di + (−1)^b √Q) ] / [ Σ_{i=1}^{4} Σ_{b=1}^{2} rtb · fσẑ(ŷt − Di + (−1)^b √Q) ]   (3.19)

for t = 1, 2, .., n and i = 1, 2, 3, 4, where fσẑ is the probability density function of a Gaussian r.v. N(0, αN), and rt1, rt2 are the messages coming from the LDPC node iteration, which signify the likelihood that the t'th element of ĉ1 is 0 or 1. At the beginning of the decoding, the LDPC decoder sends rt1 = rt2 = 1/2, which means there is no prior knowledge on c1,t.

The b'th element of the t'th row of r, rtb, corresponds to the likelihood of ĉ1,t = (b − 1) given yt and v. Each element of r can be calculated as

   rtb = [ Σ_{i=1}^{4} vti · fσẑ(ŷt − Di + (−1)^b √Q) ] / [ Σ_{i=1}^{4} vti · fσẑ(ŷt − Di − √Q) + Σ_{i=1}^{4} vti · fσẑ(ŷt − Di + √Q) ]   (3.20)

for t = 1, 2, .., n and b = 1, 2. Again, fσẑ is the probability density function of a Gaussian r.v. N(0, αN), and Di is the i'th output level of the TCQ coder.
• Iteration BCJR: The branch metrics of the trellis are initialized by the received vectors r for each sample. Then a BCJR iteration is done as explained in Section-1.9.2, and the BCJR outputs the probability P(c0 | ŷ, r), which is mapped to the message matrix v.

• Iteration LDPC: The variable-node likelihoods v are calculated as explained in the previous item. Then 10 LDPC iterations are executed between the variable nodes and the check nodes as explained in Section-1.10. The LDPC decoder outputs the likelihood probability P(c1 | ŷ, v), which is mapped to the message vector r.
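The sketch announced above vectorizes the two update rules with numpy; D holds four illustrative TCQ output levels and the variance is σẑ² = αN, all the numbers being arbitrary.

```python
import numpy as np

def gauss(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def update_v(yhat, r, D, var, sqQ):
    # Equation (3.19): likelihood that c0_t equals each level D_i.
    num = sum(r[:, b - 1][:, None] * gauss(yhat[:, None] - D + (-1)**b * sqQ, var)
              for b in (1, 2))
    return num / num.sum(axis=1, keepdims=True)

def update_r(yhat, v, D, var, sqQ):
    # Equation (3.20): likelihood that c1_t is -sqrt(Q) (b=1) or +sqrt(Q) (b=2).
    cols = [(v * gauss(yhat[:, None] - D + (-1)**b * sqQ, var)).sum(axis=1)
            for b in (1, 2)]
    r = np.stack(cols, axis=1)
    return r / r.sum(axis=1, keepdims=True)

n = 6
yhat = np.random.default_rng(5).normal(0, 1, n)
D = np.array([-0.75, -0.25, 0.25, 0.75])
r = np.full((n, 2), 0.5)                 # no prior knowledge on c1 at start
v = update_v(yhat, r, D, var=0.02, sqQ=0.2)
r = update_r(yhat, v, D, var=0.02, sqQ=0.2)
print(v.sum(axis=1), r.sum(axis=1))      # each row normalizes to 1
```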
3.5.7 Simulation Results
In our simulations, we embed a 10⁵-bit message M within a host signal S of length 2 · 10⁵, i.i.d. uniformly distributed in the range [−1/α, 1/α]. Since our embedding method can achieve an MSE performance of P = 0.062, the theoretical maximum AWGN variance can be calculated from Equation-3.3 with the values R = 1/2 and P = 0.062. Hence, in theory, the maximum attack variance is found to be N = 0.062.

In our experiments, starting from N = 0.062 and decreasing N by small amounts, we search for the maximum AWGN variance N at which the probability of message error is low enough (Pe ≤ 10⁻⁵). For each N value, we created 20 random host signals and embedded a random message M with the appropriate α = P/(P + N) value. After a maximum of 100 decoding iterations, the error rate of the decoded M̂ is calculated. We achieve a decoding error rate of 3 · 10⁻⁶ for N = 0.0439, where there is a

   10 log10(0.062 / 0.0439) = 1.5 dB   (3.21)

gap from the theoretical setup.
3.6 Conclusion
In this chapter, we have proposed two practical informed watermarking code designs, one for low-rate data embedding in the DWT coefficients of still images, and the other for high-rate data embedding using the superposition of a good source code (TCQ) and a good channel code (LDPC) under an AWGN attack channel.

In the low-embedding-rate code design, up to a 1000-bit message M is embedded into the LH2, HL2 and HH2 components of the DWT coefficients using a trellis where the valid trellis path is driven by the message M. Based on the Watson perceptual metric, the sensitivity of each DWT coefficient of the host image is calculated, and the embedding process takes this sensitivity metric into account.

For the high-embedding-rate code design, we use continuous-alphabet synthetic state information and embed the message M at a rate of 1/2 bit per channel use. Embedding is done by the superposition of a good source code based on TCQ and a good channel code based on LDPC codes. By using an iterative decoding algorithm, BCJR for the source code and belief propagation for the channel code, the message survives an AWGN attack noise that corresponds to 1.5 dB away from the theoretical embedding limits. This high-rate embedding system can be used in conjunction with a compression system as in Chapter-2 to build a joint embedding and compression system.
(a) Coded with Miller et al. PSNR value of 31.5 dB. (b) Coded with the proposed method and perceptual shaping. PSNR value of 32.2 dB.
Figure 3.14: Embedding a 40-bit payload into the Cameraman image.
Table 3.2: Robustness test of the proposed algorithm for the image "asia.pgm" (Stirmark 4.0). A 40-bit message is embedded into the asia image with DWT perceptual shaping. For each attack listed below, the maximum attack level at which the secret message M can still be decoded without any error is given.

Attack Type            Maximum level
JPEG compression       Quality factor of 12%
Convolution filtering  gaussian filter
Median filtering       3 × 3
Additive noise         3%
Rotation and scale     ±0.25°
Auto-correlation       3
PSNR                   by 100
(a) Convolution1 (gaussian).
(b) JPEG QF=12%.
(c) Median 3 × 3.
(d) Noise 3%.
Figure 3.15: Maximum level of attacked images from which the secret message can still be decoded perfectly.
Chapter 4
Dirty Paper Coding with Partial
State Information
Contents
4.1 Introduction . . . 84
    4.1.1 List of Symbols . . . 85
4.2 Problem statement . . . 85
4.3 Achievable Rate . . . 87
    4.3.0.1 Case A . . . 90
    4.3.0.2 Case B . . . 90
    4.3.0.3 Case C . . . 90
    4.3.0.4 Case D . . . 91
    4.3.0.5 Case E . . . 91
    4.3.0.6 Case F . . . 92
4.4 Capacity/rate gain / loss analysis . . . 92
    4.4.1 For optimum values of α . . . 92
    4.4.2 For non optimum values of α . . . 93
4.5 Conclusion . . . 94
A generalization of the dirty paper coding problem is considered in which (possibly different) noisy versions of the state information, assumed to be i.i.d. Gaussian random variables, are available at the encoder and at the decoder. This chapter derives the maximum achievable rate formula for this general problem. The general setup encompasses the cases where the state information is perfectly known either at the encoder, at the decoder, or at both. Moreover, it generalizes the analysis to cases where the state information is known only partially. In addition, this chapter shows that in realistic situations where the AWGN noise power is not known at the encoder, partial information at the decoder can increase the maximum achievable rate with respect to Costa's coding setup.¹
4.1 Introduction
The problem of coding for communication over a channel whose conditional probability distribution is controlled by a random state parameter finds applications in diverse areas, ranging from coding for information storage in a memory with defective cells, to data hiding, to coding for multiple-input multiple-output communication. The particular case where the state is known causally at the encoder only (no channel state information at the receiver) was first considered by Shannon in 1958 (Shannon, 1958). In Gel'fand and Pinsker (1980), Gel'fand and Pinsker consider the channel coding problem with non-causal state information available at the transmitter. In their setup, the transmitter wishes to send a message M ∈ {1, ..., |M|} over a memoryless channel defined by the transition probabilities p(y|x, s), where X and Y are the channel input and output and S is an i.i.d. random variable representing the sequence of states {S1, . . . , Sn} of the channel, known non-causally at the encoder but unknown at the decoder. The general Gel'fand-Pinsker problem suffers from some capacity loss when compared with channel coding with side information available at both encoder and decoder. In Costa (1983), Costa has shown that there is no loss in capacity if the channel state is additive white Gaussian interference ("dirt"). The design of codes approaching Costa's capacity is known as the dirty paper coding problem. The capacity loss is derived in Zaidi and Duhamel (2005) for an additive white Gaussian channel state S partially available at the encoder but not at the decoder. The capacity for information storage in a memory where the channel state is perfectly available at the decoder but not at the encoder is derived in Heegard and Gamal (1983). The authors in Moulin and Wang (2007) consider a generalized Gel'fand-Pinsker coding problem and derive capacity formulas, as well as random coding and sphere-packing exponents.
In this chapter, we focus on the particular problem of dirty paper coding with correlated partial state information at the encoder and at the decoder. The Gel'fand-Pinsker coding problem where (possibly different) noisy versions of the channel state sequence are available at both sides was actually first considered in Salehi (1992) for a binary-input binary-output channel; the targeted application was information storage in a memory with defective cells. Here, the problem we focus on can be regarded as a special case of the coding problem with two-sided state information examined in Cover and Chiang (2002). The state information, the channel input and the channel output are assumed to be i.i.d. Gaussian random variables. The maximum achievable rate formulas are derived for this general problem as a function of α by expressing, as in Costa (1983), U = X + αS, where U is an auxiliary random variable. This gives us the general capacity formula for the cases where there is only partial or no side information at the encoder side, while there is a perfect one at the decoder side. The analytic expressions of the capacity/maximum achievable rate gains and losses with respect to Costa's setup are given for six particular cases, with optimum and non-optimum values of the α parameter. It is shown that in the general case, a capacity gain or loss can be obtained in a realistic situation where the optimum α is not known.

¹ This chapter corresponds to a paper that will soon be submitted.
4.1.1 List of Symbols
The symbols used in this chapter are listed below.

M : Discrete message to be transmitted (watermark).
𝓜 : Alphabet of the watermark.
M̂ : Decoded watermark.
Pe : Probability of decoding error.
R : Communication rate.
X : Stegotext.
S : State information.
S1 : Partial state information available to the encoder.
S2 : Partial state information available to the decoder.
θ, T : Additive random noises of S1 and S2.
Z : Channel noise.
Y : Received signal.
U : Auxiliary variable.
α : A constant for coding with side information.
𝒩 : Gaussian distribution.
Σ : Covariance matrix.
P, Q : Variances of X and S respectively.
L, K, N : Variances of θ, T and Z respectively.

4.2 Problem statement
Consider the communication problem shown in Figure 4.1. We use the same notation as Costa (1983) throughout this chapter. An index M ∈ {1, ..., |M|} will be sent to the receiver in n uses of the channel, where |M| is the greatest integer smaller than or equal to e^(nR), and R is the rate in nats per transmission. Let S = (S1, S2, ..., Sn) be the sequence of noncausal states of the channel for n transmissions, assumed to be a sequence of independent identically distributed (i.i.d.) N(0, QI) random variables. We consider the cases where this sequence of states is known partially to the encoder, as S1 = (S1,1, S1,2, ..., S1,n), and to the decoder, as S2 = (S2,1, S2,2, ..., S2,n), both noncausally; they are written S1 and S2 throughout this chapter. This problem can be cast into a two-sided state information setup close to the one considered in Cover and Chiang (2002), where S is defined by a pair of independent and identically distributed (i.i.d.) correlated state informations (S1, S2) available at the sender and at the receiver respectively. The state information available at the encoder and at the decoder is expressed in terms of the channel state as S1 = S + θ and S2 = S + T, where θ and T are i.i.d. random variables distributed according to N(0, LI) and N(0, KI), and I is the n × n identity matrix.

Figure 4.1: Channel coding with state information.

Based on M and S1, the encoder sends a codeword X, which must satisfy the power constraint (1/n) Σ_{i=1}^{n} X_i² ≤ P. The channel output is given by Y = X + S + Z, where the channel noise Z is i.i.d. according to N(0, NI). Upon receipt of Y and S2, the decoder creates an estimate M̂(Y, S2) of the index M. Under the assumption that the index M is uniformly distributed over {1, .., |M|}, the probability of error Pe is given by

   Pe = (1/|M|) Σ_{k=1}^{|M|} Pr{ M̂(Y, S2) ≠ k | M = k }.   (4.1)

The general formula for the capacity of this setup in the case of finite alphabets is given by Cover and Chiang (2002):

   C = max_{p(x,u|s1)} [I(U; Y, S2) − I(U; S1)],   (4.2)

where the maximum is over all joint distributions p(u)p(s1, s2, x|u)p(y|x, s1, s2), and U is an auxiliary random variable with finite cardinality. But in our case the alphabets are continuous, and the only general capacity expression that has been stated is that of Moulin and Wang (2007):

   C = sup_{p(x,u|s1)} min_{p(y|x,s)} [I(U; Y, S2) − I(U; S1)].   (4.3)

So, here we will be interested in the estimation of the maximum achievable rate for particular distributions and constructions, and we will see that in some cases it can be identified with the capacity.
The perfect codes can be created as in Cover and Chiang (2002) using the random binning argument. First, e^(n(I(U;Y,S2)−2ε)) i.i.d. sequences of U are generated according to the distribution p(u), and each of them is indexed as U(i) where i ∈ {1, 2, ..., e^(n(I(U;Y,S2)−2ε))}. Then these sequences are randomly distributed into e^(n(R−4ε)) bins, where R corresponds to the rate of the system. Given the state S1 = S + θ and the message M ∈ {1, ..., |M|}, the encoder searches for a codeword U(i) within the bin indexed by M such that the pair (U(i), S1) is jointly typical. Then it sends the corresponding X which is jointly typical with (U(i), S1). During the transmission, the signal is exposed to the additive interference S and the noise Z. The receiver receives Y = X + S + Z from the channel and observes the noncausal state information S2 = S + T. The decoder searches for the sequence U(i) such that (U(i), Y, S2) is strongly jointly typical and assigns M̂ as the index of the bin containing the sequence U(i). The probabilities of all possible error events go to 0 as n → ∞ (Cover and Chiang, 2002).
4.3 Achievable Rate
We assume that X, S, Z, θ and T are random variables with respective Gaussian distributions N(0, PI), N(0, QI), N(0, NI), N(0, LI), and N(0, KI). Hence, the joint distribution f(X, S, Z, θ, T) is a multivariate Gaussian ∼ N(0, Σ), where the covariance matrix Σ is block-diagonal:

   Σ = diag(PI, QI, NI, LI, KI).   (4.4)

We consider U = X + αS1 = X + αS + αθ, where α is a parameter to be determined. The achievable rate is then a function of the parameter α and is given by R(α) = I(U; Y, S2) − I(U; S1), where²

   R(α) = (1/2) ln [ P((P + Q + N)(Q + K) − Q²) / ( PQK(1 − α)² + NK(P + α²(Q + L)) + α²L(PQ + PK + QK + NQ) + PNQ ) ].   (4.5)
Similarly to Costa (1983), graphs of R(α) versus α are presented in Figure 4.2 for P = Q = N = 1 and for several {L, K} pairs: {0, 0}, {0, 1}, {1, 0}, {1, 1}, {0, ∞} and {1, ∞}.

Maximizing R(α) over α, we get³

   max_α R(α) = R(α*) = (1/2) ln [ 1 + P(QK + QL + KL) / ( N(QK + QL + KL) + QLK ) ],   (4.6)

which is obtained for α* = PQK / (PQK + QNK + L(PQ + PK + QK + NQ + NK)). Therefore, if the noise powers Q, N, L, K are known at the encoder, we can obtain the maximum achievable rate given in Equation 4.6.
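As a numerical check, the following sketch evaluates Equation 4.5 on a grid and compares the grid maximizer with the closed-form α* of Equation 4.6; the parameter values mirror Figure 4.2.

```python
import numpy as np

def R(a, P, Q, N, L, K):
    # Equation 4.5 (nats per transmission)
    num = P * ((P + Q + N) * (Q + K) - Q**2)
    den = (P*Q*K*(1 - a)**2 + N*K*(P + a**2 * (Q + L))
           + a**2 * L * (P*Q + P*K + Q*K + N*Q) + P*N*Q)
    return 0.5 * np.log(num / den)

P = Q = N = L = K = 1.0
a_star = P*Q*K / (P*Q*K + Q*N*K + L*(P*Q + P*K + Q*K + N*Q + N*K))
grid = np.linspace(-1.0, 2.0, 30001)
a_grid = grid[np.argmax(R(grid, P, Q, N, L, K))]
R_max = 0.5 * np.log(1 + P*(Q*K + Q*L + K*L) / (N*(Q*K + Q*L + K*L) + Q*L*K))
print(a_star, a_grid)                   # the two maximizers should agree
print(R(a_star, P, Q, N, L, K), R_max)  # both equal Equation 4.6
```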
Table 4.2: Special cases of the proposed channel coding setup.

CASES         Encoder state S1   Decoder state S2   Rate loss for αopt   Citation
General Case  Partial            Partial            Rloss General        Dikici et al., Section-4.3
Case A        Perfect            Perfect            0
Case B        Perfect            Partial            0                    Dikici et al., Section-4.3.0.2
Case C        Perfect            ∅                  0                    Costa (1983)
Case D        Partial            Perfect            0
Case E        Partial            ∅                  Rloss Case E         Zaidi and Duhamel (2005)
Case F        ∅                  Perfect            0                    Heegard and Gamal (1983)
The system can be further analyzed for six particular cases, as listed in Table 4.2.
Let us first recall that the capacity in the more favorable case, where there is perfect knowledge of S both at the encoder and the decoder, is equal to C* = (1/2) ln(1 + P/N). Costa showed that this capacity is achievable through Gaussian distributions and the construction U = X + αS, and that, while we keep a perfect knowledge of S at the encoder, the capacity is still reached even if there is no side information at the decoder side. Hence, this construction is particularly interesting, and our purpose here is to study the maximum achievable rates it reaches in several other cases.

We will first consider, as Costa did, a perfect knowledge of S at the encoder side, deriving cases A, B and C to distinguish the different amounts of information at the decoder side. Cases A ([perfect, perfect]) and C ([perfect, ∅]) are not new, since they correspond to the ones explored by Costa. In both, the maximum achievable rate is equal to C* and the capacity is therefore reached. The conclusion concerning the capacity for Case B ([perfect, partial]) could be derived from Case C, as it is weaker, but we give here the proper expression of the achievable rate, which was not stated by Costa, and see in Section 4.4 that for non-optimal values of α there is some possible gain.

² See Appendix A.1 for the derivation of the achievable rate.
³ See Appendix A.2 for the method of derivation.

Figure 4.2: P = Q = N = 1; graphs of R(α) for the {L, K} pairs {0, 0}, {0, 1}, {1, 0}, {1, 1}, {0, ∞} and {1, ∞}. The rate of transmission R(α) is given in nats per transmission (the maximum value 0.3466 nats/transmission corresponds to 1 bit/transmission).
4.3.0.1 Case A
S1 = S, S2 = S. This corresponds to the encoder-decoder state pair [perfect, perfect], where K → 0 and L → 0. Then the achievable rate is

   RCase-A = lim_{K→0, L→0} R(α) = (1/2) ln(1 + P/N),   (4.7)

which is independent of α and reaches C*, hence showing that it is in fact the capacity, and that the capacity is achieved by this construction. Hence, there is no need for an auxiliary variable U, and we simply have U = X. The graph of Case A is presented in Figure 4.2, where P = Q = N = 1 and K = L = 0.
4.3.0.2 Case B
S1 = S, S2 = S + T. This corresponds to the encoder-decoder state pair [perfect, partial], where L → 0. The achievable rate of the system is given by

   RCase-B(α) = lim_{L→0} R(α) = (1/2) ln [ P(K(P + Q + N) + Q(P + N)) / ( PQK(1 − α)² + NK(P + α²Q) + PNQ ) ].   (4.8)

RCase-B(α) is maximized for α⋄ = P/(P + N), which corresponds to a rate of RCase-B(α⋄) = (1/2) ln(1 + P/N) = C*. Hence, here also the capacity can be reached by this construction. This is not really surprising, since Costa showed (as we recall in Case C below) that the capacity C* can be reached by this construction when there is perfect side information at the encoder, even if there is no side information at the decoder. The graph of Case B is presented in Figure 4.2, where P = Q = N = K = 1.
4.3.0.3 Case C
S1 = S, S2 = S + T, which corresponds to the encoder-decoder state pair [perfect, ∅], where L → 0 and K → ∞. The achievable rate becomes

   RCase-C(α) = lim_{K→∞, L→0} R(α) = (1/2) ln [ P(P + Q + N) / ( PQ(1 − α)² + N(P + α²Q) ) ].   (4.9)

This rate is maximized for α⋄ = P/(P + N), giving RCase-C(α⋄) = (1/2) ln(1 + P/N) = C*. As Costa showed, the capacity is then reached. The graph of Costa's limit can be seen in Figure 4.2 for P = Q = N = 1.
Now, more interesting cases are the ones where the knowledge at the encoder
side is only partial. We will first consider in Case D the case where S is perfectly
known at the decoder side, and show that the maximum achievable rate still reaches
C ∗ . Then, we will consider in Case E the possibility for the decoder to access no
side information at all, and see that there is a loss in terms of maximum achievable
rate. At last, we will consider in Case F the case where there is no knowledge at the
encoder, but a perfect one at the decoder side, showing that the maximum achievable
rate hence reaches C ∗ .
4.3.0.4 Case D
S1 = S + θ, S2 = S. The encoder-decoder state pair is [partial, perfect] where
K → 0. The achievable rate in this case is
1
P (P + N )
RCase-D = lim R(α) = ln
.
(4.10)
K→0
2
α2 L(P + N ) + P N
The rate R_Case-D is independent of the state power Q. It is maximized for α∇ = 0, which corresponds to a maximum achievable rate of R_Case-D(α∇) = ½ ln(1 + P/N) = C∗.
Actually, if the state is perfectly known to the decoder, but the encoder has a noisy
version of the state, the rate is maximized when we consider U = X, and the capacity
is still reached with this construction. The graph of RCase-D is given in Figure 4.2
for P = Q = N = L = 1.
4.3.0.5 Case E

S1 = S + θ, S2 = S + T. The encoder-decoder state pair is [partial, ∅], where K → ∞. For this setup the rate is

$$R_{\text{Case-E}}(\alpha) = \lim_{K\to\infty} R(\alpha) = \frac{1}{2}\ln\left(\frac{P(P+Q+N)}{PQ(1-\alpha)^2 + N(P+\alpha^2(Q+L)) + \alpha^2 L(P+Q)}\right). \qquad (4.11)$$
It is maximized for α† = PQ/(PQ + QN + LP + LQ + LN), which corresponds to a rate of

$$R_{\text{Case-E}}(\alpha^\dagger) = \frac{1}{2}\ln\left(1 + \frac{P(Q+L)}{N(Q+L)+QL}\right). \qquad (4.12)$$
The graph of Case E can be seen in Figure 4.2 for P = Q = N = L = 1. Please note that there exists a loss in Case E with respect to Case A (R_Case-E(α†) < R_Case-A). Here, we cannot state that R_Case-E(α†) corresponds to a capacity: it is the maximum achievable rate for our construction. Zaidi and Duhamel (2005) analyze the capacity loss of a setup similar to Case E, in which the channel state S is not perfectly available to the encoder and is defined by S = S1 + θ, whereas in our case S1 = S + θ. A practical code construction technique for this setup can be found in Zamir et al (2002).
4.3.0.6 Case F

S1 = ∅, S2 = S. The encoder-decoder state pair is [∅, perfect], where K → 0 and α = 0. For this setup the rate is

$$R_{\text{Case-F}} = \lim_{K\to 0} R(0) = \frac{1}{2}\ln\left(1+\frac{P}{N}\right). \qquad (4.13)$$
Since there is no state information available at the encoder, the auxiliary variable U is simply U = X. Note that the capacity is reached, which both states its value for this case and shows that this construction achieves it.
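As a numerical sanity check on Cases A to F, the general rate R(α) of Equation 4.5 (derived in Appendix A) can be evaluated directly. The Python sketch below is purely illustrative, with hypothetical parameter values; very small and very large variances stand in for the limits K, L → 0 and K → ∞:

    import numpy as np

    def rate(alpha, P, Q, N, K, L):
        # General achievable rate R(alpha) of Equation 4.5, in nats per channel use
        num = P * ((P + Q + N) * (Q + K) - Q**2)
        den = (P * Q * K * (1 - alpha)**2
               + N * K * (P + alpha**2 * (Q + L))
               + alpha**2 * L * (P*Q + P*K + Q*K + N*Q)
               + P * N * Q)
        return 0.5 * np.log(num / den)

    P = Q = N = 1.0
    C_star = 0.5 * np.log(1 + P / N)           # Costa's capacity C*
    alphas = np.linspace(-1.0, 2.0, 30001)
    cases = {                                  # (K, L); 1e-9 and 1e9 emulate the limits
        "A [perfect, perfect]": (1e-9, 1e-9),
        "B [perfect, partial]": (1.0, 1e-9),
        "C [perfect, none]": (1e9, 1e-9),
        "D [partial, perfect]": (1e-9, 1.0),
        "E [partial, none]": (1e9, 1.0),
    }
    for name, (K, L) in cases.items():
        R = rate(alphas, P, Q, N, K, L)
        i = int(np.argmax(R))
        print(f"{name}: max R = {R[i]:.4f} at alpha = {alphas[i]:+.3f} (C* = {C_star:.4f})")
    # Case F [none, perfect]: alpha is forced to 0 and K -> 0
    print("F [none, perfect]: R =", rate(0.0, P, Q, N, 1e-9, 1.0))

The printed maxima agree with the closed forms above: Cases A to D and F reach C∗, while Case E saturates at the strictly smaller value of Equation 4.12.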
4.4 Capacity/rate gain/loss analysis
In this section, we analyze the capacity of dirty paper codes with partial state information at the encoder and decoder sides, given in Equation 4.5, together with the special cases of this setup given in Section 4.3. Moreover, we analyze the rate gain/loss between the special cases when the encoder does not know the optimum coding parameter α. Since the capacity/maximum achievable rate is non-negative, the gain is calculated by defining the capacity/maximum achievable rate of a system as max{0, R(α)}.
4.4.1 For optimum values of α

If the transmitter uses the optimum value of the α parameter for each setup, there is no capacity gain nor loss for the particular Cases A, B, C, D and F. The achievable capacity in that case is given by

$$R_{\text{Case-A}} = R_{\text{Case-B}}(\alpha^\diamond) = R_{\text{Case-C}}(\alpha^\diamond) = R_{\text{Case-D}}(\alpha^\nabla) = \frac{1}{2}\ln\left(1+\frac{P}{N}\right) = C^*. \qquad (4.14)$$
In Case E, the optimum value of α yields a maximum achievable rate loss:

$$R_{\text{loss Case E}} = R_{\text{Case-E}}(\alpha^\dagger) - R_{\text{Costa}}(\alpha^\diamond) = -\frac{1}{2}\ln\left(1 + \frac{PQL}{N\left((Q+L)(P+N)+QL\right)}\right). \qquad (4.15)$$
Similarly, for the optimum value of α, the maximum achievable rate loss for the general case is

$$R_{\text{loss general}} = R(\alpha^*) - R_{\text{Costa}}(\alpha^\diamond) = -\frac{1}{2}\ln\left(1 + \frac{PQLK}{N\left((P+N)(QK+QL+LK)+QLK\right)}\right). \qquad (4.16)$$
4.4.2 For non-optimum values of α

However, in actual systems, the transmitter does not have perfect knowledge of the additive variances N, Q, L and K, so it cannot always code with the optimum α parameter. Assuming that the coding is done with a non-optimum α, we analyze the rate gain or loss with respect to Costa's coding setup [perfect, ∅]. For instance, using the same non-optimal α, there exists a rate gain in Case B [perfect, partial] with respect to Costa's setup, which is given by:
$$C_{\text{gain CaseB-C}}(\alpha) = \max\{0, R_{\text{Case-B}}(\alpha)\} - \max\{0, R_{\text{costa}}(\alpha)\}$$

$$= \begin{cases} \dfrac{1}{2}\ln\dfrac{\left(K(P+Q+N)+Q(P+N)\right)\left(PQ(1-\alpha)^2+N(P+\alpha^2 Q)\right)}{(P+Q+N)\left(PQK(1-\alpha)^2+NK(P+\alpha^2 Q)+PNQ\right)} & \text{if } R_{\text{costa}}(\alpha) > 0, \\[2mm] \dfrac{1}{2}\ln\dfrac{P\left(K(P+Q+N)+Q(P+N)\right)}{PQK(1-\alpha)^2+NK(P+\alpha^2 Q)+PNQ} & \text{else if } R_{\text{Case-B}}(\alpha) > 0, \\[2mm] 0 & \text{otherwise.} \end{cases} \qquad (4.17)$$
Let us define the Signal to State Ratio (SSR) and Signal to Noise Ratio (SNR) as SSR = 10 log10(P/Q) and SNR = 10 log10(P/N).
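The gain curves of Figure 4.3 can be reproduced in a few lines. The following sketch (illustrative values only) evaluates Equation 4.17 over an SNR sweep, deducing Q and K from the chosen SSR and 10 log(Q/K) values:

    import numpy as np

    def r_case_b(alpha, P, Q, N, K):
        # R_Case-B(alpha) of Equation 4.8
        num = P * (K * (P + Q + N) + Q * (P + N))
        den = P*Q*K * (1 - alpha)**2 + N*K * (P + alpha**2 * Q) + P*N*Q
        return 0.5 * np.log(num / den)

    def r_costa(alpha, P, Q, N):
        # R_Case-C(alpha) of Equation 4.9, i.e. Costa's setup [perfect, none]
        num = P * (P + Q + N)
        den = P*Q * (1 - alpha)**2 + N * (P + alpha**2 * Q)
        return 0.5 * np.log(num / den)

    P, alpha = 1.0, 0.4                    # a fixed, non-optimal alpha
    Q = P / 10**(-6 / 10)                  # SSR = -6 dB
    K = Q / 10**(6 / 10)                   # 10 log10(Q/K) = 6 dB
    snr_db = np.linspace(-15, 15, 7)
    N = P / 10**(snr_db / 10)
    gain = (np.maximum(0.0, r_case_b(alpha, P, Q, N, K))
            - np.maximum(0.0, r_costa(alpha, P, Q, N)))   # Equation 4.17
    for s, g in zip(snr_db, gain):
        print(f"SNR = {s:+5.1f} dB : capacity gain = {g:.4f} nats")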
The graphs showing the capacity gains between Case B and Costa's setup, for an SNR value ranging between −15 dB and +15 dB, for different values of the α parameter and of 10 log(Q/K) (∞, 6 dB, 2.1 dB and −1 dB), are presented in Figure 4.3. We fix P = 1, L = 0 and SSR = −6 dB (such low SSR values are relevant for practical applications such as watermarking). We observe that, given the values of P, Q, K and fixing α, there is a certain SNR value at which R_Case-B(α) = R_costa(α⋄), and hence zero capacity gain at that SNR value. However, for all other SNR values, there always exists a capacity gain with respect to R_costa(α). It is also evident that, given fixed P, Q, N values and an estimate of α, decreasing the 10 log(Q/K) value decreases the capacity gain. Voloshynovskiy et al (2004) assumed the statistics of the state, modeled as a mixture of Gaussian distributions, to be available at the decoder. When a noisy version of the state is available at the decoder (Case B) with 10 log(Q/K) ≈ 2 dB (see Figure 4.3(c)), the same rate gain with respect to Costa's setup is observed as in Voloshynovskiy et al (2004). For higher values of 10 log(Q/K), a higher capacity gain can be obtained compared with having only the statistical distribution of the state information at the decoder.

In Case E [partial, ∅], without the optimum α at the transmitter, there is a maximum achievable rate loss with respect to Costa's setup, given by
$$C_{\text{loss CaseE-C}}(\alpha) = \max\{0, R_{\text{Case-E}}(\alpha)\} - \max\{0, R_{\text{costa}}(\alpha)\}$$

$$= \begin{cases} \dfrac{1}{2}\ln\dfrac{PQ(1-\alpha)^2+N(P+\alpha^2 Q)}{PQ(1-\alpha)^2+N(P+\alpha^2(Q+L))+\alpha^2 L(P+Q)} & \text{if } R_{\text{Case-E}}(\alpha) > 0, \\[2mm] \dfrac{1}{2}\ln\dfrac{PQ(1-\alpha)^2+N(P+\alpha^2 Q)}{P(P+Q+N)} & \text{else if } R_{\text{costa}}(\alpha) > 0, \\[2mm] 0 & \text{otherwise.} \end{cases} \qquad (4.18)$$
The maximum achievable rate loss versus SNR between Case E and Costa's setup can be found in Figure 4.4, where P = 1, SSR = −6 dB, and 10 log(Q/L) = 2.1 dB and 6 dB, for an SNR value ranging between −15 dB and 15 dB.
Finally, without the optimum α parameter, there exists a maximum achievable rate gain or loss between the general case [partial, partial] and Costa's setup [perfect, ∅], expressed as a function of P, N, Q, L, K and α. Figure 4.5 shows the maximum achievable rate gain/loss versus SNR for an SNR value ranging between −15 dB and 15 dB. Please note that P = L = 1, SSR = −6 dB, 10 log(Q/K) = 2.1 dB (for Figure 4.5(a)) and 10 log(Q/K) = 6 dB (for Figure 4.5(b)). The maximum achievable rate gain/loss is plotted for several α values: 0, 0.2, 0.4 and 0.6.
4.5 Conclusion
This chapter has analyzed the maximum achievable rate losses and gains for the general setup where partial state information is available at the encoder and at the decoder under Gaussian interference. In particular, we derived the capacity for the case [partial or ∅, perfect], showing that Costa's construction enables us to reach it; this is not the case for [partial, partial or ∅], for which only a maximum achievable rate has been stated. We then analyzed the gain/loss in terms of achievable rates when the optimal coding parameter α is not accessible to the encoder. This general setup is relevant for practical applications such as watermarking under desynchronization attacks and point-to-point communication over a fading channel where the receiver has an estimate of the channel state.
Figure 4.3: Capacity gain (between R_Case-B(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and various 10 log(Q/K) values ((a) ∞, (b) 6 dB, (c) 2.1 dB, (d) −1 dB), with perfect knowledge of the channel state information at the encoder (L = 0).
Figure 4.4: Maximum achievable rate loss (between R_Case-E(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and 10 log(Q/L) values of (a) 2.1 dB and (b) 6 dB.
Figure 4.5: Maximum achievable rate gain or loss (between R(α) and R_Costa(α)) versus SNR, for different α values, where P = 1, SSR = −6 dB and 10 log(Q/K) values of (a) 2.1 dB and (b) 6 dB, with partial knowledge of the channel state information at the encoder (L = 1).
Chapter 5

Data Hiding and Distributed Source Coding
Contents

5.1 Introduction . . . 101
  5.1.1 List of Symbols . . . 102
  5.1.2 Formal Statement of Problem . . . 103
    5.1.2.1 Data Hiding (F1, G1) . . . 105
    5.1.2.2 Source Coding (F2, G2) . . . 105
    5.1.2.3 Summary of the overall setup . . . 106
  5.1.3 Summary of Results . . . 106
5.2 Theoretical Background . . . 107
  5.2.1 Channel Coding with Side Information (CCSI) . . . 107
  5.2.2 Source Coding with Side Information (SCSI) . . . 108
5.3 Contribution 1: Capacity Analysis for Multivariate Gaussian Source . . . 108
  5.3.1 Evaluation of the Rate Distortion Function of the Carrier . . . 109
  5.3.2 Capacity of the channel . . . 113
5.4 Contribution 2: Practical Code Design . . . 115
  5.4.1 Practical Code Design for the Multivariate Gaussian Case . . . 115
    5.4.1.1 Data Hiding Coder-Decoder Pair (F1 − G1) . . . 116
    5.4.1.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2) . . . 116
    5.4.1.3 Theoretical Limits and Performance Analysis of the Proposed System . . . 117
  5.4.2 Practical Code Design for Discrete Case . . . 117
    5.4.2.1 Data Hiding Coder-Decoder Pair (F1 − G1) . . . 117
    5.4.2.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2) . . . 118
    5.4.2.3 Experimental Setup . . . 118
5.5 Conclusion . . . 118
We address the problem of combining Informed Data Hiding and Distributed Source Coding within a single system. With the existing limited-power devices such as multi-sensor systems, PDAs, etc., researchers are attracted to low-complexity data compression and watermarking applications. In this work, we provide an original framework based on Distributed Source Coding (DSC) and Informed Data Hiding (IDH) which uses the duality between source and channel coding with side information. A mark M is inserted into a host signal S with a fidelity criterion d(S, W) ≤ D1, and then the watermarked signal W is compressed, given that Ŝ, a noisy version of the host signal, is available only to the decoder. The decoder estimates both the message M̂, with a low probability of error Pe(M ≠ M̂) ≤ 10⁻⁵, and the watermarked signal Ŵ, with a fidelity criterion d(W, Ŵ) ≤ D2. The rate-distortion function of the compression of the watermarked signal W and the capacity of the overall system are derived for the Gaussian case. Moreover, a practical code design based on Trellis Coded Quantization (TCQ) and Low Density Parity Check (LDPC) codes is proposed and evaluated for both binary and Gaussian input cases.¹

¹ This chapter corresponds to a paper that will soon be submitted. It has been presented partially in Dikici et al (2006b) and is also related to the work of Dikici et al (2006c).
5.1 Introduction
Both the Gel'fand-Pinsker (G-P) model of channel coding with side information at the encoder (Gel'fand and Pinsker, 1980) and the Slepian-Wolf (S-W) model of lossless source coding with side information at the decoder (Slepian and Wolf, 1973) have various practical applications, such as blind watermarking, distributed video coding and writing on defective cells. The G-P model was extended to continuous-alphabet Gaussian sources by Costa (1983), and the lossy version of S-W was developed by Wyner and Ziv (W-Z) (Wyner and Ziv, 1976). The duality between these channel coding and source coding problems is studied in Pradhan et al (2003); Su et al (2000), and a more general model, where the state information is partially available to the encoder and the decoder (and need not be the same), is studied both for the source coding (Cover and Chiang, 2002) and the channel coding (Moulin and Wang, 2007; Voloshynovskiy et al, 2004) cases.
Recently, various combinations of these source-channel coding schemes have been investigated, for instance combined data hiding and lossy compression with state information available only to the encoder in Maor and Merhav (2005); Yang and Sun (2006), and joint source-channel coding for W-Z and G-P channels with two parallel channels in Merhav and Shamai (2003). In this chapter, we address two problems: i) data hiding with state information available to the encoder and partial state information available to the decoder; ii) lossy compression with partial state information available only to the decoder. These two problems can be applied together within the scenario described in the following.
Consider the communication problem shown in Figure 5.1.

Figure 5.1: A Communication System between Alice and Bob via a non-secure Carrier.

Alice wants to send a message M to Bob via a non-secure Carrier. She uses a host signal S which is available only to her, while Ŝ, a noisy version of the host signal, is available to Bob. Alice shares her host signal with neither Bob nor the Carrier; however, Bob shares his noisy version with the Carrier at the decoding end. Alice embeds her secret message M within the host signal S with a fidelity criterion Ed(S, W) ≤ D1. The Carrier wants to compress the watermarked signal W while guaranteeing a quality of service (QoS) to Alice and Bob, such that his delivered copy Ŵ satisfies the constraint Ed(W, Ŵ) ≤ D2. Hence the Carrier compresses W knowing that Bob will share his noisy copy Ŝ at the decoding end. After the delivery of Ŵ to Bob, Bob extracts the hidden message M̂ using his noisy copy Ŝ, with a low probability of error Pe.
The novelty of our work is that we analyze the theoretical limits of the system and then propose a practical code design that operates close to these limits. One application area of this system is the development of a low-complexity encoder for a mobile handheld that compresses the redundancy of multimedia data while also carefully embedding hidden information such as meta-data.
5.1.1 List of Symbols

The symbols used in this chapter are listed below.

M : Discrete message to be transmitted (watermark).
𝓜 : Alphabet of the watermark.
M̂ : Decoded watermark.
Pe : Probability of decoding error.
RC : Capacity of the data-hiding system.
RS : Compression rate.
F1 − G1 : Encoder-decoder pair of data hiding.
F2 − G2 : Encoder-decoder pair of the Wyner-Ziv compression.
X : Stegotexts.
S : State information.
Ŝ : Partial state information available to the decoder.
W : Watermarked data.
Ŵ : Decompressed watermarked data at the decoder.
B, T, Z : Additive random noises.
U : Auxiliary variable.
α : A constant for coding with side information.
D, D1, D2 : Distortion levels.
N : Gaussian distribution.
Σ : Covariance matrix.
Q, K, Ñ : Variances of S, T and Z respectively.
h(X) : Differential entropy of X.
h(X, Y, Z) : Joint differential entropy of X, Y and Z.
I(X; Y |Z) : Mutual information of X and Y given Z.
E : Expectation operator.
C0 : Source code.
C1 : Channel code.

5.1.2 Formal Statement of Problem
In this section we give a precise statement of the problem which we stated informally in the previous section.

Figure 5.2: Data Hiding + Source Coding Scheme.

Here we consider a discrete-valued hidden message M, and a continuous-valued host signal S and side information Ŝ. Specifically, the sequence {(S_n, Ŝ_n)}_{n=1}^∞ represents independent samples of a pair of dependent random variables (S, Ŝ) with joint probability p(s, ŝ), taking values within the continuous infinite alphabet 𝒮 × 𝒮̂; that is, for any n and s^n × ŝ^n ∈ 𝒮^n × 𝒮̂^n, p(s^n, ŝ^n) = ∏_{i}^{n} p(s_i, ŝ_i). (S, Ŝ, W, Ŵ) has joint probability distribution p(S, Ŝ, W, Ŵ) and takes values within the set 𝒮 × 𝒮̂ × 𝒲 × 𝒲̂.
An index M ∈ {1, ..., 2^{nR_C}} will be sent to the receiver in n uses of the channel, where R_C is the embedding capacity of the channel per transmission. The sequence {X_n}_{n=1}^∞, which takes values within the infinite set 𝒳 with a power constraint E(d(S, S + X)) ≤ D1, is used to transmit the index M, where X is independent given S and Ŝ. Furthermore, the coded signal W is compressed by sending an index V ∈ {1, ..., 2^{nR_S}} with a fidelity criterion E(d(W, Ŵ)) ≤ D2, where R_S is the rate of the Carrier per transmission for a distortion D2. The goal is to form the best estimate M̂ with probability of decoding error Pe → 0 while respecting the fidelity criteria E(d(S, W)) ≤ D1 and E(d(W, Ŵ)) ≤ D2, where S is available only to the embedding process and Ŝ is available to the decompression and extraction.
This problem involves an interplay between source coding and channel coding with side information. We consider the following system, involving embedding-extraction and compression-decompression pairs, denoted [F1 − G1] and [F2 − G2] respectively. We define the data hiding (F1, G1) and source coding (F2, G2) mappings in the following sections.
5.1.2.1 Data Hiding (F1, G1)

There is a mapping pair F1 and G1 given as

$$F1 : \mathcal{M} \times \mathcal{S}^n \to \mathcal{X}^n, \qquad (5.1)$$

where E(d(X, 0)) ≤ D1, and W is defined as W = X + S, so E(d(S, W)) ≤ D1; and

$$G1 : \hat{\mathcal{W}}^n \times \hat{\mathcal{S}}^n \to \hat{M}, \qquad (5.2)$$

where E(d(W, Ŵ)) ≤ D2. Given an encoder-decoder pair [F1 − G1], the error probability averaged over all possible messages M and all host signals S^n is defined by p(F1, G1) = Pr{M̂ ≠ M}.

Definition 5.1 R_C is an achievable rate if there exists an encoder-decoder pair F1 − G1 such that p(F1, G1) → 0. The capacity C is the supremum of the achievable rates.
5.1.2.2 Source Coding (F2, G2)

A source code (n, v, ∆) is defined by two mappings F2 and G2, an encoder and a decoder respectively, where

$$F2 : \mathcal{W}^n \to \{1, 2, ..., v\}, \qquad (5.3)$$

and

$$G2 : \{1, 2, ..., v\} \times \hat{\mathcal{S}}^n \to \hat{\mathcal{W}}^n, \qquad (5.4)$$

and

$$d(W, \hat{W}) = \Delta. \qquad (5.5)$$

Definition 5.2 A pair (R_S, D2) is said to be achievable if, for arbitrary ε > 0, there exists (for n sufficiently large) a code (n, v, ∆) with

$$v \le 2^{n(R_S+\epsilon)}, \qquad \Delta \le D_2 + \epsilon. \qquad (5.6)$$
Definition 5.3 The rate distortion function R(D2) is

$$R(D_2) = \min_{(R_S, D_2)\in\mathcal{R}} R_S, \qquad (5.7)$$

where 𝓡 is the set of achievable (R_S, D2) pairs.
5.1.2.3 Summary of the overall setup

The sender has access to the realization of the secret message M and the noncausal host signal realization s^n. The encoder function F1 finds

$$x^n = F1(M, s^n) \qquad (5.8)$$

with a power criterion (1/n)∑ x_i² ≤ D1. Then, the sender passes the watermarked signal w^n = s^n + x^n to the unreliable Carrier. The Carrier compresses the watermarked signal as

$$v = F2(w^n) = F2(s^n + F1(M, s^n)), \qquad (5.9)$$

and transmits it to the receiver. The receiver shares its noisy version of the host signal with the Carrier, and the Carrier reconstructs the watermarked signal

$$\hat{w}^n = G2(v, \hat{s}^n) = G2(F2(s^n + F1(M, s^n)), \hat{s}^n), \qquad (5.10)$$

with a fidelity criterion d(w^n, ŵ^n) ≤ D2. At the final step, the receiver estimates the secret message

$$\hat{M} = G1(\hat{w}^n, \hat{s}^n) = G1(G2(F2(s^n + F1(M, s^n)), \hat{s}^n), \hat{s}^n). \qquad (5.11)$$
5.1.3 Summary of Results
In this chapter, we give the rate distortion function of the Carrier and the capacity formula of the system with continuous-alphabet, Gaussian-distributed state information. Let f(S, X, T) have a multivariate Gaussian distribution ∼ N(0, Σ_{S,X,T}), where the covariance matrix is Σ_{S,X,T} = diag(Q, D1, K). Defining the state information available to the decoders G1 and G2 as Ŝ = S + T and the watermarked signal as W = S + X, Theorem 5.2 states that the minimum rate of the Carrier for a mean distortion level E{d(W, Ŵ)} ≤ D2 is

$$R_S(D_2) = \begin{cases} \dfrac{1}{2}\ln\left(\dfrac{D_1}{D_2} + \dfrac{QK}{(Q+K)D_2}\right), & 0 < D_2 < D_1 + \dfrac{QK}{Q+K}, \\[2mm] 0, & D_2 \ge D_1 + \dfrac{QK}{Q+K}, \end{cases} \qquad (5.12)$$

in nats per channel use. Moreover, according to Theorem 5.3, the capacity of the overall system is given as

$$R_C = \frac{1}{2}\ln\left(1 + \frac{D_1(D_1+Q-D_2)}{D_2(D_1+Q)}\right) \qquad (5.13)$$

in nats per channel use.
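As a quick numerical illustration of Equations 5.12 and 5.13, the sketch below evaluates both expressions; the helper names are ours, and the parameter values anticipate those of the practical design in Chapter-5.4.1.3:

    import numpy as np

    def rate_carrier(D2, D1, Q, K):
        # Rate distortion function R_S(D2) of Equation 5.12, in nats
        if D2 >= D1 + Q * K / (Q + K):
            return 0.0
        return 0.5 * np.log(D1 / D2 + Q * K / ((Q + K) * D2))

    def capacity(D2, D1, Q):
        # Overall capacity R_C of Equation 5.13, in nats
        return 0.5 * np.log(1 + D1 * (D1 + Q - D2) / (D2 * (D1 + Q)))

    D1, Q, K = 0.062, 1.0, 0.2082
    for D2 in (0.03, 0.0586, 0.30):
        print(f"D2 = {D2}: R_S = {rate_carrier(D2, D1, Q, K):.4f} nats, "
              f"R_C = {capacity(D2, D1, Q):.4f} nats")
    # at D2 = 0.0586: R_S ~ 0.693 nats (1 bit), R_C ~ 0.347 nats (1/2 bit)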
Some of our remarks can be found below:

• Remark 1: The rate distortion function R_S(D2) of the Carrier is the same as in the case where the state information Ŝ is accessible at both the compressor (F2) and the de-compressor (G2).

• Remark 2: The overall capacity R_C does not depend on whether Ŝ is accessible to the decoder G1 or not. In return, the accessibility of Ŝ to the de-compressor (G2) affects the capacity R_C indirectly, because R_C depends on D2 (Equation 5.13), and D2 depends on K (Equation 5.12).

• Remark 3: Unlike the capacity term found in Equation 4.8 in page 90, the overall capacity R_C depends on the variance Q of the host signal S.
Finally, a practical coding approach for the Gaussian case is proposed, using the superposition coding of Chapter-3.5 and the LDPC binning method of Chapter-2.4, and a similar coding scheme is given for the Binary Symmetric Case.

The remainder of this chapter is organized as follows. After giving the theoretical background of source-channel coding in Chapter-5.2, Chapter-5.3 focuses on the rate-distortion function of the Carrier and the overall capacity analysis of the system; the proofs of the rate distortion function and the capacity term can be found in that section. The practical code design for the Gaussian case is then given in Chapter-5.4.1, and the practical code design for the binary symmetric case in Chapter-5.4.2.
5.2 Theoretical Background

5.2.1 Channel Coding with Side Information (CCSI)
The capacity of the memoryless channel p(y|x, s, ŝ) with state information (S, Ŝ) i.i.d. p(s, ŝ), all taking values from finite alphabets, with S^n available to the sender and Ŝ^n available to the receiver noncausally (see Figure 5.3), is given in Cover and Chiang (2002) as

$$C = \max_{p(x,u|s)} \left[ I(U; Y, \hat{S}) - I(U; S) \right], \qquad (5.14)$$

where the maximum is over all joint distributions p(u)p(s, ŝ, x|u)p(y|x, s, ŝ), and U is an auxiliary random variable with finite cardinality.

Figure 5.3: Channel coding with two-sided state information scheme.
Moreover, the general capacity expression for the continuous alphabet case has been stated in Moulin and Wang (2007):

$$C = \sup_{p(x,u|\tilde{s})} \min_{p(y|x,s)} \left[ I(U; Y, \hat{S}) - I(U; \tilde{S}) \right], \qquad (5.15)$$

where s̃ is the state information partially available to the encoder. The achievable rate region for the continuous alphabet Gaussian case has been derived in Chapter-4. The reader can refer to Chapter-3.2 for more detailed background on CCSI.
5.2.2 Source Coding with Side Information (SCSI)

The details of the two main theorems concerning SCSI are given in Chapter-2.2. While Slepian and Wolf (1973) derived the minimum achievable rate for lossless compression of a discrete input source, Wyner and Ziv (1976) extended this theory to the lossy case and derived the rate distortion function for the binary symmetric case and the continuous alphabet Gaussian input case.

Figure 5.4: Rate distortion theory with side information at the decoder (Wyner-Ziv setup).
5.3 Contribution 1: Capacity Analysis for Multivariate Gaussian Source

In this section, we derive the capacity of the multivariate Gaussian IDH-DSC communication problem shown in Figure 5.5. An index M ∈ {1, ..., m} will be sent to the receiver in n uses of the channel, where m is the greatest integer smaller than or equal to e^{nR_C}, and R_C is the rate in nats per transmission. Let S = (S1, S2, ..., Sn) be the sequence of noncausal channel states for n transmissions, perfectly known to the encoder and assumed to be a sequence of independent identically distributed (i.i.d.) N(0, Q) random variables. We consider the case where this state sequence is partially known to the decoder, noncausally, as Ŝ = (Ŝ1, Ŝ2, ..., Ŝn), modeled as Ŝ = S + T where T is i.i.d. according to N(0, K). We use the squared error metric as the distortion measure for the Gaussian source. We first evaluate the rate distortion function of the Carrier, R_S(D2), and then find the capacity of the overall system, R_C.
Figure 5.5: Multivariate Gaussian Channel of the IDH-DSC scheme.
5.3.1 Evaluation of the Rate Distortion Function of the Carrier
Consider the communication channel from the Carrier's point of view in Figure 5.6. The noisy version of the host signal, Ŝ = S + T, is available to the encoder when Switch-A is closed, and is not available to the encoder when Switch-A is open. We are interested in the case where Switch-A is open; however, to derive the rate distortion function for this case, the case where the switch is closed is employed. We assume that the joint distribution f(S, X, T) is multivariate Gaussian ∼ N(0, Σ_{S,X,T}) with covariance matrix Σ_{S,X,T} = diag(Q, D1, K).
Definition 5.4 If Switch-A is closed, the rate distortion function R_{W|Ŝ}(D2), for compressing W given a noisy observation Ŝ available both to the encoder and to the decoder with a fidelity criterion d(W, Ŵ) ≤ D2, is defined as

$$R_{W|\hat{S}}(D_2) = \min_{p(\hat{w}|w,s):\; E\{d(w,\hat{w})\}\le D_2} I(W; \hat{W}|\hat{S}). \qquad (5.16)$$
Definition 5.5 If Switch-A is open, the rate distortion function R*_{W|Ŝ}(D2), for compressing W given a noisy observation Ŝ available only to the decoder with a fidelity criterion d(W, Ŵ) ≤ D2 (see Figure 5.6), is:

$$R^*_{W|\hat{S}}(D_2) = \inf_{p(\hat{w}|w,s):\; E\{d(w,\hat{w})\}\le D_2} \left[ I(W; E) - I(\hat{S}; E) \right], \qquad (5.17)$$

where E is an auxiliary variable.
Figure 5.6: Multivariate Gaussian Case: Carrier point of view.

Theorem 5.1 The rate distortion function R_{W|Ŝ}(D2) is:
$$R_{W|\hat{S}}(D_2) = \begin{cases} \dfrac{1}{2}\ln\left(\dfrac{D_1}{D_2} + \dfrac{QK}{(Q+K)D_2}\right), & 0 < D_2 < D_1 + \dfrac{QK}{Q+K}, \\[2mm] 0, & D_2 \ge D_1 + \dfrac{QK}{Q+K}. \end{cases} \qquad (5.18)$$
Proof: We first find a lower bound for the rate distortion function, and then prove that it is achievable. Since E{d(w, ŵ)} ≤ D2, we observe
$$\begin{aligned}
I(W; \hat{W}|\hat{S}) &= h(W|\hat{S}) - h(W|\hat{S}, \hat{W}) \\
&= h(W, \hat{S}) - h(\hat{S}) - h(W - \hat{W}|\hat{S}, \hat{W}) \\
&\ge h(W, \hat{S}) - h(\hat{S}) - h(W - \hat{W}) \qquad (5.19) \\
&\ge h(W, \hat{S}) - h(\hat{S}) - h(N(0, Ed(W, \hat{W}))) \qquad (5.20) \\
&= h(W, \hat{S}) - \frac{1}{2}\ln((2\pi e)(Q+K)) - \frac{1}{2}\ln((2\pi e)D_2) \\
&= \frac{1}{2}\ln\left((2\pi e)^2\left((Q+D_1)(Q+K) - Q^2\right)\right) - \frac{1}{2}\ln\left((2\pi e)^2(Q+K)D_2\right) \qquad (5.21) \\
&= \frac{1}{2}\ln\left(\frac{D_1(Q+K) + QK}{(Q+K)D_2}\right) = \frac{1}{2}\ln\left(\frac{D_1}{D_2} + \frac{QK}{(Q+K)D_2}\right), \qquad (5.22)
\end{aligned}$$
where h is the differential entropy defined in Chapter-1.2. Please note that Equation 5.19 follows from the fact that conditioning reduces entropy, Equation 5.20 follows from the fact that the Gaussian distribution maximizes the entropy for a given variance, and Equation 5.21 follows from the fact that the joint probability p(w, ŝ) is a multivariate Gaussian distribution² with mean 0 and covariance matrix

$$\Sigma_{(w,\hat{s})} = \begin{pmatrix} Q+D_1 & Q \\ Q & Q+K \end{pmatrix}. \qquad (5.23)$$
Hence

$$R_{W|\hat{S}}(D_2) \ge \frac{1}{2}\ln\left(\frac{D_1}{D_2} + \frac{QK}{(Q+K)D_2}\right) \text{ in nats,} \qquad (5.24)$$

or

$$R_{W|\hat{S}}(D_2) \ge \frac{1}{2}\log_2\left(\frac{D_1}{D_2} + \frac{QK}{(Q+K)D_2}\right) \text{ in bits.} \qquad (5.25)$$
To find the conditional density f(ŵ|w) that achieves this lower bound, it is more convenient to look at the test channel (the conditional density f(w|ŵ)) and construct f(w|ŵ) so as to achieve equality in the bound. We choose the joint distribution as shown in Figure 5.7. If D2 ≤ max{0, min{Q + D1, D1 + QK/(Q+K)}}, we choose

$$W = \hat{W} + B, \qquad \hat{W} \sim N(0, Q+D_1-D_2), \qquad B \sim N(0, D_2). \qquad (5.26)$$

For greater values of D2, if Q + D1 < D1 + QK/(Q+K) we choose Ŵ = 0 with probability 1, achieving R(D2) = 0, and if Q + D1 ≥ D1 + QK/(Q+K) we choose Ŵ = Ŝ with probability 1, achieving R(D2) = 0. This completes the proof.
Figure 5.7: Gaussian test channel that achieves the lower bound found in Equation 5.21. Input: Ŵ ∼ N(0, Q + D1 − D2); output: W ∼ N(0, Q + D1).

² See Appendix A.3 for the formula of the joint differential entropy of multivariate Gaussian distributed random variables.
Please note that the equivalent of this test channel can be constructed with W as the input and Ŵ as the output of the channel, using an addition and a multiplication operation (see Figure 5.8). The equivalent channel outputs Ŵ = (W + Z) · a, where Z is i.i.d. N(0, D2(D1+Q)/(D1+Q−D2)) and a is a constant multiplier defined as a = (D1+Q−D2)/(D1+Q). Then Ŵ has a Gaussian distribution with mean 0 and variance

$$\sigma^2_{\hat{W}} = a^2\left((D_1+Q) + \frac{D_2(D_1+Q)}{D_1+Q-D_2}\right) = D_1 + Q - D_2. \qquad (5.27)$$
We will use this equivalent channel in our capacity calculations of the overall system.
Figure 5.8: Equivalent setup of the test channel in Figure-5.7 by using an addition
and a multiplication operator.
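A short Monte Carlo check (with illustrative values) confirms this equivalence: the sample variance of Ŵ matches Equation 5.27, and the achieved distortion E[(W − Ŵ)²] equals D2:

    import numpy as np

    rng = np.random.default_rng(0)
    D1, Q, D2, n = 0.062, 1.0, 0.0586, 1_000_000
    N_tilde = D2 * (D1 + Q) / (D1 + Q - D2)     # variance of the additive noise Z
    a = (D1 + Q - D2) / (D1 + Q)                # multiplicative constant
    W = rng.normal(0.0, np.sqrt(D1 + Q), n)     # W = S + X has variance Q + D1
    Z = rng.normal(0.0, np.sqrt(N_tilde), n)
    W_hat = a * (W + Z)
    print(W_hat.var(), D1 + Q - D2)             # output variance, Equation 5.27
    print(((W - W_hat)**2).mean(), D2)          # achieved distortion ~ D2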
Theorem 5.2 For the independent multivariate Gaussian case, the rate distortion function R*_{W|Ŝ}(D2) has the value

$$R^*_{W|\hat{S}}(D_2) = R_{W|\hat{S}}(D_2) = \begin{cases} \dfrac{1}{2}\ln\left(\dfrac{D_1}{D_2} + \dfrac{QK}{(Q+K)D_2}\right), & 0 < D_2 < D_1 + \dfrac{QK}{Q+K}, \\[2mm] 0, & D_2 \ge D_1 + \dfrac{QK}{Q+K}. \end{cases} \qquad (5.28)$$
Proof: We give a proof similar to those in Wyner and Ziv (1976) and Oohama (1997). Let Y and E be conditionally independent given X; then the term I(W; E) − I(Ŝ; E) in Equation 5.17 satisfies

$$\begin{aligned} I(W; E) - I(\hat{S}; E) &= h(E|\hat{S}) - h(E|\hat{W}) \\ &= h(E|\hat{S}) - h(E|\hat{W}, \hat{S}) \\ &= I(W; E|\hat{S}) \qquad (5.29) \\ &\ge I(W; \hat{W}|\hat{S}), \qquad (5.30) \end{aligned}$$

where Equation 5.29 follows from the assumption that Y and E are conditionally independent given X, and Equation 5.30 follows from the data processing inequality. The equality in Equation 5.30 holds if and only if
$$h(W, Z|\hat{W}, \hat{S}) = 0. \qquad (5.31)$$

For the independent Gaussian variables X, T and W in Figure 5.5, the equation h(W, Z|Ŵ, Ŝ) = 0 holds, and there is no rate loss with respect to the case where Switch-A is closed. Hence R*_{W|Ŝ}(D2) = R_{W|Ŝ}(D2), which equals the value in Equation 5.28. □
5.3.2 Capacity of the channel
In this section, we derive the achievable communication rate between Alice and Bob. With our findings on the rate-distortion function of the Carrier in the previous section, the overall system can be sketched as in Figure 5.9 by replacing the Carrier step with its equivalent channel setup given in Figure 5.8.

The setup in Figure 5.9 is closely related to Case B of "Dirty Paper Coding with Partial State Information" (Chapter-4). The two differences between Figure 4.1 in page 86 and Figure 5.9 are: i) the absence of the random variable θ in Figure 5.9, and ii) a multiplication element added to the output of the channel in Figure 5.9, such that it outputs Ŵ = a · (X + S + Z) while the setup in Figure 4.1 outputs Y = X + S + Z. We follow the same methodology as in Chapter-4.3 in order to find the achievable rate region.

Figure 5.9: Equivalent Scheme of the Gaussian Channel.
Theorem 5.3 The capacity R_C of the communication system given in Figure 5.9 is

$$R_C = \frac{1}{2}\ln\left(1 + \frac{D_1(D_1+Q-D_2)}{D_2(D_1+Q)}\right). \qquad (5.32)$$
Proof: Let X, S, Z and T be i.i.d. random variables with respective Gaussian distributions N(0, D1), N(0, Q), N(0, D2(D1+Q)/(D1+Q−D2)) and N(0, K). We denote the variance of the r.v. Z by Ñ = D2(D1+Q)/(D1+Q−D2) and the multiplication constant by a = (D1+Q−D2)/(D1+Q). Then the joint distribution f(X, S, Z, T) is multivariate Gaussian ∼ N(0, Σ_{X,S,Z,T}) with covariance matrix Σ_{X,S,Z,T} = diag(D1, Q, Ñ, K).
The channel outputs Ŵ = a · (X + S + Z). Assuming U = X + αS, where α is a constant to be determined, the joint distribution f(U, Ŵ, Ŝ) is then multivariate Gaussian with mean 0 and covariance matrix Σ_{U,Ŵ,Ŝ} = B Σ_{X,S,Z,T} B^t, where B is the matrix that satisfies the equation

$$\begin{pmatrix} U \\ \hat{W} \\ \hat{S} \end{pmatrix} = B \cdot \begin{pmatrix} X \\ S \\ Z \\ T \end{pmatrix}. \qquad (5.33)$$

The solution for the matrix B,

$$B = \begin{pmatrix} 1 & \alpha & 0 & 0 \\ a & a & a & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}, \qquad (5.34)$$
yields the covariance matrix

$$\Sigma_{U,\hat{W},\hat{S}} = \begin{pmatrix} D_1+\alpha^2 Q & a(D_1+\alpha Q) & \alpha Q \\ a(D_1+\alpha Q) & a^2(D_1+Q+\tilde{N}) & aQ \\ \alpha Q & aQ & Q+K \end{pmatrix}. \qquad (5.35)$$
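This covariance identity is easy to verify numerically; a minimal sketch with hypothetical values of D1, Q, D2, K and α:

    import numpy as np

    D1, Q, D2, K, alpha = 0.062, 1.0, 0.0586, 0.2082, 0.4
    N_tilde = D2 * (D1 + Q) / (D1 + Q - D2)
    a = (D1 + Q - D2) / (D1 + Q)
    Sigma = np.diag([D1, Q, N_tilde, K])        # covariance of (X, S, Z, T)
    B = np.array([[1.0, alpha, 0.0, 0.0],       # U     = X + alpha*S
                  [a,   a,     a,   0.0],       # W_hat = a*(X + S + Z)
                  [0.0, 1.0,   0.0, 1.0]])      # S_hat = S + T
    print(B @ Sigma @ B.T)                      # entry by entry, Equation 5.35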
Then the relevant mutual informations can be calculated to yield

$$\begin{aligned} I(U; \hat{W}, \hat{S}) &= h(U) + h(\hat{W}, \hat{S}) - h(U, \hat{W}, \hat{S}) \\ &= h(X+\alpha S) + h\left(a(X+S+Z),\; S+T\right) - h(U, \hat{W}, \hat{S}) \\ &= \frac{1}{2}\ln\left((2\pi e)(D_1+\alpha^2 Q)\right) + \frac{1}{2}\ln\left((2\pi e)^2 a^2\left((D_1+Q+\tilde{N})(Q+K) - Q^2\right)\right) \qquad (5.36) \\ &\quad - \frac{1}{2}\ln\left((2\pi e)^3 a^2\left(D_1 QK(1-\alpha)^2 + \tilde{N}K(D_1+\alpha^2 Q) + D_1\tilde{N}Q\right)\right) \qquad (5.37) \end{aligned}$$
and similarly

$$I(U; S) = h(U) + h(S) - h(U, S) = \frac{1}{2}\ln\left(\frac{D_1 + \alpha^2 Q}{D_1}\right). \qquad (5.38)$$
5.4. Contribution 2: Practical Code Design
Then if the term I(U ; Ŵ , Ŝ) − I(U ; S) can be given as a function of α as
!
1
D1 (K(D1 + Q + Ñ ) + Q(D1 + Ñ ))
R(α) = ln
.
(5.39)
2
D1 QK(1 − α)2 + Ñ K(D1 + α2 Q) + D1 Ñ Q
Equation 5.39 has the same form as Equation 4.8 in page 90. In the same way, if Equation 5.39 is maximized with respect to α, the maximum achievable rate is found to be

$$R(\alpha^\diamond) = \frac{1}{2}\ln\left(1 + \frac{D_1}{\tilde{N}}\right) = \frac{1}{2}\ln\left(1 + \frac{D_1(D_1+Q-D_2)}{D_2(D_1+Q)}\right) \qquad (5.40)$$

for α⋄ = D1/(D1 + Ñ). Please note that the maximum achievable rate does not depend on the correlation noise K. Since the achievable rate cannot exceed the capacity of the case where the state information S is perfectly available both to the encoder and to the decoder, which is equal to R(α⋄), the capacity of this channel is

$$R_C = R(\alpha^\diamond) = \frac{1}{2}\ln\left(1 + \frac{D_1(D_1+Q-D_2)}{D_2(D_1+Q)}\right). \qquad (5.41)$$
This completes the proof. □

5.4 Contribution 2: Practical Code Design
In the following two sections, we give practical code designs of a hybrid scheme which utilizes both channel coding and rate distortion with state information at the encoder and decoder, for the Data Hiding and Distributed Source Coding problem introduced in the previous sections. The first design is intended to evaluate the theoretical limits calculated for the Gaussian side information case in Chapter-5.3: for the embedding part, the superposition data hiding code explained in Chapter-3.5 is applied, while for the source coding part we use the DSC coding mechanism explained in Chapter-2.4. The second practical design is for the side information with discrete alphabet case.
5.4.1 Practical Code Design for the Multivariate Gaussian Case
The theoretical rate distortion function of the Carrier and the overall capacity limits of the communication system given in Figure 5.5 for the Gaussian case were calculated as Equation 5.28 and Equation 5.32 in Chapter-5.3.

In this section, we propose a hybrid scheme for the Gaussian case which utilizes both channel coding and rate distortion with state information at the encoder and decoder respectively. Briefly, Alice has an n-length host vector s, each element of which is i.i.d. with probability distribution ∼ N(0, Q). Bob has a noisy version of this host vector, ŝ = s + t, where each element of t is i.i.d. with probability distribution ∼ N(0, K). At the decoding end, Bob shares this noisy version with the Carrier. Alice embeds an n/2-bit message M within s (which corresponds to an embedding rate of R_C = 1/2 bit per channel use) such that the watermarked signal w satisfies the fidelity criterion (1/n)∑_{i=1}^{n}(w_i − s_i)² ≤ D1. The Carrier then compresses the vector w at R_S = 1 bit/channel use and decompresses it at the decoder side as ŵ, using the noisy version ŝ shared by Bob, such that the MSE distortion level satisfies (1/n)∑_{i=1}^{n}(w_i − ŵ_i)² ≤ D2. In the final stage, Bob extracts the hidden message M̂ with the help of ŵ and ŝ. The decoding error probability can be calculated as

$$P_e = \frac{\sum_{i=1}^{n/2}\left(M_i \oplus \hat{M}_i\right)}{n/2}, \qquad (5.42)$$

where ∑ denotes summation over the reals while ⊕ is modulo-2 summation. Up to this point, we have only fixed the embedding rate R_C at 1/2 bit per channel use and the compression rate R_S at 1 bit per channel use. The details of each block are given below.
5.4.1.1 Data Hiding Coder-Decoder Pair (F1 − G1)
For the F1 − G1 pair for Alice and Bob, we use the superposition embedding described in Chapter-3.5. F1 is composed of an LDPC coder and a TCQ coder. A rate-1/2 LDPC code C1 modulates the hidden message M with a variance of α⋄D1, as described in Chapter-3.5.3, where α⋄ = D1/(D1 + D2(D1+Q)/(D1+Q−D2)) is the constant that maximizes Equation 5.40. Then the quantization code C0 finds the embedding error signal x, which has variance D1. Finally, F1 outputs the watermarked signal w = s + x. The decoder G1 receives the noisy observation ŵ from the Carrier and accesses the noisy state information ŝ, then extracts the message M̂ using a joint LDPC-BCJR decoding algorithm as explained in Chapter-3.5.6.
According to the performance of the data hiding system explained in Chapter-3.5, for Q = 1 the data can be embedded with an embedding noise variance D1 = 0.062. For an embedding rate of 1/2 bit per channel use, the hidden message can be decoded even after an AWGN noise which is 1.5 dB away from the theoretical AWGN noise level.
5.4.1.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2)
We now explain our code design for the Carrier's F2 − G2 pair. For F2, a 4-level Lloyd-Max quantizer is used to quantize w into a 2-bit-per-sample vector w_q. The Carrier then codes these 2n quantized bits with a rate-2/3 LDPC code as explained in Chapter-2.4, and only the n-bit parity vector z is transmitted to Bob. At the decoder end G2, with the help of the noisy state information ŝ shared by Bob, the Carrier applies an iterative belief propagation decoding process (see Chapter-2.4.3 for details).
5.4.1.3 Theoretical Limits and Performance Analysis of the Proposed System
The theoretical limits of the rate-distortion and channel capacity were calculated as Equation 5.12 and Equation 5.13. Let us fix the embedding capacity R_C at 1/2 bit/channel use, the bit-rate of the Carrier R_S at 1 bit/channel use, the embedding power D1 at 0.062 and the variance of the host signal Q at 1. The theoretical D2 value needed to achieve this capacity can be found by evaluating Equation 5.13:

$$\frac{1}{2} = \frac{1}{2}\log_2\left(1 + \frac{0.062\,(0.062 + 1 - D_2)}{D_2\,(0.062 + 1)}\right), \qquad (5.43)$$

which yields D2 = 0.0586. If we substitute this theoretical D2 value into the rate distortion function, Equation 5.12, to find the corresponding K value that achieves a rate of 1 bit/channel use, we end up with

$$1 = \frac{1}{2}\log_2\left(\frac{0.062}{0.0586} + \frac{K}{(1+K)\,0.0586}\right), \qquad (5.44)$$

which corresponds to K = 0.2082. In our system the embedding process can be perfectly reconstructed up to an MSE level D2 = 0.0422, which corresponds to a gap of

$$10\log_{10}\left(\frac{0.0586}{0.0422}\right) = 1.43 \text{ dB} \qquad (5.45)$$

from the theoretical setup.
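The values D2 = 0.0586 and K = 0.2082 follow directly from Equations 5.43 and 5.44; a small sketch that recovers them with a root finder (assuming scipy is available):

    import numpy as np
    from scipy.optimize import brentq

    D1, Q = 0.062, 1.0

    # D2 such that R_C = 1/2 bit per channel use (Equation 5.43)
    f = lambda D2: 0.5 * np.log2(1 + D1 * (D1 + Q - D2) / (D2 * (D1 + Q))) - 0.5
    D2 = brentq(f, 1e-6, D1 + Q - 1e-6)
    print(D2)                                   # ~0.0586

    # K such that R_S(D2) = 1 bit per channel use (Equation 5.44)
    g = lambda K: 0.5 * np.log2(D1 / D2 + Q * K / ((Q + K) * D2)) - 1.0
    K = brentq(g, 1e-6, 100.0)
    print(K)                                    # ~0.2082

    print(10 * np.log10(0.0586 / 0.0422))       # gap of Equation 5.45, ~1.43 dB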
5.4.2 Practical Code Design for Discrete Case
In this section, we develop a toy example of the combined IDH-DSC setup for the Binary Symmetric Case. A simple embedding process is followed by DSC coding based on LDPC binning. The aim is to achieve a low embedding rate with a fidelity criterion based on Hamming distance. The watermarked signal is then compressed using Slepian-Wolf coding.
5.4.2.1 Data Hiding Coder-Decoder Pair (F1 − G1)
For the informed data hiding of M within S, we use basic quantization based on a memoryless coset construction. The algorithm is described as follows. The 3-bit strings are partitioned into 4 cosets such that the two elements of each coset are at Hamming distance 3 from each other. According to the two data bits of M, the members of the corresponding coset are chosen: Coset 00 = {000, 111}, Coset 01 = {001, 110}, Coset 10 = {010, 101}, Coset 11 = {011, 100}. After creating the codebook, a 2-bit chunk of M and an R-bit chunk of S are taken, and the least significant 3 bits of the sub-block of the host signal S are selected for embedding. The 3-bit value of S is quantized to W: W(S, M) = arg min_{Z ∈ Coset M} ‖Z − S‖, so that W differs from S in at most one bit. The distance metric is the Hamming distance. This insertion of 2 bits within each block of length R continues until all the data is embedded. As an example, assume that the 2-bit message 01 is being embedded into the least significant 3 bits of S, which are 010. The element of Coset 01 with minimum Hamming distance to 010 is chosen as the quantization output, which is W = 110 in this case. At the decoder side, the extraction of the watermark is straightforward: knowing the codebook and the insertion frequency R, the coset index in which the received block resides is decoded as the embedded data.
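This coset construction is compact enough to be written out in full. The toy Python sketch below (the function names are ours) reproduces the example of the text:

    # The four cosets; the two members of each coset are at Hamming distance 3.
    COSETS = {
        (0, 0): [(0, 0, 0), (1, 1, 1)],
        (0, 1): [(0, 0, 1), (1, 1, 0)],
        (1, 0): [(0, 1, 0), (1, 0, 1)],
        (1, 1): [(0, 1, 1), (1, 0, 0)],
    }

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def embed(m2, s3):
        # Quantize the 3 host bits s3 to the member of coset m2 closest to s3;
        # at most one host bit is modified.
        return min(COSETS[m2], key=lambda c: hamming(c, s3))

    def extract(w3):
        # The embedded bits are the index of the coset in which w3 resides.
        return next(m for m, coset in COSETS.items() if w3 in coset)

    w = embed((0, 1), (0, 1, 0))
    print(w, "->", extract(w))    # (1, 1, 0) -> (0, 1), as in the example above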
5.4.2.2 Distributed Source Coding Coder-Decoder Pair (F2 − G2)
For the F2 − G2 pair, we use syndrome coding with LDPC codes. The Carrier codes the watermarked signal bits W using a rate-2/3 LDPC code as explained in Chapter-2.4, and only the parity vector z is transmitted to Bob. At the decoder end G2, with the help of the noisy state information ŝ shared by Bob, the Carrier applies an iterative belief propagation decoding process (see Chapter-2.4.3 for details).
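The actual design relies on LDPC codes and belief propagation (Chapter-2.4); purely as a self-contained illustration of the underlying binning principle, the sketch below substitutes the small (7,4) Hamming code, for which syndrome decoding is exact for one bit of correlation noise and reduces to a lookup:

    import numpy as np

    # Parity-check matrix of the (7,4) Hamming code: column j is the binary
    # representation of j+1, so a single bit error is located by its syndrome.
    H = np.array([[1, 0, 1, 0, 1, 0, 1],
                  [0, 1, 1, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

    def compress(w):
        return H @ w % 2                        # 3-bit syndrome: 7:3 compression

    def decompress(z, s_hat):
        e = (z + H @ s_hat) % 2                 # syndrome of the noise w xor s_hat
        w = s_hat.copy()
        if e.any():
            j = int(e @ np.array([1, 2, 4])) - 1
            w[j] ^= 1                           # flip the single noisy position
        return w

    w = np.array([1, 0, 1, 1, 0, 1, 0], dtype=np.uint8)
    s_hat = w.copy(); s_hat[4] ^= 1             # one bit of correlation noise
    print(decompress(compress(w), s_hat))       # recovers w exactly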
5.4.2.3 Experimental Setup
In our experiments, we fix R = 20 and embed a 50-bit message M into a 4000-bit signal S which is distributed Bernoulli(1/2). Then, using the rate-2/3 LDPC binning scheme explained in Chapter-2.4, W is compressed to a 2000-bit signal and transmitted to the decoder. The decoder performs a modified belief propagation decoding using the parity bits of W and the side information Ŝ = S ⊕ T, where T is a binary string with Bernoulli(p1) distribution. The performance of the system for a block length of 4000 is compared with the performance of the DSC system without any embedding, explained in Chapter-2.4.4. In Figure 5.10, the decoding bit error rate of the LDPC decoder is plotted versus the entropy of the correlation noise H(p1). The dashed curve corresponds to the case where there is no embedding into S, while the other corresponds to the compression performance of W after a 1/20 bit per sample embedding rate. The embedding process has a performance loss of 0.02 bit per sample compared with the no-embedding case, which is acceptable.
5.5 Conclusion
In this chapter, both theoretical and practical analyses of the combined IDH and DSC system have been carried out. On the theoretical side, strong information-theoretic results were obtained, namely the derivation of the rate distortion function for the non-trusted Carrier and the capacity formula of the overall embedding system. We also drew interesting remarks from these theoretical findings: the absence of Bob's noisy state information at the Carrier's encoding stage does not change the rate distortion curve, and similarly, the absence of the original host signal at Bob's side does not change the capacity of the embedding system. Moreover, practical code designs for the Gaussian and BSC cases were proposed with the help of our proposed DSC method of Chapter-2 and IDH method of Chapter-3.

Figure 5.10: Embedding performance for 1/200 bit per sample with a 2:1 compression of the watermarked string, using a rate-2/3 LDPC code with block length 4000. Minimum 0.02 bit per sample entropy rate loss with respect to the no-embedding case.
Conclusion
Strongly motivated by the duality between source coding and channel coding with state information, we set out to propose a system that combines data hiding and efficient compression functionalities, to study the theoretical limits of the proposed system, and to evaluate the proposed practical designs against these limits. This subject intersects a wide range of signal processing fundamentals, such as error correcting codes, vector quantization, likelihood marginalization and iterative decoding, while the analysis of the system limits is strongly related to information theory.
The contributions of this dissertation can be grouped as theoretical findings and
practical code designs.
Information Theoretical Contributions
In this dissertation, theoretical rate distortion functions and embedding capacity bounds are derived for the infinite-alphabet Gaussian case. Our theoretical contributions can be itemized as follows:
1. The maximum achievable rate of the communication system in Figure 4.1 in page 86, where the channel state information S is partially available to the encoder as S1 and partially available to the decoder as S2, is derived in Chapter-4. This general setup is reduced to simpler cases and each case is analyzed in detail.

2. The capacity of the communication system in Figure 5.9 in page 113 is evaluated, where the state information S is perfectly available to the encoder and partially available to the decoder as Ŝ, and the channel outputs the compressed signal Ŵ.

3. The rate distortion function of the communication system in Figure 5.6 in page 110 is derived, where the compression of S + X is done while a noisy version S + T is accessible to the decoder.
Table 5.2: Channel Coding with State Information Problems

Problem | Encoder state S1 | Channel state Sa | Decoder state S2 | Type of source
Gel'fand and Pinsker (1980) | S | S | - | Discrete
Costa (1983) | S | S | - | Gaussian
Cover and Chiang (2002) | S1 | S | S2 | Discrete
Dikici et al., Chapter-4 (General Case) | S1 | S | S2 | Gaussian
Dikici et al., Chapter-4 (Case-B) | S | S | S2 | Gaussian
Public Watermarking | S | - | - | Discrete or Gaussian
Table 5.2 briefly lists the existing theoretical studies in the field of channel coding with side information and compares them with our theoretical contributions in this area. The problems are defined by the channel state Sa and its availability to the encoder and to the decoder, while the type of the state can be drawn from discrete or continuous alphabet sets. Rows four and five correspond to our information-theoretic contributions no. 1 and no. 2 respectively.

Similarly, Table 5.3 positions our contribution no. 3 with respect to the source coding with side information problems. The encoder input, the decoder's side information and the type of sources investigated in each problem are given.
Table 5.3: Source Coding with State Information Problems

Problem | Encoder access | Decoder access | Type of source
Slepian and Wolf (1973) | S | S + T | Lossless rate, Discrete Case
Wyner and Ziv (1976) | S | S + T | R(D) function, BSC and Gaussian
Dikici et al. (Chapter-5) | S + X | S + T | R(D) function, Gaussian
Proposed Practical Code Designs
Our proposed practical code designs can be grouped into two categories: distributed source coding and data hiding.
In DSC, we proposed a Slepian-Wolf coder based on LDPC binning which has a performance gap of 0.08 bits per channel use with respect to the maximum correlation noise variance for 2:1 rate compression. Moreover, this coding method was applied to an image compression system in which the low-pass DWT coefficients are assumed to be known to the decoder as side information.
In data hiding, we proposed a low-embedding-rate robust image watermarking scheme based on the system of Miller et al (2004), using the DWT coefficients of the image and perceptual shaping for the embedding process. Furthermore, a high-embedding-rate system was proposed by concatenating a good source code based on TCQ with a good channel code based on LDPC. The system operates at an AWGN variance 1.5 dB away from the theoretical limits for an embedding rate of 1/2 bit per channel use.
Finally, the combination of our Slepian-Wolf coding scheme and our superposition data hiding scheme was used to evaluate the theoretical findings of Chapter-5.
Perspectives
The perspectives can be grouped into an application point of view and a theoretical point of view.

From the application point of view, several improvements on the proposed schemes can be achieved. For instance, the Lloyd-Max quantizer used in the Slepian-Wolf coder of Chapter-5.4.1.2 could be replaced by a more effective quantization code. Moreover, in the high-embedding-rate practical design of Chapter-3.5.2.2, the 2-level PAM coding of the channel code C1 could be done more efficiently by also considering the side information available to the encoder. Furthermore, a practical code design for the general case given in Chapter-4 could be proposed using a modified version of the informed high-embedding-rate code of Chapter-3.5.

One practical application based on the schemes proposed in this dissertation could be the transmission of a high resolution image or video given that a coarse version is publicly and freely available; the second stream enhances the coarse version if the receiver has purchased the key embedded in the second stream. Another application could be the embedding of meta-data into images for indexing purposes.
Finally, from the theoretical point of view, the two main directions that we will continue to study are:

• Our information-theoretic contributions can be extended to the case where the state information is not i.i.d. but drawn from a Gauss-Markov source. By doing so, more realistic theoretical limits for image and video signals can be found.

• The theoretical setup of communicating via a non-trusted Carrier in Chapter-5 can be extended to the encrypted domain, such that Alice transmits her signal to Bob using encryption and the Carrier tries to compress the encrypted-domain signal subject to a fidelity criterion.
Appendix A

Achievable Rate Region Calculations for Two Partial Side Informations Known to the Encoder and Decoder Respectively
A.1 Derivation of the Achievable Rate Region
Recalling that Y = X + S + Z and S2 = S + T, the joint distribution of U, Y, S2 is a multivariate Gaussian distribution, f(U, Y, S2) ∼ N(0, BΣB^t), where

$$\begin{pmatrix} U \\ Y \\ S_2 \end{pmatrix} = B \cdot \begin{pmatrix} X \\ S \\ Z \\ \theta \\ T \end{pmatrix} \qquad (A.1)$$

and where

$$B = \begin{pmatrix} I & \alpha I & 0 & \alpha I & 0 \\ I & I & I & 0 & 0 \\ 0 & I & 0 & 0 & I \end{pmatrix}. \qquad (A.2)$$

Then,

$$B \cdot \Sigma \cdot B^t = \begin{pmatrix} (P+\alpha^2(Q+L))I & (P+\alpha Q)I & \alpha Q I \\ (P+\alpha Q)I & (P+Q+N)I & QI \\ \alpha Q I & QI & (Q+K)I \end{pmatrix}. \qquad (A.3)$$
Hence, the joint entropy¹ of the random variables (U, Y, S2) is

$$h(U, Y, S_2) = \frac{1}{2}\ln\left((2\pi e)^3 \left|B\Sigma B^t\right|\right). \qquad (A.4)$$
The relevant mutual informations can be calculated to yield

$$\begin{aligned} I(U; Y, S_2) &= h(U) + h(Y, S_2) - h(U, Y, S_2) \\ &= h(X+\alpha S+\alpha\theta) + h(X+S+Z,\; S+T) - h(U, Y, S_2) \\ &= \frac{1}{2}\ln\left((2\pi e)(P+\alpha^2(Q+L))\right) + \frac{1}{2}\ln\left((2\pi e)^2\left((P+Q+N)(Q+K)-Q^2\right)\right) \\ &\quad - \frac{1}{2}\ln\left((2\pi e)^3\left(PQK(1-\alpha)^2 + NK(P+\alpha^2(Q+L)) + \alpha^2 L(PQ+PK+QK+NQ) + PNQ\right)\right) \qquad (A.5) \end{aligned}$$
and similarly

$$I(U; S_1) = h(U) + h(S+\theta) - h(U, S+\theta) = \frac{1}{2}\ln\left(\frac{P+\alpha^2(Q+L)}{P}\right). \qquad (A.6)$$

A.2 Maximization of the Rate
The rate function in Equation 4.5 has the form

$$R(\alpha) = \frac{1}{2}\ln\left(\frac{D}{A\alpha^2 + B\alpha + C}\right), \qquad (A.7)$$

where A, B, C and D are constants depending on the values of P, Q, K and L. The denominator of the ln term is a quadratic polynomial and is minimized when α = −B/2A. Then, the maximum of R(α) with respect to α has the form

$$R(-B/2A) = \frac{1}{2}\ln\left(\frac{4AD}{4AC - B^2}\right). \qquad (A.8)$$

Since the term D is expressed as C + [...], the rate can be written as

$$R(-B/2A) = \frac{1}{2}\ln\left(\frac{4A(C+[...]) - B^2 + B^2}{4AC - B^2}\right) = \frac{1}{2}\ln\left(1 + \frac{4A[...] + B^2}{4AC - B^2}\right), \qquad (A.9)$$

and then it is straightforward to obtain the rate by replacing A, B, C and [...] by their values.
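A two-line symbolic check of this maximization, assuming the sympy package:

    import sympy as sp

    alpha, A, B, C, D = sp.symbols('alpha A B C D', real=True)
    den = A * alpha**2 + B * alpha + C
    alpha_star = sp.solve(sp.diff(den, alpha), alpha)[0]
    print(alpha_star)                                  # -B/(2*A)
    print(sp.simplify(den.subs(alpha, alpha_star)))    # C - B**2/(4*A)
    R = sp.Rational(1, 2) * sp.log(D / den)
    print(sp.simplify(R.subs(alpha, alpha_star)))      # Equation A.8, rearranged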
¹ See Appendix A.3.

A.3 Entropy of Multivariate Gaussian Distribution
It is well known that if X has a multivariate Gaussian distribution, X ∼ N(µ, Σ), with mean µ and covariance matrix Σ, then

$$f_X(x_1, x_2, ..., x_n) = \frac{1}{(2\pi)^{n/2}\left|\Sigma\right|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right), \qquad (A.10)$$

where |Σ| is the determinant of the covariance matrix. The joint entropy of f is

$$h(f) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x)\ln(f(x))\, dx = \frac{1}{2}\left(n + n\ln(2\pi) + \ln\left|\Sigma\right|\right) = \frac{1}{2}\ln\left((2\pi e)^n\left|\Sigma\right|\right). \qquad (A.11)$$
Moreover, if Y is a linear transformation of X such that Y = BX, then Y also has a multivariate Gaussian distribution, Y ∼ N(Bµ, BΣB^t).
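Equation A.11 can be cross-checked numerically against scipy's built-in entropy for an arbitrary (illustrative) covariance matrix:

    import numpy as np
    from scipy.stats import multivariate_normal

    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])
    n = Sigma.shape[0]
    h_closed = 0.5 * np.log((2 * np.pi * np.e)**n * np.linalg.det(Sigma))
    print(h_closed)                                    # Equation A.11, in nats
    print(multivariate_normal(cov=Sigma).entropy())    # same value from scipy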
Appendix B

Codes and Degree Distributions for Generating LDPC Matrices

The LDPC degree distributions used for Distributed Source Coding in Chapter-2 and Informed Data Hiding in Chapter-3 are given below.
B.1 Degree Distributions of the rate-2/3 code, for 2:1 compression rate in DSC
• Regular code: λ(x) and ρ(x) are given as

$$\lambda(x) = x^2, \qquad (B.1)$$

and

$$\rho(x) = x^5. \qquad (B.2)$$

• Irregular code: λ(x) and ρ(x) are given as

$$\begin{aligned} \lambda(x) = &\; 0.41584493083218\,x + 0.32456702571975\,x^2 + 0.17761981591744\,x^6 \\ &+ 0.0025725519244473\,x^8 + 0.0046654731946759\,x^{18} \\ &+ 0.039272974694212\,x^{20} + 0.015612811744969\,x^{21} \\ &+ 0.0017256946022807\,x^{26} + 0.01811872137005\,x^{99}, \end{aligned} \qquad (B.3)$$

and

$$\rho(x) = 0.80851063829787\,x^{17} + 0.19148936170213\,x^{18}. \qquad (B.4)$$
B.2 Degree Distribution of the rate-1/2 code, for Informed Data Hiding
λ(x) and ρ(x) are given as

$$\lambda(x) = 0.4811081282955\,x + 0.31433341715558\,x^2 + 0.15356804095148\,x^6 + 0.050990413597444\,x^{19}, \qquad (B.5)$$

and

$$\rho(x) = x^7. \qquad (B.6)$$
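Assuming these polynomials follow the standard edge-perspective convention (λ_i, the coefficient of x^{i−1}, is the fraction of edges incident to degree-i variable nodes), the node-perspective degree fractions and the mean variable-node degree of the rate-1/2 code can be recovered as follows:

    # Edge-perspective distribution of the rate-1/2 code (Equation B.5):
    # pairs (variable-node degree i, coefficient lambda_i of x^(i-1)).
    lam = [(2, 0.4811081282955), (3, 0.31433341715558),
           (7, 0.15356804095148), (20, 0.050990413597444)]

    total = sum(c / d for d, c in lam)
    node_fractions = {d: (c / d) / total for d, c in lam}
    print(node_fractions)                 # fraction of variable nodes per degree
    print("mean variable-node degree:", 1 / total)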
Appendix C
Publications of the author
Publications Related to the Thesis
In Preparation
• «Dirty Paper Coding with Partial State Information».
• «Joint Data Hiding and Wyner-Ziv Coding, Theory and Practice».
International Conferences and Workshops
• Dikici, C., Idrissi, K. and Baskurt, A. «Dirty-paper writing based on
LDPC codes for Data Hiding». International Workshop on Multimedia Content
Representation, Classification and Security (MRCS), pages 114–120, LNCS,
September 2006.
• Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source
Coding of Still Images». European Signal Processing Conference (EUSIPCO).
September 2006.
• Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source
Coding with Partially Available Side Information». SPIE Security, Steganography, and Watermarking of Multimedia Contents VIII , volume 6072, 60721E,
February 2006.
• Dikici, C., Guermazi R., Idrissi, K. and Baskurt, A. «Distributed Source
Coding of Still Images». European Signal Processing Conference (EUSIPCO)
VIII , September 2005.
National Conferences
• Dikici, C., Idrissi, K. and Baskurt, A. «Tatouage informé pour le codage
distribué». CORESA, September 2006.
National Plenary
• Dikici, C. «Codage et tatouage avec information adjacente». GDR ISIS, Thème D: Télécommunications, Journée Plénière, Paris, December 2006.
Other Publications
• Dikici, C. and Bozma, I. «Video Coding Based on Pre-attentive Processing».
SPIE Real-Time Imaging, volume 5671, pages 212–220, January 2005.
• Dikici, C., Civanlar, R. and Bozma, I. «Fovea based Coding for Video Streaming». International Conference on Image Analysis and Recognition (ICIAR), LNCS, volume 3211, pages 285–294, Porto, September 2004.
• Dikici, C., Alp, U., Ayaz, H., Karadeniz, M., Civanlar, R. and Bozma, I. «Fovea based Real-Time Video Processing and Streaming». Proc. of Signal Processing and Applications Conference (SIU), Istanbul, 2003 [in Turkish].
• Alp, U., Ayaz, H., Karadeniz, M., Dikici, C. and Bozma, I. «Remote Control of a Robot over the Internet». Proc. of Signal Processing and Applications Conference (SIU), Istanbul, 2003 [in Turkish].
• Sarac, I., Dikici, C. and Sankur, B. «New framing protocol for IP over SONET/SDH». Proc. of 1st Communication Conference, Ankara, 2001 [in Turkish].
Appendix D
Cited Author Index
This index lists the names of all authors cited in the references of this dissertation. It
is designed to enable the reader to locate the chapters of this book in which the work
of specific authors is discussed. Entries refer the reader to a page number. Thus,
the entry “Cheng, S. 49, 50, 53” means that Cheng, S. is cited in pages 49, 50 and
53 respectively.
Aaron, A. M. xi, 3, 47–50, 53, 54
Acikel, O. F. 4, 47
Amraoui, A. 51, 75
Anderson, J. B. 4, 47
Bahl, L. 4, 22, 29
Bajcsy, J. 47
Baskurt, A. 38, 54, 101
Bastug, A. 66
Bauml, R. 66
Benedetto, S. 4, 47
Bennatan, A. 67, 73
Berger, T. 43
Berrou, C. 4, 47
Bilgin, A. 69
Boliek, M. P. 69
Boutros, J. 75
Burshtein, D. 67, 73
Caire, G. 67, 73, 75
Chen, B. 20, 64–66
Cheng, S. 49, 50, 53
Chiang, M. 4, 66, 85–87, 101, 107, 122
Chou, J. 4, 47, 66, 101
Chung, S. Y. 4, 26, 47, 51, 75
Cocke, J. 4, 22, 29
Cohen, A. S. 65
Costa, M. 3, 5, 7, 20, 64, 84, 85, 88, 101, 122
Cover, T. M. 4, 14, 43, 66, 85–87, 101, 107, 122
Cox, I. J. 20, 64–66, 68, 73, 123
Craver, S. 61
Dikici, C. 38, 54, 101
Divsalar, D. 4, 47
Doërr, G. J. 66, 68, 73, 123
Doherty, L. 45, 50
Dragotti, P. L. 50
Duhamel, P. 84, 88, 91
Eggers, J. J. 4, 66, 101
Erez, U. 65, 67, 91
Fischer, T. R. 21, 49, 74
Foriš, P. 70
Forney, G. D. Jr. 4, 47, 74
Gallager, R. G. 4, 25, 47
Gamal, A. E. 84, 88
Garcia-Frias, J. 47
Gehrig, N. 50
Gel’fand, S. I. 3, 64, 84, 101, 122
Georghiades, C. N. xi, 47–50, 53, 54
Girod, B. xi, 3, 4, 47–50, 53, 54, 61, 66, 101
Glavieux, A. 4, 47
Gormish, M. J. 69
Guermazi, R. 38
Guillemot, C. 49, 53
Hartung, F. 60, 61
Heegard, C. 84, 88
Holliman, M. 62
Idrissi, K. 38, 54, 101
Ishwar, P. 50
Jelinek, F. 4, 22, 29
Kerckhoffs, A. 61
Klein Gunnewiek, R. 50
Koval, O. 93, 101
Kuhn, M. 62
Kusuma, J. 45, 50
Kutter, M. 60
Lagendijk, R. L. 50
Lajnef, K. xi, 47, 49, 53, 54
Lan, C. F. 49
Lapidoth, A. 65
Le Gall, D. 55, 69
Le Guelvouit, G. 66
Levický, D. 70
Liu, Z. 49
Liveris, A. D. xi, 47–50, 53, 54
Mackay, D. J. C. 4, 25, 47
Majumdar, A. 50
Maor, A. 101
Marcellin, M. W. 21, 49, 69, 74
McKellips, A. L. 20, 65
Memon, N. 62
Merhav, N. 101
Mihcak, M. K. 61, 93, 101
Miller, M. L. 20, 64–66, 68, 73, 123
Mitran, P. 47
Montorsi, G. 4, 47
Moulin, P. 61, 65, 84, 87, 101, 108
Narayanan, K. R. 49
Neal, R. M. 4, 25, 47
Oohama, Y. 112
O’Sullivan, J. A. 65
Ozonat, K. 54
Pearl, J. 27
Pereira, S. 62
Pérez-González, F. 93, 101
Petitcolas, F. A. P. 61, 62, 72
Pinsker, M. S. 3, 64, 84, 101, 122
Pollara, F. 4, 47
Pradhan, S. S. 4, 45, 47–50, 66, 101
Pun, T. 93, 101
Puri, R. 50
Ramchandran, K. 3, 4, 45, 47–50, 66, 101
Rane, S. 50
Raviv, J. 4, 22, 29
Rebollo-Monedero, D. 49, 50
Richardson, T. J. 4, 26, 30, 47
Ryan, W. E. 4, 47
Salehi, M. 84
Sankur, B. 66
Schonberg, D. 48
Setton, E. 49, 50
Shamai, S. 49, 65, 67, 73, 91, 101
Shannon, C. E. 2, 84
Shokrollahi, M. A. 4, 26, 30, 47
Siohan, P. 49, 53
Slepian, D. 3, 6, 19, 38, 39, 45, 101, 108, 122
Solomon, J. A. 70
Stankovic, V. 50
Stone, H. S. 61, 62
Su, J. K. 4, 61, 101
Sun, W. 101
Tabatabai, A. 55, 69
Tepe, K. E. 4, 47
Thitimajshima, P. 4, 47
Thomas, J. A. 14, 43
Tzschoppe, R. 66
Ungerboeck, G. 21, 45, 74
Urbanke, R. L. 4, 26, 30, 47, 51, 75
Varodayan, D. 48
Villasenor, J. 70
Viterbi, A. 22
Voloshynovskiy, S. 93, 101
Wang, Y. 84, 87, 101, 108
Watson, A. B. 67, 68, 70
Westerlaken, R. P. 50
Wolf, J. 3, 6, 19, 38, 39, 45, 101, 108, 122
Wornell, G. W. 20, 64–66
Wyner, A. 3, 6, 38, 41–43, 45, 101, 108, 112, 122
Xiong, Z. xi, 47–50, 53, 54
Yang, E. H. 101
Yang, G. Y. 70
Yang, Y. 50
Yeo, B. L. 61
Yeung, M. M. 61
Zaidi, A. 84, 88, 91
Zamir, R. 49, 65, 67, 91
Zhao, Y. 47
Zhu, X. 50
Ziv, J. 3, 6, 38, 41–43, 101, 108, 112, 122
Bibliography
Aaron, A. M. and Girod, B. «Compression with Side Information Using Turbo
Codes». In DCC ’02: Proceedings of the Data Compression Conference (DCC ’02),
page 252. IEEE Computer Society, Washington, DC, USA. 2002.
Aaron, A. M., Setton, E. and Girod, B. «Towards practical Wyner-Ziv coding of video». In Proceedings of the IEEE Int. Conf. on Image Processing (ICIP), volume 3, pages 869–872. 2003.
Acikel, O. F. and Ryan, W. E. «Punctured turbo-codes for BPSK/QPSK channels». IEEE Trans. Commun., 47(9):1315–1323. 1999.
Amraoui, A., Chung, S. Y. and Urbanke, R. L. «LTHC: Ldpcopt.» http://lthcwww.epfl.ch/research/ldpcopt/. Access Date: Oct 2007. 2003.
Bahl, L., Cocke, J., Jelinek, F. and Raviv, J. «Optimal decoding of linear
codes for minimizing symbol error rate (Corresp.)». IEEE Trans. Inform. Theory,
20(2):284–287. 1974.
Bajcsy, J. and Mitran, P. «Coding for the Slepian-Wolf problem with turbo codes». In GlobeCom'01, San Antonio. 2001a.
Bajcsy, J. and Mitran, P. «Design of fractional rate FSM encoders using Latin
squares». In IEEE Int. Symp. Inform. Theory - Recent Results Session, Washington. 2001b.
Bastug, A. and Sankur, B. «Improving the payload of watermarking channels
via LDPC coding». IEEE Signal Processing Lett., 11(2):90–92. 2004.
Benedetto, S., Divsalar, D., Montorsi, G. and Pollara, F. «Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding».
IEEE Trans. Inform. Theory, 44(5):909–926. 1998.
Bennatan, A., Burshtein, D., Caire, G. and Shamai, S. «Superposition coding
for side-information channels». IEEE Trans. Inform. Theory, 52(5):1872–1889.
2006.
Berger, T. Rate-Distortion Theory: A mathematical basis for data compression.
Prentice-Hall. 1971.
Berrou, C. and Glavieux, A. «Near optimum error correcting coding and decoding: turbo-codes». IEEE Trans. Commun., 44(6):1261–1271. 1996.
Berrou, C., Glavieux, A. and Thitimajshima, P. «Near Shannon limit error-correcting coding and decoding: Turbo-Codes». In IEEE International Conference on Communications, Geneva. 1993.
Boutros, J. and Caire, G. «Iterative multiuser joint decoding: Unified framework
and asymptotic analysis». IEEE Trans. Inform. Theory, 48(7):1772–1793. 2002.
Chen, B. and Wornell, G. W. «Digital watermarking and information embedding using dither modulation». In IEEE Second Workshop on Multimedia Signal
Processing, pages 273–278. 1998.
Chen, B. and Wornell, G. W. «Provably robust digital watermarking». In SPIE:
Multimedia Systems and Applications II (part of Photonics East 99), Boston, volume 3845, pages 43–54. 1999.
Chen, B. and Wornell, G. W. «Quantization index modulation: A class of
provably good methods for digital watermarking and information embedding».
IEEE Trans. Inform. Theory, 47(5):1423–1443. 2001.
Chou, J., Pradhan, S. S. and Ramchandran, K. «A robust blind watermarking
scheme based on distributed source coding principles». In ACM Multimedia, pages
49–56. 2000.
Chou, J., Pradhan, S. S. and Ramchandran, K. «Turbo and trellis-based
constructions for source coding with side information». In IEEE Data Compression
Conf. (DCC), Snowbird, UT. 2003.
Chung, S. Y. On the Construction of Some Capacity-Approaching Coding Schemes. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA. 2000.
Chung, S. Y., Forney, G. D. J., Richardson, T. J. and Urbanke, R. L. «On the design of Low-Density Parity-Check codes within 0.0045 dB of the Shannon limit». IEEE Commun. Lett., 5(2):58–60. 2001a.
Chung, S. Y., Richardson, T. J. and Urbanke, R. L. «Analysis of sum-product
decoding of low-density parity-check codes using a Gaussian approximation». IEEE
Trans. Inform. Theory, 47(2):657–670. 2001b.
Cohen, A. S. and Lapidoth, A. «The Gaussian watermarking game». IEEE
Trans. Inform. Theory, 48(6):1639–1667. 2002.
Costa, M. «Writing on dirty paper (Corresp.)». IEEE Trans. Inform. Theory,
29(3):439–441. 1983.
Cover, T. M. and Chiang, M. «Duality between channel capacity and rate distortion with two-sided state information». IEEE Trans. Inform. Theory, 48(6):1629–
1638. 2002.
Cover, T. M. and Thomas, J. A. Elements of information theory. Wiley-Interscience, New York, NY, USA. 1991.
Cox, I. J. and Miller, M. L. «The First 50 Years of Electronic Watermarking». EURASIP Journal on Applied Signal Processing, 2002(2):126–132.
Doi:10.1155/S1110865702000525. 2002.
Cox, I. J., Miller, M. L. and McKellips, A. L. «Watermarking as communications with side information». Proceedings of the IEEE (USA), 87(7):1127–1141.
1999.
Craver, S., Memon, N., Yeo, B. L. and Yeung, M. M. «Resolving Rightful
Ownerships with Invisible Watermarking Techniques: Limitations, Attacks, and
Implications». IEEE Journal on Selected Areas in Communications, 16(4):573–
586. 1998.
Dikici, C., Guermazi, R., Idrissi, K. and Baskurt, A. «Distributed Source
Coding of Still Images». In Proc. of European Signal Processing Conf. EUSIPCO,
Antalya. 2005.
Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source Coding of
Still Images». In Proc. of European Signal Processing Conf. EUSIPCO, Florence.
2006a.
Dikici, C., Idrissi, K. and Baskurt, A. «Joint Data-Hiding and Source Coding
with Partially Available Side Information». In Proc. of SPIE Electronic Imaging,
volume 6072. 2006b.
Dikici, C., Idrissi, K. and Baskurt, A. «Tatouage Informé pour le Codage
Distribué». In Proc. of CORESA. 2006c.
Eggers, J. J., Bauml, R., Tzschoppe, R. and Girod, B. «Scalar Costa scheme
for information embedding». IEEE Trans. Signal Processing, 51(4):1003–1019.
2003.
Erez, U., Shamai, S. and Zamir, R. «Capacity and lattice strategies for canceling
known interference». IEEE Trans. Inform. Theory, 51(11):3820–3833. 2005.
Forney, G. D. J. and Ungerboeck, G. «Modulation and coding for linear
Gaussian channels». IEEE Trans. Inform. Theory, 44(6):2384–2415. 1998.
Gallager, R. G. Low-Density Parity-Check Codes. Cambridge, MA: MIT Press. 1963.
Garcia-Frias, J. and Zhao, Y. «Compression of correlated binary sources using
turbo codes». IEEE Commun. Lett., 5(10):417–419. 2001.
Garcia-Frias, J. and Zhao, Y. «Compression of binary memoryless sources using
punctured turbo codes». IEEE Commun. Lett., 6(9):394–396. 2002.
Gehrig, N. and Dragotti, P. L. «Distributed Compression in Camera Sensor Networks». In IEEE International Workshop on Multimedia Signal Processing,
Siena, Italy. 2004.
Gel’fand, S. I. and Pinsker, M. S. «Coding for Channel with Random Parameters». Prob. Contr. Inform. Theory, 9(1):19–31. 1980.
Girod, B., Aaron, A. M., Rane, S. and Rebollo-Monedero, D. «Distributed
video coding». In Special Issue on Video Coding and Delivery, Proceedings of the
IEEE , volume 93, pages 71–83. 2005.
Hartung, F. and Kutter, M. «Multimedia watermarking techniques». Proc. IEEE, 87(7):1079–1107. 1999.
Hartung, F., Su, J. K. and Girod, B. «Spread Spectrum Watermarking: Malicious Attacks and Counterattacks». In SPIE Electronic Imaging, Security and
Watermarking of Multimedia Contents, pages 147–158. 1999.
Heegard, C. and Gamal, A. E. «On the capacity of computer memory with
defects». IEEE Trans. Inform. Theory, 29(5):731–739. 1983.
Holliman, M. and Memon, N. «Counterfeiting attacks on oblivious block-wise independent invisible watermarking schemes». IEEE Trans. Image Processing, 9(3):432–441. 2000.
Kerckhoffs, A. «La cryptographie militaire». Journal des sciences militaires,
9(1):5–38. 1883.
Kuhn, M. and Petitcolas, F. A. P. «Stirmark». http://www.petitcolas.net/fabien/watermarking/stirmark/. Access Date: Oct 2007. 2000.
Kusuma, J., Doherty, L. and Ramchandran, K. «Distributed compression for sensor networks». In IEEE Intl. Conf. on Image Processing (ICIP), Thessaloniki,
Greece, volume 1, pages 82–85. 2001.
Lajnef, K. Etude du codage de sources distribuées pour de nouveaux concepts en
compression vidéo. Ph.D. thesis, Thèse de doctorat en Traitement du Signal,
Université de Rennes 1. 2006.
Lajnef, K., Guillemot, C. and Siohan, P. «Distributed coding of three binary and Gaussian correlated sources using punctured turbo codes». Signal Processing, 86(11):3131–3149. ISSN 0165-1684. 2006.
Le Gall, D. and Tabatabai, A. «Sub-band coding of digital images using symmetric short kernel filters and arithmetic coding techniques». In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages 761–764. 1988.
Le Guelvouit, G. «Trellis-coded quantization for public-key watermarking». In
IEEE Int. Conf. on Acoustics, Speech and Signal Processing. 2005.
Levický, D. and Foriš, P. «Human Visual System Models in Digital Image Watermarking». In Radioengineering, volume 13, pages 38–43. 2004.
Liu, Z., Cheng, S., Liveris, A. D. and Xiong, Z. «Slepian-Wolf Coded Nested
Lattice Quantization for Wyner-Ziv Coding: High-Rate Performance Analysis and
Code Design». IEEE Trans. Inform. Theory, 52(10):4358–4379. 2006.
Liveris, A. D., Lan, C. F., Narayanan, K. R., Xiong, Z. and Georghiades,
C. N. «Slepian-Wolf coding of three binary sources using LDPC codes». In
International Symposium on Turbo Codes and Related Topics. 2003a.
Liveris, A. D., Xiong, Z. and Georghiades, C. N. «Compression of binary
sources with side information at the decoder using LDPC codes». IEEE Commun.
Lett., 6(10):440–442. 2002a.
Liveris, A. D., Xiong, Z. and Georghiades, C. N. «A Distributed Source
Coding Technique For Highly Correlated Images Using Turbo-Codes». In IEEE
Int. Conf. Acoust., Speech, Signal Processing (ICASSP), Orlando. 2002b.
Liveris, A. D., Xiong, Z. and Georghiades, C. N. «Distributed compression of
binary sources using conventional parallel and serial concatenated convolutional
codes». In IEEE Data Compression Conf. (DCC), Snowbird, UT. 2003b.
Mackay, D. J. C. Information Theory, Inference, and Learning Algorithms. Cambridge University Press. 2003.
Mackay, D. J. C. and Neal, R. M. «Near Shannon limit performance of low
density parity check codes». Electronics Letters, 33(6):457–458. 1997.
Maor, A. and Merhav, N. «On Joint Information Embedding and Lossy Compression». IEEE Trans. Inform. Theory, 51(8):2998–3008. 2005.
Marcellin, M. W. and Fischer, T. R. «Trellis Coded Quantization of Memoryless and Gauss-Markov Sources». IEEE Trans. Commun., 38(1):82–93. 1990.
Marcellin, M. W., Gormish, M. J., Bilgin, A. and Boliek, M. P. «An
Overview of JPEG-2000». In Data Compression Conference, pages 523–544. 2000.
Merhav, N. and Shamai, S. «On joint source-channel coding for the Wyner-Ziv source and the Gel’fand-Pinsker channel». IEEE Trans. Inform. Theory, 49(11):2844–2855. 2003.
Miller, M. L., Doërr, G. J. and Cox, I. J. «Applying informed coding and
embedding to design a robust high-capacity watermark». IEEE Trans. Image
Processing, 13(6):792–807. 2004.
Moulin, P. and Mihcak, M. K. «The parallel-Gaussian watermarking game».
IEEE Trans. Inform. Theory, 50(2):272–289. 2004.
Moulin, P. and O’Sullivan, J. A. «Information-Theoretic Analysis of Information
Hiding». IEEE Trans. Inform. Theory, 49(3):563–593. 2003.
Moulin, P. and Wang, Y. «Capacity and Random-Coding Exponents for Channel
Coding With Side Information». IEEE Trans. Inform. Theory, 53(4):1326–1347.
2007.
Oohama, Y. «Gaussian Multiterminal Source Coding». IEEE Trans. Inform. Theory, 43(6):1912–1923. 1997.
Ozonat, K. «Lossless distributed source coding for highly correlated still images».
2000.
Pearl, J. Probabilistic Reasoning in Intelligent Systems : Networks of Plausible
Inference. Morgan Kaufmann. 1988.
Pereira, S. «Checkmark». http://watermarking.unige.ch/Checkmark. Access
Date: Oct 2007. 2001.
Petitcolas, F. A. P. «Watermarking schemes evaluation». IEEE Signal Processing Mag., 17(5):58–64. 2000.
Pradhan, S. S., Chou, J. and Ramchandran, K. «Duality between source
coding and channel coding and its extension to the side information case». IEEE
Trans. Inform. Theory, 49(5):1181–1203. 2003.
Pradhan, S. S., Kusuma, J. and Ramchandran, K. «Distributed compression in
a dense micro-sensor network». IEEE Signal Processing Mag., 19(3):51–60. 2002.
Pradhan, S. S. and Ramchandran, K. «Distributed Source Coding Using Syndromes (DISCUS): Design and Construction». In DCC ’99: Proceedings of the
Conference on Data Compression, page 158. IEEE Computer Society, Washington, DC, USA. 1999.
Pradhan, S. S. and Ramchandran, K. «Distributed source coding: Symmetric
rates and applications to sensor networks». In IEEE Data Compression Conf.
(DCC), Snowbird, UT. 2000.
Puri, R., Majumbar, A., Ishwar, P. and Ramchandran, K. «Distributed video
coding in wireless sensor networks». IEEE Signal Processing Mag., 23(4):94–106.
2006.
Puri, R. and Ramchandran, K. «PRISM: A new robust video architecture based
on distributed compression principles». In Allerton Conf. Communication Control,
and Computing, Allerton, IL. 2002.
Rebollo-Monedero, D. and Girod, B. «Design of optimal quantizers for distributed coding of noisy sources». In IEEE Int. Conf. Acoust., Speech, Signal
Processing (ICASSP), Philadelphia. 2005.
Richardson, T. J., Shokrollahi, M. A. and Urbanke, R. L. «Design of
capacity-approaching irregular low-density parity-check codes». IEEE Trans. Inform. Theory, 47(2):619–637. 2001.
Richardson, T. J. and Urbanke, R. L. «The capacity of Low-Density Parity-Check codes under message-passing decoding». IEEE Trans. Inform. Theory, 47(2):599–618. 2001a.
Richardson, T. J. and Urbanke, R. L. «Efficient encoding of low-density parity-check codes». IEEE Trans. Inform. Theory, 47(2):638–656. 2001b.
Salehi, M. «Capacity and Coding for Memories with Real-Time Noisy Defect
Information at Encoder and Decoder». In IEE Proceedings I (Communications, Speech and Vision), volume 139, pages 113–117. 1992.
Schonberg, D., Pradhan, S. S. and Ramchandran, K. «LDPC Codes Can
Approach the Slepian-Wolf Bound for General Binary Sources». In 40th Allerton Conf. Communication Control, and Computing, Allerton, IL, pages 576–585.
2002.
Shannon, C. E. «Channels with side information at the transmitter». In IBM J.
of Research and Development, volume 2, pages 289–293. 1958.
Shannon, C. E. «Coding theorems for a discrete source with a fidelity criterion».
In IRE Nat. Conv. Rec., Pt. 4, pages 142–163. 1959.
Slepian, D. and Wolf, J. «Noiseless coding of correlated information sources».
IEEE Trans. Inform. Theory, 19(4):471–480. 1973.
Stankovic, V., Yang, Y. and Xiong, Z. «Distributed Source Coding for Multimedia Multicast Over Heterogeneous Networks». IEEE Journal of Selected Topics
in Signal Processing, 1(2):220–230. 2007.
Stone, H. S. «Analysis of attacks on image watermarks with randomized coefficients». Technical report, NEC Research Institute. 1996.
Su, J. K., Eggers, J. J. and Girod, B. «Illustration of the duality between
channel coding and rate distortion with side information». In 34th Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, USA, Oct. 29–Nov. 1. 2000.
Tepe, K. E. and Anderson, J. B. «Turbo codes for binary symmetric and binary
erasure channels». In IEEE International Symposium on Information Theory,
page 59. 1998.
Ungerboeck, G. «Channel Coding with Multilevel/Phase Signals». IEEE Trans.
Inform. Theory, 28(1):55–67. 1982.
Varodayan, D., Aaron, A. M. and Girod, B. «Rate-adaptive distributed source
coding using low-density parity-check codes». In 39th Asilomar Conf. Signals, Systems and Computers, Pacific Grove, CA, USA. 2005.
Varodayan, D., Aaron, A. M. and Girod, B. «Rate-adaptive codes for
distributed source coding». Signal Processing, 86(11):3123–3130. 2006.
Viterbi, A. «Error bounds for convolutional codes and an asymptotically optimum
decoding algorithm». IEEE Trans. Inform. Theory, 13(2):260–269. 1967.
Voloshynovskiy, S., Koval, O., Pérez-González, F., Mihcak, K. M. and
Pun, T. «Data-hiding with host state at the encoder and partial side information
at the decoder». URL http://vision.unige.ch/publications/postscript/2005/VoloshynovskiyKovalPerezGonzalezMihcakPun_SP2005.pdf, (preprint).
2004.
Watson, A. B. «DCT quantization matrices visually optimized for individual images». In SPIE Human Vision, Visual Processing, and Digital Display IV , volume
1913, pages 202–216. 1993.
Watson, A. B., Yang, G. Y., Solomon, J. A. and Villasenor, J. «Visibility
of wavelet quantization noise». IEEE Trans. Image Processing, 6(8):1164–1175.
1997.
Westerlaken, R. P., Klein Gunnewiek, R. and Lagendijk, R. L. «Turbo-Code Based Wyner-Ziv Video Compression». In Twenty-sixth Symposium on Information Theory in the Benelux, pages 113–120. 2005.
Wyner, A. «Recent results in the Shannon theory». IEEE Trans. Inform. Theory,
20(1):2–10. 1974.
Wyner, A. and Ziv, J. «The rate-distortion function for source coding with side
information at the decoder». IEEE Trans. Inform. Theory, 22(1):1–10. 1976.
Xiong, Z., Liveris, A. D. and Cheng, S. «Distributed source coding for sensor
networks». IEEE Signal Processing Mag., 21(5):80–94. 2004.
Yang, E. H. and Sun, W. «Combined Source Coding and Watermarking». In
Information Theory Workshop, Proceedings of the IEEE , pages 322–326. 2006.
Zaidi, A. and Duhamel, P. «On coding with a partial knowledge of the state information». In Proceedings of the 39th Asilomar Conference on Signals, Systems and Computers, pages 657–661. 2005.
Zamir, R. and Shamai, S. «Nested linear/lattice codes for Wyner-Ziv encoding». In IEEE Information Theory Workshop, Killarney, Ireland, pages 92–93. 1998.
Zamir, R., Shamai, S. and Erez, U. «Nested linear/lattice codes for structured
multiterminal binning». IEEE Trans. Inform. Theory, 48(6):1250–1276. 2002.
Zhu, X., Aaron, A. M. and Girod, B. «Distributed compression for large camera arrays». In IEEE Workshop on Statistical Signal Processing, St. Louis, Missouri. 2003.
INSA de LYON
Informed Watermarking and Compression of Multi-Sources
Laboratoire d’InfoRmatique en Images et Systèmes d’information, UMR 5205 CNRS
