The erasure channel

Coding for loss-tolerant systems
APRETAF Workshop, 22 January 2009
Mathieu Cunche, Vincent Roca
INRIA, Planète team
INRIA Rhône-Alpes – Mathieu Cunche, Vincent Roca
• The erasure channel
• Erasure codes
• Reed-Solomon codes
• LDPC codes
• Application to distributed storage
The erasure channel
• Erasure channel
o definition:
  o a symbol either arrives at the destination without any error…
  o … or is erased and never received
[Figure: binary erasure channel. An input 0 or 1 is either received unchanged (0 → 0, 1 → 1) or erased.]
≠ BSC (binary symmetric) and AWGN channels…
o the integrity assumption is a strong hypothesis
o a received symbol is 100% guaranteed error free
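The channel model above can be sketched in a few lines of Python. This is only an illustration: the function name `erasure_channel`, the `ERASED` marker and the independent per-symbol loss probability are assumptions, not something taken from the slides.

```python
import random

ERASED = None  # marker for a symbol that never arrived (assumed convention)


def erasure_channel(symbols, loss_probability, rng):
    """Deliver each symbol intact, or replace it by ERASED with the given probability."""
    return [ERASED if rng.random() < loss_probability else s for s in symbols]


rng = random.Random(2009)  # fixed seed so that the run is repeatable
sent = [0, 1, 1, 0, 1, 0, 0, 1]
received = erasure_channel(sent, loss_probability=0.3, rng=rng)

# Integrity assumption: whatever did arrive is identical to what was sent.
assert all(r == s for r, s in zip(received, sent) if r is not ERASED)
print(received)
```

Unlike a BSC simulation, no received value is ever flipped: a position is either correct or explicitly marked as missing.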
The erasure channel
• Where do we find erasure channels?
o On the Internet
  o because of routing errors, congestion
  o because of bad CRC/checksum
o On wireless and satellite networks
  o intermittent connection due to obstacles
o Distributed storage
  o disk failure in RAID systems
  o node failure in a data center
o Distributed computation
  o fail-stop
Erasure codes
o k source symbols, encoded into n encoding symbols
o Code rate = $k / n$ (number of symbols before encoding / number of symbols after encoding)
  o close to 1 => little redundancy
  o close to 0 => high amount of redundancy
[Figure: source object → k source symbols → encoding → k source symbols + (n-k) repair symbols → transmission with symbol erasures → decoding → decoded object]
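As a minimal illustration of the encode / erase / decode chain, here is the simplest possible erasure code: a single XOR parity symbol, so n = k + 1. It is not one of the codes discussed in this talk, and the function names are hypothetical; it is only meant to make k, n and the code rate concrete.

```python
from functools import reduce
from operator import xor


def encode(source):
    """Append one repair symbol: the XOR of all k source symbols (n = k + 1)."""
    return source + [reduce(xor, source)]


def decode(received):
    """Recover the source symbols when at most one symbol (None) was erased."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code recovers at most one erasure")
    symbols = list(received)
    if missing:
        # The XOR of all n symbols is 0, so the missing one is the XOR of the rest.
        symbols[missing[0]] = reduce(xor, (s for s in received if s is not None))
    return symbols[:-1]


source = [0x12, 0x34, 0x56, 0x78]             # k = 4 one-byte symbols
encoded = encode(source)                       # n = 5 encoding symbols
damaged = encoded[:2] + [None] + encoded[3:]   # the third symbol is erased
assert decode(damaged) == source
code_rate = len(source) / len(encoded)         # k / n = 4 / 5
```

With a code rate of 4/5 the redundancy is small, and so is the protection: a second erasure cannot be recovered.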
Erasure codes
• Often used as AL-FEC codes
o "Application-Level Forward Error Correction" codes
• AL-FEC codes differ from physical-layer FEC codes
o PHY codes:
  o correct bit errors and, when that is not possible, detect them
  o symbol = bit
o AL-FEC:
  o recover from symbol erasures
  o symbol = byte, IP datagram, file chunk
Erasure codes
• How can we define good erasure codes?
• Performance metrics for erasure codes
o erasure recovery capabilities
  o main metric, measured as the overhead ratio:
    $\text{decoding overhead} = \frac{\text{number of symbols required for decoding}}{k} - 1$
  o decoding needs (1+overhead)*k symbols to succeed, whereas ideal (MDS) codes need only k symbols
o encoding and decoding speed
  o to appreciate the complexity
o required memory during encoding and decoding
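The overhead ratio is simple arithmetic; the sketch below just restates the formula. The function name and the figure of 1100 symbols are hypothetical values chosen for illustration.

```python
def decoding_overhead(symbols_required, k):
    """Overhead ratio: symbols needed beyond k, expressed as a fraction of k."""
    return symbols_required / k - 1


k = 1000
print(decoding_overhead(1000, k))  # ideal (MDS) code: exactly k symbols suffice
print(decoding_overhead(1100, k))  # hypothetical code that needed 1100 symbols
```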
Reed-Solomon codes
• In short
o Discovered by Reed & Solomon in 1959
o Linear codes over GF(2^n)
  o sum: simple binary XOR
  o multiplication and division: use a logarithmic table
o Based on polynomial interpolation
o Practical implementation with a Vandermonde matrix
  o any k×k submatrix of a Vandermonde matrix is invertible
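The table-based arithmetic can be sketched as follows for GF(2^8). The field polynomial 0x11D and the function names are assumptions made for this example; the slides do not say which polynomial or table layout is used.

```python
PRIMITIVE_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed choice of field polynomial)

EXP = [0] * 510  # EXP[i] = alpha^i, doubled so that EXP[log a + log b] needs no modulo
LOG = [0] * 256  # LOG[alpha^i] = i; LOG[0] is never used
value = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = value
    LOG[value] = i
    value <<= 1                  # multiply by alpha
    if value & 0x100:
        value ^= PRIMITIVE_POLY  # reduce modulo the field polynomial


def gf_add(a, b):
    """Addition (and subtraction) in GF(2^8) is a plain XOR."""
    return a ^ b


def gf_mul(a, b):
    """Multiply by adding logarithms."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide by subtracting logarithms."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[LOG[a] - LOG[b] + 255]
```

The two tables hold a few hundred entries in total, which is why they fit easily in cache for GF(2^8).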
Reed-Solomon codes
• Encoding
o Matrix-vector multiplication: X × G = Y
  o X: source vector (k source symbols)
  o G: generator matrix (k × n Vandermonde)
  o Y: encoded vector (n encoded symbols)
o Complexity: O(k^2) operations
Reed-Solomon codes
• Decoding
o Solve a linear system: X × G' = Y'
  o X: source vector (k source symbols)
  o G': k×k submatrix of G (invertible)
  o Y': received vector (k received symbols)
o Good Vandermonde property: any k×k submatrix is invertible
  o k encoding symbols are enough to decode
  o decoding overhead = 0; said differently, RS codes are MDS
o Complexity: O(k^3)
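Encoding (X × G = Y) and decoding (solving X × G' = Y') can be sketched together over GF(2^8). This is a toy, non-systematic construction written for illustration: the field polynomial 0x11D, the evaluation points alpha^j and all function names are assumptions, not the authors' implementation.

```python
# GF(2^8) log/antilog tables; the field polynomial 0x11D is an assumed choice.
EXP, LOG = [0] * 510, [0] * 256
v = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = v
    LOG[v] = i
    v = (v << 1) ^ (0x11D if v & 0x80 else 0)


def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def inv(a):
    return EXP[255 - LOG[a]]


def vandermonde(k, n):
    """k x n generator matrix: entry (i, j) is (alpha^j)^i, which requires n <= 255."""
    return [[EXP[(i * j) % 255] for j in range(n)] for i in range(k)]


def encode(source, G):
    """Y = X x G: each encoded symbol is a GF(2^8) dot product with one column of G."""
    return [
        _dot(source, [G[i][j] for i in range(len(G))])
        for j in range(len(G[0]))
    ]


def _dot(xs, ys):
    acc = 0
    for x, y in zip(xs, ys):
        acc ^= mul(x, y)
    return acc


def decode(received, G):
    """Solve X x G' = Y' by Gauss-Jordan elimination; G' is made of k received columns."""
    k = len(G)
    cols = sorted(received)[:k]
    if len(cols) < k:
        raise ValueError("need at least k received symbols")
    # One equation per received symbol j: sum_i X[i] * G[i][j] = Y[j]
    rows = [[G[i][j] for i in range(k)] + [received[j]] for j in cols]
    for c in range(k):
        pivot = next(r for r in range(c, k) if rows[r][c] != 0)
        rows[c], rows[pivot] = rows[pivot], rows[c]
        scale = inv(rows[c][c])
        rows[c] = [mul(scale, x) for x in rows[c]]
        for r in range(k):
            if r != c and rows[r][c] != 0:
                factor = rows[r][c]
                rows[r] = [x ^ mul(factor, y) for x, y in zip(rows[r], rows[c])]
    return [rows[i][k] for i in range(k)]


k, n = 4, 7
G = vandermonde(k, n)
source = [0x52, 0x53, 0x00, 0xFF]
encoded = encode(source, G)
# Erase n - k = 3 symbols: the 4 survivors are enough (decoding overhead = 0).
survivors = {j: encoded[j] for j in (1, 3, 4, 6)}
assert decode(survivors, G) == source
```

The elimination step is the O(k^3) part; any other set of 4 surviving positions would work equally well, which is the MDS property in action.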
Reed-Solomon codes: summary
• Ideal (MDS) codes
o Decoding overhead = 0
o Decoding possible as soon as k symbols are received
• … but limited scalability
o n ≤ 255: GF(2^8) is sufficient
  o fast operations over GF(2^8) (small logarithmic table)
  o decoding speed = a few tens of Mbps
o n > 255: use GF(2^16) or more
  o log table too large, cannot fit in cache
  o decoding speed falls to a few Mbps
LDPC codes
• In short
o "Low Density Parity Check" (LDPC)
  o linear block codes
  o sparse parity check matrix
o discovered by Gallager in the 1960s, rediscovered in the mid-1990s
o in general, encoding requires solving a linear system: O(k^3)
  o but high-performance, lightweight variants exist
o in the remainder we focus on binary LDPC codes
  o based on XOR operations
LDPC codes
• LDPC-staircase codes (RFC 5170)
o a simple (trivial) parity check matrix structure

       Source symbols     Parity symbols
       S1 S2 S3 S4 S5     P1 P2 P3 P4 P5
        0  0  1  1  0      1  0  0  0  0
        1  0  0  1  1      1  1  0  0  0
        1  1  1  0  0      0  1  1  0  0
        0  1  0  1  1      0  0  1  1  0
        1  1  1  0  1      0  0  0  1  1

  Each row is a constraint; for example, the second row reads S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0
o a.k.a. double diagonal or Repeat-Accumulate codes
o high encoding speed (encoding is trivial)
o recovery capabilities can be made close to those of ideal codes
LDPC codes
• Encoding

       S1 S2 S3 S4 S5     P1 P2 P3 P4 P5
        0  0  1  1  0      1  0  0  0  0     S3 ⊕ S4 ⊕ P1 = 0
        1  0  0  1  1      1  1  0  0  0     S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0
        1  1  1  0  0      0  1  1  0  0     S1 ⊕ S2 ⊕ S3 ⊕ P2 ⊕ P3 = 0
        0  1  0  1  1      0  0  1  1  0     S2 ⊕ S4 ⊕ S5 ⊕ P3 ⊕ P4 = 0
        1  1  1  0  1      0  0  0  1  1     S1 ⊕ S2 ⊕ S3 ⊕ S5 ⊕ P4 ⊕ P5 = 0

o each parity symbol follows from its row: S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0 gives P2 = S1 ⊕ S4 ⊕ S5 ⊕ P1
o linear complexity: O(k)
• Decoding
o solve a system of linear equations
o several techniques are feasible…
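The row-by-row encoding can be sketched as below, using the source part of the 5×10 matrix from the slide. The function name `staircase_encode` is hypothetical, and symbols are modelled as small integers combined with XOR.

```python
# Source part of the parity check matrix (rows = constraints, columns = S1..S5).
H_SOURCE = [
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
]


def staircase_encode(source, h_source):
    """Compute P_i = P_(i-1) XOR (the source symbols selected by row i)."""
    parity = []
    previous = 0  # the first row has no previous parity symbol
    for row in h_source:
        acc = previous
        for bit, symbol in zip(row, source):
            if bit:
                acc ^= symbol
        parity.append(acc)
        previous = acc
    return parity


source = [1, 0, 1, 1, 0]                   # S1..S5
parity = staircase_encode(source, H_SOURCE)
print(parity)                              # prints [0, 0, 0, 1, 1], i.e. P1..P5
```

Each parity symbol costs one pass over a sparse row, so the total work grows linearly with the number of symbols.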
LDPC codes
• Sol. 1: Iterative Decoding (ID)
o If an equation has only one unknown variable, that variable is equal to the sum of the others. Reiterate…
o Efficient thanks to the sparseness of the parity check matrix
o Pros: low complexity (linear, O(k))
  o low CPU load and high sustainable bandwidth
o Cons: suboptimal in terms of correction capabilities
  o some full-rank systems cannot be solved

  Code rate (k=1000, N1=3)    Average overhead    Overhead for a failure probability ≤ 10^-4
  2/3 (=0.66)                 9.99 %              13.93 %
  2/5 (=0.4)                  17.13 %             22.91 %
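A minimal sketch of iterative decoding on the 5×10 staircase matrix shown earlier in the talk. The function name and the use of `None` for erased symbols are assumptions. The second erasure pattern shows the weakness listed under "Cons": the decoder stalls although the remaining system is full rank.

```python
H = [  # columns: S1..S5, P1..P5
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
]


def iterative_decode(received, H):
    """Fill in erased symbols (None) while some equation has exactly one unknown."""
    symbols = list(received)
    progress = True
    while progress:
        progress = False
        for row in H:
            involved = [j for j, bit in enumerate(row) if bit]
            unknown = [j for j in involved if symbols[j] is None]
            if len(unknown) == 1:
                value = 0
                for j in involved:
                    if symbols[j] is not None:
                        value ^= symbols[j]
                symbols[unknown[0]] = value  # the unknown is the XOR of the others
                progress = True
    return symbols


CODEWORD = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1]

# S1, S3 and P4 erased: every step finds an equation with a single unknown.
assert iterative_decode([None, 0, None, 1, 0, 0, 0, 0, None, 1], H) == CODEWORD

# S1, S2 and S5 erased: each equation touching them has at least two unknowns.
stalled = iterative_decode([None, None, 1, 1, None, 0, 0, 0, 1, 1], H)
assert stalled.count(None) == 3
```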
LDPC codes
• Sol. 2: Maximum Likelihood (ML) decoding
o Solve a linear system (Gaussian elimination, LU decomposition…): xA = b
  o x: missing symbols
  o A: submatrix of the generator matrix
  o b: information of the received symbols
o Excellent erasure correction capabilities

  Code rate (k=1000, N1=5)    Average overhead    Overhead for a failure probability ≤ 10^-4
  2/3 (=0.66)                 0.63 %              2.21 %
  2/5 (=0.4)                  2.04 %              4.41 %

o High complexity: O(k^3)
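A sketch of ML decoding by Gaussian elimination over GF(2), again on the 5×10 staircase matrix from the earlier slides. For simplicity it builds the system from the parity check equations rather than from the generator matrix; that choice, like the function name, is an assumption of this example.

```python
H = [  # columns: S1..S5, P1..P5
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
]


def ml_decode(received, H):
    """Recover erased symbols (None) by Gauss-Jordan elimination over GF(2)."""
    symbols = list(received)
    unknown = [j for j, s in enumerate(symbols) if s is None]
    # One equation per constraint: XOR of its unknowns = XOR of its known symbols.
    system = []
    for row in H:
        rhs = 0
        for j, bit in enumerate(row):
            if bit and symbols[j] is not None:
                rhs ^= symbols[j]
        system.append([row[j] for j in unknown] + [rhs])
    # One pivot row per unknown; failure means the system is not full rank.
    for col in range(len(unknown)):
        pivot = next((r for r in range(col, len(system)) if system[r][col]), None)
        if pivot is None:
            raise ValueError("system is not full rank: decoding fails")
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(len(system)):
            if r != col and system[r][col]:
                system[r] = [a ^ b for a, b in zip(system[r], system[col])]
    for col, j in enumerate(unknown):
        symbols[j] = system[col][-1]
    return symbols


CODEWORD = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1]
# S1, S2 and S5 erased: no equation has a single unknown, yet the system is full rank.
assert ml_decode([None, None, 1, 1, None, 0, 0, 0, 1, 1], H) == CODEWORD
```

The elimination loop is where the O(k^3) cost comes from, since it no longer exploits the sparseness of the matrix.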
Some more details on LDPC codes considered
• Sol. 3: Hybrid ID/ML scheme
o Hybrid decoder
  o start decoding with ID (fast)
  o finish with ML if necessary (optimal)
o excellent erasure correction capabilities…
o … while remaining very fast
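The hybrid scheme can be sketched by chaining the two passes: ID first, then Gaussian elimination only on what is left. All names are hypothetical and the 5×10 matrix is the staircase example from the earlier slides; a production decoder would be organised quite differently.

```python
H = [  # columns: S1..S5, P1..P5
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
]


def known_xor(row, symbols):
    """XOR of the already known symbols that take part in one equation."""
    acc = 0
    for j, bit in enumerate(row):
        if bit and symbols[j] is not None:
            acc ^= symbols[j]
    return acc


def iterative_pass(symbols, H):
    """ID step: resolve, in place, every equation left with a single unknown."""
    progress = True
    while progress:
        progress = False
        for row in H:
            unknown = [j for j, bit in enumerate(row) if bit and symbols[j] is None]
            if len(unknown) == 1:
                symbols[unknown[0]] = known_xor(row, symbols)
                progress = True


def gaussian_pass(symbols, H):
    """ML step: solve for the remaining unknowns by elimination over GF(2)."""
    unknown = [j for j, s in enumerate(symbols) if s is None]
    system = [[row[j] for j in unknown] + [known_xor(row, symbols)] for row in H]
    for col in range(len(unknown)):
        pivot = next((r for r in range(col, len(system)) if system[r][col]), None)
        if pivot is None:
            raise ValueError("not enough independent equations")
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(len(system)):
            if r != col and system[r][col]:
                system[r] = [a ^ b for a, b in zip(system[r], system[col])]
    for col, j in enumerate(unknown):
        symbols[j] = system[col][-1]


def hybrid_decode(received, H):
    """Try ID first; fall back to ML only if some symbols are still unknown."""
    symbols = list(received)
    iterative_pass(symbols, H)
    used_ml = None in symbols
    if used_ml:
        gaussian_pass(symbols, H)
    return symbols, used_ml


CODEWORD = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1]
# S1, S3, P4 erased: ID alone is enough.
assert hybrid_decode([None, 0, None, 1, 0, 0, 0, 0, None, 1], H) == (CODEWORD, False)
# S1, S2, S5, P1 erased: ID recovers P1, then ML finishes the remaining three.
assert hybrid_decode([None, None, 1, 1, None, None, 0, 0, 1, 1], H) == (CODEWORD, True)
```

Because the ML pass only sees the symbols that ID could not resolve, the expensive elimination runs on a much smaller system than the full one.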
LDPC codes
• Decoding speed of the hybrid decoder
o LDPC-staircase (N1=5), code rate 2/3, k=1,000
o Reed-Solomon over GF(2^8)
[Figure: sustainable decoding speed (Mbps) versus loss probability (%). Annotations: ID sufficient: 1.7 Gbps, 32.4 times faster than RS; ML needed more and more often: 500 Mbps, still 10.2 times faster; with RS: 54 Mbps]
Application to distributed storage
Using replication:
• A file partitioned into 8 blocks
• Each block is replicated 4 times
[Figure: Client_1 and Client_2 connected to 8 storage nodes holding blocks {1,3,6,7}, {2,4,5,8}, {1,3,4,6}, {1,2,6,8}, {2,5,7,8}, {2,3,5,7}, {3,4,6,7}, {1,4,5,8}]
Can tolerate up to 3 failures
Application to distributed storage
Using erasure codes:
• A file encoded into 32 blocks: 8 source blocks (1 to 8) and 24 repair blocks (A to X)
[Figure: Client_1 and Client_2 connected to 8 storage nodes holding blocks {A,B,C,D}, {E,F,G,H}, {1,2,3,4}, {M,N,O,P}, {Q,R,S,T}, {U,V,W,X}, {I,J,K,L}, {5,6,7,8}]
Can tolerate up to 6 failures, since 8 blocks are enough to decode
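The two failure-tolerance figures (3 with replication, 6 with erasure codes) can be checked by brute force. The node contents below are read from the two figures, with the grouping of blocks per node assumed; the function names are hypothetical, and the erasure code is assumed to be ideal (MDS), so that any 8 distinct blocks decode the file.

```python
from itertools import combinations

K = 8  # number of source blocks in the file

REPLICATED = [{1, 3, 6, 7}, {2, 4, 5, 8}, {1, 3, 4, 6}, {1, 2, 6, 8},
              {2, 5, 7, 8}, {2, 3, 5, 7}, {3, 4, 6, 7}, {1, 4, 5, 8}]
ENCODED = [set("ABCD"), set("EFGH"), set("1234"), set("MNOP"),
           set("QRST"), set("UVWX"), set("IJKL"), set("5678")]


def replication_ok(alive_nodes):
    """Readable if every one of the K blocks survives on some node."""
    return set().union(*alive_nodes) >= set(range(1, K + 1))


def erasure_code_ok(alive_nodes):
    """Readable if at least K distinct encoded blocks survive (MDS assumption)."""
    return len(set().union(*alive_nodes)) >= K


def max_tolerated_failures(nodes, readable):
    """Largest f such that the file stays readable whichever f nodes fail."""
    tolerated = 0
    for f in range(1, len(nodes)):
        survivors = combinations(nodes, len(nodes) - f)
        if all(readable(alive) for alive in survivors):
            tolerated = f
        else:
            break
    return tolerated


print(max_tolerated_failures(REPLICATED, replication_ok))   # prints 3
print(max_tolerated_failures(ENCODED, erasure_code_ok))     # prints 6
```

Both layouts store 32 blocks on 8 nodes, so the extra tolerance of the coded layout comes at no additional storage cost.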
Conclusion
• Erasure codes
o Add redundancy to combat symbol erasures
• Reed-Solomon
o Ideal codes (MDS), but inefficient for large objects
• LDPC codes
o Can encode large objects
o Correction capabilities close to MDS
o High encoding and decoding speed
Questions?