The erasure channel
Coding for loss tolerant systems
Workshop APRETAF, 22 January 2009
Mathieu Cunche, Vincent Roca
INRIA Rhône-Alpes, Planète team

Outline:
o The erasure channel
o Erasure codes
o Reed-Solomon codes
o LDPC codes
o Application to distributed storage

The erasure channel
Erasure channel
o definition: a symbol either arrives at the destination without any error...
o ... or is erased and never received
    0 -> 0, or erased
    1 -> 1, or erased
Differs from the BSC (binary symmetric channel) and AWGN channels
o the integrity assumption is a strong hypothesis
o a received symbol is 100% guaranteed error free

The erasure channel
Where do we find erasure channels?
o On the Internet
    o because of routing errors, congestion
    o because of a bad CRC/checksum
o On wireless and satellite networks
    o intermittent connection due to obstacles
o Distributed storage
    o disk failure in RAID systems
    o node failure in a data center
o Distributed computation
    o fail-stop

Erasure codes
o k source symbols, encoded into n encoding symbols
o Code rate: $\text{code rate} = \frac{k}{n}$, with k the number of symbols before encoding and n the number after encoding
    o close to 1 => little redundancy
    o close to 0 => high amount of redundancy
[Figure: source object -> encoding (k source symbols + (n-k) repair symbols) -> transmission with symbol erasures -> decoding -> decoded object]

Erasure codes
Often used as AL-FEC codes
o "Application Level-Forward Error Correction" codes
AL-FEC codes differ from physical-layer FEC codes
o PHY codes:
    o correct bit errors and, if not possible, detect the errors
    o symbol = bit
o AL-FEC:
    o recover from symbol erasures
    o symbol = byte, IP datagram, file chunk

Erasure codes
How can we define good erasure codes?
Performance metrics for erasure codes
o erasure recovery capabilities
    o main metric, measured as the overhead ratio:
      $\text{decoding overhead} = \frac{\text{number of symbols required for decoding}}{k} - 1$
    o decoding needs (1 + overhead) * k symbols to succeed, whereas ideal (MDS) codes need only k symbols
o encoding and decoding speed
    o to assess the complexity
o required memory during encoding and decoding

Reed-Solomon codes
In short
o Discovered by Reed & Solomon in 1959
o Linear codes over GF(2^m)
    o sum: simple binary XOR
    o multiplication and division: use a logarithmic table
o Based on polynomial interpolation
o Practical implementation with a Vandermonde matrix
    o any k×k submatrix of a Vandermonde matrix is invertible

Reed-Solomon codes
Encoding
o Matrix-vector multiplication: X × G = Y
    o X: source vector, k source symbols
    o G: generator matrix, k × n Vandermonde
    o Y: encoded vector, n encoded symbols
o Complexity: O(k^2) operations

Reed-Solomon codes
Decoding
o Solve a linear system: X × G' = Y'
    o X: source vector, k source symbols
    o G': k×k submatrix of G (invertible)
    o Y': received vector, k received symbols
o Good Vandermonde (VDM) property: any k×k submatrix is invertible
    o k encoding symbols are enough to decode
    o decoding overhead = 0; said differently, RS codes are MDS
o Complexity: O(k^3)

Reed-Solomon codes: summary
Ideal (MDS) codes
o Decoding overhead = 0
o Decoding is possible as soon as k symbols are received
... but limited scalability
o n ≤ 255: GF(2^8) is sufficient
    o fast operations over GF(2^8) (small logarithmic table)
    o decoding speed = a few tens of Mbps
o n > 255: use GF(2^16) or more
    o log table too large, cannot fit in cache
    o decoding speed falls to a few Mbps

LDPC codes
In short
o "Low Density Parity Check" (LDPC)
o linear block codes
o sparse parity check matrix
o discovered by Gallager in the 1960s, rediscovered in
the mid-1990s
o In general, encoding requires solving a linear system: O(k^3)
    o but high-performance, lightweight variants exist
o In the remainder we focus on binary LDPC codes
    o based on XOR operations

LDPC codes
LDPC-staircase codes (RFC 5170)
o a simple (trivial) parity check matrix structure

      Source symbols    Parity symbols
      S1 S2 S3 S4 S5    P1 P2 P3 P4 P5
       0  0  1  1  0     1  0  0  0  0
       1  0  0  1  1     1  1  0  0  0
       1  1  1  0  0     0  1  1  0  0
       0  1  0  1  1     0  0  1  1  0
       1  1  1  0  1     0  0  0  1  1

  Each row is a constraint, e.g. the second row: S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0
o a.k.a. double diagonal or Repeat Accumulate codes
o high encoding speed (encoding is trivial)
o recovery capabilities can be made close to those of ideal codes

LDPC codes
Encoding
o Each row of the parity check matrix above gives one equation, and each equation produces one new parity symbol:
    S3 ⊕ S4 ⊕ P1 = 0
    S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0
    S1 ⊕ S2 ⊕ S3 ⊕ P2 ⊕ P3 = 0
    S2 ⊕ S4 ⊕ S5 ⊕ P3 ⊕ P4 = 0
    S1 ⊕ S2 ⊕ S3 ⊕ S5 ⊕ P4 ⊕ P5 = 0
o Linear complexity: O(k)
Decoding
o solve a system of linear equations
o several techniques are feasible...

LDPC codes
Sol. 1: Iterative Decoding (ID)
o If an equation has only one unknown variable, that variable is equal to the sum (XOR) of the others. Reiterate...
o Efficient thanks to the sparseness of the parity check matrix
o Pros: low complexity (linear, O(k))
    o low CPU load and high sustainable bandwidth
o Cons: suboptimal in terms of correction capabilities
    o some full-rank systems cannot be solved

      Code rate (k=1000, N1=3)   Average overhead   Overhead for a failure probability ≤ 10^-4
      2/3 (≈0.66)                9.99 %             13.93 %
      2/5 (=0.4)                 17.13 %            22.91 %

LDPC codes
Sol. 2: Maximum Likelihood (ML) decoding
o Solve a linear system (Gaussian elimination, LU decomposition...): xA = b
    o x: missing symbols
    o A: submatrix of the generator matrix
    o b: information of the received symbols
o Excellent erasure correction capabilities

      Code rate (k=1000, N1=5)   Average overhead   Overhead for a failure probability ≤ 10^-4
      2/3 (≈0.66)                0.63 %             2.21 %
      2/5 (=0.4)                 2.04 %             4.41 %

o High complexity: O(k^3)

Some more details on the LDPC codes considered
Sol.
3: Hybrid ID/ML scheme
o Hybrid decoder
    o start decoding with ID (fast)
    o finish with ML if necessary (optimal)
o excellent erasure correction capabilities...
o ... while remaining very fast

LDPC codes
Decoding speed of the hybrid decoder
o LDPC-staircase (N1=5), code rate 2/3, k=1,000
o Reed-Solomon over GF(2^8)
[Figure: sustainable decoding speed (Mbps) as a function of the loss probability (%). When ID is sufficient: 1.7 Gbps, 32.4 times faster than RS. When ML is needed more and more often: 500 Mbps, still 10.2 times faster. With RS: 54 Mbps.]

Application to distributed storage
Using replication:
• A file is partitioned into 8 blocks
• Each block is replicated 4 times
[Figure: Client_1 and Client_2 access 8 storage nodes, each holding 4 blocks:
    {1, 3, 6, 7}  {2, 4, 5, 8}  {1, 3, 4, 6}  {1, 2, 6, 8}
    {2, 5, 7, 8}  {2, 3, 5, 7}  {3, 4, 6, 7}  {1, 4, 5, 8}]
Can tolerate up to 3 failures

Application to distributed storage
Using erasure codes:
• A file is encoded into 32 blocks: 8 source blocks (1 to 8) and 24 repair blocks (A to X)
[Figure: Client_1 and Client_2 access 8 storage nodes, each holding 4 blocks:
    {A, B, C, D}  {E, F, G, H}  {1, 2, 3, 4}  {M, N, O, P}
    {Q, R, S, T}  {U, V, W, X}  {I, J, K, L}  {5, 6, 7, 8}]
Can tolerate up to 6 failures, since any 8 blocks are enough to decode

Conclusion
Erasure codes
o Add redundancy to combat symbol erasures
Reed-Solomon codes
o Ideal (MDS) codes, but inefficient for large objects
LDPC codes
o Can encode large objects
o Correction capabilities close to MDS
o High encoding and decoding speed

Questions?
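
Worked example (not part of the original slides)
The LDPC-staircase encoding and iterative decoding (Sol. 1) slides can be illustrated with a short Python sketch built on the 5 × 10 example parity check matrix shown earlier. It is only a minimal illustration under stated assumptions: symbols are single bytes with arbitrary values, and the names SOURCE_ROWS, constraints, encode and iterative_decode are hypothetical. It is not the RFC 5170 reference implementation.

```python
# Source-symbol part of the example parity check matrix from the slides.
# Row i lists the 0-based indices of the source symbols used by constraint i.
SOURCE_ROWS = [
    [2, 3],        # S3 + S4 + P1 = 0
    [0, 3, 4],     # S1 + S4 + S5 + P1 + P2 = 0
    [0, 1, 2],     # S1 + S2 + S3 + P2 + P3 = 0
    [1, 3, 4],     # S2 + S4 + S5 + P3 + P4 = 0
    [0, 1, 2, 4],  # S1 + S2 + S3 + S5 + P4 + P5 = 0
]
K = 5  # number of source symbols in this toy example


def constraints():
    """Return every constraint as indices into the codeword [S1..S5, P1..P5].

    The staircase structure adds P_i and, except for the first row, P_(i-1).
    """
    rows = []
    for i, src in enumerate(SOURCE_ROWS):
        row = list(src) + [K + i]
        if i > 0:
            row.append(K + i - 1)
        rows.append(row)
    return rows


def encode(source):
    """Compute the parity symbols one after another (linear in k).

    P_i is the XOR of the source symbols of row i and of P_(i-1).
    """
    parity = []
    previous = 0
    for src in SOURCE_ROWS:
        p = previous
        for j in src:
            p ^= source[j]
        parity.append(p)
        previous = p
    return list(source) + parity


def iterative_decode(received):
    """Iterative decoding: erased symbols are given as None.

    Whenever a constraint has exactly one unknown symbol, that symbol is the
    XOR of the other symbols of the constraint. Repeat until no progress.
    Symbols that cannot be recovered this way stay None.
    """
    symbols = list(received)
    progress = True
    while progress:
        progress = False
        for row in constraints():
            unknown = [j for j in row if symbols[j] is None]
            if len(unknown) == 1:
                value = 0
                for j in row:
                    if symbols[j] is not None:
                        value ^= symbols[j]
                symbols[unknown[0]] = value
                progress = True
    return symbols


source = [0x11, 0x22, 0x33, 0x44, 0x55]  # arbitrary byte values
codeword = encode(source)
received = list(codeword)
for lost in (0, 3, 7):  # erase S1, S4 and P3
    received[lost] = None
decoded = iterative_decode(received)
print(decoded == codeword)  # True: the three erasures are recovered
```

In this run the first constraint recovers S4, the second then recovers S1, and the third recovers P3. When every remaining constraint contains at least two unknown symbols, this decoder stops with some entries still None, which is the case where the hybrid scheme falls back to ML decoding.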