1-OF-N DATA LINKS - ASP
Transcription
1-OF-N DATA LINKS - ASP
TWO-PHASE PROTOCOL CONVERTERS FOR 3D ASYNCHRONOUS 1-OF-N DATA LINKS Julian Pontes, Pascal Vivet***, Yvain Thonnart CEA-LETI, Minatec Campus, Grenoble, France ASP-DAC’2015, Tokyo, Japan, 20 January 2015 www.cea.fr & 3D Asynchronous NoC Cliquez pour modifier le style du titre Modular 3D NoC architecture for 3D stacking technology (TSV) Stacking of multiple dies for multi-core design, as a 3D GALS template 3D is still un-mature technology, not well supported by CAD tools (TA, CTS, etc .) Asynchronous interface to avoid 3D clocks distribution (Clocks local to each dies) 3D ANoC proposal 3D mesh based, using hierarchical routers (2D direction and 3D direction) [W. Lafi, DATE’11] Full asynchronous QDI design [Y. Thonnart, P. Vivet, DATE’2010] [F.Darve, Sheibanyrad, ISVLSI’2011] 2D NoC Router 3D NoC Router Processing Unit ANoC protocol How to implement efficiently the 3D Link physical layer ? & © CEA. All rights reserved DACLE Division| January 2015 |2 Interface Design Challenge Cliquez 3D pour modifier le style du titre Data communication in 3D Synchronous communication Clock distribution Power hungry Complex timing closure = Chip to Chip modeling Asynchronous Communication No timing assumption = Easier Timing Closure Performance problems due to long 3D link propagation delay 3D interconnect (µ-bumps, TSV, RDL, etc) µ-buffer cell (buffer + ESD + level shifters, etc.) Design-for-Test logic (Boundary Scan, etc) µ-pillar bump Backside RDL Delay Problem at 3D interface & © CEA. All rights reserved DACLE Division| January 2015 |3 Outline Cliquez pour modifier le style du titre Introduction 2-phase Data Link Converter Principle for 3D 2-phase Data Link Converter Design Experimental Results Conclusions & Perspectives & © CEA. All rights reserved DACLE Division| January 2015 |4 Outline Cliquez pour modifier le style du titre Introduction 2-phase Data Link Converter Principle for 3D 2-phase Data Link Converter Design Experimental Results Conclusions & Perspectives & © CEA. All rights reserved DACLE Division| January 2015 |5 3D 2-phase link architecture (1) Cliquez pour data modifier le style du titre Main Proposal Use existing 4-phase protocol for on-chip NoC communication : 2D NoC layers Use new 2-phase 1T-of-N protocol for off-chip NoC communication : 3D NoC link Implement efficient protocol converters to reduce penalty of off-hip delays Implementation No Clocking at the 3D interface Full robustness of 2D links and 3D links : no timing hypothesis Implement the protocol converters as HardMacros (including light timing hypothesis) & © CEA. All rights reserved DACLE Division| January 2015 |6 3D 2-phase link architecture (2) Cliquez pour data modifier le style du titre • • 2conversino State of the Art Cost of 4 Performance estimation : • 3D Link Delay : - Link delay = 3D line + µ-buffer + µ-bumps + µ-buffer + test muxes - Link delay ~ 6 - 8 gate delay • 3D Link Cycle Time : 4-phase = Stage cycle time + 4 * Link Delay 2-phase = Stage cycle time + 2 * Link Delay & Simple off-chip protocol & code, in order to reduce penalty of off-chip delays Reduce switching power of 3D links (half transitions) Our Target On Chip Communication (4-phase better) © CEA. All rights reserved Off Chip Communication (2-phase better) DACLE Division| January 2015 |7 Outline Cliquez pour modifier le style du titre Introduction 2-phase Data Link Converter Principle for 3D 2-phase Data Link Converter Design Experimental Results Conclusions & Perspectives & © CEA. All rights reserved DACLE Division| January 2015 |8 Asynchronous Protocols and Cliquez pour modifier le Data styleEncoding du titre Asynchronous Design choices : 1. Timing Model 1. 2. 3. 4. Asynchronous logic is without any clocks, but so many different ways to implement it ! Delay Insensitive (Most Robust) Quasi-Delay Insensitive (QDI) (Fully Robust) Speed Independent (similar to QDI) Bundled Data (similar to synchronous hyp.) 2. Handshake Protocol Source : A. Yakovlev 1. Four phase 1. 2. Easy to implement (for QDI type) Can be used for Computation 2. Two phase 1. 2. Faster and energy efficient Can be used for communication 3. Data Encoding 1. Systematic Encoding 1. 2. Non systematic Encoding 1. & Additional codeword added to data to reach unordered requirements Delay Insensitive by construction © CEA. All rights reserved DACLE Division| January 2015 |9 4-phase data communication (1) Cliquez pour modifier le style du titre Return-to-Zero Protocol Spacer used to separate two data Handshake composed by four transactions Data TX Two transactions to transfer data Two transactions to transfer spacer Cycle Time = at least equal to 4 times the link delay RX ack 4-phase protocol : this is a level encoded logic, simple logic used for computation & © CEA. All rights reserved DACLE Division| January 2015 | 10 4-phase data communication (2) Cliquez pour modifier le style du titre For 4-phase protocol, usage of general codes : m-of-n It is a Delay Insensitive (DI) Code Subclass Often, it is restricted to 1-of-N codes Example: 1-of-2 (Dual Rail) [Standard implementation] Use of 2 signals, to encode 1 bit : one rail to encode 0, one rail to encode 1. Value A1 A0 Spacer 0 0 ‘0’ 0 1 ‘1’ 1 0 1-of-4 (Four Rail) [Standard implementation, for communication] Use of 4 signals, to encode 2 bits, 1 rail per value in [0-3] Better power consumption, compared to dual rail : half the transition (1 transition of 1-of-4 instead of 2 transitions of 1-of2), for approximately the same logic complexity. & Value A3 A2 A1 A0 Spacer 0 0 0 0 ‘0’ 0 0 0 1 ‘1’ 0 0 1 0 ‘2’ 0 1 0 0 ‘3’ 1 0 0 0 Use 1-of-4 / 4-phase for on-chip NoC communication existing ANoC design [Y. Thonnart, P. Vivet, DATE’2010] © CEA. All rights reserved DACLE Division| January 2015 | 11 2-phase data communication (1) Cliquez pour modifier le style du titre Non Return-to Zero Protocol No Spacer Handshake composed by two transactions Data TX One transaction to transfer data One transaction to transfer ack Cycle Time = at least equal to 2 times the link delay RX ack 2 phase : requires transition detection logic, classically used for off-chip communication (like a 3D Link) & © CEA. All rights reserved DACLE Division| January 2015 | 12 2-phase data communication (2) Cliquez pour modifier le style du titre For 2-phase protocol, choice of different encoding : Level Encoding (also called LEDR for dual rail [R. Manohar], and LETS for its 1-of-n generalization [S. Nowick]) Transmission is divided in two phases : Odd and even Convertion to binary is facilitated Detection is performed using XOR gates Even Phase Odd Phase Value A3 A2 A1 A0 Value A3 A2 A1 A0 ‘0’ 0 0 0 0 ‘0’ 0 0 0 1 ‘1’ 0 0 1 1 ‘1’ 0 0 1 0 ‘2’ 0 1 0 1 ‘2’ 0 1 0 0 ‘3’ 1 0 0 1 ‘3’ 1 0 0 0 Transition Signaling Called 1T-of-n : one transition on Rail i indicate the i value Was already existing for dual-rail, but not widely used. Generalization proposed for 1-of-n Value & A1 A0 Value A3 A2 A1 A0 ‘0’ - - - T ‘1’ - - T - ‘2’ - T - - ‘3’ T - - - ‘0’ - T ‘1’ T - - A0 is used to transport odd/even phase - Other bits to transport the value Convertion to 1-of-n 4-phase is facilitated Proposal : Use 2-phase / 1T-of-4 for off-chip communication (3D Link) © CEA. All rights reserved DACLE Division| January 2015 | 13 4-phase to 2-phase (1) Cliquez pour modifier le converter style du titre Converter Behavior 4-phase data used as clock Input Data Rising edge generates a twophase transition Input Data Falling edge is ignored on twophase side. Used to generate the acknowledgement signal C-element synchronizes four-phase and two-phase Performance Analysis Forward Path Flip-Flop Backward Path Spacer Detection = Xor + C-element Data Detection = 2 Xor gates for detection + Xor Gate + Inverter + C-element & © CEA. All rights reserved DACLE Division| January 2015 | 14 4-phase to 2-phase (2) Cliquez pour modifier le converter style du titre 1 1 0 1 0 1 0 1 0 0 1 0 0 0 1 0 0 1 Data 4-phase Ack Data 2-phase Ack T T & © CEA. All rights reserved DACLE Division| January 2015 | 15 2-phase 4-phase (1) Cliquez pour to modifier le converter style du titre Converter Behavior 4-phase ack_in used as clock Four-phase acknowledge coordinates the conversion XOR between current and previous data used to generate the four-phase data directly. Performance Analysis Forward Path Data = Latch + XOR Gate Spacer =2 Latches + XOR Gate Backward Path 2 Latches + Xor & © CEA. All rights reserved DACLE Division| January 2015 | 16 2-phase to 4-phase (2) Cliquez pour modifier le converter style du titre 1 0 0 0 1 0 1 01 0 0 0 0 0 0 0 0 10 Data 2-phase Ack 10 T T Data 4-phase & Ack © CEA. All rights reserved DACLE Division| January 2015 | 17 Outline Cliquez pour modifier le style du titre Introduction 2-phase Data Link Converter Principle for 3D 2-phase Data Link Converter Design Experimental Results Conclusions & Perspectives & © CEA. All rights reserved DACLE Division| January 2015 | 18 Protocol Converterle Implementation Cliquez pour modifier style du titre Procotol Converter STMicroelectronics, 32nm Bulk technology Implemented as Hard Macro Exemple for a protocol converter of 64 bits Hard Macro Area : 7000 µm2 Implementation Flow Full std-cell design RTL like design with C-element instanciation Synthesis Place & Route With ad-hoc timing constraints in .sdc (small local timing hypothesis, not fully QDI) & © CEA. All rights reserved DACLE Division| January 2015 | 19 4-phase vs 2-phase Evaluation Cliquez pour modifier: Link le style du titre 4-phase (standard protocol) 2-phase (using 4/2 protocol converters) STMicro, 32 nm CMOS Typical Conditions Receiver 1 Segment – No Buffering Sender S Buffer S B & Maximum Link Delay for 1Gwords/second transmission R 2 Segments – 1 Buffer Stage B R 3 Segments – 2 Buffer Stages © CEA. All rights reserved DACLE Division| January 2015 | 20 2x2modifier ANoC Testcase Evaluation Cliquez pour le style du titre NoC Implementation 2x2 Asynchronous NoC (2D only, not 3D) Each router integrates a 4 2 protocol converter 32-bit data long links (2mm length, one segment only) 32nm STMicro – typical conditions Traffic Analysis R C C 2mm Data Link (1 segment) 256 NoC words 1 packet every 2-µsec Extract Average Latency & Throughput Values Leakage Power Dynamic Power Total Power Implementation Latency Throughput 4-phase ANoC (standard implementation) 3.58ns 947 MFlits 1.36 mW 7.415 mW 8.77 mW 4-phase ANoC (with Protocol Converters) 3.6ns 1.24 GFlits 1.39 mW 5.92 mW 7.32 mW Overhead +0.56% +31% +2.2% -20.2% -16.5% & © CEA. All rights reserved DACLE Division| January 2015 | 21 Outline Cliquez pour modifier le style du titre Introduction 2-phase Data Link Converter Principle for 3D 2-phase Data Link Converter Design Experimental Results Conclusions & Perspectives & © CEA. All rights reserved DACLE Division| January 2015 | 22 Protocol Converters : Related Cliquez pour modifier le style duWork titre Author & Reference Converter Four-phase Two-phase Comment McLaughlin, IEEE Trans. on VLSI, 2009 Complex Dual Rail LEDR Distributed Control P. McGee, Async’2008 Complex 1-of-n Level Encoded LETS Distributed Control C. LaFrieda, Async’2010 Complex Dual Rail LEDR Custom Cells M. Cannizzaro, Async’2012 Complex m-of-n PID – Level Encoded Distributed Control W. Athas, Patent US005388241 Simple Req/Ack Req/Ack Bundle-Data Local T. Hanyu, Patent US2013073771 Complex Dual-Rail Level Encoded Distributed Control T. Pontius, Patent US20080270875 Simple Req/Ack Req/Ack Bundle-Data Multi Node Control P. Vivet PhD document, 2001 Simple Dual Rail “1T-of-2” QDI, dedicated cells, Limited to dual-rail This Work Simple 1-of-n 1T-of-n QDI, Full Std-Cell Local Conversion & © CEA. All rights reserved DACLE Division| January 2015 | 23 Conclusions & style Perspectives Cliquez pour modifier le du titre New asynchronous protocol circuit converters Proposal of new code 1T-of-N (Transition signalling) New protocol converter with High throughput and Low latency Can achieve 1Gflit/s transfer Modular for wide data paths. Easy implementation : full std-cell. Good solution for 3D communication No global clock distribution, no synchronization between different dies Faster off-chip data communication compared to standard 4-phase Power saving by reducing number of transitions Perspectives Integrate the proposed protocol converters in a full 3D architecture Evaluate the protocol converters performances on a 3D link & © CEA. All rights reserved DACLE Division| January 2015 | 24 Acknowledgment Cliquez pour modifier le style du titre This work was funded thanks to the French national program 'programme d’Investissements d’Avenir, IRT Nanoelec' ANR-10-AIRT-05 Thanks ….. & Any questions ? © CEA. All rights reserved DACLE Division| January 2015 | 25