1-OF-N DATA LINKS - ASP

Transcription

1-OF-N DATA LINKS - ASP
TWO-PHASE PROTOCOL CONVERTERS
FOR 3D ASYNCHRONOUS
1-OF-N DATA LINKS
Julian Pontes, Pascal Vivet***, Yvain Thonnart
CEA-LETI, Minatec Campus, Grenoble, France
ASP-DAC’2015, Tokyo, Japan,
20 January 2015
www.cea.fr
&
3D Asynchronous
NoC
Cliquez pour modifier
le style du titre
Modular 3D NoC architecture for 3D stacking technology (TSV)
Stacking of multiple dies for multi-core design, as a 3D GALS template
3D is still un-mature technology, not well supported by CAD tools (TA, CTS, etc .)
Asynchronous interface to avoid 3D clocks distribution (Clocks local to each dies)
3D ANoC proposal
3D mesh based, using hierarchical routers (2D direction and 3D direction) [W. Lafi, DATE’11]
Full asynchronous QDI design
[Y. Thonnart, P. Vivet, DATE’2010] [F.Darve, Sheibanyrad, ISVLSI’2011]
2D NoC Router
3D NoC Router
Processing Unit
ANoC protocol
How to implement efficiently
the 3D Link physical layer ?
&
© CEA. All rights reserved
DACLE Division| January 2015
|2
Interface
Design
Challenge
Cliquez 3D
pour
modifier
le style
du titre
Data communication in 3D
Synchronous communication
Clock distribution
Power hungry
Complex timing closure = Chip to Chip modeling
Asynchronous Communication
No timing assumption = Easier Timing Closure
Performance problems due to long 3D link propagation delay
3D interconnect (µ-bumps, TSV, RDL, etc)
µ-buffer cell (buffer + ESD + level shifters, etc.)
Design-for-Test logic (Boundary Scan, etc)
µ-pillar
bump
Backside
RDL
Delay Problem
at 3D interface
&
© CEA. All rights reserved
DACLE Division| January 2015
|3
Outline
Cliquez pour modifier le style du
titre
Introduction
2-phase Data Link Converter Principle for 3D
2-phase Data Link Converter Design
Experimental Results
Conclusions & Perspectives
&
© CEA. All rights reserved
DACLE Division| January 2015
|4
Outline
Cliquez pour modifier le style du
titre
Introduction
2-phase Data Link Converter Principle for 3D
2-phase Data Link Converter Design
Experimental Results
Conclusions & Perspectives
&
© CEA. All rights reserved
DACLE Division| January 2015
|5
3D 2-phase
link architecture
(1)
Cliquez
pour data
modifier
le style du titre
Main Proposal
Use existing 4-phase protocol for on-chip NoC communication : 2D NoC layers
Use new 2-phase 1T-of-N protocol for off-chip NoC communication : 3D NoC link
Implement efficient protocol converters to reduce penalty of off-hip delays
Implementation
No Clocking at the 3D interface
Full robustness of 2D links and 3D links : no timing hypothesis
Implement the protocol converters as HardMacros (including light timing hypothesis)
&
© CEA. All rights reserved
DACLE Division| January 2015
|6
3D 2-phase
link architecture
(2)
Cliquez
pour data
modifier
le style du titre
•
•
2conversino
State of the Art
Cost of 4
Performance estimation :
• 3D Link Delay :
- Link delay = 3D line + µ-buffer + µ-bumps +
µ-buffer + test muxes
- Link delay ~ 6 - 8 gate delay
• 3D Link Cycle Time :
4-phase = Stage cycle time + 4 * Link Delay
2-phase = Stage cycle time + 2 * Link Delay
&
Simple off-chip protocol &
code, in order to reduce
penalty of off-chip delays
Reduce switching power of 3D
links (half transitions)
Our Target
On Chip
Communication
(4-phase better)
© CEA. All rights reserved
Off Chip
Communication
(2-phase better)
DACLE Division| January 2015
|7
Outline
Cliquez pour modifier le style du
titre
Introduction
2-phase Data Link Converter Principle for 3D
2-phase Data Link Converter Design
Experimental Results
Conclusions & Perspectives
&
© CEA. All rights reserved
DACLE Division| January 2015
|8
Asynchronous
Protocols and
Cliquez
pour modifier
le Data
styleEncoding
du titre
Asynchronous Design choices :
1. Timing Model
1.
2.
3.
4.
Asynchronous logic is without any clocks,
but so many different ways to implement it !
Delay Insensitive (Most Robust)
Quasi-Delay Insensitive (QDI) (Fully Robust)
Speed Independent (similar to QDI)
Bundled Data (similar to synchronous hyp.)
2. Handshake Protocol
Source : A. Yakovlev
1. Four phase
1.
2.
Easy to implement (for QDI type)
Can be used for Computation
2. Two phase
1.
2.
Faster and energy efficient
Can be used for communication
3. Data Encoding
1.
Systematic Encoding
1.
2.
Non systematic Encoding
1.
&
Additional codeword added to
data to reach unordered requirements
Delay Insensitive by construction
© CEA. All rights reserved
DACLE Division| January 2015
|9
4-phase
data communication
(1)
Cliquez
pour modifier
le style du titre
Return-to-Zero Protocol
Spacer used to separate two data
Handshake composed by four transactions
Data
TX
Two transactions to transfer data
Two transactions to transfer spacer
Cycle Time = at least equal to 4 times the link delay
RX
ack
4-phase protocol :
this is a level encoded logic, simple logic used for computation
&
© CEA. All rights reserved
DACLE Division| January 2015
| 10
4-phase
data communication
(2)
Cliquez
pour modifier
le style du titre
For 4-phase protocol, usage of general codes : m-of-n
It is a Delay Insensitive (DI) Code Subclass
Often, it is restricted to 1-of-N codes
Example:
1-of-2 (Dual Rail) [Standard implementation]
Use of 2 signals, to encode 1 bit : one rail to encode 0, one rail to encode 1.
Value
A1
A0
Spacer
0
0
‘0’
0
1
‘1’
1
0
1-of-4 (Four Rail) [Standard implementation, for communication]
Use of 4 signals, to encode 2 bits, 1 rail per value in [0-3]
Better power consumption, compared to dual rail :
half the transition (1 transition of 1-of-4 instead of 2 transitions of 1-of2),
for approximately the same logic complexity.
&
Value
A3
A2
A1
A0
Spacer
0
0
0
0
‘0’
0
0
0
1
‘1’
0
0
1
0
‘2’
0
1
0
0
‘3’
1
0
0
0
Use 1-of-4 / 4-phase
for on-chip NoC communication
existing ANoC design
[Y. Thonnart, P. Vivet, DATE’2010]
© CEA. All rights reserved
DACLE Division| January 2015
| 11
2-phase
data communication
(1)
Cliquez
pour modifier
le style du titre
Non Return-to Zero Protocol
No Spacer
Handshake composed by two transactions
Data
TX
One transaction to transfer data
One transaction to transfer ack
Cycle Time = at least equal to 2 times the link delay
RX
ack
2 phase : requires transition detection logic,
classically used for off-chip communication (like a 3D Link)
&
© CEA. All rights reserved
DACLE Division| January 2015
| 12
2-phase
data communication
(2)
Cliquez
pour modifier
le style du titre
For 2-phase protocol, choice of different encoding :
Level Encoding (also called LEDR for dual rail [R. Manohar], and LETS for its 1-of-n generalization [S. Nowick])
Transmission is divided in two phases : Odd and even
Convertion to binary
is facilitated
Detection is performed using XOR gates
Even Phase
Odd Phase
Value
A3
A2
A1
A0
Value
A3
A2
A1
A0
‘0’
0
0
0
0
‘0’
0
0
0
1
‘1’
0
0
1
1
‘1’
0
0
1
0
‘2’
0
1
0
1
‘2’
0
1
0
0
‘3’
1
0
0
1
‘3’
1
0
0
0
Transition Signaling
Called 1T-of-n : one transition on Rail i indicate the i value
Was already existing for dual-rail, but not widely used.
Generalization proposed for 1-of-n
Value
&
A1
A0
Value
A3
A2
A1
A0
‘0’
-
-
-
T
‘1’
-
-
T
-
‘2’
-
T
-
-
‘3’
T
-
-
-
‘0’
-
T
‘1’
T
-
- A0 is used to transport odd/even phase
- Other bits to transport the value
Convertion to 1-of-n 4-phase
is facilitated
Proposal :
Use 2-phase / 1T-of-4
for off-chip communication
(3D Link)
© CEA. All rights reserved
DACLE Division| January 2015
| 13
4-phase
to 2-phase
(1)
Cliquez
pour modifier
le converter
style du titre
Converter Behavior
4-phase data used as clock
Input Data Rising edge generates a twophase transition
Input Data Falling edge is ignored on twophase side. Used to generate the
acknowledgement signal
C-element synchronizes four-phase and
two-phase
Performance Analysis
Forward Path
Flip-Flop
Backward Path
Spacer Detection = Xor + C-element
Data Detection = 2 Xor gates for detection
+ Xor Gate + Inverter + C-element
&
© CEA. All rights reserved
DACLE Division| January 2015
| 14
4-phase
to 2-phase
(2)
Cliquez
pour modifier
le converter
style du titre
1
1
0
1
0
1
0
1
0
0
1
0
0
0
1
0
0
1
Data
4-phase
Ack
Data
2-phase
Ack
T
T
&
© CEA. All rights reserved
DACLE Division| January 2015
| 15
2-phase
4-phase
(1)
Cliquez
pour to
modifier
le converter
style du titre
Converter Behavior
4-phase ack_in used as clock
Four-phase acknowledge coordinates the
conversion
XOR between current and previous data
used to generate the four-phase data
directly.
Performance Analysis
Forward Path
Data = Latch + XOR Gate
Spacer =2 Latches + XOR Gate
Backward Path
2 Latches + Xor
&
© CEA. All rights reserved
DACLE Division| January 2015
| 16
2-phase
to 4-phase
(2)
Cliquez
pour modifier
le converter
style du titre
1
0
0
0
1
0
1
01
0
0
0
0
0
0
0
0
10
Data
2-phase
Ack
10
T
T
Data
4-phase
&
Ack
© CEA. All rights reserved
DACLE Division| January 2015
| 17
Outline
Cliquez pour modifier le style du
titre
Introduction
2-phase Data Link Converter Principle for 3D
2-phase Data Link Converter Design
Experimental Results
Conclusions & Perspectives
&
© CEA. All rights reserved
DACLE Division| January 2015
| 18
Protocol
Converterle
Implementation
Cliquez
pour modifier
style du titre
Procotol Converter
STMicroelectronics, 32nm Bulk technology
Implemented as Hard Macro
Exemple for a protocol converter of 64 bits
Hard Macro Area : 7000 µm2
Implementation Flow
Full std-cell design
RTL like design
with C-element instanciation
Synthesis
Place & Route
With ad-hoc timing
constraints in .sdc
(small local timing hypothesis,
not fully QDI)
&
© CEA. All rights reserved
DACLE Division| January 2015
| 19
4-phase
vs 2-phase
Evaluation
Cliquez
pour
modifier: Link
le style
du titre
4-phase
(standard protocol)
2-phase
(using 4/2 protocol converters)
STMicro, 32 nm CMOS
Typical Conditions
Receiver 1 Segment – No Buffering
Sender
S
Buffer
S
B
&
Maximum Link Delay
for 1Gwords/second transmission
R 2 Segments – 1 Buffer Stage
B
R 3 Segments – 2 Buffer Stages
© CEA. All rights reserved
DACLE Division| January 2015
| 20
2x2modifier
ANoC Testcase
Evaluation
Cliquez pour
le style
du titre
NoC Implementation
2x2 Asynchronous NoC (2D only, not 3D)
Each router integrates a 4 2 protocol converter
32-bit data long links (2mm length, one segment only)
32nm STMicro – typical conditions
Traffic Analysis
R
C
C
2mm Data Link
(1 segment)
256 NoC words
1 packet every 2-µsec
Extract Average Latency & Throughput Values
Leakage
Power
Dynamic
Power
Total
Power
Implementation
Latency
Throughput
4-phase ANoC
(standard implementation)
3.58ns
947 MFlits
1.36 mW 7.415 mW
8.77 mW
4-phase ANoC
(with Protocol Converters)
3.6ns
1.24 GFlits
1.39 mW
5.92 mW
7.32 mW
Overhead
+0.56%
+31%
+2.2%
-20.2%
-16.5%
&
© CEA. All rights reserved
DACLE Division| January 2015
| 21
Outline
Cliquez pour modifier le style du
titre
Introduction
2-phase Data Link Converter Principle for 3D
2-phase Data Link Converter Design
Experimental Results
Conclusions & Perspectives
&
© CEA. All rights reserved
DACLE Division| January 2015
| 22
Protocol
Converters
: Related
Cliquez
pour
modifier le
style duWork
titre
Author & Reference
Converter
Four-phase
Two-phase
Comment
McLaughlin,
IEEE Trans. on VLSI, 2009
Complex
Dual Rail
LEDR
Distributed
Control
P. McGee,
Async’2008
Complex
1-of-n
Level Encoded LETS
Distributed
Control
C. LaFrieda,
Async’2010
Complex
Dual Rail
LEDR
Custom Cells
M. Cannizzaro,
Async’2012
Complex
m-of-n
PID – Level
Encoded
Distributed
Control
W. Athas,
Patent US005388241
Simple
Req/Ack
Req/Ack
Bundle-Data
Local
T. Hanyu,
Patent US2013073771
Complex
Dual-Rail
Level Encoded
Distributed
Control
T. Pontius,
Patent US20080270875
Simple
Req/Ack
Req/Ack
Bundle-Data
Multi Node Control
P. Vivet
PhD document, 2001
Simple
Dual Rail
“1T-of-2”
QDI, dedicated cells,
Limited to dual-rail
This Work
Simple
1-of-n
1T-of-n
QDI, Full Std-Cell
Local Conversion
&
© CEA. All rights reserved
DACLE Division| January 2015
| 23
Conclusions
& style
Perspectives
Cliquez pour
modifier le
du titre
New asynchronous protocol circuit converters
Proposal of new code 1T-of-N (Transition signalling)
New protocol converter with High throughput and Low latency
Can achieve 1Gflit/s transfer
Modular for wide data paths. Easy implementation : full std-cell.
Good solution for 3D communication
No global clock distribution, no synchronization between different dies
Faster off-chip data communication compared to standard 4-phase
Power saving by reducing number of transitions
Perspectives
Integrate the proposed protocol converters
in a full 3D architecture
Evaluate the protocol converters
performances on a 3D link
&
© CEA. All rights reserved
DACLE Division| January 2015
| 24
Acknowledgment
Cliquez pour modifier
le style du titre
This work was funded thanks to the French national
program 'programme d’Investissements d’Avenir, IRT
Nanoelec' ANR-10-AIRT-05
Thanks …..
&
Any questions ?
© CEA. All rights reserved
DACLE Division| January 2015
| 25

Documents pareils