Magny Cours

Blade Computing with the AMD Opteron™ Processor (“Magny-Cours”)
Pat Conway (Presenter)
Nathan Kalyanasundharam
Gregg Donley
Kevin Lepak
Bill Hughes
Agenda
Processor Architecture
• AMD driving the x86 64-bit processor evolution
• Driving forces behind the Twelve-Core AMD Opteron™ processor codenamed “Magny-Cours”
• CPU silicon
• MCM 2.0 package, speeds and feeds
Performance and scalability
• 2P/4P blade and rack topologies
• HyperTransport™ technology HT Assist design
  – Cache coherence protocol
  – Transaction scenarios and frequencies
  – Coverage ratio
  – Memory latency and bandwidth
A look ahead
The AMD “Magny-Cours” Processor | Hot Chips | August 2009
x86 64-bit Architecture Evolution
Year  Processor      Mfg. Process  CPU Core    L2 / L3        HyperTransport™ Technology  Memory
2003  AMD Opteron™   90nm SOI      K8          1MB / 0        3x 1.6GT/s                  2x DDR1 300
2005  AMD Opteron™   90nm SOI      K8          1MB / 0        3x 1.6GT/s                  2x DDR1 400
2007  “Barcelona”    65nm SOI      Greyhound   512kB / 2MB    3x 2GT/s                    2x DDR2 667
2008  “Shanghai”     45nm SOI      Greyhound+  512kB / 6MB    3x 4.0GT/s                  2x DDR2 800
2009  “Istanbul”     45nm SOI      Greyhound+  512kB / 6MB    3x 4.8GT/s                  2x DDR2 1066
2010  “Magny-Cours”  45nm SOI      Greyhound+  512kB / 12MB   4x 6.4GT/s                  4x DDR3 1333
Max Power Budget Remains Consistent
Dramatic Back-to-back Gains
[Chart: performance relative to the original AMD Opteron™ Processor, floating point and integer, on a vertical axis from 0 to 50. Horizontal axis: 2003 to 2011, spanning single core, dual core, quad core, “Istanbul” 6 core (2009), “Magny-Cours” 12 core (2010, planned) and future silicon (2011).]
“Shanghai” to “Istanbul” delivers 34% more performance in the same power envelope
* “Magny-Cours” and future silicon data is based on AMD projections
Driving Forces Behind “Magny-Cours”
Server Throughput
• Exploit thread level parallelism
• Leverage Directly Connected MCM 2.0
• Maximize compute density in 2P/4P blades and racks
Virtualization
• Run more VMs per server
• Provide hardware context (thread) based QOS
Energy Proportional Computing
• More performance, same power envelope
• Power conservation when idle
• Design efficiency – “Magny-Cours” silicon same as “Istanbul”
  – Can help speed qualification times and customers’ time to market
Economics
• Reasonable die size permits 2 die per reticle (Yield ⇑ Manufacturing Cost ⇓)
  – Yield improvements can help ensure supply chain stability
  – Manufacturing cost savings ultimately benefit customers
“Magny-Cours” Silicon
[Die photo labels: Core 0 to Core 5, each with its own L2 cache; L3 cache; Northbridge; DDR interface; HT0 Link, HT1 Link, HT2 and HT3.]
MCM 2.0 Logical View
G34 Socket
“Magny-Cours” utilizes a Directly Connected MCM
• Package has 12 cores, 4 HT ports, and 4 memory channels
• Die (Node) has 6 cores, 4 HT ports, and 2 memory channels
[Diagram: nodes P0 and P1 in one package, each with DDR3 memory channels. Link labels: x16 cHT, x16 (NC), x8 cHT, x16 cHT.]
Topologies
            2P               4P
Diameter    1                2
Avg Diam    0.75             1.25
DRAM BW     85.6 GB/s        170.4 GB/s
XFIRE BW    71.7 GB/s (*)    143.4 GB/s
[Diagrams: 2P system with nodes P0 to P3 and 4P system with nodes P0 to P7, with x16 links and I/O attachments.]
(*) XFIRE BW is the maximum available coherent memory bandwidth if the HT links were the only limiting factor. Each node accesses its own memory and that of every other node in an interleaved fashion.
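The diameter and average diameter figures can be checked with a small hop-count calculation. The sketch below assumes the 2P system is four nodes that are all directly connected (which a diameter of 1 implies) and that a node's distance to itself counts as 0 in the average. The 8-node graph is a hypothetical stand-in, not the actual 4P wiring from the slide; it only shows that four direct neighbours per node, with every other node two hops away, yields 1.25.

```python
from collections import deque


def hop_counts(adj, src):
    """Breadth-first search: number of hops from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter_stats(adj):
    """Return (diameter, average hops) over all ordered node pairs, self included."""
    all_hops = [h for src in adj for h in hop_counts(adj, src).values()]
    return max(all_hops), sum(all_hops) / len(all_hops)


# 2P: four nodes, every node directly linked to the other three.
two_p = {n: [m for m in range(4) if m != n] for n in range(4)}

# Hypothetical 8-node graph (NOT the slide's 4P wiring):
# node i links to i+1, i+2, i-1 and i-2, modulo 8.
eight_nodes = {n: [(n + d) % 8 for d in (1, 2, -1, -2)] for n in range(8)}

print(diameter_stats(two_p))        # (1, 0.75)
print(diameter_stats(eight_nodes))  # (2, 1.25)
```

Counting the zero-hop local case is what brings the 2P average below 1: three of every four interleaved accesses cross one link, and the fourth stays on the local node.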
Block Diagram
“Magny-Cours” Die (Node)
[Diagram components:
• Core 0 to Core 5, each with a 512kB L2
• System Request Interface (SRI)
• L3 tag and L3 data array (6MB)
• XBAR
• Memory Controller (MCT/DCT) with Probe Filter, connected to DRAM
• 4 HyperTransport™ 3 Technology Ports]
HyperTransport™ Technology HT Assist (Probe Filter)
Key enabling technology on “Istanbul” and “Magny-Cours”
HT Assist is a sparse directory cache
• Associated with the memory controller on the home node
• Tracks all lines cached in the system from the home node
Eliminates most probe broadcasts (see diagram)
• Lowers latency
  – local accesses get local DRAM latency, no need to wait for probe responses
  – less queuing delay due to lower HT traffic overhead
• Increases system bandwidth by reducing probe traffic
[Diagram: three RdBlk request flows between the Req Node and the Home Node.
• “Old” broadcast protocol: the Home Node sends probes and the Req Node collects the responses.
• PF – clean data: PF lookup at the Home Node, then a DRAM response to the Req Node.
• PF – dirty data: PF lookup at the Home Node, then a directed probe to the Cache Owner, which responds to the Req Node.]
Where Do We Put the HT Assist Probe Filter?
Q: Where do we store probe filter entries without adding a large on-chip probe filter RAM which is not used in a 1P desktop system?
A: Steal 1MB of 6MB L3 cache per die in “Magny-Cours” systems
[Diagram: L3 cache sets, way 0 through way 15, with a portion used for directory (Dir) storage.]
Implementation in fast SRAM (L3) minimizes
– Access latency
– Port occupancy of read-modify-write operations
– Indirection latency for cache-to-cache transfers
Format of a Probe Filter Entry
• 16 probe filter entries per L3 cache line (64B), 4B per entry, 4-way set associative
• 1MB of a 6MB L3 cache per die holds 256k probe filter entries and covers 16MB of cache
[Diagram: one L3 cache line (64B) holds Entry 0 to Entry 15, arranged as 4 sets of 4 ways. Each probe filter entry (4B) has Tag, State and Owner fields. States: EM, O, S, S1.]
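The sizing figures on this slide follow from simple arithmetic. The sketch below reproduces them, assuming each probe filter entry tracks exactly one 64B cache line (the slide implies this with its 16MB coverage figure but does not state it). The constant names are illustrative only.

```python
KB = 1024
MB = 1024 * KB

CACHE_LINE_BYTES = 64      # L3 cache line size (from the slide)
PF_ENTRY_BYTES = 4         # size of one probe filter entry (from the slide)
PF_WAYS = 4                # 4-way set associative (from the slide)
PF_REGION_BYTES = 1 * MB   # L3 capacity given over to the probe filter per die

# Entries packed into one L3 line, and how many 4-way sets that line holds.
entries_per_line = CACHE_LINE_BYTES // PF_ENTRY_BYTES
sets_per_line = entries_per_line // PF_WAYS

# Total entries in the 1MB region.
total_entries = PF_REGION_BYTES // PF_ENTRY_BYTES

# Assumption: one entry tracks one 64B line, so coverage is entries * line size.
covered_bytes = total_entries * CACHE_LINE_BYTES

print(entries_per_line)        # 16
print(sets_per_line)           # 4
print(total_entries // KB)     # 256 (i.e. 256k entries)
print(covered_bytes // MB)     # 16 (MB of cache covered)
```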
Cache Coherence Protocol
• Track lines in M, E, O or S state in probe filter
• PF is fully inclusive of all cached data in system: if a line is cached, then a PF entry must exist.
• Presence of probe filter entry says line in M, E, O or S state
• Absence of probe filter entry says line is uncached
• New messages
  – Directed probe on probe filter hit
  – Replacement notification E -> I (clean VicBlk)
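The inclusivity rule and the replacement notification can be illustrated with a toy directory. This is a minimal sketch under stated assumptions, not AMD's implementation: the class name, the flat dictionary (instead of 4-way sets) and the oldest-entry victim choice are all hypothetical.

```python
class ProbeFilter:
    """Toy inclusive directory: every cached line has an entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}   # line address -> (state, owner node)

    def is_cached(self, line):
        # Presence means the line is in M, E, O or S somewhere;
        # absence means the line is uncached.
        return line in self.entries

    def allocate(self, line, state, owner):
        """Record a newly cached line.

        Returns the victim line whose cached copies must be invalidated
        to keep the directory inclusive, or None if there was room.
        """
        victim = None
        if line not in self.entries and len(self.entries) >= self.capacity:
            # Hypothetical policy: evict the oldest entry.
            victim = next(iter(self.entries))
            del self.entries[victim]
        self.entries[line] = (state, owner)
        return victim

    def clean_victim(self, line):
        """Replacement notification (E -> I): a cache dropped a clean line."""
        self.entries.pop(line, None)


pf = ProbeFilter(capacity=2)
pf.allocate(0x100, "EM", owner=1)
pf.allocate(0x140, "S", owner=2)
pf.clean_victim(0x100)                      # frees the entry, no probe needed
print(pf.is_cached(0x100))                  # False
print(pf.allocate(0x180, "EM", owner=0))    # None: there was room
print(hex(pf.allocate(0x1C0, "EM", owner=3)))  # 0x140 must be invalidated
```

The clean-victim notification matters because without it the directory would keep a stale entry for a line no cache holds, wasting capacity and forcing needless evictions later.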
Probe Filter Transaction Scenarios
          PF Hit                    PF Miss (*)
          I   O   S   S1  EM        I   O   S   S1  EM
FETCH     -   D   -   -   D         -   B   B   DI  DI
LOAD      -   D   -   -   D         -   B   B   DI  DI
STORE     -   B   B   B   DI        -   B   B   DI  DI
Legend
-   Filtered
D   Directed
DI  Directed Invalidate
B   Broadcast Invalidate
[Legend annotations: “Effective”, “Ineffective”]
(*) PF miss implies line is Uncached (no broadcast necessary). State refers to the state of the line to be replaced upon allocation of new PF entry.
Traditional “Cache Hit Ratio” does not measure effectiveness of probe filter
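The scenario table can be read as a lookup from (request type, probe filter hit or miss, state) to a probe action. The sketch below transcribes the table's rows into Python tuples; the function name `probe_action` and the data layout are illustrative assumptions. On a miss, `state` is the state of the entry being replaced, as the footnote explains.

```python
STATES = ("I", "O", "S", "S1", "EM")

# Rows transcribed from the slide's table, one action code per state column.
PF_HIT = {
    "FETCH": ("-", "D", "-", "-", "D"),
    "LOAD":  ("-", "D", "-", "-", "D"),
    "STORE": ("-", "B", "B", "B", "DI"),
}
PF_MISS = {
    "FETCH": ("-", "B", "B", "DI", "DI"),
    "LOAD":  ("-", "B", "B", "DI", "DI"),
    "STORE": ("-", "B", "B", "DI", "DI"),
}
LEGEND = {
    "-": "Filtered",
    "D": "Directed",
    "DI": "Directed Invalidate",
    "B": "Broadcast Invalidate",
}


def probe_action(request, pf_hit, state):
    """Look up the probe action for a request.

    request: "FETCH", "LOAD" or "STORE"
    pf_hit:  True for a probe filter hit, False for a miss
    state:   entry state on a hit, or the victim entry's state on a miss
    """
    table = PF_HIT if pf_hit else PF_MISS
    code = table[request][STATES.index(state)]
    return LEGEND[code]


print(probe_action("LOAD", True, "EM"))     # Directed
print(probe_action("STORE", True, "S"))     # Broadcast Invalidate
print(probe_action("FETCH", False, "I"))    # Filtered
print(probe_action("STORE", False, "S1"))   # Directed Invalidate
```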
Probe Filter Coverage Ratio
2 Socket “Magny-Cours”
[Diagram: four nodes P0 to P3, each with its own memory, a directory (Dir 0 to Dir 3) of 256k lines, and 5MB L3 + 3MB L2 of cache (128k lines).]
Typical: uniformly distributed data
• Coverage ratio = 256k : 128k = 2.0x
• With sharing, a PF entry may track multiple cached copies and the coverage ratio increases
Worst case (hot spotting): home node of each cached line is P0
• Coverage ratio = 256k : 128k * 4 = 0.5x
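The two coverage ratios can be reproduced from the cache sizes on this slide. The sketch below assumes 64B cache lines (as on the probe filter entry slide) and four nodes in the 2-socket system; the variable and function names are illustrative. In the uniform case each directory tracks a quarter of each of the four nodes' caches, which adds up to one node's worth of lines.

```python
K = 1024
MB = 1024 * K
LINE_BYTES = 64                  # assumed cache line size (64B)


def coverage_ratio(dir_entries, lines_tracked):
    """Directory entries available per cached line the directory must track."""
    return dir_entries / lines_tracked


dir_entries = 256 * K            # probe filter entries per node
lines_per_node = (5 * MB + 3 * MB) // LINE_BYTES   # 5MB L3 + 3MB L2
nodes = 4                        # 2 sockets, 2 nodes per package

# Typical: data uniformly distributed, so each directory tracks
# nodes * (lines_per_node / nodes) = lines_per_node lines.
typical = coverage_ratio(dir_entries, lines_per_node)

# Worst case: every cached line in the system is homed on P0,
# so one directory must track all four nodes' caches.
worst = coverage_ratio(dir_entries, lines_per_node * nodes)

print(lines_per_node // K)   # 128 (i.e. 128k lines)
print(typical)               # 2.0
print(worst)                 # 0.5
```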
HT Assist and Memory Latency
With the “old” broadcast coherence protocol, the latency of a memory access is the longer of 2 paths:
• the time it takes to return data from DRAM and
• the time it takes to probe all caches
With HT Assist, local memory latency is significantly reduced as it is not necessary to probe caches on other nodes.
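The latency argument above amounts to replacing a maximum of two paths with a single path. A minimal sketch follows; the function names and the nanosecond values are hypothetical placeholders chosen for illustration, not measurements from the presentation.

```python
def broadcast_latency(dram_ns, probe_ns):
    """'Old' broadcast protocol: the access completes only when both the
    DRAM read and the probe of all caches have finished."""
    return max(dram_ns, probe_ns)


def ht_assist_local_latency(dram_ns):
    """HT Assist, local access that needs no remote probe:
    only the DRAM path remains."""
    return dram_ns


# Hypothetical numbers, for illustration only.
dram_ns = 60
probe_ns = 100

print(broadcast_latency(dram_ns, probe_ns))   # 100
print(ht_assist_local_latency(dram_ns))       # 60
```

The gain only appears when probing is the longer path, which is the case the slide describes for multi-node systems where probes must cross HT links.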
Several server workloads naturally have ~100% local accesses
• SPECint®, SPECfp®
• VMMARK™ typically run with 1 VM per core
• SPECpower_ssj® with 1 JVM per core
• STREAM
Probe Filter amplifies benefit of any NUMA optimizations in OS/application which make memory accesses local
SPEC, SPECint, SPECfp, and SPECpower_ssj are trademarks or registered trademarks of the Standard Performance Evaluation Corporation.
A Look Ahead
Socket compatible upgrade to “Magny-Cours” is planned with
• More cores for additional thread-level parallelism
• More cache to maintain cache-per-core balance
• Same power envelope
• Finer grain power management
New processor core (“Bulldozer”)
• Planned brand new x86 64-bit microarchitecture
• 32nm design
• Instruction set extensions
• Higher memory level parallelism
Thank you!
Disclaimer & Attribution
DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical
inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including
but not limited to product and roadmap changes, component and motherboard version changes, new model and/or
product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware
upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information.
However, AMD reserves the right to revise this information and to make changes from time to time to the content
hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES
NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY
PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL
OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN
IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2009 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD Opteron, and
combinations thereof are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of
the HyperTransport Technology Consortium. Other names are for informational purposes only and may be
trademarks of their respective owners.
SPEC, SPECint, SPECfp, and SPECpower_ssj are trademarks or registered trademarks of the Standard Performance
Evaluation Corporation.