Magny Cours
Transcription
Magny Cours
Bla de Com pu t in g w it h t h e AM D Opt e r on TM Pr oce ssor ( “M a gn y- Cou r s”) Pat Conway ( Present er) Nat han Kalyanasundharam Gregg Donley Kevin Lepak Bill Hughes Age nda Processor Archit ect ure AMD driving t he x86 64- bit processor evolut ion Driving forces behind t he Twelve- Core AMD Opt eron™ processor codenam ed “ Magny- Cours” CPU silicon MCM 2.0 package, speeds and feeds Perform ance and scalabilit y 2P/ 4P blade and rack t opologies HyperTransport ™ t echnology HT Assist design – – – – Cache coherence prot ocol Transact ion scenarios and frequencies Coverage rat io Mem ory lat ency and bandwidt h A look ahead 2 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 x 8 6 6 4 - bit Ar chit e ct ur e Evolut ion 2003 2005 2007 2008 2010 2009 AM D Opt e r on™ AM D Opt e r on™ “Ba r ce lona ” “Sha ngha i” “I st a nbul” “M a gny- Cour s” 90nm SOI 90nm SOI 65nm SOI 45nm SOI 45nm SOI 45nm SOI K8 K8 Greyhound Greyhound+ Greyhound+ Greyhound+ L2 / L3 1MB/ 0 1MB/ 0 512kB/ 2MB 512kB/ 6MB 512kB/ 6MB 512kB/ 12MB H ype r Tr a nspor t ™ Te chnology 3x 1.6GT/ .s 3x 1.6GT/ .s 3x 2GT/ s 3x 4.0GT/ s 3x 4.8GT/ s 4x 6.4GT/ s M e m or y 2x DDR1 300 2x DDR1 400 2x DDR2 667 2x DDR2 800 2x DDR2 1066 4x DDR3 1333 M fg. Pr oce ss CPU Cor e M a x Pow e r Budge t Re m a ins Consist e nt 3 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 Perform ance relat ive t o original AMD Opt eron™ Processor D r a m a t ic Ba ck - t o- ba ck Ga ins 50 45 40 35 30 25 20 15 10 5 0 Pla nne d Float ing Point 2003 2004 Single Core 2005 I nt eger 2006 Dual Core 2007 2008 Quad Core 2009 “ I st anbul” 6 core 2010 “ Magny- Cours” * 12 core “Sh a n gh a i” t o “I st a n bu l” de live r s 3 4 % m or e pe r for m a n ce in t h e sa m e pow e r e n ve lope * “ Magny- Cours” and Fut ure silicon dat a is based on AMD proj ect ions 4 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 2011 Fut ure silicon D r iving For ce s Be hind “M a gny- Cour s” Se r ve r Th r ou gh pu t Exploit t hread level parallelism Leverage Direct ly Connect ed MCM 2.0 Maxim ize com put e densit y in 2P/ 4P blades and racks Vir t u a liza t ion Run m ore VMs per server Provide hardware cont ext ( t hread) based QOS En e r gy Pr opor t ion a l Com pu t in g More perform ance, sam e power envelope Power conservat ion when idle Design efficiency – “ Magny- Cours” silicon sam e as “ I st anbul” – Can help speed qualificat ion t im es and cust om ers’ t im e t o m arket Econ om ics Reasonable die size perm it s 2 die per ret icle ( Yield ⇑ Manufact uring Cost ⇓) – Yield im provem ent s can help ensure supply chain st abilit y – Manufact uring cost savings ult im at ely benefit cust om ers 5 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 “M a gny- Cour s” Silicon H T0 Link L3 CACH E CORE 0 H CORE 1 L2 T CORE 2 D L2 CACH E 2 D N ORTH BRI D GE H T CORE 5 CORE 4 CORE 3 3 R L2 L2 L3 CACH E CACH E H T1 Link 6 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 M CM 2 .0 Logica l Vie w G3 4 Sock e t “ Magny- Cours” ut ilizes a D ir e ct ly Con n e ct e d MCM DDR3 Mem ory Channel Package has 12 cores, 4 HT port s, & 4 m em ory channels Die ( Node) has 6 cores, 4 HT port s & 2 m em ory channels x16 cHT P0 x16 ( NC) x8 cHT P1 x16 cHT 7 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 Topologie s 2P 4P P2 P0 P6 P2 P3 P7 P0 P1 P3 P4 I/ O I/ O P1 I/ O x16 Diam et er 1 Avg Diam 0.75 DRAM BW 85.6 GB/ s XFI RE BW 71.7 GB/ s ( * ) I/ O P5 I/ O x16 Diam et er 2 Avg Diam 1.25 DRAM BW 170.4 GB/ s XFI RE BW 143.4 GB/ s ( * ) XFI RE BW“is t he m axim um available coherent m em ory bandwidt he HT links2009 were t he only 8 | The AMD MagnyCours” Processor | Hot Chipsh |if tAugust Each node accesses it s own m em ory and t hat of every ot her node in an int erleaved fashion. lim it ing fact or. I/ O Block D ia gr a m “ Magny- Cours” Die ( Node) Core 0 Core 1 Core 2 Core 3 Core 4 Core 5 512kB L2 512kB L2 512kB L2 512kB L2 512kB L2 512kB L2 Syst em Request I nt erface ( SRI ) L3 t ag L3 dat a array ( 6MB) XBAR Mem ory Cont roller MCT/ DCT DRAM DRAM DRAM DRAM DRAM DRAM Probe Filt er 4 HyperTransport ™3 Technology Port s 9 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 H ype r Tr a nspor t ™ Te chnology H T Assist ( Pr obe Filt e r ) “ Old” broadcast prot ocol Key enabling t echnology on “ I st anbul” and “ Magny- Cours” HT Assist is a sparse direct ory cache Associat ed wit h t he m em ory cont roller on t he hom e node Tracks all lines cached in t he syst em from t he hom e node Elim inat es m ost probe broadcast s ( see diagram ) Lowers lat ency – local accesses get local DRAM lat ency, no need t o wait for probe responses – less queuing delay due t o lower HT t raffic overhead I ncreases syst em bandwidt h by reducing probe t raffic 1 0 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 Hom e Node Pr obes RdBlk RdBlk Request Request Resps Req Node PF – clean dat a PF Lookup Hom e Node DRAM Resp Req Node PF – dirt y dat a PF Lookup Hom e Node Direct ed Pr obe Req Node Cache Owner Resp W he r e D o W e Put t he H T Assist Pr obe Filt e r ? Q: Where do we st ore probe filt er ent ries wit hout adding a large on- chip probe filt er RAM which is not used in a 1P deskt op syst em ? A: St eal 1MB of 6MB L3 cache per die in “ Magny- Cours” syst em s way 0 way 1 way 15 L3 cache Dir set I m plem ent at ion in fast SRAM ( L3) m inim izes – Access lat ency – Port occupancy of read- m odify- writ e operat ions – I ndirect ion lat ency for cache- t o- cache t ransfers 1 1 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 For m a t of a Pr obe Filt e r Ent r y 16 probe filt er ent ries per L3 cache line ( 64B) , 4B per ent ry, 4- way set associat ive 1MB of a 6MB L3 cache per die holds 256k probe filt er ent ries and covers 16MB of cache 4 ways Entry 0 Entry 1 Entry 2 Entry 3 L3 Cache Line (64B) 4 sets Entry 12 Probe Filter Entry (4B) Tag Entry 13 Entry 14 State Entry 15 Owner EM, O, S, S1 states 1 2 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 Ca che Cohe r e nce Pr ot ocol Track lines in M, E, O or S st at e in probe filt er PF is fully inclusive of all cached dat a in syst em - if a line is cached, t hen a PF ent ry m ust exist . Presence of probe filt er ent ry says line in M, E, O or S st at e Absence of probe filt er ent ry says line is uncached New m essages – Direct ed probe on probe filt er hit – Replacem ent not ificat ion E - > I ( clean VicBlk) 1 3 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 Pr obe Filt e r Tr a n sa ct ion Sce n a r ios PF H it PF M iss ( * ) I O S S1 EM I O S S1 EM FETCH - D - - D - B B DI DI LOAD - D - - D - B B DI DI STORE - B B B DI - B B DI DI Legend - Filt ered D Direct ed DI Direct ed I nvalidat e B Broadcast I nvalidat e “ Effect ive” “ I neffect ive” ( * ) PF m iss im plies line is Uncached ( no broadcast necessary) . St at e refers t o t he st at e of t he line t o be replaced upon allocat ion of new PF ent ry. Tr a dit ion a l “Ca ch e H it Ra t io” doe s n ot m e a su r e e ffe ct ive n e ss of pr obe filt e r 1 4 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 Pr obe Filt e r Cove r a ge Ra t io Mem ory 0 Dir 0 256k lines 5MB L3 + 3MB L2 128k lines P0 5MB L3+ 3MB L2 128k lines Dir 2 256k lines Mem ory 3 P2 Typical Worst case ( Hot spot t ing) Uniform ly dist ribut ed dat a Hom e node of each cached line is P0 Coverage rat io = 256k : : 128k = 2 .0 x Wit h sharing, a PF ent ry m ay t rack m ult iple cached copies and t he coverage rat io increases Mem ory 1 Dir 1 256k lines Coverage rat io = 256k : : 128k * 4 = 0 .5 x P1 5MB L3 + 3MB L2 128k lines P3 Dir 3 256k lines 5MB L3 + 3MBL2 128k lines Mem ory 3 2 Socket “ Magny- Cours” 1 5 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 H T Assist a nd M e m or y La t e ncy Wit h “ old” broadcast coherence prot ocol, t he lat ency of a m em ory access is t he longer of 2 pat hs: t im e it t akes t o ret urn dat a from DRAM and t he t im e it t akes t o probe all caches Wit h HT Assist , local m em ory lat ency is significant ly reduced as it is not necessary t o probe caches on ot her nodes. Several server workloads nat urally have ~ 100% local accesses SPECint ® , SPECfp® VMMARKTM t ypically run wit h 1 VM per core SPECpower_ssj ® wit h 1 JVM per core STREAM Pr obe Filt e r a m plifie s be n e fit of a n y N UM A opt im iza t ion s in OS/ a pplica t ion w h ich m a k e m e m or y a cce sse s loca l SPEC, SPECint , SPECfp, and SPECpower_ssj are t radem arks or regist ered t radem arks of t he St andard Perform ance Evaluat ion Corporat ion. 1 6 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 A Look Ahe a d Socket com pat ible upgrade t o “ Magny- Cours” is planned wit h More cores for addit ional t hread- level paralleism More cache t o m aint ain cache- per- core balance Sam e power envelope Finer grain power m anagem ent New processor core ( “ Bulldozer ” ) Planned brand new x86 64- bit m icroarchit ect ure 32nm design I nst ruct ion set ext ensions Higher m em ory level parallelism 1 7 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 Th a n k you ! 1 8 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009 D iscla im e r & At t r ibut ion D I SCLAI M ER The inform at ion present ed in t his docum ent is for inform at ional purposes only and m ay cont ain t echnical inaccuracies, om issions and t ypographical errors. The inform at ion cont ained herein is subj ect t o change and m ay be rendered inaccurat e for m any reasons, including but not lim it ed t o product and roadm ap changes, com ponent and m ot herboard version changes, new m odel and/ or product releases, product differences bet ween differing m anufact urers, soft ware changes, BI OS flashes, firm ware upgrades, or t he like. AMD assum es no obligat ion t o updat e or ot herwise correct or revise t his inform at ion. However, AMD reserves t he right t o revise t his inform at ion and t o m ake changes from t im e t o t im e t o t he cont ent hereof wit hout obligat ion of AMD t o not ify any person of such revisions or changes. AMD MAKES NO REPRESENTATI ONS OR WARRANTI ES WI TH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSI BI LI TY FOR ANY I NACCURACI ES, ERRORS OR OMI SSI ONS THAT MAY APPEAR I N THI S I NFORMATI ON. AMD SPECI FI CALLY DI SCLAI MS ANY I MPLI ED WARRANTI ES OF MERCHANTABI LI TY OR FI TNESS FOR ANY PARTI CULAR PURPOSE. I N NO EVENT WI LL AMD BE LI ABLE TO ANY PERSON FOR ANY DI RECT, I NDI RECT, SPECI AL OR OTHER CONSEQUENTI AL DAMAGES ARI SI NG FROM THE USE OF ANY I NFORMATI ON CONTAI NED HEREI N, EVEN I F AMD I S EXPRESSLY ADVI SED OF THE POSSI BI LI TY OF SUCH DAMAGES. ATTRI BUTI ON © 2009 Advanced Micro Devices, I nc. All right s reserved. AMD, t he AMD Arrow logo, AMD Opt eron, and com binat ions t hereof are t radem arks of Advanced Micro Devices, I nc. HyperTransport is a licensed t radem ark of t he HyperTransport Technology Consort ium . Ot her nam es are for inform at ional purposes only and m ay be t radem arks of t heir respect ive owners. SPEC, SPECint , SPECfp, and SPECpower_ssj are t radem arks or regist ered t radem arks of t he St andard Perform ance Evaluat ion Corporat ion. 1 9 | The AMD “ Magny- Cours” Processor | Hot Chips | August 2009