Barriers: Friend or Foe?
Steve Blackburn, Department of Computer Science, Australian National University
Tony Hosking, Department of Computer Sciences, Purdue University

Tuesday, October 26, 2004
International Symposium on Memory Management, Vancouver BC, October 2004

Read & Write Barrier Costs

Are r/w barrier costs significant?

Read and Write Barriers

• Algorithmically powerful mechanisms
  – Extend semantics of each read/write
• Particularly useful to GC
• Untested assumption: “read/write barriers are expensive”
  – Curtails creativity in GC algorithm development
  – Encourages (unnecessary?) work on avoidance
• Prior work
  – [Zorn 1990] (used simulation & traces)
  – [Blackburn & McKinley 2002] (compilation & inlining)

Our Contributions

• Methodology for measurement
• Evaluate mutator overhead
  – 5 common w/b, 2 r/b
  – 9 benchmarks
  – 3 architectures (AMD, P4, PPC)
  – Exclude compiler, GC from measurements

Methodology

• Want to remove barrier
  – Compare with and without barrier
• Add full trace to generational collector
  – Remembered objects irrelevant
  – Can include/exclude barrier
• MMTk, Jikes RVM
  – Hardware performance counters
  – Pseudo-adaptive (realistic, deterministic)
  – Second iteration (avoid compiler overhead)
  – Best of 5 (least disturbed)

Write Barrier Code

    public final void writeBarrier(ObjectReference src, Address slot,
                                   ObjectReference tgt, int mode)
        throws InlinePragma {
      // insert write barrier code here
      slot.store(tgt);
    }

Write Barrier Code cont.
Each Java fragment replaces the "insert write barrier code here" line of writeBarrier.

Boundary
  Java:
      if (slot.LT(NURSERY_START)
          && tgt.GE(NURSERY_START))
        remSlots.insert(slot);
  PPC asm:
      liu   R3,0x6e10
      cmplW cr1,R30,R3
      bge   1 54
      liu   R3,0x6e10
      cmplW cr1,R31,R3
      bge   1 7c
  x86 asm:
      cmp  edi 0xa0200000
      jlge 0
      cmp  ebx 0xa0200000
      jlge 0

Object
  Java:
      if (getHeader(src)
          .and(LOGGING_MASK)
          .EQ(UNLOGGED))
        rememberObject(src);
  PPC asm:
      lwz   R4,-8(R5)
      rlinm R4,R4,0x0,0x1d,0x1d
      cmpiW cr1,R4,0x4
      beq   1 78
  x86 asm:
      mov ecx -8[edx]
      and ecx 4
      cmp ecx 4
      jeq 0

Card
  Java:
      int card=src.rshl(LOG_CARD_SIZE);
      cardTable.add(card).store((byte) 1);
  PPC asm:
      lwz   R5,0x1664(JT)
      rlinm R6,R3,0x16,0xa,0x1f
      lil   R7,0x1
      stbx  R7,R5,R6
  x86 asm:
      mov ebx [0x290279a]
      shr eax 10
      mov [0+ebx+eax<<0] 1

Zone (Slot)
  Java:
      if (slot.xor(tgt).GE(ZONE_SIZE))
        remSlots.insert(slot);
  PPC asm:
      xor   R3,R30,R31
      liu   R5,0x40
      cmplW cr1,R3,R5
      bge   1 74
  x86 asm:
      mov edi eax
      mov eax edi
      xor eax ebx
      cmp eax 0x400000
      jlge 0

Experiments: Hardware

• 3 platforms:
  – 1.9GHz AMD Athlon XP 2600, 1GB
  – 2.6GHz Pentium 4, 1GB
  – 1.6GHz PowerPC 970, 768MB
• AMD and Intel performance counters
  – cycles
  – instructions retired
  – L1/L2 cache misses
  – TLB misses
  – both mutator and collector, separately

Experiments: Software

• MMTk in Jikes RVM version 2.3.2+CVS
  – ignore remsets GC configuration (now in MMTk)
  – patched to support performance counters
  – pseudo-adaptive compilation
  – read barriers
• Debian Linux 2.6.0 kernel + x86 perfctr
• Standalone mode

Write Barrier Overhead

[Figure: bar chart, mean of SPECjvm98 & SPECjbb. Y-axis: overhead, 0% to 6%. X-axis: Boundary, Object, Hybrid, Zone, Card. Series: amd, p4, ppc.]

Write Barrier Code (Again)

[Slide shown three times; it repeats the Java / PPC asm / x86 asm table from "Write Barrier Code cont."]

AMD Athlon 2600+ 1.9GHz Write Barrier

[Figure: bar chart of overhead per benchmark (_201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb, mean). Y-axis: -4% to 12%. Series: Boundary, Object, Hybrid, Zone, Card.]

Intel P4 2.6GHz Write Barrier

[Figure: bar chart of overhead per benchmark, same benchmarks and series. Y-axis: -4% to 10%.]

G5 PowerPC 970 1.6GHz Write Barrier

[Figure: bar chart of overhead per benchmark, same benchmarks and series. Y-axis: -2% to 14%.]

Performance Counters

Intel P4 2.6GHz Write Barrier Retired Instructions

[Figure: bar chart per benchmark, same benchmarks and series. Y-axis: -2% to 12%.]

Intel P4 2.6GHz Write Barrier L1 Misses

[Figure: bar chart per benchmark, same benchmarks and series. Y-axis: -50% to 20%.]

Intel P4 2.6GHz Write Barrier L2 Misses

[Figure: bar chart per benchmark, same benchmarks and series. Y-axis: -60% to 140%.]

Intel P4 2.6GHz Write Barrier DTLB Misses

[Figure: bar chart per benchmark, same benchmarks and series. Y-axis: -15% to 25%.]

Read Barrier Code

    public final ObjectReference readBarrier(ObjectReference obj,
                                             Address slot, int mode)
        throws InlinePragma {
      ObjectReference value = slot.loadObjectReference();
      return value; // insert read barrier code here
    }

Read Barrier Code cont.
Each Java fragment replaces the "return value;" line of readBarrier.

Unconditional
  Java:
      return value.and(~3);
  PPC asm:
      rlinm R3,R3,0x0,0x0,0x1d
  x86 asm:
      and eax -4

Conditional
  Java:
      if (value.and(1).NE(1))
        return value;
      else
        return 0;
  PPC asm:
      rlinm R4,R3,0x0,0x1f,0x1f
      cmpiW cr1,R4,0x1
      bne   1 3c
  x86 asm:
      mov    edx eax
      and    edx 1
      cmp    edx 1
      mov    edx 0
      cmovne edx eax
      mov    eax edx

Read Barrier Overhead

[Figure: bar chart, mean of SPECjvm98 & SPECjbb. Y-axis: overhead, 0% to 25%. X-axis: Unconditional, Conditional. Series: amd, p4, ppc.]

AMD Athlon 2600+ 1.9GHz Read Barrier

[Figure: bar chart of overhead per benchmark (_201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb, mean). Y-axis: 0% to 40%. Series: Unconditional, Conditional.]

Intel P4 2.6GHz Read Barrier

[Figure: bar chart of overhead per benchmark, same benchmarks and series. Y-axis: 0% to 35%.]

G5 PowerPC 970 1.6GHz Read Barrier

[Figure: bar chart of overhead per benchmark, same benchmarks and series. Y-axis: -4% to 16%.]

Conclusions

• New methodology: available in MMTk
  – Specific barrier patches at: http://cs.anu.edu.au/~Steve.Blackburn/pubs/wb-ismm-2004.tgz
• Barrier costs (often) surprisingly low
• Barrier costs very architecturally sensitive
  – GC developers: think about your target arch.
  – GC papers: what architecture did they use?
  – Architects: choices impact OO languages in surprising ways.
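
Illustrative sketch (not from the slides): the barrier filters on the Write Barrier Code and Read Barrier Code slides can be tried outside a VM with a toy model along the following lines. It is a minimal sketch under stated assumptions. Addresses are modelled as non-negative longs rather than MMTk Address values. The class BarrierSketch, the Obj type, the list-based remembered sets and the method names are hypothetical. The constants are read off the x86 listings and are only illustrative. The step that clears the unlogged bit in the object barrier is an assumption about what rememberObject does.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the slides' barrier filters. Addresses are non-negative longs. */
final class BarrierSketch {
    // Constants read off the x86 listings; treat them as illustrative assumptions.
    static final long NURSERY_START = 0xa0200000L;
    static final long ZONE_SIZE = 0x400000L;
    static final int LOG_CARD_SIZE = 10;
    static final int LOGGING_MASK = 4;
    static final int UNLOGGED = 4;

    /** Hypothetical stand-in for a heap object with a one-word header. */
    static final class Obj {
        int header = UNLOGGED;
    }

    static final List<Long> remSlots = new ArrayList<>();
    static final List<Obj> remObjects = new ArrayList<>();
    // One byte per card over a 32-bit address space: 2^32 / 2^10 = 2^22 cards.
    static final byte[] cardTable = new byte[1 << (32 - LOG_CARD_SIZE)];

    /** Boundary: remember a slot below the nursery that now points into it. */
    static void boundaryBarrier(long slot, long tgt) {
        if (slot < NURSERY_START && tgt >= NURSERY_START) {
            remSlots.add(slot);
        }
    }

    /** Object: remember the source object the first time it is written. */
    static void objectBarrier(Obj src) {
        if ((src.header & LOGGING_MASK) == UNLOGGED) {
            src.header &= ~LOGGING_MASK; // assumption: remembering clears the unlogged bit
            remObjects.add(src);
        }
    }

    /** Card: unconditionally dirty the card that covers the source address. */
    static void cardBarrier(long src) {
        cardTable[(int) (src >>> LOG_CARD_SIZE)] = 1;
    }

    /** Zone: remember a slot whose target lies in a different aligned zone. */
    static void zoneBarrier(long slot, long tgt) {
        if ((slot ^ tgt) >= ZONE_SIZE) {
            remSlots.add(slot);
        }
    }

    /** Unconditional read barrier: always strip the two low-order bits. */
    static long unconditionalRead(long value) {
        return value & ~3L;
    }

    /** Conditional read barrier: pass the value through unless its low bit is set. */
    static long conditionalRead(long value) {
        return (value & 1L) != 1L ? value : 0L;
    }

    public static void main(String[] args) {
        boundaryBarrier(0x10000000L, 0xa0300000L); // old-to-nursery store: remembered
        boundaryBarrier(0xa0300000L, 0xa0300040L); // nursery-to-nursery store: filtered
        zoneBarrier(0x00400010L, 0x00400020L);     // same zone: filtered
        Obj o = new Obj();
        objectBarrier(o);
        objectBarrier(o);                          // second write: already logged
        cardBarrier(0x00001400L);                  // dirties card 5
        System.out.println(remSlots.size() + " slots, " + remObjects.size()
                + " objects, card 5 = " + cardTable[5]);
    }
}
```

Running main should leave one remembered slot, one remembered object and card 5 dirty. The model makes the structural difference between the barriers visible: the boundary, object and zone barriers each test a condition before doing any work, while the card barrier stores on every write.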