Barriers: Friend or Foe?

Steve Blackburn, Department of Computer Science, Australian National University
Tony Hosking, Department of Computer Sciences, Purdue University
Tuesday, October 26, 2004
International Symposium on Memory Management
Vancouver BC, October 2004
Read & Write Barrier Costs
Are read/write barrier costs significant?
Read and Write Barriers
• Algorithmically powerful mechanisms
– Extend semantics of each read/write
• Particularly useful to GC
• Untested assumption:
“read/write barriers are expensive”
– Curtails creativity in GC algorithm development
– Encourages (unnecessary?) work on avoidance
• Prior work
– [Zorn 1990] (used simulation & traces)
– [Blackburn & McKinley 2002] (compilation & inlining)
Our Contributions
• Methodology for measurement
• Evaluate mutator overhead
– 5 common write barriers, 2 read barriers
– 9 benchmarks
– 3 architectures (AMD, P4, PPC)
– Exclude compiler and GC time from measurements
Methodology
• Goal: isolate the cost of the barrier
– Compare runs with and without the barrier
• Add a full-heap trace to the generational collector
– Remembered objects become irrelevant
– So the barrier can be included or excluded
• MMTk, Jikes RVM
– Hardware performance counters
– Pseudo-adaptive compilation (realistic, deterministic)
– Second iteration (avoids compiler overhead)
– Best of 5 runs (least disturbed)
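A minimal sketch of how the timing rule above could be applied, assuming per-iteration mutator times (GC and compilation already excluded) have been collected for 5 invocations of each build. The class name, data layout, and numbers are hypothetical, not taken from MMTk or Jikes RVM.

```java
import java.util.Arrays;

// Hypothetical helper, not part of MMTk or Jikes RVM.
// Each row of a timing matrix is one JVM invocation; each column is one
// iteration of the benchmark within that invocation (mutator time only).
final class BarrierOverheadSketch {

    // Take the second iteration of every invocation (the first one pays for
    // compilation), then keep the best of those, assuming the fastest run
    // is the least disturbed.
    static double bestSecondIteration(double[][] runs) {
        return Arrays.stream(runs)
                .mapToDouble(iterations -> iterations[1])
                .min()
                .getAsDouble();
    }

    // Relative mutator overhead of the barrier build over the no-barrier build.
    static double overhead(double[][] withBarrier, double[][] withoutBarrier) {
        return bestSecondIteration(withBarrier) / bestSecondIteration(withoutBarrier) - 1.0;
    }

    public static void main(String[] args) {
        // Illustrative timings in ms, 5 invocations x 2 iterations each.
        double[][] with = {
            {120, 105}, {118, 103}, {125, 104}, {119, 106}, {121, 107}
        };
        double[][] without = {
            {110, 100}, {112, 101}, {111, 102}, {109, 100.5}, {115, 100}
        };
        System.out.printf("overhead = %.1f%%%n", 100 * overhead(with, without));
    }
}
```

Taking the minimum rather than the mean discards runs perturbed by unrelated system activity, which is the intent of "least disturbed".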
Write Barrier Code
    public final void writeBarrier(ObjectReference src, Address slot,
                                   ObjectReference tgt, int mode)
        throws InlinePragma {
      // insert write barrier code here
      slot.store(tgt);
    }
Write Barrier Code cont.
Boundary
  Java:
    if (slot.LT(NURSERY_START)
        && tgt.GE(NURSERY_START))
      remSlots.insert(slot);
  PPC asm:
    liu R3,0x6e10
    cmplW cr1,R30,R3
    bge 1 54
    liu R3,0x6e10
    cmplW cr1,R31,R3
    bge 1 7c
  x86 asm:
    cmp edi 0xa0200000
    jlge 0
    cmp ebx 0xa0200000
    jlge 0
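To make the boundary test concrete, here is a small plain-Java simulation. It assumes a flat address space modeled with long values, a nursery occupying every address at or above a hypothetical NURSERY_START (the value echoes the x86 constant but is illustrative only), and a HashSet standing in for remSlots. It is not MMTk code: ObjectReference and Address are replaced by long.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of a boundary write barrier; addresses are plain longs.
final class BoundaryBarrierSketch {
    // Assumed layout: everything at or above NURSERY_START is nursery.
    static final long NURSERY_START = 0xa0200000L;

    final Map<Long, Long> memory = new HashMap<>();   // slot address -> stored reference
    final Set<Long> remSlots = new HashSet<>();       // remembered old-to-nursery slots

    void writeBarrier(long slot, long tgt) {
        // Remember the slot only when a non-nursery slot receives a nursery pointer.
        // Simulated addresses are non-negative, so signed compares act as unsigned ones.
        if (slot < NURSERY_START && tgt >= NURSERY_START) {
            remSlots.add(slot);
        }
        memory.put(slot, tgt);                         // the store itself
    }

    public static void main(String[] args) {
        BoundaryBarrierSketch heap = new BoundaryBarrierSketch();
        heap.writeBarrier(0x10000010L, 0xa0300000L);   // old -> nursery: remembered
        heap.writeBarrier(0x10000018L, 0x10002000L);   // old -> old: filtered out
        heap.writeBarrier(0xa0300008L, 0x10002000L);   // nursery -> old: filtered out
        System.out.println("remembered slots: " + heap.remSlots.size());
    }
}
```

The two comparisons correspond to the two compare-and-branch pairs in each assembly listing.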
Object
  Java:
    if (getHeader(src)
          .and(LOGGING_MASK)
          .EQ(UNLOGGED))
      rememberObject(src);
  PPC asm:
    lwz R4,-8(R5)
    rlinm R4,R4,0x0,0x1d,0x1d
    cmpiW cr1,R4,0x4
    beq 1 78
  x86 asm:
    mov ecx -8[edx]
    and ecx 4
    cmp ecx 4
    jeq 0
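A plain-Java sketch of the object-remembering barrier, assuming a per-object header word in which bit 0x4 set means "unlogged" (the value 4 is read off the mask-and-compare in both listings, but the layout is an assumption) and that every simulated object starts unlogged. SimObject and modifiedObjects are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of an object-remembering (logging) write barrier.
final class ObjectBarrierSketch {
    // Assumed header layout: bit 0x4 set means "not yet logged".
    static final int LOGGING_MASK = 0x4;
    static final int UNLOGGED = 0x4;

    static final class SimObject {
        int header = UNLOGGED;              // simulated objects start unlogged
        final SimObject[] fields;
        SimObject(int nFields) { fields = new SimObject[nFields]; }
    }

    final List<SimObject> modifiedObjects = new ArrayList<>();

    void writeBarrier(SimObject src, int fieldIndex, SimObject tgt) {
        // Fast path: one header load, one mask, one compare.
        if ((src.header & LOGGING_MASK) == UNLOGGED) {
            rememberObject(src);
        }
        src.fields[fieldIndex] = tgt;       // the store itself
    }

    private void rememberObject(SimObject src) {
        src.header &= ~LOGGING_MASK;        // mark as logged so later stores skip the slow path
        modifiedObjects.add(src);
    }

    public static void main(String[] args) {
        ObjectBarrierSketch barrier = new ObjectBarrierSketch();
        SimObject a = new SimObject(2);
        SimObject b = new SimObject(0);
        barrier.writeBarrier(a, 0, b);      // first store: slow path logs a
        barrier.writeBarrier(a, 1, b);      // second store: fast path only
        System.out.println("logged objects: " + barrier.modifiedObjects.size());
    }
}
```

Only the first store to a given object takes the slow path; every later store costs the load, mask, and compare seen in the listings.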
Card
  Java:
    int card = src.rshl(LOG_CARD_SIZE);
    cardTable.add(card).store((byte) 1);
  PPC asm:
    lwz R5,0x1664(JT)
    rlinm R6,R3,0x16,0xa,0x1f
    lil R7,0x1
    stbx R7,R5,R6
  x86 asm:
    mov ebx [0x290279a]
    shr eax 10
    mov [0+ebx+eax<<0] 1
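A plain-Java sketch of card marking over a simulated heap. The shift by 10 in both listings suggests 1 KB cards, but treat LOG_CARD_SIZE as an assumption. Unlike the listing, which indexes the table directly with the shifted address, the sketch subtracts a hypothetical heap base so a small Java array suffices; dirtyCards is an illustrative helper, not an MMTk method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a card-marking write barrier over a simulated heap.
final class CardBarrierSketch {
    static final int LOG_CARD_SIZE = 10;          // 1 KB cards (assumed)

    final long heapStart;
    final byte[] cardTable;

    CardBarrierSketch(long heapStart, int heapBytes) {
        this.heapStart = heapStart;
        this.cardTable = new byte[heapBytes >>> LOG_CARD_SIZE];
    }

    // Unconditional: no test on the fast path, just a shift and a byte store.
    void writeBarrier(long srcAddress) {
        int card = (int) ((srcAddress - heapStart) >>> LOG_CARD_SIZE);
        cardTable[card] = 1;
    }

    // At collection time the collector would scan only the dirty cards.
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cardTable.length; i++) {
            if (cardTable[i] != 0) dirty.add(i);
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardBarrierSketch cards = new CardBarrierSketch(0x10000000L, 1 << 20);
        cards.writeBarrier(0x10000000L + 5);
        cards.writeBarrier(0x10000000L + 1023);   // same 1 KB card as above
        cards.writeBarrier(0x10000000L + 1024);   // next card
        System.out.println("dirty cards: " + cards.dirtyCards());
    }
}
```

Because the card barrier never branches, its fast-path cost is fixed; the work shifts to the collector, which must scan every dirty card.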
Zone (Slot)
  Java:
    if (slot.xor(tgt).GE(ZONE_SIZE))
      remSlots.insert(slot);
  PPC asm:
    xor R3,R30,R31
    liu R5,0x40
    cmplW cr1,R3,R5
    bge 1 74
  x86 asm:
    mov edi eax
    mov eax edi
    xor eax ebx
    cmp eax 0x400000
    jlge 0
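A plain-Java sketch of the zone test. Both listings compare against 0x400000, suggesting 4 MB zones; the sketch assumes zones are power-of-two sized and aligned, which the XOR trick relies on. Addresses are modeled as long values, and the class and field names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of a zone write barrier; addresses are plain longs.
final class ZoneBarrierSketch {
    static final long ZONE_SIZE = 0x400000L;      // 4 MB, aligned (assumed)

    final Set<Long> remSlots = new HashSet<>();

    // Two addresses share an aligned zone iff they agree on every bit at or
    // above log2(ZONE_SIZE); XOR exposes any disagreement in those bits.
    // compareUnsigned mirrors the unsigned GE / cmplW in the listings.
    static boolean crossesZone(long slot, long tgt) {
        return Long.compareUnsigned(slot ^ tgt, ZONE_SIZE) >= 0;
    }

    void writeBarrier(long slot, long tgt) {
        if (crossesZone(slot, tgt)) {
            remSlots.add(slot);
        }
        // the store itself is omitted in this sketch
    }

    public static void main(String[] args) {
        System.out.println(crossesZone(0x00400010L, 0x00400020L)); // same zone
        System.out.println(crossesZone(0x003FFFF0L, 0x00400000L)); // adjacent zones
    }
}
```

The second call shows why XOR beats a subtraction-based distance check: the two addresses are only 16 bytes apart, yet they sit on opposite sides of a zone boundary and must be remembered.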
Experiments: Hardware
• 3 platforms:
– 1.9GHz AMD Athlon XP 2600 1GB
– 2.6GHz Pentium 4 1GB
– 1.6GHz PowerPC 970 768MB
• AMD and Intel performance counters
– cycles
– instructions retired
– L1/L2 cache misses
– TLB misses
– mutator and collector measured separately
Experiments: Software
• MMTk in Jikes RVM version 2.3.2+CVS
– “ignore remsets” GC configuration (now in MMTk)
– patched to support performance counters
– pseudo-adaptive compilation
– read barriers
• Debian Linux 2.6.0 kernel + x86 perfctr
• Standalone mode
Write Barrier Overhead
[Figure: mean of SPECjvm98 & SPECjbb. Bar chart of overhead (%) for the Boundary, Object, Hybrid, Zone, and Card write barriers on AMD, P4, and PPC.]
AMD Athlon 2600+ 1.9GHz: Write Barrier
[Figure: per-benchmark write barrier overhead (%) for the Boundary, Object, Hybrid, Zone, and Card barriers on _201_compress, _202_jess, _205_raytrace, _209_db, _213_javac, _222_mpegaudio, _227_mtrt, _228_jack, pseudojbb, and the mean.]
Intel P4 2.6GHz: Write Barrier
[Figure: per-benchmark write barrier overhead (%) for the Boundary, Object, Hybrid, Zone, and Card barriers on the SPECjvm98 benchmarks and pseudojbb, plus the mean.]
G5 PowerPC 970 1.6GHz: Write Barrier
[Figure: per-benchmark write barrier overhead (%) for the Boundary, Object, Hybrid, Zone, and Card barriers on the SPECjvm98 benchmarks and pseudojbb, plus the mean.]
Performance Counters
Intel P4 2.6GHz: Write Barrier Retired Instructions
[Figure: per-benchmark change in retired instructions (%) for the Boundary, Object, Hybrid, Zone, and Card barriers on the SPECjvm98 benchmarks and pseudojbb, plus the mean.]
Intel P4 2.6GHz: Write Barrier L1 Misses
[Figure: per-benchmark change in L1 cache misses (%) for the Boundary, Object, Hybrid, Zone, and Card barriers on the SPECjvm98 benchmarks and pseudojbb, plus the mean.]
Intel P4 2.6GHz: Write Barrier L2 Misses
[Figure: per-benchmark change in L2 cache misses (%) for the Boundary, Object, Hybrid, Zone, and Card barriers on the SPECjvm98 benchmarks and pseudojbb, plus the mean.]
Intel P4 2.6GHz: Write Barrier DTLB Misses
[Figure: per-benchmark change in DTLB misses (%) for the Boundary, Object, Hybrid, Zone, and Card barriers on the SPECjvm98 benchmarks and pseudojbb, plus the mean.]
Read Barrier Code
    public final ObjectReference readBarrier(ObjectReference obj,
                                             Address slot, int mode)
        throws InlinePragma {
      ObjectReference value = slot.loadObjectReference();
      return value; // insert read barrier code here
    }
Read Barrier Code cont.
Unconditional
  Java:
    return value.and(~3);
  PPC asm:
    rlinm R3,R3,0x0,0x0,0x1d
  x86 asm:
    and eax -4
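A plain-Java sketch of the unconditional barrier, modeling references as long values. Treating the two low-order bits as a tag is an assumption made purely for illustration; the slide only shows the masking itself.

```java
// Hypothetical simulation: references are plain longs whose two low-order
// bits are treated as a tag for illustration only.
final class UnconditionalReadBarrierSketch {
    static final long TAG_MASK = 3L;

    // Unconditional: every load clears the low two bits, with no test and no branch.
    static long readBarrier(long value) {
        return value & ~TAG_MASK;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(readBarrier(0x10003L))); // tag stripped
        System.out.println(Long.toHexString(readBarrier(0x10000L))); // already clean
    }
}
```

This is a single AND instruction on both architectures, as the listings show.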
Conditional
  Java:
    if (value.and(1).NE(1))
      return value;
    else
      return 0;
  PPC asm:
    rlinm R4,R3,0x0,0x1f,0x1f
    cmpiW cr1,R4,0x1
    bne 1 3c
  x86 asm:
    mov edx eax
    and edx 1
    cmp edx 1
    mov edx 0
    cmovne edx eax
    mov eax edx
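A plain-Java sketch of the conditional barrier, again modeling references as long values with 0 standing for null. It shows the branching form from the Java listing next to a hypothetical branch-free variant in the spirit of the x86 cmovne sequence; both method names are illustrative.

```java
// Hypothetical simulation: references are plain longs; a set low-order bit
// makes the barrier return 0 (null), as in the Java listing.
final class ConditionalReadBarrierSketch {

    // Direct transcription of the listing: test the low bit, then branch.
    static long readBarrier(long value) {
        if ((value & 1L) != 1L) {
            return value;
        } else {
            return 0L;
        }
    }

    // Branch-free equivalent: the mask is all ones when the low bit is clear
    // and zero when it is set, so the AND yields value or 0 without a branch.
    static long readBarrierBranchFree(long value) {
        long mask = -((value & 1L) ^ 1L);
        return value & mask;
    }

    public static void main(String[] args) {
        for (long v : new long[] {0x1000L, 0x1001L, 0L, 0x7ffeL}) {
            System.out.println(readBarrier(v) + " " + readBarrierBranchFree(v));
        }
    }
}
```

The PPC listing branches while the x86 listing avoids the branch with cmovne; which form a JIT emits for the first method is up to the compiler.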
Read Barrier Overhead
[Figure: mean of SPECjvm98 & SPECjbb. Bar chart of overhead (%) for the Unconditional and Conditional read barriers on AMD, P4, and PPC.]
AMD Athlon 2600+ 1.9GHz: Read Barrier
[Figure: per-benchmark read barrier overhead (%) for the Unconditional and Conditional barriers on the SPECjvm98 benchmarks and pseudojbb, plus the mean.]
Intel P4 2.6GHz: Read Barrier
[Figure: per-benchmark read barrier overhead (%) for the Unconditional and Conditional barriers on the SPECjvm98 benchmarks and pseudojbb, plus the mean.]
G5 PowerPC 970 1.6GHz: Read Barrier
[Figure: per-benchmark read barrier overhead (%) for the Unconditional and Conditional barriers on the SPECjvm98 benchmarks and pseudojbb, plus the mean.]
Conclusions
• New methodology: available in MMTk
– Specific barrier patches at:
http://cs.anu.edu.au/~Steve.Blackburn/pubs/wb-ismm-2004.tgz
• Barrier costs (often) surprisingly low
• Barrier costs very architecturally sensitive
– GC developers: think about your target arch.
– GC papers: what architecture did they use?
– Architects: choices impact OO languages in surprising ways.