Intermediate Representaions - Concepts of Programming

Transcription

Intermediate Representaions - Concepts of Programming
Intermediate Representaions
Concepts of Programming Languages (CoPL)
Malte Skambath
[email protected]
November 16, 2015
Intermediate
Representaions
Overview
Malte Skambath
We need Compilers!
Classical Compiler Process
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Machine Models
Stack Machines
Register Machines
Static-Single-Assignment
Implementations
LLVM
CIL
Conclusion
Implementations
LLVM
CIL
Conclusion
2 / 34
Developing Software
Intermediate
Representaions
We need compilers!
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
?
CIL
Conclusion
x86
3 / 34
Developing Software
Intermediate
Representaions
We need compilers!
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
?
x86
CIL
Conclusion
AMD64
ARM
3 / 34
Developing Software
Intermediate
Representaions
We need compilers!
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
?
x86
CIL
Conclusion
AMD64
ARM
3 / 34
Intermediate Representation
Intermediate
Representaions
The solution!
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
Intermediate Representation
x86
AMD64
Conclusion
ARM
4 / 34
Intermediate
Representaions
Intermediate Representation
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Definition
An intermediate representation (IR) is data structure as
representation of a program between a high-level
programming language and machine code.
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
Conclusion
An intermediate language (IL) is a low-level assembly
language as IR for a virtual machine.
5 / 34
Intermediate
Representaions
Classical Compiler Process
Malte Skambath
We need Compilers!
Classical Compiler
Process
Lexical Analysis (Scanner)
Machine Models
Stack Machines
Tokens
Register Machines
Three-Address Code
Syntax Analysis (Parser)
Frontend
ST/AST
Static-Single-Assignment
Implementations
LLVM
CIL
Semantic Analysis
Conclusion
CFG
Optimization
CFG
Code Generation
Backend
6 / 34
Intermediate
Representaions
Abstract Sytax Tree
Malte Skambath
An abstract syntax tree (AST) . . .
. . . describes the syntactical structure of a program
. . . depends on the programming language
. . . is generated during by the parser
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
program
Three-Address Code
Static-Single-Assignment
Implementations
block
return
while
LLVM
CIL
Conclusion
...
condition
body
...
assign
variable sum
variable sum
bin op: *
variable i
7 / 34
Intermediate
Representaions
Control-Flow-Graph
Malte Skambath
We need Compilers!
Classical Compiler
Process
int s = 1;
for(int i=1; i<=10; i++)
s += i;
return (s);
Machine Models
Stack Machines
i ← 1, s ← 0
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
i ≤ 10
no
yes
LLVM
CIL
Conclusion
ret(s)
i←i+1
s←s+i
8 / 34
Intermediate
Representaions
Stack Machines
Malte Skambath
Definition
We need Compilers!
A general Stack Machine has
Classical Compiler
Process
I
a stack as storage
I
a set of instructions / operations op = F(a1 , a2 , . . . , an )
including (push and pop)
Machine Models
Executing an operation takes the arguments from top of
the stack, computes the result in the accumulator, and
pushes the result back the stack.
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
Conclusion
Example
push 1
push 2
push 3
add
pop
3
1
2
1
2
1
5
1
1
9 / 34
Stack-machines
Intermediate
Representaions
Code Generation
Malte Skambath
We need Compilers!
We can generate the control by traversing the
psyntax tree.
Assume we have to compute the expression x 2 + y2 .
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
Conclusion
10 / 34
Stack-machines
Intermediate
Representaions
Code Generation
Malte Skambath
We need Compilers!
We can generate the control by traversing the
psyntax tree.
Assume we have to compute the expression x 2 + y2 .
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
AST
sqrt
Static-Single-Assignment
push
push
mul
push
push
mul
add
sqrt
add
mul
x
mul
x
y
x
x
Implementations
LLVM
CIL
y
y
Conclusion
y
10 / 34
Stack Machines
Intermediate
Representaions
Summary
Malte Skambath
We need Compilers!
Classical Compiler
Process
I
I
Programs for stack machines are short
Only the opcodes ( or constants) in the byte code.
In practical use stack machines can be extended
1. An external memory to store and load values
(computations are still limited to the stack)
2. Top-Level registers
3. Metainformations (see CIL later)
I
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
Conclusion
Problem: Most processor-architectures use registers.
⇒ Hybrid Models, Special informations in the
intermediate representation.
11 / 34
Intermediate
Representaions
Register Machines
Malte Skambath
We need Compilers!
Definition
A register machine . . .
I
consists of an infinite number of memory cells named
registers
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
I
each register is accessible
I
has a limited set of instruction / operations:
LLVM
CIL
Conclusion
1. Arithmetical Operations: Computes a function F using
selected registers ho1 , i . . . , hon i as operands and stores
the result in a target register hri
2. Jumps/Branches
12 / 34
Intermediate
Representaions
Three-Address Code (3AC/TAC)
I
I
Each TAC is a sequence of instructions I1 , I2 , . . . , In for a
register machine.
Instructions can be
1. Assignments r1 := r0
2. Unconditional Jumps (Instructions can be labeled)
L0: goto L1
...
L1: r0 := 1
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
3. Conditional Branches
if a<b then goto L1
CIL
Conclusion
4. Arithmetical operations r3 := add(r1,r2)
I
Each instruction contains at most 3 registers
13 / 34
Intermediate
Representaions
Three-Address Code (3AC/TAC)
I
I
Each TAC is a sequence of instructions I1 , I2 , . . . , In for a
register machine.
Instructions can be
1. Assignments r1 := r0
2. Unconditional Jumps (Instructions can be labeled)
L0: goto L1
...
L1: r0 := 1
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
3. Conditional Branches
if a<b then goto L1
CIL
Conclusion
4. Arithmetical operations r3 := add(r1,r2)
I
Each instruction contains at most 3 registers
Example (
p
x2 + y2 )
t1 := x * x
t2 := y * y
t3 := t1 + t2
result := sqrt(t3)
13 / 34
Three-Address Code (3AC/TAC)
Intermediate
Representaions
How to design the Byte-Code
Malte Skambath
For practical use we should store TAC in byte code format.
I
Each operation has an opcode for the virtual machine
I
Each instruction can be represented by tuples
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
t1
t2
t1
res
Quadruples
opcode op1
MUL
x
MUL
y
ADD
t1
SQRT
t1
op2
x
y
t2
-
Triples
opcode op1
MUL
x
MUL
y
ADD
(1)
SQRT
(3)
Static-Single-Assignment
op2
x
y
(2)
-
Implementations
LLVM
CIL
Conclusion
Note
Registers can be assigned implicitly (Triples). But then each
register has to be assigned only once.
14 / 34
Intermediate
Representaions
Static-Single-Assignment
Malte Skambath
We need Compilers!
Classical Compiler
Process
Definition (Static-Single Assignment)
A Three-Adress Code is in Static-Single Assignment-from if
each register gets assigned once in the code.
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
p
Example ( x 2 + y2 )
Implementations
LLVM
CIL
Not in SSA
L1:
L2:
L3:
L4:
x
y
x
z
:=
:=
:=
:=
SSA
x * x
y * y
x + y
sqrt(x)
L1:
L2:
L3:
L4:
Conclusion
x0
y0
x1
z
:=
:=
:=
:=
x * x
y * y
x0 + y0
sqrt(x1)
15 / 34
Static-Single-Assigment
Intermediate
Representaions
How to get SSA-form?
Malte Skambath
We need Compilers!
A simple Algorithm
I
For each used register: hRi
1. Check if hRi gets assigned more than once
2. For each assignment/definition of hRi:
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
I
Rename on the left side to hR.ii if this assignment is the
i-th
assignment to hRi
LLVM
CIL
Conclusion
3. For each use of hRi:
I
Replace hRi with hR.ji where hR.ji was the previous
replacement for hRi.
Is this algorithm correct?
16 / 34
Static-Single-Assigment
Intermediate
Representaions
How to get SSA-form?
Malte Skambath
We need Compilers!
A simple Algorithm
I
For each used register: hRi
1. Check if hRi gets assigned more than once
2. For each assignment/definition of hRi:
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
I
Rename on the left side to hR.ii if this assignment is the
i-th
assignment to hRi
LLVM
CIL
Conclusion
3. For each use of hRi:
I
Replace hRi with hR.ji where hR.ji was the previous
replacement for hRi.
Is this algorithm correct?
No!
16 / 34
Static-Single-Assignment
Intermediate
Representaions
What if we have branches?
Malte Skambath
We need Compilers!
Classical Compiler
Process
if a>b then goto L_A
max := b;
goto L_END
L_A:
max := a;
goto L_END
L_END:
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
Conclusion
a>b?
max:=a
max:=b
a>b?
m1:=a
m2:=b
max:=m?
17 / 34
Static-Single-Assignment
Intermediate
Representaions
The Φ-function
Malte Skambath
We need Compilers!
The Φ-function computes the value depending on the
incoming branch.
Classical Compiler
Process
Machine Models
Stack Machines
a←1
b←2
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
x ← Φ(a, b)
CIL
Conclusion
Note
There is no real operation like Φ in real machines. After
optimization Φ-statements have to be removed.
18 / 34
Static-Single-Assignment
Intermediate
Representaions
The Φ-function
Malte Skambath
We need Compilers!
The Φ-function computes the value depending on the
incoming branch.
Classical Compiler
Process
Machine Models
Stack Machines
a←1
b←2
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
x ← Φ(a, b)
CIL
Conclusion
x has value 1
Note
There is no real operation like Φ in real machines. After
optimization Φ-statements have to be removed.
18 / 34
Static-Single-Assignment
Intermediate
Representaions
The Φ-function
Malte Skambath
We need Compilers!
The Φ-function computes the value depending on the
incoming branch.
Classical Compiler
Process
Machine Models
Stack Machines
a←1
b←2
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
x ← Φ(a, b)
CIL
Conclusion
x has value 2
Note
There is no real operation like Φ in real machines. After
optimization Φ-statements have to be removed.
18 / 34
Intermediate
Representaions
Getting Code in SSA-form.
Malte Skambath
We need Compilers!
L1:
if r_a < r_b then goto L3:
L2:
t_1 := r_a
goto L4
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
L3:
t_2 := r_b
goto L4
LLVM
CIL
Conclusion
L4: max := phi t_1 [from L2], t_2 [from L3]
19 / 34
Intermediate
Representaions
Converting to SSA-Form
Malte Skambath
We need Compilers!
1. Place Φ-function terms
Classical Compiler
Process
2. Rename registers to achieve SSA-form
Machine Models
Stack Machines
Register Machines
z ← ...
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
a ← ...
b ← ...
CIL
Conclusion
z ← Φ(z, z)
Using the Φ-function after each branch for previous
registers is an unpractical solution.
20 / 34
Intermediate
Representaions
Dominance Frontiers
Malte Skambath
Definition
We need Compilers!
We say x dominates y (x dom y) if on all paths to Y in the CFG
the program has to run over X.
Classical Compiler
Process
Machine Models
Stack Machines
Definition
Register Machines
y is in the dominance frontier of x (DF(x)) iff not x dom y and
y has a direct predecessor on all paths to y
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
1
Conclusion
1
5
2
2
3
5
4
7
8
9
7
6
3
4
8
6
DOM(5) = {5, 6, 7, 8}
DF(5) = {4, 9}
9
21 / 34
Intermediate
Representaions
Dominance Frontiers
Malte Skambath
We need Compilers!
Assume that node 3 defines variable x, DF(3) = {5}
x ∈ Def(3)
1
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
2
3
Implementations
LLVM
CIL
5
4
Conclusion
6
Is 5 the only node we need to insert a Φ-function for x?
22 / 34
Intermediate
Representaions
Dominance Frontiers
Malte Skambath
We need Compilers!
Assume that node 3 defines variable x, DF(3) = {5}
x ∈ Def(3)
1
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
2
3
Implementations
LLVM
CIL
5
4
Conclusion
6
Is 5 the only node we need to insert a Φ-function for x?
No, at node 6. Why?
22 / 34
Intermediate
Representaions
Architecture of the LLVM-compiler process
Malte Skambath
We need Compilers!
C
x86
Backend
clang
x86
Classical Compiler
Process
Machine Models
Stack Machines
Fortran
llvmgcc
LLVM
Optimizer
ARM
Backend
Register Machines
ARM
Three-Address Code
Static-Single-Assignment
Implementations
Haskell
GHC
PowerPC
Backend
LLVM
PPC
CIL
Conclusion
LLVM uses a special intermediate representation (LLVM-IR)
for a virtual register machine.
23 / 34
2 The LLVM Approach
LLVM Compilation Strategy
The centerpiece of the system is the LLVM virtual instruction set. It combines a low-level representation with hig
level information about operands, by using Static Single Assignment (SSA) form and high-level, language independ
type information (more detail is provided in Section 3). The RISC like, low-level instructions of LLVM allow for ma
traditional optimizations to be directly applied to LLVM code, while the high-level type information allows for no
high-level transformations as well. This common representation enables the powerful multi-phase compilation strate
shown in Figure 1.
Offline Reoptimizer
Libraries
LLVM
Static Compiler 1
Runtime Optimizer
LLVM
LLVM
.
.
Static Compiler N
Native
.o files
.exe
Optimizing Linker
.exe
(llvm +
native)
Profile
& Trace
Info
Profile
& Trace
Info
Optimized
Code
Host Machine
LLVM
C. Lattner, The LLVM Instruction Set and Compilation Strategy, 2002
Figure 1: LLVM Compilation Strategy Overview
As Figure 1 illustrates, the LLVM compilation strategy exactly matches the standard compile-link-execute mo
of program development, with the addition of a runtime and offline optimizer. Unlike a traditional compiler, howev
the .o files generated by an LLVM static compiler do not contain any machine code at all – they contain LLVM co
in a compressed format.
The LLVM optimizing linker combines LLVM object files, applies interprocedural optimizations, generates nat
code, and links in libraries provided in native code form. An interesting feature of the executables produced by
2
Intermediate
Representaions
LLVM-IR
@.str = private unnamed_addr constant [11 x i8] c"
%d <= %d \00", align 1
; Function Attrs: nounwind uwtable
define void @minmax(i32 %a, i32 %b) #0 {
%1 = icmp sgt i32 %a, %b
br i1 %1, label %2, label %3
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
; <label>:2
br label %4
; <label>:3
br label %4
; preds = %0
Implementations
LLVM
CIL
; preds = %0
Conclusion
; <label>:4
; preds = %3, %2
%max.0 = phi i32 [ %a, %2 ], [ %b, %3 ]
%min.0 = phi i32 [ %b, %2 ], [ %a, %3 ]
%5 = call i32 (i8*, ...) @printf(i8*
getelementptr inbounds ([11 x i8], [11 x i8]*
@.str, i32 0, i32 0), i32 %min.0, i32 %max.0)
ret void
}
25 / 34
Intermediate
Representaions
LLVM-IR
I
I
I
LLVM is register-based. Registers are written as
%<registername> (e.g. %R1 = ...)
@ is used for global variables (e.g. function names)
LLVM use types i1, i8, i32 for boolean, Byte and
32-Bit Integer values
Reduced instruction-set
I
I
I
Memory Access %ptr = alloca i32
Comparing %res = icmp <opt> <type> %a, %b
Conditional Branches
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
Conclusion
br i1 %cond, label %IfLabel, label %ElseLabel
I
I
I
Function calls %res = call
phi-Instruction for assignments depending on the
control flow
Functions:
define <type> @FctName(<type> %arg1,...){...}
I
Metadata
26 / 34
Intermediate
Representaions
LLVM-IR
@.str = private unnamed_addr constant [11 x i8]
c" %d <= %d \00", align 1
; Function Attrs: nounwind uwtable
define void @minmax(i32 %a, i32 %b) #0 {
%1 = icmp sgt i32 %a, %b
br i1 %1, label %2, label %3
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
; <label>:2
br label %4
; <label>:3
br label %4
; preds = %0
Implementations
LLVM
CIL
; preds = %0
; <label>:4
; preds = %3, %2
%max.0 = phi i32 [ %a, %2 ], [ %b, %3 ]
%min.0 = phi i32 [ %b, %2 ], [ %a, %3 ]
%5 = call i32 (i8*, ...) @printf(i8*
getelementptr inbounds ([11 x i8], [11 x i8]*
@.str, i32 0, i32 0), i32 %min.0, i32 %max.0)
ret void
}
Conclusion
27 / 34
LLVM-IR
Intermediate
Representaions
Another example
Malte Skambath
define i32 @main() #0 {
%c = alloca [10 x i32], align 16
br label %1
1:
%sum.0 = phi i32 [ 0, %0 ], [ %4,
%i.0
= phi i32 [ 1, %0 ], [ %8,
%2 = icmp sle i32 %i.0, 10
br i1 %2, label %3, label %9
3:
;
%4 = add nsw i32 %sum.0, %i.0
%5 = sext i32 %i.0 to i64
%6 = getelementptr inbounds [10 x
i32]* %c, i32 0, i64 %5
store i32 %4, i32* %6, align 4
br label %7
7:
;
%8 = add nsw i32 %i.0, 1
br label %1
9:
;
ret i32 0
}
We need Compilers!
Classical Compiler
Process
Machine Models
%7 ]
%7 ]
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
preds = %1
LLVM
CIL
Conclusion
i32], [10 x
preds = %3
preds = %1
28 / 34
Intermediate
Representaions
Common Intermediate Language
Malte Skambath
C#
VB.net
F#
We need Compilers!
Classical Compiler
Process
Common Intermediate Language (CIL)
Executable-File containing CIL
Machine Models
Stack Machines
Register Machines
Three-Address Code
Common Language Runtime (CLR)
Static-Single-Assignment
Implementations
JIT-C
Libraries
LLVM
CIL
Conclusion
Common Language Infrastructure (CLI)
The CLR is the CLI-Implementation of Microsoft and part of
the .net-Framework. The CLI also speciefies a Type System
(CTS) and a basic set of class libraries.
29 / 34
Intermediate
Representaions
CIL
Malte Skambath
We need Compilers!
I
Stack based virtual machine.
Classical Compiler
Process
I
Each method has a header
Machine Models
I
Typed instruction-set (e. g. ldc.i4.0 load constant 0
as 4-Byte int)
I
Access to local variables ldloc.<index>,
Stack Machines
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
stloc.<index>
I
Conclusion
Object oriented
I
Load field values
I
ldfld string Program/Person::prename
Create new objects newobj instance void class
<CLASS>’.ctor’(...)
30 / 34
CIL
Intermediate
Representaions
An Example
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
.method public static hidebysig
default int32 sum (int32 a, int32 b)
managed {
.maxstack 2
.locals init (int32 V_0, int32 V_1)
IL_0000: ldc.i4.0
//
...
}
Stack Machines
Register Machines
cil
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
Conclusion
31 / 34
CIL
Intermediate
Representaions
An Example
Malte Skambath
IL_0000:
IL_0001:
IL_0002:
IL_0003:
(i=a)
IL_0004:
IL_0009:
IL_000a:
IL_000b:
IL_000c:
IL_000d:
IL_000e:
IL_000f:
IL_0010:
IL_0011:
IL_0012:
IL_0013:
IL_0018:
IL_0019:
ldc.i4.0
stloc.0
ldarg.0
stloc.1
//
// sum = 0
// load a on the stack
// store a in first var
br IL_0011
ldloc.0
ldloc.1
add
stloc.0
ldloc.1
ldc.i4.1
add
stloc.1
ldloc.1
ldarg.1
ble IL_0009
ldloc.0
ret
//
//
//
//
//
//
//
//
//
//
//
//
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
--+
|
<--+
|
|
|
|
|
|
|
.
|
.
<-+
.
load b
|
i<=b
-+
Register Machines
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
Conclusion
32 / 34
Intermediate
Representaions
Conclusion
Malte Skambath
Intermediate Representations . . .
I
allow a clean and general
compiler-architecture/infrastructure
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
I
allow mixing different programming languages
I
programmer may loose control on the real
control-flow
Implementations
I
program-flow can be optimized
Conclusion
I
adaption to different hardware configurations
(including GPU-support).
I
improve the development of new programming
languages
I
can realize translations between different languages
Register Machines
Three-Address Code
Static-Single-Assignment
LLVM
CIL
33 / 34
Intermediate
Representaions
Malte Skambath
We need Compilers!
Classical Compiler
Process
Machine Models
Stack Machines
Register Machines
Any Questions?
Three-Address Code
Static-Single-Assignment
Implementations
LLVM
CIL
Conclusion
34 / 34

Documents pareils