Intermediate Representaions - Concepts of Programming
Transcription
Intermediate Representaions - Concepts of Programming
Intermediate Representaions Concepts of Programming Languages (CoPL) Malte Skambath [email protected] November 16, 2015 Intermediate Representaions Overview Malte Skambath We need Compilers! Classical Compiler Process We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Machine Models Stack Machines Register Machines Static-Single-Assignment Implementations LLVM CIL Conclusion Implementations LLVM CIL Conclusion 2 / 34 Developing Software Intermediate Representaions We need compilers! Malte Skambath We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM ? CIL Conclusion x86 3 / 34 Developing Software Intermediate Representaions We need compilers! Malte Skambath We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM ? x86 CIL Conclusion AMD64 ARM 3 / 34 Developing Software Intermediate Representaions We need compilers! Malte Skambath We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM ? x86 CIL Conclusion AMD64 ARM 3 / 34 Intermediate Representation Intermediate Representaions The solution! Malte Skambath We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM CIL Intermediate Representation x86 AMD64 Conclusion ARM 4 / 34 Intermediate Representaions Intermediate Representation Malte Skambath We need Compilers! Classical Compiler Process Machine Models Definition An intermediate representation (IR) is data structure as representation of a program between a high-level programming language and machine code. Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM CIL Conclusion An intermediate language (IL) is a low-level assembly language as IR for a virtual machine. 5 / 34 Intermediate Representaions Classical Compiler Process Malte Skambath We need Compilers! Classical Compiler Process Lexical Analysis (Scanner) Machine Models Stack Machines Tokens Register Machines Three-Address Code Syntax Analysis (Parser) Frontend ST/AST Static-Single-Assignment Implementations LLVM CIL Semantic Analysis Conclusion CFG Optimization CFG Code Generation Backend 6 / 34 Intermediate Representaions Abstract Sytax Tree Malte Skambath An abstract syntax tree (AST) . . . . . . describes the syntactical structure of a program . . . depends on the programming language . . . is generated during by the parser We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines program Three-Address Code Static-Single-Assignment Implementations block return while LLVM CIL Conclusion ... condition body ... assign variable sum variable sum bin op: * variable i 7 / 34 Intermediate Representaions Control-Flow-Graph Malte Skambath We need Compilers! Classical Compiler Process int s = 1; for(int i=1; i<=10; i++) s += i; return (s); Machine Models Stack Machines i ← 1, s ← 0 Register Machines Three-Address Code Static-Single-Assignment Implementations i ≤ 10 no yes LLVM CIL Conclusion ret(s) i←i+1 s←s+i 8 / 34 Intermediate Representaions Stack Machines Malte Skambath Definition We need Compilers! A general Stack Machine has Classical Compiler Process I a stack as storage I a set of instructions / operations op = F(a1 , a2 , . . . , an ) including (push and pop) Machine Models Executing an operation takes the arguments from top of the stack, computes the result in the accumulator, and pushes the result back the stack. Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM CIL Conclusion Example push 1 push 2 push 3 add pop 3 1 2 1 2 1 5 1 1 9 / 34 Stack-machines Intermediate Representaions Code Generation Malte Skambath We need Compilers! We can generate the control by traversing the psyntax tree. Assume we have to compute the expression x 2 + y2 . Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM CIL Conclusion 10 / 34 Stack-machines Intermediate Representaions Code Generation Malte Skambath We need Compilers! We can generate the control by traversing the psyntax tree. Assume we have to compute the expression x 2 + y2 . Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code AST sqrt Static-Single-Assignment push push mul push push mul add sqrt add mul x mul x y x x Implementations LLVM CIL y y Conclusion y 10 / 34 Stack Machines Intermediate Representaions Summary Malte Skambath We need Compilers! Classical Compiler Process I I Programs for stack machines are short Only the opcodes ( or constants) in the byte code. In practical use stack machines can be extended 1. An external memory to store and load values (computations are still limited to the stack) 2. Top-Level registers 3. Metainformations (see CIL later) I Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM CIL Conclusion Problem: Most processor-architectures use registers. ⇒ Hybrid Models, Special informations in the intermediate representation. 11 / 34 Intermediate Representaions Register Machines Malte Skambath We need Compilers! Definition A register machine . . . I consists of an infinite number of memory cells named registers Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations I each register is accessible I has a limited set of instruction / operations: LLVM CIL Conclusion 1. Arithmetical Operations: Computes a function F using selected registers ho1 , i . . . , hon i as operands and stores the result in a target register hri 2. Jumps/Branches 12 / 34 Intermediate Representaions Three-Address Code (3AC/TAC) I I Each TAC is a sequence of instructions I1 , I2 , . . . , In for a register machine. Instructions can be 1. Assignments r1 := r0 2. Unconditional Jumps (Instructions can be labeled) L0: goto L1 ... L1: r0 := 1 Malte Skambath We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM 3. Conditional Branches if a<b then goto L1 CIL Conclusion 4. Arithmetical operations r3 := add(r1,r2) I Each instruction contains at most 3 registers 13 / 34 Intermediate Representaions Three-Address Code (3AC/TAC) I I Each TAC is a sequence of instructions I1 , I2 , . . . , In for a register machine. Instructions can be 1. Assignments r1 := r0 2. Unconditional Jumps (Instructions can be labeled) L0: goto L1 ... L1: r0 := 1 Malte Skambath We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM 3. Conditional Branches if a<b then goto L1 CIL Conclusion 4. Arithmetical operations r3 := add(r1,r2) I Each instruction contains at most 3 registers Example ( p x2 + y2 ) t1 := x * x t2 := y * y t3 := t1 + t2 result := sqrt(t3) 13 / 34 Three-Address Code (3AC/TAC) Intermediate Representaions How to design the Byte-Code Malte Skambath For practical use we should store TAC in byte code format. I Each operation has an opcode for the virtual machine I Each instruction can be represented by tuples We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code t1 t2 t1 res Quadruples opcode op1 MUL x MUL y ADD t1 SQRT t1 op2 x y t2 - Triples opcode op1 MUL x MUL y ADD (1) SQRT (3) Static-Single-Assignment op2 x y (2) - Implementations LLVM CIL Conclusion Note Registers can be assigned implicitly (Triples). But then each register has to be assigned only once. 14 / 34 Intermediate Representaions Static-Single-Assignment Malte Skambath We need Compilers! Classical Compiler Process Definition (Static-Single Assignment) A Three-Adress Code is in Static-Single Assignment-from if each register gets assigned once in the code. Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment p Example ( x 2 + y2 ) Implementations LLVM CIL Not in SSA L1: L2: L3: L4: x y x z := := := := SSA x * x y * y x + y sqrt(x) L1: L2: L3: L4: Conclusion x0 y0 x1 z := := := := x * x y * y x0 + y0 sqrt(x1) 15 / 34 Static-Single-Assigment Intermediate Representaions How to get SSA-form? Malte Skambath We need Compilers! A simple Algorithm I For each used register: hRi 1. Check if hRi gets assigned more than once 2. For each assignment/definition of hRi: Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations I Rename on the left side to hR.ii if this assignment is the i-th assignment to hRi LLVM CIL Conclusion 3. For each use of hRi: I Replace hRi with hR.ji where hR.ji was the previous replacement for hRi. Is this algorithm correct? 16 / 34 Static-Single-Assigment Intermediate Representaions How to get SSA-form? Malte Skambath We need Compilers! A simple Algorithm I For each used register: hRi 1. Check if hRi gets assigned more than once 2. For each assignment/definition of hRi: Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations I Rename on the left side to hR.ii if this assignment is the i-th assignment to hRi LLVM CIL Conclusion 3. For each use of hRi: I Replace hRi with hR.ji where hR.ji was the previous replacement for hRi. Is this algorithm correct? No! 16 / 34 Static-Single-Assignment Intermediate Representaions What if we have branches? Malte Skambath We need Compilers! Classical Compiler Process if a>b then goto L_A max := b; goto L_END L_A: max := a; goto L_END L_END: Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM CIL Conclusion a>b? max:=a max:=b a>b? m1:=a m2:=b max:=m? 17 / 34 Static-Single-Assignment Intermediate Representaions The Φ-function Malte Skambath We need Compilers! The Φ-function computes the value depending on the incoming branch. Classical Compiler Process Machine Models Stack Machines a←1 b←2 Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM x ← Φ(a, b) CIL Conclusion Note There is no real operation like Φ in real machines. After optimization Φ-statements have to be removed. 18 / 34 Static-Single-Assignment Intermediate Representaions The Φ-function Malte Skambath We need Compilers! The Φ-function computes the value depending on the incoming branch. Classical Compiler Process Machine Models Stack Machines a←1 b←2 Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM x ← Φ(a, b) CIL Conclusion x has value 1 Note There is no real operation like Φ in real machines. After optimization Φ-statements have to be removed. 18 / 34 Static-Single-Assignment Intermediate Representaions The Φ-function Malte Skambath We need Compilers! The Φ-function computes the value depending on the incoming branch. Classical Compiler Process Machine Models Stack Machines a←1 b←2 Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM x ← Φ(a, b) CIL Conclusion x has value 2 Note There is no real operation like Φ in real machines. After optimization Φ-statements have to be removed. 18 / 34 Intermediate Representaions Getting Code in SSA-form. Malte Skambath We need Compilers! L1: if r_a < r_b then goto L3: L2: t_1 := r_a goto L4 Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations L3: t_2 := r_b goto L4 LLVM CIL Conclusion L4: max := phi t_1 [from L2], t_2 [from L3] 19 / 34 Intermediate Representaions Converting to SSA-Form Malte Skambath We need Compilers! 1. Place Φ-function terms Classical Compiler Process 2. Rename registers to achieve SSA-form Machine Models Stack Machines Register Machines z ← ... Three-Address Code Static-Single-Assignment Implementations LLVM a ← ... b ← ... CIL Conclusion z ← Φ(z, z) Using the Φ-function after each branch for previous registers is an unpractical solution. 20 / 34 Intermediate Representaions Dominance Frontiers Malte Skambath Definition We need Compilers! We say x dominates y (x dom y) if on all paths to Y in the CFG the program has to run over X. Classical Compiler Process Machine Models Stack Machines Definition Register Machines y is in the dominance frontier of x (DF(x)) iff not x dom y and y has a direct predecessor on all paths to y Three-Address Code Static-Single-Assignment Implementations LLVM CIL 1 Conclusion 1 5 2 2 3 5 4 7 8 9 7 6 3 4 8 6 DOM(5) = {5, 6, 7, 8} DF(5) = {4, 9} 9 21 / 34 Intermediate Representaions Dominance Frontiers Malte Skambath We need Compilers! Assume that node 3 defines variable x, DF(3) = {5} x ∈ Def(3) 1 Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment 2 3 Implementations LLVM CIL 5 4 Conclusion 6 Is 5 the only node we need to insert a Φ-function for x? 22 / 34 Intermediate Representaions Dominance Frontiers Malte Skambath We need Compilers! Assume that node 3 defines variable x, DF(3) = {5} x ∈ Def(3) 1 Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment 2 3 Implementations LLVM CIL 5 4 Conclusion 6 Is 5 the only node we need to insert a Φ-function for x? No, at node 6. Why? 22 / 34 Intermediate Representaions Architecture of the LLVM-compiler process Malte Skambath We need Compilers! C x86 Backend clang x86 Classical Compiler Process Machine Models Stack Machines Fortran llvmgcc LLVM Optimizer ARM Backend Register Machines ARM Three-Address Code Static-Single-Assignment Implementations Haskell GHC PowerPC Backend LLVM PPC CIL Conclusion LLVM uses a special intermediate representation (LLVM-IR) for a virtual register machine. 23 / 34 2 The LLVM Approach LLVM Compilation Strategy The centerpiece of the system is the LLVM virtual instruction set. It combines a low-level representation with hig level information about operands, by using Static Single Assignment (SSA) form and high-level, language independ type information (more detail is provided in Section 3). The RISC like, low-level instructions of LLVM allow for ma traditional optimizations to be directly applied to LLVM code, while the high-level type information allows for no high-level transformations as well. This common representation enables the powerful multi-phase compilation strate shown in Figure 1. Offline Reoptimizer Libraries LLVM Static Compiler 1 Runtime Optimizer LLVM LLVM . . Static Compiler N Native .o files .exe Optimizing Linker .exe (llvm + native) Profile & Trace Info Profile & Trace Info Optimized Code Host Machine LLVM C. Lattner, The LLVM Instruction Set and Compilation Strategy, 2002 Figure 1: LLVM Compilation Strategy Overview As Figure 1 illustrates, the LLVM compilation strategy exactly matches the standard compile-link-execute mo of program development, with the addition of a runtime and offline optimizer. Unlike a traditional compiler, howev the .o files generated by an LLVM static compiler do not contain any machine code at all – they contain LLVM co in a compressed format. The LLVM optimizing linker combines LLVM object files, applies interprocedural optimizations, generates nat code, and links in libraries provided in native code form. An interesting feature of the executables produced by 2 Intermediate Representaions LLVM-IR @.str = private unnamed_addr constant [11 x i8] c" %d <= %d \00", align 1 ; Function Attrs: nounwind uwtable define void @minmax(i32 %a, i32 %b) #0 { %1 = icmp sgt i32 %a, %b br i1 %1, label %2, label %3 Malte Skambath We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment ; <label>:2 br label %4 ; <label>:3 br label %4 ; preds = %0 Implementations LLVM CIL ; preds = %0 Conclusion ; <label>:4 ; preds = %3, %2 %max.0 = phi i32 [ %a, %2 ], [ %b, %3 ] %min.0 = phi i32 [ %b, %2 ], [ %a, %3 ] %5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i32 0, i32 0), i32 %min.0, i32 %max.0) ret void } 25 / 34 Intermediate Representaions LLVM-IR I I I LLVM is register-based. Registers are written as %<registername> (e.g. %R1 = ...) @ is used for global variables (e.g. function names) LLVM use types i1, i8, i32 for boolean, Byte and 32-Bit Integer values Reduced instruction-set I I I Memory Access %ptr = alloca i32 Comparing %res = icmp <opt> <type> %a, %b Conditional Branches Malte Skambath We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM CIL Conclusion br i1 %cond, label %IfLabel, label %ElseLabel I I I Function calls %res = call phi-Instruction for assignments depending on the control flow Functions: define <type> @FctName(<type> %arg1,...){...} I Metadata 26 / 34 Intermediate Representaions LLVM-IR @.str = private unnamed_addr constant [11 x i8] c" %d <= %d \00", align 1 ; Function Attrs: nounwind uwtable define void @minmax(i32 %a, i32 %b) #0 { %1 = icmp sgt i32 %a, %b br i1 %1, label %2, label %3 Malte Skambath We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Three-Address Code Static-Single-Assignment ; <label>:2 br label %4 ; <label>:3 br label %4 ; preds = %0 Implementations LLVM CIL ; preds = %0 ; <label>:4 ; preds = %3, %2 %max.0 = phi i32 [ %a, %2 ], [ %b, %3 ] %min.0 = phi i32 [ %b, %2 ], [ %a, %3 ] %5 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i32 0, i32 0), i32 %min.0, i32 %max.0) ret void } Conclusion 27 / 34 LLVM-IR Intermediate Representaions Another example Malte Skambath define i32 @main() #0 { %c = alloca [10 x i32], align 16 br label %1 1: %sum.0 = phi i32 [ 0, %0 ], [ %4, %i.0 = phi i32 [ 1, %0 ], [ %8, %2 = icmp sle i32 %i.0, 10 br i1 %2, label %3, label %9 3: ; %4 = add nsw i32 %sum.0, %i.0 %5 = sext i32 %i.0 to i64 %6 = getelementptr inbounds [10 x i32]* %c, i32 0, i64 %5 store i32 %4, i32* %6, align 4 br label %7 7: ; %8 = add nsw i32 %i.0, 1 br label %1 9: ; ret i32 0 } We need Compilers! Classical Compiler Process Machine Models %7 ] %7 ] Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations preds = %1 LLVM CIL Conclusion i32], [10 x preds = %3 preds = %1 28 / 34 Intermediate Representaions Common Intermediate Language Malte Skambath C# VB.net F# We need Compilers! Classical Compiler Process Common Intermediate Language (CIL) Executable-File containing CIL Machine Models Stack Machines Register Machines Three-Address Code Common Language Runtime (CLR) Static-Single-Assignment Implementations JIT-C Libraries LLVM CIL Conclusion Common Language Infrastructure (CLI) The CLR is the CLI-Implementation of Microsoft and part of the .net-Framework. The CLI also speciefies a Type System (CTS) and a basic set of class libraries. 29 / 34 Intermediate Representaions CIL Malte Skambath We need Compilers! I Stack based virtual machine. Classical Compiler Process I Each method has a header Machine Models I Typed instruction-set (e. g. ldc.i4.0 load constant 0 as 4-Byte int) I Access to local variables ldloc.<index>, Stack Machines Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM CIL stloc.<index> I Conclusion Object oriented I Load field values I ldfld string Program/Person::prename Create new objects newobj instance void class <CLASS>’.ctor’(...) 30 / 34 CIL Intermediate Representaions An Example Malte Skambath We need Compilers! Classical Compiler Process Machine Models .method public static hidebysig default int32 sum (int32 a, int32 b) managed { .maxstack 2 .locals init (int32 V_0, int32 V_1) IL_0000: ldc.i4.0 // ... } Stack Machines Register Machines cil Three-Address Code Static-Single-Assignment Implementations LLVM CIL Conclusion 31 / 34 CIL Intermediate Representaions An Example Malte Skambath IL_0000: IL_0001: IL_0002: IL_0003: (i=a) IL_0004: IL_0009: IL_000a: IL_000b: IL_000c: IL_000d: IL_000e: IL_000f: IL_0010: IL_0011: IL_0012: IL_0013: IL_0018: IL_0019: ldc.i4.0 stloc.0 ldarg.0 stloc.1 // // sum = 0 // load a on the stack // store a in first var br IL_0011 ldloc.0 ldloc.1 add stloc.0 ldloc.1 ldc.i4.1 add stloc.1 ldloc.1 ldarg.1 ble IL_0009 ldloc.0 ret // // // // // // // // // // // // We need Compilers! Classical Compiler Process Machine Models Stack Machines --+ | <--+ | | | | | | | . | . <-+ . load b | i<=b -+ Register Machines Three-Address Code Static-Single-Assignment Implementations LLVM CIL Conclusion 32 / 34 Intermediate Representaions Conclusion Malte Skambath Intermediate Representations . . . I allow a clean and general compiler-architecture/infrastructure We need Compilers! Classical Compiler Process Machine Models Stack Machines I allow mixing different programming languages I programmer may loose control on the real control-flow Implementations I program-flow can be optimized Conclusion I adaption to different hardware configurations (including GPU-support). I improve the development of new programming languages I can realize translations between different languages Register Machines Three-Address Code Static-Single-Assignment LLVM CIL 33 / 34 Intermediate Representaions Malte Skambath We need Compilers! Classical Compiler Process Machine Models Stack Machines Register Machines Any Questions? Three-Address Code Static-Single-Assignment Implementations LLVM CIL Conclusion 34 / 34