Instruction Sets
Microprocessor Architecture (GIF-3000)
Professor: Christian Gagné
Week 3: September 13, 2010
GIF-3000 (U. Laval), Instruction Sets, C. Gagné

Part I: Characterization of instruction sets

Internal storage and registers
- Types of internal storage
  - Stack architecture (operands on top of the stack)
  - Accumulator architecture (accumulator register)
  - General-purpose register architecture (explicit operands)
- Types of general-purpose register architecture
  - Register-memory
  - Load-store
  - Memory-memory (obsolete)

Operands and architectures
Source of the figures: Appendix B, "Instruction Set Principles and Examples".

Figure B.1: Operand locations for four instruction set architecture classes: (a) stack, (b) accumulator, (c) register-memory, (d) register-register/load-store. The arrows indicate whether the operand is an input or the result of the ALU operation, or both an input and result. Lighter shades indicate inputs, and the dark shade indicates the result. In (a), a Top Of Stack register (TOS) points to the top input operand, which is combined with the operand below. The first operand is removed from the stack, the result takes the place of the second operand, and TOS is updated to point to the result. All operands are implicit. In (b), the accumulator is both an implicit input operand and a result. In (c), one input operand is a register, one is in memory, and the result goes to a register. All operands are registers in (d) and, like the stack architecture, can be transferred to memory only via separate instructions: push or pop for (a) and load or store for (d).

Figure B.2: The code sequence for C = A + B for four classes of instruction sets.
  Stack:                      Push A; Push B; Add; Pop C
  Accumulator:                Load A; Add B; Store C
  Register (register-memory): Load R1,A; Add R3,R1,B; Store R3,C
  Register (load-store):      Load R1,A; Load R2,B; Add R3,R1,R2; Store R3,C
Note that the Add instruction has implicit operands for stack and accumulator architectures.

Modern architectures (after 1980)
- Modern architectures use general-purpose registers
- Registers are faster than memory
  - They also reduce memory traffic
- Registers are easier for the compiler to exploit
  - E.g., (A × B) − (B × C) − (A × D): the products can be computed in any order and kept in registers
- The compiler exploits general-purpose registers more easily
  - Allocation is harder when registers carry usage constraints
- Register names are more compact than memory addresses
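The code sequences of Figure B.2 can be executed on toy interpreters to see where the operands live in each class. The following is a minimal sketch, not a model of any real processor: the functions `run_stack` and `run_load_store`, the tuple-based program format, and the sample values of A and B are all assumptions made for illustration.

```python
def run_stack(program, memory):
    """Run a stack-architecture program; Add takes its operands implicitly."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(memory[args[0]])
        elif op == "Add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "Pop":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack instruction: {op}")
    return memory


def run_load_store(program, memory):
    """Run a load-store program; only Load and Store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "Load":
            regs[args[0]] = memory[args[1]]
        elif op == "Add":
            dest, src1, src2 = args
            regs[dest] = regs[src1] + regs[src2]
        elif op == "Store":
            memory[args[1]] = regs[args[0]]
        else:
            raise ValueError(f"unknown load-store instruction: {op}")
    return memory


# The two sequences from Figure B.2 for C = A + B.
STACK_PROGRAM = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
LOAD_STORE_PROGRAM = [
    ("Load", "R1", "A"),
    ("Load", "R2", "B"),
    ("Add", "R3", "R1", "R2"),
    ("Store", "R3", "C"),
]

stack_result = run_stack(STACK_PROGRAM, {"A": 2, "B": 3})
load_store_result = run_load_store(LOAD_STORE_PROGRAM, {"A": 2, "B": 3})
```

Both runs leave the same value in C. The difference is that the stack version never names an operand of Add, while the load-store version names three registers.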
Number and type of operands
Figure B.3: Typical combinations of memory operands and total operands per typical ALU instruction, with examples of computers.
  Memory addresses | Max. operands | Type of architecture | Examples
  0 | 3 | Load-store | Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, TM32
  1 | 2 | Register-memory | IBM 360/370, Intel 80x86, Motorola 68000, TI TMS320C54x
  2 | 2 | Memory-memory | VAX (also has three-operand formats)
  3 | 3 | Memory-memory | VAX (also has two-operand formats)
Computers with no memory reference per ALU instruction are called load-store or register-register computers. Instructions with multiple memory operands per typical ALU instruction are called register-memory or memory-memory, according to whether they have one or more than one memory operand.

Types of architectures
Figure B.4: Advantages and disadvantages of the three most common types of general-purpose register computers. The notation (m, n) means m memory operands and n total operands.
  Register-register (0, 3)
    Advantages: Simple, fixed-length instruction encoding. Simple code generation model. Instructions take similar numbers of clocks to execute (see App. A).
    Disadvantages: Higher instruction count than architectures with memory references in instructions. More instructions and lower instruction density lead to larger programs.
  Register-memory (1, 2)
    Advantages: Data can be accessed without a separate load instruction first. Instruction format tends to be easy to encode and yields good density.
    Disadvantages: Operands are not equivalent since a source operand in a binary operation is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction vary by operand location.
  Memory-memory (2, 2) or (3, 3)
    Advantages: Most compact. Doesn't waste registers for temporaries.
    Disadvantages: Large variation in instruction size, especially for three-operand instructions. In addition, large variation in work per instruction. Memory accesses create memory bottleneck. (Not used today.)
In general, computers with fewer alternatives simplify the compiler's task since there are fewer decisions for the compiler to make (see Section B.8). Computers with a wide variety of flexible instruction formats reduce the number of bits required to encode the program. The number of registers also affects the instruction size since you need log2(number of registers) bits for each register specifier in an instruction. Thus, doubling the number of registers takes 3 extra bits for a register-register architecture, or about 10% of a 32-bit instruction.
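The closing remark on register count and instruction size is simple arithmetic, and it can be checked directly. This is a small sketch under the assumption of a three-operand register-register format; the helper names are hypothetical.

```python
def register_specifier_bits(num_registers):
    """Bits needed to name one register, i.e. ceil(log2(num_registers))."""
    return (num_registers - 1).bit_length()


def register_field_bits(num_registers, specifiers=3):
    """Total bits spent on register specifiers in one instruction.

    specifiers=3 assumes a register-register format (two sources, one result).
    """
    return specifiers * register_specifier_bits(num_registers)


bits_32 = register_field_bits(32)   # 3 specifiers of 5 bits
bits_64 = register_field_bits(64)   # 3 specifiers of 6 bits
extra_bits = bits_64 - bits_32
extra_share = extra_bits / 32       # fraction of a 32-bit instruction
```

Going from 32 to 64 registers adds one bit per specifier, so three bits in total, which is a little under one tenth of a 32-bit instruction.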
Addressing conventions (byte order)
- Example: storing the word 0Ah 0Bh 0Ch 0Dh at address x
- Little endian convention
  - Addressing: Mem[x]=0Dh; Mem[x+1]=0Ch; Mem[x+2]=0Bh; Mem[x+3]=0Ah
  - Windows, Linux and Mac OS on x86 and x64
- Big endian convention
  - Addressing: Mem[x]=0Ah; Mem[x+1]=0Bh; Mem[x+2]=0Ch; Mem[x+3]=0Dh
  - Solaris on SPARC, Linux and Mac OS on PowerPC
- No performance difference between the two conventions
  - But they cause headaches when exchanging data between computers that use different conventions
  - With little endian, a string loaded into a register as a word appears with its characters in reverse order ("DESREVER")

Alignment
- Data larger than one byte are aligned
  - Hardware generally aligns on multiples of words or double words
- An access to misaligned data requires several aligned memory accesses
Figure B.5: Aligned and misaligned addresses of byte, half-word, word, and double-word objects for byte-addressed computers, as a function of the 3 low-order bits of the byte address.
  1 byte (byte): aligned at every offset, 0 to 7
  2 bytes (half word): aligned at offsets 0, 2, 4, 6; misaligned at 1, 3, 5, 7
  4 bytes (word): aligned at offsets 0 and 4; misaligned elsewhere
  8 bytes (double word): aligned at offset 0; misaligned elsewhere
For each misaligned example some objects require two memory accesses to complete. Every aligned object can always complete in one memory access, as long as the memory is as wide as the object.
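Both the byte-order example and the alignment rule of Figure B.5 are easy to reproduce. The sketch below uses Python's built-in `int.to_bytes`; the helpers `is_aligned` and `aligned_offsets` are hypothetical names, and the alignment rule assumed is the usual one (an object of size s is aligned when its address is a multiple of s).

```python
WORD = 0x0A0B0C0D

# Byte at index i corresponds to Mem[x + i] in the slide's notation.
little_endian = list(WORD.to_bytes(4, "little"))
big_endian = list(WORD.to_bytes(4, "big"))


def is_aligned(address, size):
    """An object of `size` bytes is aligned when address is a multiple of size."""
    return address % size == 0


def aligned_offsets(size, span=8):
    """Offsets within an 8-byte window at which an object of `size` bytes is aligned."""
    return [offset for offset in range(span) if is_aligned(offset, size)]
```

The lists returned by `aligned_offsets(1)`, `aligned_offsets(2)`, `aligned_offsets(4)` and `aligned_offsets(8)` correspond to the byte, half-word, word and double-word rows of Figure B.5.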
Addressing modes
- Addressing mode: how memory addresses are specified in instructions
  - Register, immediate: no memory access
  - Displacement, register indirect, indexed: memory address computed from registers
  - Direct, memory indirect: memory address given as a constant or held in memory
  - Autoincrement, autodecrement: useful for executing loops
  - Scaled: useful for processing arrays of values
- Using sophisticated and/or varied addressing modes
  - Can reduce the number of instructions in a program
  - Increases hardware complexity
  - Design rule: focus on the common case

Figure B.7: Summary of use of memory addressing modes (including immediates). Frequency of the addressing mode for three programs:
  Addressing mode | TeX | spice | gcc
  Memory indirect | 1% | 6% | 1%
  Scaled | 0% | 16% | 6%
  Register indirect | 24% | 3% | 11%
  Immediate | 43% | 17% | 39%
  Displacement | 32% | 55% | 40%
These major addressing modes account for all but a few percent (0% to 3%) of the memory accesses. Register modes, which are not counted, account for one-half of the operand references, while memory addressing modes (including immediate) account for the other half. Of course, the compiler affects what addressing modes are used.

Addressing modes specify constants and registers in addition to locations in memory. When a memory location is used, the actual memory address specified by the addressing mode is called the effective address. Figure B.6 shows all the data addressing modes that have been used in recent computers. Immediates or literals are usually considered memory addressing modes.

Addressing modes
Figure B.6: Selection of addressing modes with examples, meaning, and usage.
  Register: Add R4,R3
    Meaning: Regs[R4] ← Regs[R4] + Regs[R3]
    When used: When a value is in a register.
  Immediate: Add R4,#3
    Meaning: Regs[R4] ← Regs[R4] + 3
    When used: For constants.
  Displacement: Add R4,100(R1)
    Meaning: Regs[R4] ← Regs[R4] + Mem[100+Regs[R1]]
    When used: Accessing local variables (+ simulates register indirect, direct addressing modes).
  Register indirect: Add R4,(R1)
    Meaning: Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
    When used: Accessing using a pointer or a computed address.
  Indexed: Add R3,(R1+R2)
    Meaning: Regs[R3] ← Regs[R3] + Mem[Regs[R1]+Regs[R2]]
    When used: Sometimes useful in array addressing: R1 = base of array; R2 = index amount.
  Direct or absolute: Add R1,(1001)
    Meaning: Regs[R1] ← Regs[R1] + Mem[1001]
    When used: Sometimes useful for accessing static data; address constant may need to be large.
  Memory indirect: Add R1,@(R3)
    Meaning: Regs[R1] ← Regs[R1] + Mem[Mem[Regs[R3]]]
    When used: If R3 is the address of a pointer p, then mode yields *p.
  Autoincrement: Add R1,(R2)+
    Meaning: Regs[R1] ← Regs[R1] + Mem[Regs[R2]]; Regs[R2] ← Regs[R2] + d
    When used: Useful for stepping through arrays within a loop. R2 points to start of array; each reference increments R2 by size of an element, d.
  Autodecrement: Add R1,–(R2)
    Meaning: Regs[R2] ← Regs[R2] – d; Regs[R1] ← Regs[R1] + Mem[Regs[R2]]
    When used: Same use as autoincrement. Autodecrement/-increment can also act as push/pop to implement a stack.
  Scaled: Add R1,100(R2)[R3]
    Meaning: Regs[R1] ← Regs[R1] + Mem[100+Regs[R2]+Regs[R3]*d]
    When used: Used to index arrays. May be applied to any indexed addressing mode in some computers.
In autoincrement/-decrement and scaled addressing modes, the variable d designates the size of the data item being accessed (i.e., whether the instruction is accessing 1, 2, 4, or 8 bytes). These addressing modes are only useful when the elements being ...

Displacement addressing
- Wide variety of displacement values (log2 scale on the x-axis)
  - The sign of the displacement is not shown; most of the large displacements are negative
[Plot: percentage of displacements (0% to 40%) versus number of bits of displacement (0 to 15), for the integer average and the floating-point average.]
Figure B.8: Displacement values are widely distributed. There are both a large number of small values and a fair number of large values. The wide distribution of displacement values is due to multiple storage areas for variables and different displacements to access them (see Section B.8) as well as the overall addressing scheme the compiler uses. The x-axis is log2 of the displacement; that is, the size of a field needed to represent the magnitude of the displacement. Zero on the x-axis shows the percentage of displacements of value 0. The graph does not include the sign bit, which is heavily affected by the storage layout. Most displacements are positive, but a majority of the largest displacements (14+ bits) are negative. Since these data were collected on a computer with 16-bit displacements, they cannot tell us about longer displacements. These data were taken on the Alpha architecture with full optimization (see Section B.8) for SPEC CPU2000, showing the average of integer programs (CINT2000) and the average of floating-point programs (CFP2000).

Immediate operands
Figure B.9: About one-quarter of data transfers and ALU operations have an immediate operand.
  Instruction class | Floating-point average | Integer average
  Loads | 22% | 23%
  ALU operations | 19% | 25%
  All instructions | 16% | 21%
The bottom bars show that integer programs use immediates in about one-fifth of the instructions, while floating-point programs use immediates in about one-sixth of the instructions. For loads, the load immediate instruction loads 16 bits into either half of a 32-bit register. Load immediates are not loads in a strict sense because they do not access memory. Occasionally a pair of load immediates is used to load a 32-bit constant, but this is rare. (For ALU operations, shifts by a constant amount are included as operations with immediate operands.) The programs and computer used to collect these statistics are the same as in Figure B.8.
[Plot: percentage of immediates (0% to 35%) versus number of bits needed for the immediate (0 to 15), for the integer average and the floating-point average.]
Figure B.10: The distribution of immediate values. The x-axis shows the number of bits needed to represent the magnitude of an immediate value; 0 means the immediate field value was 0. The majority of the immediate values ...

Operand types
- The operand type is generally specified by the opcode
  - Operations on integers, floating-point numbers, characters, etc.
  - The size is given by the type
    - Characters (8 bits), single-precision floats (32 bits), double-precision floats (64 bits), etc.
- Distribution of accesses by size (64-bit architecture)
Figure B.11: Distribution of data accesses by size for the benchmark programs.
  Size | Floating-point average | Integer average
  Double word (64 bits) | 70% | 59%
  Word (32 bits) | 29% | 26%
  Half word (16 bits) | 0% | 5%
  Byte (8 bits) | 1% | 10%
The double-word data type is used for double-precision floating point in floating-point programs and for addresses, since the computer uses 64-bit addresses. On a 32-bit address computer the 64-bit addresses would be replaced by 32-bit addresses, and so almost all double-word accesses in integer programs would become single-word accesses.

Operation types
Figure B.12: Categories of instruction operators and examples of each.
  Arithmetic and logical: Integer arithmetic and logical operations: add, subtract, and, or, multiply, divide
  Data transfer: Loads-stores (move instructions on computers with memory addressing)
  Control: Branch, jump, procedure call and return, traps
  System: Operating system call, virtual memory management instructions
  Floating point: Floating-point operations: add, multiply, divide, compare
  Decimal: Decimal add, decimal multiply, decimal-to-character conversions
  String: String move, string compare, string search
  Graphics: Pixel and vertex operations, compression/decompression operations
All computers generally provide a full set of operations for the first three categories. The support for system functions in the instruction set varies widely among architectures, but all computers must have some instruction support for basic system functions. The amount of support in the instruction set for the last four categories may vary from none to an ...
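Returning to the addressing modes of Figure B.6, the "Meaning" column can be turned almost literally into code. The following sketch fetches the second operand of the Add examples for each mode; the function `operand_value`, its keyword parameters, and the register and memory contents are all made up for illustration, with memory modeled as a dictionary from address to value.

```python
def operand_value(mode, regs, mem, reg=None, index=None, const=0, d=1):
    """Return the operand selected by an addressing mode from Figure B.6.

    reg/index name registers, const is the immediate or displacement, and
    d is the size of the accessed item (only used by some modes).
    """
    if mode == "register":
        return regs[reg]
    if mode == "immediate":
        return const
    if mode == "displacement":
        return mem[const + regs[reg]]
    if mode == "register_indirect":
        return mem[regs[reg]]
    if mode == "indexed":
        return mem[regs[reg] + regs[index]]
    if mode == "direct":
        return mem[const]
    if mode == "memory_indirect":
        return mem[mem[regs[reg]]]
    if mode == "scaled":
        return mem[const + regs[reg] + regs[index] * d]
    if mode == "autoincrement":
        value = mem[regs[reg]]      # use the address, then advance it
        regs[reg] += d
        return value
    if mode == "autodecrement":
        regs[reg] -= d              # step back first, then use the address
        return mem[regs[reg]]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state.
regs = {"R1": 100, "R2": 8, "R3": 200, "R5": 3}
mem = {100: 5, 108: 11, 120: 21, 200: 300, 300: 99, 1001: 13}

displacement = operand_value("displacement", regs, mem, reg="R1", const=100)
register_indirect = operand_value("register_indirect", regs, mem, reg="R1")
indexed = operand_value("indexed", regs, mem, reg="R1", index="R2")
direct = operand_value("direct", regs, mem, const=1001)
memory_indirect = operand_value("memory_indirect", regs, mem, reg="R3")
scaled = operand_value("scaled", regs, mem, reg="R2", index="R5", const=100, d=4)
autoincrement = operand_value("autoincrement", regs, mem, reg="R1", d=8)
r1_after_increment = regs["R1"]
autodecrement = operand_value("autodecrement", regs, mem, reg="R1", d=8)
```

Only the last two modes modify a register, which is why they double as push and pop when R1 plays the role of a stack pointer.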
Instruction frequency
- The ten most frequent instructions on x86 (SPECint92)
Figure B.13: The top 10 instructions for the 80x86.
  Rank | 80x86 instruction | Integer average (% total executed)
  1 | load | 22%
  2 | conditional branch | 20%
  3 | compare | 16%
  4 | store | 12%
  5 | add | 8%
  6 | and | 6%
  7 | sub | 5%
  8 | move register-register | 4%
  9 | call | 1%
  10 | return | 1%
  Total | | 96%
Simple instructions dominate this list and are responsible for 96% of the instructions executed. These percentages are the average of the five SPECint92 programs.

Instructions that change control flow
- Changes to control flow
  - Conditional branches (if-then)
  - Jumps
  - Procedure calls
  - Procedure returns
Figure B.14: Breakdown of control flow instructions into three classes: calls or returns, jumps, and conditional branches (frequency of branch instructions).
  Class | Floating-point average | Integer average
  Call/return | 8% | 19%
  Jump | 10% | 6%
  Conditional branch | 82% | 75%
Conditional branches clearly dominate. Each type is counted in one of three bars. The programs and computer used to collect these statistics are the same as those in Figure B.8.

Addressing in control flow
- Explicit addressing (most common)
  - Specifies the memory address relative to the program counter (PC register)
  - Jump targets are generally close in memory
  - The address does not depend on where in memory the program is loaded
- Dynamic addressing
  - Address unknown at compile time
  - Address specified in a register
    - No limit on the address
    - Requires an instruction to place the address in a register
  - Examples
    - Function return (return address on the call stack)
    - Switch-case statements
    - Function pointers, virtual functions
    - Dynamic libraries
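The two forms of control-flow addressing can be contrasted with a few lines of arithmetic. This is a deliberately simplified sketch: it adds the offset to the address of the branch itself, whereas real instruction sets usually add it to the address of the following instruction, and all positions and table entries are hypothetical.

```python
def pc_relative_target(branch_address, offset):
    """Explicit (PC-relative) addressing: target is a fixed distance from the branch."""
    return branch_address + offset


def switch_target(jump_table, selector):
    """Dynamic addressing: the target is looked up at run time, as for a switch-case."""
    return jump_table[selector]


# Hypothetical positions inside a program image, known at compile time.
BRANCH_POSITION = 0x20
TARGET_POSITION = 0x60
OFFSET = TARGET_POSITION - BRANCH_POSITION


def target_when_loaded_at(load_base):
    """Resolve the branch after the program is loaded at load_base."""
    return pc_relative_target(load_base + BRANCH_POSITION, OFFSET)


# The same encoded offset reaches the same place in the program
# no matter where the program is loaded.
relocated = [target_when_loaded_at(base) - base for base in (0x1000, 0x8000)]

# A jump table holds absolute addresses; the chosen entry would be placed
# in a register before an indirect jump.
case_target = switch_target([0x1100, 0x1180, 0x1200], 1)
```

The offset is a compile-time constant, which is what makes PC-relative code independent of its load address. The jump-table entry is only known once the selector has been evaluated.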
For procedure calls, at a minimum the return address must be saved somewhere, sometimes in a special link register or just a GPR. Some older architectures provide a mechanism to save many registers, while newer architectures require the compiler to generate stores and loads for each register saved and restored. There are two basic conventions in use to save registers: either at the call site or inside the procedure being called. Caller saving means that the calling procedure must save the registers that it wants preserved for access after the call, and thus the called procedure need not worry about registers. Callee saving is the opposite: the called procedure must save the registers it wants to use, leaving the caller unrestrained. There are times when caller save must be used because of ...

Conditional branches
- How is it indicated whether a branch is taken?
Figure B.16: The major methods for evaluating branch conditions, their advantages, and their disadvantages.
  Condition code (CC)
    Examples: 80x86, ARM, PowerPC, SPARC, SuperH
    How condition is tested: Tests special bits set by ALU operations, possibly under program control.
    Advantages: Sometimes condition is set for free.
    Disadvantages: CC is extra state. Condition codes constrain the ordering of instructions since they pass information from one instruction to a branch.
  Condition register
    Examples: Alpha, MIPS
    How condition is tested: Tests arbitrary register with the result of a comparison.
    Advantages: Simple.
    Disadvantages: Uses up a register.
  Compare and branch
    Examples: PA-RISC, VAX
    How condition is tested: Compare is part of the branch. Often compare is limited to subset.
    Advantages: One instruction rather than two for a branch.
    Disadvantages: May be too much work per instruction for pipelined execution.
Although condition codes can be set by ALU operations that are needed for other purposes, measurements on programs show that this rarely happens. The major implementation problems with condition codes arise when the condition code is set by a large or haphazardly chosen subset of the instructions, rather than being controlled by a bit in the instruction. Computers with compare and branch often limit the set of compares and use a condition register for more complex compares. Often, different techniques are used for branches based on floating-point comparison versus those based on integer comparison. This dichotomy is reasonable since the number of branches that depend on floating-point comparisons is much smaller than the number depending on integer comparisons.
- Condition code (x86, ARM, PowerPC, etc.)
  - State of the ALU after an operation: Z (zero), N (negative), C (carry), O (overflow)
  - Constrains the execution order of instructions

Instruction set encoding
- Encoding elements
  - Opcode: specifies the operation
  - Operands or addressing mode
  - Addresses, where applicable
- Types of encoding
  - Fixed size: operation and addressing mode are both in the opcode
  - Variable size: addressing mode is specified independently
  - Hybrid: reduces variability but still allows several lengths
- Example: the x86 instruction add EAX,1000(EBX)
  - Opcode (1 byte): addition of two 32-bit integers (add)
  - Address specifier (1-2 bytes): source/destination register (EAX), addressing mode (displacement) and base register of the second operand (EBX)
  - Address (1 or 4 bytes): value of the displacement (1000, which does not fit in a signed byte and so takes 4 bytes)

Instruction encoding
Three basic variations in instruction encoding:
  (a) Variable (e.g., Intel 80x86, VAX)
    Operation and no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n
  (b) Fixed (e.g., Alpha, ARM, MIPS, PowerPC, SPARC, SuperH)
    Operation | Address field 1 | Address field 2 | Address field 3
  (c) Hybrid (e.g., IBM 360/370, MIPS16, Thumb, TI TMS320C54x)
    Operation | Address specifier | Address field
    Operation | Address specifier 1 | Address specifier 2 | Address field
    Operation | Address specifier | Address field 1 | Address field 2

Fixed size vs. variable size
- Fixed size
  - Reduces the complexity of the logic that processes instructions
  - Programs are larger than with variable size
    - Program size is a crucial factor in embedded computing
- Number of available registers
  - More registers is better for compilation
  - Eases parallelization for pipelining
  - Direct impact on instruction size (number of encoding bits)

Part II: Role of compilers

Compilation
- Modern approach to application programming
  - Applications are written in a high-level language (e.g., C, C++)
  - The executed instructions are produced by a compiler
- The compiler is a key technology in architecture
  - Architectural choices affect the compiler's ability to exploit the hardware
- Phases of compilation
  - Transform the code into an intermediate form
  - High-level optimization (loop transformations, procedure integration)
  - Global optimization (register allocation, refinements)
  - Code generation
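To make the fixed versus variable-length trade-off concrete, the x86 example add EAX,1000(EBX) from the encoding slides can be assembled by hand. The sketch below is an assumption-laden illustration rather than an assembler: it relies on the standard 32-bit x86 layout (opcode byte, then a ModRM byte holding mode, register and base fields, then a little-endian displacement), supports only a few base registers, and always uses the 32-bit displacement form. The function and table names are hypothetical.

```python
import struct

ADD_R32_RM32 = 0x03  # opcode for "add r32, r/m32" (assumed 32-bit operand size)

# Register numbers used in the ModRM byte; ESP is left out on purpose because
# using it as a base requires an extra SIB byte, which this sketch ignores.
REGISTER_NUMBERS = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3}


def encode_add_reg_mem(dest, base, displacement):
    """Encode add dest, displacement(base) with a 32-bit displacement."""
    mod = 0b10  # addressing mode: base register + 32-bit displacement
    modrm = (mod << 6) | (REGISTER_NUMBERS[dest] << 3) | REGISTER_NUMBERS[base]
    return bytes([ADD_R32_RM32, modrm]) + struct.pack("<i", displacement)


encoded = encode_add_reg_mem("EAX", "EBX", 1000)
encoded_hex = encoded.hex()
total_length = len(encoded)  # 1 opcode + 1 specifier + 4 displacement bytes
```

The instruction occupies six bytes, against four for any instruction of a fixed 32-bit format, but it does in one instruction what a load-store machine needs a load followed by an add to do.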
Phases of compilation
Figure B.19: Compilers typically consist of two to four passes, with more highly opti...
  Front end per language
    Dependencies: Language dependent; machine independent
    Function: Transform language to common intermediate form
  (Intermediate representation)
  High-level optimizations
    Dependencies: Somewhat language dependent; largely machine independent
    Function: For example, loop transformations and procedure inlining (also called procedure integration)
  Global optimizer
    Dependencies: Small language dependencies; machine dependencies slight (e.g., register counts/types)
    Function: Including global and local optimizations + register allocation
  Code generator
    Dependencies: Highly machine dependent; language independent
    Function: Detailed instruction selection and machine-dependent optimizations; may include or be followed by assembler

Compiler optimization
Figure B.20: Major types of optimizations and examples in each class. The last column gives the percentage of the total number of optimizing transforms (N.M. = not measured).
  High-level: At or near the source level; processor-independent
    Procedure integration: Replace procedure call by procedure body | N.M.
  Local: Within straight-line code
    Common subexpression elimination: Replace two instances of the same computation by single copy | 18%
    Constant propagation: Replace all instances of a variable that is assigned a constant with the constant | 22%
    Stack height reduction: Rearrange expression tree to minimize resources needed for expression evaluation | N.M.
  Global: Across a branch
    Global common subexpression elimination: Same as local, but this version crosses branches | 13%
    Copy propagation: Replace all instances of a variable A that has been assigned X (i.e., A = X) with X | 11%
    Code motion: Remove code from a loop that computes same value each iteration of the loop | 16%
    Induction variable elimination: Simplify/eliminate array addressing calculations within loops | 2%
  Processor-dependent: Depends on processor knowledge
    Strength reduction: Many examples, such as replace multiply by a constant with adds and shifts | N.M.
    Pipeline scheduling: Reorder instructions to improve pipeline performance | N.M.
    Branch offset optimization: Choose the shortest branch displacement that reaches target | N.M.
These data tell us about the relative frequency of occurrence of various optimizations. The third column lists the static frequency with which some of the common optimizations are applied in a set of 12 small FORTRAN and Pascal programs. There are nine local and glo...

Variable allocation
- Types of variable allocation
  - Local variables
    - Allocated on the stack
    - Pushed and popped as procedures are called and return
    - Can be allocated efficiently to registers
  - Global variables
    - Accessible everywhere in the program
    - Generally arrays or data structures
    - Harder to allocate to registers
  - Dynamic objects
    - Allocated on the heap
    - Accessed through pointers
    - Generally impossible to allocate to registers
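As an illustration of the local optimizations listed in Figure B.20, here is a minimal sketch of common subexpression elimination over straight-line code. The three-address tuple format `(dest, op, left, right)`, the `"copy"` pseudo-operation and the function name are assumptions made for this example. The pass is deliberately conservative: it does not exploit commutativity, and it forgets an expression as soon as one of its operands or its holder is overwritten.

```python
def eliminate_common_subexpressions(block):
    """Local CSE on a basic block given as (dest, op, left, right) tuples."""
    available = {}  # (op, left, right) -> variable currently holding that value
    optimized = []
    for dest, op, left, right in block:
        key = (op, left, right)
        holder = available.get(key)
        if holder is None:
            optimized.append((dest, op, left, right))
        else:
            # Same computation already available: reuse it instead of recomputing.
            optimized.append((dest, "copy", holder, None))
        # dest is overwritten, so drop every expression that reads dest
        # or whose value was held in dest.
        available = {
            expr: var
            for expr, var in available.items()
            if var != dest and dest not in (expr[1], expr[2])
        }
        # Record the expression unless dest is one of its own operands.
        if dest not in (left, right):
            available.setdefault(key, dest)
    return optimized


block = [
    ("t1", "+", "a", "b"),
    ("t2", "+", "a", "b"),   # same computation as t1
    ("a", "+", "t1", "c"),   # redefines a, so a + b is no longer available
    ("t3", "+", "a", "b"),   # must be recomputed
]
optimized_block = eliminate_common_subexpressions(block)
```

The second instruction becomes a copy of t1, while the fourth is left alone because a changed in between. This is the kind of check that keeps the transformation safe within straight-line code.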
Making the compiler's life easier
- General principle: make the frequent case fast and the rare case correct
- Regularity
  - Orthogonality among the choices
    - Operations
    - Data types
    - Addressing modes
  - Example: operations that access memory should be able to use every permitted addressing mode
  - Counter-example: restricting which registers an instruction can use
- Provide primitives, not solutions
  - Provide simple, general instructions
- Simplify the choices
  - When should a variable be assigned to a register?
- Allow constants to be treated as such
  - Avoid evaluating at run time values that are known at compile time

Part III: Example 1: the MIPS architecture

MIPS64 architecture
- MIPS architecture
  - Register-register (load-store) architecture
  - RISC processor
  - Developed in the mid-1980s
    - Based on Hennessy's work on pipelines in the early 1980s
  - Implements the main architectural ideas presented in the course
  - Technology behind the processors of the PS2 and PSP; now used mainly in embedded computing
- Registers
  - 32 general-purpose 64-bit registers (R0, R1, ..., R31)
  - 32 floating-point 64-bit registers (F0, F1, ..., F31)
- Addressing modes
  - Immediate: add R4, R1, #8 → R4 = R1 + 8
  - Displacement (16 bits)
    - LD R4, 30(R2) → R4 <- Mem[30+R2]
    - Register indirect: use a displacement of 0
    - Absolute: use a register holding 0 as the base (register R0)

MIPS64 architecture
- Memory access
  - Only through load and store
  - 64-bit addresses, byte addressable
  - Supports both little endian and big endian
  - Aligned accesses only
- Instruction format
  - Addressing mode is encoded in the opcode
  - 32-bit instructions, 6-bit opcode
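The way MIPS derives register indirect and absolute addressing from its single displacement mode can be sketched in a few lines. The code below is a hedged illustration, assuming a 16-bit signed displacement and a register R0 that always reads as zero; the function names and the register contents are hypothetical.

```python
MASK64 = (1 << 64) - 1


def read_register(registers, number):
    """R0 always reads as 0, whatever has been written to it."""
    return 0 if number == 0 else registers[number]


def effective_address(registers, displacement, base):
    """Address computed by the displacement mode: Regs[base] + displacement."""
    if not -(1 << 15) <= displacement < (1 << 15):
        raise ValueError("displacement does not fit in 16 signed bits")
    return (read_register(registers, base) + displacement) & MASK64


registers = [0] * 32
registers[2] = 0x2000  # hypothetical content of R2

displacement_mode = effective_address(registers, 30, 2)    # LD R4,30(R2)
register_indirect = effective_address(registers, 0, 2)     # LD R4,0(R2)
absolute = effective_address(registers, 1000, 0)           # LD R1,1000(R0)
```

One hardware mode covers three of the addressing modes discussed earlier, at the price of limiting absolute addresses to what fits in the displacement field.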
Gagné 31 / 45 B.9 Putting It All Together: The MIPS Architecture Format instructions MIPS I-type instruction 6 Opcode 5 rs 5 ■ B-35 16 rt Immediate Encodes: Loads and stores of bytes, half words, words, ‹ double words. All immediates (rt rs op immediate) Conditional branch instructions (rs is register, rd unused) Jump register, jump and link register (rd = 0, rs = destination, immediate = 0) R-type instruction 6 Opcode 5 rs 5 rt 5 5 rd shamt 6 funct Register-register ALU operations: rd rs funct rt Function encodes the data path operation: Add, Sub, . . . Read/write special registers and moves J-type instruction 6 Opcode 26 Offset added to PC Jump and jump and link Trap and return from exception GIF-3000 (U. Laval) Figure B.22 Instruction d’instructions C. of Gagné layout for Jeux MIPS. All instructions are encoded in one three 32 / 45 ■ A superscript is used to replicate a field (e.g., 048 yields a field of zeros of length 48 bits). Instructions Load et Store ■ The symbol ## is used to concatenate two fields and may appear on either side of a data transfer. 
Example instruction   Instruction name     Meaning
LD    R1,30(R2)       Load double word     Regs[R1] ←64 Mem[30+Regs[R2]]
LD    R1,1000(R0)     Load double word     Regs[R1] ←64 Mem[1000+0]
LW    R1,60(R2)       Load word            Regs[R1] ←64 (Mem[60+Regs[R2]]_0)^32 ## Mem[60+Regs[R2]]
LB    R1,40(R3)       Load byte            Regs[R1] ←64 (Mem[40+Regs[R3]]_0)^56 ## Mem[40+Regs[R3]]
LBU   R1,40(R3)       Load byte unsigned   Regs[R1] ←64 0^56 ## Mem[40+Regs[R3]]
LH    R1,40(R3)       Load half word       Regs[R1] ←64 (Mem[40+Regs[R3]]_0)^48 ## Mem[40+Regs[R3]] ## Mem[41+Regs[R3]]
L.S   F0,50(R3)       Load FP single       Regs[F0] ←64 Mem[50+Regs[R3]] ## 0^32
L.D   F0,50(R2)       Load FP double       Regs[F0] ←64 Mem[50+Regs[R2]]
SD    R3,500(R4)      Store double word    Mem[500+Regs[R4]] ←64 Regs[R3]
SW    R3,500(R4)      Store word           Mem[500+Regs[R4]] ←32 Regs[R3]_32..63
S.S   F0,40(R3)       Store FP single      Mem[40+Regs[R3]] ←32 Regs[F0]_0..31
S.D   F0,40(R3)       Store FP double      Mem[40+Regs[R3]] ←64 Regs[F0]
SH    R3,502(R2)      Store half           Mem[502+Regs[R2]] ←16 Regs[R3]_48..63
SB    R2,41(R3)       Store byte           Mem[41+Regs[R3]] ←8 Regs[R2]_56..63

Figure B.23 The load and store instructions in MIPS. All use a single addressing mode and require that the memory value be aligned. Of course, both loads and stores are available for all the data types shown.

MIPS64 architecture

- Comparison
    - Compare two registers, or a register and a constant
    - Result is placed in the destination register
- Floating point
    - Operates on 32-bit and 64-bit numbers
- Control
    - Jumps
    - Conditional branches
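The replication (superscript) and concatenation (##) notation in the load table can be read as ordinary sign or zero extension. The sketch below is a simplified model under stated assumptions: memory is a Python `bytes` object read in big-endian order (MIPS supports both byte orders), and the function names are hypothetical. It mirrors only the LB, LBU, and LW rows.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Replicate the top bit of a `bits`-wide value across a 64-bit register."""
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK64


def load_byte(memory, address):
    """LB: (Mem[a]_0)^56 ## Mem[a], i.e. sign-extend one byte to 64 bits."""
    return sign_extend(memory[address], 8)


def load_byte_unsigned(memory, address):
    """LBU: 0^56 ## Mem[a], i.e. zero-extend one byte to 64 bits."""
    return memory[address]


def load_word(memory, address):
    """LW: sign-extend an aligned 32-bit word to 64 bits."""
    if address % 4 != 0:
        raise ValueError("MIPS loads must be aligned")
    word = int.from_bytes(memory[address:address + 4], "big")
    return sign_extend(word, 32)


memory = bytes([0x80, 0x00, 0x00, 0x01, 0x7F, 0xFF, 0xFF, 0xFF])
lb_result = load_byte(memory, 0)             # top bit set, so the upper 56 bits become ones
lbu_result = load_byte_unsigned(memory, 0)   # upper 56 bits stay zero
lw_negative = load_word(memory, 0)
lw_positive = load_word(memory, 4)
```

The only difference between LB and LBU is which bit fills the upper 56 bits of the destination register: a copy of the byte's sign bit, or zero.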
Arithmetic/logical instructions

Example instruction   Instruction name         Meaning
DADDU  R1,R2,R3       Add unsigned             Regs[R1] ← Regs[R2]+Regs[R3]
DADDIU R1,R2,#3       Add immediate unsigned   Regs[R1] ← Regs[R2]+3
LUI    R1,#42         Load upper immediate     Regs[R1] ← 0^32 ## 42 ## 0^16
DSLL   R1,R2,#5       Shift left logical       Regs[R1] ← Regs[R2]<<5
SLT    R1,R2,R3       Set less than            if (Regs[R2]<Regs[R3]) Regs[R1] ← 1 else Regs[R1] ← 0

Figure B.24 Examples of arithmetic/logical instructions on MIPS, both with and without immediates.

As an example, assuming that R8 and R10 are 64-bit registers:

    Regs[R10]_32..63 ←32 (Mem[Regs[R8]]_0)^24 ## Mem[Regs[R8]]

Control instructions

Example instruction   Instruction name           Meaning
J     name            Jump                       PC_36..63 ← name
JAL   name            Jump and link              Regs[R31] ← PC+8; PC_36..63 ← name; ((PC+4)–2^27) ≤ name < ((PC+4)+2^27)
JALR  R2              Jump and link register     Regs[R31] ← PC+8; PC ← Regs[R2]
JR    R3              Jump register              PC ← Regs[R3]
BEQZ  R4,name         Branch equal zero          if (Regs[R4]==0) PC ← name; ((PC+4)–2^17) ≤ name < ((PC+4)+2^17)
BNE   R3,R4,name      Branch not equal           if (Regs[R3]!=Regs[R4]) PC ← name; ((PC+4)–2^17) ≤ name < ((PC+4)+2^17)
MOVZ  R1,R2,R3        Conditional move if zero   if (Regs[R3]==0) Regs[R1] ← Regs[R2]

Figure B.25 Typical control flow instructions in MIPS. All control instructions, except jumps to an address in a register, are PC-relative. Note that the branch distances are longer than the address field would suggest; since MIPS instructions are all 32 bits long, ...
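The branch range in the control-flow table, (PC+4) – 2^17 ≤ name < (PC+4) + 2^17, is wider than a 16-bit field would suggest because the field counts 4-byte instructions rather than bytes. A minimal sketch, assuming the usual MIPS convention that the signed 16-bit offset is relative to PC+4; the function names are hypothetical.

```python
def branch_target(pc, offset16):
    """Target of a taken branch: (PC + 4) plus the signed word offset times 4."""
    offset16 &= 0xFFFF
    if offset16 & 0x8000:               # sign-extend the 16-bit field
        offset16 -= 0x10000
    return pc + 4 + offset16 * 4


def encode_branch_offset(pc, target):
    """Inverse mapping; rejects targets outside the reachable window."""
    delta = target - (pc + 4)
    if delta % 4 != 0:
        raise ValueError("branch targets must be instruction-aligned")
    if not -(1 << 17) <= delta < (1 << 17):
        raise ValueError("target outside the +/- 2^17 byte range")
    return (delta // 4) & 0xFFFF


pc = 0x40000
farthest_forward = branch_target(pc, 0x7FFF)    # largest positive offset
farthest_backward = branch_target(pc, 0x8000)   # most negative offset
```

With 2^16 possible offsets of 4 bytes each, the reachable window spans 2^18 bytes centred on PC+4, which matches the bounds given in the table.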
MIPS execution profile, SPECint2000

Instruction        gap      gcc      gzip     mcf      perlbmk   Integer average
load               26.5%    25.1%    20.1%    30.3%    28.7%     26%
store              10.3%    13.2%     5.1%     4.3%    16.2%     10%
add                21.1%    19.0%    26.9%    10.1%    16.7%     19%
sub                 1.7%     2.2%     5.1%     3.7%     2.5%      3%
mul                1.4% and 0.1% for two of the five programs (columns not recoverable), blank elsewhere   0%
compare             2.8%     6.1%     6.6%     6.3%     3.8%      5%
load imm            4.8%     2.5%     1.5%     0.1%     1.7%      2%
cond branch         9.3%    12.1%    11.0%    17.5%    10.9%     12%
cond move           0.4%     0.6%     1.1%     0.1%     1.9%      1%
jump                0.8%     0.7%     0.8%     0.7%     1.7%      1%
call                1.6%     0.6%     0.4%     3.2%     1.1%      1%
return              1.6%     0.6%     0.4%     3.2%     1.1%      1%
shift               3.8%     1.1%     2.1%     1.1%     0.5%      2%
and                 4.3%     4.6%     9.4%     0.2%     1.2%      4%
or                  7.9%     8.5%     4.8%    17.6%     8.7%      9%
xor                 1.8%     2.1%     4.4%     1.5%     2.8%      3%
other logical       0.1%     0.4%     0.1%     0.1%     0.3%      0%
load FP                                                           0%
store FP                                                          0%
add FP                                                            0%
sub FP                                                            0%
mul FP                                                            0%
div FP                                                            0%
mov reg-reg FP                                                    0%
compare FP                                                        0%
cond mov FP                                                       0%
other FP                                                          0%

Figure B.27 MIPS dynamic instruction mix for five SPECint2000 programs. Note that integer register-register move instructions are included in the or instruction. Blank entries have the value 0.0%.

Part IV
Example 2: the x86 architecture

x86 architecture

"The x86 isn't all that complex – it just doesn't make a lot of sense." Mike Johnson, AMD, 1994.

"... its checkered ancestry has led to an architecture that is difficult to explain and impossible to love." Hennessy and Patterson, Computer Architecture, p. J-46.

Despite all this, x86 is the best-selling architecture in the world, with approximately 500 million sold!
x86 history

1978: 8086
    - Extension of the 8080 (8 bits, accumulator)
    - 16 bits, registers
    - Falls between an accumulator architecture and a general-purpose register architecture
1980: 8087
    - Floating-point coprocessor, 60 new instructions
    - Falls between a stack architecture and a general-purpose register architecture
1982: 80286
    - 24-bit address space
    - Compatible with the 8086
1985: 80386
    - 32-bit processor
1989: 80486
1992: Pentium (80586)
    - Performance improvements, 4 new instructions
1996: P6 (Pentium Pro)
    - New architecture
1997: Pentium II
    - MMX: 57 instructions
1999: Pentium III
    - SSE: 70 instructions
2001: Pentium 4
    - SSE2: 144 instructions
2003: x64 (AMD64)
    - Move to 16 general-purpose 64-bit registers
    - 32-bit compatibility mode

x86 registers

The 32-bit names apply to the 80386, 80486, and Pentium (bits 31..0); the 16-bit names apply to the 8086 and 80286 (bits 15..0); the 8-bit halves cover bits 15..8 and 7..0.

Register   32-bit   16-bit   8-bit     Use
GPR 0      EAX      AX       AH, AL    Accumulator
GPR 1      ECX      CX       CH, CL    Count reg: string, loop
GPR 2      EDX      DX       DH, DL    Data reg: multiply, divide
GPR 3      EBX      BX       BH, BL    Base addr. reg
GPR 4      ESP      SP                 Stack ptr.
GPR 5      EBP      BP                 Base ptr. (for base of stack seg.)
GPR 6      ESI      SI                 Index reg, string source ptr.
GPR 7      EDI      DI                 Index reg, string dest. ptr.
                    CS                 Code segment ptr.
                    SS                 Stack segment ptr. (top of stack)
                    DS                 Data segment ptr.
                    ES                 Extra data segment ptr.
                    FS                 Data segment ptr. 2
                    GS                 Data segment ptr. 3
PC         EIP      IP                 Instruction ptr. (PC)
           EFLAGS   FLAGS              Condition codes
FPR 0 to FPR 7 (bits 79..0)            Floating-point registers
Status (bits 15..0)                    Top of FP stack, FP condition codes

Figure J.37 The 80x86 has evolved over time, and so has its register set. The original set is shown in black, and the extended set in gray. The 8086 divided the first four registers in half so that they could be used either as one 16-bit register or as two 8-bit registers. Starting with the 80386, the top eight registers were extended to 32 bits and could ...
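The register figure describes overlapping views of the same storage: AL and AH are the two halves of AX, which is itself the low half of EAX. The sketch below models that aliasing for one register. The class and attribute names are hypothetical, chosen for this illustration; only the bit positions come from the figure.

```python
class GeneralRegister:
    """One 32-bit x86 general-purpose register with its 16-bit and 8-bit views."""

    def __init__(self, value=0):
        self.full = value & 0xFFFFFFFF          # 32-bit view, e.g. EAX

    @property
    def low16(self):                            # bits 15..0, e.g. AX
        return self.full & 0xFFFF

    @low16.setter
    def low16(self, value):
        self.full = (self.full & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def high8(self):                            # bits 15..8, e.g. AH
        return (self.full >> 8) & 0xFF

    @high8.setter
    def high8(self, value):
        self.full = (self.full & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def low8(self):                             # bits 7..0, e.g. AL
        return self.full & 0xFF

    @low8.setter
    def low8(self, value):
        self.full = (self.full & 0xFFFFFF00) | (value & 0xFF)


eax = GeneralRegister(0x12345678)
eax.low8 = 0xFF        # writing AL leaves the other 24 bits untouched
after_al = eax.full
eax.high8 = 0x00       # writing AH changes only bits 15..8
after_ah = eax.full
eax.low16 = 0xBEEF     # writing AX replaces the low 16 bits
after_ax = eax.full
```

Only the first four registers (EAX, ECX, EDX, EBX) have the 8-bit halves in the figure; for the others, the `high8` and `low8` views in this model would not correspond to original 8086 register names.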
Addressing modes

- Varied addressing modes
    - Absolute, register indirect, based, indexed, based indexed + displacement, etc.
    - Displacements of 8, 16, or 32 bits
    - Constraints on which registers can be used, depending on the mode
- Little-endian addressing

x86 instructions

Instruction           Meaning
Control               Conditional and unconditional branches
  JNZ, JZ             Jump if condition to IP + 8-bit offset; JNE (for JNZ), JE (for JZ) are alternative names
  JMP, JMPF           Unconditional jump: 8- or 16-bit offset intrasegment (near), and intersegment (far) versions
  CALL, CALLF         Subroutine call: 16-bit offset; return address pushed; near and far versions
  RET, RETF           Pops return address from stack and jumps to it; near and far versions
  LOOP                Loop branch: decrement CX; jump to IP + 8-bit displacement if CX ≠ 0
Data transfer         Move data between registers or between register and memory
  MOV                 Move between two registers or between register and memory
  PUSH                Push source operand on stack
  POP                 Pop operand from stack top to a register
  LES                 Load ES and one of the GPRs from memory
Arithmetic/logical    Arithmetic and logical operations using the data registers and memory
  ADD                 Add source to destination; register-memory format
  SUB                 Subtract source from destination; register-memory format
  CMP                 Compare source and destination; register-memory format
  SHL                 Shift left
  SHR                 Shift logical right
  RCR                 Rotate right with carry as fill
  CBW                 Convert byte in AL to word in AX
  TEST                Logical AND of source and destination sets flags
  INC                 Increment destination; register-memory format
  DEC                 Decrement destination; register-memory format
  OR                  Logical OR; register-memory format
  XOR                 Exclusive OR; register-memory format
String instructions   Move between string operands; length given by a repeat prefix
  MOVS                Copies from string source to destination; may be repeated
  LODS                Loads a byte or word of a string into the A register

Figure J.41 Some typical operations on the 80x86.
Many operations use register-memory format, where either the source or the destination may be memory and the other may be a register or immediate operand.

Instruction formats

General layout of an instruction (every field is optional except the opcode):

Field group          Contents
Prefixes             Repeat, Lock, Seg. override, Addr. override, Size override
Opcode               Opcode, Opcode ext.
Address specifiers   mod, reg, r/m; sc, index, base
Displacement         Disp8, Disp16, Disp24, Disp32
Immediate            Imm8, Imm16, Imm24, Imm32

Figure J.43 The instruction format of the 8086 (black type) and the extensions for the 80386 (shaded type). Every field is optional except the opcode.

Example                 Fields (width in bits)
a. JE PC + displacement   JE (4), Condition (4), Displacement (8)
b. CALLF                  CALLF (8), Offset (16), Segment number (16)
c. MOV BX, [DI + 45]      MOV (6), d/w (2), r-m postbyte (8), Displacement (8)
d. PUSH SI                PUSH (5), Reg (3)
e. ADD AX, #6765          ADD (4), Reg (3), w (1), Constant (16)
f. SHL BX, 1              SHL (6), v/w (2), r-r postbyte (8)
g. TEST DX, #42           TEST (7), w (1), Postbyte (8), Immediate (8)

Figure J.44 Typical 8086 instruction formats. The encoding of the postbyte is shown in Figure J.45. Many instructions contain the 1-bit field w, which says whether the oper...

x86 in brief

- It's complicated!
    - Not an orthogonal architecture
    - Assorted exceptions
- Complexity comes from incremental modifications
    - Backward compatibility is preserved
    - Based on a 16-bit processor
    - Large software base
- Excellent performance thanks to Moore's law and architectural improvements
    - From the outside, a CISC (Complex Instruction Set Computer) architecture
    - Internally, since the P6, x86 instructions are translated into RISC-like micro-operations
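To make the postbyte (mod, reg, r/m) from the instruction formats above more concrete, the sketch below splits it into its three fields and names the operands for 16-bit code. The register and addressing tables follow the standard 8086 numbering, which matches the GPR order in the register figure. The function names are hypothetical, only two of the four mod values are handled, and the example bytes for MOV BX, [DI + 45] were worked out by hand for this illustration rather than taken from the slides.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
RM16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def decode_postbyte(postbyte):
    """Split the postbyte into mod (2 bits), reg (3 bits), and r/m (3 bits)."""
    return (postbyte >> 6) & 0b11, (postbyte >> 3) & 0b111, postbyte & 0b111


def describe_operands(code):
    """Return (reg operand, r/m operand) for a 16-bit register-memory instruction.

    `code` holds the opcode byte, the postbyte, and any displacement bytes.
    Only mod = 01 (8-bit displacement) and mod = 11 (register) are handled.
    """
    mod, reg, rm = decode_postbyte(code[1])
    if mod == 0b11:
        return REG16[reg], REG16[rm]
    if mod == 0b01:
        disp = code[2] - 0x100 if code[2] & 0x80 else code[2]   # signed 8-bit displacement
        return REG16[reg], "[%s%+d]" % (RM16[rm], disp)
    raise NotImplementedError("mod = 00 and mod = 10 are omitted from this sketch")


# MOV BX, [DI + 45]: opcode byte 0x8B, postbyte 0x5D, displacement 0x2D
operands = describe_operands(bytes([0x8B, 0x5D, 0x2D]))
```

Which of the two operands is the destination is decided by the d bit in the opcode byte, which this sketch does not interpret.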