Instruction Sets
Microprocessor Architecture – GIF-3000
Professor: Christian Gagné
Week 3: September 13, 2010
GIF-3000 (U. Laval)
Part I
Characterizing instruction sets
Internal storage and registers
Type of internal storage
- Stack architecture (top of the stack)
- Accumulator architecture (accumulator register)
- General-purpose register architecture (explicit operands)
Type of register architecture
- Register-memory architecture
- Load-store architecture
- Memory-memory architecture (obsolete)
Operands and architectures
Appendix B Instruction Set Principles and Examples (p. B-4)
[Figure B.1: four processor/memory diagrams, (a) Stack, (b) Accumulator, (c) Register-memory, (d) Register-register/load-store]
Figure B.1 Operand locations for four instruction set architecture classes. The arrows indicate whether the operand is an input or the result of the ALU operation, or both an input and result. Lighter shades indicate inputs, and the dark shade indicates the result. In (a), a Top Of Stack register (TOS) points to the top input operand, which is combined with the operand below. The first operand is removed from the stack, the result takes the place of the second operand, and TOS is updated to point to the result. All operands are implicit. In (b), the Accumulator is both an implicit input operand and a result. In (c), one input operand is a register, one is in memory, and the result goes to a register. All operands are registers in (d) and, like the stack architecture, can be transferred to memory only via separate instructions: push or pop for (a) and load or store for (d).

Stack  | Accumulator | Register (register-memory) | Register (load-store)
Push A | Load A      | Load R1,A                  | Load R1,A
Push B | Add B       | Add R3,R1,B                | Load R2,B
Add    | Store C     | Store R3,C                 | Add R3,R1,R2
Pop C  |             |                            | Store R3,C
Figure B.2 The code sequence for C = A + B for four classes of instruction sets. Note that the Add instruction has implicit operands for stack and accumulator architectures, ...
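The four code sequences of Figure B.2 can be traced with a small simulation. This is only a sketch: the function names and the dictionary-based memory and register files are assumptions made for illustration, not part of the course material.

```python
# Sketch of C = A + B on the four architecture classes of Figure B.2.
# The machine models (dict memory, dict registers) are simplified assumptions.

def run_stack(mem):
    stack = []
    stack.append(mem["A"])                    # Push A
    stack.append(mem["B"])                    # Push B
    stack.append(stack.pop() + stack.pop())   # Add (operands are implicit: top two entries)
    mem["C"] = stack.pop()                    # Pop C
    return mem

def run_accumulator(mem):
    acc = mem["A"]        # Load A
    acc += mem["B"]       # Add B (accumulator is implicit source and destination)
    mem["C"] = acc        # Store C
    return mem

def run_register_memory(mem):
    regs = {}
    regs["R1"] = mem["A"]                 # Load R1,A
    regs["R3"] = regs["R1"] + mem["B"]    # Add R3,R1,B (one operand read from memory)
    mem["C"] = regs["R3"]                 # Store R3,C
    return mem

def run_load_store(mem):
    regs = {}
    regs["R1"] = mem["A"]                   # Load R1,A
    regs["R2"] = mem["B"]                   # Load R2,B
    regs["R3"] = regs["R1"] + regs["R2"]    # Add R3,R1,R2 (registers only)
    mem["C"] = regs["R3"]                   # Store R3,C
    return mem

results = [run({"A": 2, "B": 3})["C"] for run in
           (run_stack, run_accumulator, run_register_memory, run_load_store)]
```

All four models compute the same value; they differ in how many instructions are needed and in which operands each instruction names explicitly.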
Modern architectures (after 1980)
Modern architectures use general-purpose registers
- Registers are faster than memory
  - They also reduce traffic to memory
- Registers are easier for the compiler to exploit
  - E.g. (A × B) − (B × C) − (A × D)
- The compiler exploits general-purpose registers more easily
  - Allocation is harder when registers carry usage constraints
- Naming a register is more compact than naming a memory address
Number and type of operands
Number of memory addresses | Maximum number of operands allowed | Type of architecture | Examples
0 | 3 | Load-store      | Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, TM32
1 | 2 | Register-memory | IBM 360/370, Intel 80x86, Motorola 68000, TI TMS320C54x
2 | 2 | Memory-memory   | VAX (also has three-operand formats)
3 | 3 | Memory-memory   | VAX (also has two-operand formats)
Figure B.3 Typical combinations of memory operands and total operands per typical ALU instruction with examples of computers. Computers with no memory reference per ALU instruction are called load-store or register-register computers. Instructions with multiple memory operands per typical ALU instruction are called register-memory or memory-memory, according to whether they have one or more than one memory operand.
Types of architectures
Register-register (0, 3)
  Advantages: Simple, fixed-length instruction encoding. Simple code generation model. Instructions take similar numbers of clocks to execute (see App. A).
  Disadvantages: Higher instruction count than architectures with memory references in instructions. More instructions and lower instruction density leads to larger programs.
Register-memory (1, 2)
  Advantages: Data can be accessed without a separate load instruction first. Instruction format tends to be easy to encode and yields good density.
  Disadvantages: Operands are not equivalent since a source operand in a binary operation is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction vary by operand location.
Memory-memory (2, 2) or (3, 3)
  Advantages: Most compact. Doesn’t waste registers for temporaries.
  Disadvantages: Large variation in instruction size, especially for three-operand instructions. In addition, large variation in work per instruction. Memory accesses create memory bottleneck. (Not used today.)
Figure B.4 Advantages and disadvantages of the three most common types of general-purpose register computers. The notation (m, n) means m memory operands and n total operands. In general, computers with fewer alternatives simplify the compiler’s task since there are fewer decisions for the compiler to make (see Section B.8). Computers with a wide variety of flexible instruction formats reduce the number of bits required to encode the program. The number of registers also affects the instruction size since you need log2(number of registers) bits for each register specifier in an instruction. Thus, doubling the number of registers takes 3 extra bits for a register-register architecture, or about 10% of a 32-bit instruction.
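The arithmetic behind the last sentence of the Figure B.4 caption can be checked with a few lines. This is a sketch; the helper names and the choice of 32 registers as the starting point are assumptions made for illustration.

```python
def register_specifier_bits(num_registers):
    """Bits needed to name one register: ceil(log2(num_registers))."""
    return (num_registers - 1).bit_length()

def extra_bits_when_doubling(num_registers, specifiers_per_instruction=3):
    """Extra instruction bits when the register count doubles.

    A register-register instruction names three registers (two sources,
    one destination), so each extra specifier bit is paid three times.
    """
    before = register_specifier_bits(num_registers)
    after = register_specifier_bits(2 * num_registers)
    return (after - before) * specifiers_per_instruction

extra = extra_bits_when_doubling(32)   # going from 32 to 64 registers
fraction = extra / 32                  # share of a 32-bit instruction
```

With these assumptions `extra` is 3 bits and `fraction` is a little under 10%, which matches the figure quoted in the caption.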
Addressing conventions
Byte-ordering conventions
- Example: accessing the word 0Ah 0Bh 0Ch 0Dh
Little Endian convention
- Addressing: Mem[x]=0Dh; Mem[x+1]=0Ch; Mem[x+2]=0Bh; Mem[x+3]=0Ah
- Windows, Linux and Mac OS on x86 and x64
Big Endian convention
- Addressing: Mem[x]=0Ah; Mem[x+1]=0Bh; Mem[x+2]=0Ch; Mem[x+3]=0Dh
- Solaris on SPARC, Linux and Mac OS on PowerPC
No performance difference between the two conventions
- But they cause headaches when data is exchanged between computers that use different conventions
- With Little Endian, a string loaded into a register as a word appears with its characters in reverse order ("EESREVNI")
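The two byte orders can be reproduced with Python's built-in integer conversions. This is a minimal sketch; the variable names are illustrative only.

```python
word = 0x0A0B0C0D

# Byte stored at Mem[x], Mem[x+1], Mem[x+2], Mem[x+3] under each convention.
little = word.to_bytes(4, byteorder="little")
big = word.to_bytes(4, byteorder="big")

# Loading the 4 characters "INVE" as one little-endian word puts the first
# character in the least significant byte, so reading the register from
# most to least significant byte shows the characters reversed.
register = int.from_bytes(b"INVE", byteorder="little")
as_seen_in_register = register.to_bytes(4, byteorder="big")
```

Here `little` holds 0Dh 0Ch 0Bh 0Ah and `big` holds 0Ah 0Bh 0Ch 0Dh, as on the slide, and `as_seen_in_register` reads "EVNI".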
Alignment
Data items larger than one byte are aligned
- Hardware generally aligns on multiples of words or double words
- Accessing misaligned data requires several aligned memory accesses
Appendix B Instruction Set Principles and Examples (p. B-8)
Width of object       | Aligned when the 3 low-order bits of the byte address are | Misaligned when they are
1 byte (byte)         | 0 to 7                                                    | never
2 bytes (half word)   | 0, 2, 4, 6                                                | 1, 3, 5, 7
4 bytes (word)        | 0, 4                                                      | 1, 2, 3, 5, 6, 7
8 bytes (double word) | 0                                                         | 1 to 7
Figure B.5 Aligned and misaligned addresses of byte, half-word, word, and double-word objects for byte-addressed computers. For each misaligned example some objects require two memory accesses to complete. Every aligned object can always complete in one memory access, as long as the memory is as wide as the object.
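The alignment rule of Figure B.5 (an object of size s at address A is aligned when A mod s = 0) can be written down directly. This is a sketch that assumes an 8-byte-wide memory, as in the figure; the function names are illustrative.

```python
def is_aligned(address, size):
    """An object of `size` bytes is aligned if its address is a multiple of size."""
    return address % size == 0

def aligned_accesses(address, size, memory_width=8):
    """Number of memory_width-byte aligned blocks touched by the object."""
    first_block = address // memory_width
    last_block = (address + size - 1) // memory_width
    return last_block - first_block + 1
```

For example, a word at offset 6 is misaligned and straddles two 8-byte blocks, while a half word at offset 1 is misaligned but still fits in one block, which is why the caption says that only some misaligned objects need two accesses.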
Addressing modes
Addressing mode: how memory addresses are specified in instructions
- Register, immediate: no memory access
- Displacement, register indirect, indexed: memory address computed from registers
- Direct, memory indirect: memory address given in the instruction or stored in memory
- Autoincrement, autodecrement: useful for executing loops
- Scaled: useful for processing arrays of values
Using sophisticated and/or varied addressing modes
- Can reduce the number of instructions in a program
- Increases hardware complexity
- Design rule: focus on the common case
Addressing modes specify constants and registers in addition to locations in memory. When a memory location is used, the actual memory address specified by the addressing mode is called the effective address.
Frequency of the addressing mode
Addressing mode   | TeX | spice | gcc
Memory indirect   | 1%  | 6%    | 1%
Scaled            | 0%  | 16%   | 6%
Register indirect | 24% | 3%    | 11%
Immediate         | 43% | 17%   | 39%
Displacement      | 32% | 55%   | 40%
Figure B.7 Summary of use of memory addressing modes (including immediates). These major addressing modes account for all but a few percent (0% to 3%) of the memory accesses. Register modes, which are not counted, account for one-half of the operand references, while memory addressing modes (including immediate) account for the other half. Of course, the compiler affects what addressing modes are used.
Addressing modes (continued)
Addressing mode | Example instruction | Meaning | When used
Register | Add R4,R3 | Regs[R4] ← Regs[R4] + Regs[R3] | When a value is in a register.
Immediate | Add R4,#3 | Regs[R4] ← Regs[R4] + 3 | For constants.
Displacement | Add R4,100(R1) | Regs[R4] ← Regs[R4] + Mem[100+Regs[R1]] | Accessing local variables (+ simulates register indirect, direct addressing modes).
Register indirect | Add R4,(R1) | Regs[R4] ← Regs[R4] + Mem[Regs[R1]] | Accessing using a pointer or a computed address.
Indexed | Add R3,(R1+R2) | Regs[R3] ← Regs[R3] + Mem[Regs[R1]+Regs[R2]] | Sometimes useful in array addressing: R1 = base of array; R2 = index amount.
Direct or absolute | Add R1,(1001) | Regs[R1] ← Regs[R1] + Mem[1001] | Sometimes useful for accessing static data; address constant may need to be large.
Memory indirect | Add R1,@(R3) | Regs[R1] ← Regs[R1] + Mem[Mem[Regs[R3]]] | If R3 is the address of a pointer p, then mode yields *p.
Autoincrement | Add R1,(R2)+ | Regs[R1] ← Regs[R1] + Mem[Regs[R2]]; Regs[R2] ← Regs[R2] + d | Useful for stepping through arrays within a loop. R2 points to start of array; each reference increments R2 by size of an element, d.
Autodecrement | Add R1,-(R2) | Regs[R2] ← Regs[R2] - d; Regs[R1] ← Regs[R1] + Mem[Regs[R2]] | Same use as autoincrement. Autodecrement/-increment can also act as push/pop to implement a stack.
Scaled | Add R1,100(R2)[R3] | Regs[R1] ← Regs[R1] + Mem[100+Regs[R2]+Regs[R3]*d] | Used to index arrays. May be applied to any indexed addressing mode in some computers.
Figure B.6 Selection of addressing modes with examples, meaning, and usage. In autoincrement/-decrement and scaled addressing modes, the variable d designates the size of the data item being accessed (i.e., whether the instruction is accessing 1, 2, 4, or 8 bytes). These addressing modes are only useful when the elements being ...
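The "Meaning" column of Figure B.6 amounts to a rule for computing the effective address in each mode. The sketch below restates a few of those rules; the function signature, the mode names as strings, and the dictionary register file and memory are assumptions for illustration.

```python
def effective_address(mode, regs, mem, disp=0, base=None, index=None, d=1):
    """Effective address for some of the modes of Figure B.6 (illustrative model)."""
    if mode == "displacement":        # 100(R1)     -> 100 + Regs[R1]
        return disp + regs[base]
    if mode == "register_indirect":   # (R1)        -> Regs[R1]
        return regs[base]
    if mode == "indexed":             # (R1+R2)     -> Regs[R1] + Regs[R2]
        return regs[base] + regs[index]
    if mode == "direct":              # (1001)      -> 1001
        return disp
    if mode == "memory_indirect":     # @(R3)       -> Mem[Regs[R3]]
        return mem[regs[base]]
    if mode == "scaled":              # 100(R2)[R3] -> 100 + Regs[R2] + Regs[R3]*d
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"unsupported mode: {mode}")

regs = {"R1": 200, "R2": 8, "R3": 40}
mem = {40: 500}
```

With these sample values, `effective_address("scaled", regs, mem, disp=100, base="R2", index="R3", d=4)` adds the displacement, the base, and the index multiplied by the element size. Register and immediate modes are left out because they do not produce a memory address.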
Displacement addressing
- Wide variety of displacement values (log2 scale on the x-axis)
- The sign of the displacement is not shown; the majority of large displacements are negative
Appendix B Instruction Set Principles and Examples (p. B-12)
[Figure B.8: percentage of displacements (0% to 40%) against number of bits of displacement (0 to 15), with one curve for the integer average and one for the floating-point average]
Figure B.8 Displacement values are widely distributed. There are both a large number of small values and a fair number of large values. The wide distribution of displacement values is due to multiple storage areas for variables and different displacements to access them (see Section B.8) as well as the overall addressing scheme the compiler uses. The x-axis is log2 of the displacement; that is, the size of a field needed to represent the magnitude of the displacement. Zero on the x-axis shows the percentage of displacements of value 0. The graph does not include the sign bit, which is heavily affected by the storage layout. Most displacements are positive, but a majority of the largest displacements (14+ bits) are negative. Since these data were collected on a computer with 16-bit displacements, they cannot tell us about longer displacements. These data were taken on the Alpha architecture with full optimization (see Section B.8) for SPEC CPU2000, showing the average of integer programs (CINT2000) and the average of floating-point programs (CFP2000).
Immediate operands
Instruction class | Floating-point average | Integer average
Loads             | 22%                    | 23%
ALU operations    | 19%                    | 25%
All instructions  | 16%                    | 21%
Figure B.9 About one-quarter of data transfers and ALU operations have an immediate operand. The bottom bars show that integer programs use immediates in about one-fifth of the instructions, while floating-point programs use immediates in about one-sixth of the instructions. For loads, the load immediate instruction loads 16 bits into either half of a 32-bit register. Load immediates are not loads in a strict sense because they do not access memory. Occasionally a pair of load immediates is used to load a 32-bit constant, but this is rare. (For ALU operations, shifts by a constant amount are included as operations with immediate operands.) The programs and computer used to collect these statistics are the same as in Figure B.8.
[Figure B.10: percentage of immediates (0% to 45%) against number of bits needed for immediate (0 to 15), with one curve for the floating-point average and one for the integer average]
Figure B.10 The distribution of immediate values. The x-axis shows the number of bits needed to represent the magnitude of an immediate value; 0 means the immediate field value was 0. The majority of the immediate values ...
Operand types
The operand type is generally specified by the opcode
- Operations on integers, floating-point numbers, characters, etc.
- Size is given by the type
  - Characters (8 bits), single-precision floats (32 bits), double-precision floats (64 bits), etc.
Distribution of accesses by size (64-bit architecture)
Size                  | Floating-point average | Integer average
Double word (64 bits) | 70%                    | 59%
Word (32 bits)        | 29%                    | 26%
Half word (16 bits)   | 0%                     | 5%
Byte (8 bits)         | 1%                     | 10%
Figure B.11 Distribution of data accesses by size for the benchmark programs. The double-word data type is used for double-precision floating point in floating-point programs and for addresses, since the computer uses 64-bit addresses. On a 32-bit address computer the 64-bit addresses would be replaced by 32-bit addresses, and so almost all double-word accesses in integer programs would become single-word accesses.
Types of operations
Operator type | Examples
Arithmetic and logical | Integer arithmetic and logical operations: add, subtract, and, or, multiply, divide
Data transfer | Loads-stores (move instructions on computers with memory addressing)
Control | Branch, jump, procedure call and return, traps
System | Operating system call, virtual memory management instructions
Floating point | Floating-point operations: add, multiply, divide, compare
Decimal | Decimal add, decimal multiply, decimal-to-character conversions
String | String move, string compare, string search
Graphics | Pixel and vertex operations, compression/decompression operations
Figure B.12 Categories of instruction operators and examples of each. All computers generally provide a full set of operations for the first three categories. The support for system functions in the instruction set varies widely among architectures, but all computers must have some instruction support for basic system functions. The amount of support in the instruction set for the last four categories may vary from none to an ...
Instruction frequency
The ten most frequent instructions on x86 (SPECint92)
Rank  | 80x86 instruction      | Integer average (% total executed)
1     | load                   | 22%
2     | conditional branch     | 20%
3     | compare                | 16%
4     | store                  | 12%
5     | add                    | 8%
6     | and                    | 6%
7     | sub                    | 5%
8     | move register-register | 4%
9     | call                   | 1%
10    | return                 | 1%
Total |                        | 96%
Figure B.13 The top 10 instructions for the 80x86. Simple instructions dominate this list and are responsible for 96% of the instructions executed. These percentages are the average of the five SPECint92 programs.
Instructions that change control flow
Changes to control flow
- Conditional branches (if-then)
- Jumps
- Procedure calls
- Procedure returns
Frequency of branch instructions
Class              | Floating-point average | Integer average
Call/return        | 8%                     | 19%
Jump               | 10%                    | 6%
Conditional branch | 82%                    | 75%
Figure B.14 Breakdown of control flow instructions into three classes: calls or returns, jumps, and conditional branches. Conditional branches clearly dominate. Each type is counted in one of three bars. The programs and computer used to collect these statistics are the same as those in Figure B.8.
Addressing in control flow
Explicit addressing (most common)
- Specifies the memory address relative to the program counter (PC register)
- Jump targets are generally close in memory
- The address is independent of the memory region where the program is loaded
Dynamic addressing
- Address unknown at compile time
- Address specified in a register
  - No limit on the address
  - Requires an instruction to place the address in a register
- Examples
  - Function return (return address on the call stack)
  - Switch-case statements
  - Function pointers, virtual functions
  - Dynamic libraries
[...] at a minimum the return address must be saved somewhere, sometimes in a special link register or just a GPR. Some older architectures provide a mechanism to save many registers, while newer architectures require the compiler to generate stores and loads for each register saved and restored.
There are two basic conventions in use to save registers: either at the call site or inside the procedure being called. Caller saving means that the calling procedure must save the registers that it wants preserved for access after the call, and thus the called procedure need not worry about registers. Callee saving is the opposite: the called procedure must save the registers it wants to use, leaving the caller unrestrained. There are times when caller save must be used because of ...
Conditional branches
How is it indicated whether a branch is taken?
Condition code (CC)
  Examples: 80x86, ARM, PowerPC, SPARC, SuperH
  How condition is tested: Tests special bits set by ALU operations, possibly under program control.
  Advantages: Sometimes condition is set for free.
  Disadvantages: CC is extra state. Condition codes constrain the ordering of instructions since they pass information from one instruction to a branch.
Condition register
  Examples: Alpha, MIPS
  How condition is tested: Tests arbitrary register with the result of a comparison.
  Advantages: Simple.
  Disadvantages: Uses up a register.
Compare and branch
  Examples: PA-RISC, VAX
  How condition is tested: Compare is part of the branch. Often compare is limited to subset.
  Advantages: One instruction rather than two for a branch.
  Disadvantages: May be too much work per instruction for pipelined execution.
Figure B.16 The major methods for evaluating branch conditions, their advantages, and their disadvantages. Although condition codes can be set by ALU operations that are needed for other purposes, measurements on programs show that this rarely happens. The major implementation problems with condition codes arise when the condition code is set by a large or haphazardly chosen subset of the instructions, rather than being controlled by a bit in the instruction. Computers with compare and branch often limit the set of compares and use a condition register for more complex compares. Often, different techniques are used for branches based on floating-point comparison versus those based on integer comparison. This dichotomy is reasonable since the number of branches that depend on floating-point comparisons is much smaller than the number depending on integer comparisons.
Condition code (x86, ARM, PowerPC, etc.)
- State of the ALU after an operation: Z (zero), N (negative), C (carry), O (overflow)
- Constrains the execution order of instructions
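The four condition-code bits named on the slide can be derived from a 32-bit addition as follows. This is a sketch of the usual definitions, not the behaviour of any specific processor; the function name and the 32-bit width are assumptions.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b):
    """32-bit add that also returns the Z, N, C and O condition codes."""
    a &= MASK32
    b &= MASK32
    full = a + b                 # unbounded sum, used to detect the carry out
    result = full & MASK32       # what the 32-bit register would hold
    flags = {
        "Z": result == 0,                    # zero: result is 0
        "N": bool(result >> 31),             # negative: sign bit of the result
        "C": full > MASK32,                  # carry: unsigned overflow
        # overflow: operands share a sign but the result has the other sign
        "O": bool(((~(a ^ b) & (a ^ result)) >> 31) & 1),
    }
    return result, flags
```

A conditional branch such as "branch if zero" would then test `flags["Z"]` left behind by an earlier instruction, which is why the slide notes that condition codes constrain instruction ordering: nothing that overwrites the flags may be scheduled between the two.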
Encoding instruction sets
Encoding elements
- Opcode: specifies the operation
- Operands or addressing mode
- Addresses, if any
Types of encoding
- Fixed size: operation and addressing mode are both in the opcode
- Variable size: addressing mode is specified independently of the opcode
- Hybrid: reduces variability while still allowing different lengths
Example: x86 instruction
add EAX,1000(EBX)
- Opcode (1 byte): addition of two 32-bit integers (add)
- Address specifier (1-2 bytes): source/destination register (EAX), addressing mode (displacement) and base register of the second operand (EBX)
- Address (1 or 4 bytes): value of the displacement (1000)
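The three parts listed above can be assembled by hand for this one instruction. The sketch below assumes 32-bit mode and the `ADD r32, r/m32` form (opcode 03h followed by a ModR/M byte); the function name and the reduced register table are illustrative, and base registers that need an extra SIB byte (ESP) are deliberately left out.

```python
import struct

# Register numbers used in the ModR/M byte (subset, for illustration).
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3}

def encode_add_reg_mem(dst, base, disp):
    """Encode `add dst, disp(base)`, i.e. dst <- dst + Mem[base + disp]."""
    opcode = 0x03                                  # ADD r32, r/m32
    if -128 <= disp <= 127:
        mod, disp_bytes = 0b01, struct.pack("<b", disp)   # 1-byte displacement
    else:
        mod, disp_bytes = 0b10, struct.pack("<i", disp)   # 4-byte displacement
    # ModR/M = mod (2 bits) | destination register (3 bits) | base register (3 bits)
    modrm = (mod << 6) | (REG32[dst] << 3) | REG32[base]
    return bytes([opcode, modrm]) + disp_bytes

encoded = encode_add_reg_mem("EAX", "EBX", 1000)
```

Under these assumptions the instruction takes 6 bytes: one opcode byte, one address-specifier byte, and a 4-byte little-endian displacement. A displacement that fits in a signed byte would shrink it to 3 bytes, which illustrates the variable-length encoding discussed on the slide.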
Instruction encoding
(a) Variable (e.g., Intel 80x86, VAX)
  Operation and no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n
(b) Fixed (e.g., Alpha, ARM, MIPS, PowerPC, SPARC, SuperH)
  Operation | Address field 1 | Address field 2 | Address field 3
(c) Hybrid (e.g., IBM 360/370, MIPS16, Thumb, TI TMS320C54x)
  Operation | Address specifier | Address field
  Operation | Address specifier 1 | Address specifier 2 | Address field
  Operation | Address specifier | Address field 1 | Address field 2
Fixed size vs. variable size
Fixed size
- Reduces the complexity of the logic that processes instructions
- Programs are larger than with variable-size encoding
  - Program size is a crucial factor in embedded computing
Number of available registers
- More registers is better for compilation
- Makes parallelization for pipelining easier
- Direct impact on instruction size (number of encoding bits)
Part II
Role of compilers
Compilation
Modern approach to application programming
- Applications are written in a high-level language (e.g. C, C++)
- The instructions executed are produced by a compiler
The compiler is a key technology in architecture
- Architectural choices affect the compiler's ability to exploit the hardware
Compilation phases
- Transform the code into an intermediate form
- High-level optimization (loop transformations, procedure integration)
- Global optimization (register allocation, refinements)
- Code generation
Compilation phases
Pass | Dependencies | Function
Front end per language | Language dependent; machine independent | Transform language to common intermediate form
(intermediate representation passed to the next pass)
High-level optimizations | Somewhat language dependent; largely machine independent | For example, loop transformations and procedure inlining (also called procedure integration)
Global optimizer | Small language dependencies; machine dependencies slight (e.g., register counts/types) | Including global and local optimizations + register allocation
Code generator | Highly machine dependent; language independent | Detailed instruction selection and machine-dependent optimizations; may include or be followed by assembler
Figure B.19 Compilers typically consist of two to four passes, ...
Compiler optimizations
Appendix B Instruction Set Principles and Examples (p. B-28)
Optimization name | Explanation | Percentage of the total number of optimizing transforms
High-level: at or near the source level; processor-independent
  Procedure integration | Replace procedure call by procedure body | N.M.
Local: within straight-line code
  Common subexpression elimination | Replace two instances of the same computation by single copy | 18%
  Constant propagation | Replace all instances of a variable that is assigned a constant with the constant | 22%
  Stack height reduction | Rearrange expression tree to minimize resources needed for expression evaluation | N.M.
Global: across a branch
  Global common subexpression elimination | Same as local, but this version crosses branches | 13%
  Copy propagation | Replace all instances of a variable A that has been assigned X (i.e., A = X) with X | 11%
  Code motion | Remove code from a loop that computes same value each iteration of the loop | 16%
  Induction variable elimination | Simplify/eliminate array addressing calculations within loops | 2%
Processor-dependent: depends on processor knowledge
  Strength reduction | Many examples, such as replace multiply by a constant with adds and shifts | N.M.
  Pipeline scheduling | Reorder instructions to improve pipeline performance | N.M.
  Branch offset optimization | Choose the shortest branch displacement that reaches target | N.M.
Figure B.20 Major types of optimizations and examples in each class. These data tell us about the relative frequency of occurrence of various optimizations. The third column lists the static frequency with which some of the common optimizations are applied in a set of 12 small FORTRAN and Pascal programs. There are nine local and global ...
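Strength reduction, listed in Figure B.20 as "replace multiply by a constant with adds and shifts", can be illustrated on a multiplication by 10. This sketch only shows the source-level equivalence a compiler relies on; the function names are illustrative.

```python
def times_ten(x):
    """Original form: one multiply by a constant."""
    return x * 10

def times_ten_reduced(x):
    """Strength-reduced form: 10*x = 8*x + 2*x = (x << 3) + (x << 1)."""
    return (x << 3) + (x << 1)

# Both forms agree on a range of sample inputs, including negative values.
equivalent = all(times_ten(x) == times_ten_reduced(x) for x in range(-1000, 1000))
```

Whether the rewritten form is actually faster depends on the relative cost of multiply, shift and add on the target processor, which is why the figure classifies this optimization as processor-dependent.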
Variable allocation
Types of variable allocation
- Local variables
  - Allocated on the stack
  - Pushed and popped as procedures are called and return
  - Can be allocated efficiently to registers
- Global variables
  - Accessible everywhere in the program
  - Generally arrays or data structures
  - Harder to allocate to registers
- Dynamic objects
  - Allocated on the heap
  - Accessed through pointers
  - Generally impossible to allocate to registers
Making the compiler's life easier
General principle: make the frequent case fast and the rare case correct
Regularity
- Orthogonality among the choices
  - Operations
  - Data types
  - Addressing modes
- Example: operations that access memory should be able to use every permitted addressing mode
- Counter-example: restricting which registers an instruction can access
Provide primitives, not solutions
- Provide simple, general instructions
Simplify the choices
- When should a variable be assigned to a register?
Allow constants to be treated as such
- Avoid evaluating at run time values that are known at compile time
Part III
Example 1: the MIPS architecture
MIPS64 architecture
MIPS architecture
- Register-register (load-store) architecture
- RISC processor
- Developed in the mid-1980s
  - Work by Hennessy on pipelines, early 1980s
- Implements the main architectural ideas presented in the course
- Processor technology of the PS2 and PSP, now used mainly in embedded computing
Registers
- 32 64-bit general-purpose registers (R0, R1, ..., R31)
- 32 64-bit floating-point registers (F0, F1, ..., F31)
Addressing modes
- Immediate: add R4, R1, #8 → R4 = R1 + 8
- Displacement (16 bits)
  - LD R4, 30(R2) → R4 ← Mem[30+R2]
  - Register indirect: use a displacement of 0
  - Absolute: use R0, which always holds 0, as the base register
MIPS64 architecture (continued)
Memory access
- Only through load and store instructions
- 64-bit addresses, byte addressable
- Supports both little endian and big endian
- Aligned accesses only
Instruction format
- Addressing mode is encoded in the opcode
- 32-bit instructions, 6-bit opcode
MIPS instruction formats
I-type instruction
  Opcode (6 bits) | rs (5) | rt (5) | Immediate (16)
  Encodes: Loads and stores of bytes, half words, words, double words. All immediates (rt ← rs op immediate)
  Conditional branch instructions (rs is register, rd unused)
  Jump register, jump and link register (rd = 0, rs = destination, immediate = 0)
R-type instruction
  Opcode (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
  Register-register ALU operations: rd ← rs funct rt
  Function encodes the data path operation: Add, Sub, ...
  Read/write special registers and moves
J-type instruction
  Opcode (6 bits) | Offset added to PC (26)
  Jump and jump and link
  Trap and return from exception
Figure B.22 Instruction layout for MIPS. All instructions are encoded in one of three ...
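The field widths of Figure B.22 translate directly into shifts and masks. The sketch below packs the three formats into 32-bit words; the function names are illustrative, and the opcode and funct values used in the example (23h for LW, 21h for ADDU) are taken from the standard MIPS32 encoding rather than from the slides.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

def encode_r_type(opcode, rs, rt, rd, shamt, funct):
    """Opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_j_type(opcode, offset):
    """Opcode(6) | offset(26)."""
    return (opcode << 26) | (offset & 0x3FFFFFF)

LW_OPCODE = 0x23     # assumption: standard MIPS32 opcode for LW
ADDU_FUNCT = 0x21    # assumption: standard MIPS32 funct for ADDU (opcode 0)

lw_word = encode_i_type(LW_OPCODE, rs=2, rt=1, immediate=60)    # LW R1,60(R2)
addu_word = encode_r_type(0, rs=2, rt=3, rd=1, shamt=0, funct=ADDU_FUNCT)
```

Because every format is 32 bits wide and the opcode always sits in the top 6 bits, the decoder can locate the register fields before it knows which instruction it is looking at, which is the benefit of fixed-size encoding mentioned earlier.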
■
A superscript is used to replicate a field (e.g., 048 yields a field of zeros of
length 48 bits).
Instructions Load et Store
■
The symbol ## is used to concatenate two fields and may appear on either
side of a data transfer.
Example instruction   Instruction name     Meaning
LD R1,30(R2)          Load double word     Regs[R1] ←64 Mem[30+Regs[R2]]
LD R1,1000(R0)        Load double word     Regs[R1] ←64 Mem[1000+0]
LW R1,60(R2)          Load word            Regs[R1] ←64 (Mem[60+Regs[R2]]_0)^32 ## Mem[60+Regs[R2]]
LB R1,40(R3)          Load byte            Regs[R1] ←64 (Mem[40+Regs[R3]]_0)^56 ## Mem[40+Regs[R3]]
LBU R1,40(R3)         Load byte unsigned   Regs[R1] ←64 0^56 ## Mem[40+Regs[R3]]
LH R1,40(R3)          Load half word       Regs[R1] ←64 (Mem[40+Regs[R3]]_0)^48 ## Mem[40+Regs[R3]] ## Mem[41+Regs[R3]]
L.S F0,50(R3)         Load FP single       Regs[F0] ←64 Mem[50+Regs[R3]] ## 0^32
L.D F0,50(R2)         Load FP double       Regs[F0] ←64 Mem[50+Regs[R2]]
SD R3,500(R4)         Store double word    Mem[500+Regs[R4]] ←64 Regs[R3]
SW R3,500(R4)         Store word           Mem[500+Regs[R4]] ←32 Regs[R3]_32..63
S.S F0,40(R3)         Store FP single      Mem[40+Regs[R3]] ←32 Regs[F0]_0..31
S.D F0,40(R3)         Store FP double      Mem[40+Regs[R3]] ←64 Regs[F0]
SH R3,502(R2)         Store half           Mem[502+Regs[R2]] ←16 Regs[R3]_48..63
SB R2,41(R3)          Store byte           Mem[41+Regs[R3]] ←8 Regs[R2]_56..63

Figure B.23 The load and store instructions in MIPS. All use a single addressing mode and require that the memory value be aligned. Of course, both loads and stores are available for all the data types shown.
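
The difference between LB, LBU and LH comes down to how the loaded value is extended to 64 bits. The following sketch models memory as a Python `bytes` object and reads it in big-endian order, which matches the concatenation order used in the figure (MIPS itself supports both byte orders). The `load` helper and the memory contents are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1

def load(mem, addr, size, signed=True):
    """Read size bytes at addr and extend the value to a 64-bit register image.

    signed=True replicates the sign bit (LB, LH, LW); signed=False fills the
    upper bits with zeros (LBU).
    """
    if addr % size != 0:
        raise ValueError("unaligned access")   # MIPS requires aligned accesses
    value = int.from_bytes(mem[addr:addr + size], "big", signed=signed)
    return value & MASK64

mem = bytes([0x80, 0x01, 0x02, 0x03])   # arbitrary example contents

lb = load(mem, 0, 1)            # LB:  sign bit of 0x80 replicated 56 times
lbu = load(mem, 0, 1, False)    # LBU: upper 56 bits are zero
lh = load(mem, 0, 2)            # LH:  sign bit of 0x8001 replicated 48 times
```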
MIPS64 architecture
Comparison
- Compares two registers, or a register and a constant
- The result is written to a destination register
Floating point
- Operates on 32-bit and 64-bit numbers
Control
- Jumps
- Conditional branches
Arithmetic/logical instructions

Example instruction   Instruction name         Meaning
DADDU R1,R2,R3        Add unsigned             Regs[R1] ← Regs[R2] + Regs[R3]
DADDIU R1,R2,#3       Add immediate unsigned   Regs[R1] ← Regs[R2] + 3
LUI R1,#42            Load upper immediate     Regs[R1] ← 0^32 ## 42 ## 0^16
DSLL R1,R2,#5         Shift left logical       Regs[R1] ← Regs[R2] << 5
SLT R1,R2,R3          Set less than            if (Regs[R2] < Regs[R3]) Regs[R1] ← 1 else Regs[R1] ← 0

Figure B.24 Examples of arithmetic/logical instructions on MIPS, both with and without immediates.

As an example of the notation, assuming that R8 and R10 are 64-bit registers:
Regs[R10]_32..63 ←32 (Mem[Regs[R8]]_0)^24 ## Mem[Regs[R8]]
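
The register-transfer descriptions in the table translate almost directly into integer arithmetic. The sketch below follows the definitions given in the figure; the function names are hypothetical, and registers are modeled as plain Python integers masked to 64 bits.

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Reinterpret a 64-bit register image as a signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def daddu(a, b):
    """Regs[rd] <- Regs[rs] + Regs[rt], wrapping modulo 2**64."""
    return (a + b) & MASK64

def lui(imm16):
    """Regs[rt] <- 0^32 ## imm ## 0^16, as written in the figure."""
    return (imm16 & 0xFFFF) << 16

def dsll(a, shamt):
    """Regs[rd] <- Regs[rs] << shamt, discarding bits shifted out."""
    return (a << shamt) & MASK64

def slt(a, b):
    """Regs[rd] <- 1 if Regs[rs] < Regs[rt] (signed comparison), else 0."""
    return 1 if to_signed64(a) < to_signed64(b) else 0
```

For instance, `lui(42)` places 42 in bits 16 to 31 of the result, and `slt(MASK64, 1)` returns 1 because the all-ones pattern is read as -1.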
Control instructions

Example instruction   Instruction name          Meaning
J name                Jump                      PC_36..63 ← name
JAL name              Jump and link             Regs[R31] ← PC+8; PC_36..63 ← name; ((PC+4)-2^27) ≤ name < ((PC+4)+2^27)
JALR R2               Jump and link register    Regs[R31] ← PC+8; PC ← Regs[R2]
JR R3                 Jump register             PC ← Regs[R3]
BEQZ R4,name          Branch equal zero         if (Regs[R4]==0) PC ← name; ((PC+4)-2^17) ≤ name < ((PC+4)+2^17)
BNE R3,R4,name        Branch not equal zero     if (Regs[R3] != Regs[R4]) PC ← name; ((PC+4)-2^17) ≤ name < ((PC+4)+2^17)
MOVZ R1,R2,R3         Conditional move if zero  if (Regs[R3]==0) Regs[R1] ← Regs[R2]

Figure B.25 Typical control flow instructions in MIPS. All control instructions, except jumps to an address in a register, are PC-relative. Note that the branch distances are longer than the address field would suggest; since MIPS instructions are all 32 bits long,
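
The branch range of ±2^17 bytes quoted in the figure follows from a 16-bit signed field that counts 4-byte instructions rather than bytes. A minimal sketch of the target computation, with a hypothetical function name:

```python
def branch_target(pc, offset16):
    """Target = (PC + 4) + 4 * sign-extended 16-bit word offset."""
    offset16 &= 0xFFFF
    if offset16 & 0x8000:
        offset16 -= 0x10000          # sign-extend the 16-bit field
    return pc + 4 + (offset16 << 2)  # multiply by the 4-byte instruction size

farthest_forward = branch_target(0x40000, 0x7FFF)    # largest positive offset
farthest_backward = branch_target(0x40000, 0x8000)   # most negative offset
```

With PC = 0x40000 the reachable targets span from (PC+4) - 2^17 up to (PC+4) + 2^17 - 4, which matches the bounds on `name` in the table.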
MIPS execution profile, SPECint2000

Instruction       gap      gcc      gzip     mcf      perlbmk  Integer average
load              26.5%    25.1%    20.1%    30.3%    28.7%    26%
store             10.3%    13.2%    5.1%     4.3%     16.2%    10%
add               21.1%    19.0%    26.9%    10.1%    16.7%    19%
sub               1.7%     2.2%     5.1%     3.7%     2.5%     3%
mul               1.4%     0.1%                                0%
compare           2.8%     6.1%     6.6%     6.3%     3.8%     5%
load imm          4.8%     2.5%     1.5%     0.1%     1.7%     2%
cond branch       9.3%     12.1%    11.0%    17.5%    10.9%    12%
cond move         0.4%     0.6%     1.1%     0.1%     1.9%     1%
jump              0.8%     0.7%     0.8%     0.7%     1.7%     1%
call              1.6%     0.6%     0.4%     3.2%     1.1%     1%
return            1.6%     0.6%     0.4%     3.2%     1.1%     1%
shift             3.8%     1.1%     2.1%     1.1%     0.5%     2%
and               4.3%     4.6%     9.4%     0.2%     1.2%     4%
or                7.9%     8.5%     4.8%     17.6%    8.7%     9%
xor               1.8%     2.1%     4.4%     1.5%     2.8%     3%
other logical     0.1%     0.4%     0.1%     0.1%     0.3%     0%
load FP                                                        0%
store FP                                                       0%
add FP                                                         0%
sub FP                                                         0%
mul FP                                                         0%
div FP                                                         0%
mov reg-reg FP                                                 0%
compare FP                                                     0%
cond mov FP                                                    0%
other FP                                                       0%

Figure B.27 MIPS dynamic instruction mix for five SPECint2000 programs. Note that integer register-register move instructions are included in the or instruction. Blank entries have the value 0.0%.
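
One way to read the integer-average column is to sum it by broad class. The values below are copied from the figure; the grouping into memory, control transfer and ALU classes is an assumption made for this illustration, not a classification given in the figure.

```python
# Integer-average column of Figure B.27, in percent.
integer_average = {
    "load": 26, "store": 10, "add": 19, "sub": 3, "mul": 0, "compare": 5,
    "load imm": 2, "cond branch": 12, "cond move": 1, "jump": 1, "call": 1,
    "return": 1, "shift": 2, "and": 4, "or": 9, "xor": 3, "other logical": 0,
}

# Hypothetical grouping, chosen for illustration only.
groups = {
    "memory": ["load", "store"],
    "control": ["cond branch", "jump", "call", "return"],
}
grouped = {name for members in groups.values() for name in members}
groups["alu"] = [name for name in integer_average if name not in grouped]

totals = {
    group: sum(integer_average[name] for name in members)
    for group, members in groups.items()
}
total = sum(integer_average.values())   # below 100 because the column is rounded
```

Under this grouping, memory accesses account for roughly a third of the executed instructions and control transfers for about one in seven.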
Part IV
Example 2: x86 architecture
x86 architecture
"The x86 isn't all that complex – it just doesn't make a lot of sense."
Mike Johnson, AMD, 1994.
"... its checkered ancestry has led to an architecture that is difficult to explain and impossible to love."
Hennessy and Patterson, Computer Architecture, p. J-46.
Despite all this, x86 is the best-selling architecture in the world, at approximately 500 million!
x86 history
1978: 8086
- Extension of the 8080 (8 bits, accumulator)
- 16 bits, registers
- Between an accumulator architecture and a general-purpose register architecture
1980: 8087
- Floating-point coprocessor, 60 new instructions
- Between a stack architecture and a general-purpose register architecture
1982: 80286
- 24-bit address space
- Compatibility with the 8086
1985: 80386
- 32-bit processor
1989: 80486
1992: Pentium (80586)
- Performance improvements, 4 new instructions
1996: P6 (Pentium Pro)
- New architecture
1997: Pentium II
- MMX: 57 instructions
1999: Pentium III
- SSE: 70 instructions
2001: Pentium 4
- SSE2: 144 instructions
2003: x64 (AMD64)
- Move to 16 64-bit general-purpose registers
- 32-bit compatibility mode
x86 registers

General-purpose registers (32-bit names: 80x386, 80x486, Pentium; 16-bit and 8-bit names: 80x86, 80x286)
Register   32-bit   16-bit   8-bit (high, low)   Use
GPR 0      EAX      AX       AH, AL              Accumulator
GPR 1      ECX      CX       CH, CL              Count reg: string, loop
GPR 2      EDX      DX       DH, DL              Data reg: multiply, divide
GPR 3      EBX      BX       BH, BL              Base addr. reg
GPR 4      ESP      SP                           Stack ptr.
GPR 5      EBP      BP                           Base ptr. (for base of stack seg.)
GPR 6      ESI      SI                           Index reg, string source ptr.
GPR 7      EDI      DI                           Index reg, string dest. ptr.

Segment registers (16 bits)
CS   Code segment ptr.
SS   Stack segment ptr. (top of stack)
DS   Data segment ptr.
ES   Extra data segment ptr.
FS   Data segment ptr. 2
GS   Data segment ptr. 3

Program counter and flags
EIP / IP          Instruction ptr. (PC)
EFLAGS / FLAGS    Condition codes

Floating-point registers
FPR 0 to FPR 7    80 bits each (bits 79..0)
Status            16 bits (bits 15..0): top of FP stack, FP condition codes

Figure J.37 The 80x86 has evolved over time, and so has its register set. The original set is shown in black, and the extended set in gray. The 8086 divided the first four registers in half so that they could be used either as one 16-bit register or as two 8-bit registers. Starting with the 80386, the top eight registers were extended to 32 bits and could
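
The overlapping register names (EAX, AX, AH, AL) are views onto the same storage rather than separate registers. The sketch below models one such register in Python; the class name `Reg32` and its attribute names are hypothetical and serve only to illustrate the aliasing.

```python
class Reg32:
    """One 32-bit register with 16-bit and 8-bit views, e.g. EAX / AX / AH / AL."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF          # full 32-bit value (EAX)

    @property
    def x(self):                             # low 16 bits (AX)
        return self.e & 0xFFFF

    @x.setter
    def x(self, value):
        self.e = (self.e & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def h(self):                             # bits 15..8 (AH)
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, value):
        self.e = (self.e & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def l(self):                             # bits 7..0 (AL)
        return self.e & 0xFF

    @l.setter
    def l(self, value):
        self.e = (self.e & 0xFFFFFF00) | (value & 0xFF)

eax = Reg32(0x12345678)
ax_before = eax.x        # 0x5678
eax.h = 0xAB             # writing AH changes AX and EAX, but not AL
```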
Addressing modes
Varied addressing modes
- Absolute, register indirect, based, indexed, based indexed with displacement, etc.
- Displacements of 8, 16, or 32 bits
- Restrictions on which registers can be used, depending on the mode
Little-endian addressing
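
Most of the modes listed above are special cases of one formula, base + index × scale + displacement, with some terms left out. The sketch below illustrates that formula and little-endian byte order. The function name is hypothetical, and the `scale` parameter is an assumption taken from the scaled modes introduced with the 80386; it defaults to 1 so that the unscaled modes are covered as well.

```python
def effective_address(base=0, index=0, scale=1, displacement=0, bits=32):
    """Effective address = base + index * scale + displacement, modulo 2**bits."""
    return (base + index * scale + displacement) & ((1 << bits) - 1)

absolute = effective_address(displacement=0x2000)
register_indirect = effective_address(base=0x1000)
based_indexed_disp = effective_address(base=0x1000, index=4, scale=2, displacement=45)

# Little endian: the least significant byte is stored at the lowest address.
little = (0x12345678).to_bytes(4, "little")
```

The restrictions mentioned in the slide (which registers may serve as base or index) are not modeled here.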
x86 instructions

Instruction     Meaning

Control         Conditional and unconditional branches
JNZ, JZ         Jump if condition to IP + 8-bit offset; JNE (for JNZ), JE (for JZ) are alternative names
JMP, JMPF       Unconditional jump: 8- or 16-bit offset intrasegment (near), and intersegment (far) versions
CALL, CALLF     Subroutine call: 16-bit offset; return address pushed; near and far versions
RET, RETF       Pops return address from stack and jumps to it; near and far versions
LOOP            Loop branch: decrement CX; jump to IP + 8-bit displacement if CX ≠ 0

Data transfer   Move data between registers or between register and memory
MOV             Move between two registers or between register and memory
PUSH            Push source operand on stack
POP             Pop operand from stack top to a register
LES             Load ES and one of the GPRs from memory

Arithmetic/logical   Arithmetic and logical operations using the data registers and memory
ADD             Add source to destination; register-memory format
SUB             Subtract source from destination; register-memory format
CMP             Compare source and destination; register-memory format
SHL             Shift left
SHR             Shift logical right
RCR             Rotate right with carry as fill
CBW             Convert byte in AL to word in AX
TEST            Logical AND of source and destination sets flags
INC             Increment destination; register-memory format
DEC             Decrement destination; register-memory format
OR              Logical OR; register-memory format
XOR             Exclusive OR; register-memory format

String instructions   Move between string operands; length given by a repeat prefix
MOVS            Copies from string source to destination; may be repeated
LODS            Loads a byte or word of a string into the A register

Figure J.41 Some typical operations on the 80x86. Many operations use register-memory format, where either the source or the destination may be memory and the other may be a register or immediate operand.
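
As an example of how implicit operands work, LOOP uses CX without naming it: it decrements CX and branches back while the result is nonzero. The sketch below emulates that behavior for a loop body followed by a LOOP instruction; the function name `run_loop` is hypothetical and the body is passed in as an ordinary Python callable.

```python
def run_loop(cx, body):
    """Emulate a loop body followed by `LOOP label`, with a 16-bit CX counter.

    Returns the number of times the body was executed.
    """
    iterations = 0
    while True:
        body()                     # instructions between the label and LOOP
        iterations += 1
        cx = (cx - 1) & 0xFFFF     # LOOP decrements CX first...
        if cx == 0:                # ...and falls through once CX reaches 0
            return iterations

three = run_loop(3, lambda: None)
```

Since the decrement happens before the test, entering the loop with CX = 0 wraps the counter around and runs the body 65536 times.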
Instruction formats

General format (field groups and their possible contents)
Prefixes             Repeat, Lock, Seg. override, Addr. override, Size override
Opcode               Opcode, Opcode ext.
Address specifiers   mod, reg, r/m; sc, index, base
Displacement         Disp8, Disp16, Disp24, Disp32
Immediate            Imm8, Imm16, Imm24, Imm32

Figure J.43 The instruction format of the 8086 (black type) and the extensions for the 80386 (shaded type). Every field is optional except the opcode.

Typical formats (field widths in bits)
a. JE PC + displacement   JE (4) | Condition (4) | Displacement (8)
b. CALLF                  CALLF (8) | Offset (16) | Segment number (16)
c. MOV BX, [DI + 45]      MOV (6) | d/w (2) | r-m postbyte (8) | Displacement (8)
d. PUSH SI                PUSH (5) | Reg (3)
e. ADD AX, #6765          ADD (4) | Reg (3) | w (1) | Constant (16)
f. SHL BX, 1              SHL (6) | v/w (2) | r-r postbyte (8)
g. TEST DX, #42           TEST (7) | w (1) | Postbyte (8) | Immediate (8)

Figure J.44 Typical 8086 instruction formats. The encoding of the postbyte is shown in Figure J.45. Many instructions contain the 1-bit field w, which says whether the operation is a byte or a word.
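
Two of the formats above are small enough to assemble by hand. The sketch below packs the fields of format (a), JE with an 8-bit displacement, and format (d), PUSH of a 16-bit register, using the register numbers from the register figure (AX = 0 through DI = 7). The function names are hypothetical; the bit patterns are the standard 8086 encodings as I understand them.

```python
# 16-bit register numbers, in the GPR order of the register figure.
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}

def encode_je(displacement):
    """Format (a): 4-bit opcode 0111, 4-bit condition 0100 (equal), 8-bit displacement."""
    if not -128 <= displacement <= 127:
        raise ValueError("displacement does not fit in 8 bits")
    return bytes([(0b0111 << 4) | 0b0100, displacement & 0xFF])

def encode_push_reg16(name):
    """Format (d): 5-bit opcode 01010 followed by a 3-bit register number."""
    return bytes([(0b01010 << 3) | REG16[name]])

je_forward = encode_je(5)
push_si = encode_push_reg16("SI")
```

PUSH SI fits in a single byte while format (b), CALLF, needs five, which illustrates the variable-length encoding that sets x86 apart from the fixed 32-bit MIPS formats.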
x86 in brief
It's complicated!
- Not an orthogonal architecture
- Many exceptions
Complexity stems from incremental modifications
- Maintains backward compatibility
- Based on a 16-bit processor
- Large installed software base
Excellent performance thanks to Moore's law and architectural improvements
- Externally, a CISC (Complex Instruction Set Computer) architecture
- Internally, emulates a RISC architecture since the P6