Instruction Sets
Microprocessor Architecture – GIF-3000
Professor: Christian Gagné
Week 3: September 13, 2010
GIF-3000 (U. Laval)
C. Gagné

Part I
Characterization of Instruction Sets

Internal storage and registers
Types of internal storage
- Stack architecture (top of the stack)
- Accumulator architecture (accumulator register)
- General-purpose register architecture (explicit operands)
Types of register architectures
- Register-memory architecture
- Load-store architecture
- Memory-memory architecture (obsolete)

Operands and architectures
[Figure B.1: block diagrams of the processor (TOS or registers, ALU) and memory for (a) Stack, (b) Accumulator, (c) Register-memory, (d) Register-register/load-store]
Figure B.1 Operand locations for four instruction set architecture classes. The arrows indicate whether the operand is an input or the result of the ALU operation, or both an input and result. Lighter shades indicate inputs, and the dark shade indicates the result. In (a), a Top Of Stack register (TOS) points to the top input operand, which is combined with the operand below. The first operand is removed from the stack, the result takes the place of the second operand, and TOS is updated to point to the result. All operands are implicit. In (b), the Accumulator is both an implicit input operand and a result. In (c), one input operand is a register, one is in memory, and the result goes to a register. All operands are registers in (d) and, like the stack architecture, can be transferred to memory only via separate instructions: push or pop for (a) and load or store for (d).

Stack     Accumulator   Register            Register
                        (register-memory)   (load-store)
Push A    Load A        Load R1,A           Load R1,A
Push B    Add B         Add R3,R1,B         Load R2,B
Add       Store C       Store R3,C          Add R3,R1,R2
Pop C                                       Store R3,C
Figure B.2 The code sequence for C = A + B for four classes of instruction sets. Note that the Add instruction has implicit operands for stack and accumulator architectures.

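The four code sequences of Figure B.2 can be mimicked with a few lines of Python. This is only a sketch: memory is modelled as a dictionary of named cells, and the function names (`run_stack`, `run_accumulator`, `run_register_memory`, `run_load_store`, `compute_c`) are made up for the illustration.

```python
# Toy models of the four operand styles computing C = A + B.
# Memory is a dict of named cells; register files are plain dicts.

def run_stack(mem):
    stack = []
    stack.append(mem["A"])        # Push A
    stack.append(mem["B"])        # Push B
    top = stack.pop()             # Add: both operands are implicit,
    below = stack.pop()           #      taken from the top of the stack
    stack.append(below + top)     #      and replaced by the result
    mem["C"] = stack.pop()        # Pop C

def run_accumulator(mem):
    acc = mem["A"]                # Load A
    acc = acc + mem["B"]          # Add B (accumulator is implicit input and result)
    mem["C"] = acc                # Store C

def run_register_memory(mem):
    regs = {}
    regs["R1"] = mem["A"]                 # Load R1,A
    regs["R3"] = regs["R1"] + mem["B"]    # Add R3,R1,B (one operand read from memory)
    mem["C"] = regs["R3"]                 # Store R3,C

def run_load_store(mem):
    regs = {}
    regs["R1"] = mem["A"]                 # Load R1,A
    regs["R2"] = mem["B"]                 # Load R2,B
    regs["R3"] = regs["R1"] + regs["R2"]  # Add R3,R1,R2 (registers only)
    mem["C"] = regs["R3"]                 # Store R3,C

def compute_c(runner, a, b):
    """Run one of the models on fresh memory and return the value stored in C."""
    mem = {"A": a, "B": b, "C": None}
    runner(mem)
    return mem["C"]
```

All four models leave the same value in C; they differ only in how many instructions are needed and in which operands each instruction names explicitly.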
Modern architectures (after 1980)
Modern architectures use general-purpose registers
Registers are faster than memory
- They also reduce memory traffic
Registers are easier for the compiler to exploit
- E.g., (A × B) − (B × C) − (A × D)
The compiler exploits general-purpose registers more easily
- Allocation is harder when registers carry constraints
Naming a register is more compact than naming a memory address

Number and type of operands
Number of memory   Maximum number of
addresses          operands allowed    Type of architecture   Examples
0                  3                   Load-store             Alpha, ARM, MIPS, PowerPC, SPARC, SuperH, TM32
1                  2                   Register-memory        IBM 360/370, Intel 80x86, Motorola 68000, TI TMS320C54x
2                  2                   Memory-memory          VAX (also has three-operand formats)
3                  3                   Memory-memory          VAX (also has two-operand formats)
Figure B.3 Typical combinations of memory operands and total operands per typical ALU instruction with examples of computers. Computers with no memory reference per ALU instruction are called load-store or register-register computers. Instructions with multiple memory operands per typical ALU instruction are called register-memory or memory-memory, according to whether they have one or more than one memory operand.

Types of architectures
Register-register (0, 3)
  Advantages: Simple, fixed-length instruction encoding. Simple code generation model. Instructions take similar numbers of clocks to execute (see App. A).
  Disadvantages: Higher instruction count than architectures with memory references in instructions. More instructions and lower instruction density leads to larger programs.
Register-memory (1, 2)
  Advantages: Data can be accessed without a separate load instruction first. Instruction format tends to be easy to encode and yields good density.
  Disadvantages: Operands are not equivalent since a source operand in a binary operation is destroyed. Encoding a register number and a memory address in each instruction may restrict the number of registers. Clocks per instruction vary by operand location.
Memory-memory (2, 2) or (3, 3)
  Advantages: Most compact. Doesn’t waste registers for temporaries.
  Disadvantages: Large variation in instruction size, especially for three-operand instructions. In addition, large variation in work per instruction. Memory accesses create memory bottleneck. (Not used today.)
Figure B.4 Advantages and disadvantages of the three most common types of general-purpose register computers. The notation (m, n) means m memory operands and n total operands. In general, computers with fewer alternatives simplify the compiler’s task since there are fewer decisions for the compiler to make (see Section B.8). Computers with a wide variety of flexible instruction formats reduce the number of bits required to encode the program. The number of registers also affects the instruction size since you need log2(number of registers) bits for each register specifier in an instruction. Thus, doubling the number of registers takes 3 extra bits for a register-register architecture, or about 10% of a 32-bit instruction.

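The closing remark of Figure B.4 (3 extra bits, about 10% of a 32-bit instruction) can be checked with a short calculation. The sketch below assumes three register specifiers per instruction, as in a register-register format; the helper names are illustrative only.

```python
def specifier_bits(num_registers):
    """Bits needed to name one register, i.e. ceil(log2(num_registers))."""
    return (num_registers - 1).bit_length()

def register_field_bits(num_registers, specifiers_per_instruction=3):
    """Total bits spent on register specifiers in one instruction."""
    return specifiers_per_instruction * specifier_bits(num_registers)

# Going from 32 to 64 registers in a three-operand register-register format.
extra_bits = register_field_bits(64) - register_field_bits(32)
share_of_instruction = extra_bits / 32   # fraction of a 32-bit instruction
```

Here `extra_bits` is 3 and `share_of_instruction` is 3/32, a little under 10%, which matches the figure caption.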
Addressing conventions
Byte-ordering conventions
- Example: accessing the word 0Ah 0Bh 0Ch 0Dh
Little Endian convention
- Addressing: Mem[x]=0Dh; Mem[x+1]=0Ch; Mem[x+2]=0Bh; Mem[x+3]=0Ah
- Windows, Linux and Mac OS on x86 and x64
Big Endian convention
- Addressing: Mem[x]=0Ah; Mem[x+1]=0Bh; Mem[x+2]=0Ch; Mem[x+3]=0Dh
- Solaris on SPARC, Linux and Mac OS on PowerPC
No performance difference between the two conventions
- But they cause headaches when exchanging data between computers that use different conventions
- Under Little Endian, a string loaded into a register as a word appears with its characters in reverse order (« DESREVER »)

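The two byte orders can be observed with Python's standard `struct` module. This is a small sketch using the slide's example word 0A 0B 0C 0D; the variable names are arbitrary.

```python
import struct

WORD = 0x0A0B0C0D

# Byte sequences as they would sit at Mem[x], Mem[x+1], Mem[x+2], Mem[x+3].
little = struct.pack("<I", WORD)   # little endian: least significant byte first
big = struct.pack(">I", WORD)      # big endian: most significant byte first

# Reading little-endian bytes with the big-endian convention scrambles the value,
# which is the data-exchange problem mentioned on the slide.
misread = struct.unpack(">I", little)[0]
```

`little` holds the bytes 0D 0C 0B 0A, `big` holds 0A 0B 0C 0D, and `misread` is 0x0D0C0B0A rather than the original word.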
Alignment
Data items larger than one byte are aligned
- Hardware generally aligns on multiples of words or double words
- Accessing misaligned data requires several aligned memory accesses

Value of the 3 low-order bits of the byte address
Width of object          Aligned at          Misaligned at
1 byte (byte)            0 1 2 3 4 5 6 7     (none)
2 bytes (half word)      0 2 4 6             1 3 5 7
4 bytes (word)           0 4                 1 2 3 5 6 7
8 bytes (double word)    0                   1 2 3 4 5 6 7
Figure B.5 Aligned and misaligned addresses of byte, half-word, word, and double-word objects for byte-addressed computers. For each misaligned example some objects require two memory accesses to complete. Every aligned object can always complete in one memory access, as long as the memory is as wide as the object.

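The alignment rule of Figure B.5 reduces to a modulo test. The sketch below also counts how many aligned memory blocks an access touches, assuming a memory that delivers `memory_width` bytes per access; both function names and that parameter are illustrative assumptions.

```python
def is_aligned(address, size):
    """An object of `size` bytes is aligned when its address is a multiple of its size."""
    return address % size == 0

def aligned_accesses_needed(address, size, memory_width):
    """Number of aligned memory_width-byte blocks covered by the bytes
    address .. address + size - 1."""
    first_block = address // memory_width
    last_block = (address + size - 1) // memory_width
    return last_block - first_block + 1

# Offsets 0..7 correspond to the 3 low-order address bits of Figure B.5.
aligned_halfword_offsets = [a for a in range(8) if is_aligned(a, 2)]
aligned_word_offsets = [a for a in range(8) if is_aligned(a, 4)]
```

With a memory as wide as the object, an aligned word needs one access (`aligned_accesses_needed(4, 4, 4)` is 1) while a word starting at offset 1 straddles two blocks (`aligned_accesses_needed(1, 4, 4)` is 2).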
Addressing modes
Addressing mode: how memory addresses are specified in instructions
- Register, immediate: no memory access
- Displacement, register indirect, indexed: memory address held in registers
- Direct, indirect: memory address held in memory
- Autoincrement, autodecrement: useful for executing loops
- Scaled: useful for processing arrays of values
Using sophisticated and/or varied addressing modes
- Can reduce the number of instructions in a program
- Increases hardware complexity
- Design rule: focus on the common case

Frequency of the addressing mode
Addressing mode      TeX    spice   gcc
Memory indirect      1%     6%      1%
Scaled               0%     16%     6%
Register indirect    24%    3%      11%
Immediate            43%    17%     39%
Displacement         32%    55%     40%
Figure B.7 Summary of use of memory addressing modes (including immediates). These major addressing modes account for all but a few percent (0% to 3%) of the memory accesses. Register modes, which are not counted, account for one-half of the operand references, while memory addressing modes (including immediate) account for the other half. Of course, the compiler affects what addressing modes are used.
Addressing modes specify constants and registers in addition to locations in memory. When a memory location is used, the actual memory address specified by the addressing mode is called the effective address. Figure B.6 shows all the data addressing modes that have been used in recent computers.

Addressing modes
Register
  Example instruction: Add R4,R3
  Meaning: Regs[R4] ← Regs[R4] + Regs[R3]
  When used: When a value is in a register.
Immediate
  Example instruction: Add R4,#3
  Meaning: Regs[R4] ← Regs[R4] + 3
  When used: For constants.
Displacement
  Example instruction: Add R4,100(R1)
  Meaning: Regs[R4] ← Regs[R4] + Mem[100+Regs[R1]]
  When used: Accessing local variables (+ simulates register indirect, direct addressing modes).
Register indirect
  Example instruction: Add R4,(R1)
  Meaning: Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
  When used: Accessing using a pointer or a computed address.
Indexed
  Example instruction: Add R3,(R1+R2)
  Meaning: Regs[R3] ← Regs[R3] + Mem[Regs[R1]+Regs[R2]]
  When used: Sometimes useful in array addressing: R1 = base of array; R2 = index amount.
Direct or absolute
  Example instruction: Add R1,(1001)
  Meaning: Regs[R1] ← Regs[R1] + Mem[1001]
  When used: Sometimes useful for accessing static data; address constant may need to be large.
Memory indirect
  Example instruction: Add R1,@(R3)
  Meaning: Regs[R1] ← Regs[R1] + Mem[Mem[Regs[R3]]]
  When used: If R3 is the address of a pointer p, then mode yields *p.
Autoincrement
  Example instruction: Add R1,(R2)+
  Meaning: Regs[R1] ← Regs[R1] + Mem[Regs[R2]]; Regs[R2] ← Regs[R2] + d
  When used: Useful for stepping through arrays within a loop. R2 points to start of array; each reference increments R2 by size of an element, d.
Autodecrement
  Example instruction: Add R1,–(R2)
  Meaning: Regs[R2] ← Regs[R2] – d; Regs[R1] ← Regs[R1] + Mem[Regs[R2]]
  When used: Same use as autoincrement. Autodecrement/-increment can also act as push/pop to implement a stack.
Scaled
  Example instruction: Add R1,100(R2)[R3]
  Meaning: Regs[R1] ← Regs[R1] + Mem[100+Regs[R2]+Regs[R3]*d]
  When used: Used to index arrays. May be applied to any indexed addressing mode in some computers.
Figure B.6 Selection of addressing modes with examples, meaning, and usage. In autoincrement/-decrement and scaled addressing modes, the variable d designates the size of the data item being accessed (i.e., whether the instruction is accessing 1, 2, 4, or 8 bytes).

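The "Meaning" entries of Figure B.6 all boil down to computing an effective address. The following sketch does so for the modes that reference memory without side effects; the function `effective_address`, its keyword parameters, and the sample register and memory contents are assumptions made for the illustration.

```python
def effective_address(mode, regs, mem, *, base=None, index=None, disp=0, d=1):
    """Effective address for a few addressing modes of Figure B.6.
    regs: register name -> value; mem: address -> value; d: operand size in bytes."""
    if mode == "displacement":         # disp(base)        -> disp + Regs[base]
        return disp + regs[base]
    if mode == "register_indirect":    # (base)            -> Regs[base]
        return regs[base]
    if mode == "indexed":              # (base+index)      -> Regs[base] + Regs[index]
        return regs[base] + regs[index]
    if mode == "direct":               # (disp)            -> disp
        return disp
    if mode == "memory_indirect":      # @(base)           -> Mem[Regs[base]]
        return mem[regs[base]]
    if mode == "scaled":               # disp(base)[index] -> disp + Regs[base] + Regs[index]*d
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"unsupported mode: {mode}")

# Hypothetical machine state used only for this example.
regs = {"R1": 200, "R2": 16, "R3": 3}
mem = {200: 1001}

ea_displacement = effective_address("displacement", regs, mem, base="R1", disp=100)
ea_scaled = effective_address("scaled", regs, mem, base="R2", index="R3", disp=100, d=8)
ea_mem_indirect = effective_address("memory_indirect", regs, mem, base="R1")
```

Register indirect and direct addressing fall out of displacement addressing with a zero displacement or a zero base, which is the simulation the figure's displacement entry alludes to.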
Displacement addressing
[Figure B.8: bar chart. x-axis: number of bits of displacement (0 to 15); y-axis: percentage of displacement (0% to 40%); series: integer average, floating-point average]
Figure B.8 Displacement values are widely distributed. There are both a large number of small values and a fair number of large values. The wide distribution of displacement values is due to multiple storage areas for variables and different displacements to access them (see Section B.8) as well as the overall addressing scheme the compiler uses. The x-axis is log2 of the displacement; that is, the size of a field needed to represent the magnitude of the displacement. Zero on the x-axis shows the percentage of displacements of value 0. The graph does not include the sign bit, which is heavily affected by the storage layout. Most displacements are positive, but a majority of the largest displacements (14+ bits) are negative. Since these data were collected on a computer with 16-bit displacements, they cannot tell us about longer displacements. These data were taken on the Alpha architecture with full optimization (see Section B.8) for SPEC CPU2000, showing the average of integer programs (CINT2000) and the average of floating-point programs (CFP2000).
Variety of displacement values (log2 scale on the x-axis)
- Sign of the displacement is not shown; the majority of large displacements are negative

Immediate operands
Instructions using an immediate operand
                    Floating-point average   Integer average
Loads               22%                      23%
ALU operations      19%                      25%
All instructions    16%                      21%
Figure B.9 About one-quarter of data transfers and ALU operations have an immediate operand. The bottom bars show that integer programs use immediates in about one-fifth of the instructions, while floating-point programs use immediates in about one-sixth of the instructions. For loads, the load immediate instruction loads 16 bits into either half of a 32-bit register. Load immediates are not loads in a strict sense because they do not access memory. Occasionally a pair of load immediates is used to load a 32-bit constant, but this is rare. (For ALU operations, shifts by a constant amount are included as operations with immediate operands.) The programs and computer used to collect these statistics are the same as in Figure B.8.
[Figure B.10: bar chart. x-axis: number of bits needed for immediate (0 to 15); y-axis: percentage of immediates (0% to 45%); series: floating-point average, integer average]
Figure B.10 The distribution of immediate values. The x-axis shows the number of bits needed to represent the magnitude of an immediate value; 0 means the immediate field value was 0.

Operand types
The operand type is generally specified by the opcode
- Operations on integers, floating-point numbers, characters, etc.
- Size is given by the type
  - Characters (8 bits), single-precision floats (32 bits), double-precision floats (64 bits), etc.
Distribution of accesses by size (64-bit architecture)
                        Floating-point average   Integer average
Double word (64 bits)   70%                      59%
Word (32 bits)          29%                      26%
Half word (16 bits)     0%                       5%
Byte (8 bits)           1%                       10%
Figure B.11 Distribution of data accesses by size for the benchmark programs. The double-word data type is used for double-precision floating point in floating-point programs and for addresses, since the computer uses 64-bit addresses. On a 32-bit address computer the 64-bit addresses would be replaced by 32-bit addresses, and so almost all double-word accesses in integer programs would become single-word accesses.

Operation types
Operator type            Examples
Arithmetic and logical   Integer arithmetic and logical operations: add, subtract, and, or, multiply, divide
Data transfer            Loads-stores (move instructions on computers with memory addressing)
Control                  Branch, jump, procedure call and return, traps
System                   Operating system call, virtual memory management instructions
Floating point           Floating-point operations: add, multiply, divide, compare
Decimal                  Decimal add, decimal multiply, decimal-to-character conversions
String                   String move, string compare, string search
Graphics                 Pixel and vertex operations, compression/decompression operations
Figure B.12 Categories of instruction operators and examples of each. All computers generally provide a full set of operations for the first three categories. The support for system functions in the instruction set varies widely among architectures, but all computers must have some instruction support for basic system functions. The amount of support in the instruction set for the last four categories varies.

Instruction frequency
Ten most frequent instructions on x86 (SPECint92)
Rank   80x86 instruction        Integer average (% total executed)
1      load                     22%
2      conditional branch       20%
3      compare                  16%
4      store                    12%
5      add                      8%
6      and                      6%
7      sub                      5%
8      move register-register   4%
9      call                     1%
10     return                   1%
       Total                    96%
Figure B.13 The top 10 instructions for the 80x86. Simple instructions dominate this list and are responsible for 96% of the instructions executed. These percentages are the average of the five SPECint92 programs.

Instructions that change control flow
Changes to the control flow
- Conditional branches (if-then)
- Jumps
- Procedure calls
- Procedure returns
Frequency of branch instructions
                     Floating-point average   Integer average
Call/return          8%                       19%
Jump                 10%                      6%
Conditional branch   82%                      75%
Figure B.14 Breakdown of control flow instructions into three classes: calls or returns, jumps, and conditional branches. Conditional branches clearly dominate. Each type is counted in one of three bars. The programs and computer used to collect these statistics are the same as those in Figure B.8.

Addressing in control flow
Explicit addressing (most common)
- Specifies the memory address relative to the program counter (PC register)
- Jump targets are generally close by in memory
- The address is independent of the memory region where the program is loaded
Dynamic addressing
- Address unknown at compile time
- Address specified in a register
  - No limit on the address
  - Requires an instruction to place the address in a register
- Examples
  - Function return (return address on the call stack)
  - Switch-case statements
  - Function pointers, virtual functions
  - Dynamic libraries

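The position-independence property of PC-relative addressing can be illustrated with a toy calculation. In this sketch the offset is measured from the branch instruction itself and is not scaled, which is a simplification (real instruction sets usually measure from the following instruction and count in instruction-sized units); the function names and addresses are hypothetical.

```python
def branch_offset(branch_pc, target):
    """Offset stored in the instruction: distance from the branch to its target."""
    return target - branch_pc

def branch_target(branch_pc, offset):
    """Address reached at run time: the PC of the branch plus the stored offset."""
    return branch_pc + offset

def offset_when_loaded_at(load_base):
    """Same program loaded at a different base: a branch at base+0x20 jumps back to base+0x08."""
    return branch_offset(load_base + 0x20, load_base + 0x08)
```

The stored offset is the same whether the program is loaded at 0x1000 or 0x8000, so the encoded instruction does not need to change when the program moves.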
At a minimum the return address must be saved somewhere, sometimes in a special link register or just a GPR. Some older architectures provide a mechanism to save many registers, while newer architectures require the compiler to generate stores and loads for each register saved and restored.
There are two basic conventions in use to save registers: either at the call site or inside the procedure being called. Caller saving means that the calling procedure must save the registers that it wants preserved for access after the call, and thus the called procedure need not worry about registers. Callee saving is the opposite: the called procedure must save the registers it wants to use, leaving the caller unrestrained.

Conditional branches
How is it indicated whether a branch is taken?
Condition code (CC)
  Examples: 80x86, ARM, PowerPC, SPARC, SuperH
  How condition is tested: Tests special bits set by ALU operations, possibly under program control.
  Advantages: Sometimes condition is set for free.
  Disadvantages: CC is extra state. Condition codes constrain the ordering of instructions since they pass information from one instruction to a branch.
Condition register
  Examples: Alpha, MIPS
  How condition is tested: Tests arbitrary register with the result of a comparison.
  Advantages: Simple.
  Disadvantages: Uses up a register.
Compare and branch
  Examples: PA-RISC, VAX
  How condition is tested: Compare is part of the branch. Often compare is limited to subset.
  Advantages: One instruction rather than two for a branch.
  Disadvantages: May be too much work per instruction for pipelined execution.
Figure B.16 The major methods for evaluating branch conditions, their advantages, and their disadvantages. Although condition codes can be set by ALU operations that are needed for other purposes, measurements on programs show that this rarely happens. The major implementation problems with condition codes arise when the condition code is set by a large or haphazardly chosen subset of the instructions, rather than being controlled by a bit in the instruction. Computers with compare and branch often limit the set of compares and use a condition register for more complex compares. Often, different techniques are used for branches based on floating-point comparison versus those based on integer comparison. This dichotomy is reasonable since the number of branches that depend on floating-point comparisons is much smaller than the number depending on integer comparisons.
Condition codes (x86, ARM, PowerPC, etc.)
- State of the ALU after an operation: Z (zero), N (negative), C (carry), O (overflow)
- Constrains the execution order of instructions

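The four condition-code bits named on the slide (Z, N, C, O) can be derived from an addition as follows. This is a generic sketch for an n-bit two's-complement adder, not the flag logic of any particular processor; `add_with_flags` is a name chosen for the example.

```python
def add_with_flags(a, b, bits=8):
    """Add two n-bit values and return (result, flags) with Z, N, C and O."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b                 # unbounded sum, used to detect the carry out
    result = full & mask
    flags = {
        "Z": result == 0,                    # zero: all result bits are 0
        "N": bool(result & sign),            # negative: sign bit of the result is set
        "C": full > mask,                    # carry out of the most significant bit
        # overflow: operands share a sign and the result's sign differs from it
        "O": bool(~(a ^ b) & (a ^ result) & sign),
    }
    return result, flags
```

For instance, 0x7F + 0x01 on 8 bits gives 0x80 with N and O set (signed overflow), while 0xFF + 0x01 gives 0x00 with Z and C set. A later conditional branch would test these bits, which is why no instruction that rewrites the flags may be scheduled in between.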
Instruction set encoding
Encoding elements
- Opcode: specifies the operation
- Operands or addressing mode
- Addresses, if any
Types of encoding
- Fixed size: operation and addressing mode are both in the opcode
- Variable size: addressing mode is specified independently of the opcode
- Hybrid: reduces variability while still allowing different lengths
Example: an x86 instruction
add EAX,1000(EBX)
- Opcode (1 byte): addition of two 32-bit integers (add)
- Address specifier (1-2 bytes): source/destination register (EAX), addressing mode (displacement) and base register of the second operand (EBX)
- Address (1 or 4 bytes): value of the displacement (1000)

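The byte layout described for `add EAX,1000(EBX)` can be reproduced with a small encoder. The sketch below assumes 32-bit mode and the `ADD r32, r/m32` form (opcode 03h followed by a ModR/M byte); it deliberately omits ESP and EBP as base registers because they need extra encoding rules, and the function name is made up for the example.

```python
import struct

# Register numbers used in the ModR/M byte (ESP and EBP left out on purpose).
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3, "ESI": 6, "EDI": 7}

def encode_add_reg_mem(dest, base, disp):
    """Encode `add dest, disp(base)`, i.e. dest <- dest + Mem[disp + base]."""
    opcode = 0x03                            # ADD r32, r/m32
    if -128 <= disp <= 127:
        mod = 0b01                           # base register + 8-bit displacement
        disp_bytes = struct.pack("<b", disp)
    else:
        mod = 0b10                           # base register + 32-bit displacement
        disp_bytes = struct.pack("<i", disp)
    modrm = (mod << 6) | (REG32[dest] << 3) | REG32[base]
    return bytes([opcode, modrm]) + disp_bytes

encoded = encode_add_reg_mem("EAX", "EBX", 1000)
```

`encoded` is the six-byte sequence 03 83 E8 03 00 00: one opcode byte, one address-specifier byte, and a four-byte little-endian displacement. With a displacement that fits in 8 bits the same instruction shrinks to three bytes, which is the variable-length behaviour the slide describes.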
Instruction encoding
(a) Variable (e.g., Intel 80x86, VAX)
    | Operation and no. of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n |
(b) Fixed (e.g., Alpha, ARM, MIPS, PowerPC, SPARC, SuperH)
    | Operation | Address field 1 | Address field 2 | Address field 3 |
(c) Hybrid (e.g., IBM 360/370, MIPS16, Thumb, TI TMS320C54x)
    | Operation | Address specifier | Address field |
    | Operation | Address specifier 1 | Address specifier 2 | Address field |
    | Operation | Address specifier | Address field 1 | Address field 2 |

Fixed size vs. variable size
Fixed size
- Reduces the complexity of the logic that processes instructions
- Programs are larger than with variable-size encoding
  - Program size is a crucial factor in embedded computing
Number of available registers
- More registers is better for compilation
- Eases parallelization for pipelining
- Direct impact on instruction size (number of encoding bits)

Part II
Role of Compilers

Compilation
Modern approach to application programming
- Written in a high-level language (e.g., C, C++)
- The instructions executed are produced by a compiler
The compiler is a key technology in architecture
- Architectural choices influence the compiler's ability to exploit the hardware
Compilation phases
- Transform the code into an intermediate form
- High-level optimization (loop transformations, procedure integration)
- Global optimization (register allocation, refinements)
- Code generation

Compilation phases
Front end per language
  Dependencies: Language dependent; machine independent
  Function: Transform language to common intermediate form
  (produces the intermediate representation)
High-level optimizations
  Dependencies: Somewhat language dependent; largely machine independent
  Function: For example, loop transformations and procedure inlining (also called procedure integration)
Global optimizer
  Dependencies: Small language dependencies; machine dependencies slight (e.g., register counts/types)
  Function: Including global and local optimizations + register allocation
Code generator
  Dependencies: Highly machine dependent; language independent
  Function: Detailed instruction selection and machine-dependent optimizations; may include or be followed by assembler
Figure B.19 Compilers typically consist of two to four passes.

Compiler optimizations
Optimization name                         Explanation                                                                      Percentage of the total number of optimizing transforms (N.M. = not measured)
High-level                                At or near the source level; processor-independent
  Procedure integration                   Replace procedure call by procedure body                                         N.M.
Local                                     Within straight-line code
  Common subexpression elimination        Replace two instances of the same computation by single copy                     18%
  Constant propagation                    Replace all instances of a variable that is assigned a constant with the constant   22%
  Stack height reduction                  Rearrange expression tree to minimize resources needed for expression evaluation    N.M.
Global                                    Across a branch
  Global common subexpression elimination Same as local, but this version crosses branches                                 13%
  Copy propagation                        Replace all instances of a variable A that has been assigned X (i.e., A = X) with X   11%
  Code motion                             Remove code from a loop that computes same value each iteration of the loop      16%
  Induction variable elimination          Simplify/eliminate array addressing calculations within loops                    2%
Processor-dependent                       Depends on processor knowledge
  Strength reduction                      Many examples, such as replace multiply by a constant with adds and shifts       N.M.
  Pipeline scheduling                     Reorder instructions to improve pipeline performance                             N.M.
  Branch offset optimization              Choose the shortest branch displacement that reaches target                      N.M.
Figure B.20 Major types of optimizations and examples in each class. These data tell us about the relative frequency of occurrence of various optimizations. The third column lists the static frequency with which some of the common optimizations are applied in a set of 12 small FORTRAN and Pascal programs.

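A few of the transformations listed in Figure B.20 can be shown as a hand-written before/after pair. The example is only an illustration in Python of what a compiler would do on its intermediate form; the functions `before` and `after` are invented for the purpose.

```python
def before(a, b, n):
    """Code as a programmer might write it."""
    scale = 8
    total = 0
    for i in range(n):
        limit = a * b + 1                      # same value on every iteration
        total += (a * b) + i * scale + limit   # a * b computed a second time
    return total

def after(a, b, n):
    """Same computation after four of the listed optimizations, applied by hand."""
    ab = a * b                   # common subexpression elimination: a * b computed once
    limit = ab + 1               # code motion: loop-invariant computation hoisted out
    total = 0
    for i in range(n):
        # constant propagation replaces scale by 8;
        # strength reduction turns the multiply by 8 into a shift by 3
        total += ab + (i << 3) + limit
    return total
```

Both versions return the same value for any inputs; the second one simply performs fewer multiplications inside the loop.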
Variable allocation
Types of variable allocation
- Local variables
  - Allocated on the stack
  - Pushed and popped as procedures are called and return
  - Can be allocated efficiently to registers
- Global variables
  - Accessible anywhere in the program
  - Generally arrays or data structures
  - Harder to allocate to registers
- Dynamic objects
  - Allocated on the heap
  - Accessed through pointers
  - Generally impossible to allocate to registers

Making the compiler's life easier
General principle: make the frequent case fast and the rare case correct
Regularity
- Orthogonality among the choices
  - Operations
  - Data types
  - Addressing modes
- Example: operations that access memory should be able to use every permitted addressing mode
- Counter-example: restricting which registers an instruction can access
Provide primitives, not solutions
- Provide simple, general instructions
Simplify the choices
- When should a variable be assigned to a register?
Allow constants to be treated as such
- Avoid evaluating at run time values that are known at compile time

Part III
Example 1: MIPS Architecture

MIPS64 architecture
MIPS architecture
- Register-register (load-store) architecture
- RISC processor
- Developed in the mid-1980s
  - Hennessy's work on pipelines, early 1980s
- Implements the main architectural ideas presented in the course
- Technology behind the PS2 and PSP processors; now used essentially in embedded computing
Registers
- 32 general-purpose 64-bit registers (R0, R1, ..., R31)
- 32 floating-point 64-bit registers (F0, F1, ..., F31)
Addressing modes
- Immediate: add R4, R1, #8 → R4 = R1 + 8
- Displacement (16 bits)
  - LD R4, 30(R2) → R4 <- Mem[30+R2]
  - Register indirect: use a displacement of 0
  - Absolute: use a register holding the value 0 (register R0) as the base

MIPS64 architecture
Memory access
- Only through load and store instructions
- 64-bit addresses, byte addressable
- Supports little endian and big endian
- Aligned accesses only
Instruction format
- Addressing mode is encoded in the opcode
- 32-bit instructions, 6-bit opcode

MIPS instruction formats
I-type instruction
    | Opcode (6 bits) | rs (5) | rt (5) | Immediate (16) |
    Encodes: Loads and stores of bytes, half words, words, double words. All immediates (rt ← rs op immediate)
    Conditional branch instructions (rs is register, rd unused)
    Jump register, jump and link register (rd = 0, rs = destination, immediate = 0)
R-type instruction
    | Opcode (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6) |
    Register-register ALU operations: rd ← rs funct rt
    Function encodes the data path operation: Add, Sub, . . .
    Read/write special registers and moves
J-type instruction
    | Opcode (6 bits) | Offset added to PC (26) |
    Jump and jump and link
    Trap and return from exception
Figure B.22 Instruction layout for MIPS. All instructions are encoded in one of three types.

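The field widths of Figure B.22 translate directly into shifts and masks. The sketch below packs R-type and I-type words and unpacks them again; the function names are illustrative, and the numeric opcode and funct values used in the example (0 and 2Dh for DADDU, 37h for LD) are quoted from memory of the MIPS64 manuals and should be treated as assumptions to verify.

```python
def encode_r_type(opcode, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields (6/5/5/5/5/6 bits) into a 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i_type(opcode, rs, rt, immediate):
    """Pack the four I-type fields (6/5/5/16 bits) into a 32-bit word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

def decode_fields(word):
    """Split a 32-bit word into every field of both formats; the caller
    picks the ones that apply to the instruction type."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "immediate": word & 0xFFFF,
    }

# DADDU R1,R2,R3: rd=1, rs=2, rt=3 (opcode/funct values are assumptions, see above).
daddu_word = encode_r_type(opcode=0x00, rs=2, rt=3, rd=1, shamt=0, funct=0x2D)
# LD R1,30(R2): base register in rs, destination in rt, displacement as immediate.
ld_word = encode_i_type(opcode=0x37, rs=2, rt=1, immediate=30)
```

Because the opcode, rs and rt fields sit at the same bit positions in both formats, the decoder can read the register numbers before it knows which format it is looking at.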
Load and store instructions

Notation (^ marks a superscript and _ a subscript; ←_n is an n-bit transfer):
  - A superscript is used to replicate a field (e.g., 0^48 yields a field of zeros of length 48 bits).
  - The symbol ## is used to concatenate two fields and may appear on either side of a data transfer.

Example instruction   Instruction name     Meaning
LD R1,30(R2)          Load double word     Regs[R1] ←_64 Mem[30+Regs[R2]]
LD R1,1000(R0)        Load double word     Regs[R1] ←_64 Mem[1000+0]
LW R1,60(R2)          Load word            Regs[R1] ←_64 (Mem[60+Regs[R2]]_0)^32 ## Mem[60+Regs[R2]]
LB R1,40(R3)          Load byte            Regs[R1] ←_64 (Mem[40+Regs[R3]]_0)^56 ## Mem[40+Regs[R3]]
LBU R1,40(R3)         Load byte unsigned   Regs[R1] ←_64 0^56 ## Mem[40+Regs[R3]]
LH R1,40(R3)          Load half word       Regs[R1] ←_64 (Mem[40+Regs[R3]]_0)^48 ## Mem[40+Regs[R3]] ## Mem[41+Regs[R3]]
L.S F0,50(R3)         Load FP single       Regs[F0] ←_64 Mem[50+Regs[R3]] ## 0^32
L.D F0,50(R2)         Load FP double       Regs[F0] ←_64 Mem[50+Regs[R2]]
SD R3,500(R4)         Store double word    Mem[500+Regs[R4]] ←_64 Regs[R3]
SW R3,500(R4)         Store word           Mem[500+Regs[R4]] ←_32 Regs[R3]_32..63
S.S F0,40(R3)         Store FP single      Mem[40+Regs[R3]] ←_32 Regs[F0]_0..31
S.D F0,40(R3)         Store FP double      Mem[40+Regs[R3]] ←_64 Regs[F0]
SH R3,502(R2)         Store half           Mem[502+Regs[R2]] ←_16 Regs[R3]_48..63
SB R2,41(R3)          Store byte           Mem[41+Regs[R3]] ←_8 Regs[R2]_56..63

Figure B.23 The load and store instructions in MIPS. All use a single addressing mode and require that the memory value be aligned. Of course, both loads and stores are available for all the data types shown.
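The difference between the sign-extending loads (LB, LH, LW) and the zero-extending LBU can be sketched in Python. The names `sign_extend` and `load` are hypothetical, and the sketch assumes big-endian byte order, which matches the way the LH row concatenates the byte at the lower address first.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Replicate the top bit of a `bits`-wide value across a 64-bit register."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK64


def load(memory: bytes, address: int, size: int, signed: bool = True) -> int:
    """Model LB/LBU/LH/LW/LD: read `size` bytes and extend the result to 64 bits."""
    raw = int.from_bytes(memory[address:address + size], "big")
    return sign_extend(raw, 8 * size) if signed else raw


memory = bytes([0x80, 0x01, 0x02, 0x03])
print(hex(load(memory, 0, 1)))                  # LB:  0xffffffffffffff80
print(hex(load(memory, 0, 1, signed=False)))    # LBU: 0x80
print(hex(load(memory, 0, 2)))                  # LH:  0xffffffffffff8001
```

The byte 0x80 has its top bit set, so LB fills the upper 56 bits with ones while LBU fills them with zeros.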
MIPS64 architecture
Comparison
  - Compares two registers, or a register and a constant
  - Result is placed in a destination register
Floating point
  - Operates on 32-bit and 64-bit numbers
Control
  - Jumps
  - Conditional branches
Arithmetic/logical instructions

Example instruction   Instruction name         Meaning
DADDU R1,R2,R3        Add unsigned             Regs[R1] ← Regs[R2]+Regs[R3]
DADDIU R1,R2,#3       Add immediate unsigned   Regs[R1] ← Regs[R2]+3
LUI R1,#42            Load upper immediate     Regs[R1] ← 0^32 ## 42 ## 0^16
DSLL R1,R2,#5         Shift left logical       Regs[R1] ← Regs[R2]<<5
SLT R1,R2,R3          Set less than            if (Regs[R2]<Regs[R3]) Regs[R1] ← 1 else Regs[R1] ← 0

Figure B.24 Examples of arithmetic/logical instructions on MIPS, both with and without immediates.

As an example, assuming that R8 and R10 are 64-bit registers:
  Regs[R10]_32..63 ←_32 (Mem[Regs[R8]]_0)^24 ## Mem[Regs[R8]]
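The register-transfer descriptions in the arithmetic/logical table translate almost directly into code. The following Python sketch uses hypothetical function names and models registers as 64-bit unsigned integers; `lui` follows the table's description literally (zeros, then the immediate, then 16 zeros).

```python
MASK64 = (1 << 64) - 1


def to_signed(value: int) -> int:
    """Interpret a 64-bit register value as a two's-complement integer."""
    return value - (1 << 64) if value >> 63 else value


def daddu(rs: int, rt: int) -> int:
    return (rs + rt) & MASK64            # wraps modulo 2^64, no overflow trap


def lui(imm16: int) -> int:
    return (imm16 & 0xFFFF) << 16        # 0^32 ## imm ## 0^16


def dsll(rs: int, shamt: int) -> int:
    return (rs << shamt) & MASK64        # bits shifted past bit 63 are lost


def slt(rs: int, rt: int) -> int:
    return 1 if to_signed(rs) < to_signed(rt) else 0


print(hex(lui(42)))        # 0x2a0000
print(daddu(MASK64, 1))    # 0
print(slt(MASK64, 0))      # 1, because the all-ones pattern is -1 when signed
```

SLT writes its result to an ordinary register, which is how MIPS handles comparisons without condition codes.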
Control instructions

Example instruction   Instruction name          Meaning
J name                Jump                      PC_36..63 ← name
JAL name              Jump and link             Regs[R31] ← PC+8; PC_36..63 ← name; ((PC+4)–2^27) ≤ name < ((PC+4)+2^27)
JALR R2               Jump and link register    Regs[R31] ← PC+8; PC ← Regs[R2]
JR R3                 Jump register             PC ← Regs[R3]
BEQZ R4,name          Branch equal zero         if (Regs[R4]==0) PC ← name; ((PC+4)–2^17) ≤ name < ((PC+4)+2^17)
BNE R3,R4,name        Branch not equal zero     if (Regs[R3]!=Regs[R4]) PC ← name; ((PC+4)–2^17) ≤ name < ((PC+4)+2^17)
MOVZ R1,R2,R3         Conditional move if zero  if (Regs[R3]==0) Regs[R1] ← Regs[R2]

Figure B.25 Typical control flow instructions in MIPS. All control instructions, except jumps to an address in a register, are PC-relative. Note that the branch distances are longer than the address field would suggest; since MIPS instructions are all 32 bits long, ...
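The branch range in the table, from (PC+4) − 2^17 up to (PC+4) + 2^17, can be reproduced from a 16-bit immediate. The Python sketch below assumes, as in standard MIPS, that the immediate is a signed count of 4-byte instructions relative to PC+4; the function names are hypothetical.

```python
def branch_target(pc: int, imm16: int) -> int:
    """Target of a conditional branch from its 16-bit immediate field."""
    offset = imm16 - (1 << 16) if imm16 & 0x8000 else imm16   # sign-extend to an int
    return pc + 4 + (offset << 2)                              # words to bytes


def in_branch_range(pc: int, target: int) -> bool:
    """Range check as written in the table: (PC+4)-2^17 <= name < (PC+4)+2^17."""
    return (pc + 4) - 2**17 <= target < (pc + 4) + 2**17


pc = 0x10000000
farthest_forward = branch_target(pc, 0x7FFF)
farthest_backward = branch_target(pc, 0x8000)
print(farthest_forward - (pc + 4))     # 131068, i.e. 2^17 - 4
print(farthest_backward - (pc + 4))    # -131072, i.e. -2^17
```

Scaling the offset by 4 is what lets a 16-bit field reach 2^17 bytes in each direction instead of 2^15.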
MIPS execution profile, SPECint2000

Instruction      gap     gcc     gzip    mcf     perlbmk  Integer average
load             26.5%   25.1%   20.1%   30.3%   28.7%    26%
store            10.3%   13.2%   5.1%    4.3%    16.2%    10%
add              21.1%   19.0%   26.9%   10.1%   16.7%    19%
sub              1.7%    2.2%    5.1%    3.7%    2.5%     3%
mul              1.4%    0.1%                             0%
compare          2.8%    6.1%    6.6%    6.3%    3.8%     5%
load imm         4.8%    2.5%    1.5%    0.1%    1.7%     2%
cond branch      9.3%    12.1%   11.0%   17.5%   10.9%    12%
cond move        0.4%    0.6%    1.1%    0.1%    1.9%     1%
jump             0.8%    0.7%    0.8%    0.7%    1.7%     1%
call             1.6%    0.6%    0.4%    3.2%    1.1%     1%
return           1.6%    0.6%    0.4%    3.2%    1.1%     1%
shift            3.8%    1.1%    2.1%    1.1%    0.5%     2%
and              4.3%    4.6%    9.4%    0.2%    1.2%     4%
or               7.9%    8.5%    4.8%    17.6%   8.7%     9%
xor              1.8%    2.1%    4.4%    1.5%    2.8%     3%
other logical    0.1%    0.4%    0.1%    0.1%    0.3%     0%
load FP                                                   0%
store FP                                                  0%
add FP                                                    0%
sub FP                                                    0%
mul FP                                                    0%
div FP                                                    0%
mov reg-reg FP                                            0%
compare FP                                                0%
cond mov FP                                               0%
other FP                                                  0%

Figure B.27 MIPS dynamic instruction mix for five SPECint2000 programs. Note that integer register-register move instructions are included in the or instruction. Blank entries have the value 0.0%.
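To read the instruction mix at a glance, the integer averages of Figure B.27 can be grouped by category. The Python sketch below copies the "Integer average" column for the integer rows; the grouping into memory and control instructions is a choice made for this sketch, not one made in the figure.

```python
# "Integer average" column of Figure B.27, in percent (integer rows only).
integer_average = {
    "load": 26, "store": 10, "add": 19, "sub": 3, "mul": 0, "compare": 5,
    "load imm": 2, "cond branch": 12, "cond move": 1, "jump": 1, "call": 1,
    "return": 1, "shift": 2, "and": 4, "or": 9, "xor": 3, "other logical": 0,
}


def share(names):
    """Total percentage of dynamic instructions for a group of instruction types."""
    return sum(integer_average[name] for name in names)


memory = share(["load", "store"])
control = share(["cond branch", "jump", "call", "return"])
top5 = sorted(integer_average, key=integer_average.get, reverse=True)[:5]

print(memory)        # 36
print(control)       # 15
print(top5)          # ['load', 'add', 'cond branch', 'store', 'or']
print(share(top5))   # 76
```

Five instruction types account for roughly three quarters of the instructions executed, which is the usual argument for making the common case fast.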
Part IV
Example 2: x86 architecture
x86 architecture
"The x86 isn't all that complex – it just doesn't make a lot of sense."
  Mike Johnson, AMD, 1994.
"... its checkered ancestry has led to an architecture that is difficult to explain and impossible to love."
  Hennessy and Patterson, Computer Architecture, p. J-46.
Despite all this, x86 is the best-selling architecture in the world, with approximately 500 million sold!
x86 history
1978: 8086
  - Extension of the 8080 (8 bits, accumulator)
  - 16 bits, registers
  - Between an accumulator architecture and a general-purpose register architecture
1980: 8087
  - Floating-point coprocessor, 60 new instructions
  - Between a stack architecture and a general-purpose register architecture
1982: 80286
  - 24-bit address space
  - Compatibility with the 8086
1985: 80386
  - 32-bit processor
1989: 80486
1992: Pentium (80586)
1996: P6 (Pentium Pro)
  - Performance increase, 4 new instructions
1997: Pentium II
  - MMX: 57 instructions
1999: Pentium III
  - SSE: 70 instructions
2001: Pentium 4
  - New architecture
  - SSE2: 144 instructions
2003: x64 (AMD64)
  - Move to 16 general-purpose 64-bit registers
  - 32-bit compatibility mode
x86 registers

Register   32-bit name (80386, 80486, Pentium)   16-bit name (8086, 80286)   8-bit halves   Use
GPR 0      EAX                                   AX                          AH, AL         Accumulator
GPR 1      ECX                                   CX                          CH, CL         Count reg: string, loop
GPR 2      EDX                                   DX                          DH, DL         Data reg: multiply, divide
GPR 3      EBX                                   BX                          BH, BL         Base addr. reg
GPR 4      ESP                                   SP                                         Stack ptr.
GPR 5      EBP                                   BP                                         Base ptr. (for base of stack seg.)
GPR 6      ESI                                   SI                                         Index reg, string source ptr.
GPR 7      EDI                                   DI                                         Index reg, string dest. ptr.
                                                 CS                                         Code segment ptr.
                                                 SS                                         Stack segment ptr. (top of stack)
                                                 DS                                         Data segment ptr.
                                                 ES                                         Extra data segment ptr.
                                                 FS                                         Data segment ptr. 2
                                                 GS                                         Data segment ptr. 3
PC         EIP                                   IP                                         Instruction ptr. (PC)
           EFLAGS                                FLAGS                                      Condition codes

The 32-bit names cover bits 31..0, the 16-bit names bits 15..0, and the 8-bit halves bits 15..8 (high) and 7..0 (low).

Floating-point registers: FPR 0 to FPR 7 (bits 79..0)
Status register (bits 15..0): top of FP stack, FP condition codes

Figure J.37 The 80x86 has evolved over time, and so has its register set. The original set is shown in black, and the extended set in gray. The 8086 divided the first four registers in half so that they could be used either as one 16-bit register or as two 8-bit registers. Starting with the 80386, the top eight registers were extended to 32 bits and could ...
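The overlapping names of Figure J.37 (EAX, AX, AH, AL) are views of one physical register. A minimal Python sketch, with hypothetical function names, shows the aliasing with masks and shifts:

```python
def views(eax: int) -> dict:
    """Return the 32-, 16- and 8-bit views of one general-purpose register."""
    eax &= 0xFFFFFFFF
    return {
        "EAX": eax,                  # bits 31..0
        "AX": eax & 0xFFFF,          # bits 15..0
        "AH": (eax >> 8) & 0xFF,     # bits 15..8
        "AL": eax & 0xFF,            # bits 7..0
    }


def write_al(eax: int, byte: int) -> int:
    """Writing AL changes only the low byte of EAX."""
    return (eax & 0xFFFFFF00) | (byte & 0xFF)


v = views(0x12345678)
print(hex(v["AX"]), hex(v["AH"]), hex(v["AL"]))    # 0x5678 0x56 0x78
print(hex(write_al(0x12345678, 0xFF)))             # 0x123456ff
```

Only the first four registers have 8-bit halves; the others (ESP, EBP, ESI, EDI) expose just the 32-bit and 16-bit views in this figure.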
Addressing modes
Varied addressing modes
  - Absolute, register indirect, based, indexed, based indexed with displacement, etc.
  - Displacements of 8, 16, or 32 bits
  - Constraints on which registers can be used, depending on the mode
Little-endian addressing
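A minimal Python sketch of a based indexed mode with displacement, followed by a little-endian store. The function name `effective_address` is hypothetical, segmentation is ignored, and wrapping the sum to 16 bits is an assumption that corresponds to 8086-style offsets.

```python
def effective_address(base: int = 0, index: int = 0, displacement: int = 0,
                      address_bits: int = 16) -> int:
    """Sum the components of the address and wrap to the offset width."""
    return (base + index + displacement) & ((1 << address_bits) - 1)


# MOV BX, [DI + 45] with DI = 0x1000: indexed mode with an 8-bit displacement.
print(hex(effective_address(index=0x1000, displacement=45)))    # 0x102d

# Little-endian: the least significant byte is stored at the lowest address.
print((0x1234).to_bytes(2, "little").hex())                     # 3412
```

Leaving out a component (base, index, or displacement) gives the simpler modes in the list, such as register indirect or absolute.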
x86 instructions

Instruction          Meaning
Control              Conditional and unconditional branches
  JNZ, JZ            Jump if condition to IP + 8-bit offset; JNE (for JNZ), JE (for JZ) are alternative names
  JMP, JMPF          Unconditional jump: 8- or 16-bit offset intrasegment (near), and intersegment (far) versions
  CALL, CALLF        Subroutine call: 16-bit offset; return address pushed; near and far versions
  RET, RETF          Pops return address from stack and jumps to it; near and far versions
  LOOP               Loop branch: decrement CX; jump to IP + 8-bit displacement if CX ≠ 0
Data transfer        Move data between registers or between register and memory
  MOV                Move between two registers or between register and memory
  PUSH               Push source operand on stack
  POP                Pop operand from stack top to a register
  LES                Load ES and one of the GPRs from memory
Arithmetic/logical   Arithmetic and logical operations using the data registers and memory
  ADD                Add source to destination; register-memory format
  SUB                Subtract source from destination; register-memory format
  CMP                Compare source and destination; register-memory format
  SHL                Shift left
  SHR                Shift logical right
  RCR                Rotate right with carry as fill
  CBW                Convert byte in AL to word in AX
  TEST               Logical AND of source and destination sets flags
  INC                Increment destination; register-memory format
  DEC                Decrement destination; register-memory format
  OR                 Logical OR; register-memory format
  XOR                Exclusive OR; register-memory format
String instructions  Move between string operands; length given by a repeat prefix
  MOVS               Copies from string source to destination; may be repeated
  LODS               Loads a byte or word of a string into the A register

Figure J.41 Some typical operations on the 80x86. Many operations use register-memory format, where either the source or the destination may be memory and the other may be a register or immediate operand.
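Two of the operations in Figure J.41 are easy to model in Python: CBW, which sign-extends AL into AX, and LOOP, which decrements CX and branches while it is non-zero. The function names below are hypothetical, and the sketch models only the register effects, not flags or instruction fetch.

```python
def cbw(al: int) -> int:
    """CBW: sign-extend the byte in AL into the 16-bit word AX."""
    al &= 0xFF
    return al | 0xFF00 if al & 0x80 else al


def loop_iterations(cx: int) -> int:
    """Count how many times a loop body ending in LOOP runs for an initial CX."""
    iterations = 0
    while True:
        iterations += 1               # the loop body executes once
        cx = (cx - 1) & 0xFFFF        # LOOP decrements CX (16-bit register)
        if cx == 0:                   # fall through once CX reaches zero
            break
    return iterations


print(hex(cbw(0x80)))        # 0xff80
print(hex(cbw(0x7F)))        # 0x7f
print(loop_iterations(3))    # 3
```

Both instructions use fixed registers (AL/AX and CX), which is an instance of the register constraints that make the x86 instruction set non-orthogonal.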
Instruction format

General format (fields in order):
  Prefixes:            Repeat, Lock, Seg. override, Addr. override, Size override
  Opcode:              Opcode, Opcode ext.
  Address specifiers:  mod, reg, r/m; sc, index, base
  Displacement:        Disp8, Disp16, Disp24, Disp32
  Immediate:           Imm8, Imm16, Imm24, Imm32

Figure J.43 The instruction format of the 8086 (black type) and the extensions for the 80386 (shaded type). Every field is optional except the opcode.

Typical formats (field widths in bits):
  a. JE PC + displacement   | JE (4) | Condition (4) | Displacement (8) |
  b. CALLF                  | CALLF (8) | Offset (16) | Segment number (16) |
  c. MOV BX, [DI + 45]      | MOV (6) | d/w (2) | r-m postbyte (8) | Displacement (8) |
  d. PUSH SI                | PUSH (5) | Reg (3) |
  e. ADD AX, #6765          | ADD (4) | Reg (3) | w (1) | Constant (16) |
  f. SHL BX, 1              | SHL (6) | v/w (2) | r-r postbyte (8) |
  g. TEST DX, #42           | TEST (7) | w (1) | Postbyte (8) | Immediate (8) |

Figure J.44 Typical 8086 instruction formats. The encoding of the postbyte is shown in Figure J.45. Many instructions contain the 1-bit field w, which says whether the oper...
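Three of the formats above (PUSH SI, ADD AX, #6765 and JE) can be assembled by hand to see how the variable-length fields pack into bytes. This Python sketch uses hypothetical function names; the opcode bit patterns are the standard 8086 encodings, and the register numbers follow the GPR order of the register figure (AX = 0 through DI = 7).

```python
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}


def encode_push(reg: str) -> bytes:
    """PUSH reg16: 5-bit opcode 01010 followed by the 3-bit register number."""
    return bytes([(0b01010 << 3) | REG16[reg]])


def encode_add_ax_imm16(constant: int) -> bytes:
    """ADD AX, imm16: opcode byte 0x05, then the constant, low byte first."""
    return bytes([0x05]) + (constant & 0xFFFF).to_bytes(2, "little")


def encode_je(displacement: int) -> bytes:
    """JE rel8: 4-bit opcode 0111, 4-bit condition 0100, signed 8-bit displacement."""
    opcode = (0b0111 << 4) | 0b0100
    return bytes([opcode]) + displacement.to_bytes(1, "little", signed=True)


print(encode_push("SI").hex())             # 56
print(encode_add_ax_imm16(6765).hex())     # 056d1a
print(encode_je(-2).hex())                 # 74fe
```

The three instructions occupy one, three, and two bytes respectively, in contrast with the fixed 32-bit MIPS formats.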
x86 in brief
It's complicated!
  - Not an orthogonal architecture
  - Various exceptions
Complexity due to incremental modifications
  - Maintains backward compatibility
  - Based on a 16-bit processor
  - Large software base
Excellent performance thanks to Moore's law and architectural improvements
  - From the outside, a CISC (Complex Instruction Set Computer) architecture
  - Internally, emulates a RISC architecture since the P6
