RISC Project - users.etech.haw

Hochschule für Angewandte Wissenschaften Hamburg
Prof. Dr. B. Schwarz
Hamburg University of Applied Sciences
1 Introduction to the RISC Processor
Major development advances in the field of computer architectures [2, 3, 5, 11]:
· Family concept, introduced by IBM with System/360 (1964) and DEC with PDP-8 (1965):
A common hardware architecture offered in models with different performance characteristics
(clock frequency, data bus width, addressable memory size).
· Microprogrammed control unit (suggested by Wilkes 1951, first realised in System/360):
Processing a machine instruction as a sequence of microinstructions simplified the development
of the control unit as instruction word length increased.
· Cache memory: Fast buffer for reusing frequently occurring data and/or instructions. Introduced in
1968 with the Model 85 of the IBM System/360 family.
· Pipelining: Concurrent processing of several instructions in different stages increases instruction
throughput. The Control Data machine CDC 6600 laid the foundations of RISC in 1964.
· Multiple processors: Symmetric (shared memory) multiprocessors (SMP), nonuniform memory access (NUMA) systems with physically distributed memory.
· Reduced Instruction Set Computer (RISC) (1980 Patterson/1981 Hennessy): Instruction set, CPU architecture and control unit design derived from studies of the dynamic frequency of instructions and
variables in typical running programs.
Motivations for the design of Complex Instruction Set Computers (CISC) [2, 3, 5]:
· Two principal reasons: Simplified compilers and increased performance.
At the same time, more high-level languages (HLLs) with powerful expressions were developed in
order to reduce software costs, and hardware architectures were intended to give better
support for HLLs.
· Simplified compilers:
Ø Complex machine instruction sets with more specialised instructions and many addressing modes
come closer to HLL expressions, so each HLL statement can be realised with a simpler
sequence of machine instructions.
Ø In practice, however, translation tends to become more complicated as the number of sophisticated instructions grows. Optimisation for code size reduction and pipelining was shown
to be less efficient.
· Smaller machine programs:
Ø With decreasing semiconductor memory costs, program size was no longer a driving force.
Ø On the contrary, fewer instructions can be encoded with a shorter operation code, so fewer
bytes have to be transferred from main memory.
Ø Quantitative studies have shown that code size is reduced only by a small amount (Table 1-1).
Table 1-1: Code size of CISC instruction set relative to RISC I [2]
· Faster machine program processing:
Ø At first glance it seems that an HLL statement translated directly into a single machine instruction can be processed faster than a sequence of simple instructions.
Ø But the control unit becomes more complicated and the microcode storage has to be enlarged. As a result, no reduction in processing cycles is achieved.
1.1 Motivation Derived from Code Analysis
· As part of cache memory design research aimed at improving von Neumann computer architectures, several dynamic measurements on running programs were performed. The frequency
of events involving memory interaction was a main target of these studies [2].
Table 1-2: Relative dynamic frequency of HLL operations [2].
· The evaluation of event frequencies in running programs was driven by three questions:
Ø Which operations have to be supported by the processor?
Ø Which operands determine the memory organisation and addressing modes?
Ø Which optimisation goals for control unit sequencing and pipelining are of major importance?
Operations:
· Program activity is dominated by simple variable assignments, which correspond to move machine instructions. Conditional statements form the second dominant group. Because IF and LOOP are implemented with conditional branches, the sequence control mechanisms are of interest.
· To evaluate the actual processor activity, several HLL programs were compiled for the three
CISC processors VAX, PDP-11 and M68000. The dynamic occurrence of machine instructions and memory
accesses was investigated (Table 1-3).
Table 1-3: Weighted relative dynamic frequency of HLL operations [2, PATT82a].
· To obtain columns 4 and 5, each value in columns 2 and 3 is multiplied by the relative number of machine instructions that the compiler produces for the corresponding statement.
Ø Conditional branches and procedure calls make up the largest part of machine language code and
therefore the relationship of instruction set design and code sequencing is of interest.
· The 6th and 7th columns are derived by multiplying the frequency of occurrence of each statement by the
relative number of memory references caused by each statement.
Ø Procedure calls and returns are the most time-consuming operations in HLL programs. Passing parameters and saving registers to main memory take more time than simple register accesses.
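The weighting described for Table 1-3 can be sketched in Python. This is only an illustration: the statement frequencies and the instructions-per-statement figures below are invented placeholders, not the values of Table 1-3, and the function name `weighted_frequency` as well as the renormalisation to 100 % are assumptions.

```python
def weighted_frequency(freq, weight):
    """Weight each statement frequency and renormalise to 100 %.

    freq   -- relative dynamic frequency of each HLL statement type (percent)
    weight -- relative number of machine instructions (or memory references)
              produced for one statement of that type
    """
    products = {stmt: freq[stmt] * weight[stmt] for stmt in freq}
    total = sum(products.values())
    return {stmt: 100.0 * value / total for stmt, value in products.items()}


# Placeholder numbers for illustration only (NOT taken from Table 1-3).
freq = {"ASSIGN": 40.0, "IF": 30.0, "CALL": 15.0, "LOOP": 15.0}
instr_per_stmt = {"ASSIGN": 1.0, "IF": 2.0, "CALL": 8.0, "LOOP": 6.0}

weighted = weighted_frequency(freq, instr_per_stmt)
for stmt, share in sorted(weighted.items(), key=lambda kv: -kv[1]):
    print(f"{stmt:7s}{share:6.1f} %")
```

With these placeholder inputs the comparatively rare CALL statement ends up with the largest weighted share, which is the kind of shift the weighted columns are meant to expose.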
Operands:
· The majority of operands are simple variables, and 80% of them are local to
functions [2, PATT82a]. Array indices and pointers used for structure access are likewise mostly simple local variables.
Table 1-4: Dynamic percentage of HLL operands [2, PATT82a].
Procedure calls:
· The number of parameters, the number of local variables and the depth of procedure nesting determine the amount
of memory transfers related to calls and returns.
Ø Quantitative studies by Tanenbaum [2, TANE78] have shown that 98% of the reviewed procedure calls
pass fewer than 6 parameters and that 92% of the checked calls use fewer than 6 local variables.
Ø The call-return behaviour of a program is depicted in Fig. 1-1. Each call is represented by the line moving
down to the right and each return by the line moving up to the right. An interval with a nesting depth
of 5 is marked as a grey window. Only a sequence of 6 calls causes a shift of the window's position.
Ø Further investigations showed that with a window width of 8 only 1% of the calls make the
window's position shift up or down.
Ø This characteristic of operand usage is known as locality of reference.
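The window behaviour described above can be approximated with a short Python sketch. The trace encoding (`"call"` / `"return"`), the function name `count_window_shifts` and the exact window rule (the window tolerates `width` nested calls and moves only when the depth leaves it) are assumptions made for illustration, not part of the cited studies.

```python
def count_window_shifts(trace, width):
    """Count how often the nesting depth leaves a window of the given width.

    trace -- sequence of "call" / "return" events
    width -- number of nested calls the window can absorb without moving
    """
    depth = 0    # current nesting depth
    low = 0      # shallowest depth currently covered by the window
    shifts = 0
    for event in trace:
        depth += 1 if event == "call" else -1
        if depth > low + width:      # nested deeper than the window allows
            low = depth - width
            shifts += 1
        elif depth < low:            # returned above the top of the window
            low = depth
            shifts += 1
    return shifts


# Five nested calls and their returns stay inside a window of width 5 ...
print(count_window_shifts(["call"] * 5 + ["return"] * 5, width=5))
# ... while the sixth nested call and the final return each move it once.
print(count_window_shifts(["call"] * 6 + ["return"] * 6, width=5))
```

Dividing the returned count by the number of calls in a real trace would give the fraction of calls that move the window, which is the quantity the 1% figure refers to.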
Fig. 1-1: Example call-return behaviour of a program [2, PATT85].
First Implications:
· Designing a large instruction set closely related to complex HLL statements and expressions is not
very effective. HLLs should rather be supported where time-consuming operations have to be processed,
i.e. by avoiding memory accesses.
· A large number of internal general purpose registers (GPR) supports local handling of variables in order
to reduce the amount of external memory transfers.
· A careful organisation of pipeline sequencing is necessary because of the typically large share of conditional branches and procedure calls. Otherwise too many instructions are fed into the processor's pipeline
stages without being executed.
· Up to this point, the requirement for a simplified and reduced instruction set cannot be derived from the
study results named above.
Characteristics of Reduced Instruction Set Architectures:
1. A simple instruction set layout with a uniform instruction word length, typically 32 bits.
A small number of different combinations of opcode, operand fields and addresses. Register fields
with a minimum width of 5 bits (enough to address 2^5 = 32 registers).
2. Logic and arithmetic operations only with operands located in registers (Register to Register Operations). Only load and store instructions access external memory (Load/Store Operations).
These instructions contain only one operand that references memory.
3. Only a few addressing modes: register indirect, displacement, immediate and PC relative. No memory-indirect addressing modes, in order to avoid too many slow external memory transfers.
Advantages:
· The complexity of the instruction decoder component of the control unit will be reduced.
Ø No long sequences of microcode processing steps are necessary.
Ø Less chip area is occupied by the simple combinational logic of the control unit.
· Branch processing within the pipeline becomes more efficient.
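The four addressing modes named under characteristic 3 can be written down as a minimal Python sketch. The function name `resolve`, the mode strings and the register-file dictionary are hypothetical choices for illustration; a real processor computes these values in hardware.

```python
def resolve(mode, regs, pc=0, base=None, imm=0):
    """Return the memory address for a mode, or the operand itself for 'immediate'."""
    if mode == "register_indirect":   # address is held in a register
        return regs[base]
    if mode == "displacement":        # register content plus constant offset
        return regs[base] + imm
    if mode == "pc_relative":         # target relative to the program counter
        return pc + imm
    if mode == "immediate":           # constant from the instruction, no memory access
        return imm
    raise ValueError(f"unsupported addressing mode: {mode}")


regs = {"r1": 0x100}                                       # assumed register contents
print(hex(resolve("register_indirect", regs, base="r1")))
print(hex(resolve("displacement", regs, base="r1", imm=8)))
print(hex(resolve("pc_relative", regs, pc=0x40, imm=-4)))
print(resolve("immediate", regs, imm=7))
```

None of the four modes needs a memory read to form the address, which is the point of excluding memory-indirect modes.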
Fig. 1-2: Instruction layout for the MIPS R processor family [2, 3, 5, 11].
· I-Type
Loads and stores with register indirect and displacement addressing:
rt ← Mem(rs + imm)
All immediate-addressed operations:
rt ← rs op imm
Conditional branches with PC relative addressing:
rs == 0, PC ← PC + imm
rs != rt, PC ← PC + imm
· R-Type
Register-register ALU operations:
rd ← rs func rt
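How the fixed field layout simplifies decoding can be illustrated with a small Python sketch. It assumes the standard 32-bit MIPS field positions (6-bit opcode, 5-bit rs/rt/rd/shamt, 6-bit funct, 16-bit immediate), which are not reproduced in the figure text here. It also simplifies: every word with opcode 0 is treated as R-type, every other word as I-type with a sign-extended immediate, and J-type instructions are ignored. The function name `decode` is illustrative.

```python
def decode(word):
    """Split a 32-bit MIPS-style instruction word into its fields."""
    opcode = (word >> 26) & 0x3F      # bits 31..26
    rs = (word >> 21) & 0x1F          # bits 25..21
    rt = (word >> 16) & 0x1F          # bits 20..16
    if opcode == 0:                   # R-type: rd <- rs func rt
        return {
            "type": "R", "rs": rs, "rt": rt,
            "rd": (word >> 11) & 0x1F,     # bits 15..11
            "shamt": (word >> 6) & 0x1F,   # bits 10..6
            "funct": word & 0x3F,          # bits 5..0
        }
    imm = word & 0xFFFF               # bits 15..0
    if imm & 0x8000:                  # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {"type": "I", "opcode": opcode, "rs": rs, "rt": rt, "imm": imm}


print(decode(0x012A4020))   # add $8, $9, $10   (R-type)
print(decode(0x8FA80004))   # lw  $8, 4($29)    (I-type)
```

Because every field sits at a fixed bit position, decoding reduces to shifts and masks, which corresponds to simple combinational logic rather than a microcode sequence.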
1.2 Processor Structure
Essential characteristics of a RISC architecture: general purpose registers in the CPU and a control unit which is
realised with combinational logic hardware.
· User visible registers: Help the software engineer to optimise the locality of variables through register allocation in order to minimise external memory transfers.
Ø GPRs Ri: Contain operands for all ALU operations and base addresses for displacement addressing, to which
offsets from the immediate field are added.
Ø Stack pointer: Special register that points to the top of the stack and is modified implicitly by procedure
calls and returns. Saving registers is supported by push and pop instructions, which also address the
stack implicitly via the stack pointer.
Ø Index registers: Automatic increment for address calculations in loops.
· Control and status registers: Their contents control CPU operation but cannot be manipulated directly.
Ø Program counter: Holds the address of the next instruction in main memory to be processed.
Ø Instruction register: Contains the just loaded instruction.
Ø Memory buffer registers: Intermediate storage for the effective addresses of load/store operations and for the data itself.
Ø Condition code register: Several bits which characterise the results of ALU operations.
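The interplay of program counter, instruction register and general purpose registers can be sketched as a toy fetch-execute loop in Python. Everything here is a simplifying assumption for illustration: instructions are tuples instead of encoded words, only four operations exist, the program is kept in a separate list (a von Neumann machine would hold it in the same memory as the data), and the names `run`, `program` and `memory` are hypothetical.

```python
def run(program, memory, num_regs=4):
    """Execute a toy load/store program and return the final register contents."""
    regs = [0] * num_regs      # general purpose registers R0..R3
    pc = 0                     # program counter: address of the next instruction
    while True:
        ir = program[pc]       # instruction register: the instruction just fetched
        pc += 1
        op = ir[0]
        if op == "load":       # ("load", rt, rs, imm):  rt <- Mem(rs + imm)
            _, rt, rs, imm = ir
            regs[rt] = memory[regs[rs] + imm]
        elif op == "store":    # ("store", rt, rs, imm): Mem(rs + imm) <- rt
            _, rt, rs, imm = ir
            memory[regs[rs] + imm] = regs[rt]
        elif op == "add":      # ("add", rd, rs, rt):    rd <- rs + rt
            _, rd, rs, rt = ir
            regs[rd] = regs[rs] + regs[rt]
        elif op == "halt":
            return regs
        else:
            raise ValueError(f"unknown operation: {op}")


memory = [5, 7, 0]
program = [
    ("load", 1, 0, 0),     # R1 <- Mem(R0 + 0)
    ("load", 2, 0, 1),     # R2 <- Mem(R0 + 1)
    ("add", 3, 1, 2),      # R3 <- R1 + R2, register to register only
    ("store", 3, 0, 2),    # Mem(R0 + 2) <- R3
    ("halt",),
]
print(run(program, memory), memory)   # [0, 5, 7, 12] [5, 7, 12]
```

Only the load and store tuples touch `memory`; the addition works purely on registers, mirroring the load/store principle.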
Fig. 1-3: Basic elements of a von Neumann RISC processor architecture.
[Figure. Labelled elements: CPU, Control Unit, PC, IR, Decoder, ALU, registers R0, R1, R2, R3, ..., Rn, and Memory with addresses 0000, 0001, ..., FFFE, FFFF; bit positions 15 to 0.]
Not depicted:
· CCR and index register
· CPU data and address bus
· Separation registers for decoupling the pipeline stages