1 Introduction
这门课在 UNNC 被分两个部分教学:系统架构 + 网络。系统架构部分使用的教材是 MK Computer Organization and Design 5th Edition 2。
2 Hierarchy, Components & Technology
2.1 Overview
- Computer components
- Computer systems hierarchy
- Technology trends
2.2 Computer Components
2.2.1 Computer Components
- Input
- Mouse
- Keyboard
- Touchpad
- ...
- Output
- Monitor
- Printer
- ...
- Memory
- Volatile main memory
- Random Access Memory (RAM)
- CPU cache
- Non-volatile secondary memory
- Read Only Memory (ROM)
- Flash memory
- Optical disk (CDROM, DVD)
- Volatile main memory
- Processor (CPU)
- Datapath: the component that performs arithmetic operations
- Control: the component that commands the datapath, memory and I/O devices according t othe instructions of the program
The IAS (von Neumann Machine)
2.2.2 Below Your Program
- Application software
- Written in high-level language
- System software
- Compiler
- HLL code => machine code
- Operating system
- Handling basic I/O operations
- Managing memory & storage
- Scheduling tasks & sharing resources
- Compiler
- Hardware
- Processor
- Memory
- I/O controller
2.2.3 Levels of Programming Code
- Compiler: a program that translates high-level language statements into assembly language statements
- Instruction: a command that computer hardware understands and obeys
- Assembler: a program that translates a symbolic version of instructions into the binary versions
- Assembly language: a symbolic representation of machine instructions
- Machine language: a binary representation of machine instructions
2.3 Computer Systems Hierarchy
2.3.1 Computer Systems Hierarchy
2.3.2 Instruction Set Architecture (ISA)
- ISA
- A well-defined hardware/software interface
- The "contract" between software and hardware
- Allow communications
- Standardize instructions, machine language bit patterns, operation modes, storage locations, etc
- Advantage
- Allow different implementations of the same architecture
- Disadvantage
- Sometimes prevent adding new innovations
- Modern ISAs
- 80x86, MIPS, ARM, PowerPC
2.4 Technology Trends
2.4.1 Basic Logic
Because of the on-or-off nature of the binary information and signal routing the computer uses, an efficient electronic switch is required:
- Vacuum tube
- Transistor
- Integrated circuit (IC)): combine dozens to hundreds of transistor on a single chip
- Very large-Scale integrated (VLSI) circuit: a device containing hundreds of thousands to millions of transistors
- Ultra large-scale integrated circuit: a device containing over millions of transistors
2.4.2 Trends
Year | Technology | Relative Performance/Cost |
---|---|---|
1951 | Vacuum tube | 1 |
1965 | Transistor | 35 |
1975 | IC | 9*10^3 |
1995 | VLSIC | 2.4*10^6 |
2013 | ULSIC | 2.5*10^11 |
2.4.3 Microprocessor
Microprocessor: computer CPU on a single chip (IC)
2.4.4 Modern Multicore Processor
Multicore processor: a single chip contains more than one microprocessor core
2.4.5 Manufacturing Chips
3 Computer Performance
3.1 Overview
- Computer performance
- Power wall
- SPEC Benchmark
- Amdahl's Law
3.2 Computer Performance
3.2.1 Response Time and Throughput
- Response time (execution time)
- The total time to complete a task
- Throughput (brandwidth)
- The number of tasks completed per unit time
- Effects
- Replacing the processor in a computer with a faster version
- Adding additional processor to a system that uses multiple processors for separate tasks
3.2.2 Relative Performance
To maximize performance => minimize execution
1 | Performance = 1 / Execution time_x |
Computer X is n times faster than computer Y:
1 | Performance_x / Performance_y = Execution time_y / Execution time_x = n |
3.2.3 Measuring Performance
- Response time
- The total time to complete a task
- Disk accesses, memory accesses, I/O activities, etc
- CPU execution time (CPU time)
- The actual time the CPU spends computing for a specific task
- Exclude time spent waiting for I/O or running other programs
- Further divided into:
- User CPU time: the CPU time spent in a program itself
- System CPU time: the CPU time spent in the OS performing tasks on behalf of the program
3.2.4 CPU Clocking
- Clock: determines when events take place in hardware
- Clock signal: produced by an external oscillator circuit
- Clock cycles (clock ticks, clock periods): the time for one clock period, usually of the processor clock, which runs at a constant rate (like 250ps)
- Clock frequency (clock rate): cycles per second, which is the inverse of the clock period (like 4.0GHz)
3.2.5 CPU Time
1 | CPU Time = CPU Clock Cycles * Clock Cycle Time |
To improve performance:
- Reducing the number of clock cycles
- Increasing clock rate
- Trade-off: the number of clock cycles & the clock cycle time
3.2.6 Instruction Performance
- The execution time of instructions: determined by the number of instructions in a program
- The instruction count for a program: determined by program, ISA and compiler
- Clock cycles per instruction (CPI): the average number of clock cycles each instruction takes to execute, determined by CPU hardware
1 | CPU Clock Cycles = Instruction Count * CPI |
1
2 CPU Time = Instruction Count * CPI * Clock Cycle Time
= Instruction Count * CPI / Clock Rate
If different instruction classes take different numbers of cycles:
1
2
3 n
Clock Cycles = ∑(CPI_i * Instruction Count_i)
i=1Weighted average CPI:
1
2
3
4 CPI = CPU Clock Cycles / Instruction Count
n
= ∑(CPI_i * Instruction Count_i / Instruction Count)
i=1Instruction Count: relative frequency
3.2.7 Understanding Program Performance
- Algorithm
- Affects insturction count
- Affects CPI (possibly) - by favouring slower or faster instructions
- Programming language
- Affects instructions count
- Affects CPI - beacause of its own features
- Compiler
- Affects instructions count
- Affects CPI
- ISA
- Affects instructions count
- Affects CPI, clock rate
3.3 Power Wall
Complememtray metal oxide semiconductor (CMOS): the dominant technology for integrated circuits
Dynamic energy: the primary source of energy consumption for CMOS, consumed when transistors switch states from 0 to 1 and vice versa.
1
Energy ∝ Capacitive load * Voltage^2
The energy of a single transition:
1
Energy ∝ 0.5 * Capacitive load * Voltage^2
The power required per transistor:
1
Power ∝ 0.5 * Capacitive load * Voltage^2 * Frequency
3.4 SPEC Benchmark
System Performance Evalucation Cooperative (SPEC): the standard sets of benchmarks for modern computer systems
SPEC CPU2006
SPEC Power Benchmark
3.5 Amdah's Law
3.5.1 Amdah's Law
Amdah's Law: the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used. It is a quantitative version of the law of diminishing returns.
The execution time of the program after the improvement:
1
T_improved = T_affected / improvement factor + T_unaffected
3.5.2 Law of Diminishing Returns
3.5.3 Low Power at Idle
(NEED FURTHER READING)
4 MIPS32 Programming
4.1 Overview
- MIPS introduction
- MIPS structure
- MIPS syntax & instruction
- MIPS basic arithmetic
- MIPS branching
- MIPS logical operations
4.2 MIPS Introduction
4.2.1 Introduction
- Microprocessor without Interlocked Processor States (MIPS)
- ISA based on Reduced Instruction Set Computing (RISC) CPU design strategy
- Simple instruction, fast execution
- 1980s 32-bit, 1991 64-bit
4.2.2 Bits, Bytes and Words
- Bit: binary digit (either 0 or 1)
- Byte: a sequence of bits (having 8 bits in length since the mid 1960's to get 256 possibilities)
- Word: amount of data computer can process in one step. (Most CPUs have a word size of 32 or 64 bits today. On the 32-bit MIPS, a word is 4 bytes long.)
4.3 MIPS Structure
4.3.1 MIP32 Structure
- 1 word = 32 bits = 4 bytes
- Addresses are 32-bit words => (2^32 different locations)
- Registers are 32-bit words
4.3.2 Fetch-Execute Cycle
- Fetch the instruction: fetch the instruction from the memory address that is currently stored in the program counter (PC), and store it in the instruction regitser (IR).
- Decode the instruction: the encoded instruction presented in the IR is interpreted by the decoder
- Execute the instruction
- Write back
1 | Repeat { |
4.4 MIPS Syntax & Instruction
4.4.1 MIPS Programming: "Hello, world!"
1 | .data |
4.4.2 MIPS Basic Syntax
- Assembler directives begin with a "."
.data
- Start assembling data used by the program
.text
- Start assembling program instructions
.asciiz
- Place a null-terminated ASCII string in memory
- Labels are names followed by a colon ":"
- Descriptive names chosen by programers
- The assembler will assign memory addresses to labels for later reference
- The label "main" is the entry of the program
- Machine instructions
- Lines after "main:" contain symbolic machine instructions
- Comments begin with a "#"
4.4.3 MIPS Registers
Name | Register Number | Use |
---|---|---|
$zero | 0 | Constant 0 |
\(at | 1 | Reserved for assembler | | \$v0-\)v1 | 2-3 | Function select, return result |
$a0-\(a3 | 4-7 | Arguments | | \$t0-\)t7 | 8-15 | temporaries |
$s0-\(s7 | 16-23 | saved temporaries | | \$t8-\)t9 | 24-25 | temporaries |
$k0-$k1 | 26-27 | kernel registers |
$gp | 28 | global heap pointer |
$sp | 29 | stack pointer |
$fp | 30 | frame pointer |
$ra | 31 | return address |
4.4.4 MIPS Instruction Field
General R-format:
op | rs | rt | rd | shamt | funct | |
---|---|---|---|---|---|---|
bits | 6 | 5 | 5 | 5 | 5 | 6 |
op
- operation code (basic operation of the instruction)rs
- first source register operandrt
- second source register operandrd
- destination register operandshamt
- shift amount (used in shift instruction)funct
- function code (select the variant of the operation in the op code field)
General I-format:
op | rs | rt | constant or address | |
---|---|---|---|---|
bits | 6 | 5 | 5 | 16 |
4.4.5 MIPS Instruction Format
General syntax:
3 Operands | 2 Operands | Other |
---|---|---|
op dst, src, src | op dst, src | op |
op dst, src, imm | op dst, imm | op src |
dst
- destination registersrc
- source registerimm
- immediate value (16 bits)
4.5 MIPS Arithmetic
4.5.1 Add
1 | # rd = rs1 + rs2 |
Example: add $s0, $t0, $t1
Before | After | |
---|---|---|
$s0 | 4 | 11 |
$t0 | 5 | 5 |
$t1 | 6 | 6 |
4.5.2 Subtract
1 | # rd = rs1 - rs2 |
Example: sub $s0, $t0, $t1
Before | After | |
---|---|---|
$s0 | 4 | -1 |
$t0 | 5 | 5 |
$t1 | 6 | 6 |
4.5.3 Add Immediate
1 | # rd = rs1 + imm |
Example: addi $s0, $t0, 100
Before | After | |
---|---|---|
$s0 | 4 | 105 |
$t0 | 5 | 5 |
4.5.4 Assignment
1 | move a, b |
The instruction is accepted by MIPS assembler, but it is called the pseudo-instruction, which will be translated into:
1 add a, b, $zero
4.5.5 System Call
To request an operating-system-like service:
- Load the service number in register $v0
- Load argument values, if any, in $a0, …, $a3 (or $f12 for floating point values)
- Issue the SYSCALL instruction
- Return values and if any, put the result in register $v0 (or $f0 for floating point values)
May destroy \(v0-\)v1, \(a0-\)a3, \(t0-\)t9, $ra, but always preserves \(s0-\)s7
(System services)
4.6 MIPS Branching
4.6.1 Branch Instructions
beq a, b, l
- Branch on equal- Goto instruction at label
l
ifa==b
, otherwise continue
- Goto instruction at label
bne a, b, l
- Branch on not equal- Goto instruction at label
l
ifa!=b
, otherwise continue
- Goto instruction at label
j l
- Jump to- Jump to instruction at label
l
- Jump to instruction at label
4.6.2 If-then-else
Example:
i | j | f | g | h |
---|---|---|---|---|
$s0 | $s1 | $s2 | $s3 | $s4 |
1 | # if (i==j) then {f=g+h} else {f=g-h} |
4.6.3 While
Example:
i | j |
---|---|
$s0 | $s1 |
1 | # while (i!=0) {i=i-1; j=j+3;} |
4.6.4 Other Comparisons
slt a, b, c
- set on less than- Set register
a
to1
, ifb<c
, otherwise to0
. (a = b<c ? 1 : 0
)
- Set register
slti a, b, n
- set on less than immediate- Set register
a
to 1, ifb<n
, otherwise to0
. (a = b<n ? 1 : 0
)
- Set register
4.6.5 Example: Maximum of Two Numbers
1 | .text |
4.7 MIPS Logical Operations
4.7.1 Bitwise Logical Operations
OR, AND, NOR, XOR, ...
4.7.2 Logical Operators
or, and, nor, xor, ...
4.7.3 MIPS Logical Operations
1 | # rd = rs and rt |
5 Sign Numbers, Overflow, Multiplication and Division
5.1 Overview
- Sign numbers
- Overflow
- MIPS shift opreations
- MIPS multiplication
- MIPS division
5.2 Sign Numbers
5.2.1 4-Bit Signed Encodings
(See details in CSF.)
Binary | Unsigned | Sign and Magnitude | 1's Complement | 2's Complement | Excess-8 |
---|---|---|---|---|---|
0000 | 0 | 0 | 0 | 0 | -8 |
0001 | 1 | 1 | 1 | 1 | -7 |
0010 | 2 | 2 | 2 | 2 | -6 |
0011 | 3 | 3 | 3 | 3 | -5 |
0100 | 4 | 4 | 4 | 4 | -4 |
0101 | 5 | 5 | 5 | 5 | -3 |
0110 | 6 | 6 | 6 | 6 | -2 |
0111 | 7 | 7 | 7 | 7 | -1 |
1000 | 8 | -0 | -7 | -8 | 0 |
1001 | 9 | -1 | -6 | -7 | 1 |
1010 | 10 | -2 | -5 | -6 | 2 |
1011 | 11 | -3 | -4 | -5 | 3 |
1100 | 12 | -4 | -3 | -4 | 4 |
1101 | 13 | -5 | -2 | -3 | 5 |
1110 | 14 | -6 | -1 | -2 | 6 |
1111 | 15 | -7 | -0 | -1 | 7 |
5.2.2 Sign Extension
Sign extension: the operation of increasing the number of bits of a binary number while preserving the number's sign(+/-) and value
Examples:
Befor Extension | After Extension |
---|---|
00 1010 | 0000 0000 0000 1010 |
11 1111 0001 | 1111 1111 1111 0001 |
5.3 Overflow
5.3.1 Problem
Overflow: happens when the last carry(1) cannot be accommodated.
Situations:
Operation | Result |
---|---|
+ + + | - |
- + - | + |
+ - - | - |
- - + | + |
5.3.2 Solution in MIPS
Exception Program Counter (EPC): a register in MIPS to contain the address of the instruction that caused the exception.
The instruction move from system control (
mfc0
) is used to copy EPC into a general-purpose register so that MIPS software has the option of returning to the offending instruction via a jump register instruction.
- Causing exceptions on overflow:
add
addi
sub
- Not causing exceptions on overflow:
addu
(add unsigned)addiu
subu
5.3.3 References
- MIPS Overflow Exception
MIPS can trap on overflow, but unlike many other computers, there is no conditional branch to test overflow. A sequence of MIPS instructions can discover overflow. For signed addition, the sequence is the following:
1
2
3
4
5
6
7
8
9
10
11 addu $t0, $t1, $t2 # $t0 = sum, but don’t trap
# if (signs of $t1 & $t2 differ) then {goto No_overflow}
xor $t3, $t1, $t2 # $t3 = signs of $t1 & $t2 differ ? <0 : >0
slt $t3, $t3, $zero # $t3 = $t3<0 ? 1 : 0
bne $t3, $zero, No_overflow # $t3 == 1 && {goto No_overflow}
# if (signs of $t1 & $t2 equal, but sign of sum differs) then {goto Overflow}
xor $t3, $t0, $t1 # $t3 = sign of sum differs ? <0 : >0
slt $t3, $t3, $zero # $t3 = $t3<0 ? 1 : 0
bne $t3, $zero, Overflow # $t3 == 1 && {goto Overflow}For unsigned addition ($t0 = $t1 + $t2), the test is
1
2
3
4
5
6 addu $t0, $t1, $t2 # $t0 = sum, but don’t trap
nor $t3, $t1, $zero # $t3 = NOT $t1 = (2's comp of $t1) - 1 = 2^32 - $t1 - 1
sltu $t3, $t3, $t2 # $t3 = (2^32 – $t1 – 1) < $t2 ? 1 : 0
# ⇒ 2^32 – 1 < $t1 + $t2
bne $t3,$zero,Overflow # if(2^32 –1<$t1+$t2) goto overflow
5.4 MIPS Shift Operations
5.4.1 Shift Operations
Examples:
Original Bits | Shifted Left | Shifted Right |
---|---|---|
0101 (5) | 1010 (10) | 0010 (2) |
1100 (-4) | 1000 (-8) | 0110 (6) => 1110 (-2) |
1011 (-5) | 0110 (6) => 1110(-2) [overflow] | 0101 (5) => 1101 (-3) |
For unsigned numbers:
- Shift left corresponds to multiplication by 2
- Shift right corresponds to division by 2, ignoring the remainder
For signed numbers:
- On 2's complement, shift left is multiplication by 2 if no overflow
- (Shift right corresponds to division by 2 given sign extension, plus the remainder?)
- Arithmetical shift right performs sign extension when shifting (to be half)
5.4.2 MIPS Logical Shift Operations
1 | # Shift left logical |
5.4.3 MIPS Arithmetic Shift Operations
1 | # Shift right arithmetical |
5.5 Multiplication
5.5.1 Multiplication
m
-bit number * n
-bit number = m+n
-bit number
Example: 1101 * 1011 (Long Multiplication)
1 | 1 1 0 1 13 |
5.5.2 MIPS Multiplication Instructions
MIPS stores the 64-bit result of the multiplication of two 32-bit registers in two special 32-bit registers Hi
(32~63 bits) and Lo
(0~31 bits).
1 | # Hi~Lo = $s1 * $s2 (signed) |
5.6 MIPS Division
5.6.1 Division
dividend = quotient * divisor + remainder
Algorithm: repeatedly try to subtract the left-shifted divisor from the dividend, shifting the divisor right each time
Example:
(Long Division Decimal:)
(Long Division Binary:)
- Carry out division for absolute values of the dividend and divisor
- Negate the quotient if the signs disagree
- The remainder must have the same signs as dividend
5.6.2 MIPS Division Instructions
MIPS stores the 64-bit result of the division of two 32-bit registers in two special 32-bit registers Hi
(remainder) and Lo
(quotient).
1 | # Hi~Lo = $s1 ÷ $s2 (signed) |
6 Machine Code, Addressing
6.1 Overview
- Machine code VS. assembly language
- MIPS instruction encodings
- Memory organization
- MIPS addressing modes
- Stack and Heap
- Dynamic data
6.2 Machine Code VS. Assembly Language
1 | # Assembly |
6.3 MIPS Instruction Encodings
6.3.1 R-format - Operands Are Registers Only
op | rs | rt | rd | shamt | funct | |
---|---|---|---|---|---|---|
bits | 6 | 5 | 5 | 5 | 5 | 6 |
op
- operation code (basic operation of the instruction)rs
- first source register operandrt
- second source register operandrd
- destination register operandshamt
- shift amount (used in shift instruction)funct
- function code (select the variant of the operation in the op code field)
6.3.2 I-format - Operands Contain Immediate
op | rs | rd | imm | |
---|---|---|---|---|
bits | 6 | 5 | 5 | 16 |
op
- operation coders
- first source register operandrt
- second source register operandimm
- immediate constant
6.3.3 J-format - Operands Are Immediate Only
op | imm | |
---|---|---|
bits | 6 | 26 |
op
- operation codeimm
- immediate constant
6.3.4 Examples
``` # Assembly add $s0, $s1, $s2
Hexadecimal (machine code)
0232 8020
1
2
3
4
5
6
7
8
9
10
11
12
| op | rs | rt | rd | shmt | funct |
| :----: | :---: | :---: | :---: | :---: | :----: |
| 000000 | 10001 | 10010 | 10000 | 00000 | 100000 |
| R | $s1 | $s2 | $s0 | | add |
- ```
# Assembly
sub $s0, $s1, $s2
# Hexadecimal (machine code)
0233 8022op rs rt rd shmt funct 000000 10001 10010 10000 00000 100010 R $s1 $s2 $s0 sub ``` # Assembly addi $t0, $t1, 100
Hexadecimal (machine code)
2128 0064
# \(t0 = Memory[\)t1+n] lw \(t0, n(\)t1)1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
| op | rs | rd | imm |
| :----: | :---: | :---: | :-----------------: |
| 001000 | 01001 | 01000 | 0000 0000 0110 0100 |
| addi | $t1 | $t0 | |
### 6.3.5 Decoding Machine Code
![MIPS opcode map](https://tvax4.sinaimg.cn/large/006tNc79gy1flc1g5d0irj30ym13s163.jpg)
- Convert hexadecimal to binary to find the op fields
- Look at the op filed (bits 31:26) to determine the operation
- Reformat the binary instruction into corresponding format fields
- Decode the operation type, instruction by checking the field values
### 6.3.6 Register VS Memory
- Processor can access registers directly
- Limit of 32 registers
- Most programs require much more data than can be held in registers (like text, code and images)
- Extra data must kept in memory
- **Load**: transfer data from memory to register
- **Store**: transfer data from register to memory
![](https://tvax4.sinaimg.cn/large/006tNc79gy1flf08zfgmlj30ko09cglz.jpg)
(Memory hierarchy)
> #### Key Differences Between Register and Memory
>
> 1. The primary difference between register and memory is that register **holds the data that the CPU is currently processing** whereas, the memory **holds the data the that will be required for processing**.
> 2. The Register ranges from **32-bits register to 64-bits register** whereas, the memory capacity ranges from some **GB** to some **TB**.
> 3. The processor accesses register **faster** than the memory.
> 4. Computers registers are **accumulator register, program counter, instruction register, address register**, etc. On the other hands, memory is referred as the main memory of the computer which is **RAM**.
## 6.4 Memory Organization
### 6.4.1 Memory
Memory is organized as a sequence of bytes.
| Address | Value |
| :-----: | :---------: |
| 0 | 8-bit value |
| 1 | 8-bit value |
| 2 | 8-bit value |
| ... | ... |
Registers hold 32 bits (= 4 bytes = 1 word).
| Address | Value |
| :-----: | :----------: |
| 0 | 32-bit value |
| 4 | 32-bit value |
| 8 | 32-bit value |
| ... | ... |
### 6.4.2 Data Transfer Instruction: Load Word
The address has to be divisible by 4 (word aligned).
The offset n has to fit within 16 bit, i.e. -32768 <= n <= 32767.
1 |
|
Memory[$t1+n] = $t0
sw \(t0, n(\)t1)
The address has to be divisible by 4 (word aligned).
The offset n has to fit within 16 bit, i.e. -32768 <= n <= 32767.
1 |
|
Load (unsigned) byte
lb rt, address lbu rt, address
Load (unsigned) halfword
lh rt, address lhu rt, address
Store byte
sb rt, address
Store halfword
sh rt, address
The addresses for halfwordshave to be divisible by 2.
There is no restriction for byte addresses.
1 |
|
# STEP 1: load upper immediate
# lui $s0, imm
lui $s0, 1202 # 1202 = 0000 0100 1011 0010
# Now, $s0 = 0000 0100 1011 0010 0000 0000 0000 0000
# STEP 2: add(or) the lower immediate
# ori $s0, $s0, imm
ori $s0, $s0, 4307 # 4307 = 0001 0000 1101 0011
# Now, $s0 = 0000 0100 1011 0010 0001 0000 1101 0011
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
_Do not use `addi` in STEP 2. Because `addi` is with sign-extend. If the immediate value is negative, then the upper 16 bits will be all 1's, and adding this will ruin the upper 16 bits._
## 6.5 MIPS Addressing Modes
### 6.5.1 Register Addressing
Register addressing: a source or destination operand is specified as content of one of the registers `$0`~`$31`.
E.g. `add $s0, $s1, $s2`
### 6.5.2 Immediate Addressing
Immediate addressing: a numeric value embedded in the instruction is the actual operand.
E.g. `addi $s0, $s1, 100`
### 6.5.3 PC-Relative addressing
PC-relatvie addressing: a data or instruction memory location is specified as an offset relatvie to the incremented PC.
(PC refers to special purpose register , it stores the address of next instruction to be fetched: the operand `address = PC + offset`. Usually used in conditional branch.)
E.g. `bne $s0, $s1, Label`
### 6.5.4 Base Addressing
Base addressing: a data or instruction memory location is specified as a signed offset from a register. The address of the operand is the sum of the immediate and the value in a register.
E.g. `lw $t0, 4($s0)`
### 6.5.5 Pseudodirect Addressing
Pseudodirect addressing: the memory address is (mostly) embedded in the J-type instructions.
E.g. `j Label`
### 6.5.6 Addressing Mode vs Instruction Type
- Addressing mode: how an address (memory or register) is determined
- Instruction type: how an instruction is put together
## 6.6 Stack and Heap
Two available areas of memory used to store variables are commonly referred to as the stack and the heap.
### 6.6.1 Stack
- Store function local variables
- Store temporary variables
- Pass the arguments for next function call
- Return the address when function ends
### 6.6.2 Heap
- A much larger pool of memory than stack
- Must be explicitly allocated and deallocated by the programmer manually
- _This is completely unrelated to the Heap data structure (?)_
## 6.7 Dynamic Data
### 6.7.1 MIPS Memory Allocation
To allocate additional memory, use `sbrk` system call (9).
Example: allocate 16 bytes (= 4 words) of memory and returns the address of the new memory in $v0.
li $a0, 16 li $v0, 9 syscall # sbrk 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
### 6.7.2 MIPS Memory Deallocation
There is no system call to deallocate memory in MIPS. Either does JAVA. While JAVA uses <u>garbage collection</u> technique to reclaim unused memory automatically.
# 7 Floating Point Numbers
## 7.1 Overview
- Floating Point Representation
- Arithmetic for Floats
- Error Propagation
- MIPS Floating Point Unit
## 7.2 Floating Point Representation
### 7.2.1 Scientific Notation
General form:
![Floating Point Representation](https://tvax4.sinaimg.cn/large/006tNc79gy1flffkh8rqmj30p4078q3i.jpg)
where:
- `m` = significand
- `b` = base number (any positive integer)
- `e` = exponent (an integer)
E.g.
- `2.73*10^(-7)` is in normalized scientific notation
- `0.273*10^(-6)` is not in normalized scientific notation
- `20.73*10^(-8)` is not in proper scientific notation
### 7.2.2 Normalized Scientific Notation for Binary
General form:
![Normalized Scientific Notation for Binary](https://tvax1.sinaimg.cn/large/006tNc79gy1flffusxbl1j30ty09041a.jpg)
E.g.
- `11.101 = 1.1101 * 2^1` is normalized form
- `11.101 = 11.101 * 2^0` is not normalized form
### 7.2.3 IEEE 754 Floating Point Standard
IEEE 754 Floating Point Standard: a type of sign and magnitude representation
![IEEE 754 Floating Point Standard](https://tvax4.sinaimg.cn/large/006tNc79gy1flfg04fu8mj30q002474l.jpg)
| | S | Biased Exponent | Fraction |
| :-----------------------: | :---: | :-------------: | :------: |
| Single-precision (32-bit) | 1 bit | 8 bits | 23 bits |
| Double-precision (64-bit) | 1 bit | 11 bits | 52 bits |
- **S**: sign bit of the fraction (0 for +, 1 for -)
- **Biased Exponent**:
- In <u>excess representation</u> for fast sorting (2's complement with negative exponent will make it look like a big number)
> The designers of IEEE 754 also wanted a floating-point representation that could be easily processed by integer comparisons, especially for sorting. This desire is why the sign is in the most significant bit, allowing a quick test of less than, greater than, or equal to 0. (It’s a little more complicated than a simple integer sort, since this notation is essentially sign and magnitude rather than two’s complement.)
- A fixed value called <u>bias</u> is subtracted from the field to get the true exponent value
- `Bias = 2^(k-1) - 1`, where `k` is the number of bits in exponent field. E.g. Single precision, `bias = 127`; double precision, `bias = 1023`.
- `Biased Exponent = Actual Exponent + Bias`. E.g. `Actual Exponent = Biased Exponent - Bias = 0000 0101 - Biad = 5 - 127 = -122`
Example: binary to single-precision floating point format
`-10.101`
`-10.101 = (-1) * (1.0101) * 2^1`
`Biased Exponent = 1 + 127 = 128 = 1000 0000`
| S | Biased Exponent | Fraction |
| :-: | :-------------: | :--------------------------: |
| 1 | 1000 0000 | 0101 0000 0000 0000 0000 000 |
> Special values (single precision):
>
> - Smallest positive normalized number: `1.0 * 2^(-126)`
> - Least negative normalized number: `-1.0 * 2^(-126)`
> - Largest positive normalized number: `1.1111 1111 1111 1111 1111 111 * 2^127 = (2-2^(-23)) * 2^127`
> - Most negative normalized number: `-1.1111 1111 1111 1111 1111 111 * 2^127 = -(2-2^(-23)) * 2^127`
> - ±∞: `± 1.0 * 2^128`
> - NaN: `1.f * 2^128` where `f ≠ 0`
### 7.2.5 Overflow, Underflow, Infinities and NaN
- Overflow: result > largest representable number (`127` for single-precision) => ±∞
- Underflow: result < smallest normalized number (`-126` for single-precision) => denormalized number
- Infinities: `1/0`
- NaN: `0.0/0.0`, `±∞/±∞`, `0*±∞`, `-∞+∞`, `sqrt(-1.0)`, `log(-1.0)`
> The purpose of NaNs is to allow programmers to postpone some tests and decisions to a later time in the program when they are convenient.
### 7.2.6 Binary Floating Point ⇌ Decimal Floating Point
Examples:
- Binary => Decimal1
2
3
4
5
- Decimal => Binary
- Method 1 :1
2
3
- Method 2:
-0.4375_[10] = (-7/16)_[10]
= (-7/2^4)_[10]
= (-0.0111*2^0)_[2]
= (-1.11*2^(-2))_[2]
1
2
3
- Method 3:
27.1875_[10] = 27_[10] + 0.1875_[10]
= 11011_[2] + 0.1875_[10]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
| | \*2 | Integer Part |
| :----: | :---: | :----------: |
| 0.1875 | 0.375 | 0 |
| 0.375 | 0.75 | 0 |
| 0.75 | 1.5 | 1 |
| 0.5 | 1.0 | 1 |
=> `0.1875_[10] = 0.0011_[2]`
=> `27.1875_[10] = 11011.0011`
- _`0.3` in decimal cannot be expressed exactly by finite digits in binary_
- Decimal => Binary => Single-precision binary floating point format
- ```
-13.825_[10] = -( 13_[10] + 0.825_[10] )
= -( 1101_[2] + 0.825_[10] )
| 0.825 | Integer |
| :---: | :-----: |
| 1.65 | 1 |
| 1.3 | 1 |
| 0.6 | 0 |
| 1.2 | 1 |
| 0.4 | 0 |
| 0.8 | 0 |
| 1.6 | 1 |
| 1.2 | 1 |
| ... | ... |
1
2
3
4
. .
-13.825_[10] = -1101.1101001_[2]
. .
= -(1.1011101001 * 2^3)_[2]
`Biased Exponent = 3 + 127 = 130 = 1000 0010`
| S | Biased Exponent | Fraction |
| :-: | :-------------: | :--------------------------: |
| 1 | 1000 0010 | 101 1101 0011 0011 0011 0011 |
7.3 Arithmetic for Floats
7.3.1 Floating Point Addition
Steps:
- Shifted right the smaller number until its exponent matches the larger number
- Add the significands
- Normalize the sum
- Overflow or underflow ? => Exception
- Round the sum
- Need further normalization ? => goto 3
Example:
1 | 1.0*2^(-1) - 1.11*2^(-2) (using 4-bit precision) |
7.3.2 Floating Point Multiplication
Steps:
- Add the exponents
- Multiply the significands
- Normalize the product
- Overflow or underflow ? => Exception
- Round the product
- Need further normalization ? => goto 3
- Sign the product
Example:
1 | - 1.0*2^(-1) * 1.11*2^(-2) (using 4-bit precision) |
7.3.3 IEEE 754 Rounding
Rounding modes:
- Round Up
- Round Down
- Toward Zero
- Round to Even
(MIPS and Java use round to even by default)
7.4 Error Propagation
7.4.1 Floating Point Arithmetic Errors
Sources of floating point arithmetic errors:
- Rounding
- Approximate representation
Denote:
m
: the number to be representedn
:m
's approximation
The absolute error: a = |m - n|
The relative error: r = a / |m|
The true value m
∈ [n-a, n+a] ∪ [n(1-r), n(1+r)]
7.4.2 Error Propagation
1 | // Addition or subtraction sums absolute error of operands |
Example: What is the absolute error in y^2 - 4xz
?
1 | y^2 * (1 ± 2*r_y) - 4xz(1 ± (r_x + r_z)) |
7.5 MIPS Floating Point Unit
7.5.1 IEEE 754 for MIPS
Floating Point Unit (FPU): performs IEEE operations (MIPS core refers to FPU as coprocessor 1)
FPU features 32 single precision registers ($f0, $f1, …, $f31) or as 16 pairs of double precision registers ($f0, $f2, …, $f30, here $fi actually stands for the pair $fi and $f(i+1)).
FPU instructions does not raise exceptions. So checking for
±∞
orNaN
is needed.FPU defaults to round to even.
7.5.2 Floating-Point Instructions in MIPS
- General format:
mmm.s
: single precisionmmm.d
: double precision
- Addition:
add.s fd, fs, ft
e.g.add.s $f0, $f1, $f2
add.d fd, fs, ft
e.g.add.d $f0, $f2, $f4
- Subtraction
sub.s
sub.d
- Multiplication
mul.s
mul.d
- Divsion
div.s
div.d
7.5.3 Load/Store for Floating Point
1 | // Load word copr. 1 (32-bit word) |
No encoding for immediate floating-point operands.
Must be placed in .data
segment due to too many bytes.
Assembler directives: .float n
or .double n
.
7.5.4 Copying Data Between FPU and CPU
1 | // Move from copr. 1 |
Only the bit pattern is copied, not the actual value that it represents.
7.5.5 Floating Point I/O
(See full list on 4.5.5)
7.5.6 Example: Area of a Circle
1 | .data |
7.5.7 Floating Point Comparison
General format:
c.x.s
: singlec.x.d
: double
where x
can be:
eq
: equalneq
: not equallt
: less thanle
: less than or equalgt
: greater thange
: greater than or equal
7.5.8 Floating Point Branch
bclt label
- Branch on coprocessor 1 true- Goto
label
if flag is true, otherwise continue
- Goto
bclf label
- Branch on coprocessor 1 false- Goto
label
if flag is false, otherwise continue
- Goto
Example: jump to islt
if the contents of $f0<$f1
as single precision floating point numbers
1 | c.lt.s $f0, $f1 |
8 MIPS Procedure
8.1 Overview
- MIPS Procedure
- Recursive Procedure
8.2 Procedures
8.2.1 Procedures
- Definition: protion of code, within a larger program, which appear frequently
- Help to
- reduce code duplication
- improve code re-usability
- decompose complex programs into manageable parts
- Other names
- methods (Java, OO languages)
- functions (C, C++, Haskell)
- routines, subroutines (not popular anymore)
8.2.2 Callee vs Caller
- Callee: a function called by a caller
- Caller: a functoin calling a callee
8.2.3 MIPS Procedure Calling Convention I
- Arguments are passed in registers
$a0
~$a3
- The return value is passed in register
$v0
(and$v1
if necessary) - The return address of the caller is stored in register
$ra
jal label
: jump and link- Effect: call a procedure
- Side effects:
$ra = PC + 4
PC = [label]
js rs
: jump register- Effect: jump to the address specified by the register
rs
- (To return to the caller using
jr $ra
)
- Effect: jump to the address specified by the register
8.2.4 Example: x * x + y * y
1 | .text |
8.2.5 MIPS Procedure Calling Convention II
- To use any registers within the procedure:
- Save what is in these registers at the start of the procedure
- Restore these values at the end of the procedure
- The callee has to restore:
$s0 ~ $s7
,$ra
,$sp
- The caller is responsible itself for most other registers: e.g.
$t0 ~ $t7
,$a0 ~ $a3
We use a stack to do this.
8.2.6 The Stack
- In MIPS, stack is stored in memory (last-in, first-out [lifo] list)
- The stack pointer
$sp
- points to the last location on stack
- contains the address of the next free location
- Each procedure is associated with a call frame with a frame pointer
$fp
- points to the first word of procedure frame
- does not change during the execution
8.2.7 Stack: Push & Pop
8.2.8 Stack in MIPS
To push
$s0
onto the stack ($s0
is placed on the stack):1
2addi $sp, $sp, -4 # allocate space on the stack
sw $s0, 0($sp) # save $s0 -- pushTo pop
$s0
from the stack ($s0
is taken from the stack):1
2lw $s0, 0($sp) # restore $s0 -- pop
addi $sp, $sp, 4 # deallocate space
8.2.9 MIPS Procedure Calling Convention III
- Caller
- Push any of
$a0
~$a3
,$v0
~$v1
and$t0
~$t0
if necessary - Place arguments in
$a0
to$a3
if needed - Make the call using
jal callee
- Pop saved registers and/or extra arguments off stack
- Push any of
- Callee
- Push any of
$ra
,$s0
~$s7
that may be overwrritten - Perform desired task
- Place result in
$v0
and$v1
- Pop above registers off the stack
- Return to caller with
jr $ra
- Push any of
8.2.10 Argument Passing
First four arguments:
$a0
~$a3
Remaining arguments: put on stack by the caller
Example:
fun(int x0, int x1, …, int x7)
8.2.11 Return Values
- First two return values:
$v0
, ($v1
) - Remaining return values: put on stack by the procedure, and popped from the stack by the caller
8.2.12 MIPS Procedure Call: Good Strategy
- Caller
- At call time
- Put arguments in
$a0
~$a3
- Save any caller-save temporaries
- Put arguments in
- At call time
- Callee
- At entry
- Allocate all stack space
- Save
$ra
,$s0
~$s3
- At exit
- Restore
$ra
,$s0
~$s3
- Deallocate all stack space
- Put return value in
$v0
- Restore
- At entry
- Caller
- After return
- Retrieve return value from
$v0
- Restore any caller-save temporaries
- Retrieve return value from
- After return
8.2.13 Example: g(x) = f(x, x+1) + f(x+2, x+3)
1 | f: # procedure f, args x=$a0, y=$a1, returns x*x+y*y |
8.3 Recursive Procedure
8.3.1 Example: Factorial
1 | # int fact(int n): return n<=0 ? 1 : n * fact(n-1); |
9 Piplining
9.1 Overview
- Pipeline
- Pilined Implementation
- Obstacles to pipelining
9.2 Pipelining
9.2.1 Functional Units and Their Timings
Executing a MIPS instruction can take up to five functional units:
Type | Duration | Stage | Use |
---|---|---|---|
Memory | 200ps | IF (Instruction Fetch) | get instruction from memory |
Memory | 200ps | MEM (Memory Access) | data memory read or write |
ALU | 200ps | EX (Execute) | ALU operation |
Register | 100ps | ID (Instruction Decode) | get source register operands |
Register | 100ps | WB (Write-Back) | result to destination register |
9.2.2 Memory Hierarchy
9.2.3 Critical Paths and Instruction Timings
Instruction Class | IF | ID | EX | MEM | WB | Total |
---|---|---|---|---|---|---|
R-format(add , sub , AND , OR , slt ) |
200 | 100 | 200 | 100 | 600 | |
Load Word (lw ) |
200 | 100 | 200 | 200 | 100 | 800 |
Store Word (sw ) |
200 | 100 | 200 | 500 | ||
Branch (beq ) |
200 | 100 | 200 | 500 | ||
Jump (j ) |
200 | 200 |
(This calculation assumes that the multiplexors, control unit, PC accesses, and sign extension unit have no delay.)
9.2.4 Pipelining
In this case, we see a fourfold speed-up on average time between instructions, from 800 ps down to 200 ps. The pipeline stage times of a computer are limited by the slowest resource, either the ALU operation or the memory access. We assume the write to the register file occurs in the first half of the clock cycle and the read from the register file occurs in the second half. We use this assumption throughout this chapter.
Pipelining: an implementation technique in which multiple instructions are overlapped in execution.
Executing n
(n >> s
) instructions takes:(n + (s-1)) * t / s ≈ nt / s
.
Where:
t
is the timing of every instructions
is the number of the stages of every instructiont/s
is the timing of a cycles-1
is the further cycles that then
-th instruction needs to complete
9.2.5 Designing ISAs for Pipelining
- MIPS instructions are the same length. => much easier to fetch instructions in the first pipeline stage, and to decode them in the second stage.
- MIPS has only a few instruction formats, with the source register fields being located in the same place in each instruction. => the second stage can begin reading the register file at the same time that the hardware is determining what type of instruction was fetched.
- MIPS memory operands only appear in loads or stores. => the execute stage can be used to calculate the memory address, and then access memory in the following stage.
- MIPS operands must be aligned in memory. => the requested data can be transferred between processor and memory in a single pipeline stage.
9.2.6 Pipeline Hazards
Structural Hazard: When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.
- E.g. In MIPS pipeline with a single memory: both load/store and insrtuction fetch require memory access at the same time
- Solutions:
- Avoid scheduling instructions that would create sturctural hazards
- Hardware includes control logic that stalls until earlier instruction is no longer using contended resource
- Duplicate hardware to design so that each instruction can access to independent resources at the same time.
Data Hazard: When a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction is not yet available.
Solutions:
Avoid
Stall
Forwarding / Bypassing: A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmervisible registers or memory.
E.g. EX forwarding:
add $s0, $t0, $t1
,sub $t2, $s0, $t3
Extra hardware to take result directly from ALU output when it is computed instead of waiting for it to be stored in a register.
Shading on the right half of the register file or memory means the element is read in that stage.
Shading of the left half means it is written in that stage.
MEM has a white background because
add
does not access the data memory.E.g. MEM forwarding:
lw $s0, 20($t1)
,sub $t2, $s0, $t3
This figure shows an important pipeline concept, officially called a pipeline stall, but often given the nickname bubble.
Without the stall, the path from memory access stage output to execution stage input would be going backward in time, which is impossible. This figure is actually a simplification, since we cannot know until after the subtract instruction is fetched and decoded whether or not a stall will be necessary.
Control Hazard / Branch Hazard: When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.
Solutions:
Stall
This example assumes the conditional branch is taken, and the instruction at the destination of the branch is the OR instruction. There is a one-stage pipeline stall, or bubble, after the branch. In reality, the process of creating a stall is slightly more complicated. The effect on performance, however, is the same as would occur if a bubble were inserted.
Branch Prediction: A method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.
The top drawing shows the pipeline when the branch is not taken. The bottom drawing shows the pipeline when the branch is taken. As we noted previously, the insertion of a bubble in this fashion simplifies what actually happens, at least during the first clock cycle immediately following the branch.
If the prediction is wrong, the pipeline control must ensure that the instructions wrongly executed have no effectand, and must restart the pipeline from the proper branch address
- Static Branch Prediction:
- Predict "taken":
- Backwards branches (likely to be a loop)
- Unconditional branches (or jumps)
- Predict "not taken":
- Forward branches (likely to be if-then control)
- Predict "taken":
- Dynamic Branch Prediction:
- Keep track of historic branch decisions
- Over 90% prediction accuracy
- Static Branch Prediction:
Delayed Decision (actually used by the MIPS architecture): The delayed branch always executes the next sequential instruction, with the branch taking place after that one instruction delay. It is hidden from the MIPS assembly language programmer because the assembler can automatically arrange the instructions to get the branch behavior desired by the programmer. MIPS software will place an instruction immediately after the delayed branch instruction that is not affected by the branch, and a taken branch changes the address of the instruction that follows this safe instruction. In our example, the add instruction before the branch in Figure 4.31 does not affect the branch and can be moved after the branch to fully hide the branch delay. Since delayed branches are useful when the branches are short, no processor uses a delayed branch of more than one cycle. For longer branch delays, hardware-based branch prediction is usually used.
9.2.7 Example: Reordering Code to Avoid Pipeline Stalls
1 | # a = b + e; |
10 Pointers and References
10.1 Overview
- Pointers
- Reference
- Arrays
- Strings
10.2 Pointers
10.2.1 Pointers
Pointer: a variable containing the address of another variable.
In C: (Assume c
in memory at address 0x10000000
, p
in $a0
, x
in $s0
.)
1 | int c = 100; |
In MIPS:
1 | lui $a0, 0x10000000 # p = &c = 0x10000000 |
10.3 References
10.3.1 References
In Java:
1 | class Int{ |
In MIPS:
1 | .data |
10.4 Arrays
10.4.1 Arrays
1 | .data |
1 | .data |
1 | # return the max value of an array |
1 | # Clearing an array using reference |
1 | # Clearing an array using pointer |
10.5 Strings
10.5.1 Strings
Example: printing the length of a string
In C:
1 | int strlen (char *s) |
In MIPS:
1 | .data |