Disk accesses, memory accesses, I/O activities, etc
CPU execution time (CPU time)
The actual time the CPU spends computing for a specific task
Exclude time spent waiting for I/O or running other programs
Further divided into:
User CPU time: the CPU time spent in a program itself
System CPU time: the CPU time spent in the OS performing tasks on behalf of the program
3.2.4 CPU Clocking
Clock: determines when events take place in hardware
Clock signal: produced by an external oscillator circuit
Clock cycles (clock ticks, clock periods): the time for one clock period, usually of the processor clock, which runs at a constant rate (like 250ps)
Clock frequency (clock rate): cycles per second, which is the inverse of the clock period (like 4.0GHz)
3.2.5 CPU Time
1 2
CPU Time = CPU Clock Cycles * Clock Cycle Time = CPU Clock Cycles / Clock Rate
To improve performance:
Reducing the number of clock cycles
Increasing clock rate
Trade-off: the number of clock cycles & the clock cycle time
3.2.6 Instruction Performance
The execution time of instructions: determined by the number of instructions in a program
The instruction count for a program: determined by program, ISA and compiler
Clock cycles per instruction (CPI): the average number of clock cycles each instruction takes to execute, determined by CPU hardware
1
CPU Clock Cycles = Instruction Count * CPI
1 2
CPU Time = Instruction Count * CPI * Clock Cycle Time = Instruction Count * CPI / Clock Rate
If different instruction classes take different numbers of cycles:
1 2 3
n Clock Cycles = ∑(CPI_i * Instruction Count_i) i=1
Weighted average CPI:
1 2 3 4
CPI = CPU Clock Cycles / Instruction Count n = ∑(CPI_i * Instruction Count_i / Instruction Count) i=1
Instruction Count: relative frequency
3.2.7 Understanding Program Performance
Algorithm
Affects insturction count
Affects CPI (possibly) - by favouring slower or faster instructions
Programming language
Affects instructions count
Affects CPI - beacause of its own features
Compiler
Affects instructions count
Affects CPI
ISA
Affects instructions count
Affects CPI, clock rate
3.3 Power Wall
Complememtray metal oxide semiconductor (CMOS): the dominant technology for integrated circuits
Dynamic energy: the primary source of energy consumption for CMOS, consumed when transistors switch states from 0 to 1 and vice versa.
1
Energy ∝ Capacitive load * Voltage^2
The energy of a single transition:
1
Energy ∝ 0.5 * Capacitive load * Voltage^2
The power required per transistor:
1
Power ∝ 0.5 * Capacitive load * Voltage^2 * Frequency
3.4 SPEC Benchmark
System Performance Evalucation Cooperative (SPEC): the standard sets of benchmarks for modern computer systems
SPEC CPU2006
SPEC CPU2006
SPEC Power Benchmark
SPEC Power Benchmark
3.5 Amdah's Law
3.5.1 Amdah's Law
Amdah's Law: the performance enhancement possible with a given improvement is limited by the amount that the improved feature is used. It is a quantitative version of the law of diminishing returns.
The execution time of the program after the improvement:
Microprocessor without Interlocked Processor States (MIPS)
ISA based on Reduced Instruction Set Computing (RISC) CPU design strategy
Simple instruction, fast execution
1980s 32-bit, 1991 64-bit
4.2.2 Bits, Bytes and Words
Bit: binary digit (either 0 or 1)
Byte: a sequence of bits (having 8 bits in length since the mid 1960's to get 256 possibilities)
Word: amount of data computer can process in one step. (Most CPUs have a word size of 32 or 64 bits today. On the 32-bit MIPS, a word is 4 bytes long.)
4.3 MIPS Structure
4.3.1 MIP32 Structure
MIP32 Structure
1 word = 32 bits = 4 bytes
Addresses are 32-bit words => (2^32 different locations)
Registers are 32-bit words
4.3.2 Fetch-Execute Cycle
Fetch the instruction: fetch the instruction from the memory address that is currently stored in the program counter (PC), and store it in the instruction regitser (IR).
Decode the instruction: the encoded instruction presented in the IR is interpreted by the decoder
Execute the instruction
Write back
1 2 3 4 5 6 7 8 9 10
Repeat { Fetch (PC): Fetch the instruction word (at PC) Instruction decoded Calculate next instruction address Execute (ALU, Registers and Control): Read operands Executes the operations Write/store results }
4.4 MIPS Syntax & Instruction
4.4.1 MIPS Programming: "Hello, world!"
1 2 3 4 5 6 7 8 9 10
.data msg: .asciiz "Hello, world!\n" .text .globl main main: la $a0, msg # load label msg li $v0, 4 # load immediate syscal # print it li $v0, 10 syscall # exit
4.4.2 MIPS Basic Syntax
Assembler directives begin with a "."
.data
Start assembling data used by the program
.text
Start assembling program instructions
.asciiz
Place a null-terminated ASCII string in memory
Labels are names followed by a colon ":"
Descriptive names chosen by programers
The assembler will assign memory addresses to labels for later reference
The label "main" is the entry of the program
Machine instructions
Lines after "main:" contain symbolic machine instructions
# rd = not rs (implementing negation using nor) nor rd, rs, $zero
# rd = rs xor rt xor rd, rs, rt
# rd = rs xor imm xori rd, rs, imm
5 Sign Numbers, Overflow, Multiplication and Division
5.1 Overview
Sign numbers
Overflow
MIPS shift opreations
MIPS multiplication
MIPS division
5.2 Sign Numbers
5.2.1 4-Bit Signed Encodings
(See details in CSF.)
Binary
Unsigned
Sign and Magnitude
1's Complement
2's Complement
Excess-8
0000
0
0
0
0
-8
0001
1
1
1
1
-7
0010
2
2
2
2
-6
0011
3
3
3
3
-5
0100
4
4
4
4
-4
0101
5
5
5
5
-3
0110
6
6
6
6
-2
0111
7
7
7
7
-1
1000
8
-0
-7
-8
0
1001
9
-1
-6
-7
1
1010
10
-2
-5
-6
2
1011
11
-3
-4
-5
3
1100
12
-4
-3
-4
4
1101
13
-5
-2
-3
5
1110
14
-6
-1
-2
6
1111
15
-7
-0
-1
7
5.2.2 Sign Extension
Sign extension: the operation of increasing the number of bits of a binary number while preserving the number's sign(+/-) and value
Examples:
Befor Extension
After Extension
00 1010
0000 0000 0000 1010
11 1111 0001
1111 1111 1111 0001
5.3 Overflow
5.3.1 Problem
Overflow: happens when the last carry(1) cannot be accommodated.
Situations:
Operation
Result
+ + +
-
- + -
+
+ - -
-
- - +
+
5.3.2 Solution in MIPS
Exception Program Counter (EPC): a register in MIPS to contain the address of the instruction that caused the exception.
The instruction move from system control (mfc0) is used to copy EPC into a general-purpose register so that MIPS software has the option of returning to the offending instruction via a jump register instruction.
Causing exceptions on overflow:
add
addi
sub
Not causing exceptions on overflow:
addu (add unsigned)
addiu
subu
5.3.3 References
MIPS Overflow Exception
MIPS can trap on overflow, but unlike many other computers, there is no conditional branch to test overflow. A sequence of MIPS instructions can discover overflow. For signed addition, the sequence is the following:
MIPS stores the 64-bit result of the multiplication of two 32-bit registers in two special 32-bit registers Hi (32~63 bits) and Lo (0~31 bits).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
# Hi~Lo = $s1 * $s2 (signed) mult $s1, $s2
# Hi~Lo = $s1 * $s2 (unsigned) multu $s1, $s2
# Store the lower 32-bit of the result of the previous operation in register $s0 mflo $s0
# Store the higher 32-bit of the result of the previous operation in register $s0 mfhi $s0
# Since we are commonly only interested in the 32-bit result of a multiplication, the sequence: mult $s1, $s2 mflo $s0 # can be encoded as one operation: mul $s0, $s1, $s2
# Note that mul doesn't check for overflow, but a pseudo-insturtion multiply with overflow `mulo` does
5.6 MIPS Division
5.6.1 Division
dividend = quotient * divisor + remainder
Algorithm: repeatedly try to subtract the left-shifted divisor from the dividend, shifting the divisor right each time
Example:
(Long Division Decimal:)
Long Division Decimal
(Long Division Binary:)
Long Division Binary
Carry out division for absolute values of the dividend and divisor
Negate the quotient if the signs disagree
The remainder must have the same signs as dividend
5.6.2 MIPS Division Instructions
MIPS stores the 64-bit result of the division of two 32-bit registers in two special 32-bit registers Hi (remainder) and Lo (quotient).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
# Hi~Lo = $s1 ÷ $s2 (signed) div $s1, $s2
# Hi~Lo = $s1 ÷ $s2 (unsigned) divu $s1, $s2
# Store the quotient of the previous operation in register $s0 mflo $s0
# Store the remainder of the previous operation in register $s0 mfhi $s0
# The instruction: div $s0, $s1, $s2 # corresponds to: div $s1, $s2 mflo $s0 # i.e.: $s0 = $s1 / $s2
6 Machine Code, Addressing
6.1 Overview
Machine code VS. assembly language
MIPS instruction encodings
Memory organization
MIPS addressing modes
Stack and Heap
Dynamic data
6.2 Machine Code VS. Assembly Language
1 2 3 4 5 6 7 8
# Assembly add $s0, $s1, $s2
# Hexadecimal (machine code) 0232 8020
# Binary 0000 0010 0011 0010 1000 0000 0010 0000
6.3 MIPS Instruction Encodings
6.3.1 R-format - Operands Are Registers Only
op
rs
rt
rd
shamt
funct
bits
6
5
5
5
5
6
op - operation code (basic operation of the instruction)
rs - first source register operand
rt - second source register operand
rd - destination register operand
shamt - shift amount (used in shift instruction)
funct - function code (select the variant of the operation in the op code field)
- Convert hexadecimal to binary to find the op fields - Look at the op filed (bits 31:26) to determine the operation - Reformat the binary instruction into corresponding format fields - Decode the operation type, instruction by checking the field values
### 6.3.6 Register VS Memory
- Processor can access registers directly - Limit of 32 registers - Most programs require much more data than can be held in registers (like text, code and images) - Extra data must kept in memory - **Load**: transfer data from memory to register - **Store**: transfer data from register to memory
> #### Key Differences Between Register and Memory > > 1. The primary difference between register and memory is that register **holds the data that the CPU is currently processing** whereas, the memory **holds the data the that will be required for processing**. > 2. The Register ranges from **32-bits register to 64-bits register** whereas, the memory capacity ranges from some **GB** to some **TB**. > 3. The processor accesses register **faster** than the memory. > 4. Computers registers are **accumulator register, program counter, instruction register, address register**, etc. On the other hands, memory is referred as the main memory of the computer which is **RAM**.
## 6.4 Memory Organization
### 6.4.1 Memory
Memory is organized as a sequence of bytes.
| Address | Value | | :-----: | :---------: | | 0 | 8-bit value | | 1 | 8-bit value | | 2 | 8-bit value | | ... | ... |
Registers hold 32 bits (= 4 bytes = 1 word).
| Address | Value | | :-----: | :----------: | | 0 | 32-bit value | | 4 | 32-bit value | | 8 | 32-bit value | | ... | ... |
### 6.4.2 Data Transfer Instruction: Load Word
# \(t0 = Memory[\)t1+n] lw \(t0, n(\)t1)
The address has to be divisible by 4 (word aligned).
The offset n has to fit within 16 bit, i.e. -32768 <= n <= 32767.
_Complete data transfer instruction list see: pp A-49 ~ A-80._
### 6.4.5 32-Bit Immediate and Addresses
- Problem:
- MIPS ISA uses fixed size 32-bit instructions, so there are always < 32 bits available for any instruction arguments - However, sometimes 32-bit constants and addresses are necessary
- Solution:
- Large constants and 32-bit addresses have to be broken into pieces and then reassembled into a register
- Example:
- Load the following 32-bit constant into register `$s0`:
_Do not use `addi` in STEP 2. Because `addi` is with sign-extend. If the immediate value is negative, then the upper 16 bits will be all 1's, and adding this will ruin the upper 16 bits._
## 6.5 MIPS Addressing Modes
### 6.5.1 Register Addressing
Register addressing: a source or destination operand is specified as content of one of the registers `$0`~`$31`.
E.g. `add $s0, $s1, $s2`
### 6.5.2 Immediate Addressing
Immediate addressing: a numeric value embedded in the instruction is the actual operand.
E.g. `addi $s0, $s1, 100`
### 6.5.3 PC-Relative addressing
PC-relatvie addressing: a data or instruction memory location is specified as an offset relatvie to the incremented PC.
(PC refers to special purpose register , it stores the address of next instruction to be fetched: the operand `address = PC + offset`. Usually used in conditional branch.)
E.g. `bne $s0, $s1, Label`
### 6.5.4 Base Addressing
Base addressing: a data or instruction memory location is specified as a signed offset from a register. The address of the operand is the sum of the immediate and the value in a register.
E.g. `lw $t0, 4($s0)`
### 6.5.5 Pseudodirect Addressing
Pseudodirect addressing: the memory address is (mostly) embedded in the J-type instructions.
E.g. `j Label`
### 6.5.6 Addressing Mode vs Instruction Type
- Addressing mode: how an address (memory or register) is determined - Instruction type: how an instruction is put together
## 6.6 Stack and Heap
Two available areas of memory used to store variables are commonly referred to as the stack and the heap.
### 6.6.1 Stack
- Store function local variables - Store temporary variables - Pass the arguments for next function call - Return the address when function ends
### 6.6.2 Heap
- A much larger pool of memory than stack - Must be explicitly allocated and deallocated by the programmer manually - _This is completely unrelated to the Heap data structure (?)_
## 6.7 Dynamic Data
### 6.7.1 MIPS Memory Allocation
To allocate additional memory, use `sbrk` system call (9).
Example: allocate 16 bytes (= 4 words) of memory and returns the address of the new memory in $v0.
There is no system call to deallocate memory in MIPS. Either does JAVA. While JAVA uses <u>garbage collection</u> technique to reclaim unused memory automatically.
# 7 Floating Point Numbers
## 7.1 Overview
- Floating Point Representation - Arithmetic for Floats - Error Propagation - MIPS Floating Point Unit
## 7.2 Floating Point Representation
### 7.2.1 Scientific Notation
General form:
data:image/s3,"s3://crabby-images/76898/768985586899763a381d641997506b0713c4d3b0" alt="Floating Point Representation"
where:
- `m` = significand - `b` = base number (any positive integer) - `e` = exponent (an integer)
E.g.
- `2.73*10^(-7)` is in normalized scientific notation - `0.273*10^(-6)` is not in normalized scientific notation - `20.73*10^(-8)` is not in proper scientific notation
### 7.2.2 Normalized Scientific Notation for Binary
General form:
data:image/s3,"s3://crabby-images/861eb/861ebbe89928a6541bec8fd79bdcb7edbca77e7f" alt="Normalized Scientific Notation for Binary"
E.g.
- `11.101 = 1.1101 * 2^1` is normalized form - `11.101 = 11.101 * 2^0` is not normalized form
### 7.2.3 IEEE 754 Floating Point Standard
IEEE 754 Floating Point Standard: a type of sign and magnitude representation
data:image/s3,"s3://crabby-images/581ba/581ba74af65a79469f417fd9f7cb96457913245d" alt="IEEE 754 Floating Point Standard"
- **S**: sign bit of the fraction (0 for +, 1 for -) - **Biased Exponent**:
- In <u>excess representation</u> for fast sorting (2's complement with negative exponent will make it look like a big number)
> The designers of IEEE 754 also wanted a floating-point representation that could be easily processed by integer comparisons, especially for sorting. This desire is why the sign is in the most significant bit, allowing a quick test of less than, greater than, or equal to 0. (It’s a little more complicated than a simple integer sort, since this notation is essentially sign and magnitude rather than two’s complement.)
- A fixed value called <u>bias</u> is subtracted from the field to get the true exponent value
- `Bias = 2^(k-1) - 1`, where `k` is the number of bits in exponent field. E.g. Single precision, `bias = 127`; double precision, `bias = 1023`.
- Overflow: result > largest representable number (`127` for single-precision) => ±∞ - Underflow: result < smallest normalized number (`-126` for single-precision) => denormalized number - Infinities: `1/0` - NaN: `0.0/0.0`, `±∞/±∞`, `0*±∞`, `-∞+∞`, `sqrt(-1.0)`, `log(-1.0)`
> The purpose of NaNs is to allow programmers to postpone some tests and decisions to a later time in the program when they are convenient.
### 7.2.6 Binary Floating Point ⇌ Decimal Floating Point
Floating Point Unit (FPU): performs IEEE operations (MIPS core refers to FPU as coprocessor 1)
FPU features 32 single precision registers ($f0, $f1, …, $f31) or as 16 pairs of double precision registers ($f0, $f2, …, $f30, here $fi actually stands for the pair $fi and $f(i+1)).
FPU instructions does not raise exceptions. So checking for ±∞ or NaN is needed.
# int fact(int n): return n<=0 ? 1 : n * fact(n-1); # assume $a0 = n
fact: addi $sp, $sp, -8 # space for two words sw $ra, 4($sp) sw $a0, 0($sp)
li $v0, 1 # $v0 = 1 blez $a0, fact_return # if ($a0<=0) then {goto fact_return}
addi $a0, $a0, -1 # n--
jal fact
tmpl: lw $a0, 0($sp) # retrieve original n mul $v0, $v0, $a0 # n * fact(n-1)
fact_return:lw $ra, 4($sp) addi $sp, $sp, 8 jr $ra
9 Piplining
9.1 Overview
Pipeline
Pilined Implementation
Obstacles to pipelining
9.2 Pipelining
9.2.1 Functional Units and Their Timings
Executing a MIPS instruction can take up to five functional units:
Type
Duration
Stage
Use
Memory
200ps
IF (Instruction Fetch)
get instruction from memory
Memory
200ps
MEM (Memory Access)
data memory read or write
ALU
200ps
EX (Execute)
ALU operation
Register
100ps
ID (Instruction Decode)
get source register operands
Register
100ps
WB (Write-Back)
result to destination register
9.2.2 Memory Hierarchy
Memory Hierarchy
9.2.3 Critical Paths and Instruction Timings
Instruction Class
IF
ID
EX
MEM
WB
Total
R-format(add, sub, AND, OR, slt)
200
100
200
100
600
Load Word (lw)
200
100
200
200
100
800
Store Word (sw)
200
100
200
500
Branch (beq)
200
100
200
500
Jump (j)
200
200
(This calculation assumes that the multiplexors, control unit, PC accesses, and sign extension unit have no delay.)
9.2.4 Pipelining
Single-cycle, nonpipelined execution in top versus pipelined execution in bottom.
In this case, we see a fourfold speed-up on average time between instructions, from 800 ps down to 200 ps. The pipeline stage times of a computer are limited by the slowest resource, either the ALU operation or the memory access. We assume the write to the register file occurs in the first half of the clock cycle and the read from the register file occurs in the second half. We use this assumption throughout this chapter.
Pipelining: an implementation technique in which multiple instructions are overlapped in execution.
Executing n (n >> s) instructions takes:(n + (s-1)) * t / s ≈ nt / s.
Where:
t is the timing of every instruction
s is the number of the stages of every instruction
t/s is the timing of a cycle
s-1 is the further cycles that the n-th instruction needs to complete
9.2.5 Designing ISAs for Pipelining
MIPS instructions are the same length. => much easier to fetch instructions in the first pipeline stage, and to decode them in the second stage.
MIPS has only a few instruction formats, with the source register fields being located in the same place in each instruction. => the second stage can begin reading the register file at the same time that the hardware is determining what type of instruction was fetched.
MIPS memory operands only appear in loads or stores. => the execute stage can be used to calculate the memory address, and then access memory in the following stage.
MIPS operands must be aligned in memory. => the requested data can be transferred between processor and memory in a single pipeline stage.
9.2.6 Pipeline Hazards
Structural Hazard: When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.
E.g. In MIPS pipeline with a single memory: both load/store and insrtuction fetch require memory access at the same time
Solutions:
Avoid scheduling instructions that would create sturctural hazards
Hardware includes control logic that stalls until earlier instruction is no longer using contended resource
Duplicate hardware to design so that each instruction can access to independent resources at the same time.
Data Hazard: When a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction is not yet available.
Solutions:
Avoid
Stall
Forwarding / Bypassing: A method of resolving a data hazard by retrieving the missing data element from internal buffers rather than waiting for it to arrive from programmervisible registers or memory.
E.g. EX forwarding: add $s0, $t0, $t1, sub $t2, $s0, $t3
Graphical representation of forwarding.
Extra hardware to take result directly from ALU output when it is computed instead of waiting for it to be stored in a register.
Shading on the right half of the register file or memory means the element is read in that stage.
Shading of the left half means it is written in that stage.
MEM has a white background because add does not access the data memory.
E.g. MEM forwarding: lw $s0, 20($t1), sub $t2, $s0, $t3
We need a stall even with forwarding when an R-format instruction following a load tries to use the data.
This figure shows an important pipeline concept, officially called a pipeline stall, but often given the nickname bubble.
Without the stall, the path from memory access stage output to execution stage input would be going backward in time, which is impossible. This figure is actually a simplification, since we cannot know until after the subtract instruction is fetched and decoded whether or not a stall will be necessary.
Control Hazard / Branch Hazard: When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.
Solutions:
Stall
Pipeline showing stalling on every conditional branch as solution to control hazards.
This example assumes the conditional branch is taken, and the instruction at the destination of the branch is the OR instruction. There is a one-stage pipeline stall, or bubble, after the branch. In reality, the process of creating a stall is slightly more complicated. The effect on performance, however, is the same as would occur if a bubble were inserted.
Branch Prediction: A method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.
Predicting that branches are not taken as a solution to control hazard.
The top drawing shows the pipeline when the branch is not taken. The bottom drawing shows the pipeline when the branch is taken. As we noted previously, the insertion of a bubble in this fashion simplifies what actually happens, at least during the first clock cycle immediately following the branch.
If the prediction is wrong, the pipeline control must ensure that the instructions wrongly executed have no effectand, and must restart the pipeline from the proper branch address
Static Branch Prediction:
Predict "taken":
Backwards branches (likely to be a loop)
Unconditional branches (or jumps)
Predict "not taken":
Forward branches (likely to be if-then control)
Dynamic Branch Prediction:
Keep track of historic branch decisions
Over 90% prediction accuracy
Delayed Decision (actually used by the MIPS architecture): The delayed branch always executes the next sequential instruction, with the branch taking place after that one instruction delay. It is hidden from the MIPS assembly language programmer because the assembler can automatically arrange the instructions to get the branch behavior desired by the programmer. MIPS software will place an instruction immediately after the delayed branch instruction that is not affected by the branch, and a taken branch changes the address of the instruction that follows this safe instruction. In our example, the add instruction before the branch in Figure 4.31 does not affect the branch and can be moved after the branch to fully hide the branch delay. Since delayed branches are useful when the branches are short, no processor uses a delayed branch of more than one cycle. For longer branch delays, hardware-based branch prediction is usually used.
9.2.7 Example: Reordering Code to Avoid Pipeline Stalls
# a = b + e; # c = b + f; # Assuming all variables are in memory and are addressable as offsets from $t0.
lw $t1, 0($t0) # load b from $t0+[0] lw $t2, 4($t0) # load e from $t0+[4] add $t3, $t1, $t2 # a = b + e sw $t3, 12($t0) # store a into $t0+[12] lw $t4, 8($t0) # load f from $t0+[8] * add $t5, $t1, $t4 # c = b + f sw $t5, 16($t0) # store c into $t0+[16]
# After
lw $t1, 0($t0) # load b from $t0+[0] lw $t2, 4($t0) # load e from $t0+[4] lw $t4, 8($t0) # load f from $t0+[8] * add $t3, $t1, $t2 # a = b + e sw $t3, 12($t0) # store a into $t0+[12] add $t5, $t1, $t4 # c = b + f sw $t5, 16($t0) # store c into $t0+[16]
10 Pointers and References
10.1 Overview
Pointers
Reference
Arrays
Strings
10.2 Pointers
10.2.1 Pointers
Pointer: a variable containing the address of another variable.
In C: (Assume c in memory at address 0x10000000, p in $a0, x in $s0.)
1 2 3 4 5
int c = 100; int *p; p = &c; x = *p; *p = 200;
In MIPS:
1 2 3 4
lui $a0, 0x10000000 # p = &c = 0x10000000 lw $s0, 0($a0) # x = *p = c = 100 addi $t0, $zero, 200 # $t0 = 200 sw $t0, 0($a0) # *p = $t0 = 200
10.3 References
10.3.1 References
In Java:
1 2 3 4 5 6 7 8 9 10 11 12 13
class Int{ public int n; public Int(int m) { n = m; } } public class Reference{ public static void main(String args[]){ Int a = new Int(1); // create an Intobject with n==1 Int b = new Int(2); // create an Intobject with n==2 a = b; // copy the reference b.n = 3; System.out.println(a.n); // 3 } }
In MIPS:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
.data aa: .word 3 bb: .word 2 .text .globl main main: la $a0, aa # load address of aa into $a0 la $a1, bb # load address of bb into $a1 jal swap ... # print a, b swap: lw $t0, ($a0) lw $t1, ($a1) sw $t1, ($a0) sw $t0, ($a1) jr $ra
10.4 Arrays
10.4.1 Arrays
1 2 3 4 5
.data as: .word 3, 1, 4, 1, 5, 9, 2, 6 .text main: la $t1, as # load address of as into $t1 lw $t2, 24($t1) # load word from $t1+24 (as[6]) into $t4
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
.data asx: .word 1, 2, 3 bsx: .word 5, 4 .text .globl main main: la $s0, asx la $s1, bsx move $s0, $s1 # $s0 = $s1 = the address of bsx li $t0, 7 sw $t0, 0($s1) # bsx[0] = 7 lw $a0, 0($s0) li $v0, 1 # print bsx[0] syscall li $v0, 10 # exit syscall
# return the max value of an array .data as: .word 3, 1, 4, 1, 5, 9, 2, 6 .text .globl main main: addi $sp, $sp, -4 # allocate the space on stack sw $ra, 0($sp) # save the return address -- push
la $a0, as # $a0 = &as li $a1, 8 # $a1 = 8 (the length of the array) jal array_max # array_max(&as, 8)