Operating Systems and Concurrency
Content
- Introduction to operating systems/computer design
- Processes, process scheduling, threading, ...
- Concurrency (deadlocks)
- Coursework clarification/revision
- Memory management, swapping, virtual memory, ...
- File Systems, file structures, management, ...
- Virtualisation
- Revision
Book Resources
- Operating System Concepts (8th Edition) – Abraham Silberschatz
- Modern Operating Systems – Andrew Tanenbaum
Introduction
Defining Operating Systems
What Can an OS Do For Me?
```java
import java.io.FileWriter;
```
- File systems: where is the file physically written on the disk and how is it retrieved?
- Abstraction: why does the write instruction look the same regardless of the underlying device?
- Concurrency: what if multiple programs access the same file simultaneously?
- Security: why is access denied? (a C sketch of such a file write follows below)
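The same questions arise for a C equivalent of the Java snippet above. A minimal sketch, where `log.txt` is an arbitrary example file:

```c
#include <stdio.h>

// Writing a file looks trivial, but behind this one call the OS
// resolves the path, allocates disk blocks, checks permissions, and
// serialises concurrent access.
int main(void)
{
    FILE *f = fopen("log.txt", "w"); // security: may fail if access is denied
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "hello\n");           // abstraction: same call for disk, SSD, pipe, ...
    fclose(f);                       // file system: the OS decides where the bytes land
    return 0;
}
```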
OS - A Virtual Machine Providing Abstractions (i.e. an Extended Machine)
- In the early days, programmers had to deal directly with the hardware
- Real computer hardware is ugly
- Hardware is extremely difficult to manipulate/program
- An operating system is a layer of indirection on top of the hardware
- It provides abstractions for application programs (e.g., file systems)
- It provides a cleaner and easier interface to the hardware and hides the complexity of “bare metal”
- It allows the programmer to be lazy by using common routines :-)
OS - A Resource Manager
- Many modern operating systems use multi-programming to improve user experience and maximise resource utilisation
- without multi-programming, CPU time is wasted while waiting for I/O requests
- The operating system must allocate/share resources (including CPU, memory, I/O devices) fairly and safely between competing processes:
- In time, e.g. CPUs and printers
- In space, e.g., memory and disks
- The execution of multiple programs (processes) needs to be interleaved with one another:
- This requires context switches and process scheduling ⇒ mutual exclusion, deadlock avoidance, protection, ...
Computer Hardware
Simplified Computer Model
CPU Design
- A CPU’s basic cycle consists of fetch, decode, and execute; modern CPUs pipeline these stages and may be superscalar
- Every CPU has its own instruction set
- A CPU has a set of registers (extremely fast memory close to the CPU “core”)
- Registers are used to store data and for special functions (e.g. program counter, program status word – mode bit)
- The compiler/programmer decides what to keep in the registers
- Context switching must save and restore the CPU’s internal state, including its registers
CPU Design: Memory Management Unit
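A minimal sketch of the kind of demo program assumed here: print the address and value of a local variable, then run the program twice (with ASLR disabled):

```c
#include <stdio.h>

// Run this program twice: both runs print the same *logical* address
// for x, even though the OS maps it to different physical addresses.
int main(void)
{
    int x = 42;
    printf("address of x: %p, value: %d\n", (void *) &x, x);
    return 0;
}
```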
- Will the same or different addresses be displayed for `x` in both runs? – The same
- Will the value of `x` in the first run influence the value of `x` in the second run? – No
Note that this does not work on Mac (which uses Address Space Layout Randomization – ASLR)
- There are two different address spaces:
- the logical address space seen by the process and used by the compiler
- the physical address space seen by the hardware/OS
- When compiling code, memory addresses must be assigned to variables and instructions; the compiler does not know which addresses will be available in physical memory
- It will just assume that the code will start running at address 0 when generating the machine code
- On some rare occasions, the process may run at physical address 0
- On other occasions, it will be running at a completely different location in physical memory and an offset is added
```
physical address = logical address + offset
```
In the case of our example:
- The address printed on the screen is the logical address
- The logical address is translated into two different physical addresses using different offsets
Modern computers use logical and physical memory addresses:
- Every process has a logical address space – [0,MAX_64] (theoretically)
- MAX_64 = 2^64 bytes = 16 exbibytes; 1 exbibyte = 2^60 bytes = 1,152,921,504,606,846,976 bytes = 1,024 pebibytes
- The machine has a physical address space – [0, MAX] (MAX determined by the amount of physical memory)
A context switch between processes invalidates the MMU state (as well as registers, cache contents, ...)
- The CPU’s internal state must be saved and restored for every context switch (by the OS in the process control block)
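A user-space sketch of this translation, with made-up base and limit values; a real MMU performs this in hardware on every memory access:

```c
#include <stdio.h>
#include <stdlib.h>

// Schematic simulation of base+limit relocation: the MMU adds the
// process's base register (the "offset") to every logical address
// and faults if the address exceeds the limit register.
typedef struct {
    size_t base;   // where the process was loaded in physical memory
    size_t limit;  // size of the process's address space
} mmu_t;

size_t translate(const mmu_t *mmu, size_t logical)
{
    if (logical >= mmu->limit) {
        fprintf(stderr, "segmentation fault: address %zu out of range\n", logical);
        exit(1);
    }
    return mmu->base + logical;  // physical = logical + offset
}

int main(void)
{
    mmu_t run1 = { 0x40000, 0x10000 };  // the same program, loaded twice
    mmu_t run2 = { 0x90000, 0x10000 };  // at two different physical bases
    printf("logical 0x200 -> physical 0x%zx (run 1)\n", translate(&run1, 0x200));
    printf("logical 0x200 -> physical 0x%zx (run 2)\n", translate(&run2, 0x200));
    return 0;
}
```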
Timer Interrupts
- Interrupts temporarily pause a process’s normal operation
- Types of interrupts
- Timer interrupts by CPU clock
- Context switches (i.e. switching between processes) can be initiated by timer interrupts after a “set time”
- I/O interrupts for I/O completion or error codes
- Software generated, e.g. errors and exceptions
- Timer interrupts by the CPU clock proceed roughly as follows:

```
1. Timer generates an interrupt
2. The CPU finishes the current instruction and saves the process state (program counter, registers)
3. The interrupt handler runs in kernel mode and invokes the scheduler
4. The scheduler selects the next process, whose saved state is restored
```
Moore’s Law
The number of transistors on an integrated circuit (chip) doubles roughly every two years.
- Closely linked to, but not the same as, performance (performance roughly doubled along with transistor counts until about 2003)
- Moore’s law still continues, but the “power wall” slows performance improvements of single core/single processor systems
Multi-core, Hyperthreaded Processors
- Modern CPUs contain multiple cores and are often hyper-threaded
- Evolution in hardware has implications on operating system design
- Process scheduling needs to account for load balancing and CPU affinity
- Cache coherency becomes important
Memory
- Memory hierarchies used to balance cost and performance
- Fast and expensive memory is used for caching
- Slow and inexpensive memory is used for long term storage
- Memory includes registers, L1/L2 cache, main/core memory, disk, etc.
- L2 Cache can be shared or dedicated to individual cores
- Cache management is mainly done by hardware
- The CPU can only access main memory directly (i.e. files have to be brought into memory first)
I/O Devices
- The device driver interacts with the controller; the controller interacts with the device (e.g., a disk controller)
- The operating system/device driver typically communicates with the controller through registers
- I/O can take place through:
  - Busy waiting (polling) – see the sketch below
  - Interrupts (interrupt-driven I/O)
  - Direct memory access (using a DMA chip)
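A schematic sketch of the busy-waiting approach; the register addresses and the ready bit are hypothetical, so this compiles only as an illustration and does not target real hardware:

```c
#include <stdint.h>

// Hypothetical memory-mapped device registers; the addresses and bit
// masks below are illustrative only.
#define DEV_STATUS ((volatile uint32_t *) 0xFFFF0000)
#define DEV_DATA   ((volatile uint32_t *) 0xFFFF0004)
#define DEV_READY  0x1u

// Busy waiting: the CPU spins on the status register until the device
// is ready, wasting cycles that interrupt-driven I/O would free up.
uint32_t read_word_polling(void)
{
    while ((*DEV_STATUS & DEV_READY) == 0)
        ;  // spin
    return *DEV_DATA;
}
```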
Operating System Structure
- Operating Systems contain a lot of functionality, including
- Processes, process management, process scheduling, context switching, etc.
- Memory management, virtual memory, etc.
- File Systems
- I/O Management
- How are operating systems structured?
- Micro kernels
- Monolithic
Micro Kernels
- All non-essential functionality is extracted from the kernel
- Communication, memory management and CPU scheduling are likely to be included in the kernel
- The file system, GUI, device drivers are likely to be user processes
- Micro kernels are easier to extend, more portable, and usually more reliable
- Frequent system calls and kernel traps cause significant overhead (mode switches)
- Some Unix versions, Mac OS X, Minix, and early versions of Windows (NT 4.0) were (partially) micro kernels
Monolithic Systems
- All procedures are linked together into one single executable running in kernel mode
- Monolithic kernels are difficult to maintain
- Current versions of Windows and Linux are implemented as monolithic kernels
Processes and Threads
Processes
Definition
- A process is a running instance of a program
- A program is passive and “sits” on a disk
- A process has control structures associated with it, may be active, and may have resources assigned to it (e.g. I/O devices, memory, processor)
- A process is registered with the OS using its “control structures”
  - i.e. an entry in the OS’s process table pointing to a process control block (PCB)
- The process control block contains all information necessary to administer the process and is essential for context switching in multiprogrammed systems
Memory Image
- A process’ memory image contains:
- The program code (could be shared between multiple processes running the same code)
- A data segment, stack and heap
- Every process has its own logical address space, in which the stack and heap are placed at opposite sides to allow them to grow
- Some OSes use address space layout randomisation
Process States and Transitions
States
- A new process has just been created (has a PCB) and is waiting to be admitted (it may not yet be in memory)
- A ready process is waiting for CPU to become available (e.g. unblocked or timer interrupt)
- A running process “owns” the CPU
- A blocked process cannot continue, e.g. is waiting for I/O
- A terminated process is no longer executable (its data structures – the PCB – may be temporarily preserved)
- A suspended process is swapped out (not discussed further)
Transitions
- New → Ready: admit the process and commit to execution
- Running → Blocked: e.g. the process is waiting for input or has carried out a blocking system call
- Ready → Running: the process is selected by the process scheduler
- Blocked → Ready: event happens, e.g. I/O operation has finished
- Running → Ready: the process is preempted, e.g., by a timer interrupt or by pause
- Running → Exit: process has finished, e.g. program ended or exception encountered
Interrupts, traps, and system calls underlie these transitions.
OS Queues
Context Switching
Multi-programming
- Modern computers are multi-programming systems
- Assuming a single processor system, the instructions of individual processes cannot truly run in parallel; their execution is interleaved
- Multi-programming goes back to the “MULTICS” age
- Multi-programming is achieved by alternating processes and context switching
- So true parallelism requires multiple processors
- When a context switch takes place, the system saves the state of the old process and loads the state of the new process (creates overhead)
- Saved ⇒ the process control block is updated
- (Re-)started ⇒ the process control block is read
- A trade-off exists between the length of the time-slice and the context switch time (see the sketch after this list)
  - Short time slices result in good response times but low effective “utilisation”: e.g. with 100 processes, a 1 ms slice, and a 1 ms switch cost, the last process waits 99 * (1 + 1) = 198 ms, but half of all CPU time goes to switching
  - Long time slices result in poor response times but better effective “utilisation”: e.g. with a 100 ms slice, the last process waits 99 * (100 + 1) = 9999 ms
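A small sketch of the arithmetic behind this trade-off, assuming 100 ready processes and a 1 ms context switch cost (matching the example numbers above):

```c
#include <stdio.h>

// Worst-case response time for the last of n ready processes, and the
// effective CPU utilisation, for a given time slice and switch cost.
static void evaluate(int n, double slice_ms, double switch_ms)
{
    double response    = (n - 1) * (slice_ms + switch_ms);
    double utilisation = slice_ms / (slice_ms + switch_ms);
    printf("slice %7.1f ms -> response %8.1f ms, utilisation %5.1f%%\n",
           slice_ms, response, utilisation * 100.0);
}

int main(void)
{
    evaluate(100, 1.0, 1.0);    // 99 * (1 + 1)   = 198 ms,  50% utilisation
    evaluate(100, 100.0, 1.0);  // 99 * (100 + 1) = 9999 ms, ~99% utilisation
    return 0;
}
```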
- A process control block contains three types of attributes:
- Process identification (PID, UID, Parent PID)
- Process control information (process state, scheduling information, etc.)
- Process state information (user registers, program counter, stack pointer, program status word, memory management information, files, etc.)
- Process control blocks are kernel data structures
- i.e. they are protected and only accessible in kernel mode!
- Allowing user applications to access them directly could compromise their integrity
- The operating system manages them on the user’s behalf through system calls (e.g. to set process priority)
Process Implementation
Tables and Control Blocks
- An operating system maintains information about the status of “resources” in tables
- Process tables (process control blocks)
- Memory tables (memory allocation, memory protection, virtual memory)
- I/O tables (availability, status, transfer information)
- File tables (location, status)
- The process table holds a process control block for each process, allocated upon process creation
- Tables are maintained by the kernel and are usually cross referenced
Switching Processes
```
1. Save process state (program counter, registers)
2. Update the PCB of the running process (running → ready/blocked)
3. Move the PCB to the appropriate queue (ready, blocked, ...)
4. Select another process to run (scheduling)
5. Update the PCB of the selected process (→ running)
6. Update the memory management unit
7. Restore the process state (program counter, registers) of the selected process
```
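A schematic sketch of the data involved; the field names below are made up, and the actual register save/restore happens in kernel assembly:

```c
#include <stdint.h>

// Illustrative PCB layout: just enough state to suggest what a
// context switch saves and restores.
typedef enum { NEW, READY, RUNNING, BLOCKED, TERMINATED } pstate_t;

typedef struct pcb {
    int        pid;
    pstate_t   state;
    uint64_t   program_counter;
    uint64_t   registers[16];
    uint64_t   page_table_base;  // MMU state to reload on restore
    struct pcb *next;            // ready/blocked queue linkage
} pcb_t;

// Conceptual shape of a switch, mirroring the numbered steps above.
void context_switch(pcb_t *current, pcb_t *chosen)
{
    current->state = READY;      // 1-3: save state, update PCB, requeue
    chosen->state  = RUNNING;    // 4-5: the scheduler picked 'chosen'
    // 6-7: reload the MMU from chosen->page_table_base,
    //      restore registers and program counter (in assembly)
}
```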
System Calls
Process Creation
- The true system calls are “wrapped” in the OS libraries (e.g. `libc`) following a well-defined interface (e.g. `POSIX`, the `WIN32` API)
- System calls for process creation:
  - Unix: `fork()` generates an exact copy of the parent ⇒ `exec()` replaces the memory image
  - Windows: `NtCreateProcess()`
  - Linux: `clone()`
fork()
- `fork()` creates an exact copy of the current process
- `fork()` returns the process identifier of the child process to the parent process
- `fork()` returns 0 to the child process
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int iStatus;
    int iPID = fork();  // parent calls fork()
    if (iPID < 0)
    {
        printf("fork error\n");
        exit(1);
    }
    else if (iPID == 0)
    {
        // child: starts as an identical copy of the parent's code...
        printf("hello from child process\n");
        // ...until execl() overwrites its memory image with /bin/ls
        execl("/bin/ls", "ls", "-l", (char *) NULL);
    }
    else
    {
        // parent: fork() returned the child's PID
        waitpid(iPID, &iStatus, 0);
        printf("hello from parent process\n");
    }
    return 0;
}
```
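Note that the parent blocks in `waitpid()` until the child (by then running `ls`) terminates, so “hello from parent process” is always printed after the directory listing.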
Process Termination
- System calls are necessary to notify the OS that the process has terminated
- Resources must be de-allocated
- Output must be flushed
- Process admin may have to be carried out
- System calls for process termination:
  - UNIX/Linux: `exit()`, `kill()`
  - Windows: `TerminateProcess()`
Process Scheduling
Context
- The OS is responsible for managing and scheduling processes
- Decide when to admit processes to the system (new → ready)
- Decide which process to run next (ready → run)
- Decide when and which processes to interrupt (running → ready)
- The OS relies on the scheduler (dispatcher) to decide which process to run next, which uses a scheduling algorithm to do so
- The type of algorithm used by the scheduler is influenced by the type of operating system (e.g., real time vs. batch)
Classification by Time Horizon
- Long term: applies to new processes and controls the degree of multiprogramming by deciding which processes to admit to the system, and when
- A good mix of CPU and I/O bound processes is favourable to keep all resources as busy as possible
- Usually absent in popular modern OS
- Medium term: controls swapping and the degree of multi-programming
- Short term: decides which process to run next
- Manages the ready queue
- Invoked very frequently, hence must be fast
- Usually called in response to clock interrupts, I/O interrupts, or blocking system calls
Classification by Approach
- Non-preemptive: processes are only interrupted voluntarily (e.g. by an I/O operation or a “nice” system call such as `yield()`)
  - Windows 3.1 and DOS were non-preemptive
- Preemptive: processes can be interrupted forcefully or voluntarily
- This requires context switches, which generate overhead; too many of them should be avoided (recall the time slice trade-off above)
- Prevents processes from monopolising the CPU
- Most popular modern operating systems are preemptive
Performance Assessment
Criteria
- User oriented criteria:
- Response time: minimise the time between creating the job and its first execution
- Turnaround time: minimise the time between creating the job and finishing it
- Predictability: minimise the variance in processing times
- System oriented criteria:
- Throughput: maximise the number of jobs processed per hour
- Fairness:
- Are processing power/waiting time equally distributed?
- Are some processes kept waiting excessively long (starvation)?
- Evaluation criteria can be conflicting, i.e., improving the response time may require more context switches, and hence worsen the throughput and increase the turnaround time
Scheduling Algorithms
Overview
- Algorithms considered:
- First Come First Served (FCFS)/ First In First Out (FIFO)
- Shortest job first
- Round Robin
- Priority queues
- Performance measures used:
- Average response time: the average of the time taken for all the processes to start
- Average turnaround time: the average time taken for all the processes to finish
First Come First Served
- Concept: a non-preemptive algorithm that operates as a strict queueing mechanism and schedules the processes in the same order that they were added to the queue
- Advantages: positional fairness and easy to implement
- Disadvantages:
- Favours long processes over short ones (think of the supermarket checkout!)
- Could compromise resource utilisation, i.e., CPU vs. I/O devices
- Average response time = (0+7+9+15)/4 = 31/4 = 7.75
- Average turnaround time = (7+9+15+20)/4 = 51/4 = 12.75
- (This example assumes four processes arriving together at time 0 with burst times 7, 2, 6, and 5; the same job set is used for SJF and round robin below)
Shortest Job First
- Concept: a non-preemptive algorithm that starts processes in order of ascending processing time, using a provided/known estimate of the processing time
- Advantages: always results in the optimal average turnaround time (for jobs that are all available at once)
- Disadvantages:
- Starvation might occur
- Fairness and predictability are compromised
- Processing times have to be known beforehand
- Average response time = (0+2+7+13)/4 = 22/4 = 5.5
- Average turnaround time = (2+7+13+20)/4 = 42/4 = 10.5 (reproduced by the sketch below)
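A minimal sketch reproducing the FCFS and SJF averages above, assuming the job set already introduced (burst times 7, 2, 6, 5, all arriving at time 0); sorting the bursts turns FCFS into SJF:

```c
#include <stdio.h>
#include <stdlib.h>

// Average response and turnaround time for non-preemptive scheduling
// of jobs that all arrive at time 0, executed in array order.
static void averages(const char *name, const int *burst, int n)
{
    int start = 0, response = 0, turnaround = 0;
    for (int i = 0; i < n; i++) {
        response   += start;             // time until first execution
        turnaround += start + burst[i];  // time until completion
        start      += burst[i];
    }
    printf("%s: response %.2f, turnaround %.2f\n",
           name, (double) response / n, (double) turnaround / n);
}

static int ascending(const void *a, const void *b)
{
    return *(const int *) a - *(const int *) b;
}

int main(void)
{
    int jobs[] = {7, 2, 6, 5};
    averages("FCFS", jobs, 4);              // 7.75 and 12.75

    qsort(jobs, 4, sizeof jobs[0], ascending);
    averages("SJF ", jobs, 4);              // 5.50 and 10.50
    return 0;
}
```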
Round Robin
- Concept: a preemptive version of FCFS that forces context switches at periodic intervals or time slices
- Processes run in the order that they were added to the queue
- Processes are forcefully interrupted by the timer
- Advantages:
- Improved response time
- Effective for general purpose time sharing systems
- Disadvantages:
- Increased context switching and thus overhead
- Favours CPU bound processes (which usually run long) over I/O bound processes (which do not run long)
  - This can be mitigated by working with multiple queues (see feedback queues below)
- With large time slices, round robin reduces to FCFS
- The length of the time slice must be carefully considered!
- For instance, assuming a multi-programming system with preemptive scheduling and a context switch time of 1ms:
- E.g., a good (low) response time is achieved with a small time slice (e.g. 1ms) ⇒ low throughput
- E.g., a high throughput is achieved with a large time slice (e.g. 1000ms) ⇒ high response time
- If a time slice is only used partially, the next process starts immediately
- Average response time = (0+1+2+3)/4 = 6/4 = 1.5
- Average turnaround time = (6+17+19+20)/4 = 62/4 = 15.5 (reproduced by the sketch below)
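A sketch of this round robin run under the same assumptions (bursts 7, 2, 6, 5 arriving together, a time slice of 1, and context switch costs ignored); it reproduces the averages above:

```c
#include <stdio.h>

#define N 4

// Round robin with quantum 1 and zero switch cost: cycle over the
// jobs, giving each one time unit per turn until it completes.
int main(void)
{
    int remaining[N] = {7, 2, 6, 5};
    int first_run[N], completion[N];
    int time = 0, left = N;

    for (int i = 0; i < N; i++) first_run[i] = -1;

    while (left > 0) {
        for (int i = 0; i < N; i++) {
            if (remaining[i] == 0) continue;
            if (first_run[i] < 0) first_run[i] = time;  // response time
            time++;                                     // run one quantum
            if (--remaining[i] == 0) {
                completion[i] = time;                   // turnaround time
                left--;
            }
        }
    }

    double response = 0, turnaround = 0;
    for (int i = 0; i < N; i++) {
        response   += first_run[i];
        turnaround += completion[i];
    }
    printf("response %.2f, turnaround %.2f\n", response / N, turnaround / N);
    return 0;
}
```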
Priority Queues
- Concept: A preemptive algorithm that schedules processes by priority (high → low)
- The process priority is saved in the process control block
- Advantages: can prioritise I/O bound jobs
- Disadvantages: low priority processes may suffer from starvation (with static priorities)
- Average response time = (0+1+11+13)/4 = 25/4 = 6.25
- Average turnaround time = (10+11+13+20)/4 = 54/4 = 13.5
Threads
Threads from an OS Perspective
- A process consists of two fundamental units:
- Resources
- A logical address space containing the process image (program, data, heap, stack)
- Files, I/O devices, I/O channels, . . .
- Execution trace: an entity that gets executed
- A process can share its resources between multiple execution traces
- i.e., multiple threads running in the same resource environment
- Every thread has its own execution context (e.g. program counter, stack, registers)
- All threads have access to the process’ shared resources
- E.g. files: if one thread opens a file, all threads of the same process can access it
- Global variables, memory, etc. (⇒ synchronisation!)
- Some CPUs (hyperthreaded ones) have direct hardware support for multi-threading
- They can offer up to 8 hardware threads per core
Threads vs Processes
- Similar to processes, threads have:
- States
- Transitions
- A thread control block
- Threads incur less overhead to create/terminate/switch
- Address space remains the same for threads of the same process
- Inter-thread communication is easier/faster than interprocess communication
- (threads share memory by default)
- No protection boundaries are required in the address space
- (threads are cooperating, belong to the same user, and have a common goal)
- Synchronisation has to be considered carefully!
Why Use Threads
- Multiple related activities apply to the same resources, these resources should be accessible/shared
- Processes will often contain multiple blocking tasks
- I/O operations (thread blocks, interrupt marks completion)
- Memory accesses (page faults result in blocking)
- Such activities should be carried out in parallel/concurrently
OS Implementations of Threads
- User threads
- Kernel threads
- Hybrid implementations
3 Types of Threads
User Threads (Many-to-One)
- Thread management is carried out in user space with the help of a user library
- (creating, destroying, scheduling, thread control block manipulation)
- The process maintains a thread table managed by the runtime system without the kernel’s knowledge
- Similar to process table
- Used for thread switching
- Tracks thread related information
- Advantages:
- Threads are in user space
- (no mode switches required)
- Full control over the thread scheduler
- OS independent
- (threads can run on OS that do not support them)
- Disadvantages:
- Blocking system calls suspend the entire process
- (user threads are mapped onto a single process, managed by the kernel)
- No true parallelism
- (a process is scheduled on a single CPU)
- Clock interrupts are non-existent
- (user threads are non-preemptive)
- Page faults result in blocking the process
Kernel Threads (One-to-One)
- The kernel manages the threads, user application accesses threading facilities through API and system calls
- Thread table is in the kernel, containing thread control blocks (subset of process control blocks)
- If a thread blocks, the kernel chooses a thread from the same or a different process (↔ user threads)
- Advantages:
- True parallelism can be achieved
- No run-time system needed
- Disadvantages:
- Frequent mode switches take place, resulting in lower performance
Windows and Linux apply this approach
User Threads vs. Kernel Threads vs. Processes on Performance
- Null fork: the overhead in creating, scheduling, running and terminating a null process/thread
- Signal wait: overhead in synchronising threads
Hybrid Implementations (Many-to-Many)
- User threads are multiplexed onto kernel threads
- Kernel sees and schedules the kernel threads (a limited number)
- User application sees user threads and creates/schedules these (an “unrestricted” number)
Comparison
Thread Management
Libraries
- Thread libraries provide an API/interface for managing threads
- (e.g. creating, running, destroying, synchronising, etc.)
- Thread libraries can be implemented:
- Entirely in user space
- (i.e. user threads)
- Based on system calls
- (i.e. rely on the kernel for thread implementations)
Examples of thread APIs include POSIX’s PThreads, Windows Threads, and Java Threads.
POSIX Threads
POSIX threads are a specification that “anyone” can implement, i.e., it defines a set of APIs (function calls, over 60 of them) and what they do.
Core functions of PThreads include `pthread_create`, `pthread_exit`, `pthread_join`, `pthread_yield`, `pthread_attr_init`, and `pthread_attr_destroy`.
More detailed descriptions can be found using `man function_name` on the command line.
```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NUMBER_OF_THREADS 10

void * hello_world(void * tid)
{
    printf("HELLO from thread %d\n", *((int *) tid));
    pthread_exit(NULL);
}

int main(int argc, char * argv[]) {
    int aiIds[] = {1,2,3,4,5,6,7,8,9,10};
    pthread_t threads[NUMBER_OF_THREADS];
    int i;
    for(i = 0; i < NUMBER_OF_THREADS; i++) {
        // pthread_create returns 0 on success, an error number otherwise
        if(pthread_create(&threads[i], NULL, hello_world, (void *) &aiIds[i]) != 0)
        {
            printf("Creating thread %d failed\n", i);
            exit(-1);
        }
    }
    for(i = 0; i < NUMBER_OF_THREADS; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```
Scheduling in Windows 7: Multi-level Feedback Queues
Moving Beyond Priority Queues
Different scheduling algorithms can be used for the individual queues (e.g., round robin, SJF, FCFS).
- Feedback queues allow priorities to change dynamically, i.e., jobs can move between queues:
- Move to lower priority queue if too much CPU time is used (prioritise I/O and interactive processes)
- Move to higher priority queue to prevent starvation and avoid priority inversion (illustrated below)
Priority Inversion
```
Process A (low)     Process B (high)     Process C (high)
request X
receive X
...                 RUN
...                 request X
...                 blocked              RUN
...                 ...                  ...
```
Here the low-priority process A holds resource X, the high-priority process B blocks waiting for X, and C keeps the CPU; A never gets to run and release X, so B is effectively held up by C – its priority has been “inverted”. Temporarily boosting A’s priority lets it release X.
- Characteristics of feedback queues:
- The number of queues
- The scheduling algorithms used for the individual queues
- Migration policy between queues
- Initial access to the queues
- Feedback queues are highly configurable and offer significant flexibility
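A minimal sketch of a multi-level feedback queue, assuming two levels, illustrative quanta, and demotion after a job uses its full quantum; within a level, jobs are picked in array order for brevity (real schedulers use round robin per queue and many more levels):

```c
#include <stdio.h>

#define LEVELS 2
#define JOBS   3

// Illustrative quantum per level: the high-priority queue gets short
// slices (good response); the low-priority queue gets longer ones.
static const int quantum[LEVELS] = {1, 4};

typedef struct { int id; int remaining; int level; } job_t;

int main(void)
{
    job_t jobs[JOBS] = { {0, 3, 0}, {1, 6, 0}, {2, 2, 0} };
    int time = 0, left = JOBS;

    while (left > 0) {
        // pick the first runnable job in the highest-priority level
        job_t *j = NULL;
        for (int lvl = 0; lvl < LEVELS && !j; lvl++)
            for (int i = 0; i < JOBS && !j; i++)
                if (jobs[i].remaining > 0 && jobs[i].level == lvl)
                    j = &jobs[i];

        int q   = quantum[j->level];
        int run = j->remaining < q ? j->remaining : q;
        printf("t=%2d: job %d runs %d unit(s) at level %d\n",
               time, j->id, run, j->level);
        time += run;
        j->remaining -= run;

        if (j->remaining == 0)
            left--;
        else if (run == q && j->level < LEVELS - 1)
            j->level++;  // used its full quantum: demote (CPU bound)
    }
    return 0;
}
```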
Windows 7
- An interactive system using a preemptive scheduler with dynamic priority levels
- Two priority classes with 16 different priority levels exist
- “Real time” processes/threads have a fixed priority level
- “Variable” processes/threads can have their priorities boosted temporarily
- A round robin algorithm is used within the queues
- Priorities are based on the process base priority (between 0-15) and thread base priority (±2 relative to the process priority)
- A thread’s priority dynamically changes during execution between its base priority and the maximum priority within its class
- Interactive I/O bound processes (e.g. keyboard) receive a larger boost
- Boosting priorities prevents priority inversion
Scheduling in Linux
The Completely Fair Scheduler
- Process scheduling has evolved over different versions of Linux to account for multiple processors/cores, processor affinity, and load balancing between cores
- Linux distinguishes between two types of tasks for scheduling:
- Real time tasks (to be POSIX compliant), divided into:
- Real time FIFO tasks
- Real time Round Robin tasks
- Time sharing tasks using a preemptive approach (similar to variable in Windows)
- The most recent scheduling algorithm in Linux for time sharing tasks is the “completely fair scheduler” (CFS), which replaced the earlier O(1) scheduler in kernel 2.6.23
Real-Time Tasks
- Real time FIFO tasks have the highest priority and are scheduled using a FCFS approach, using preemption if a higher priority job shows up
- Real time round robin tasks are preemptable by clock interrupts and have a time slice associated with them
- Neither approach can guarantee hard deadlines
Time Sharing Tasks (equal priority)
- The CFS divides the CPU time between all processes
- If all N processes have the same priority:
- They will be allocated a “time slice” equal to 1/N times the available CPU time
- I.e., if N equals 5, every process will receive 20% of the processor’s time
- The length of the time slice and the “available CPU time” are based on the targeted latency (⇒ every process should run at least once during this interval)
- If N is very large, the context switch time will be dominant, hence a lower bound on the “time slice” is imposed by the minimum granularity
- A process’s time slice can be no less than the minimum granularity (response time will deteriorate)
Time Sharing Tasks (different priority)
- A weighting scheme is used to take different priorities into account
- If processes have different priorities:
- Every process \(i\) is allocated a weight \(W_i\) that reflects its priority
- The “time slice” allocated to process \(i\) is then proportional to \(\frac{W_i}{\sum_{j \in N} W_j}\)
- The tasks with the lowest proportional amount of “used CPU time” are selected first
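A sketch of this weighting arithmetic with made-up weights (the real CFS derives weights from nice values via a fixed table and picks tasks by lowest virtual runtime rather than computing slices this way):

```c
#include <stdio.h>

#define N 3

// Targeted latency: every runnable process should get a slice within
// this window. The weights are illustrative, not real nice-to-weight
// values.
int main(void)
{
    double targeted_latency_ms = 20.0;
    double weight[N] = {1.0, 2.0, 5.0};  // higher weight = higher priority

    double total = 0.0;
    for (int i = 0; i < N; i++) total += weight[i];

    // slice_i = targeted latency * W_i / sum of all W_j
    for (int i = 0; i < N; i++) {
        double slice = targeted_latency_ms * weight[i] / total;
        printf("process %d: slice = %.2f ms\n", i, slice);
    }
    return 0;
}
```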
Multi-processor Scheduling
Scheduling Decisions
- Single processor machine: which process (thread) to run next (one dimensional)
- Scheduling decisions on a multi-processor/core machine include:
- Which process (thread) to run where, i.e., which CPU?
- Which process (thread) to run when?
Shared Queues
- A single or multi-level queue shared between all CPUs
- Advantage: automatic load balancing
- Disadvantages:
- Contention for the queues (locking is needed)
- “All CPUs are equal, but some are more equal than others” : does not account for processor affinity:
- Cache becomes invalid when moving to a different CPU
- Translation lookaside buffers (TLBs – part of the MMU) become invalid
- Windows will allocate the highest priority threads to the individual CPUs/cores
Private Queues
- Each CPU has a private (set) of queues
- Advantages:
- CPU affinity is automatically satisfied
- Contention for shared queue is minimised
- Disadvantages: less load balancing
- Push and pull migration between CPUs is possible
Related vs. Unrelated Threads
Thread Types
- Related: multiple threads that communicate with one another and ideally run together (e.g. search algorithm)
- Unrelated: threads that are independent of one another, e.g. belonging to processes started by different users running different programs
Scheduling Related Threads
Working Together
- E.g. threads belonging to the same process that cooperate, e.g. by exchanging messages or sharing information:
  - Process A has threads A0 and A1, which cooperate
  - Process B has threads B0 and B1, which cooperate
- Suppose the scheduler selects A0 and B1 to run first, then A1 and B0: the cooperating pairs (A0/A1 and B0/B1) run on different CPUs but never at the same time
  - When they try to send messages to their partner threads, those threads are still in the ready state, so the senders end up waiting
- The aim is to get threads running, as much as possible, at the same time across multiple CPUs
- Approaches include:
- Space sharing
- Gang scheduling
Space Sharing
- Approach:
- N threads are allocated to N dedicated CPUs
- N threads are kept waiting until N CPUs are available
- Non-preemptive, i.e. blocking calls result in idle CPUs (less context switching overhead but results in CPU idle time)
- The number N can be dynamically adjusted to match processor capacity
Gang Scheduling
- Time slices are synchronised and the scheduler groups threads together to run simultaneously (as much as possible)
- A preemptive algorithm
- Blocking threads result in idle CPUs