CS 4284 Systems Capstone
Godmar Back
Resource Allocation & Scheduling
Resource Allocation and Scheduling
Resource Allocation & Scheduling
• Resource management is a primary OS function
• Involves resource allocation & scheduling
  – Who gets to use what resource and for how long
• Example resources:
  – CPU time
  – Disk bandwidth
  – Network bandwidth
  – RAM
  – Disk space
• Processes are the principals that use resources
  – often on behalf of users
Preemptible vs Nonpreemptible Resources
• Nonpreemptible resources:
  – Once allocated, can't easily ask for them back
  – must wait until the process returns them (or exits)
  – Examples: locks, disk space, control of the terminal
• Preemptible resources:
  – Can be taken away ("preempted") and returned without the process noticing it
  – Examples: CPU, memory
Physical vs Virtual Memory
• Classification of a resource as preemptible depends on the price one is willing to pay to preempt it
  – Can theoretically preempt most resources via copying & indirection
• Virtual memory: mechanism to make physical memory preemptible
  – Take away by swapping to disk, return by reading from disk (possibly swapping out others)
• Not always tolerable
  – resident portions of the kernel
  – Pintos kernel stack pages
Space Sharing vs Time Sharing
• Space sharing: allocation ("how much?")
  – Use if the resource can be split (multiple CPUs, memory, etc.)
  – Use if the resource is non-preemptible
• Time sharing: scheduling ("how long?")
  – Use if the resource can't be split
  – Use if the resource is easily preemptible
CPU vs. Other Resources
• CPU is not the only resource that needs to be scheduled
• Overall system performance depends on efficient use of all resources
  – A resource can be in use (busy) or unused (idle)
  – Duty cycle: portion of time busy
  – Consider an I/O device: it is busy only after receiving an I/O request – if the CPU scheduler delays the process that will issue the I/O request, the I/O device is underutilized
• Ideal: want to keep all devices busy
Per-process perspective
• Process execution alternates between CPU bursts & I/O bursts
• [Figure: burst timelines of an I/O bound process (short CPU bursts, frequent I/O) and a CPU bound process (long CPU bursts, infrequent I/O)]
Global perspective
• If these were executed on the same CPU:
• [Figure: combined CPU/I/O timeline of the I/O bound and CPU bound processes sharing the CPU, showing waiting periods]
CPU Scheduling
Part I
CPU Scheduling Terminology
• A job (sometimes called a task, or a job instance)
  – Activity that's scheduled: a process or part of a process
• Arrival time: time when the job arrives
• Start time: time when the job actually starts
• Finish time: time when the job is done
• Completion time (aka turn-around time)
  – Finish time – Arrival time
• Response time
  – Time when user sees response – Arrival time
• Execution time (aka cost): time a job needs to execute
• [Figure: timeline of a job with waiting periods, a CPU burst, and I/O, annotated with arrival time, start time, and finish time; completion time and response time are measured from the arrival time]
CPU Scheduling Terminology (2)
• Waiting time = time when the job was ready-to-run but didn't run because the CPU scheduler picked another job
• Blocked time = time when the job was blocked
  – e.g., while an I/O device is in use
• Completion time
  – = Execution time + Waiting time + Blocked time
Static vs Dynamic Scheduling
• Static
  – All jobs and their arrival & execution times are known in advance; create a schedule, then execute it
  – Used in statically configured systems, such as embedded real-time systems
• Dynamic or online scheduling
  – Jobs are not known in advance; the scheduler must make an online decision whenever a job arrives or leaves
  – Execution time may or may not be known
  – Behavior can be modeled by making assumptions about the nature of the arrival process
Scheduling Algorithms vs Scheduler Implementations
• Scheduling algorithms' properties are (usually) analyzed under static assumptions first, then adapted for dynamic scenarios
• Algorithms often consider only an abstract notion of (CPU) "jobs", but a dynamic scheduler must map that to processes with alternating - and repeating - CPU and I/O bursts
  – Often applies the static algorithm to the current ready queue
• Algorithms often assume the length of a job/CPU burst is known, but a real scheduler must estimate the expected execution cost (or make assumptions)
Preemptive vs Nonpreemptive Scheduling
• Q.: when is the scheduler asked to pick a thread from the ready queue?
• Nonpreemptive:
  – Only on a RUNNING → BLOCKED transition
  – Or RUNNING → EXIT
  – Or a voluntary yield: RUNNING → READY
• Preemptive:
  – Also on a BLOCKED → READY transition
  – Also on a timer (forced call to yield upon interrupt exit)
• [Figure: process state diagram with READY, RUNNING, and BLOCKED states – the scheduler picks a process (READY → RUNNING), a process may be preempted (RUNNING → READY), a process must wait for an event (RUNNING → BLOCKED), and the event's arrival makes it ready again (BLOCKED → READY)]
CPU Scheduling Goals
• Minimize latency
  – Can mean (avg) completion time
  – Can mean (avg) response time
• Maximize throughput
  – Throughput: number of finished jobs per time unit
  – Implies minimizing overhead (for context switching and for the scheduling algorithm itself)
  – Requires efficient use of non-CPU resources
• Fairness
  – Minimize variance in waiting time/completion time
Scheduling Constraints
• Reaching those goals is difficult, because
  – Goals are conflicting:
    • Latency vs. throughput
    • Fairness vs. low overhead
  – The scheduler must operate with incomplete knowledge
    • Execution time may not be known
    • I/O device use may not be known
  – The scheduler must make decisions fast
    • Approximate the best solution from a huge solution space
First Come First Serve
• Schedule processes in the order in which they arrive
  – Run until completion (or until they block)
• Simple!
• Example:
• [Figure: FCFS example timeline marked at 0, 2, 7, 20, 22, and 27. Q.: what is the average completion time?]
FCFS (cont’d)
• Disadvantage: completion time depends on arrival order
  – Unfair to short jobs
• Possible convoy effect:
  – 1 CPU bound job (long CPU bursts, infrequent I/O bursts), multiple I/O bound jobs (frequent I/O bursts, short CPU bursts)
  – CPU bound process monopolizes the CPU: I/O devices are idle
  – New I/O requests by I/O bound jobs are only issued when the CPU bound job blocks
  – CPU bound job "leads" a convoy of I/O bound processes
• FCFS is not usually used for CPU scheduling, but is often used for other resources (e.g., a network device)
Round-Robin
• Run each process for a timeslice (quantum), then move on to the next process; repeat
• Decreases avg completion time if jobs are of different lengths
• No more unfairness to short jobs!
• [Figure: round-robin example timeline marked at 0, 5, 8, and 27. Q.: what is the average completion time?]
Round Robin (2)
• What if there are no "short" jobs?
• [Figure: three equal-length jobs under round-robin on a timeline marked at 0, 7, 14, and 21. Q.: what is the average completion time? What would it be under FCFS?]
Round Robin – Cost of Time Slicing
• Context switching incurs a cost
  – Direct cost (executing the scheduler & the context switch) + indirect cost (cache & TLB misses)
• Long time slices → lower overhead, but approaches FCFS if processes finish before the timeslice expires
• Short time slices → lots of context switches, high overhead
• Typical cost: context switch < 10 µs
• Time slice typically around 100 ms
• Note: time slice length != interval between timer interrupts where periodic timers are used
Shortest Process Next (SPN)
• Idea: remove unfairness towards short processes by always picking the shortest job
• If done nonpreemptively, also known as:
  – Shortest Job First (SJF), Shortest Time to Completion First (STCF)
• If done preemptively, known as:
  – Shortest Remaining Time (SRT), Shortest Remaining Time to Completion First (SRTCF)
SPN (cont’d)
• Provably optimal with respect to avg waiting time:
  – Moving a shorter job up reduces its waiting time more than it delays the waiting time of the longer job that follows
• Advantage: good I/O utilization
• Disadvantage:
  – Can starve long jobs
• [Figure: SPN schedule of the earlier example on a timeline marked at 0, 2, 7, and 27]
• Big Q: How do we know the length of a job?
Practical SPN
• Usually don't know the (remaining) execution time
  – Exception: profiled code in a real-time system, or worst-case execution time (WCET) analysis
• Idea: determine the future from the past
  – Assume the next CPU burst will be as long as the previous CPU burst
  – Or: weigh history using a (potentially exponential) average: more recent burst lengths are more predictive than older CPU bursts
• Note: for some resources, we know or can compute the length of the next "job":
  – Example: disk scheduling (shortest-seek-time first)
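To make the history-weighting idea concrete, here is a minimal sketch of exponential averaging of CPU burst lengths; the weight alpha, the initial guess, and the sample burst lengths are assumptions chosen for the example, not values from the slides.

#include <stdio.h>

/* Exponential average of CPU burst lengths:
 * predicted_next = alpha * last_actual + (1 - alpha) * previous_prediction.
 * A larger alpha puts more weight on the most recent burst. */
static double predict_next_burst(double prev_prediction, double last_burst,
                                 double alpha)
{
    return alpha * last_burst + (1.0 - alpha) * prev_prediction;
}

int main(void)
{
    double prediction = 10.0;                    /* initial guess, in ms */
    double observed[] = { 6.0, 4.0, 6.0, 13.0, 13.0 };
    double alpha = 0.5;

    for (int i = 0; i < 5; i++) {
        prediction = predict_next_burst(prediction, observed[i], alpha);
        printf("after burst %d: predicted next burst = %.2f ms\n",
               i + 1, prediction);
    }
    return 0;
}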
Aside
• http://cacm.acm.org/magazines/2014/2/171680-computation-takes-time-but-how-much/fulltext
• Computation Takes Time, But How Much?
• Reinhard Wilhelm, Daniel Grund. Communications of the ACM, Vol. 57 No. 2, Pages 94-103. DOI: 10.1145/2500886
Multi-Level Feedback Queue Scheduling
• Kleinrock 1969
• Want:
  – preference for short jobs (tends to lead to good I/O utilization)
  – longer timeslices for CPU bound jobs (reduces context-switching overhead)
• Problem:
  – Don't know the type of each process – the algorithm needs to figure it out
• Use multiple queues
  – queue determines priority
  – usually combined with static priorities (nice values)
  – many variations of this idea exist (see the sketch after the diagram below)
MLFQS
• [Figure: multi-level feedback queue with queues 1 (highest priority, MAX) through 4 (lowest priority, MIN); lower-priority queues have longer timeslices]
• Processes start in the highest queue
• Higher-priority queues are served before lower-priority ones; within the highest-priority queue, round-robin
• Processes that use up their timeslice move down
• Processes that starve move up
• Only ready processes are in these queues – blocked processes leave the queue and reenter the same queue on unblock
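A minimal sketch of the queue selection and demotion/promotion just described; NUM_QUEUES, the timeslice values, and the function names are assumptions for illustration, not Pintos or Linux code.

#include <stdio.h>
#include <stddef.h>

#define NUM_QUEUES 4                       /* level 0 = highest priority */

struct task { const char *name; struct task *next; };

struct level {
    struct task *head, *tail;              /* FIFO of ready tasks */
    int timeslice_ms;                      /* longer at lower levels */
};

static struct level levels[NUM_QUEUES] = {
    { NULL, NULL, 10 }, { NULL, NULL, 20 },
    { NULL, NULL, 40 }, { NULL, NULL, 80 },
};

static void enqueue(int lvl, struct task *t)
{
    t->next = NULL;
    if (levels[lvl].tail) levels[lvl].tail->next = t;
    else levels[lvl].head = t;
    levels[lvl].tail = t;
}

static struct task *dequeue(int lvl)
{
    struct task *t = levels[lvl].head;
    if (t) {
        levels[lvl].head = t->next;
        if (!levels[lvl].head) levels[lvl].tail = NULL;
    }
    return t;
}

/* Pick from the highest-priority non-empty queue (round-robin within it). */
static struct task *pick_next(int *lvl_out)
{
    for (int lvl = 0; lvl < NUM_QUEUES; lvl++)
        if (levels[lvl].head) { *lvl_out = lvl; return dequeue(lvl); }
    return NULL;
}

/* Demote tasks that used their whole slice; promote starving tasks. */
static void requeue(struct task *t, int lvl, int used_whole_slice, int starving)
{
    if (used_whole_slice && lvl < NUM_QUEUES - 1) lvl++;
    else if (starving && lvl > 0) lvl--;
    enqueue(lvl, t);
}

int main(void)
{
    struct task a = { "A", NULL }, b = { "B", NULL };
    enqueue(0, &a); enqueue(0, &b);        /* new tasks start in the top queue */
    int lvl;
    struct task *t = pick_next(&lvl);
    printf("run %s for %d ms\n", t->name, levels[lvl].timeslice_ms);
    requeue(t, lvl, 1, 0);                 /* used its full slice: demote */
    return 0;
}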
Basic Scheduling: Summary
• FCFS: simple
  – unfair to short jobs & poor I/O performance (convoy effect)
• RR: helps short jobs
  – loses when jobs are of equal length
• SPN: optimal average waiting time
  – which, ignoring blocking time, leads to optimal average completion time
  – unfair to long jobs
  – requires knowing (or guessing) the future
• MLFQS: approximates SPN without knowing execution time
  – Can still be unfair to long jobs
CPU Scheduling
Part II
Case Study: 2.6 Linux Scheduler (pre 2.6.23)
• Variant of MLFQS
• 140 priorities
  – 0-99: "realtime"
  – 100-139: non-realtime
• Dynamic priority computed from static priority (nice) plus an "interactivity bonus"
• [Figure: priority scale from 0 to 140 – priorities 0-99 hold "realtime" processes scheduled on static priority (SCHED_FIFO, SCHED_RR); priorities 100-139 hold processes scheduled on dynamic priority (SCHED_OTHER), with nice=-20 near 100, nice=0 at 120, and nice=19 near 140]
Linux Scheduler (2)
• Instead of a recomputation loop, recompute priority at the end of each timeslice
  – dyn_prio = nice + interactivity bonus (-5…5)
• Interactivity bonus depends on sleep_avg
  – measures the time a process was blocked
• 2 priority arrays ("active" & "expired") in each runqueue (Linux calls its ready queues "runqueues")
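A minimal sketch of the dynamic-priority idea above; the mapping from sleep_avg to the -5…+5 bonus, the constants, and the clamping are assumptions for illustration, not the kernel's actual code.

#include <stdio.h>

/* Sketch only: a task that slept a lot recently (I/O bound) earns a negative
 * bonus, i.e. a better (numerically lower) dynamic priority; a CPU hog earns a
 * positive one. MAX_SLEEP_AVG_MS and the linear mapping are assumed. */
#define MAX_SLEEP_AVG_MS 1000

static int interactivity_bonus(int sleep_avg_ms)
{
    int bonus = 5 - (sleep_avg_ms * 10) / MAX_SLEEP_AVG_MS;   /* +5 … -5 */
    if (bonus < -5) bonus = -5;
    if (bonus > 5)  bonus = 5;
    return bonus;
}

static int dynamic_priority(int static_prio, int sleep_avg_ms)
{
    int prio = static_prio + interactivity_bonus(sleep_avg_ms);
    if (prio < 100) prio = 100;           /* clamp to the SCHED_OTHER range */
    if (prio > 139) prio = 139;
    return prio;
}

int main(void)
{
    /* nice=0 corresponds to static priority 120 */
    printf("CPU hog:     %d\n", dynamic_priority(120, 0));      /* 125 */
    printf("interactive: %d\n", dynamic_priority(120, 1000));   /* 115 */
    return 0;
}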
Linux Scheduler (3)

struct prio_array {
    unsigned int nr_active;
    unsigned long bitmap[BITMAP_SIZE];
    struct list_head queue[MAX_PRIO];
};
typedef struct prio_array prio_array_t;

/* Per CPU runqueue */
struct runqueue {
    prio_array_t *active;
    prio_array_t *expired;
    prio_array_t arrays[2];
    …
};

/* find the highest-priority ready thread */
idx = sched_find_first_bit(array->bitmap);
queue = array->queue + idx;
next = list_entry(queue->next, task_t, run_list);

• Finds the highest-priority ready thread quickly
• Switching the active & expired arrays at the end of an epoch is a simple pointer swap (the "O(1)" claim)
Linux Timeslice Computation
• Linux scales static priority to timeslice
  – Nice [-20 … 0 … 19] maps to [800 ms … 100 ms … 5 ms]
• Various tweaks:
  – "interactive processes" are reinserted into the active array even after their timeslice expires
    • Unless processes in the expired array are starving
  – processes with long timeslices are round-robin'd with others of equal priority at sub-timeslice granularity
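A minimal sketch of one way to realize this scaling, assuming plain linear interpolation between the endpoints quoted above; the real kernel's mapping differs in detail.

#include <stdio.h>

/* Sketch only: nice -20 -> 800 ms, nice 0 -> 100 ms, nice 19 -> 5 ms,
 * interpolated linearly on each side of 0. */
static int timeslice_ms(int nice)
{
    if (nice < 0)
        return 100 + (-nice) * (800 - 100) / 20;   /* -20 … -1 */
    else
        return 100 - nice * (100 - 5) / 19;        /*   0 … 19 */
}

int main(void)
{
    for (int nice = -20; nice <= 19; nice += 13)
        printf("nice %3d -> %3d ms\n", nice, timeslice_ms(nice));
    return 0;
}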
History
• Variants of MLFQS were dominant until a few years ago; still used in the Windows kernel
• Accompanied by the belief that an online scheduler must be O(1) with a small constant
• MLFQS schedulers are easily manipulated and do not guarantee fair ("proportional") CPU assignments
• Another problem is the accuracy of accounting – sampling charges an entire tick to the process that happened to be running at that point
Accuracy of accounting
• Instead of relying on sampling, modern OS versions use cycle counters or high-precision timers to accurately determine a process's recent CPU usage
• This also makes it easy to exclude time spent in IRQ handling that should not be charged to a process
  – See [Inside the Vista Kernel] for a graph
Linux’s CFS
• Linux went a step further and reinvented WFQ (of course without any credit), implemented as its "CFS"
  – "completely fair scheduler"
• O(log n) – red/black tree
  – But does not aim to support really large n
• But, as we'll see, WFQ does not automatically give precedence to I/O bound apps; it required a lot of "interactivity improvements" to tune it heuristically
  – Well-known trade-off between fairness & latency
Proportional Share Scheduling
• Aka "fair-share" scheduling
• None of the algorithms discussed so far provides a direct way of assigning CPU shares
  – E.g., give 30% of the CPU to process A, 70% to process B
• Proportional share algorithms do, by assigning "tickets" or "shares" to processes
  – Processes get to use the resource in proportion of their shares to the total number of shares
• Lottery Scheduling, Weighted Fair Queuing/Stride Scheduling [Waldspurger 1995]
Lottery Scheduling
• Idea: number tickets 1…N
  – every process gets p_i tickets according to its importance
  – process 1 gets tickets [1 … p1]
  – process 2 gets tickets [p1+1 … p1+p2], and so on
• Scheduling decision:
  – Hold a lottery and draw a ticket; the holder gets to run for the next time slice
• Nondeterministic algorithm
• Q.: how to implement priority donation?
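A minimal sketch of a lottery draw under the ticket scheme above; the process structure, the ticket counts, and the use of rand() are assumptions chosen for the example.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct proc { const char *name; int tickets; };

/* Draw one ticket uniformly from 1..total and walk the ticket ranges
 * until the process holding the winning ticket is found. */
static const struct proc *lottery_pick(const struct proc *procs, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += procs[i].tickets;

    int winner = rand() % total + 1;     /* winning ticket in 1..total */
    for (int i = 0; i < n; i++) {
        winner -= procs[i].tickets;
        if (winner <= 0)
            return &procs[i];
    }
    return &procs[n - 1];                /* unreachable if totals are correct */
}

int main(void)
{
    struct proc procs[] = { { "A", 30 }, { "B", 70 } };  /* 30% vs. 70% */
    srand((unsigned) time(NULL));
    for (int slice = 0; slice < 10; slice++)
        printf("slice %d: run %s\n", slice, lottery_pick(procs, 2)->name);
    return 0;
}

One plausible answer to the priority-donation question is ticket transfer: a process blocked on a lock temporarily lends its tickets to the lock holder (Waldspurger describes such transfers).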
Weighted Fair Queuing
• Uses 'per process' virtual time
• Increments a process's virtual time by a "stride" after each quantum, where the stride is defined as (process_share)^-1
• Chooses the process with the lowest virtual finishing time
  – 'virtual finishing time' is virtual time + stride
• Also known as stride scheduling
• Linux now implements a variant of WFQ/stride scheduling as its "CFS" completely fair scheduler
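A minimal sketch of the stride-scheduling bookkeeping described above; the STRIDE1 scaling constant and the structure names are assumptions. Here pass is initialized to one stride, so it plays the role of the virtual finishing time.

#include <stdio.h>

#define STRIDE1 10000            /* scaling constant so strides stay integral */

struct sproc {
    const char *name;
    int shares;                  /* weight: larger share -> smaller stride */
    int stride;                  /* STRIDE1 / shares */
    int pass;                    /* virtual finishing time; pick the smallest */
};

/* Pick the runnable process with the lowest pass value. */
static struct sproc *pick(struct sproc *p, int n)
{
    struct sproc *best = &p[0];
    for (int i = 1; i < n; i++)
        if (p[i].pass < best->pass)
            best = &p[i];
    return best;
}

int main(void)
{
    struct sproc p[] = {
        { "A", 3, 0, 0 }, { "B", 2, 0, 0 }, { "C", 1, 0, 0 },
    };
    for (int i = 0; i < 3; i++) {
        p[i].stride = STRIDE1 / p[i].shares;
        p[i].pass = p[i].stride;           /* start one stride ahead */
    }
    for (int q = 0; q < 6; q++) {          /* one epoch of 6 quanta */
        struct sproc *cur = pick(p, 3);
        printf("quantum %d: run %s\n", q, cur->name);
        cur->pass += cur->stride;          /* advance its virtual time */
    }
    return 0;                              /* A runs 3x, B 2x, C 1x */
}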
WFQ Example (A=3, B=2, C=1)
Ready queue is sorted by virtual finish time
(virtual time at the end of the quantum if the process were scheduled)

Time | Task A | Task B | Task C | Ready Queue            | Who Runs
  0  |  1/3   |  1/2   |   1    | A (1/3) B (1/2) C (1)  |    A
  1  |  2/3   |  1/2   |   1    | B (1/2) A (2/3) C (1)  |    B
  2  |  2/3   |   1    |   1    | A (2/3) C (1)  B (1)   |    A
  3  |   1    |   1    |   1    | C (1)  B (1)  A (1)    |    C
  4  |   1    |   1    |   2    | B (1)  A (1)  C (2)    |    B
  5  |   1    |  3/2   |   2    | A (1)  B (3/2) C (2)   |    A
  6  |  4/3   |  3/2   |   2    | A (4/3) B (3/2) C (2)  |

One scheduling epoch: A ran 3 out of 6 quanta, B 2 out of 6, C 1 out of 6.
This process will repeat, yielding proportional fairness.
WFQ (cont’d)
• WFQ requires a sorted ready queue
  – Linux now uses an R/B tree
  – Higher complexity than O(1) linked lists, but appears manageable for real-world ready queue sizes
• Unblocked processes that reenter the ready queue are assigned a virtual time reflecting the value their virtual time counter would have had if they'd received CPU time proportionally
• Accommodating I/O bound processes still requires fudging
  – In strict WFQ, the only way to improve latency is to set the number of shares high – but this is disastrous if the process is not truly I/O bound
  – Linux uses "sleeper fairness" to identify when to boost virtual time; similar to the sleep average in the old scheduler
Linux SMP Load Balancing
• Runqueues are per CPU
• Periodically, the lengths of the runqueues on different CPUs are compared
  – Processes are migrated to balance load
• Aside: migrating requires locks on both runqueues; they are acquired in a fixed (address) order to avoid deadlock:

static void double_rq_lock(runqueue_t *rq1, runqueue_t *rq2)
{
    if (rq1 == rq2) {
        spin_lock(&rq1->lock);
    } else {
        if (rq1 < rq2) {
            spin_lock(&rq1->lock);
            spin_lock(&rq2->lock);
        } else {
            spin_lock(&rq2->lock);
            spin_lock(&rq1->lock);
        }
    }
}
Real-time Scheduling
• Real-time systems must observe not only execution time, but a deadline as well
  – Jobs must finish by their deadline
  – Turn-around time is usually less important
• Common scenario: recurring jobs
  – E.g., need 3 ms every 10 ms (here, 10 ms is the recurrence period T, 3 ms is the cost C)
• Possible strategies (a minimal EDF sketch follows this list)
  – RMA (Rate Monotonic)
    • Map periods to priorities; fixed, static
  – EDF (Earliest Deadline First)
    • Always run what's due next; dynamic
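A minimal sketch of the EDF decision rule; the task structure, the tick-based simulation, and the array-order tie-break are assumptions for illustration (the example on the next slide breaks ties lexically).

#include <stdio.h>

struct rt_task {
    const char *name;
    int period;              /* T: recurrence period (= relative deadline) */
    int cost;                /* C: execution time needed per period */
    int next_deadline;       /* absolute deadline of the current job */
    int remaining;           /* work left in the current job */
};

/* EDF: among tasks that still have work, run the one whose absolute
 * deadline is earliest; ties broken by array order here. */
static struct rt_task *edf_pick(struct rt_task *t, int n)
{
    struct rt_task *best = NULL;
    for (int i = 0; i < n; i++)
        if (t[i].remaining > 0 &&
            (best == NULL || t[i].next_deadline < best->next_deadline))
            best = &t[i];
    return best;             /* NULL means the CPU is idle this tick */
}

int main(void)
{
    struct rt_task tasks[] = {
        { "A", 4, 1, 4, 1 }, { "B", 8, 4, 8, 4 }, { "C", 12, 3, 12, 3 },
    };
    for (int now = 0; now < 24; now++) {              /* one hyper-period */
        for (int i = 0; i < 3; i++)                    /* release new jobs */
            if (now > 0 && now % tasks[i].period == 0) {
                tasks[i].remaining = tasks[i].cost;
                tasks[i].next_deadline = now + tasks[i].period;
            }
        struct rt_task *cur = edf_pick(tasks, 3);
        printf("t=%2d: %s\n", now, cur ? cur->name : "idle");
        if (cur) cur->remaining--;
    }
    return 0;
}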
EDF – Example

Task   T    C
 A     4    1
 B     8    4
 C    12    3

Assume deadline equals period (T).
Lexical order tie breaker (C > B > A).

[Figure: EDF schedule of tasks A, B, and C on a timeline from 0 to 25, built up step by step over one hyper-period; after the hyper-period the pattern repeats]
EDF Properties
• Feasibility test: U = Σ Ci / Ti ≤ 1
• U = 100% in the example
• The bound is theoretical
• Sufficient and necessary
• EDF is optimal: if any schedule meets all deadlines, so does EDF
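Plugging the example's parameters into the feasibility test (my arithmetic, consistent with the U = 100% remark above): U = 1/4 + 4/8 + 3/12 = 0.25 + 0.50 + 0.25 = 1.00 ≤ 1, so the task set is just barely schedulable under EDF.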
Scheduling Summary
• The OS must schedule all resources in a system
  – CPU, disk, network, etc.
• CPU scheduling indirectly affects the scheduling of other devices
• Goals for general-purpose schedulers:
  – Minimize latency (avg. completion or waiting time)
  – Maximize throughput
  – Provide fairness
• In practice: some theory, lots of tweaking