Nova Scheduler
Shane Wang (王庆), Intel Open Source Technology Center
WeChat ID: qq559382
Agenda
What is the current situation?
• How scheduler works in Juno and Kilo
• Resource Tracking
• Filters and Weight
• Utilization Based Scheduling (UBS)
What is the next plan?
• Gantt
• Dynamic Resource Scheduling (DRS)
How scheduler works in Juno and Kilo
API
Conductor
Scheduler
Compute
1. User request with scheduler hints to include scheduling policy
2. Submit new task
3. Request host that match the request_spec and filter_properties
4. Returns selected hosts
5. Call the selected compute
6. Rescheduling after claim resource failed or other failure
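The flow in steps 3 to 6 can be sketched as a small retry loop. This is only an illustration, not Nova code: the `Compute` class, `schedule_with_retry`, and the RAM-only request are hypothetical names and simplifications, and the `tried` list merely stands in for the retry information carried in filter_properties.

```python
class ClaimFailed(Exception):
    """The compute host could not claim the requested resources."""


class NoValidHost(Exception):
    """No remaining host satisfies the request."""


class Compute:
    """Hypothetical compute host that knows its real free RAM."""

    def __init__(self, free_ram_mb):
        self.free_ram_mb = free_ram_mb

    def build(self, ram_mb):
        # Step 5: the selected compute claims resources or fails.
        if ram_mb > self.free_ram_mb:
            raise ClaimFailed()
        self.free_ram_mb -= ram_mb


def schedule_with_retry(scheduler_view, computes, ram_mb, max_attempts=3):
    """Return (chosen host name, list of hosts tried)."""
    tried = []  # stands in for the "retry" assist parameter
    for _ in range(max_attempts):
        # Steps 3-4: pick the host the scheduler believes has the most
        # free RAM, skipping hosts that already failed.
        candidates = {
            name: free
            for name, free in scheduler_view.items()
            if name not in tried and free >= ram_mb
        }
        if not candidates:
            break
        name = max(candidates, key=candidates.get)
        tried.append(name)
        try:
            computes[name].build(ram_mb)
            return name, tried
        except ClaimFailed:
            # Step 6: reschedule on another host.
            continue
    raise NoValidHost(f"tried {tried}")


# The scheduler's view of host "a" is stale, so the first claim fails.
view = {"a": 4096, "b": 2048}
computes = {"a": Compute(512), "b": Compute(2048)}
chosen, attempts = schedule_with_retry(view, computes, ram_mb=1024)
```

In this run the request first lands on host "a", fails the claim there, and is rescheduled to host "b".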
Resource usage Tracking
Conductor
Scheduler
Compute
2. Submit new task
3. Request host that match the request_spec and filter_properties
4. Returns selected hosts
5. Call the selected compute
6. Rescheduling after claim resource failed or other failure
DB
Hypervisor (multiple)
Resource Claiming
1) Validate the resource usage
2) Update the resource usage
3) Update to DB
On each scheduling call
1) Fetch newest compute node stats for each call
2) Filter and weight the host
3) Consume the resource for the selected host
Periodically update the node resource at a 60-second interval
1) Get hypervisor resource
2) Consume the resource
3) Update to DB
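A minimal sketch of the claim and periodic-update steps listed above, assuming a tracker that only accounts for RAM and vCPUs. The class name, method names, and the dictionary used as a stand-in for the DB are hypothetical and chosen for illustration; they are not Nova's actual interfaces.

```python
import threading


class ResourceTracker:
    """Hypothetical per-host tracker for RAM and vCPU usage."""

    def __init__(self, total_ram_mb, total_vcpus):
        self.total = {"ram_mb": total_ram_mb, "vcpus": total_vcpus}
        self.used = {"ram_mb": 0, "vcpus": 0}
        self.db = {}  # stand-in for the database record
        self._lock = threading.Lock()

    def _save(self):
        # 3) Update to DB
        self.db["used"] = dict(self.used)

    def claim(self, ram_mb, vcpus):
        requested = {"ram_mb": ram_mb, "vcpus": vcpus}
        with self._lock:
            # 1) Validate the resource usage
            for key, amount in requested.items():
                if self.used[key] + amount > self.total[key]:
                    return False
            # 2) Update the resource usage
            for key, amount in requested.items():
                self.used[key] += amount
            self._save()
            return True

    def periodic_update(self, hypervisor_totals, instances):
        # Intended to run on a timer (60 seconds in the slides).
        with self._lock:
            # 1) Get hypervisor resource
            self.total = dict(hypervisor_totals)
            # 2) Consume the resource of every known instance
            self.used = {"ram_mb": 0, "vcpus": 0}
            for inst in instances:
                self.used["ram_mb"] += inst["ram_mb"]
                self.used["vcpus"] += inst["vcpus"]
            self._save()


tracker = ResourceTracker(total_ram_mb=2048, total_vcpus=2)
first = tracker.claim(ram_mb=1024, vcpus=1)   # fits
second = tracker.claim(ram_mb=2048, vcpus=1)  # would exceed total RAM
```

The lock reflects the design point that validation and update must happen as one step, otherwise two concurrent claims could both pass validation.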
Filters and weight hosts
Request Spec: Image, Instance_properties, Instance_type
Filter_properties: Scheduler-hints; assist parameter: retry
nova boot --flavor 1 --image …… --hint group='sg1'
--hint <key=value>: Send arbitrary key/value pairs to the scheduler for custom use.
scheduler_host_subset_size=1
scheduler_available_filters='nova.scheduler.filters.all_filters'
scheduler_default_filters=[……]
scheduler_weight_classes=nova.scheduler.weights.all_weighers
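How the options above fit together can be sketched as a filter, weigh, and pick pipeline. This is a simplified illustration under stated assumptions: hosts are plain dictionaries, `ram_filter` and `ram_weigher` are hypothetical single-purpose functions, and the `host_subset_size` parameter mimics the role of scheduler_host_subset_size by choosing randomly among the top N weighed hosts.

```python
import random


def ram_filter(host, request):
    # Hypothetical filter: keep hosts with enough free RAM.
    return host["free_ram_mb"] >= request["ram_mb"]


def ram_weigher(host):
    # Hypothetical weigher: more free RAM means a higher weight.
    return host["free_ram_mb"]


def select_host(hosts, request, filters, weigher, host_subset_size=1,
                rng=random):
    # Drop every host that fails at least one filter.
    passing = [h for h in hosts if all(f(h, request) for f in filters)]
    if not passing:
        return None
    # Order the survivors from best to worst weight.
    ranked = sorted(passing, key=weigher, reverse=True)
    # Pick randomly among the best N hosts; N=1 always takes the best.
    subset = ranked[:max(1, host_subset_size)]
    return rng.choice(subset)


hosts = [
    {"name": "a", "free_ram_mb": 1024},
    {"name": "b", "free_ram_mb": 4096},
    {"name": "c", "free_ram_mb": 256},
]
request = {"ram_mb": 512}
best = select_host(hosts, request, [ram_filter], ram_weigher)
spread = select_host(hosts, request, [ram_filter], ram_weigher,
                     host_subset_size=2)
```

With a subset size of 1 the choice is deterministic; a larger subset trades some placement quality for a lower chance that concurrent requests all pick the same host.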
Filters
Resource:
CoreFilter, AggregateCoreFilter: cpu_allocation_ratio=16.0
RamFilter, AggregateRamFilter: ram_allocation_ratio=1.5
DiskFilter, AggregateDiskFilter: disk_allocation_ratio=1.0
IoOpsFilter, AggregateIoOpsFilter: max_io_ops_per_host=8. IoOps means build, resize, image snapshot, migration, rescue, unshelve, and backup operations
PciPassthroughFilter: Generic PCI device or SR-IOV assignment
NUMATopologyFilter: NUMA in Juno; CPU pinning and huge pages in Kilo
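The allocation ratios above express overcommit: a host passes when the already used amount plus the requested amount stays within the physical total multiplied by the ratio. The helper below is a sketch of that arithmetic only, with a hypothetical function name; the real filters perform additional checks.

```python
def passes_ratio_filter(total, used, requested, allocation_ratio):
    """Return True if the request fits within the overcommitted limit."""
    limit = total * allocation_ratio
    return used + requested <= limit


# CPU: 8 physical cores with cpu_allocation_ratio=16.0 allow 128 vCPUs.
cpu_fits = passes_ratio_filter(total=8, used=120, requested=8,
                               allocation_ratio=16.0)
cpu_over = passes_ratio_filter(total=8, used=120, requested=9,
                               allocation_ratio=16.0)

# RAM: 32768 MB with ram_allocation_ratio=1.5 allow 49152 MB.
ram_fits = passes_ratio_filter(total=32768, used=40000, requested=8192,
                               allocation_ratio=1.5)

# Disk: disk_allocation_ratio=1.0 means no overcommit.
disk_over = passes_ratio_filter(total=500, used=450, requested=100,
                                allocation_ratio=1.0)
```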
Filters
Affinity:
DifferentHostFilter, SameHostFilter: scheduler_hints: different_host / same_host = ['instance uuid', …]
ServerGroupAffinityFilter, ServerGroupAntiAffinityFilter:
• nova server-group-create: Create a new server group with the specified details.
• nova server-group-delete: Delete specific server group(s).
• nova server-group-get: Get a specific server group.
• nova server-group-list: Print a list of all server groups.
• boot with scheduler-hints group=uuid: Boot new instance into server group
SimpleCIDRAffinityFilter: scheduler_hints: cidr, build_near_host_ip
TypeAffinityFilter, AggregateTypeAffinityFilter: instance_type
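The host-level and group-level affinity checks listed above reduce to set membership tests. The functions below are a hypothetical sketch of that logic, assuming the scheduler knows which instance UUIDs run on a host and which hosts a server group already occupies; they are not the filters' real signatures.

```python
def different_host_ok(host_instance_uuids, hinted_uuids):
    # Pass only if none of the hinted instances run on this host.
    return not (set(host_instance_uuids) & set(hinted_uuids))


def same_host_ok(host_instance_uuids, hinted_uuids):
    # Pass only if at least one hinted instance runs on this host.
    return bool(set(host_instance_uuids) & set(hinted_uuids))


def group_anti_affinity_ok(host_name, group_hosts):
    # Pass only if no member of the server group is on this host yet.
    return host_name not in group_hosts


def group_affinity_ok(host_name, group_hosts):
    # An empty group accepts any host; otherwise stay with the group.
    return not group_hosts or host_name in group_hosts


on_host = ["uuid-1", "uuid-2"]
checks = {
    "different": different_host_ok(on_host, ["uuid-2"]),
    "same": same_host_ok(on_host, ["uuid-2"]),
    "anti": group_anti_affinity_ok("node1", {"node1", "node2"}),
    "affinity": group_affinity_ok("node1", {"node1"}),
}
```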
Filters
Topology:
AggregateImagePropertiesIsolation: image properties match aggregate metadata
IsolatedHostsFilter: isolated_hosts, isolated_images, restrict_isolated_hosts_to_isolated_images
AggregateInstanceExtraSpecsFilter: flavor's extra specs match aggregate metadata
AggregateMultiTenancyIsolation: filter_tenant_id
AvailabilityZoneFilter
Filters
Others:
ComputeCapabilitiesFilter: works with instance type extra_spec: 'capabilities:'
ComputeFilter: checks whether the compute node is alive and not disabled
ImagePropertiesFilter: architecture, hypervisor type, vm_mode, hypervisor_version_requires
JsonFilter: scheduler_hints:query
NumInstancesFilter, AggregateNumInstancesFilter: max_instances_per_host
RetryFilter
TrustedFilter
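For the JsonFilter entry above, the query hint is a JSON expression evaluated against host state. The evaluator below is a small sketch under assumptions: it supports only comparison operators plus "and", "or", and "not", treats strings starting with "$" as host attributes, and uses a plain dictionary with illustrative attribute names for the host state. The exact operator set and variable names accepted by Nova should be checked against its documentation.

```python
import json
import operator

_COMPARISONS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def evaluate(query, host_state):
    """Recursively evaluate a parsed query against a host_state dict."""
    if isinstance(query, str) and query.startswith("$"):
        # "$name" refers to an attribute of the host.
        return host_state[query[1:]]
    if not isinstance(query, list):
        # Any other scalar is a literal.
        return query
    op, *args = query
    values = [evaluate(arg, host_state) for arg in args]
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    if op == "not":
        return not values[0]
    return _COMPARISONS[op](values[0], values[1])


# Hint as it might be passed with --hint query='...'
hint = '["and", [">=", "$free_ram_mb", 1024], [">=", "$free_disk_mb", 204800]]'
query = json.loads(hint)

big_host = {"free_ram_mb": 2048, "free_disk_mb": 300000}
small_host = {"free_ram_mb": 512, "free_disk_mb": 300000}
results = (evaluate(query, big_host), evaluate(query, small_host))
```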
Weight
IoOpsWeigher
MetricsWeigher
RAMWeigher
Utilization Based Scheduling
• CPU Utilization data
• Memory Utilization data
• Network Bandwidth data
• etc.
Utilization Based Scheduling
Conductor
Scheduler
Compute
2. Submit new task
3. Request host that match the request_spec and filter_properties
4. Returns selected hosts
5. Call the selected compute
6. Rescheduling after claim resource failed or other failure
DB
Hypervisor (multiple)
On each scheduling call
1) Fetch newest compute node stats for each call
2) Filter and weight the host
3) Consume the resource for the selected host
CPU Monitor
Network Bandwidth
Memory
Cache Monitor
Update at a 60-second interval
Notification Bus (AMQP)
Utilization Based Scheduling
MetricsWeigher:
• weight_multiplier: Multiplier used for weighing metrics.
• weight_setting: How the metrics are going to be weighed.
• required: If true, use the MetricsFilter.
• weight_of_unavailable
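A rough sketch of how these MetricsWeigher options could combine, assuming weight_setting is a comma-separated list of name=ratio pairs and the weight is the ratio-weighted sum of the host's metric values. The metric names, the default value for weight_of_unavailable, and the omission of weight normalization are assumptions made for illustration, not a description of Nova's implementation.

```python
def parse_weight_setting(setting):
    """Parse 'name1=1.0, name2=-0.5' into {'name1': 1.0, 'name2': -0.5}."""
    ratios = {}
    for item in setting.split(","):
        name, _, ratio = item.strip().partition("=")
        ratios[name.strip()] = float(ratio)
    return ratios


def metrics_weight(host_metrics, setting, weight_multiplier=1.0,
                   required=True, weight_of_unavailable=-10000.0):
    total = 0.0
    for name, ratio in parse_weight_setting(setting).items():
        if name not in host_metrics:
            if required:
                # With required=True such hosts should already have been
                # removed by a filter, so a missing metric is an error.
                raise KeyError(name)
            # Otherwise the host gets a fixed penalty weight.
            return weight_of_unavailable
        total += ratio * host_metrics[name]
    return weight_multiplier * total


# Hypothetical metric names; negative ratios prefer less loaded hosts.
setting = "cpu.percent=-1.0, mem.percent=-0.5"
busy = metrics_weight({"cpu.percent": 80.0, "mem.percent": 40.0}, setting)
idle = metrics_weight({"cpu.percent": 20.0, "mem.percent": 40.0}, setting)
unknown = metrics_weight({"cpu.percent": 20.0}, setting, required=False)
```

Here the idle host ends up with the higher weight, and a host missing a metric falls to the bottom when `required` is false.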
How does scheduler strategy affect performance?
Benchmark Accuracy
Smart Scheduling
Efficiency
QoS meets SLA contract
What is monitored now?
Columns: OpenStack Service, Type, Metrics (e.g.)
Nova, Static capabilities:
• CPU features
• hypervisor version
Nova, Dynamic Resources:
• free memory/disk
• vCPU #
• PCI devices
• # of NIC virtual functions
Ceilometer, Resources creation/deletion:
• VM
• network/subnet/port
• image
• ……
Ceilometer, Resources usage data:
• CPU usage in VM
• memory usage in VM
• network usage in VM
• storage usage stats
• ……
Not Enough:
• CPU usage stats of host
• Network usage stats of host
• Intel Node Manager Power data
• Cache QoS Monitoring (CQM) data
• ……
Ceilometer: no hardware pollsters
Nova: not easy to add; how to use?
What is missing?
Policy management
• Break policy into QoS parameters
• Map QoS parameters to metrics
Actions
• Live migration
• Resource reallocation
• Enforcement
• ……
Knowledge model to evaluate complex policy situations (e.g. predict future VM workload)
Dynamic Resource Scheduling
[Figure: Dynamic Resource Scheduling architecture diagram. Labels: admins; API; Policy; Parser; Analyzer; Evaluator; Knowledge model; Historic metrics data; Pluggable Collectors (Nova collector, Ceilometer collector, Other collectors); Other agents; Pluggable Executors; Logging; Alarming; Evaluating; Enforcement; Live migration; resource reallocation; De-virtualizing; Benchmarking; Other actions; Nova; Ceilometer; set alarm; alarm trigger. Legend: Existing components; To be implemented.]
Next: Gantt
Scheduler-as-a-Service project
Split out from Nova first, then extend to other projects
Plan to begin the split in the L release
Gantt in Kilo: Refactor, Refactor, Refactor….
The Scheduler before Juno
API Scheduler Compute
The scheduler in Kilo
API
Conductor
Scheduler
Compute
1. User request with scheduler hints to include scheduling policy
2. Submit new task
3. Request host that match the request_spec and filter_properties
4. Returns selected hosts
5. Call the selected compute
6. Rescheduling after claim resource failed or other failure
Scheduler API: select_destinations, update_resource_stats
Refactor
https://blueprints.launchpad.net/nova/+spec/make-resource-tracker-use-objects
https://blueprints.launchpad.net/nova/+spec/detach-service-from-computenode
https://blueprints.launchpad.net/nova/+spec/resource-objects
https://blueprints.launchpad.net/nova/+spec/request-spec-object
https://blueprints.launchpad.net/nova/+spec/sched-select-destinations-use-request-spec-object
https://blueprints.launchpad.net/nova/+spec/isolate-scheduler-db
Thanks
Backup
The problem of current Nova scheduler
Server Group
• Can't add/remove an active server to/from a server group
  https://review.openstack.org/136487
  https://review.openstack.org/139272
• An affinity policy means you can't evacuate
  Ignore down hosts when populating the instance: https://review.openstack.org/#/c/135607/
  Remove the instance from the server group: https://review.openstack.org/136487, but it won't land in K, maybe L. It also won't work for automatic HA.
  https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/soft-affinity-for-server-group,n,z
• Anti-affinity policy race problem, which may trigger extra rescheduling
• Race for migration
  Support for unshelve, rebuild, live-migration, migration, and resize in K, but this does not resolve the anti-affinity policy problem.
  Unshelve: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1400015,n,z
  Rebuild: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:rebuild_schedule,n,z
  Migration/live-migration: ongoing…
The problem of current Nova scheduler
Missing resource claiming and retry for migration
• Unshelve: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1400015,n,z
• Rebuild: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:rebuild_schedule,n,z
• Migration/live-migration: ongoing…
Scheduling hints can't persist
• You can only specify your scheduling policy at the beginning
• The policy may be violated after migration
• https://review.openstack.org/88983 is blocked in K, maybe L
Race problem
• Bug link: https://bugs.launchpad.net/nova/+bug/1341420
• scheduler_host_subset_size=N
Ironic integration
• https://bugs.launchpad.net/nova/+bug/1402658
Any more problems with the scheduler?
• Only does initial placement!
• Each project has its own scheduler
DRS in OpenStack
Gantt
Tetris https://docs.google.com/document/d/1DMsnGxQ3P-OwZCF3uxaUeEFaKX8LqUqmmgQ_7EVK7Y8/edit
Purview (Tetris) will provide a framework to quickly implement and enforce different kinds of policies. Policies can be of different types. Here are a few examples of policies in clouds: Availability Policies, Performance Policies, Load Balancing Policy, User Defined Policy.
Congress https://wiki.openstack.org/wiki/Congress
Congress is a policy-based management framework for the cloud. It is designed to work with any cloud software that reasonably fits within the relational data model. It automatically prevents policy violations when possible and corrects them when not, and it enables administrators to control the extent to which enforcement is automatic.
Tetris is a domain-specific policy system; Congress is a domain-independent policy system.
Domain-independent and domain-specific policy systems are highly complementary.