Nova Scheduler
Shane Wang (王庆), Intel Open Source Technology Center
WeChat ID: qq559382


Agenda

What is the current situation?
• How the scheduler works in Juno and Kilo
• Resource tracking
• Filters and weights
• Utilization Based Scheduling (UBS)

What is the next plan?
• Gantt
• Dynamic Resource Scheduling (DRS)

How scheduler works in Juno and Kilo

API Conductor Scheduler Compute

1. User request with scheduler hints to include a scheduling policy

2. Submit new task

3. Request hosts that match the request_spec and filter_properties

4. Return the selected hosts

5. Call the selected compute host

6. Reschedule if the resource claim fails or another failure occurs
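The six steps above can be sketched as a small retry loop. This is only an illustration under stated assumptions, not Nova code: ComputeHost, ClaimFailed, boot, and the scheduler_view dictionary are hypothetical names, and RAM is the only resource modeled. The scheduler's view may be stale, so a claim can fail on the chosen host and trigger a reschedule to a host that has not been tried yet.

```python
from dataclasses import dataclass


class ClaimFailed(Exception):
    """Raised when a compute host cannot claim the requested resources."""


@dataclass
class ComputeHost:
    name: str
    free_ram_mb: int  # actual free RAM on the host

    def claim(self, ram_mb):
        # Step 5: the selected compute host tries to claim the resources.
        if ram_mb > self.free_ram_mb:
            raise ClaimFailed(self.name)
        self.free_ram_mb -= ram_mb


def boot(hosts, scheduler_view, ram_mb, max_attempts=3):
    """Pick a host, call it, and reschedule on claim failure (steps 3 to 6)."""
    tried = []  # hypothetical stand-in for retry information
    for _ in range(max_attempts):
        # Step 3: hosts that match the request according to the scheduler's view.
        candidates = [name for name, free in scheduler_view.items()
                      if name not in tried and free >= ram_mb]
        if not candidates:
            break
        # Step 4: return the host that looks best (most free RAM in this sketch).
        chosen = max(candidates, key=lambda name: scheduler_view[name])
        tried.append(chosen)
        try:
            hosts[chosen].claim(ram_mb)
            return chosen, tried
        except ClaimFailed:
            continue  # Step 6: reschedule on another host
    raise RuntimeError("No valid host found")
```

In this sketch a host that already failed is never selected again within the same request, which is the purpose of keeping the list of tried hosts.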

Resource usage Tracking

[Diagram: Conductor, Scheduler, Compute, DB, and hypervisors, with step 2 (submit new task) marked]

Resource claiming:
1) Validate the resource usage
2) Update the resource usage
3) Update the DB

For each scheduling call:
1) Fetch the newest compute node stats
2) Filter and weigh the hosts
3) Consume the resources for the selected host

Periodic update of the node resources, at a 60-second interval:
1) Get the hypervisor resources
2) Consume the resources
3) Update the DB
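A minimal sketch of the three resource-claiming steps (validate, update usage, update the DB), assuming a single RAM counter and a plain dictionary in place of the database. ResourceTracker here is a hypothetical class written for illustration; it is not Nova's implementation.

```python
import threading


class ResourceTracker:
    """Tracks RAM usage for one compute node (illustrative only)."""

    def __init__(self, host, total_ram_mb, db):
        self.host = host
        self.total_ram_mb = total_ram_mb
        self.used_ram_mb = 0
        self.db = db  # a dict standing in for the database
        self._lock = threading.Lock()

    def claim(self, ram_mb):
        # The lock keeps two concurrent claims from both passing validation.
        with self._lock:
            # 1) Validate the resource usage.
            if self.used_ram_mb + ram_mb > self.total_ram_mb:
                return False
            # 2) Update the resource usage.
            self.used_ram_mb += ram_mb
            # 3) Update the DB so the scheduler sees the new usage.
            free = self.total_ram_mb - self.used_ram_mb
            self.db[self.host] = {"free_ram_mb": free}
            return True
```

A failed claim leaves both the in-memory usage and the stored record untouched, so the caller can reschedule elsewhere.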

Filters and weight hosts

Request Spec:
• Image
• Instance_properties
• Instance_type

Filter_properties:
• Scheduler-hints
• Assist parameter: retry

nova boot --flavor 1 --image …… --hint group='sg1'

--hint <key=value>: Send arbitrary key/value pairs to the scheduler for custom use.

scheduler_host_subset_size=1

scheduler_available_filters=nova.scheduler.filters.all_filters

scheduler_default_filters=[……]

scheduler_weight_classes=nova.scheduler.weights.all_weighers
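The filter-then-weigh pipeline and the effect of scheduler_host_subset_size can be sketched as follows. This is a simplified illustration, not the Nova implementation: schedule, the filter and weigher callables, and the dictionary-based hosts are assumptions made for the example. Hosts that pass every filter are ranked by total weight, and the result is picked at random from the best subset_size hosts.

```python
import random


def schedule(hosts, filters, weighers, subset_size=1, rng=random):
    """Return one host: filter, rank by weight, then pick from the top N."""
    # Keep only hosts that pass every filter.
    passing = [h for h in hosts if all(f(h) for f in filters)]
    if not passing:
        raise RuntimeError("No valid host found")
    # Rank by the sum of all weigher scores, best first.
    ranked = sorted(passing,
                    key=lambda h: sum(w(h) for w in weighers),
                    reverse=True)
    # Choose randomly among the N best hosts (N = subset_size, at least 1).
    subset = ranked[:max(1, subset_size)]
    return rng.choice(subset)
```

With subset_size=1 the best host always wins. A larger value spreads concurrent requests over several good hosts, at the cost of sometimes not picking the single best one.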

Filters

Resource:

CoreFilter, AggregateCoreFilter: cpu_allocation_ratio=16.0

RamFilter, AggregateRamFilter: ram_allocation_ratio=1.5

DiskFilter, AggregateDiskFilter: disk_allocation_ratio=1.0

IoOpsFilter, AggregateIoOpsFilter: max_io_ops_per_host=8. I/O ops means build, resize, image snapshot, migration, rescue, unshelve, and backup operations.

PciPassthroughFilter: generic PCI device or SR-IOV assignment

NUMATopologyFilter: NUMA in Juno; CPU pinning and huge pages in Kilo
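The allocation ratios above express overcommit: a host passes while the resources already used plus the requested amount stay within physical capacity times the ratio. A minimal sketch of that check, using a hypothetical helper name and made-up host numbers:

```python
def passes_with_overcommit(total, used, requested, allocation_ratio):
    """True if the request fits within total * allocation_ratio."""
    limit = total * allocation_ratio
    return used + requested <= limit


# 8 physical cores with cpu_allocation_ratio=16.0 allow up to 128 vCPUs.
cpu_ok = passes_with_overcommit(total=8, used=126, requested=2,
                                allocation_ratio=16.0)

# 32768 MB of RAM with ram_allocation_ratio=1.5 allow up to 49152 MB.
ram_ok = passes_with_overcommit(total=32768, used=40000, requested=10000,
                                allocation_ratio=1.5)
```

Here cpu_ok is True (128 of 128 vCPUs) and ram_ok is False (50000 MB exceeds the 49152 MB limit). With disk_allocation_ratio=1.0 there is no overcommit at all.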

Filters

Affinity:

DifferentHostFilter, SameHostFilter: scheduler_hints: different_host / same_host = ['instance uuid', …]

ServerGroupAffinityFilter, ServerGroupAntiAffinityFilter:
• nova server-group-create: Create a new server group with the specified details.
• nova server-group-delete: Delete specific server group(s).
• nova server-group-get: Get a specific server group.
• nova server-group-list: Print a list of all server groups.
• Boot with scheduler hints: group=uuid boots the new instance into the server group.

SimpleCIDRAffinityFilter: scheduler_hints: cidr, build_near_host_ip

TypeAffinityFilter, AggregateTypeAffinityFilter: instance_type
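The server group policies reduce to a simple membership test on the set of hosts that already run members of the group. The sketch below assumes that set is given as group_hosts; the function names are hypothetical and the real filters take host state and filter properties rather than plain strings.

```python
def affinity_passes(host, group_hosts):
    """Affinity: pass if the group is empty or already lives on this host."""
    return not group_hosts or host in group_hosts


def anti_affinity_passes(host, group_hosts):
    """Anti-affinity: pass only if no group member runs on this host."""
    return host not in group_hosts


def filter_hosts(hosts, group_hosts, policy):
    """Apply the chosen policy ('affinity' or 'anti-affinity') to all hosts."""
    check = affinity_passes if policy == "affinity" else anti_affinity_passes
    return [h for h in hosts if check(h, group_hosts)]
```

The first instance of an affinity group can land anywhere, because group_hosts is still empty at that point.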

Filters

Topology:

AggregateImagePropertiesIsolation: image properties match aggregate metadata

IsolatedHostsFilter: isolated_hosts, isolated_images, restrict_isolated_hosts_to_isolated_images

AggregateInstanceExtraSpecsFilter: flavor's extra specs match aggregate metadata

AggregateMultiTenancyIsolation: filter_tenant_id

AvailabilityZoneFilter

Filters

Others:

ComputeCapabilitiesFilter: works with instance type extra_specs prefixed with 'capabilities:'

ComputeFilter: checks whether the compute node is alive and not disabled

ImagePropertiesFilter: architecture, hypervisor type, vm_mode, hypervisor_version_requires

JsonFilter: scheduler_hints: query

NumInstancesFilter, AggregateNumInstancesFilter: max_instances_per_host

RetryFilter

TrustedFilter
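As an illustration of the JsonFilter query hint, the sketch below evaluates a JSON list expression of the form [operator, argument, …], where a string starting with "$" names a host attribute. It is a simplified evaluator written for this example, not the filter's actual code; the function names and the host attribute values are assumptions.

```python
import json
import operator

_COMPARE = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
            "<=": operator.le, ">=": operator.ge}


def evaluate(query, host_state):
    """Recursively evaluate a parsed query against a dict of host attributes."""
    if isinstance(query, str):
        # "$name" refers to a host attribute; other strings are literals.
        return host_state[query[1:]] if query.startswith("$") else query
    if not isinstance(query, list):
        return query  # numbers and other literals
    op, *args = query
    values = [evaluate(arg, host_state) for arg in args]
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    if op == "not":
        return not values[0]
    if op == "in":
        return values[0] in values[1:]
    return _COMPARE[op](values[0], values[1])


def host_passes(hint, host_state):
    """Parse the query hint (a JSON string) and evaluate it for one host."""
    return bool(evaluate(json.loads(hint), host_state))


hint = '["and", [">=", "$free_ram_mb", 1024], [">=", "$free_disk_mb", 204800]]'
```

The hint string above would be passed on the command line as --hint query='…' and selects hosts with at least 1024 MB of free RAM and 200 GB of free disk.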

Weight

IoOpsWeigher

MetricsWeigher

RAMWeigher
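A sketch of how several weighers can be combined into one score per host: each weigher's raw values are normalized to the range 0 to 1 across the candidate hosts, multiplied by that weigher's multiplier, and summed. The function names, host dictionaries, and multipliers below are assumptions for illustration; a negative multiplier penalizes hosts with high values, for example hosts busy with I/O operations.

```python
def normalize(values):
    """Scale values linearly to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def total_weights(hosts, weighers):
    """weighers is a list of (weigh_function, multiplier) pairs."""
    totals = [0.0] * len(hosts)
    for weigh_fn, multiplier in weighers:
        raw = [weigh_fn(h) for h in hosts]
        for i, weight in enumerate(normalize(raw)):
            totals[i] += multiplier * weight
    return totals


hosts = [
    {"free_ram_mb": 1024, "num_io_ops": 0},
    {"free_ram_mb": 3072, "num_io_ops": 4},
    {"free_ram_mb": 2048, "num_io_ops": 8},
]
weighers = [
    (lambda h: h["free_ram_mb"], 1.0),   # prefer more free RAM
    (lambda h: h["num_io_ops"], -1.0),   # prefer fewer I/O operations
]
scores = total_weights(hosts, weighers)
```

Normalizing first keeps one weigher from dominating the sum just because its raw values are numerically larger.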

Utilization Based Scheduling

• CPU utilization data
• Memory utilization data
• Network bandwidth data
• etc.


[Diagram: Conductor, Scheduler, Compute, DB, and hypervisors, with step 2 (submit new task) marked]

For each scheduling call:
1) Fetch the newest compute node stats
2) Filter and weigh the hosts
3) Consume the resources for the selected host

Monitors: CPU Monitor, Network Bandwidth, Memory, Cache Monitor

Update interval: 60 seconds

Notification bus: AMQP


MetricsWeigher:
• weight_multiplier: Multiplier used for weighing metrics.
• weight_setting: How the metrics are going to be weighed.
• required: If true, use the MetricsFilter
• weight_of_unavailable
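A minimal sketch of how these options could interact, assuming weight_setting is a comma-separated list of metric=ratio pairs and the host reports its metrics as a dictionary. The function names and the metric names in the usage note are illustrative, and the handling of missing metrics is a simplification: with required set, a missing metric is treated as an error; otherwise the host receives weight_of_unavailable.

```python
def parse_weight_setting(setting):
    """Parse 'name1=1.0, name2=-1.0' into {'name1': 1.0, 'name2': -1.0}."""
    ratios = {}
    for item in setting.split(","):
        name, _, ratio = item.partition("=")
        ratios[name.strip()] = float(ratio)
    return ratios


def metrics_weight(host_metrics, weight_setting, required=True,
                   weight_of_unavailable=-10000.0):
    """Weighted sum of the host's metrics according to weight_setting."""
    total = 0.0
    for name, ratio in parse_weight_setting(weight_setting).items():
        if name not in host_metrics:
            if required:
                # The host should have been removed by a filter beforehand.
                raise KeyError(name)
            return weight_of_unavailable
        total += ratio * host_metrics[name]
    return total
```

For example, metrics_weight({"cpu.percent": 0.25, "net.bw": 0.5}, "cpu.percent=-1.0, net.bw=2.0") favors hosts with low CPU utilization and high available bandwidth.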

How does scheduler strategy affect performance?

Benchmark Accuracy

Smart Scheduling

Efficiency

QoS that meets the SLA contract

What is monitored now?

OpenStack Service | Type | Metrics (e.g.)
Nova | Static capabilities | CPU features, hypervisor version
Nova | Dynamic resources | free memory/disk, number of vCPUs, PCI devices, number of NIC virtual functions
Ceilometer | Resource creation/deletion | VM, network/subnet/port, image, ……
Ceilometer | Resource usage data | CPU usage in VM, memory usage in VM, network usage in VM, storage usage stats, ……

Not enough:
• CPU usage stats of host
• Network usage stats of host
• Intel Node Manager power data
• Cache QoS Monitoring (CQM) data
• ……

Ceilometer: no hardware pollsters

Nova: not easy to add; how to use?

What is missing?

Policy management
• Break policy into QoS parameters
• Map QoS parameters to metrics

Actions
• Live migration
• Resource reallocation
• Enforcement
• ……

Knowledge model to evaluate complex policy situations (e.g. predict future VM workload)

Dynamic Resource Scheduling

[Diagram: DRS architecture]

Pluggable Collectors: Nova collector, Ceilometer collector, other collectors, other agents

Evaluator: Parser, Analyzer, Knowledge model, historic metrics data

Pluggable Executors: Logging, Resource reallocation, Alarming, Enforcement, Live migration, De-virtualizing, Benchmarking, other actions

Other elements: Policy, admins, API, Nova, Ceilometer (set alarm, alarm trigger), Evaluating

Legend: Existing components / To be implemented
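The collector, evaluator, and executor roles in the DRS diagram can be sketched as one evaluation cycle. Everything in this example is hypothetical: the function name, the metric names, and the policy format (metric name mapped to a threshold and an action) are assumptions chosen to show how pluggable pieces could fit together, not a description of an existing implementation.

```python
def run_cycle(collectors, policy, executors):
    """Collect metrics, evaluate the policy, and run the matching actions.

    collectors: list of callables, each returning a dict of metric values
    policy:     dict of metric name -> (threshold, action name)
    executors:  dict of action name -> callable(metric_name, value)
    """
    # Pluggable collectors: merge metrics from every source.
    metrics = {}
    for collect in collectors:
        metrics.update(collect())

    # Evaluator: compare each metric named in the policy with its threshold.
    triggered = []
    for metric, (threshold, action) in policy.items():
        value = metrics.get(metric)
        if value is not None and value > threshold:
            # Pluggable executors: carry out the action for the violation.
            executors[action](metric, value)
            triggered.append(action)
    return triggered
```

Because collectors and executors are plain callables in this sketch, adding a new data source or a new action does not require changing the evaluation loop.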

Next: Gantt

Scheduler-as-a-Service project

Split from Nova first, then serve other projects

The split is planned to begin in the L release

Gantt in Kilo: Refactor, Refactor, Refactor….

The Scheduler before Juno

API Scheduler Compute

The scheduler in Kilo

API Conductor Scheduler Compute

1. User request with scheduler hints to include a scheduling policy

2. Submit new task

Scheduler API: select_destinations, update_resource_stats

Refactor

https://blueprints.launchpad.net/nova/+spec/make-resource-tracker-use-objects

https://blueprints.launchpad.net/nova/+spec/detach-service-from-computenode

https://blueprints.launchpad.net/nova/+spec/resource-objects

https://blueprints.launchpad.net/nova/+spec/request-spec-object

https://blueprints.launchpad.net/nova/+spec/sched-select-destinations-use-request-spec-object

https://blueprints.launchpad.net/nova/+spec/isolate-scheduler-db











Thanks

Backup

Problems with the current Nova scheduler

Server group

• Can't add/remove an active server to/from a server group
  https://review.openstack.org/136487
  https://review.openstack.org/139272

• With an affinity policy, you can't evacuate
  • Ignore down hosts when populating the instance: https://review.openstack.org/#/c/135607/
  • Remove the instance from the server group: https://review.openstack.org/136487, but it won't land in K, maybe L. It also won't work for something like automatic HA.
  • https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/soft-affinity-for-server-group,n,z

• Anti-affinity policy race problem, which may trigger extra rescheduling
  • Race for migration
  • Support for unshelve, rebuild, live-migration, migration, and resize in K, but this does not resolve the anti-affinity policy problem.
  • Unshelve: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1400015,n,z
  • Rebuild: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:rebuild_schedule,n,z
  • Migration/live-migration ongoing…




Problems with the current Nova scheduler

Missing resource claiming and retry for migration
• Unshelve: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1400015,n,z
• Rebuild: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:rebuild_schedule,n,z
• Migration/live-migration ongoing…

Scheduler hints can't persist
• You can only specify your scheduling policy at the beginning
• The policy can be violated after migration
• https://review.openstack.org/88983 is blocked in K, maybe L

Race problem
• Bug link: https://bugs.launchpad.net/nova/+bug/1341420
• scheduler_host_subset_size=N

Ironic integration
• https://bugs.launchpad.net/nova/+bug/1402658






Any more problems with the scheduler?

• Only does initial placement!
• Each project has its own scheduler

DRS in OpenStack

Gantt

Tetris https://docs.google.com/document/d/1DMsnGxQ3P-OwZCF3uxaUeEFaKX8LqUqmmgQ_7EVK7Y8/edit

Purview (Tetris) will provide a framework to quickly implement and enforce different kinds of policies. Policies can be of different types. Here are a few examples of policies in clouds: availability policies, performance policies, load-balancing policies, and user-defined policies.

Congress https://wiki.openstack.org/wiki/Congress

Congress is a policy-based management framework for the cloud. It is designed to work with any cloud software that reasonably fits within the relational data model. It automatically prevents policy violations when possible and corrects them when not, and it enables administrators to control the extent to which enforcement is automatic.

Tetris is a domain-specific policy system; Congress is a domain-independent policy system.

Domain-independent and domain-specific policy systems are highly complementary.

