Nova Scheduler
Shane Wang (王庆), Intel Open Source Technology Center
WeChat ID: qq559382

Page 2

Agenda

What is the current situation?
• How the scheduler works in Juno and Kilo
• Resource tracking
• Filters and weights
• Utilization Based Scheduling (UBS)

What is the next plan?
• Gantt
• Dynamic Resource Scheduling (DRS)

Page 3

How scheduler works in Juno and Kilo

Components: API, Conductor, Scheduler, Compute

1. User request with scheduler hints that carry the scheduling policy
2. Submit new task
3. Request hosts that match the request_spec and filter_properties
4. Return the selected hosts
5. Call the selected compute host
6. Reschedule after a failed resource claim or other failure
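
The request flow above can be sketched as a small retry loop. This is a minimal illustration, not Nova code: `select_hosts`, `claim`, `build_instance`, `ClaimFailed`, and the host dictionaries are hypothetical stand-ins for the scheduler, compute, and conductor roles, and RAM is the only resource considered.

```python
class ClaimFailed(Exception):
    """Raised by the hypothetical compute stub when a host cannot fit the request."""


def select_hosts(hosts, request_spec, excluded):
    """Scheduler stand-in: hosts that look big enough, most free RAM first."""
    fits = [h for h in hosts
            if h["name"] not in excluded
            and h["free_ram_mb"] >= request_spec["ram_mb"]]
    return sorted(fits, key=lambda h: h["free_ram_mb"], reverse=True)


def claim(host, request_spec, actual_free_ram_mb):
    """Compute stand-in: validate the request against the host's real free RAM."""
    if actual_free_ram_mb[host["name"]] < request_spec["ram_mb"]:
        raise ClaimFailed(host["name"])
    actual_free_ram_mb[host["name"]] -= request_spec["ram_mb"]


def build_instance(hosts, request_spec, actual_free_ram_mb, max_attempts=3):
    """Conductor stand-in: steps 3 to 6, retrying on another host after a failed claim."""
    tried = []
    for _ in range(max_attempts):
        candidates = select_hosts(hosts, request_spec, excluded=tried)
        if not candidates:
            break
        host = candidates[0]
        tried.append(host["name"])
        try:
            claim(host, request_spec, actual_free_ram_mb)
            return host["name"], tried
        except ClaimFailed:
            continue  # step 6: reschedule, skipping hosts already tried
    return None, tried


# The scheduler's view is stale: node1 looks roomy but is actually almost full.
hosts = [{"name": "node1", "free_ram_mb": 8192},
         {"name": "node2", "free_ram_mb": 4096}]
actual = {"node1": 512, "node2": 4096}
chosen, tried = build_instance(hosts, {"ram_mb": 2048}, actual)
print(chosen, tried)
```

The list of already-tried hosts plays the role that the retry information in filter_properties plays on the slides: it keeps a rescheduled request away from a host that just rejected it.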

Page 4

Resource usage Tracking

Components: Conductor, Scheduler, Compute, DB, Hypervisor (one per compute node)

2. Submit new task
3. Request hosts that match the request_spec and filter_properties
4. Return the selected hosts
5. Call the selected compute host
6. Reschedule after a failed resource claim or other failure

Resource claiming:
1) Validate the resource usage
2) Update the resource usage
3) Update the DB

Scheduling, for each call:
1) Fetch the newest compute node stats
2) Filter and weigh the hosts
3) Consume the resources on the selected host

Periodic update of the node resources, at a 60-second interval:
1) Get the hypervisor resources
2) Consume the resources
3) Update the DB
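
The three resource-claiming steps can be illustrated with a small in-memory tracker. This is a sketch under stated assumptions, not Nova's resource tracker: the class `HostResourceTracker`, its fields, and the plain dictionary `db` standing in for the database are all hypothetical.

```python
class HostResourceTracker:
    """Hypothetical per-host tracker; the dict `db` stands in for the Nova DB."""

    def __init__(self, name, total_ram_mb, total_vcpus, db):
        self.name = name
        self.total_ram_mb = total_ram_mb
        self.total_vcpus = total_vcpus
        self.used_ram_mb = 0
        self.used_vcpus = 0
        self.db = db
        self._save()

    def _save(self):
        # 3) Update the DB with the current free resources.
        self.db[self.name] = {
            "free_ram_mb": self.total_ram_mb - self.used_ram_mb,
            "free_vcpus": self.total_vcpus - self.used_vcpus,
        }

    def claim(self, ram_mb, vcpus):
        # 1) Validate the resource usage.
        if (self.used_ram_mb + ram_mb > self.total_ram_mb
                or self.used_vcpus + vcpus > self.total_vcpus):
            return False
        # 2) Update the resource usage.
        self.used_ram_mb += ram_mb
        self.used_vcpus += vcpus
        self._save()
        return True


db = {}
tracker = HostResourceTracker("node1", total_ram_mb=4096, total_vcpus=4, db=db)
first = tracker.claim(ram_mb=3072, vcpus=2)   # fits
second = tracker.claim(ram_mb=2048, vcpus=1)  # would exceed total RAM
print(first, second, db["node1"])
```

A rejected claim leaves both the in-memory usage and the stored record untouched, which is what lets the caller reschedule the request elsewhere.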

Page 5

Filters and weight hosts

Request spec:
• Image
• Instance_properties
• Instance_type

Filter_properties:
• Scheduler hints
• Assist parameter: retry

nova boot --flavor 1 --image …… --hint group='sg1'
--hint <key=value>: Send arbitrary key/value pairs to the scheduler for custom use.

scheduler_host_subset_size=1

scheduler_available_filters='nova.scheduler.filters.all_filters'
scheduler_default_filters= [……]

scheduler_weight_classes=nova.scheduler.weights.all_weighers
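
How filters, weighers, and scheduler_host_subset_size interact can be sketched as follows. This is a simplified illustration rather than Nova's filter scheduler: the filter and weigher functions, the host dictionaries, and the `schedule` helper are hypothetical, and the assumption is that one host is picked at random from the best `host_subset_size` candidates.

```python
import random


def ram_filter(host, spec):
    """Pass hosts with enough free RAM for the request."""
    return host["free_ram_mb"] >= spec["ram_mb"]


def disk_filter(host, spec):
    """Pass hosts with enough free disk for the request."""
    return host["free_disk_gb"] >= spec["disk_gb"]


def ram_weigher(host):
    """Prefer hosts with more free RAM."""
    return host["free_ram_mb"]


def schedule(hosts, spec, filters, weigher, host_subset_size=1, rng=random):
    """Filter, rank by weight, then choose randomly among the top N hosts."""
    passing = [h for h in hosts if all(f(h, spec) for f in filters)]
    ranked = sorted(passing, key=weigher, reverse=True)
    subset = ranked[:max(1, host_subset_size)]
    return rng.choice(subset)["name"] if subset else None


hosts = [
    {"name": "a", "free_ram_mb": 2048, "free_disk_gb": 100},
    {"name": "b", "free_ram_mb": 8192, "free_disk_gb": 10},
    {"name": "c", "free_ram_mb": 4096, "free_disk_gb": 50},
]
spec = {"ram_mb": 1024, "disk_gb": 20}
picked = schedule(hosts, spec, [ram_filter, disk_filter], ram_weigher)
print(picked)
```

With a subset size of 1 the choice is deterministic: host b is filtered out for lack of disk, and c outranks a on free RAM. A larger subset size trades some placement quality for less contention when several requests are scheduled from the same view of the hosts.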

Page 6

Filters

Resource:
• CoreFilter, AggregateCoreFilter: cpu_allocation_ratio=16.0
• RamFilter, AggregateRamFilter: ram_allocation_ratio=1.5
• DiskFilter, AggregateDiskFilter: disk_allocation_ratio=1.0
• IoOpsFilter, AggregateIoOpsFilter: max_io_ops_per_host=8. IoOps covers resize, build, image snapshot, migration, rescue, unshelve, and backup
• PciPassthroughFilter: Generic PCI device or SR-IOV assignment
• NUMATopologyFilter: NUMA in Juno; CPU pinning and huge pages in Kilo
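
The allocation ratios above translate into simple overcommit arithmetic. The sketch below assumes a host passes when used plus requested capacity stays within total capacity times the ratio; the helper `fits_with_overcommit` and the sample numbers are illustrative, not taken from Nova.

```python
def fits_with_overcommit(total, used, requested, allocation_ratio):
    """Return True if the request fits under the overcommitted limit."""
    limit = total * allocation_ratio
    return used + requested <= limit


# 8 physical cores at cpu_allocation_ratio=16.0 give a limit of 128 vCPUs.
cpu_ok = fits_with_overcommit(total=8, used=126, requested=2, allocation_ratio=16.0)
cpu_full = fits_with_overcommit(total=8, used=126, requested=4, allocation_ratio=16.0)

# 32768 MB at ram_allocation_ratio=1.5 give a limit of 49152 MB.
ram_ok = fits_with_overcommit(total=32768, used=40960, requested=8192, allocation_ratio=1.5)

# disk_allocation_ratio=1.0 means no overcommit at all.
disk_ok = fits_with_overcommit(total=100, used=90, requested=20, allocation_ratio=1.0)

print(cpu_ok, cpu_full, ram_ok, disk_ok)
```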

Page 7

Filters

Affinity:
• DifferentHostFilter, SameHostFilter: scheduler_hints: different_host / same_host = ['instance uuid', …]
• ServerGroupAffinityFilter, ServerGroupAntiAffinityFilter:
  nova server-group-create: Create a new server group with the specified details.
  nova server-group-delete: Delete specific server group(s).
  nova server-group-get: Get a specific server group.
  nova server-group-list: Print a list of all server groups.
  boot with scheduler-hints: group=uuid: Boot a new instance into the server group.
• SimpleCIDRAffinityFilter: scheduler_hints: cidr, build_near_host_ip
• TypeAffinityFilter, AggregateTypeAffinityFilter: instance_type
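
The server group filters can be pictured as a membership test against the hosts already used by the group. This is a minimal sketch, assuming the group's policy and the set of hosts running its members are known at scheduling time; `server_group_filter` is a hypothetical helper, not the Nova filter classes.

```python
def server_group_filter(host_name, group_policy, group_hosts):
    """Return True if host_name is acceptable for a new member of the group."""
    if group_policy == "anti-affinity":
        # Every member must land on a host no other member uses.
        return host_name not in group_hosts
    if group_policy == "affinity":
        # The first member may go anywhere; later members must join it.
        return not group_hosts or host_name in group_hosts
    return True


candidates = ["node1", "node2", "node3"]
group_hosts = {"node1"}  # hosts already running members of the group

spread = [h for h in candidates
          if server_group_filter(h, "anti-affinity", group_hosts)]
packed = [h for h in candidates
          if server_group_filter(h, "affinity", group_hosts)]
print(spread, packed)
```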

Page 8

Filters

Topology:
• AggregateImagePropertiesIsolation: image properties match aggregate metadata
• IsolatedHostsFilter: isolated_hosts, isolated_images, restrict_isolated_hosts_to_isolated_images
• AggregateInstanceExtraSpecsFilter: flavor's extra specs match aggregate metadata
• AggregateMultiTenancyIsolation: filter_tenant_id
• AvailabilityZoneFilter

Page 9

Filters

Others:
• ComputeCapabilitiesFilter: works with the instance type extra_spec 'capabilities:'
• ComputeFilter: checks whether the compute node is alive and not disabled
• ImagePropertiesFilter: architecture, hypervisor type, vm_mode, hypervisor_version_requires
• JsonFilter: scheduler_hints: query
• NumInstancesFilter, AggregateNumInstancesFilter: max_instances_per_host
• RetryFilter
• TrustedFilter
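
The JsonFilter entry above takes its query from a scheduler hint. The sketch below shows how such a query could be evaluated, assuming it is a JSON list in prefix form where `$`-prefixed names refer to host state attributes. The evaluator and the operator subset are a simplification for illustration, and the host attribute values are made up.

```python
import json
import operator

# Comparison operators supported by this simplified evaluator.
_COMPARE = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
            "<=": operator.le, ">=": operator.ge}


def _resolve(value, host_state):
    """Replace '$name' with the host attribute of that name."""
    if isinstance(value, str) and value.startswith("$"):
        return host_state[value[1:]]
    return value


def evaluate(query, host_state):
    """Recursively evaluate a prefix-form query against one host."""
    op, *args = query
    if op == "and":
        return all(evaluate(a, host_state) for a in args)
    if op == "or":
        return any(evaluate(a, host_state) for a in args)
    if op == "not":
        return not evaluate(args[0], host_state)
    left, right = (_resolve(a, host_state) for a in args)
    return _COMPARE[op](left, right)


hint = '["and", [">=", "$free_ram_mb", 1024], [">=", "$free_disk_mb", 204800]]'
query = json.loads(hint)
big = {"free_ram_mb": 4096, "free_disk_mb": 512000}
small = {"free_ram_mb": 512, "free_disk_mb": 512000}
print(evaluate(query, big), evaluate(query, small))
```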

Page 10

Weight

IoOpsWeigher

MetricsWeigher

RAMWeigher
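
A minimal sketch of how several weighers could be combined, assuming each weigher's raw values are min-max normalized across the candidate hosts and then scaled by a per-weigher multiplier. The helper names, host data, and multiplier values here are illustrative assumptions.

```python
def normalize(values):
    """Min-max scale values into [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_weights(hosts, weighers):
    """Sum multiplier * normalized raw weight over all (raw_fn, multiplier) pairs."""
    totals = [0.0] * len(hosts)
    for raw_fn, multiplier in weighers:
        normalized = normalize([raw_fn(h) for h in hosts])
        for i, value in enumerate(normalized):
            totals[i] += multiplier * value
    return totals


hosts = [
    {"name": "a", "free_ram_mb": 1000, "io_ops": 0},
    {"name": "b", "free_ram_mb": 3000, "io_ops": 8},
    {"name": "c", "free_ram_mb": 2000, "io_ops": 2},
]
weighers = [
    (lambda h: h["free_ram_mb"], 1.0),  # more free RAM is better
    (lambda h: h["io_ops"], -1.0),      # more in-flight I/O operations is worse
]
totals = combined_weights(hosts, weighers)
best = hosts[totals.index(max(totals))]["name"]
print(totals, best)
```

Normalizing first keeps a weigher with large raw values, such as RAM in megabytes, from drowning out one with small raw values, such as a count of I/O operations.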

Page 11

Utilization Based Scheduling

• CPU utilization data
• Memory utilization data
• Network bandwidth data
• etc.

Page 12

Utilization Based Scheduling

Components: Conductor, Scheduler, Compute, DB, Hypervisor (one per compute node)

2. Submit new task
3. Request hosts that match the request_spec and filter_properties
4. Return the selected hosts
5. Call the selected compute host
6. Reschedule after a failed resource claim or other failure

Scheduling, for each call:
1) Fetch the newest compute node stats
2) Filter and weigh the hosts
3) Consume the resources on the selected host

Monitors:
• CPU Monitor
• Network Bandwidth
• Memory Cache Monitor

Update at a 60-second interval

Notification bus (AMQP)

Page 13

Utilization Based Scheduling

MetricsWeigher:
• weight_multiplier: Multiplier used for weighing metrics.
• weight_setting: How the metrics are going to be weighed.
• required: If true, use the MetricsFilter
• weight_of_unavailable
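
The MetricsWeigher options can be illustrated with a weighted sum over host metrics. This is a sketch under assumptions: weight_setting is taken to be a comma-separated list of `name=ratio` pairs, the metric names `cpu.percent` and `mem.free_mb` are hypothetical, and the handling of `required` and the `weight_of_unavailable` value of -10000.0 are illustrative choices rather than confirmed defaults.

```python
def parse_weight_setting(setting):
    """Parse 'name1=1.0, name2=-1.0' into {'name1': 1.0, 'name2': -1.0}."""
    ratios = {}
    for item in setting.split(","):
        name, _, ratio = item.strip().partition("=")
        ratios[name.strip()] = float(ratio)
    return ratios


def metrics_weight(host_metrics, ratios, required=True,
                   weight_of_unavailable=-10000.0):
    """Weighted sum of the host's metrics, with a fallback for missing ones."""
    missing = [name for name in ratios if name not in host_metrics]
    if missing:
        if required:
            # Assumed behavior: a required metric that is absent is an error.
            raise ValueError("metrics unavailable: %s" % ", ".join(missing))
        return weight_of_unavailable
    return sum(host_metrics[name] * ratio for name, ratio in ratios.items())


ratios = parse_weight_setting("cpu.percent=-1.0, mem.free_mb=0.5")
busy_host = {"cpu.percent": 40.0, "mem.free_mb": 100.0}
weight = metrics_weight(busy_host, ratios)
fallback = metrics_weight({"cpu.percent": 40.0}, ratios, required=False)
print(weight, fallback)
```

A negative ratio, as used for the CPU metric here, makes a higher reading lower the host's weight.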

Page 14

How does the scheduler strategy affect performance?

Page 15

Benchmark Accuracy

Page 16

Smart Scheduling

Efficiency

QoS meets the SLA contract

Page 17

What is monitored now?

Columns: OpenStack service, type, metrics (e.g.)

Nova, static capabilities:
• CPU features
• hypervisor version

Nova, dynamic resources:
• free memory/disk
• vCPU #
• PCI devices
• # of NIC virtual functions

Ceilometer, resource creation/deletion:
• VM
• network/subnet/port
• image
• ……

Ceilometer, resource usage data:
• CPU usage in VM
• memory usage in VM
• network usage in VM
• storage usage stats
• ……

Not enough:
• CPU usage stats of host
• Network usage stats of host
• Intel Node Manager power data
• Cache QoS Monitoring (CQM) data ……

Ceilometer: no hardware pollsters

Nova: not easy to add; how to use?

Page 18

What is missing?

Policy management:
• Break policy into QoS parameters
• Map QoS parameters to metrics

Actions:
• Live migration
• Resource reallocation
• Enforcement
• ……

Knowledge model to evaluate complex policy situations (e.g. predict future VM workload)

Page 19

Dynamic Resource Scheduling

[Architecture diagram. Labels recovered from the figure:]
• Sources: Nova, Ceilometer, other agents
• Pluggable collectors: Nova collector, Ceilometer collector, other collectors
• Historic metrics data
• Policy, Parser, Analyzer, Evaluator, Knowledge model
• Pluggable executors: live migration, resource reallocation, enforcement, de-virtualizing, other actions
• Other labels: admins, API, Logging, Alarming, Evaluating, Benchmarking, set alarm, alarm trigger
• Legend: existing components; to be implemented

Page 20

Next: Gantt

Scheduler-as-a-Service project

Split out of Nova first, then serve other projects

The split is planned to begin in the L release

Page 21

Gantt in Kilo: Refactor, Refactor, Refactor….

The Scheduler before Juno

API Scheduler Compute

The scheduler in Kilo

Components: API, Conductor, Scheduler, Compute

1. User request with scheduler hints that carry the scheduling policy
2. Submit new task
3. Request hosts that match the request_spec and filter_properties
4. Return the selected hosts
5. Call the selected compute host
6. Reschedule after a failed resource claim or other failure

Scheduler API: select_destinations, update_resource_stats

Page 23

Thanks

Page 24

Backup

Page 25

Problems with the current Nova scheduler

Server group:
• Can't add or remove an active server to or from a server group
  https://review.openstack.org/136487
  https://review.openstack.org/139272
• An affinity policy means you can't evacuate
  Ignore down hosts when populating the instance: https://review.openstack.org/#/c/135607/
  Remove the instance from the server group: https://review.openstack.org/136487, but it won't land in K, maybe L. It also won't work for something like automatic HA.
  https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/soft-affinity-for-server-group,n,z
• Anti-affinity policy race problem, which may trigger extra rescheduling
• Race for migration
  Support for unshelve, rebuild, live-migration, migration, and resize in K, but this does not resolve the anti-affinity policy problem.
  Unshelve: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1400015,n,z
  Rebuild: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:rebuild_schedule,n,z
  Migration/live-migration ongoing…

Page 26

Problems with the current Nova scheduler

• Missing resource claiming and retry for migration
  Unshelve: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bug/1400015,n,z
  Rebuild: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:rebuild_schedule,n,z
  Migration/live-migration ongoing…
• Scheduling hints can't persist
  You can only specify your scheduling policy at the beginning
  The policy can be violated after migration
  https://review.openstack.org/88983 blocked in K, maybe L
• Race problem
  Bug link: https://bugs.launchpad.net/nova/+bug/1341420
  scheduler_host_subset_size=N
• Ironic integration
  https://bugs.launchpad.net/nova/+bug/1402658

Page 27

Any more problems with the scheduler?

• Only does initial placement!
• Each project has its own scheduler

Page 28

DRS in OpenStack

Gantt

Tetris
https://docs.google.com/document/d/1DMsnGxQ3P-OwZCF3uxaUeEFaKX8LqUqmmgQ_7EVK7Y8/edit

Purview (Tetris) will provide a framework to quickly implement and enforce different kinds of policies. Policies can be of different types. Here are a few examples of policies in clouds: availability policies, performance policies, load balancing policies, user-defined policies.

Congress
https://wiki.openstack.org/wiki/Congress

Congress is a policy-based management framework for the cloud. It is designed to work with any cloud software that reasonably fits within the relational data model. It automatically prevents policy violations when possible and corrects them when not, and it enables administrators to control the extent to which enforcement is automatic.

Tetris is a domain-specific policy system; Congress is a domain-independent policy system.

Domain-independent and domain-specific policy systems are highly complementary.