Eventually consistent scheduler host state

Registered by Yingxin

The Nova scheduler is planned to scale to much larger clouds (~10k nodes) with
the upcoming cells v2, but the widely used filter scheduler already suffers from
performance problems in a 1000-node deployment. These problems are mainly caused
by the poor implementation of the scheduler cache update model, which requires a
time-consuming, blocking database access for every scheduling decision. As a
workaround, the caching scheduler was implemented: it keeps the database out of
request processing, and as a result throughput is boosted to about *850%* [1].
However, the caching scheduler still has a very naive cache update model that
makes it unsuitable for rapidly changing compute-node resources: it refreshes
its entire scheduler cache from the database periodically, every 60 seconds by
default.
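To make the staleness of a periodic full refresh concrete, here is a minimal, hypothetical sketch in the spirit of the caching scheduler. The class name, the `load_all` callback standing in for the database query, the injectable `clock`, and the single free-RAM resource are assumptions for illustration, not Nova code.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PeriodicRefreshCache:
    """Hypothetical periodic full-refresh cache (not Nova's actual class)."""

    # Stand-in for the database query: returns {host: free RAM in MB}.
    load_all: Callable[[], Dict[str, int]]
    # Refresh period in seconds; 60 s mirrors the caching scheduler's default.
    interval: float = 60.0
    clock: Callable[[], float] = time.monotonic
    hosts: Dict[str, int] = field(default_factory=dict)
    last_refresh: float = float("-inf")

    def get_hosts(self) -> Dict[str, int]:
        now = self.clock()
        if now - self.last_refresh >= self.interval:
            # Reload every host from the database, whether it changed or not.
            self.hosts = dict(self.load_all())
            self.last_refresh = now
        # Between reloads, changes made elsewhere are invisible here.
        return self.hosts

    def consume(self, host: str, ram_mb: int) -> None:
        # Local claim: only this scheduler's cache sees it until the next reload.
        self.hosts[host] -= ram_mb
```

In this sketch, a change written to the database one second after a reload stays invisible to the scheduler for the remaining 59 seconds, and local claims are simply overwritten by the next full reload.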

We therefore propose a new scheduler host manager driver that applies an
eventually consistent update model to scheduler host states. Like the caching
scheduler, it keeps the database out of decision making. Unlike the caching
scheduler, its caches are always *almost* up to date, because they receive
incremental updates directly from compute nodes, and the schedulers learn from
the compute nodes whether their decisions succeeded. In the same 1000-node
simulation, a prototype implementation [2] boosts throughput to at least *510%*
of the legacy implementation [3].
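A minimal sketch of the scheduler-side idea, assuming a single free-RAM resource and hypothetical names (`IncrementalHostCache`, `apply_update`, `select_host`, `on_claim_result`): decisions read only the in-memory cache, compute nodes push deltas, and a claim the compute node rejects is rolled back locally. It also assumes that the deltas pushed by compute nodes exclude this scheduler's own claims, so nothing is counted twice.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass
class Claim:
    host: str
    ram_mb: int


@dataclass
class IncrementalHostCache:
    """Hypothetical scheduler-side cache; decisions never touch the database."""

    free_ram_mb: Dict[str, int] = field(default_factory=dict)
    pending: Dict[int, Claim] = field(default_factory=dict)
    _ids: Iterator[int] = field(default_factory=itertools.count)

    def apply_update(self, host: str, delta_mb: int) -> None:
        # Incremental change pushed by a compute node (assumed to exclude this
        # scheduler's own claims, so they are not counted twice).
        self.free_ram_mb[host] = self.free_ram_mb.get(host, 0) + delta_mb

    def select_host(self, ram_mb: int) -> Optional[int]:
        # Pick the fitting host with the most free RAM and claim it locally.
        fits = [h for h, free in self.free_ram_mb.items() if free >= ram_mb]
        if not fits:
            return None
        host = max(fits, key=lambda h: self.free_ram_mb[h])
        self.free_ram_mb[host] -= ram_mb
        claim_id = next(self._ids)
        self.pending[claim_id] = Claim(host, ram_mb)
        return claim_id

    def on_claim_result(self, claim_id: int, succeeded: bool) -> None:
        # The compute node reported its verdict; roll back a rejected claim.
        claim = self.pending.pop(claim_id)
        if not succeeded:
            self.free_ram_mb[claim.host] += claim.ram_mb
```

Choosing the host with the most free RAM is only a stand-in for the real filter and weigher pipeline; the point is that the cache is consumed optimistically and corrected by feedback rather than by rereading the database.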

Major modifications in the current prototype [2]:
1. Add an RPC API to send updates to schedulers (i.e. send_commit).
2. Add an RPC API to notify a compute node to start sending updates to schedulers (i.e. report_host_state).
3. Add an RPC API to notify schedulers to reset their host state (i.e. notify_schedulers).
4. Add an argument to the build_and_run_instance RPC API to pass scheduler resource claims (commits) to compute nodes.
5. Monitor host state changes in compute nodes and send the related updates to schedulers.
6. Replace `host_manager` with `shared_host_manager`, which keeps scheduler host states up to date with the most accurate version of host state in compute nodes.
7. Verify or reject scheduler decisions based on host state in compute nodes.
8. Implement a robust model (based on versioned updates) to keep scheduler host states consistent.
9. Track scheduler claims from the scheduler to the resource tracker, to make sure every claim is handled properly.
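The versioned-update model in items 5, 7 and 8 can be sketched as follows. This is an illustration under stated assumptions, not the prototype's code: each compute node verifies claims against its own state and tags every resulting update with a per-host version number, and the scheduler applies an update only when it is exactly the next version. A version gap (a lost update) triggers a full resync through a hypothetical `request_resync` callback, loosely playing the role of report_host_state; stale or duplicate updates are ignored. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class HostUpdate:
    host: str
    version: int        # per-host sequence number assigned by the compute node
    delta_ram_mb: int   # change in free RAM produced by this version


@dataclass
class HostState:
    version: int
    free_ram_mb: int


class ComputeNode:
    """Hypothetical compute node: the source of truth for its own host state."""

    def __init__(self, host: str, free_ram_mb: int) -> None:
        self.host = host
        self.free_ram_mb = free_ram_mb
        self.version = 0

    def handle_claim(self, ram_mb: int) -> Tuple[bool, Optional[HostUpdate]]:
        # Verify the scheduler's decision against the real state; reject if it no longer fits.
        if ram_mb > self.free_ram_mb:
            return False, None
        self.free_ram_mb -= ram_mb
        self.version += 1
        return True, HostUpdate(self.host, self.version, -ram_mb)

    def snapshot(self) -> HostState:
        # Full state used when a scheduler needs to resynchronize.
        return HostState(self.version, self.free_ram_mb)


class SchedulerView:
    """Hypothetical scheduler-side view that applies versioned updates in order."""

    def __init__(self, request_resync: Callable[[str], HostState]) -> None:
        self.request_resync = request_resync
        self.states: Dict[str, HostState] = {}

    def receive(self, update: HostUpdate) -> None:
        state = self.states.get(update.host)
        if state is None or update.version > state.version + 1:
            # Unknown host or a lost update (version gap): fetch the full state.
            self.states[update.host] = self.request_resync(update.host)
        elif update.version == state.version + 1:
            # Exactly the next version: apply the delta.
            state.free_ram_mb += update.delta_ram_mb
            state.version = update.version
        # Otherwise the version was already applied (stale or duplicate): ignore.
```

Sending deltas keeps each message small, while the version numbers let the scheduler detect when a delta went missing and fall back to a full snapshot instead of drifting silently.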

[1] Austin Summit session: https://www.openstack.org/summit/austin-2016/summit-schedule/events/7129
    Brief introduction: http://paste.openstack.org/show/494336
[2] Prototype implementation: https://review.openstack.org/#/c/306301
[3] Throughput results: http://paste.openstack.org/show/494211

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: Yingxin
Direction: Needs approval
Assignee: Yingxin
Definition: New
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/eventually-consistent-scheduler-host-state,n,z

Addressed by: https://review.openstack.org/280047
    WIP: Eventually consistent host state prototype

Gerrit topic: https://review.openstack.org/#q,topic:bp/resource-providers-scheduler,n,z

Addressed by: https://review.openstack.org/290302
    split host_state.least_disk_mb out of free_disk_mb

Addressed by: https://review.openstack.org/306301
    WIP: Eventually consistent host state prototype

Addressed by: https://review.openstack.org/306844
    Eventually consistent scheduler host state
