Provide a distributed scheduler and data store for scalability and reliability

Registered by Eric Day

The current default scheduler and data model (a simple abstraction with a SQLAlchemy backend) work fine for small to medium-sized local installations, but we need to discuss plans for large installations, possibly spanning multiple data centers. See the specification URL for more details and discussion.

Blueprint information

Status:
Complete
Approver:
Rick Clark
Priority:
Essential
Drafter:
Eric Day
Direction:
Approved
Assignee:
Sandy Walsh
Definition:
Approved
Series goal:
Accepted for diablo
Implementation:
Implemented
Milestone target:
2011.3
Started by:
Vish Ishaya
Completed by:
Sandy Walsh

Whiteboard

First appeared in diablo-2

Moved to diablo and targeted to milestone 2. Ed, if you think this needs to move to a different milestone, let me know. -Vish

Given all the summit sessions and hallway conversations, can you restate your design? I am a little unsure about what decisions were made. - dendrobates

See the spec link. :)

I read the spec, but I didn't find the info I was looking for. For example, in the session we seemed to decide that SQLite on the nodes was a good idea. I think it was discussed more in hallway talks, but I don't see any mention of it in the spec.

I've expanded on some parts of the spec; does it make sense now? The main idea is that we are doing the same thing as discussed in the summit session, but it will be implemented as an optional scheduler, not as the only way to run the system.

----

At the Ozone sprint planning session in Jan '11, we worked with jaypipes to come up with a data design that would allow for efficient scheduling in both a single-"zone" and a nested-zone design. This would require a database for each zone, responsible only for the zones or hosts immediately below it. In other words, there would be no central database managing all of the complexity involved in a nested-zone design, and in the simplest case, where nova runs as a single zone, no changes would be needed.
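To illustrate, the per-zone data design above can be sketched roughly as follows. This is a hypothetical illustration, not Nova's actual data model: the Zone and Host classes and their attributes are assumptions made for clarity.

```python
class Host:
    """A compute host, with whatever capability data its zone tracks.
    The capability keys (e.g. "free_ram_mb") are illustrative."""
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = capabilities


class Zone:
    """Each zone owns its own database and tracks only the zones or
    hosts immediately below it; there is no central database spanning
    the whole tree."""
    def __init__(self, name):
        self.name = name
        self.hosts = []        # leaf zone: the compute hosts it manages
        self.child_zones = []  # nested zone: immediate children only

    def add_host(self, host):
        self.hosts.append(host)

    def add_child(self, zone):
        self.child_zones.append(zone)


# A single-zone deployment needs no changes: one zone, local hosts only.
top = Zone("top")
east = Zone("east")
east.add_host(Host("compute1", {"free_ram_mb": 8192}))
top.add_child(east)
```

The point of the structure is that `top` knows about `east` but nothing about `compute1`; only `east`'s own database records that host.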

Tasks:

 * The multi-cluster spec (http://wiki.openstack.org/MultiClusterZones) will need to be finalized, as the design of the scheduler depends directly on the organization around the zones.
 * We will need to define the attributes of the zones and hosts, and how to best represent them in the data design. This is handled primarily by the "multi-cluster-in-a-region" blueprint, but will need to be modified to accommodate dynamically specifying the data points tracked for each compute host.
 * The current scheduler simply picks a host at random. We will need to create a scheduler that can run in each nested zone as well as in the top-level zone, filtering potential hosts by requirement criteria and selecting the best few candidates based on weighting criteria.
 * We also need to add logic so that a zone scheduler with no hosts of its own can take requests, pass them on to its child zones, and then aggregate the responses to return to the requester.
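The filter/weigh and child-zone aggregation steps in the last two tasks can be sketched as below. This is a hedged sketch under assumed names and data shapes (dict-based zones, a free-RAM requirement, RAM-based weighting); it is not the actual Nova scheduler code.

```python
def filter_hosts(hosts, required_ram_mb):
    """Filtering pass: drop hosts that cannot satisfy the request."""
    return [h for h in hosts if h["free_ram_mb"] >= required_ram_mb]


def weigh_hosts(hosts):
    """Weighting pass: rank remaining hosts; here, more free RAM
    weighs better (a stand-in for real weighting criteria)."""
    return sorted(hosts, key=lambda h: h["free_ram_mb"], reverse=True)


def schedule(zone, required_ram_mb, top_n=3):
    """If this zone has hosts, filter and weigh them locally.
    Otherwise it is a pure zone scheduler: pass the request on to
    each child zone, aggregate the candidates they return, and
    re-weigh the combined list before answering the requester."""
    if zone["hosts"]:
        return weigh_hosts(filter_hosts(zone["hosts"], required_ram_mb))[:top_n]
    candidates = []
    for child in zone["children"]:
        candidates.extend(schedule(child, required_ram_mb, top_n))
    return weigh_hosts(candidates)[:top_n]


# Example: a top-level zone with no hosts of its own and two child zones.
east = {"hosts": [{"name": "e1", "free_ram_mb": 4096},
                  {"name": "e2", "free_ram_mb": 1024}], "children": []}
west = {"hosts": [{"name": "w1", "free_ram_mb": 8192}], "children": []}
top = {"hosts": [], "children": [east, west]}

best = schedule(top, required_ram_mb=2048, top_n=2)
# best holds the strongest candidates across both child zones:
# w1 (8192 MB free) then e1 (4096 MB); e2 was filtered out.
```

Recursion keeps each zone ignorant of its grandchildren: the top-level scheduler only ever sees the candidate lists its immediate children hand back, matching the per-zone database design above.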


Work Items
