Provide a distributed scheduler and data store for scalability and reliability

Registered by Eric Day

The current default scheduler and data model (a simple abstraction with a SQLAlchemy backend) work fine for small to medium-sized local installations, but we need to discuss plans for large installations, possibly spanning multiple data centers. See the specification URL for more details and discussion.

Blueprint information

Status:
Complete
Approver:
Rick Clark
Priority:
Essential
Drafter:
Eric Day
Direction:
Approved
Assignee:
Sandy Walsh
Definition:
Approved
Series goal:
Accepted for diablo
Implementation:
Implemented
Milestone target:
2011.3
Started by:
Vish Ishaya
Completed by:
Sandy Walsh

Whiteboard

First appeared in diablo-2

Moved to diablo and targeted to milestone 2. Ed, if you think this needs to move to a different milestone, let me know. -Vish

Given all the summit sessions and hallway conversations, can you restate your design? I am a little unsure about what decisions were made. - dendrobates

See the spec link. :)

I read the spec, but I didn't find the info I was looking for. For example, in the session we seemed to decide that SQLite on the nodes was a good idea. I think it was discussed more in hallway talks, but I don't see any mention of it in the spec.

I've expanded on some parts of the spec; does it make sense now? The main idea is that we are doing the same thing as discussed in the summit session, but it will be implemented as an optional scheduler, not as the only way to run the system.

----

At the Ozone sprint planning session in Jan '11, we worked with jaypipes to come up with a data design that would allow for efficient scheduling in both a single-"zone" and a nested-zone design. This would require a database for each zone, responsible only for the zones or hosts immediately below it. In other words, there would be no central database managing all of the complexity involved in a nested-zone design, and in the simplest case, where nova runs as a single zone, no changes would be needed.
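To illustrate, the per-zone data design above can be sketched roughly as follows. This is a hypothetical illustration, not Nova's actual data model: the Zone and Host classes and their attributes are assumptions made for clarity.

```python
class Host:
    """A compute host, with whatever capability data its zone tracks.
    The capability keys (e.g. "free_ram_mb") are illustrative."""
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = capabilities


class Zone:
    """Each zone owns its own database and tracks only the zones or
    hosts immediately below it; there is no central database spanning
    the whole tree."""
    def __init__(self, name):
        self.name = name
        self.hosts = []        # leaf zone: the compute hosts it manages
        self.child_zones = []  # nested zone: immediate children only

    def add_host(self, host):
        self.hosts.append(host)

    def add_child(self, zone):
        self.child_zones.append(zone)


# A single-zone deployment needs no changes: one zone, local hosts only.
top = Zone("top")
east = Zone("east")
east.add_host(Host("compute1", {"free_ram_mb": 8192}))
top.add_child(east)
```

The point of the structure is that `top` knows about `east` but nothing about `compute1`; only `east`'s own database records that host.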

Tasks:

 * The multi-cluster spec (http://wiki.openstack.org/MultiClusterZones) will need to be finalized, as the design of the scheduler depends directly on the organization around the zones.
 * We will need to define the attributes of the zones and hosts, and how to best represent them in the data design. This is handled primarily by the "multi-cluster-in-a-region" blueprint, but will need to be modified to accommodate dynamically specifying the data points tracked for each compute host.
 * The current scheduler simply picks a host at random. We will need to create a scheduler that can run in each nested zone as well as in the top-level zone, filtering potential hosts by requirement criteria and selecting the best few candidates based on weighting criteria.
 * We also need to add logic so that a zone scheduler with no hosts of its own can take requests, pass them on to its child zones, and then aggregate the responses to return to the requester.
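The filter/weigh and child-zone aggregation steps in the last two tasks can be sketched as below. This is a hedged sketch under assumed names and data shapes (dict-based zones, a free-RAM requirement, RAM-based weighting); it is not the actual Nova scheduler code.

```python
def filter_hosts(hosts, required_ram_mb):
    """Filtering pass: drop hosts that cannot satisfy the request."""
    return [h for h in hosts if h["free_ram_mb"] >= required_ram_mb]


def weigh_hosts(hosts):
    """Weighting pass: rank remaining hosts; here, more free RAM
    weighs better (a stand-in for real weighting criteria)."""
    return sorted(hosts, key=lambda h: h["free_ram_mb"], reverse=True)


def schedule(zone, required_ram_mb, top_n=3):
    """If this zone has hosts, filter and weigh them locally.
    Otherwise it is a pure zone scheduler: pass the request on to
    each child zone, aggregate the candidates they return, and
    re-weigh the combined list before answering the requester."""
    if zone["hosts"]:
        return weigh_hosts(filter_hosts(zone["hosts"], required_ram_mb))[:top_n]
    candidates = []
    for child in zone["children"]:
        candidates.extend(schedule(child, required_ram_mb, top_n))
    return weigh_hosts(candidates)[:top_n]


# Example: a top-level zone with no hosts of its own and two child zones.
east = {"hosts": [{"name": "e1", "free_ram_mb": 4096},
                  {"name": "e2", "free_ram_mb": 1024}], "children": []}
west = {"hosts": [{"name": "w1", "free_ram_mb": 8192}], "children": []}
top = {"hosts": [], "children": [east, west]}

best = schedule(top, required_ram_mb=2048, top_n=2)
# best holds the strongest candidates across both child zones:
# w1 (8192 MB free) then e1 (4096 MB); e2 was filtered out.
```

Recursion keeps each zone ignorant of its grandchildren: the top-level scheduler only ever sees the candidate lists its immediate children hand back, matching the per-zone database design above.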


Work Items
