Start forming a top-level layer that orchestrates creating an instance via finite states

Registered by Joshua Harlow

In order to start working on orchestration, I am going to propose we take a path that slowly starts to move the basics into the run_instances method and corresponding states.

Blueprint information

Status: Complete
Approver: None
Priority: Undefined
Drafter: Yahoo Openstackers!
Direction: Needs approval
Assignee: None
Definition: Obsolete
Series goal: None
Implementation: Unknown
Milestone target: None
Completed by: John Garbutt

Whiteboard

-- Jeff
The goal is to work toward a central point of state management that is responsible for the various phases of orchestration, in order to avoid race conditions and unclear state from unresponsive daemons, and to reduce the complexity of nova-compute. Having an orchestrator that tracks the end-to-end status of a request and has sole responsibility for delegating actions to the various other daemons via RPC will also make it easier to isolate db/resource allocations in a place where they can be more easily maintained and improved over time.

-- Vish
Not sure how this is different from the extensive analysis of state transitions that Yun did [1]. Is this just to add rollback capabilities?
[1] https://docs.google.com/spreadsheet/ccc?key=0AsgKparJuTF2dE12Q0t0OGlwd25DTHdfVUNMT3IydWc#gid=0

-- Josh
So, adjusting this blueprint. I think we just need to start with something like the following to get this going.

1. Perform validation of input queries in a single place (it's currently distributed all over) and start to form a
    model of the incoming request, which is passed around to the different components as needed
    and which is initially formed by this input validation/transformation layer.
2. Send the validated input to a component, i.e. 'nova-orc', instead of sending it to the scheduler.
3. Have 'nova-orc' perform the following (to start)...
 a. Initiate requests with the scheduling component to determine where the instances will be and
     reserve those instances; let's call them 'pseudo-vms' at this stage.
 b. Initiate requests with the network component to determine what the networking for those 'pseudo-vms'
     will be and reserve those networks (this allows quantum or others to begin their background
     processes).
 c. Perform the same thing as b. with the volume management layer (i.e. cinder).
 d. Create a fully defined 'real-vm' specification for the given 'pseudo-vms' and call the
     nova-compute entity where the 'real-vm' should be to establish that vm as a 'real-vm'.
     * This eliminates the need for nova-compute to ask what its networks and volumes
     should be, and allows it to just establish what the 'nova-orc' component has specified
     (the fully defined vm spec).
     * This allows nova-compute to be 'dumb' (and also allows for further, complete disconnection from
        the database).
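
A rough sketch of the flow in steps 1-3 might look like the following. This is only an illustration under assumptions: the class names (PseudoVM, Orchestrator), the method names (reserve_host, reserve, build) and the Fake* stand-ins are all hypothetical and are not existing nova APIs.

```python
from dataclasses import dataclass, field


@dataclass
class PseudoVM:
    """Hypothetical model of an instance that is reserved but not yet built."""
    name: str
    host: str = ""
    networks: list = field(default_factory=list)
    volumes: list = field(default_factory=list)


class Orchestrator:
    """Hypothetical 'nova-orc': the single component that drives the others."""

    def __init__(self, scheduler, network, volume, compute):
        self.scheduler = scheduler
        self.network = network
        self.volume = volume
        self.compute = compute

    def run_instances(self, request):
        # Step 1: validate the input once, in one place.
        if request.get("count", 0) < 1:
            raise ValueError("count must be at least 1")
        vms = [PseudoVM(name=f"{request['name']}-{i}")
               for i in range(request["count"])]
        for vm in vms:
            vm.host = self.scheduler.reserve_host(vm)   # step 3a
            vm.networks = self.network.reserve(vm)      # step 3b
            vm.volumes = self.volume.reserve(vm)        # step 3c
            # Step 3d: compute receives the fully defined spec and
            # never has to ask for its networks or volumes.
            self.compute.build(vm.host, vm)
        return vms


# Trivial stand-ins so the sketch runs on its own (assumptions, not nova code).
class FakeScheduler:
    def reserve_host(self, vm):
        return "host-1"


class FakeNetwork:
    def reserve(self, vm):
        return ["net-a"]


class FakeVolume:
    def reserve(self, vm):
        return ["vol-a"]


class FakeCompute:
    def __init__(self):
        self.built = []

    def build(self, host, spec):
        self.built.append((host, spec.name))
```

The point of the sketch is that only the orchestrator holds references to the scheduler, network and volume components; the compute side just receives a finished spec.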

The benefits of this:

1. One component can deal with all the state transitions and recovery of those states instead of
    having X components do it in Y different locations in Z different, 'most likely' incorrect,
    manners.
2. Enables the path where the scheduler can begin to make combined decisions about all
    resources that a 'pseudo-vm' will use and allows that scheduling entity to make the 'best'
    decision about where that set of resources should be.
3. Begins to rein in the state-transition madness that makes nova incredibly hard to debug and causes
    inconsistencies in many different parts of nova (i.e. exception and fault handling...)
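
To make benefit 1 concrete, here is a minimal sketch of a single place that owns the allowed transitions and the recovery (rollback) of a failed create. The state names, the TRANSITIONS table and the InstanceCreateMachine class are assumptions made up for illustration; they are not taken from nova or from the spreadsheet referenced above.

```python
class InvalidTransition(Exception):
    pass


# Hypothetical states for instance creation; one table owns every legal move.
TRANSITIONS = {
    "REQUESTED": {"SCHEDULED", "FAILED"},
    "SCHEDULED": {"NETWORKED", "FAILED"},
    "NETWORKED": {"VOLUMES_RESERVED", "FAILED"},
    "VOLUMES_RESERVED": {"ACTIVE", "FAILED"},
    "ACTIVE": set(),
    "FAILED": set(),
}


class InstanceCreateMachine:
    def __init__(self):
        self.state = "REQUESTED"
        self._undo = []  # stack of undo callables for completed steps

    def advance(self, new_state, action, undo):
        """Run action and move to new_state; roll back everything on error."""
        if new_state not in TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state} -> {new_state}")
        try:
            action()
        except Exception:
            self.rollback()
            raise
        self._undo.append(undo)
        self.state = new_state

    def rollback(self):
        # Undo completed steps in reverse order, then mark the request failed.
        while self._undo:
            self._undo.pop()()
        self.state = "FAILED"
```

Because every transition goes through advance(), an illegal jump (for example REQUESTED straight to ACTIVE) is rejected in one place, and a failure in any step releases whatever earlier steps reserved.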

My plan of attack:

1. Begin to refactor the 'run_instances' method to do all of the above.
2. Begin to apply the above to later state transitions, i.e. those listed in the Google doc above, as this
    becomes 'accepted' and 'proven' to be a good way to proceed.
3. Profit!

Links:
1. http://wiki.openstack.org/RepairingOwnership
2. http://wiki.openstack.org/SchedulingSplitout

Gerrit topic: https://review.openstack.org/#q,topic:bp/instance-create-state-machine,n,z

Addressed by: https://review.openstack.org/17426
    Begin adding a simple orchestration layer.

Addressed by: https://review.openstack.org/17434
    Begin adding a simple orchestration layer.

This blueprint is not complete after a good year or so, marking as Obsolete to tidy up the Nova backlog. --johnthetubaguy (20th April 2014)
