Start forming a top-level layer that orchestrates creating an instance via finite states
In order to start working on orchestration, I propose we take a path that slowly starts to move the basics into the run_instances method and corresponding states.
Blueprint information
- Status: Complete
- Approver: None
- Priority: Undefined
- Drafter: Yahoo Openstackers!
- Direction: Needs approval
- Assignee: None
- Definition: Obsolete
- Series goal: None
- Implementation: Unknown
- Milestone target: None
- Started by:
- Completed by: John Garbutt
Whiteboard
-- Jeff
The goal is to work toward a central point of state management that is responsible for the various phases of orchestration, in order to avoid race conditions and unclear state from unresponsive daemons, and to reduce the complexity of nova-compute. Getting to a point where an orchestrator tracks the end-to-end state of a request and has sole responsibility for delegating actions to the various other daemons via RPC will also make it easier to isolate db/resource allocations in a place where they can be more easily maintained and improved over time.
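To make the idea of a single owner of request state concrete, here is a minimal sketch of a tracker that only allows known transitions. The state names, the TRANSITIONS table and the RequestTracker class are all hypothetical; nothing here is existing nova code, and the blueprint does not name concrete states.

```python
from enum import Enum


class RequestState(Enum):
    # Hypothetical phases of one run_instances request.
    VALIDATING = "validating"
    SCHEDULING = "scheduling"
    NETWORKING = "networking"
    ATTACHING_VOLUMES = "attaching_volumes"
    BUILDING = "building"
    ACTIVE = "active"
    FAILED = "failed"


# Each state maps to the set of states it is allowed to move to.
TRANSITIONS = {
    RequestState.VALIDATING: {RequestState.SCHEDULING, RequestState.FAILED},
    RequestState.SCHEDULING: {RequestState.NETWORKING, RequestState.FAILED},
    RequestState.NETWORKING: {RequestState.ATTACHING_VOLUMES, RequestState.FAILED},
    RequestState.ATTACHING_VOLUMES: {RequestState.BUILDING, RequestState.FAILED},
    RequestState.BUILDING: {RequestState.ACTIVE, RequestState.FAILED},
    RequestState.ACTIVE: set(),
    RequestState.FAILED: set(),
}


class RequestTracker:
    """Holds the end-to-end state of one request (illustrative only)."""

    def __init__(self, request_id):
        self.request_id = request_id
        self.state = RequestState.VALIDATING
        self.history = [self.state]

    def advance(self, new_state):
        # Reject any transition that is not listed in the table above.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                "illegal transition %s -> %s" % (self.state.name, new_state.name)
            )
        self.state = new_state
        self.history.append(new_state)
```

Because every transition goes through advance(), a request can never jump from, say, VALIDATING straight to ACTIVE, and the history list gives one place to look when debugging.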
-- Vish
Not sure how this is different from the extensive analysis of transitions that Yun did [1]. Is this just to add rollback capabilities?
[1] https:/
-- Josh
So, adjusting this blueprint. I think we just need to start with something like the following to get this going.
1. Perform validation of input queries in a single place (it's currently distributed all over) and start to form a
model of the incoming request, which is passed around to the different components as needed
and which is initially formed from this validation/
2. Send the validated input to a component, i.e. 'nova-orc', instead of sending it to the scheduler
3. Have 'nova-orc' perform the following (to start)...
a. Initiate requests with the scheduling component to determine where the instances will be placed and
reserve those instances; let's call them 'pseudo-vms' at this stage.
b. Initiate requests with the network component to determine what those 'pseudo-vms' networking
will be and reserve those networks (this allows quantum or others to begin their background
processes)
c. Perform the same thing as b. with the volume management layer (i.e. cinder)
d. Create a fully defined 'real-vm' specification for the given 'pseudo-vms' and call the
nova-compute entity where the 'real-vm' should be, to establish that vm as a 'real-vm'.
* This eliminates the need for nova-compute to ask what its networks should be and what its volumes
should be, and allows it to just establish what the 'nova-orc' component has specified to create
(the fully defined vm spec).
* This allows nova-compute to be 'dumb' (and also allows for further, complete disconnection from
the database).
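The steps above could look roughly like the following sketch. Everything in it is an assumption made for illustration: VmSpec, validate, the Fake* components and their reserve/build methods are hypothetical stand-ins, not nova, quantum or cinder APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VmSpec:
    # Hypothetical request model; starts as a 'pseudo-vm' and is filled in step by step.
    name: str
    host: Optional[str] = None
    networks: List[str] = field(default_factory=list)
    volumes: List[str] = field(default_factory=list)


class FakeScheduler:
    def reserve(self, name):
        return "host-1"  # pretend placement decision


class FakeNetwork:
    def reserve(self, name, host):
        return ["net-a"]  # pretend network reservation


class FakeVolumes:
    def reserve(self, name, host):
        return ["vol-a"]  # pretend volume reservation


class FakeCompute:
    def __init__(self):
        self.built = []

    def build(self, spec):
        # A 'dumb' compute node: it only builds what it is handed.
        self.built.append(spec)


def validate(request):
    # Step 1: validate in one place and form the request model.
    name = request.get("name")
    if not name:
        raise ValueError("request needs a name")
    return VmSpec(name=name)


def run_instances(request, scheduler, network, volumes, compute):
    spec = validate(request)
    spec.host = scheduler.reserve(spec.name)                # step 3a
    spec.networks = network.reserve(spec.name, spec.host)   # step 3b
    spec.volumes = volumes.reserve(spec.name, spec.host)    # step 3c
    compute.build(spec)                                     # step 3d
    return spec
```

The point of the shape is that compute.build() receives a complete spec and never has to ask any other component for networks or volumes.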
The benefits of this:
1. One component can deal with all the state transitions and recovery of those states instead of
having X components do it in Y different locations in Z different, 'most likely' incorrect and differing
ways.
2. Enables the path where the scheduler can begin to make combined decisions about all
resources that a 'pseudo-vm' will use and allows that scheduling entity to make the 'best'
decision about where that set of resources should be.
3. Begins to rein in the state-transition madness that makes nova incredibly hard to debug and causes
inconsistencies in many different parts of nova (e.g., exception and fault handling...)
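As an illustration of benefit 1, recovery handled in one place might be sketched as below: each step carries its own undo action, and a failure unwinds the completed steps in reverse order. The run_with_rollback helper and the (name, do, undo) step shape are hypothetical and only meant to show the idea.

```python
def run_with_rollback(steps):
    """Run (name, do, undo) steps in order.

    If a step raises, undo the already completed steps in reverse order.
    Returns (succeeded, log), where log is a list of human-readable entries.
    """
    completed = []
    log = []
    for name, do, undo in steps:
        try:
            do()
        except Exception as exc:
            log.append("%s: failed (%s)" % (name, exc))
            # Unwind everything that already succeeded, newest first.
            for done_name, done_undo in reversed(completed):
                done_undo()
                log.append("%s: rolled back" % done_name)
            return False, log
        completed.append((name, undo))
        log.append("%s: done" % name)
    return True, log
```

With this shape, fault handling lives in one loop in the orchestrator rather than being repeated in every component.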
My plan of attack:
1. Begin to refactor the 'run_instances' method to do all of the above
2. Begin to apply the above to later state transitions, i.e. those listed in the google doc above, as this
becomes 'accepted' and 'proven' to be a good way to proceed.
3. Profit!
Links:
1. http://
2. http://
Gerrit topic: https:/
Addressed by: https:/
Begin adding a simple orchestration layer.
Addressed by: https:/
Begin adding a simple orchestration layer.
This blueprint is not complete after a good year or so, marking as Obsolete to tidy up the Nova backlog. --johnthetubaguy (20th April 2014)