Fix reschedule up-calls

Registered by Matt Riedemann on 2018-08-03

This is really about fixing two bugs for cells v2:

https://bugs.launchpad.net/nova/+bug/1781286

https://bugs.launchpad.net/nova/+bug/1781300

Cold migration / resize can reschedule, which kicks the request back to the migrate_server method in conductor. There are two "up-call" issues with this:

1. The @targets_cell decorator looks up the instance mapping in the API DB for the instance. If the cell conductor isn't configured for the API DB, that lookup will fail with CantStartEngineError.

2. The migrate task looks up the AZ for the alternate host to set on the instance, but that requires hitting the aggregates table in the API DB, which fails the same way as #1 if the API DB isn't configured. Note that this problem also exists for reschedules during server create.

Solutions:

1. Add a resize_reschedule method which doesn't use the @targets_cell decorator but otherwise is the same as the migrate_server call in conductor. The API would call migrate_server and the compute would call resize_reschedule.

2. When we're in the super-conductor, before casting to compute to build (or resize) the server, we already look up the AZ for the selected primary host and put that on the instance. We could also look up the AZ for the alternate hosts and put it on the Selection objects that get passed down to compute. Then we don't need the up-calls to get the AZ, since it would already be on the alternate host objects. Granted, there is a race window where the host's AZ could change between scheduling and reschedule, but that should be rare.
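A minimal Python sketch of how the two proposed solutions could fit together on a reschedule, so that the cell conductor never needs the API DB. Everything here is hypothetical: `CellConductorSketch`, `annotate_alternates`, the `availability_zone` field on the simplified `Selection` class, and the `host_to_az` dict standing in for the aggregates lookup are illustrative names, not nova's actual code or RPC API. The `"nova"` fallback is an assumed default AZ name.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class CantStartEngineError(Exception):
    """Stand-in for the error raised when the API DB is not configured."""


@dataclass
class Selection:
    """Simplified Selection; availability_zone is the proposed new field."""
    service_host: str
    availability_zone: Optional[str] = None


def annotate_alternates(alternates: List[Selection],
                        host_to_az: Dict[str, str],
                        default_az: str = "nova") -> None:
    """Solution 2, super-conductor side: host_to_az stands in for the
    aggregates lookup that only the super-conductor can do."""
    for sel in alternates:
        sel.availability_zone = host_to_az.get(sel.service_host, default_az)


class CellConductorSketch:
    """Hypothetical cell conductor that has no API DB configured."""

    def _instance_mapping_upcall(self, instance_uuid: str) -> str:
        # Any API DB access fails in a cell conductor.
        raise CantStartEngineError("API DB is not configured")

    def migrate_server(self, instance: dict,
                       selections: List[Selection]) -> Selection:
        # API-facing entry point: the cell-targeting decorator's instance
        # mapping lookup is modelled here as an explicit up-call.
        self._instance_mapping_upcall(instance["uuid"])
        return self._move(instance, selections)

    def resize_reschedule(self, instance: dict,
                          alternates: List[Selection]) -> Selection:
        # Solution 1, compute-facing entry point: no mapping lookup,
        # because the cell conductor is already in the instance's cell.
        return self._move(instance, alternates)

    def _move(self, instance: dict, selections: List[Selection]) -> Selection:
        if not selections:
            raise RuntimeError("no alternate hosts left")
        chosen = selections.pop(0)
        # Solution 2, cell side: use the AZ carried on the Selection rather
        # than querying the aggregates table. It may be stale if the host
        # changed AZ after scheduling; the proposal accepts that race.
        instance["host"] = chosen.service_host
        instance["availability_zone"] = chosen.availability_zone
        return chosen


alternates = [Selection("host2"), Selection("host3")]
annotate_alternates(alternates, {"host2": "az1"})
instance = {"uuid": "fake-uuid"}
CellConductorSketch().resize_reschedule(instance, alternates)
print(instance["availability_zone"])  # az1
```

Calling `migrate_server` on the same cell conductor would raise `CantStartEngineError`, which is the failure mode described in problem #1; splitting the compute-facing path into its own method is what avoids it.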

--

This is tracked with a blueprint since both solutions above require RPC API version changes, which we can't backport, and both are non-trivial changes that will require more extensive testing.

Blueprint information

Status: Complete
Approver: melanie witt
Priority: Low
Drafter: Matt Riedemann
Direction: Approved
Assignee: Matt Riedemann
Definition: Obsolete
Series goal: None
Implementation: Unknown
Milestone target: None
Completed by: Matt Riedemann on 2019-01-14

Whiteboard

Discussed at the PTG: for #1, in the @targets_cell decorator we can just check whether the API database is configured and, if not, assume we're already in the cell and not attempt to look up the instance mapping. If the API is calling the conductor method and the API DB isn't configured, we would have already failed, because the API would have failed to look up the instance mapping (assuming nova-api and nova-conductor in this case use the same config). -- mriedem 20180913
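A minimal sketch of the decorator change agreed at the PTG. It assumes a hypothetical `FakeConf.api_connection` attribute that is empty when the API DB isn't configured (standing in for the `[api_database]` connection setting); `ConductorSketch`, `instance_mappings`, and the dict used as a request context are illustrative stand-ins, not nova's real `targets_cell` decorator or `RequestContext`.

```python
import functools
from typing import Dict, Optional


class FakeConf:
    """Hypothetical config: api_connection is None when the API DB is not
    configured, as in a cell conductor."""

    def __init__(self, api_connection: Optional[str] = None):
        self.api_connection = api_connection


def targets_cell(fn):
    """Only look up the instance mapping when the API DB is configured;
    otherwise assume we are already running in the instance's cell."""
    @functools.wraps(fn)
    def wrapper(self, context: dict, instance_uuid: str, *args, **kwargs):
        if self.conf.api_connection:
            # Super-conductor: resolve the instance's cell and target it.
            context["cell"] = self.instance_mappings[instance_uuid]
        # Cell conductor: no up-call; use the context as-is.
        return fn(self, context, instance_uuid, *args, **kwargs)
    return wrapper


class ConductorSketch:
    def __init__(self, conf: FakeConf,
                 instance_mappings: Optional[Dict[str, str]] = None):
        self.conf = conf
        # Stand-in for the instance_mappings table in the API DB.
        self.instance_mappings = instance_mappings or {}

    @targets_cell
    def migrate_server(self, context: dict, instance_uuid: str) -> str:
        # Report which cell the operation would run against.
        return context.get("cell", "current cell")


super_conductor = ConductorSketch(FakeConf("sqlite://"), {"fake-uuid": "cell1"})
cell_conductor = ConductorSketch(FakeConf())
print(super_conductor.migrate_server({}, "fake-uuid"))  # cell1
print(cell_conductor.migrate_server({}, "fake-uuid"))  # current cell
```

The same method now works from both callers, which is why this approach needs no new RPC method for #1: the API path still targets the cell, and the reschedule path in the cell conductor simply skips the lookup.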

At the nova meeting nobody was against this bp, which actually fixes non-trivial bugs. However, only stephenfin and gibi were present from the core team, so I haven't approved the bp yet. -- gibi 20190110

As discussed in #openstack-nova today after the nova meeting, dansmith is also OK with approving this, the only question being whether mriedem still wants to work on it, given he's also working on cross-cell resize. I'm going to approve it for now and will ask mriedem if he still wants to work on it next week when he's back from PTO. If needed, I'll defer it back out from Stein. -- melwitt 20190110

I'm not sure we still need the blueprint. It was originally more for tracking, but at the last PTG we said both issues are bugs and could be fixed as such. I've already got a patch up for bug 1781300 as well (https://review.openstack.org/#/c/581912/), so we could probably just drop this blueprint since it's just a couple of bugs. -- mriedem 20190114


Work Items
