nova-api-quantum-create-port

Registered by Aaron Rosen on 2013-05-15

In this blueprint I'd like to move Quantum port creation from nova-compute to nova-api. There are two reasons for this:

1) If a user boots two instances but has a port quota of one, both VMs are scheduled and land on a nova-compute node. The nova-compute node then tries to create the ports in Quantum and fails because of the quota. It would be better to fail in nova-api, before the instances are scheduled.

Related to --- https://bugs.launchpad.net/nova/+bug/1172808

2) Currently, when booting an instance with security_group_api=quantum, nova-api is hardcoded to return "default" as the security group. If we created the ports up front in nova-api, Quantum could conditionally apply security groups to the ports, and nova-api could return the correct response (with the correct security group) to the user who made the API call to launch the instance.

Would fix -- https://bugs.launchpad.net/nova/+bug/1175464
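As a rough illustration of reason 1, the check below shows what failing fast on port quota before scheduling could look like. It is only a sketch under stated assumptions: `check_port_quota`, `PortQuotaExceeded`, and the one-port-per-instance default are hypothetical names chosen for this example, not existing Nova or Quantum APIs, and the quota figures would in practice come from Quantum.

```python
class PortQuotaExceeded(Exception):
    """Hypothetical error raised before any instance is scheduled."""


def check_port_quota(ports_in_use, port_quota, requested_instances,
                     ports_per_instance=1):
    """Return the number of ports needed, or raise if the quota is too small.

    All arguments are plain integers supplied by the caller; this sketch
    does not talk to Quantum itself.
    """
    needed = requested_instances * ports_per_instance
    available = port_quota - ports_in_use
    if needed > available:
        raise PortQuotaExceeded(
            "need %d port(s) but only %d available" % (needed, available))
    return needed


if __name__ == "__main__":
    # The scenario from reason 1: two instances, port quota of one.
    try:
        check_port_quota(ports_in_use=0, port_quota=1, requested_instances=2)
    except PortQuotaExceeded as exc:
        print("rejected in the API layer:", exc)
```

With a check like this the request is rejected while the user is still waiting on the API call, rather than after both instances have reached a compute node.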

I haven't completely figured out how ports should be cleaned up for failed instances, but it seems to me that this can be done on the nova-compute side.

The only downside I see to moving this logic into nova-api is that it would slow down nova-api's response to requests to provision instances.

Blueprint information

Status:
Not started
Approver:
None
Priority:
Undefined
Drafter:
Aaron Rosen
Direction:
Needs approval
Assignee:
Aaron Rosen
Definition:
Drafting
Series goal:
None
Implementation:
Unknown
Milestone target:
None

Related branches

Sprints

Whiteboard

http://lists.openstack.org/pipermail/openstack-dev/2013-May/009088.html -- Aaron

The issue is not the location of the call.

The issue is one of transactionality: you want to create a Neutron port implicitly while Nova is booting a machine, and you want the Neutron and Nova calls to either all succeed or all fail. If you can't have transactionality the old-fashioned way with synchronous calls (and we can't), then you need eventual consistency: a task to clean up dead ports, and the understanding that such ports may still be kicking around from previous attempts.

We should create the port, *then* attempt the attach using update. The create can succeed independently, and any subsequent nova-compute attach will succeed on the previously created port rather than making a new one (possibly after verifying that the port's 'attached' status is a lie, which happens if the earlier attach call completed but didn't return).

So:
create fails to return but port is created
-> run on 2nd compute node won't attempt the create, port already exists; port consumed, everything good

create returns, attach fails to return but port is attached
-> run on 2nd compute node won't attempt the create and will identify that the attachment state is bogus and overwrite it; port consumed, everything good
-> if last attempt, a port with a bogus attach is left hanging around in the DB; a cleanup job has to go looking for it and remove it; optionally anything else can spot its inconsistency and ignore or remove it. There is a risk that the port is removed while scheduling is actually in progress, in which case the schedule pass will fail; an expiry time can be set on the port.

create succeeds, attach fails and we get to see that it's failed
-> clean up port

Moving the create may for other reasons be a good idea (because compute would *always* deal with ports and *never* with networks - a simpler API) - but it's nothing to do with solving this problem.
 -- ijw
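The create-then-attach flow described in the comment above can be sketched roughly as follows. This is an in-memory illustration only: `FakePortStore`, `ensure_port_attached`, and `cleanup_orphans` are hypothetical names invented for this example and do not correspond to the Neutron or Nova APIs. The sketch also leaves out the expiry time mentioned above.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Port:
    port_id: str
    instance_id: str                     # instance the port was created for
    attached_host: Optional[str] = None  # host recorded by the last attach


class FakePortStore:
    """In-memory stand-in for the port service (hypothetical)."""

    def __init__(self):
        self.ports: Dict[str, Port] = {}

    def find_for_instance(self, instance_id):
        for port in self.ports.values():
            if port.instance_id == instance_id:
                return port
        return None

    def create(self, instance_id):
        port = Port(port_id=str(uuid.uuid4()), instance_id=instance_id)
        self.ports[port.port_id] = port
        return port

    def attach(self, port_id, host):
        self.ports[port_id].attached_host = host

    def delete(self, port_id):
        self.ports.pop(port_id, None)


def ensure_port_attached(store, instance_id, host):
    """Create the port only if it is missing, then attach it to this host."""
    port = store.find_for_instance(instance_id)
    if port is None:
        # No earlier attempt left a port behind, so create one now.
        port = store.create(instance_id)
    try:
        # Overwrites any stale attachment recorded by an earlier attempt.
        store.attach(port.port_id, host)
    except Exception:
        # The attach visibly failed: clean up the port, then re-raise.
        store.delete(port.port_id)
        raise
    return port


def cleanup_orphans(store, live_instance_ids):
    """Delete ports whose instance is no longer live; return their ids."""
    orphaned = [p.port_id for p in store.ports.values()
                if p.instance_id not in live_instance_ids]
    for port_id in orphaned:
        store.delete(port_id)
    return orphaned
```

A retry on a second compute node calls `ensure_port_attached` again with the same instance id: it finds the existing port instead of creating a new one and replaces whatever attachment the earlier attempt recorded. Ports left behind by a final failed attempt are only removed when something like `cleanup_orphans` runs.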

Gerrit topic: https://review.openstack.org/#q,topic:bp/nova-api-quantum-create-port,n,z

Addressed by: https://review.openstack.org/60592
    Make network_cache more robust with neutron

Addressed by: https://review.openstack.org/60396
    Remove unneeded call to conductor in network interface

Addressed by: https://review.openstack.org/61871
    Remove unused variables in neutron api interface and neutron tests

Gerrit topic: https://review.openstack.org/#q,topic:bp/smarter-network-cache-update,n,z

Addressed by: https://review.openstack.org/64769
    Garbage collect neutron ports not in nw_cache

Addressed by: https://review.openstack.org/62104
    Correct network_model tests and __eq__ operator

Addressed by: https://review.openstack.org/73385
    Break out security group logic in allocate_for_instance

Addressed by: https://review.openstack.org/62108
    Update network_cache only if needed

Gerrit topic: https://review.openstack.org/#q,topic:master,n,z

Gerrit topic: https://review.openstack.org/#q,topic:bp/admin-event-callback-api,n,z

Marking this blueprint as definition: Drafting. If you are still working on this, please re-submit via nova-specs. If not, please mark as obsolete, and add a quick comment to describe why. --johnthetubaguy (20th April 2014)

Work Items
