Add rpc method concurrency control

Registered by Edward Hope-Morley

This blueprint proposes adding support to the amqp rpc driver for limiting the
number of concurrent operations for specific rpc methods.

This is intended as an optional extension to the existing ability to limit the
rpc thread pool (CONF.rpc_thread_pool_size) for all operations: it would allow
an admin to limit concurrent operations based on the type of operation being
executed, in order to mitigate the load/impact that certain operations have on
the node they run on, e.g. lower-resource nodes in a heterogeneous cluster.
Operations one might want to limit include migrations on a specific hypervisor
or volume formatting operations on a volume node.

Our current use cases for limiting specific actions are in cinder, nova and
glance, where we may want to mitigate the effect of certain requests on the
load of a set of nodes with limited resources.

Blueprint information

Status:
Complete
Approver:
None
Priority:
Undefined
Drafter:
Edward Hope-Morley
Direction:
Needs approval
Assignee:
Edward Hope-Morley
Definition:
Obsolete
Series goal:
None
Implementation:
Started
Milestone target:
None
Started by
Edward Hope-Morley
Completed by
Mark McLoughlin

Related branches

Sprints

Whiteboard

It is useful to limit the number of concurrently processed operations that cause high I/O load on the same hypervisor.

Operations that cause high I/O load on the same hypervisor are as follows.

a) live migration
b) block migration
c) snapshot
d) resize
e) boot a VM
f) delete a VM
g) attach ISO to VM from Glance
h) copy file from Glance to Volume
i) copy file from Volume to Glance

Gerrit topic: https://review.openstack.org/#q,topic:bp/rpc-concurrency-control,n,z

NOTE: this patchset has been deprecated (moved from oslo-incubator to oslo.messaging as requested by markmc) and replaced by the one referenced below.
Addressed by: https://review.openstack.org/56640
    Adds rpc method concurrency limiting support

---

I've moved the blueprint to oslo.messaging - new RPC features should go there.

I'd agree with the comment in the review that this must not be driver specific. We need this to work with all RPC drivers.

I'd also like to see a discussion on the mailing list about the requirements and the design. IMHO, we're giving the admin some ability to work around fundamental brokenness in our system ... and perhaps we should be discussing fixing the underlying issue instead.

Also, I think the configuration options are a bit weird - you can set a single concurrency limit that applies to a subset of methods, rather than a limit per method. Bear in mind too that there can be multiple RPC endpoints in the same process which have methods by the same name.

Rather than this kind of configuration option, I think I'd be inclined towards a separate "concurrency policy" json file which would allow you to associate a limit with each method e.g.

{
    "compute": {
        "baseapi": {
            "ping": 10
        }
    }
}

But that really starts to feel like overkill, so I want to be sure we really need this feature.

-- @markmc
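
One rough way such a policy file could be consumed (the file name and lookup
keys here are assumptions, not an agreed design) is to expand it into a
semaphore per (topic, namespace, method):

import json
import threading

def load_policy(path="rpc_concurrency_policy.json"):
    """Build {(topic, namespace, method): semaphore} from the policy file."""
    with open(path) as f:
        policy = json.load(f)
    limits = {}
    for topic, namespaces in policy.items():
        for namespace, methods in namespaces.items():
            for method, limit in methods.items():
                limits[(topic, namespace, method)] = \
                    threading.BoundedSemaphore(limit)
    return limits

Keying on the full (topic, namespace, method) tuple rather than the bare
method name would also address the concern above about multiple RPC endpoints
in the same process having methods with the same name.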

markmc: I've moved this across to oslo.messaging (see below). There do not appear to be driver-specific tests in oslo.messaging like there are in oslo-incubator, so I'm not sure where you want the unit tests.

With regard to your comment about underlying brokenness, could you elaborate on this? This new feature is really just a very simple extension to the existing ability to limit the size of the thread pool, allowing further limiting based on the type of operation being requested.

There are IMO some limitations to the existing design of the rpc pool, e.g. the fact that nothing appears to stop a single tenant from consuming the pool with requests, but that is not what this intends to solve and would be for a separate task/discussion.

--

Ok so I think I am increasingly in agreement that this extension could lead to undesirable results. For example, if the API received a load of requests that block and then crashed or got restarted, it is possible and indeed likely that bad things would happen: in the case of synchronous calls the AMQP ack may already have been sent back to the producer(s), indicating that the calls were successfully accepted, and you would then end up with resources getting stuck because the api would no longer have any knowledge of those requests having been sent.

This issue may well vary from service to service but is IMO a sufficient reason not to pursue this further in its current form.

-- @dosaboy

--

Addressed by: https://review.openstack.org/58759
    Adds rpc method concurrency limiting support


Work Items
