Keystone rate limiting

Registered by Rafael Durán Castañeda

As bug 963098 describes, Keystone doesn't act on consecutive failed logins; in fact, it doesn't provide any way of acting on suspicious user activity. To address that, migrating the “ratelimit” middleware from Nova would probably help a lot, and would fix bug 963098 (rate limiting POST /tokens).

Rate limit spec
---------------------
The behavior for rate limiting is described by RFC 6585 (section 4):
1) The status code should be 429.
2) A 'Retry-After: number' header should be included.
3) The body should include details about the error.
Further details in [1].
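
As a minimal sketch of the three points above, a rate-limited response could be built like this. The function name and the JSON body layout are assumptions made for illustration; they are not taken from the code draft.

```python
import json
import math


def rate_limit_response(retry_after_seconds, detail):
    """Build (status, headers, body) for a rate-limited request.

    Hypothetical helper: the body layout is an assumption, not a
    format defined by this blueprint.
    """
    body = json.dumps({
        "error": {
            "code": 429,
            "title": "Too Many Requests",
            "message": detail,
        }
    }).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        # Retry-After carries whole seconds, so round up.
        ("Retry-After", str(math.ceil(retry_after_seconds))),
    ]
    return "429 Too Many Requests", headers, body
```

The tuple has the shape a WSGI middleware would pass to start_response and return as the response body.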

User identification
---------------------------
As a general rule users can be identified by the given token, but some special cases must be considered too:
1) Requests that don't need authentication.
2) Authentication requests.
3) Admin users.

Requests that don't require authentication could be associated with a special user, “the None user”, with custom limits for that user. This way, under heavy load from unauthenticated requests we can limit them with a generic server load error, while authenticated requests and authentication requests would still work.

Admin users shouldn't be limited at all.

We must pay special attention to authentication requests, since this is the most important situation. In this case we don't have any information about the user other than the 'username' parameter, and even that might be missing (in that case a 401 is the current return code). If the 'username' is present, there are two options:
1) Map limits to 'username', and thus consider this a special case too, avoiding a 'get_user_by_name' request to the Identity API.
2) Get the user ID from the 'username' (this can be cached if we consider users never change ID).

The first option would have better performance (no 'get_user_by_name' needed), and would be easier to integrate with Horizon, since Horizon initially requests an unscoped token and then, after retrieving the tenant list, tries to get a scoped token. See “The Horizon use case” below for further details.
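
The identification rules in this section can be sketched as a single function that picks the key a request is counted under. This is only an illustration under stated assumptions: the function name, its parameters, and the key strings are hypothetical, and how the token is validated and the admin role detected is left out.

```python
ANONYMOUS_KEY = "anonymous"  # hypothetical key for "the None user"


def rate_limit_key(method, path, body=None, token_user_id=None, is_admin=False):
    """Return the key to count this request under, or None for no limit.

    token_user_id and is_admin are assumed to come from an earlier
    token validation step that is not shown here.
    """
    if is_admin:
        return None  # admin users shouldn't be limited at all
    if token_user_id is not None:
        return "user:" + token_user_id  # general rule: identify by token
    if method == "POST" and path.rstrip("/").endswith("/tokens"):
        # Authentication request: option 1, map the limit to 'username'.
        auth = (body or {}).get("auth") or {}
        username = (auth.get("passwordCredentials") or {}).get("username")
        if username:
            return "username:" + username
        # Assumption: a request without 'username' is counted under the
        # None user; it is answered with a 401 in any case.
        return ANONYMOUS_KEY
    return ANONYMOUS_KEY  # request that doesn't need authentication
```

With option 1 a password login is counted under "username:…" while a later token-based request is counted under "user:…", so the two never share a counter.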

The Horizon use case
--------------------------------
If we don't use 'username' and don't treat authentication as a special case (option 1 from the previous section), the Horizon use case can produce errors like this:
* Consider a 3 request limit on POST /v2.0/tokens.
* First try: the user enters a wrong password.
* Second try: wrong password again.
* Third try: the user enters the right password and gets an unscoped token.
* Horizon gets the tenant list.
* Horizon now tries to get a scoped token and receives a rate limit error (it is the 4th request if everything happens within a 60 second period), so the login fails with a Retry-After of about 20 seconds.
* The user waits 20 seconds and tries again, gets the unscoped token and the tenant list, and tries to get a new scoped token, receiving a new rate limit error unless the scoped token request is made more than 40 seconds after the first rate limit error.
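
The sequence above can be replayed with a small simulation. This is a sketch that assumes a sliding-window limiter in which rejected requests are not counted; it is not necessarily the algorithm used by the Nova middleware, so the exact retry times need not match the figures above. The class, the helper, and the timestamps are all hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `period` seconds for each key."""

    def __init__(self, limit, period):
        self.limit = limit
        self.period = period
        self.hits = {}  # key -> deque of accepted request times

    def check(self, key, now):
        """Return 0 if the request is allowed, else the seconds to wait."""
        window = self.hits.setdefault(key, deque())
        # Drop accepted requests that have left the window.
        while window and window[0] <= now - self.period:
            window.popleft()
        if len(window) < self.limit:
            window.append(now)
            return 0
        # Rejected requests are not recorded (an assumption of this sketch).
        return window[0] + self.period - now


def replay(limiter, events):
    """Feed (time, key) pairs to the limiter and collect the wait times."""
    return [limiter.check(key, now) for now, key in events]
```

Replaying two wrong passwords, the right password, and the scoped token request at 0, 10, 20 and 40 seconds under one shared key gives [0, 0, 0, 20]: the scoped token request is rejected with a 20 second wait. Counting the three password attempts under the 'username' and the scoped token request under the token's user gives [0, 0, 0, 0], which is why option 1 fits Horizon better.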

Backends
--------------
As with most Keystone code, the implementation should allow multiple backends.

Dynamic limits
----------------------
This blueprint doesn't consider dynamic limits, but some kind of “limit on demand” might be useful.

Code draft available on GitHub [2].

[1] http://tools.ietf.org/html/rfc6585
[2] https://github.com/rafaduran/keystone/tree/bp/Keystone-rate-limiting

Blueprint information

Status: Complete
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: Rafael Durán Castañeda
Definition: Obsolete
Series goal: None
Implementation: Beta Available
Milestone target: None
Started by: Rafael Durán Castañeda
Completed by: Morgan Fainberg

Whiteboard

(morganfainberg): I'm going to mark this as obsolete. It's a very old spec and we should revisit what "Ratelimiting" means to keystone.

Hello Rafael, Keystone Folsom will no longer allow unscoped tokens. If a tenantId/name is not provided in the POST /tokens call, the system will scope the token to the default tenant. If no default tenant is available, an error will be returned. This change in functionality means you do not need to give any special consideration to the Horizon use case.

As for whether to use username or userId, I would prefer userId as it is less likely to change. As you indicate, caching could be used to make the lookup of userId less of a burden.
