Swift proxy-side read caching

Registered by Ondergetekende

The object servers impose relatively high overhead for small reads. Caching at the proxy node can alleviate this load, provided the proxies are provisioned with a large amount of memory.

Intended workloads:
Large quantities of small reads with few writes, e.g. CDN

Design:
A memcache server is used to do the actual caching. For each Swift object, one cache object is stored, composed of the cache time, an array of headers, and the actual object payload.
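The cache-entry layout above could be sketched as follows. The helper names and the use of pickle for serialization are illustrative assumptions, not part of the design:

```python
import pickle
import time

# Hypothetical serialization of one cache entry:
# (cache time, response headers, object payload), as described above.

def make_cache_entry(headers, body):
    """Pack one Swift object into a single memcache value."""
    return pickle.dumps((time.time(), dict(headers), body))

def parse_cache_entry(blob):
    """Unpack a memcache value back into (cached_at, headers, body)."""
    cached_at, headers, body = pickle.loads(blob)
    return cached_at, headers, body
```

The timestamp lets the filter later apply freshness rules (e.g. Cache-Control) without a second memcache lookup.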

A WSGI filter sits before the proxy server, which handles the caching.
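In a proxy-server.conf pipeline this placement could look like the fragment below (the filter name proxy_cache, the egg entry point, and the option names are assumptions for illustration; the filter sits after auth, as noted under "Further points of interest"):

```
[pipeline:main]
pipeline = catch_errors healthcheck cache tempauth proxy_cache proxy-server

[filter:proxy_cache]
use = egg:swift#proxy_cache
memcache_servers = 127.0.0.1:11211
max_object_size = 1048576
```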

The WSGI filter adds 'If-None-Match' and 'If-Modified-Since' HTTP headers if:
- The original request didn't specify these.
- The object was found in cache.
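The request-side logic above could be sketched like this. The function name and the cached-header fields are assumptions; the header semantics follow RFC 7232:

```python
# Add conditional validators to the backend request only when the client
# did not send its own and a cached copy of the object exists.

def add_validators(req_headers, cached_headers):
    """Mutate req_headers in place, copying If-None-Match / If-Modified-Since
    from the cached object's ETag / Last-Modified."""
    if 'If-None-Match' in req_headers or 'If-Modified-Since' in req_headers:
        return req_headers  # client supplied its own validators; leave as-is
    if 'ETag' in cached_headers:
        req_headers['If-None-Match'] = cached_headers['ETag']
    if 'Last-Modified' in cached_headers:
        req_headers['If-Modified-Since'] = cached_headers['Last-Modified']
    return req_headers
```

If the object server still holds the same version, it can then answer 304 and skip sending the body.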

For GET and HEAD requests: If the response is 304 Not Modified, it is replaced with the cached object. If the response status is 200 and the object is deemed cacheable, it is added to the cache. For any other status less than 500, the cache entry is invalidated.
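These rules could be captured in a single decision function. This is a sketch under the stated rules; the action names are assumptions:

```python
# Decide what to do with a GET/HEAD backend response, per the rules above:
#   304                -> serve the cached copy
#   200 and cacheable  -> store in cache
#   other status < 500 -> invalidate the cache entry
#   5xx                -> pass through, keep cache untouched

def handle_read_response(status, have_cached, cacheable):
    if status == 304 and have_cached:
        return 'serve_cached'
    if status == 200:
        return 'store' if cacheable else 'pass'
    if status < 500:
        return 'invalidate'
    return 'pass'
```

Keeping the cache untouched on 5xx avoids evicting good data during a transient object-server failure.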

For other requests (PUT, POST, DELETE, etc.): If the response code is less than 500, the cache entry is invalidated.

An object is considered cacheable if:
- Its size does not exceed a configured maximum
- The request does not contain a Range header
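The cacheability check above is simple enough to state directly. The function and parameter names are assumptions:

```python
# An object is cacheable iff it fits under the configured maximum size
# and the request is not a Range request (partial content must not be
# cached as if it were the whole object).

def is_cacheable(req_headers, content_length, max_size):
    if 'Range' in req_headers:
        return False
    return content_length is not None and content_length <= max_size
```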

Further points of interest
- We'll have to handle Range transfers correctly
- Header changes are not reflected in the etag, so we might serve stale headers
- Documentation should explain where the cache should be in the chain (e.g. after auth)

Ideas for further improvement:
- Allow storage of objects larger than the maximum memcache size.
- Allow write-through instead of write-around (i.e. cache PUT operations).
- Allow write back. (probably a bad idea)
- Leverage various Cache-Control flags to avoid contacting the object servers for cached objects.
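The last idea could build on the cache time stored with each entry. A minimal sketch of a max-age freshness check, assuming the entry layout described under Design (the helper name is hypothetical):

```python
import time

# If the cached response carried Cache-Control: max-age=N and the entry is
# still within that window, serve it without revalidating against the
# object servers at all.

def fresh_enough(cached_at, cache_control, now=None):
    now = time.time() if now is None else now
    for directive in cache_control.split(','):
        directive = directive.strip()
        if directive.startswith('max-age='):
            try:
                max_age = int(directive.split('=', 1)[1])
            except ValueError:
                return False
            return (now - cached_at) <= max_age
    return False  # no max-age: fall back to conditional revalidation
```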

Blueprint information

Status:
Not started
Approver:
John Dickinson
Priority:
Undefined
Drafter:
Ondergetekende
Direction:
Needs approval
Assignee:
Ondergetekende
Definition:
New
Series goal:
None
Implementation:
Not started
Milestone target:
None

Related branches

Sprints

Whiteboard

This is a very good idea. I suggest we could also cache objects (user-marked or hot) in the object node's own memory. <email address hidden>

What advantages does this design offer over using a dedicated cache in front of the proxy server (something like Varnish)?
- Guaranteed cache validity and correct auth-token handling {ondergetekende}


Work Items
