Put glance cache under a different prefix and port

Registered by Flavio Percoco

The current implementation adds cache resources when the cache middleware is loaded [0].

The idea is to move cache-related paths under a separate prefix, "/cache", and serve them on a different host/port while still running in the same process as glance-api.

[0] https://github.com/openstack/glance/blob/master/glance/api/middleware/cache_manage.py#L36
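A minimal sketch of the proposal, assuming plain WSGI and the standard library only: one stand-in app for the image API and one for cache management, each served from its own listener in the same process. The names api_app, cache_app and serve_in_thread are hypothetical and are not taken from the Glance code base.

```python
import threading
from wsgiref.simple_server import WSGIRequestHandler, make_server

CACHE_PREFIX = "/cache"  # prefix named in the blueprint description


def api_app(environ, start_response):
    """Stand-in for the main image API; it exposes no cache paths."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"image api\n"]


def cache_app(environ, start_response):
    """Stand-in for cache management; only answers under CACHE_PREFIX."""
    path = environ.get("PATH_INFO", "")
    if path == CACHE_PREFIX or path.startswith(CACHE_PREFIX + "/"):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"cache management\n"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]


class QuietHandler(WSGIRequestHandler):
    """Request handler that suppresses per-request logging."""

    def log_message(self, format, *args):
        pass


def serve_in_thread(host, port, app):
    """Start a WSGI server on (host, port) in a daemon thread of this process."""
    server = make_server(host, port, app, handler_class=QuietHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling serve_in_thread twice, for example once with api_app on port 9292 and once with cache_app on port 9293 (both port numbers are illustrative), gives two listeners that share a single process, which is the shape the description asks for.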

Blueprint information

Status:
Not started
Approver:
None
Priority:
Undefined
Drafter:
None
Direction:
Needs approval
Assignee:
None
Definition:
Discussion
Series goal:
None
Implementation:
Unknown
Milestone target:
None

Related branches

Sprints

Whiteboard

We'll need to keep cache management available under v1 for compatibility, but that shouldn't block any progress here!
-markwash

<jbresnah>
I have some thoughts on this issue so I would like to flesh out the understanding of it that I have come to via a discussion with markwash.

A user can currently ask glance-api for information about cached images. A cached image is an image that has been pulled out of the backing store (swift, s3, etc.) and stored on something more local to the particular glance-api service (like its local file system).

The API can then be queried to determine which images are cached. The problem is that several glance-api servers can be fronted by a load balancer, so the information returned from a cache lookup will vary depending on which host the load balancer redirects the request to. Because of this, it is difficult to make use of the information returned by the cache service.

One solution, as pointed out here, is to run a cache service on each replicated host. No load balancer would be in front of those hosts; instead, the client would contact each host directly to determine caching information, and thus the results would not vary.
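To illustrate the difference, here is a small simulation under stated assumptions: the host names and cache contents are made up, and the load balancer is modelled as simple round-robin. It contrasts a query routed through the balancer with one that asks every host directly.

```python
from itertools import cycle

# Hypothetical per-host cache contents (illustration only).
HOST_CACHES = {
    "glance-api-1": {"img-a", "img-b"},
    "glance-api-2": {"img-c"},
}


def make_balanced_query(host_caches):
    """Return a function that answers each query from the next host, round-robin."""
    hosts = cycle(sorted(host_caches))

    def query():
        host = next(hosts)
        return host, sorted(host_caches[host])

    return query


def query_each_host(host_caches):
    """Ask every host directly, so the combined answer does not vary."""
    return {host: sorted(ids) for host, ids in sorted(host_caches.items())}
```

Two consecutive calls to the balanced query return different cache listings, while query_each_host returns the same per-host mapping every time.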

While this addresses the specific issue, I would like to explore another route. The blueprint here https://blueprints.launchpad.net/glance/+spec/multiple-image-locations allows for replicated locations of the image to be exposed to a client. In a real sense, the cached image is just another replicated location. Instead of adding a new service for operators to manage on each endpoint, I would like to explore the idea of treating cached images in this way.

For a client to know specifically where a file URL (which a cached image will most likely be) points, additional information beyond the URL will have to come with the image location. The blueprint here begins to address that work: https://blueprints.launchpad.net/glance/+spec/direct-url-meta-data
</jbresnah>

I added a blank etherpad as the specification, so we can flesh out details there.
-markwash

Hey,

We are on the same page, I guess. So, the idea is to have a way to keep images cached in something more local that can reduce access time. I don't really agree with the idea that the client could access the glance-cache directly without passing through the glance-api service.

I'm thinking of this like:

1) Client asks for an image
2) Glance API checks whether it's cached in any of the glance-cache nodes that might be running
3) If cached, glance API returns that location instead of the backing store one, otherwise, it just returns the remote location / image.
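The three steps above can be sketched roughly as follows. This is only an illustration: resolve_location, the node URLs and the dictionary layout are hypothetical and do not correspond to an existing Glance API.

```python
def resolve_location(image_id, backing_locations, cache_nodes):
    """Return a cached location for image_id if one exists, else the backing store one.

    backing_locations maps image IDs to backing store URLs.
    cache_nodes maps a cache node base URL to the set of image IDs it holds.
    """
    # Step 2: check each known cache node for the image.
    for node_url, cached_ids in cache_nodes.items():
        if image_id in cached_ids:
            # Step 3a: cached, so hand back the cache node's location.
            return f"{node_url}/{image_id}"
    # Step 3b: not cached anywhere, so fall back to the remote location.
    return backing_locations[image_id]
```

For example, with cache_nodes = {"http://cache-1:9293/cache": {"img-a"}} and a backing_locations entry for "img-b" only in swift, a request for "img-a" resolves to the cache node while "img-b" resolves to its swift URL.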

What I'm trying to say is that it would be nice to treat images cached in the cache service as if they were remote locations. That said, having a separate service leaves room for some extra improvements on the cache side, like the ones in the description.

Does this make sense?

This specific separate service was meant to be addressed by the blueprint that depends on this one.

-- flaper87

<jbresnah>
I am simply making a more general statement about how this could be approached. It seems that Glance is going in the direction of doing replica management such that it turns logical names (Image IDs) into physical locations (URLs). A cached URL should be a lesser included case of a proper replica management service, and in that way we could avoid special cases.

I do understand that there would likely still need to be an admin interface for clearing the cache and other functions.
</jbresnah>

Hi Flavio,

In general I think these ideas for better cache management are good. However, I'm tempted to say they are out of scope for glance proper. Really, better types of caching and efficient transfer of bulk data feel like they belong either in a protocol (like bittorrent) or in a separate bulk data transfer service. How do you feel about that answer? Let me know on IRC or by email because I might not notice updates to this whiteboard.

markwash rejected 2014-03-14

Work Items
