Large Single Uploads

Registered by gholt on 2011-03-18

Goal: Allow single uploads larger than 5G by splitting the upload automatically, as if the user had uploaded several 5G segments separately and then created a manifest object pointing to those segments (using the existing large file support).

Blueprint information

Status: Not started
Approver: None
Priority: Undefined
Drafter: None
Direction: Needs approval
Assignee: None
Definition: Approved
Series goal: None
Implementation: Unknown
Milestone target: None

Whiteboard

Here's an idea on how to implement this:

1) The proxy accepts the request and begins streaming data to the object servers (as normal).
2) When the auto-chunk size is reached, the proxy ends the stream to the storage node and sends additional data in HTTP footers to that storage node.
3) The proxy continues reading from the client and sends the data to a new object name.
4) Each subsequent run of auto-chunk size bytes causes the proxy to generate a new object name and send the next part of the stream there.
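
The splitting in the steps above can be sketched roughly as follows. This is only an illustration, not Swift code: the function name auto_chunk, the read_size parameter, and the naming scheme (first chunk under the original name, later chunks under name/00000001 and so on) are all assumptions made for the example. It also buffers each chunk in memory for simplicity, where a real proxy would stream the bytes through to the storage node.

```python
import io


def auto_chunk(stream, base_name, chunk_bytes, read_size=65536):
    """Yield (object_name, data) pairs, cutting the stream every chunk_bytes.

    Hypothetical naming: the first chunk keeps base_name (it would carry
    the X-Object-Manifest footer); later chunks get generated names.
    """
    index = 0
    while True:
        remaining = chunk_bytes
        parts = []
        # Read until this chunk is full or the client stream ends.
        while remaining > 0:
            data = stream.read(min(read_size, remaining))
            if not data:
                break
            parts.append(data)
            remaining -= len(data)
        if not parts:
            break  # nothing left to send
        name = base_name if index == 0 else "%s/%08d" % (base_name, index)
        # Sketch only: a real proxy would stream rather than join in memory.
        yield name, b"".join(parts)
        index += 1


if __name__ == "__main__":
    body = io.BytesIO(b"abcdefghij")
    for name, data in auto_chunk(body, "cont/obj", 4):
        print(name, len(data))
```

With a 10-byte body and a 4-byte auto-chunk size, the sketch produces three objects of 4, 4, and 2 bytes.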

The "additional data in HTTP footers" includes an X-Object-Manifest key/value. This means the large-object support must change to allow manifest files that are not zero bytes. Since the current implementation requires manifests to be zero bytes, no existing manifest would be read differently, so this should not cause any compatibility issues.

The auto-chunk size could be set by a header on the initial request (X-Auto-Chunk-Bytes). If this header is not present, the current semantics (fail after max object size bytes) apply. With this header, there would also be no limit on the Content-Length header. Question: What is set as the Content-Length on the manifest object in this case?
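
One possible reading of the header rule above, as a rough sketch. The function name plan_upload, the plain-dict headers, and the check that the chunk size itself must not exceed the maximum object size are assumptions for illustration; the 5G value stands in for whatever limit the cluster is configured with.

```python
# Assumed value for the 5G cluster limit discussed in this blueprint.
MAX_OBJECT_SIZE = 5 * 1024 ** 3


def plan_upload(headers, max_object_size=MAX_OBJECT_SIZE):
    """Return the auto-chunk size in bytes, or None for current semantics.

    Raises ValueError when the request should be rejected. headers is a
    plain dict here; real header lookup would be case-insensitive.
    """
    raw = headers.get("X-Auto-Chunk-Bytes")
    if raw is None:
        # No header: current behaviour, oversized uploads fail.
        length = headers.get("Content-Length")
        if length is not None and int(length) > max_object_size:
            raise ValueError("object exceeds max object size")
        return None
    chunk = int(raw)
    # Assumption: each chunk must itself be a legal object size.
    if not 0 < chunk <= max_object_size:
        raise ValueError("invalid X-Auto-Chunk-Bytes")
    # Header present: Content-Length is not limited.
    return chunk
```

The open question about the manifest object's Content-Length is not addressed by this sketch.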

>>>>>
It would be nice if the upload were separate from the storing of the file itself. That is, allow the file to be uploaded in any number of chunks of whatever size the client wants (even 5k chunks), but have the server put them all back together into *one* file, not separate ones. This can easily be done by supporting the HTTP Range headers. While the manifest file is an interesting feature, it assumes the client actually wants multiple files on the server instead of one. There will be clients who really do want one large file, so Swift needs to support both variants.

>>>>>
You can already do a COPY of a manifest/segmented file to convert to a single file, assuming the result doesn't exceed the maximum file size limit for the cluster, 5G. A client should not be able to exceed the maximum cluster file size limit; that limit is there for cluster balance reasons.
