Robust range handling in HTTP GET requests
To minimize HTTP transfers, bzr issues ranged requests: instead of downloading whole files, it requests only parts of them by issuing GET requests with a header specifying the ranges it is interested in.
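As an illustration, a multi-range request carries a `Range` header listing inclusive byte offsets. The helper below is a minimal sketch; the function name `range_header` is hypothetical and is not taken from bzr.

```python
def range_header(ranges):
    """Build a Range header value from inclusive (start, end) byte offsets."""
    # Sort the ranges so the header lists them in file order.
    parts = ["%d-%d" % (start, end) for start, end in sorted(ranges)]
    return "bytes=" + ",".join(parts)


# Requesting bytes 0..99 and 200..299 of a file in one GET request.
header_value = range_header([(200, 299), (0, 99)])
```

The resulting value would be sent as `Range: bytes=0-99,200-299`.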
Some HTTP servers and proxies do not implement this feature, or implement it badly (see bugs #62029 and #62276).
Bzr can be made more robust by implementing the following scheme:
- initially, the HTTP transport will try to issue multi-range requests,
- when the transport detects that a multi-range GET request is returning bogus results, it will issue a new request with a single range. That single range starts at the start of the first range (with ranges sorted) and ends at the end of the last range (i.e. a single range enclosing all the requested ranges),
- when issuing a single-range request, if bogus results are detected, the transport will issue a GET request for the whole file and process the ranges locally.
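The three steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the bzr implementation: `BogusRangeResponse`, `read_ranges`, and the `fetch(mode, ranges)` callable are hypothetical names, and `fake_fetch` simulates a server that only honours whole-file GETs.

```python
class BogusRangeResponse(Exception):
    """Hypothetical error raised when a ranged response is unusable."""


def enclosing_range(ranges):
    """Return the single inclusive (start, end) range covering all ranges."""
    ordered = sorted(ranges)
    # max() also copes with overlapping ranges, where the last range
    # after sorting does not necessarily have the largest end offset.
    return ordered[0][0], max(end for _, end in ordered)


def slice_ranges(data, base, ranges):
    """Cut each inclusive (start, end) range out of data, which starts at base."""
    return [data[start - base:end - base + 1] for start, end in ranges]


def read_ranges(fetch, ranges, mode="multi"):
    """Try multi-range, then single-range, then whole-file requests.

    Assumption: fetch("multi", ranges) returns one chunk per range, while
    fetch("single", [range]) and fetch("whole", None) return raw bytes.
    Returns (chunks, mode) so the caller can remember the mode that worked.
    """
    if mode == "multi":
        try:
            return fetch("multi", ranges), "multi"
        except BogusRangeResponse:
            mode = "single"
    if mode == "single":
        start, end = enclosing_range(ranges)
        try:
            data = fetch("single", [(start, end)])
            return slice_ranges(data, start, ranges), "single"
        except BogusRangeResponse:
            mode = "whole"
    # Last resort: download everything and extract the ranges locally.
    data = fetch("whole", None)
    return slice_ranges(data, 0, ranges), "whole"


CONTENT = bytes(range(100))


def fake_fetch(mode, ranges):
    """Simulated server that rejects every ranged request."""
    if mode != "whole":
        raise BogusRangeResponse(mode)
    return CONTENT


chunks, working_mode = read_ranges(fake_fetch, [(10, 12), (2, 3)])
```

Here `working_mode` ends up as `"whole"`, and `chunks` holds the two requested slices in the order they were asked for.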
These two fallback steps will be persistent: once a transport has established that a server does not correctly support multi-range or single-range requests, it will never issue that kind of request to that server again.
If the transport is cloned (for connection sharing, for example), it will pass that information on to the cloned transport.
If the connection should be closed to handle an error and then opened again against the same server, that information should be preserved too.
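One way to get this persistence is to store the learned capability on the transport object rather than on the connection, and to copy it when cloning. The class below is a minimal sketch under that assumption; `RangeAwareTransport`, `range_mode`, `degrade`, and `clone` are illustrative names, not bzr's actual API.

```python
class RangeAwareTransport:
    """Hypothetical transport that remembers which range requests work."""

    # Ordered from most to least capable request style.
    _LEVELS = ("multi", "single", "whole")

    def __init__(self, base_url, range_mode="multi"):
        self.base_url = base_url
        # Kept on the transport itself, so closing and reopening the
        # underlying connection does not reset it.
        self.range_mode = range_mode

    def degrade(self):
        """Step down one level; return False if already at whole-file GETs."""
        index = self._LEVELS.index(self.range_mode)
        if index + 1 == len(self._LEVELS):
            return False
        self.range_mode = self._LEVELS[index + 1]
        return True

    def clone(self, relative_path):
        """Create a transport for a sub-path that inherits the learned mode."""
        url = self.base_url.rstrip("/") + "/" + relative_path
        return RangeAwareTransport(url, self.range_mode)


transport = RangeAwareTransport("http://example.com/repo")
transport.degrade()                 # multi-range requests proved unreliable
branch = transport.clone("branch")  # the clone starts in single-range mode
```

In this sketch the information only flows from the original to the clone at cloning time; a later `degrade()` on the clone does not change the original.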
An implementation is available in bzr.urllib.
Blueprint information
- Status: Complete
- Approver: John A Meinel
- Priority: High
- Drafter: Vincent Ladeuil
- Direction: Needs approval
- Assignee: Vincent Ladeuil
- Definition: Approved
- Series goal: None
- Implementation: Implemented
- Milestone target: 0.13
- Started by: Vincent Ladeuil
- Completed by: Vincent Ladeuil
Related branches
Related bugs
- Bug #62029: bzr + cherokee (Fix Released)
- Bug #62276: invalid range access during branch over http (Fix Released)