Object Replicator 2

Registered by gholt on 2013-02-01

Goal: Improve the speed and reliability of the object replicator.

Blueprint information

Status: Complete
Approver: None
Priority: Medium
Drafter: gholt
Direction: Needs approval
Assignee: gholt
Definition: New
Series goal: Accepted for icehouse
Implementation: Implemented
Milestone target: 1.11.0
Started by: gholt on 2013-02-01
Completed by: John Dickinson on 2013-12-03

Whiteboard

As a cluster grows into the tens of billions of objects, the object replicator starts to show shortcomings. Multi-hour cycles stretch into days, and over-replication during capacity additions becomes more noticeable. It works and has gotten us this far, but it needs improvement to keep growing.

Details are still to be determined, of course, but I'll describe the idea we've come up with so far.

The working plan is to replace hashes.pkl with a SQLite database of all objects tracked under a ring-partition. This index.db would be kept up to date at all times, requiring a file-system walk only if the database somehow becomes corrupted. Replication is then a matter of comparing the local and remote index.dbs to discover which files need to be transferred.
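A minimal sketch of the index.db idea, using Python's standard sqlite3 module. The table layout, column names, and the helpers open_index, record_write, and newer_locally are assumptions made for illustration; the blueprint does not define a schema. The sketch also assumes timestamps are fixed-width strings, so that string comparison matches chronological order.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a hypothetical per-partition index.db."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        " name_hash TEXT PRIMARY KEY,"
        " timestamp TEXT NOT NULL,"
        " deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record_write(conn, name_hash, timestamp, deleted=False):
    """Record a PUT or DELETE, keeping only the newest entry per object."""
    row = conn.execute(
        "SELECT timestamp FROM objects WHERE name_hash = ?", (name_hash,)
    ).fetchone()
    if row is None or row[0] < timestamp:
        conn.execute(
            "INSERT OR REPLACE INTO objects (name_hash, timestamp, deleted)"
            " VALUES (?, ?, ?)",
            (name_hash, timestamp, int(deleted)),
        )
        conn.commit()


def newer_locally(local, remote):
    """Return name hashes that are missing or older in the remote index."""
    remote_rows = dict(remote.execute("SELECT name_hash, timestamp FROM objects"))
    query = "SELECT name_hash, timestamp FROM objects"
    return sorted(
        name_hash
        for name_hash, timestamp in local.execute(query)
        if remote_rows.get(name_hash, "") < timestamp
    )
```

With something like this, the replicator would only need to transfer the objects returned by newer_locally rather than rehashing directories on disk.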

Upgrading a cluster would likely happen in steps: activate the code keeping the index.db up to date, activate the code to rebuild all index.dbs, then activate the full new object-replicator.
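The staged upgrade could be modeled as an ordered set of rollout stages, each enabling one more piece of behavior. This is only a sketch: the Stage names and the feature keys below are hypothetical and are not actual Swift configuration options.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical rollout stages, in the order they would be activated."""
    LEGACY = 0
    MAINTAIN_INDEX = 1
    REBUILD_INDEX = 2
    NEW_REPLICATOR = 3


def enabled_features(stage):
    """Map a rollout stage to the behaviors that would be active in it."""
    return {
        "update_index_on_write": stage >= Stage.MAINTAIN_INDEX,
        "rebuild_indexes": stage >= Stage.REBUILD_INDEX,
        "index_based_replication": stage >= Stage.NEW_REPLICATOR,
    }
```

Because each stage includes everything from the previous one, a cluster could pause at any step while the index.dbs are verified.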

In order to have an index.db, all writes within a ring-partition need to update that db. While we could patch rsync to do so, it's probably best to move away from rsync at this time. Again, it has gotten us this far, but extensively modifying it with no team expertise doesn't seem like the best path.

Instead, the idea is to replace rsync with backend PUTs and DELETEs. The benefits are that things like keeping index.db up to date become easier and failure conditions can be handled at the object level instead of the ring-partition level. The downside is that rsync is well tuned for what it does, so PUTs and DELETEs are going to be less efficient. The gains from index.db should far outweigh the loss in transfer efficiency, though.
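To illustrate what object-level replication could look like, here is a rough sketch that turns two index listings into a list of backend requests. The function plan_sync and the (timestamp, deleted) tuple layout are assumptions for illustration, not the interface of the patch under review; timestamps are assumed to compare correctly as strings.

```python
def plan_sync(local_index, remote_index):
    """Decide which backend requests would bring the remote up to date.

    Both arguments map name_hash -> (timestamp, deleted); this layout is
    hypothetical. Returns a list of (method, name_hash, timestamp) tuples.
    """
    actions = []
    for name_hash, (timestamp, deleted) in sorted(local_index.items()):
        remote = remote_index.get(name_hash)
        if remote is not None and remote[0] >= timestamp:
            continue  # remote already has this version or a newer one
        method = "DELETE" if deleted else "PUT"
        actions.append((method, name_hash, timestamp))
    return actions
```

Since each action covers a single object, a failed request can be retried or skipped on its own without redoing the whole ring-partition.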

This kind of change is going to take a while to implement and will require a lot of careful testing.

The first part of this, replacing rsync calls, is under review at: https://review.openstack.org/#/c/44115/

(Marked as completed now that the above patch has merged. Subsequent features, as described above, will be tracked in other blueprints.)
