Automatic recovery from transient db connection failures
A variety of circumstances can cause a transient failure in database connections, for example a restart or upgrade of the database, migration of a VIP between the nodes of an HA pair, or simply a network failure. Nova (and every project that connects to a database) would benefit from the db/api layer catching these "db-has-gone-away" errors, automatically reconnecting, and retrying the last operation, so that the caller can continue whatever operation was in progress. A momentary interruption in database connectivity should not force long-running operations (such as nova boot or glance image-create) to abort.
A (slightly brute-force) patch was previously proposed: https:/
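The catch, reconnect, and retry behaviour described above can be sketched as a decorator around db.api calls. This is a minimal Python sketch, not the Oslo or Nova implementation: `DBConnectionLost`, `retry_on_disconnect`, and its parameters are hypothetical names. It assumes each call to the wrapped function checks out a fresh connection (as a SQLAlchemy connection pool typically does after invalidating a dropped one), so simply re-invoking the call is what performs the reconnect.

```python
import functools
import time


class DBConnectionLost(Exception):
    """Hypothetical stand-in for a driver's "db-has-gone-away" error."""


def retry_on_disconnect(max_retries=5, retry_interval=0.5,
                        inc_retry_interval=True, max_retry_interval=10.0,
                        sleep=time.sleep):
    """Retry a db.api call when the database connection is lost.

    Only DBConnectionLost triggers a retry; any other exception propagates
    immediately. Once max_retries retries have failed, the last error is
    re-raised to the caller.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            interval = retry_interval
            attempt = 0
            while True:
                try:
                    # Assumption: each call obtains a new connection/session,
                    # so calling again after a failure reconnects.
                    return func(*args, **kwargs)
                except DBConnectionLost:
                    if attempt >= max_retries:
                        raise
                    attempt += 1
                    sleep(interval)
                    if inc_retry_interval:
                        # Back off exponentially, capped at max_retry_interval.
                        interval = min(interval * 2, max_retry_interval)
        return wrapper
    return decorator


# Usage: a hypothetical db.api call that fails twice, then succeeds.
calls = {"n": 0}
delays = []


@retry_on_disconnect(max_retries=3, retry_interval=0.5, sleep=delays.append)
def instance_get(instance_id):
    calls["n"] += 1
    if calls["n"] < 3:
        raise DBConnectionLost("MySQL server has gone away")
    return {"id": instance_id}


result = instance_get(42)
print(result)  # {'id': 42}
print(delays)  # [0.5, 1.0]
```

Retrying like this is only safe when the interrupted call can be re-run without side effects, for example a read, or a write whose transaction was rolled back when the connection dropped.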
Blueprint information
- Status:
- Complete
- Approver:
- Russell Bryant
- Priority:
- Low
- Drafter:
- aeva black
- Direction:
- Approved
- Assignee:
- Viktor Serhieiev
- Definition:
- Approved
- Series goal:
- Accepted for icehouse
- Implementation:
- Implemented
- Milestone target:
- 2014.1
- Started by
- Viktor Serhieiev
- Completed by
- Viktor Serhieiev
Whiteboard
johnthetubaguy: resetting priority; this needs to go through a design discussion, and it is not yet targeted for icehouse-1 anyway.
Gerrit topic: https:/
Addressed by: https:/
Automatic retry db.api query if db connection lost
Patch that should implement the current blueprint
Addressed by: https:/
Automatic reconnect to database (WIP)
'Proof-
This blueprint was implemented in Oslo and was brought into Nova with the patch https:/
Updating to icehouse rc1, since it merged with the above change. --johnthetubaguy