oslo-incubator

Automatic recovery from transient db connection failures

Registered by Viktor Serhieiev on 2013-11-20

A variety of circumstances can cause a transient failure in database connections, for example:
- a restart or upgrade of the database,
- migration of a VIP between the nodes of an HA pair,
- a network failure.

Every project that connects to a database would benefit from db/api catching these "db-has-gone-away" errors, automatically reconnecting, and retrying the last operation, so that the caller can continue whatever operation was in progress.

It is not necessary to abort long-running operations (such as nova boot or glance image-create) just because of a momentary interruption in db connectivity.
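
The catch-and-retry behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that was merged: the names `DBConnectionError`, `retry_on_disconnect`, and `FlakyDB` are hypothetical, and a real implementation would catch whatever exception the database layer raises on a lost connection.

```python
import functools
import time


class DBConnectionError(Exception):
    """Hypothetical stand-in for a "db-has-gone-away" error."""


def retry_on_disconnect(max_retries=3, retry_interval=0.0, sleep=time.sleep):
    """Retry the wrapped call when the (hypothetical) connection error is raised.

    max_retries counts the extra attempts made after the first failure.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempts_left = max_retries
            while True:
                try:
                    return func(*args, **kwargs)
                except DBConnectionError:
                    if attempts_left <= 0:
                        raise  # give up and surface the error to the caller
                    attempts_left -= 1
                    sleep(retry_interval)  # wait before the next attempt
        return wrapper
    return decorator


class FlakyDB:
    """Toy database API that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def get_instance(self, instance_id):
        self.calls += 1
        if self.calls <= self.failures:
            raise DBConnectionError("connection lost")
        return {"id": instance_id}
```

With `db = FlakyDB(failures=2)` and `fetch = retry_on_disconnect(max_retries=3)(db.get_instance)`, a call to `fetch("abc")` fails twice internally and returns on the third attempt, so the caller never sees the interruption. Only the connection error is retried; any other exception propagates immediately, which keeps genuine query errors visible.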

A (slightly brute-force) patch was previously proposed to Nova: https://review.openstack.org/#/c/10797/

This blueprint is similar to the Nova blueprint proposed by Devananda van der Veen; see https://blueprints.launchpad.net/nova/+spec/db-reconnect

Blueprint information

Status: Complete

Approver: Mark McLoughlin

Priority: Medium

Drafter: Viktor Serhieiev

Direction: Approved

Assignee: Viktor Serhieiev

Definition: Approved

Series goal: Accepted for icehouse

Implementation: Implemented

Milestone target: 2014.1

Started by: Mark McLoughlin on 2014-02-05

Completed by: Viktor Serhieiev on 2014-02-10

Whiteboard

Gerrit topic: https://review.openstack.org/#q,topic:bp/db-reconnect,n,z

Addressed by: https://review.openstack.org/33831
Automatic retry db.api query if db connection lost
