Discussion on design and software considerations for making Nova HA/Fault Tolerant

Registered by Edward Konetzko on 2011-04-07

This session will outline current limitations and propose solutions to the high-availability (HA) issues in the Nova components: the database, the messaging system, and the managers.

Blueprint information

Status:
Complete
Approver:
None
Priority:
Undefined
Drafter:
Edward Konetzko
Direction:
Needs approval
Assignee:
Edward Konetzko
Definition:
Obsolete
Series goal:
None
Implementation:
Unknown
Milestone target:
None
Completed by
Vish Ishaya on 2011-05-17

Whiteboard

This was a discussion and has no specific code requirements at the moment, so I'm marking it obsolete. New blueprints can be opened for specific features. --vish

The currently planned outline of the session is below.

1. A short presentation on the current state of the Nova components.
2. Discussion on the messaging system
   a. Best way to make RabbitMQ HA
     I. http://www.rabbitmq.com/pacemaker.html
     II. http://www.rabbitmq.com/clustering.html
3. Discussion on the database
   a. Best way to make MySQL HA
     I. Active/passive DRBD failover
     II. Master/slave with Google fast promote patches
     III. Multi-master MySQL
   b. Is MySQL the best technology?
     I. Replacement ideas
4. Nova-API
5. Nova-Compute
6. Nova-Network
7. Nova-Objectstore
8. Nova-Scheduler
9. Nova-Volume

For items 4 through 9, the discussion would focus on design changes and requirements to pass on to the developers for consideration.
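For the manager services in items 4 through 9, one recurring design question is how the rest of the system decides that a service is down and routes work elsewhere. A common approach is a heartbeat: each service periodically records a timestamp, and a consumer such as a scheduler treats any service whose last heartbeat is older than a threshold as dead. The sketch below is a minimal illustration of that idea only; `ServiceRecord`, `is_alive`, `live_hosts`, the host names, and the 60-second default threshold are all hypothetical and are not taken from Nova's code.

```python
import time
from dataclasses import dataclass


@dataclass
class ServiceRecord:
    """Hypothetical record of one running service, e.g. nova-compute on one host."""
    host: str
    binary: str
    last_heartbeat: float  # epoch seconds, refreshed periodically by the service itself


def is_alive(record, now, down_after=60.0):
    """A service counts as alive if its last heartbeat is at most `down_after` seconds old."""
    return now - record.last_heartbeat <= down_after


def live_hosts(records, binary, now=None, down_after=60.0):
    """Return the hosts running `binary` whose heartbeat is fresh enough to receive new work."""
    if now is None:
        now = time.time()
    return [r.host for r in records
            if r.binary == binary and is_alive(r, now, down_after)]


# Hypothetical example: node2 last reported 130 s before `now`, so it is skipped.
records = [
    ServiceRecord("node1", "nova-compute", last_heartbeat=1000.0),
    ServiceRecord("node2", "nova-compute", last_heartbeat=900.0),
    ServiceRecord("node3", "nova-volume", last_heartbeat=1000.0),
]
print(live_hosts(records, "nova-compute", now=1030.0))  # ['node1']
```

Relying on a timestamp rather than an explicit shutdown message means a host that crashes outright still drops out of the live set once the threshold passes; the trade-off is that the threshold also bounds how long work can still be routed to a dead host.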

The goal of this session is to start development on making Nova a fault-tolerant, self-healing infrastructure built on best practices, e.g. multi-master replication, client failover, resource redirection, and automatic service discovery.
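Client failover can be illustrated with a short sketch: the client holds an ordered list of broker or database endpoints and falls through to the next one when a connection attempt fails. This is a minimal, hypothetical example, not a description of how Nova's messaging or database layers actually connect; `connect_with_failover`, `AllEndpointsFailed`, `fake_connect`, and the `rabbitN` host names are assumptions for illustration, and `connect` stands in for whatever real client library call would be used.

```python
class AllEndpointsFailed(Exception):
    """Raised when no configured endpoint accepted a connection."""


def connect_with_failover(endpoints, connect, attempts_per_endpoint=1):
    """Return (endpoint, connection) for the first endpoint that connects.

    `connect` is any callable that takes an endpoint string and either returns
    a connection object or raises. Endpoints are tried in the given order.
    """
    errors = []
    for endpoint in endpoints:
        for _ in range(attempts_per_endpoint):
            try:
                return endpoint, connect(endpoint)
            except Exception as exc:  # a real client would catch its library's error types
                errors.append((endpoint, exc))
    # Every endpoint failed; keep the individual errors for diagnosis.
    raise AllEndpointsFailed(errors)


# Hypothetical stand-in for a real client: only rabbit2 is reachable.
def fake_connect(endpoint):
    if endpoint != "rabbit2:5672":
        raise ConnectionError(endpoint + " is down")
    return "conn-to-" + endpoint


if __name__ == "__main__":
    endpoint, conn = connect_with_failover(
        ["rabbit1:5672", "rabbit2:5672", "rabbit3:5672"], fake_connect)
    print(endpoint, conn)  # rabbit2:5672 conn-to-rabbit2:5672
```

Trying endpoints in a fixed order keeps behavior predictable, but it sends every client to the first healthy endpoint; shuffling the list per client would spread load across a cluster at the cost of less deterministic routing.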

NOTE: This is an initial idea and may be too ambitious for a 55-minute session.
