RPC improvements

Registered by Jesse Andrews

The messaging system is currently used in a way that does not handle errors or exceptions. If a worker is down, messages queue for it until the worker is restarted. This hurts the user experience and delays service restoration, because the restarted worker must process every queued message before it can handle current ones. For example, when a VM is launched, a message is sent from the scheduler to a compute node. If the compute worker is down, the VM stays stuck in a scheduled-but-not-launched state until the node returns to service. In the meantime the user has probably launched another VM, and the originally requested VM is launched hours later.

Nova's use of the message queue needs to be defined. Changes to both how we use RabbitMQ and the logic that determines whether a task has completed need to ripple through all Nova code.
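
As an illustration of the stale-message problem described above, here is a minimal sketch of one possible mitigation: each request carries a send time and a time-to-live, and a restarted worker drops requests whose deadline has passed instead of running them hours late. This is not Nova's implementation. The `Message` class, the `drain` function, the `ttl` field, and the 60-second default are all hypothetical names and values chosen for the example, and the queue is a plain in-memory `deque` rather than RabbitMQ.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    """A queued request (hypothetical structure, not Nova's wire format)."""
    method: str
    args: dict
    sent_at: float = field(default_factory=time.time)
    ttl: float = 60.0  # seconds the request stays valid; an assumed value

    def expired(self, now=None):
        """Return True if the request is older than its time-to-live."""
        now = time.time() if now is None else now
        return now - self.sent_at > self.ttl


def drain(queue, handler, now=None):
    """Process queued messages, skipping those whose deadline has passed.

    Returns (processed, dropped) so the caller can report stale requests
    as failed rather than executing them late.
    """
    processed, dropped = [], []
    while queue:
        msg = queue.popleft()
        if msg.expired(now):
            dropped.append(msg)
        else:
            handler(msg)
            processed.append(msg)
    return processed, dropped
```

With this shape, a compute worker that comes back after a long outage would call `drain` once on its backlog, act only on requests that are still fresh, and hand the dropped ones back to whatever component marks the corresponding tasks as failed. Where the deadline is enforced (in the broker, the worker, or both) is a design decision this sketch leaves open.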

Blueprint information

Status: Complete
Approver: Vish Ishaya
Priority: Undefined
Drafter: Nova Orchestration Team
Direction: Approved
Assignee: Nova Orchestration Team
Definition: Obsolete
Series goal: None
Implementation: Unknown
Milestone target: None
Completed by: Vish Ishaya
