watcher

timeout for instance migration should be configurable

Registered by Ajay Tikoo on 2021-07-19

When Watcher executes an instance migration task, it polls the instance status once per second to check whether the migration has completed or failed. This polling stops after 120 seconds, a limit that is hard-coded.

In a typical production cluster, where an instance may be performing a large number of memory writes, 120 seconds is often not enough. The action is then marked as failed even when the migration actually completes after the 120-second limit, which in turn cancels all subsequent actions and fails the action plan itself.

Therefore, I propose making this limit configurable, defaulting to the existing 120 seconds when it is not specified in the configuration file. The configuration file would accept a new, optional section 'nova_helper', which can contain the live_migration_timeout parameter. The changes that I am proposing are available in this GitHub commit: https://github.com/ajaytikoo/watcher/commit/8f4578ef3207ad07f441174aa3eca43ae9786e1c
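The proposal can be sketched roughly as follows. This is only an illustration and not the code from the linked commit: Watcher reads its settings through oslo.config, whereas this sketch uses the standard-library configparser so that it runs on its own. The helper names load_live_migration_timeout and wait_for_migration, the get_status callable, and the status strings are all assumptions made for the example; only the 'nova_helper' section, the live_migration_timeout parameter, the one-second poll interval, and the 120-second default come from the description above.

```python
import configparser
import time

# The limit that is currently hard-coded, kept as the default.
DEFAULT_LIVE_MIGRATION_TIMEOUT = 120  # seconds


def load_live_migration_timeout(config_text):
    """Read live_migration_timeout from an optional [nova_helper] section.

    Falls back to 120 seconds when the section or the option is absent.
    (Hypothetical helper; Watcher itself would register an oslo.config option.)
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getint(
        "nova_helper",
        "live_migration_timeout",
        fallback=DEFAULT_LIVE_MIGRATION_TIMEOUT,
    )


def wait_for_migration(get_status, timeout, interval=1.0, sleep=time.sleep):
    """Poll get_status() every `interval` seconds for at most `timeout` seconds.

    get_status is a hypothetical callable returning "ACTIVE" on success,
    "ERROR" on failure, and any other string while the migration is running.
    Returns True on success, False on error or when the timeout is reached.
    """
    waited = 0.0
    while waited < timeout:
        status = get_status()
        if status == "ACTIVE":
            return True
        if status == "ERROR":
            return False
        sleep(interval)
        waited += interval
    return False


if __name__ == "__main__":
    sample = "[nova_helper]\nlive_migration_timeout = 600\n"
    print(load_live_migration_timeout(sample))  # operator override
    print(load_live_migration_timeout(""))      # section absent, default used
```

With this shape, an operator whose instances are memory-intensive could raise the limit in the configuration file, while existing deployments that do not add the section would keep the current 120-second behaviour.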

Blueprint information

Status: Not started

Approver: None

Priority: Undefined

Drafter: Ajay Tikoo

Direction: Needs approval

Assignee: None

Definition: New

Series goal: None

Implementation: Unknown

Milestone target: None
