PCI devices are sometimes not freed after a migration
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | Fix Released | Undecided | Steven Webster | | |
Bug Description
Description
===========
During stress testing of cold migration, it has been observed that sometimes the PCI devices are not freed by the resource tracker on the source node.
If the periodic resource audit kicks in on the source node in the middle of the migration, the instance uuid is moved from tracked_migrations to tracked_instances. In that case the PCI devices won't get freed, because the current logic in the code only checks tracked_migrations (see https:/
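The race described above can be illustrated with a deliberately simplified model. This is a sketch, not nova's code: `ToyTracker`, `periodic_audit`, `claimed_pci`, the uuid, and the PCI address are all hypothetical names chosen for illustration. Only `tracked_migrations`, `tracked_instances`, and `drop_move_claim` come from the report.

```python
class ToyTracker:
    """Hypothetical stand-in for the source node's resource tracker."""

    def __init__(self):
        self.tracked_migrations = {}  # instance uuid -> migration record
        self.tracked_instances = {}   # instance uuid -> instance record
        self.claimed_pci = {}         # instance uuid -> claimed PCI addresses

    def periodic_audit(self, uuid):
        # Assumed from the report: the audit moves the uuid from
        # tracked_migrations to tracked_instances.
        if uuid in self.tracked_migrations:
            self.tracked_instances[uuid] = self.tracked_migrations.pop(uuid)

    def drop_move_claim(self, uuid):
        # Pre-fix behaviour as described: only tracked_migrations is checked.
        if uuid in self.tracked_migrations:
            self.tracked_migrations.pop(uuid)
            self.claimed_pci.pop(uuid, None)


def pci_leaked(audit_during_migration):
    """Return True if the PCI claim is still held after the confirm stage."""
    tracker = ToyTracker()
    tracker.tracked_migrations["uuid-1"] = {"status": "migrating"}
    tracker.claimed_pci["uuid-1"] = ["0000:81:10.0"]
    if audit_during_migration:
        tracker.periodic_audit("uuid-1")
    tracker.drop_move_claim("uuid-1")
    return "uuid-1" in tracker.claimed_pci
```

In this model, `pci_leaked(False)` releases the claim, while `pci_leaked(True)` leaves it held, which mirrors the intermittent nature of the failure.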
Steps to reproduce
==================
1) Boot a guest with an SR-IOV device.
2) Migrate the guest and confirm the migration.
3) Repeat step 2 over and over.
Expected result
===============
PCI resources such as PCI passthrough devices are limited in number and should be freed right away. Instead, in this case the PCI devices only get freed on the next periodic audit.
Actual result
=============
The PCI devices are not freed during the confirm_resize stage.
Environment
===========
$ git log -1
commit 633c817de5a67e7
Author: Matt Riedemann <email address hidden>
Date: Sat Nov 12 11:59:13 2016 -0500
api-ref: fix server_id in metadata docs
The api-ref was saying that the server_id was in the body of the
server metadata requests but it's actually in the path for all
of the requests.
Change-Id: Icdecd980767f89
Closes-Bug: #1641331
Changed in nova: | |
assignee: | nobody → Ludovic Beliveau (ludovic-beliveau) |
status: | New → In Progress |
Changed in nova: | |
assignee: | Ludovic Beliveau (ludovic-beliveau) → Steven Webster (swebster-wr) |
Reviewed: https://review.openstack.org/370374
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a4909ae7e6294e45f09950ebca0b3d7126c80af
Submitter: Jenkins
Branch: master
commit 3a4909ae7e6294e45f09950ebca0b3d7126c80af
Author: Ludovic Beliveau <email address hidden>
Date: Wed Sep 14 14:44:46 2016 -0400
Release PCI devices on drop_move_claim()
On cold migration, drop_move_claim() is called in the confirm stage on the
source node. Since the migration is being tracked by the resource tracker on
the destination node, the source node has the instance in its
tracked_instances.
So in this case the PCI devices were only freed on the next periodic audit.
For PCI resources such as PCI passthrough, those are limited in number and
should be freed right away.
This patch fixes drop_move_claim() to also free PCI devices when an instance
is in self.tracked_instances().
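The shape of the fix can be sketched as follows. This is a simplified illustration under stated assumptions, not the committed patch: `PatchedToyTracker`, `_free_pci`, and `claimed_pci` are hypothetical names, and the real method takes more arguments and does more bookkeeping.

```python
class PatchedToyTracker:
    """Hypothetical model of a tracker whose drop_move_claim() frees PCI
    devices for instances found in either tracking dictionary."""

    def __init__(self):
        self.tracked_migrations = {}  # instance uuid -> migration record
        self.tracked_instances = {}   # instance uuid -> instance record
        self.claimed_pci = {}         # instance uuid -> claimed PCI addresses

    def _free_pci(self, uuid):
        # Release the claim and return the addresses that were freed.
        return self.claimed_pci.pop(uuid, [])

    def drop_move_claim(self, uuid):
        if uuid in self.tracked_migrations:
            self.tracked_migrations.pop(uuid)
            return self._free_pci(uuid)
        if uuid in self.tracked_instances:
            # The added branch: the instance is tracked here rather than in
            # tracked_migrations, so free its PCI devices right away too.
            self.tracked_instances.pop(uuid)
            return self._free_pci(uuid)
        return []


tracker = PatchedToyTracker()
tracker.tracked_instances["uuid-1"] = {"host": "source"}
tracker.claimed_pci["uuid-1"] = ["0000:81:10.0"]
freed = tracker.drop_move_claim("uuid-1")
```

With the second branch in place, the claim is released at confirm time instead of waiting for the next periodic audit.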
Co-Authored-By: Steven Webster <email address hidden>
Change-Id: Ie3392f80dfd2650048519c571ffaa11c025ad048
Closes-Bug: #1641750