VM fails to start when vcpu placement='auto'

Bug #1621121 reported by bugproxy
This bug affects 27 people
Affects           Status        Importance  Assigned to  Milestone
libvirt (Ubuntu)  Fix Released  Undecided   Unassigned
  Xenial          Won't Fix     Undecided   Unassigned

Bug Description

---Problem Description---
VM fails to start when vcpu placement='auto'

---uname output---
Linux ltc-test-ci1 4.4.0-9134-generic #53-Ubuntu SMP Thu Aug 18 05:21:43 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = power 8 ppc64le

---Steps to Reproduce---

1. Define the VM with xml attached
virsh define vm.xml
2. Start the VM
virsh start virt-tests-vm1
error: Failed to start domain virt-tests-vm1
error: unsupported configuration: numad is not available on this host
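
For anyone checking whether a guest is affected, the placement mode can be read from the domain XML. This is a minimal sketch, assuming the XML carries libvirt's usual placement attribute on the <vcpu> element; the helper name vcpu_placement is made up for illustration and the inline XML fragment is not the attached vm.xml.

```shell
# Print the value of the placement attribute on the <vcpu> element of a
# domain XML read from stdin. Prints nothing if the attribute is absent.
vcpu_placement() {
    sed -n "s/.*<vcpu[^>]*placement=['\"]\([^'\"]*\)['\"].*/\1/p"
}

# Example with an inline XML fragment (illustrative only):
echo "<vcpu placement='auto'>4</vcpu>" | vcpu_placement
```

On a host, `virsh dumpxml virt-tests-vm1 | vcpu_placement` would feed it the real definition; a result of auto is the case this bug is about.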

Userspace tool common name: ii libvirt-bin 2.1.0-1ubuntu3 ppc64el programs for the libvirt library

The userspace tool has the following bit modes: both

Userspace rpm: ii libvirt-bin 2.1.0-1ubuntu3 ppc64el programs for the libvirt library

ii numad 0.5+20150602-4 ppc64el User-level daemon that monitors NUMA topology and usage

Attached a debug log captured with export LIBVIRT_DEBUG=1.

The QEMU VM log does not contain much information:
cat libvirt/qemu/virt-tests-vm1.log
2016-08-29 16:09:19.092+0000: shutting down
2016-08-29 16:16:08.378+0000: shutting down
2016-08-29 16:16:37.811+0000: shutting down
2016-08-29 16:16:45.579+0000: shutting down
2016-08-30 05:35:21.199+0000: shutting down
2016-08-30 05:40:28.999+0000: shutting down
2016-08-30 05:41:15.155+0000: shutting down
2016-08-30 05:44:04.427+0000: shutting down
2016-08-30 05:44:16.099+0000: shutting down
2016-08-30 05:49:12.327+0000: shutting down
2016-08-30 05:53:19.455+0000: shutting down
2016-08-30 05:53:56.171+0000: shutting down

service numad status
● numad.service - numad - The NUMA daemon that manages application locality.
   Loaded: loaded (/lib/systemd/system/numad.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2016-08-22 04:12:30 CDT; 1 weeks 0 days ago
     Docs: man:numad
 Main PID: 3563 (numad)
    Tasks: 2 (limit: 12288)
   Memory: 1.5M
      CPU: 6min 43.046s
   CGroup: /system.slice/numad.service
           └─3563 /usr/bin/numad -i 15

Aug 22 04:12:29 ltc-test-ci1 systemd[1]: Starting numad - The NUMA daemon that manages application locality....
Aug 22 04:12:30 ltc-test-ci1 systemd[1]: Started numad - The NUMA daemon that manages application locality..

This looks more like a configuration issue; I suspect that the libvirt package was configured without NUMA support.

Can Canonical confirm if the libvirt package provides numa support?

Revision history for this message
bugproxy (bugproxy) wrote : Libvirt debug log

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-145430 severity-high targetmilestone-inin1610
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → libvirt (Ubuntu)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
libvirt is built with --with-numactl on
  amd64 i386 ia64 mips mipsel powerpc ppc64el s390x

You can check whether it was actually built against it by looking for the library dependency, like:
ldd /usr/sbin/libvirtd | grep numa
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f99e4a39000)

For further debugging until we can reproduce, can you enable more debugging of libvirt and add the more verbose libvirt log here so you and we can check for issues in it?
See:
 http://libvirt.org/guide/html/Application_Development_Guide-Connections-Debug.html

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1621121] Re: VM fails to start when vcpu placement='auto'

The failure is about numad (vs. libnuma).

I think what's needed here is for libvirt to be built with
 '--with-numad' (which it isn't).

However, numad is part of universe and libvirt can't depend upon a package
in universe since it's part of main IIUC.

On Fri, Sep 9, 2016 at 12:57 AM, ChristianEhrhardt <
<email address hidden>> wrote:

> Hi,
> libvirt is built with --with-numactl on
> amd64 i386 ia64 mips mipsel powerpc ppc64el s390x
>
> You can check effectively if it was built against it by checking for the
> lib depend like:
> ldd /usr/sbin/libvirtd | grep numa
> libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1
> (0x00007f99e4a39000)
>
> For further debugging until we can reproduce, can you enable more
> debugging of libvirt and add the more verbose libvirt log here so you and
> we can check for issues in it?
> See:
> http://libvirt.org/guide/html/Application_Development_Guide-Connections-Debug.html

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in libvirt (Ubuntu):
status: New → Confirmed
bugproxy (bugproxy)
tags: added: severity-critical
removed: severity-high
Jon Grimm (jgrimm)
Changed in libvirt (Ubuntu):
milestone: none → ubuntu-17.04
Revision history for this message
Jon Grimm (jgrimm) wrote :

Will investigate this in 17.04, as we have some investigation to do to add the feature. numad isn't part of main, so there will be an MIR assuming we determine that is a pre-req for this feature.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-10-04 06:04 EDT-------
Any Updates?

bugproxy (bugproxy)
tags: added: targetmilestone-inin1704
removed: targetmilestone-inin1610
Revision history for this message
Robert Williams (q-rob-c) wrote :

I appreciate this is due to be looked at in 17.04, but is there a workaround for manually linking numad so that libvirt finds it? From what I can see, it's been built with the library, so, provided numad is operational, it should be a case of showing libvirt where it is. Unless I'm missing something?

I have several hosts where numad is installed and operational, so if I knew where libvirt was looking for numad then maybe it could be linked in to get it working under 16.04 LTS?

Thanks in advance,

Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1621121] Re: VM fails to start when vcpu placement='auto'

On Tue, Oct 25, 2016 at 6:04 AM, Robert Williams <email address hidden> wrote:

> Appreciate this is due to be looked at in 17.04 - but - is there a
> workaround for manually linking numad so that libvirt finds it? From
> what i can see, it's been built with the library, so, providing numad is
> operational it should be a case of showing libvirt where it is. Unless
> i'm missing something?
>
> I have several hosts where numad is installed and operational so if i
> knew where libvirt was looking for numad then maybe it can be linked in
> to get it working under 16.04 LTS?
>

I don't think this is possible. numad support is a build-time feature, and we
cannot enable it because that would require a package in main (libvirt) to
include numad, which is part of universe.

As a workaround, you could rebuild the package with the added config flag
in a PPA.

I'll see if I can get such a build into my own PPA and share the debdiff
here.

numad itself will need an MIR[1] and be approved before it can be part of
main.

1. https://wiki.ubuntu.com/MainInclusionProcess


Changed in libvirt (Ubuntu):
status: Confirmed → Fix Released
assignee: Taco Screen team (taco-screen-team) → 991 (asstaroid)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Is everybody sure that the last update by Skymathrix was correct? I didn't see any solution.
Can anybody confirm whether an alternate solution was found? Otherwise we should set this back to Confirmed.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
I wanted to at least confirm the suggestion of Ryan.

In the configure of libvirt source this is:
   --with-numad use numad to manage CPU placement dynamically
                           [default=check]

I built it with numad as a build-time dependency to see whether configure picks everything up correctly.
It does; it ends up with:
  configure: numad: yes
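
The local rebuild described here can be sketched roughly as follows. This is only an outline under assumptions: it presumes deb-src entries are enabled, that the packaging does not explicitly disable numad (so the default=check detection applies once numad is installed), and the function names run and rebuild_libvirt_with_numad are made up for illustration. It prints the steps by default rather than executing them.

```shell
# Dry-run by default: print each step instead of executing it.
# Set DRY_RUN=0 to actually run the commands (needs sudo and deb-src entries).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

rebuild_libvirt_with_numad() {
    run sudo apt-get build-dep -y libvirt   # regular build dependencies
    run sudo apt-get install -y numad       # present at build time so configure can detect it
    run apt-get source libvirt              # unpacks into ./libvirt-<version>/
    # Build unsigned binary packages from inside the unpacked source tree.
    run sh -c 'cd libvirt-*/ && dpkg-buildpackage -us -uc -b'
}

rebuild_libvirt_with_numad
```

After a real build, the configure summary line quoted above is the thing to look for in the build output.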

As for the binary dependency I would only choose a "suggests".
Because to quote numad's NEWS file:
  Starting or stopping the 'numad' daemon while running a large virtual
  machine may be dangerous so installing (or removing) this package won't
  start (or stop) it.

Due to its nature, numad can cause severe overhead and can impact an existing system when moving or constraining the NUMA attributes of processes.

So I was a bit afraid the build could add a binary dependency via shlibs or similar.
But the build only needs to know that numad exists, and since numad is a binary rather than a library, it does not add a dependency.

Here in my newly built Deb dir:
for i in $(ls -1 *.deb | sed 's/.deb//'); do apt-cache show $i | grep numad; done
Suggests: libvirt-daemon-system, numad
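
An alternative way to run the same check directly against the built .deb files, without going through apt-cache, is sketched below. It assumes dpkg-deb is available; called with -f and no field name, it prints the package's control file, which the helper then filters. The helper name numad_relations is made up for illustration.

```shell
# From a control file on stdin, print the names of the relationship
# fields (Pre-Depends/Depends/Recommends/Suggests) that mention numad.
numad_relations() {
    grep -E '^(Pre-Depends|Depends|Recommends|Suggests):' |
        grep -E '(^|[ ,:])numad([ ,(]|$)' |
        cut -d: -f1
}

# Report the result for every .deb in the current directory.
for deb in ./*.deb; do
    [ -e "$deb" ] || continue    # no .deb files here: nothing to report
    rel=$(dpkg-deb -f "$deb" | numad_relations | paste -sd, -)
    printf '%s: %s\n' "$deb" "${rel:-no numad relationship}"
done
```

A package that lists numad only under Suggests matches the result quoted above; seeing Depends or Recommends instead would mean installing libvirt pulls numad in by default.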

The diff to get there is trivial, I'll attach it.
But as Ryan suggested the thing needed is a MIR for numad.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hey,
since I had it available for test builds I created a ppa with it.
=> https://launchpad.net/~paelzer/+archive/ubuntu/bug-1621121-libvirt-numad

@IBM - since I can think of all sorts of negative impacts from enabling numad, could you please take a look at the PPA? If you can confirm that auto-placement works with it, and that you did not run into other new issues, we could start an MIR request for numad (which is fortunately not a lot of code).
That means, if possible, running a full test suite or whatever you usually have defined for this.

Changed in libvirt (Ubuntu):
status: Fix Released → Triaged
assignee: Skymathrix (asstaroid) → nobody
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

And since it clearly was not fixed, I am removing that status again.

tags: added: patch
Revision history for this message
bugproxy (bugproxy) wrote : VM xml

Default Comment by Bridge

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Is there any comment accompanying the XML that bugproxy is still holding back?

Revision history for this message
Nish Aravamudan (nacc) wrote :

@paelzer, so the build-depend can live in universe (since the archive reorg in 16.04), even if libvirt's source is in main -- and if you are manually specifying the resulting dependency to be a suggests (and not a recommends), then is a MIR even needed?

The real question I have is, if libvirt is built with numad support (and a path to the binary is embedded somehow), does libvirt properly handle that binary not being present at runtime?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Nish,
I had the same conclusion and IMO we don't need a MIR.

In fact, on merging a recent libvirt we get this "for free", without any delta to be added.

I haven't tested the case of uninstalled numad yet, as I was (I'm still) waiting for the reporters to confirm if the suggested way of inclusion (merging Debian would be about the same) would suit their needs first.

Only then I wanted to go on investigating more to test/verify potential pitfalls from our side.

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-01-12 04:53 EDT-------
reproducible in 16.04.2 , any plans to include the fix in this stream?

#uname -a
Linux powerkvm2-lp1 4.8.0-34-generic #36~16.04.1-Ubuntu SMP Wed Dec 21 18:53:20 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

dpkg -l|grep libvirt-bin
ii libvirt-bin 1.3.1-1ubuntu10.6 ppc64el programs for the libvirt library

dpkg -l|grep numad
ii numad 0.5+20150602-4 ppc64el User-level daemon that monitors NUMA topology and usage

# virsh start --console 1610
error: Failed to start domain 1610
error: unsupported configuration: numad is not available on this host

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Satheera,
the issue is now fixed in Zesty (the development release), with the dependency being a Suggests.
But there is no intention to bring that back to Xenial as a feature.
Especially since the daemon itself is only in universe, I'd not consider it all too trustworthy.

I was OK with enabling it since it is opt-in and only installed by the user (no hard dependency), but putting that into Xenial would take serious effort to convince the SRU team.

If you want or need that, please give it a try (https://wiki.ubuntu.com/StableReleaseUpdates), but I'm personally not yet convinced it is worth it.

Changed in libvirt (Ubuntu Xenial):
status: New → Confirmed
Changed in libvirt (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - there was also a numad issue, see bug 1640783.
If you are verifying this, please check that your version of numad includes that fix.

Revision history for this message
bugproxy (bugproxy) wrote : Guest xml

------- Comment (attachment only) From <email address hidden> 2017-02-03 08:05 EDT-------

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Lacking SRU engagement for Xenial so far, I'll set the Xenial task to Incomplete for now.

Changed in libvirt (Ubuntu Xenial):
status: Confirmed → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment (attachment only) From <email address hidden> 2017-02-03 08:05 EDT-------

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I'm going through old bugs. Lacking further discussion and any "forcing" argument for why this would be needed in Xenial (it would be feature enablement in an SRU, so it would have to be a very good argument), I'm closing the Xenial task to clean up.

Also if needed users have the option to get this in Xenial by using the Cloud-Archive Ocata or Pike.

Changed in libvirt (Ubuntu Xenial):
status: Incomplete → Won't Fix