snapd 2.26.14 on ubuntu-core won't start in containers anymore

Bug #1709536 reported by Stéphane Graber
This bug affects 10 people
Affects            Status         Importance   Assigned to            Milestone
Snap Layer         Invalid        Critical     Unassigned
snapd              Fix Released   Undecided    Michael Vogt
systemd (Ubuntu)   Fix Released   High         Dimitri John Ledkov
  Xenial           Fix Released   Medium       Dimitri John Ledkov
  Artful           Fix Released   High         Dimitri John Ledkov

Bug Description

[Impact]

Systemd treats a failure to apply the requested Nice value as critical to unit startup.

Unprivileged LXD containers do not allow the use of negative nice values. snapd will fail to start inside containers now that snapd uses a negative Nice value.

Aug 09 05:54:37 core systemd[1]: snapd.service: Main process exited, code=exited, status=201/NICE
Aug 09 05:54:37 core systemd[1]: snapd.service: Unit entered failed state.
Aug 09 05:54:37 core systemd[1]: snapd.service: Failed with result 'exit-code'.

The fix is for systemd to ignore permission errors when attempting to set up such custom nice values in containers.

I have confirmed that setting up a unit override by hand which sets Nice = 0 does resolve the problem.

[Test Case]

Boot a Xenial image in lxd:

$ lxc launch xenial x1
$ lxc exec x1 -- systemctl --state=failed

Observe failures for snapd:

● snapd.service loaded failed failed Snappy daemon
● snapd.socket loaded failed failed Socket activation for snapp

Install the updated systemd from -proposed, reboot, and check the status:

$ lxc exec <container> reboot
$ lxc exec <container> systemctl status

State: running
Jobs: 0 queued
Failed: 0 units
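
The check for failed snapd units can be scripted. A minimal sketch, assuming the container is named x1 as in the test case; `snapd_failed` is a hypothetical helper that reads a `systemctl --state=failed` listing on standard input:

```shell
# Return 0 if the listing on stdin mentions snapd.service or snapd.socket,
# non-zero otherwise. Hypothetical helper, not part of snapd or LXD.
snapd_failed() {
    grep -Eq '(^|[[:space:]])snapd\.(service|socket)([[:space:]]|$)'
}

# Example use against the container from the test case:
#   lxc exec x1 -- systemctl --state=failed | snapd_failed \
#       && echo "snapd is still failing"
```

The helper only inspects text, so it can be run on saved output as well as on a live container.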

[Regression Potential]

Services will now run with a Nice value other than what was specified in the unit if it cannot be changed for some reason.
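
One way to see whether a service ended up with a different nice value than its unit requests is to compare the configured Nice property with the niceness of its main process. A rough sketch, assuming `systemctl show -p` prints `Key=Value` lines and that procps `ps` is available; both function names are hypothetical:

```shell
# Strip the "Key=" prefix from a "Key=Value" line as printed by
# `systemctl show -p Key unit`.
prop_value() {
    printf '%s\n' "${1#*=}"
}

# Print the configured and the effective nice value for a unit.
# Needs a running systemd and procps; returns 1 if the unit has no main PID.
compare_nice() {
    unit=$1
    configured=$(prop_value "$(systemctl show -p Nice "$unit")")
    pid=$(prop_value "$(systemctl show -p MainPID "$unit")")
    if [ "$pid" = "0" ]; then
        echo "$unit: not running"
        return 1
    fi
    effective=$(ps -o ni= -p "$pid" | tr -d ' ')
    echo "$unit: configured Nice=$configured, effective nice=$effective"
}

# Example use inside the container:
#   compare_nice snapd.service
```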

Stéphane Graber (stgraber) wrote :

Added an Ubuntu systemd task.

Stéphane Graber (stgraber) wrote :

This bug affects anyone currently running ubuntu-core inside a LXD container as the current stable core snap is affected by this problem.

tags: added: lxd
Dimitri John Ledkov (xnox) wrote :

commit 5b8e457f8d883fc6f55d33d46b3474926a495d29
Author: Dimitri John Ledkov <email address hidden>
Date: Tue Aug 1 18:51:20 2017 +0100

    Ignore failures to set Nice priority on services in containers.

This is in artful-proposed. Also please see https://github.com/systemd/systemd/pull/6503, which needs further work before it can be merged upstream.

Dimitri John Ledkov (xnox) wrote :

Is an SRU of this fix to, e.g., xenial's systemd desired?

Changed in systemd (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
milestone: none → ubuntu-17.08
importance: Undecided → High
status: New → Fix Committed
Stéphane Graber (stgraber) wrote :

Yeah, xenial is where most of the snapd users are for us, so that'd certainly be desired.
We wouldn't need trusty though as snapd doesn't work inside trusty containers.

Oliver Grawert (ogra) wrote :

This has been in snapd since May (https://github.com/snapcore/snapd/pull/3270). Why did it break all of a sudden?

Michael Vogt (mvo) wrote :
Dimitri John Ledkov (xnox) wrote :

That was not useful at all @ogra @mvo
As on the 18th systemd migrated that can set Nice in artful.... yet you disabled it on the 18th.

Please back out your comment, and re-enable setting Nice in artful.

Manually uncommenting the Nice setting and rebooting an artful container results in:

Aug 21 12:35:23 noinit systemd[329]: snapd.service: Failed to adjust OOM setting, assuming containerized execution, ignoring: Permission denied
Aug 21 12:35:23 noinit systemd[329]: snapd.service: Failed to adjust Nice setting, assuming containerized execution, ignoring: Operation not permitted
Aug 21 12:35:23 noinit systemd[329]: snapd.service: Executing: /usr/lib/snapd/snapd

Changed in systemd (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Dimitri John Ledkov (xnox)
Changed in systemd (Ubuntu Artful):
status: Fix Committed → Fix Released
Zygmunt Krynicki (zyga)
Changed in snapd:
assignee: nobody → Michael Vogt (mvo)
Stuart Bishop (stub)
Changed in layer-snap:
importance: Undecided → Critical
Stéphane Graber (stgraber) wrote :

So I'm confused, wasn't the SRU supposed to have fixed this?

We're still getting reports of users that have a broken snapd because of this issue, some of whom then decided to switch to privileged containers just to avoid this problem, therefore losing a lot of LXD's security features and potentially exposing their hosts to attacks...

Jacek Nykis (jacekn) wrote :

Is there any workaround available other than switching to privileged containers?

Oliver Grawert (ogra) wrote :

@xnox

"As on the 18th systemd migrated that can set Nice in artful.... yet you disabled it on the 18th."

Our development focus is 16.04, and we do not have release-specific systemd units for the forward-ported snapd packages, so the comment will have to stay in until xenial has a fixed systemd.

Stéphane Graber (stgraber) wrote :

As a workaround, you can override the snapd systemd unit with:

systemctl edit snapd

Then add:
  [Service]
  Nice=0

After saving the override, run "systemctl daemon-reload" and "systemctl start snapd"
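
For scripted setups, the same override can be written without an editor. A sketch, assuming the usual drop-in location that "systemctl edit snapd" writes to (/etc/systemd/system/snapd.service.d/override.conf); `write_nice_override` is a hypothetical helper, and its optional argument is a root prefix intended only for dry runs:

```shell
# Write a drop-in that resets Nice to 0 for snapd.service.
# $1 (optional): root prefix, e.g. a scratch directory for a dry run.
write_nice_override() {
    root=${1:-}
    dir="$root/etc/systemd/system/snapd.service.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nNice=0\n' > "$dir/override.conf"
}

# Example use as root inside the container:
#   write_nice_override && systemctl daemon-reload && systemctl start snapd
```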

Stuart Bishop (stub) wrote :

All charms using snaps are currently failing, so I'm looking forward to a snapd release with the commented out Nice. The alternative is adding the systemd override workaround to the snap layer and making everyone rebuild and republish their charms.

Changed in systemd (Ubuntu Xenial):
status: Confirmed → In Progress
Brian Murray (brian-murray) wrote : Missing SRU information

Thanks for uploading the fix for this bug report to -proposed. However, when reviewing the package in -proposed and the details of this bug report I noticed that the bug description is missing information required for the SRU process. You can find full details at http://wiki.ubuntu.com/StableReleaseUpdates#Procedure but essentially this bug is missing some of the following: a statement of impact, a test case and details regarding the regression potential. Thanks in advance!

Mathew Hodson (mhodson)
description: updated
Mathew Hodson (mhodson) wrote :

Artful was fixed in systemd 234-2ubuntu2 - Ignore failures to set Nice priority on services in containers.

---
systemd (234-2ubuntu2) artful; urgency=medium

  * Ignore failures to set Nice priority on services in containers.
  * Disable execute test on armhf.
  * units: set ConditionVirtualization=!private-users on journald audit socket.
    It fails to start in unprivileged containers.
  * boot-smoke: refactor ADT test.
    Wait for system to settle down and get to either running or degraded state,
    then collect all metrics, and exit with an error if any of the tests failed.

 -- Dimitri John Ledkov <email address hidden> Wed, 02 Aug 2017 03:02:03 +0100

Brian Murray (brian-murray) wrote : Please test proposed package

Hello Stéphane, or anyone else affected,

Accepted systemd into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/229-4ubuntu20 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in systemd (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
Stéphane Graber (stgraber) wrote :

I've confirmed that snapd with Nice=-5 will start with the updated systemd.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 229-4ubuntu20

---------------
systemd (229-4ubuntu20) xenial; urgency=medium

  * resolved: recognize DNS names with more than one trailing dot as invalid
    (LP: #1600000)
  * Ignore failures to set Nice priority on services in containers.
    (LP: #1709536)
  * networkd: accept `:' in ifnames in systemd/networkd. (LP: #1714933)
  * initramfs-tools: trigger udevadm add actions with subsystems first.
    (LP: #1713536)
  * networkd: Add support to set STP value on a bridge. (LP: #1665088)
  * networkd: add support for AgeingTimeSec, Priority and DefaultPVID settings.
    (LP: #1715131)
    - Drop cherrypick of uint16 config parser, superseded by above commit.
  * networkd: add support to set ActiveSlave and PrimarySlave. (LP: #1709135)
    - networkd: add support to configure ARP, dependency of Primary/ActiveSlave.

 -- Dimitri John Ledkov <email address hidden> Tue, 05 Sep 2017 14:01:51 +0100

Changed in systemd (Ubuntu Xenial):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Michael Vogt (mvo)
Changed in snapd:
status: New → Fix Released
Changed in layer-snap:
status: New → Invalid