journal is broken in unprivileged LXC and nspawn containers

Bug #1457054 reported by Martin Pitt
This bug affects 3 people
Affects              Status        Importance  Assigned to
lxc (Ubuntu)         Won't Fix     Medium      Unassigned
systemd (Ubuntu)     Fix Released  Medium      Martin Pitt
  Vivid              Fix Released  Medium      Unassigned
  Wily               Fix Released  Medium      Martin Pitt

Bug Description

Test case
-------------
- Under Ubuntu 15.04 (or 15.10), set up an unprivileged container as in https://www.stgraber.org/2014/01/17/lxc-1-0-unprivileged-containers/
- Boot it. You'll get a lot of errors like

  [FAILED] Failed to start Journal Service.
  systemd-journald-audit.socket failed to listen on sockets: Operation not permitted
  [FAILED] Failed to listen on Journal Audit Socket.

- The same happens with systemd-nspawn -b.

As a result, the journal isn't working at all, and you end up with a bunch of failed journal-related units.

With a fixed systemd package, systemd in the container should recognize that it cannot listen on the audit socket (the kernel doesn't allow that, because the audit subsystem isn't fit for namespaces right now). "sudo journalctl" should then show the journal, and systemd-journald.service should be running. These systemd fixes are sufficient for nspawn, but not entirely for unprivileged LXC containers: there the journal will start working, but systemd-journald-audit.socket will still keep failing (this is less important).

REGRESSION POTENTIAL: Very low. This only affects the fallback error code path taken when binding to the audit socket fails. In that case the journal currently doesn't work at all. This usually doesn't happen on real iron/VMs (they always have CAP_AUDIT_READ), so there is no practical change there.
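The fallback path described above can be illustrated with a minimal Python sketch. It is not the upstream journald patch (which is C); it only shows the pattern: open a NETLINK_AUDIT socket, try to join the audit read-log multicast group, and treat any failure (EPERM inside a user namespace, or no netlink support at all) as "run without audit" instead of a fatal error. The constants NETLINK_AUDIT = 9 and AUDIT_NLGRP_READLOG = 1 are copied from the Linux uapi headers as an assumption, since Python's socket module may not export them; the helper name open_audit_socket is hypothetical.

```python
import errno
import socket
import sys

# Assumed constants, copied from the Linux uapi headers
# (<linux/netlink.h> and <linux/audit.h>).
NETLINK_AUDIT = 9
AUDIT_NLGRP_READLOG = 1  # multicast group carrying kernel audit records


def open_audit_socket():
    """Hypothetical helper: subscribe to kernel audit messages if allowed.

    Returns a bound socket, or None when the kernel refuses. None is a normal
    outcome (e.g. EPERM in an unprivileged container), not a fatal error.
    """
    try:
        sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_AUDIT)
    except (AttributeError, OSError) as exc:
        # AttributeError: this platform has no AF_NETLINK (not Linux).
        print(f"audit: cannot create socket, ignoring: {exc}", file=sys.stderr)
        return None
    try:
        # Netlink addresses are (port id, group bitmask); port id 0 lets the
        # kernel pick one. Group 1 corresponds to bitmask 1 << (1 - 1) == 1.
        sock.bind((0, 1 << (AUDIT_NLGRP_READLOG - 1)))
    except OSError as exc:
        sock.close()
        reason = "EPERM" if exc.errno == errno.EPERM else exc.strerror
        print(f"audit: cannot join read-log group ({reason}), ignoring",
              file=sys.stderr)
        return None
    return sock


if __name__ == "__main__":
    audit = open_audit_socket()
    # The rest of the service starts either way; audit input is optional.
    print("audit messages:", "enabled" if audit else "disabled")
    if audit is not None:
        audit.close()
```

Run unprivileged (or in a container like the one in the test case), this would be expected to report "disabled" and keep going, which is the behaviour the report asks of journald.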

Martin Pitt (pitti) wrote :
Changed in systemd (Ubuntu Wily):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Martin Pitt (pitti)
tags: added: systemd-boot
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu Vivid):
status: New → Confirmed
Martin Pitt (pitti) wrote :

Fixed upstream:
  http://cgit.freedesktop.org/systemd/systemd/commit/?id=417a7fdc418
  http://cgit.freedesktop.org/systemd/systemd/commit/?id=01906c76c

However, there's one more detail to fix in unprivileged containers:

root@v:/# getpcaps $$
Capabilities for `608': = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_syslog,cap_wake_alarm,cap_block_suspend,37+ep

The cap_audit_* are a lie, the audit subsystem in current kernels isn't namespace aware and thus unprivileged containers can't have these caps. The failed systemd-journald-audit.socket unit there isn't a big deal, but this should be fixed in LXC.

description: updated
Changed in lxc (Ubuntu Wily):
importance: Undecided → Medium
Martin Pitt (pitti) wrote :
Changed in systemd (Ubuntu Wily):
status: In Progress → Fix Committed
Changed in systemd (Ubuntu Vivid):
status: Confirmed → In Progress
no longer affects: lxc (Ubuntu Vivid)
no longer affects: lxc (Ubuntu Wily)
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1457054] Re: journal is broken in unprivileged LXC and nspawn containers

Quoting Martin Pitt (<email address hidden>):
> The cap_audit_* are a lie, the audit subsystem in current kernels isn't

To be pedantic, it is not a lie - you have that capability against your
own user namespace, but the only check for that capability is explicitly
against the initial user namespace.

But it certainly seems the easiest (short-term) workaround is to drop
that capability. Unfortunately that will be tough to coordinate with the
(soon-coming) namespaced audit. If we drop it now in container configs,
how do we tell userspace to re-enable it when available? The cleaner
way from our pov would be for systemd to check using bind() whether it
has the access. Then as soon as the kernel provides the ability to
do that in a non-init userns, containers could use it.

To put it another way, a check of the capability bounding set is always
explicitly a check for capabilities against your user namespace. If the
question is "can I read audit logs", then "do I have CAP_AUDIT_READ in
my bounding set" is simply the wrong check.
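The difference between the two checks can be made concrete with a small, Linux-only Python sketch (hypothetical helper names, not code from systemd or LXC). The first function reads the CapBnd mask from /proc/self/status and tests bit 37, CAP_AUDIT_READ (value assumed from <linux/capability.h>), which is roughly the bounding-set question. The second simply attempts the bind() proposed above. In an unprivileged container like the one in this report, one would expect the first to answer yes and the second no. (The bare "37" at the end of the getpcaps output above is presumably CAP_AUDIT_READ, which that libcap had no name for yet.)

```python
import socket

# Assumed constants from the Linux uapi headers.
CAP_AUDIT_READ = 37      # <linux/capability.h>
NETLINK_AUDIT = 9        # <linux/netlink.h>
AUDIT_NLGRP_READLOG = 1  # <linux/audit.h>


def cap_in_bounding_set(cap, status_text):
    """Bounding-set check: is bit `cap` set in the CapBnd line?

    `status_text` is the contents of /proc/<pid>/status. Inside a user
    namespace this only says the capability is held against *that* namespace.
    """
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            mask = int(line.split()[1], 16)
            return bool((mask >> cap) & 1)
    raise ValueError("no CapBnd line in status text")


def can_bind_audit_readlog():
    """Operational check: try the bind() and see whether the kernel allows it."""
    try:
        sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_AUDIT)
    except (AttributeError, OSError):
        return False
    with sock:
        try:
            # (port id 0 = kernel assigns, bitmask for group 1)
            sock.bind((0, 1 << (AUDIT_NLGRP_READLOG - 1)))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            status = f.read()
        print("CAP_AUDIT_READ in bounding set:",
              cap_in_bounding_set(CAP_AUDIT_READ, status))
    except OSError:
        print("no /proc/self/status (not Linux?)")
    print("bind to audit read-log group works:", can_bind_audit_readlog())
```

The appeal of the bind() probe, as argued above, is that it needs no coordination: if a future kernel allows audit reads in a non-init user namespace, the probe would start succeeding without any change to container configs, whereas the bounding-set test would give the same answer either way.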

Martin Pitt (pitti) wrote :

> To be pedantic, it is not a lie - you have that capability against your own user namespace,

Ah, so that says "you can do it", but it's never actually going to work? I guess that's just another expression of audit not working in namespaces, then.

> Unfortunately that will be tough to coordinate with the (soon-coming) namespaced audit.

Ooh, is that coming? Then I guess we shouldn't bother much, it's not an important problem. For the most part unpriv containers work fine now.

Changed in lxc (Ubuntu):
status: New → Won't Fix
Martin Pitt (pitti)
description: updated
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 219-10ubuntu1

---------------
systemd (219-10ubuntu1) wily; urgency=medium

  * Merge with Debian experimental branch. Remaining Ubuntu changes:
    - Hack to support system-image read-only /etc, and modify files in
      /etc/writable/ instead.
    - Keep our much simpler udev maintainer scripts (all platforms must
      support udev, no debconf).
    - initramfs init-top: Drop $ROOTDELAY, we do that in a more sensible way
      with wait-for-root. Will get applicable to Debian once Debian gets
      wait-for-root in initramfs-tools.
    - initramfs init-bottom: If LVM is installed, settle udev,
      otherwise we get missing LV symlinks. Workaround for LP #1185394.
    - Add debian/udev.lvm2.init: Dummy SysV init script to satisfy insserv
      dependencies to "lvm2" which is handled with udev rules in Ubuntu.
    - Add debian/udev.lvm2.service to avoid running the dummy lvm2 init
      script.
    - Provide shutdown fallback for upstart. (LP: #1370329)
    - debian/extra/ifup@.service: Additionally run for "auto" class. We don't
      really support "allow-hotplug" in Ubuntu at the moment, so we need to
      deal with "auto" devices appearing after "/etc/init.d/networking start"
      already ran. (LP: #1374521) Also run ifup in the background during boot,
      to avoid blocking network.target. (LP: #1425376)
    - ifup@.service: Drop dependency on networking.service (i. e.
      /etc/init.d/networking), and merely ensure that /run/network exists.
      This avoids unnecessary dependencies/waiting during boot and dependency
      cycles if hooks wait for other interfaces to come up (like ifenslave
      with bonding interfaces). (LP: #1414544)
    - Add Get-RTC-is-in-local-time-setting-from-etc-default-rc.patch: In
      Ubuntu we currently keep the setting whether the RTC is in local or UTC
      time in /etc/default/rcS "UTC=yes|no", instead of /etc/adjtime.
      (LP: #1377258)
    - Put session scopes into all cgroup controllers. This makes unprivileged
      user LXC containers work under systemd. (LP: #1346734)
    - systemctl: Don't forward telinit u to upstart. This works around
      upstart's Restart() always reexec'ing /sbin/init on Restart(), even if
      that changes to point to systemd during the upgrade. This avoids running
      systemd during a dist-upgrade. (LP: #1430479)
    - Drop hwdb-update dependency from udev-trigger.service, which got
      introduced in v219-stable. This causes udev and plymouth to start too
      late and isn't really needed in Ubuntu yet as we don't support stateless
      systems yet and handle hwdb.bin updates through dpkg triggers. This can
      be dropped again with initramfs-tools 0.117.
    - Lower Breaks: to plymouth version which has the udev inotify fix in
      Ubuntu.
    - Lower libapparmor dep to the Ubuntu version where it moved to /lib.
    - Lower apparmor Breaks: to the Ubuntu version that dropped $remote_fs.
    - Change systemd-sysv's conflicts to upstart-sysv. (LP: #1422681)
    - Make failure of boot-and-services NSpawn.test_boot non-fatal for now.
      This currently fails when being triggered by Jenkins, but is totally
     ...


Changed in systemd (Ubuntu Wily):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Martin, or anyone else affected,

Accepted systemd into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/219-7ubuntu6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in systemd (Ubuntu Vivid):
status: In Progress → Fix Committed
tags: added: verification-needed
Stéphane Graber (stgraber) wrote :

I can confirm that unprivileged vivid containers now start properly with the package from proposed.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 219-7ubuntu6

---------------
systemd (219-7ubuntu6) vivid; urgency=medium

  * Fix assertion crash with empty Exec*= paths. (LP: #1454173)
  * systemd-fsckd autopkgtest: Stop assuming that
    /etc/default/grub.d/90-autopkgtest.cfg exists.
  * systemd-fsckd autopkgtest: Add missing plymouth test dependency.
  * debian/tests/boot-smoke: Allow 10 seconds for systemd jobs to settle down.
  * Fix "tentative" state of devices which are not in /dev (mostly in
    containers), and avoid overzealous cleanup unmounting of mounts from them.
    (LP: #1444402)
  * journal: Gracefully handle failure to bind to audit socket, which is known
    to fail in namespaces (containers) with current kernels. Also
    conditionalize systemd-journald-audit.socket on CAP_AUDIT_READ.
    (LP: #1457054)
  * Add sigpwr-container-shutdown.service: Power off when receiving SIGPWR in
    a container. This makes lxc-stop work for systemd containers.
    (LP: #1457321)

 -- Martin Pitt <email address hidden> Thu, 21 May 2015 14:47:46 +0200

Changed in systemd (Ubuntu Vivid):
status: Fix Committed → Fix Released
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Mathew Hodson (mhodson)
Changed in systemd (Ubuntu Vivid):
importance: Undecided → Medium