zfs-initramfs fails with multiple rpool on separate disks

Bug #1867007 reported by Kevin Menard
This bug affects 1 person
Affects             Status        Importance  Assigned to  Milestone
grub2 (Ubuntu)      Triaged       Low         Unassigned
systemd (Ubuntu)    Invalid       Low         Unassigned
zfs-linux (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

== Test Case ==
1. On a multi-disk setup, install Ubuntu with ZFS on disk 1
2. Reboot and make sure everything works as expected
3. Do a second installation and install Ubuntu with ZFS on disk 2
4. Reboot

* Expected Result *
GRUB should display all the machines available and let the user select which installation to boot

* Actual result *
- Only one machine is listed
- The initramfs crashes because there are several pools with the same name but different IDs, and it imports pools by name
- The systemd generator has the same problem: it tries to import all the rpools.

== Original Description ==

I had an old Ubuntu installation that used a ZFS root, using the layout described in the ZFS on Linux docs. Consequently, the pool name for my Ubuntu installation was "rpool". I'm currently hitting an issue with that pool that only allows me to mount it read-only. So, I'd like to replicate the datasets from there to a new device.

On the new device, I've set up a ZFS system using the Ubuntu 20.04 daily installer (March 9, 2020). This setup creates a new pool named "rpool". So, with both devices inserted, I have two distinct pools each named "rpool", one of which will cause a kernel panic if I try to mount it read-write.

ZFS is fine with having multiple pools with the same name. In these cases, you run `zpool import` with the pool's GUID and give it a distinct name on import (see the sketch below). However, the grub config for booting from ZFS doesn't appear to handle multiple pools named "rpool" very well: rather than using the pool's GUID, it uses the name, and as such it's unable to boot properly when another pool named "rpool" is attached to the system.
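As an illustration (the GUID and the new pool name below are placeholders, not values from this system), importing one of two same-named pools by its numeric GUID looks like:

  zpool import                               # with no arguments, lists importable pools and their GUIDs
  zpool import 9837605127936528432 oldrpool  # import the pool with that GUID under the new name "oldrpool"

This is why duplicate names are not a problem for ZFS itself; only tooling that imports strictly by name trips over them.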

I think it'd be better if the config were written so that `update-grub` generated a boot config bound to whatever pool it found at the time of its invocation, rather than searching through all pools dynamically at boot. To be clear, I have an Ubuntu 20.04 system with a ZFS root that boots just fine. But the moment I attach the old pool, also named "rpool", I'm no longer able to boot my system, even though I haven't removed the good pool and I haven't re-run `update-grub`. Instead of booting, I'm dropped to the grub command line.
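For context, the boot entries that update-grub generates hand the root dataset to the kernel by pool name; a rough sketch (the ubuntu_abcdef dataset suffix and kernel version are placeholders, not taken from my machine):

  linux  /BOOT/ubuntu_abcdef@/vmlinuz-5.4.0-18-generic root=ZFS=rpool/ROOT/ubuntu_abcdef ro
  initrd /BOOT/ubuntu_abcdef@/initrd.img-5.4.0-18-generic

Because root=ZFS=rpool/... carries only the name, the initramfs has no way to tell two pools called "rpool" apart.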

Kevin Menard (nirvdrum)
description: updated
Kevin Menard (nirvdrum)
tags: added: zfs
affects: grub2 (Ubuntu) → zfs-linux (Ubuntu)
Changed in zfs-linux (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
summary: - ZFS won't boot if multiple rpools found
+ zfs-initramfs fails with multiple rpool on separate disks
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

Thanks for your report. I reproduced the issue.

There are actually three different points to fix, in grub and in the initramfs, so that grub displays individual entries per machine and the pools are imported by ID to prevent the name clash.

description: updated
Changed in grub2 (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Kevin Menard (nirvdrum)
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.8.3-1ubuntu10

---------------
zfs-linux (0.8.3-1ubuntu10) focal; urgency=medium

  [ Jean-Baptiste Lallement ]
  [ Didier Roche ]
  * Make debian/patches/4000-zsys-support.patch more robust in case of
    corruption and faster:
    - Don’t call out to zsys if the system hasn’t changed compared to its
      cache. This drastically speeds up boot by a few seconds.
    - Drop into emergency mode, with an error message and the failing command
      output in dmesg, if any errors happen during a revert. This prevents a
      boot that would have wrecked the root partition by creating a non-empty
      /boot directory (like /boot/grub) and mounting over it. The user can
      still boot without reverting in order to fix the issue.
    - Base what we import on the cache file so that, in the future (once grub
      is fixed), booting with multiple rpool and bpool works. (LP: #1867007)
  * debian/patches/4500-fix-generator-invalid-cache.patch:
    - Removed, not needed anymore with previous patch enhancement.

 -- Didier Roche <email address hidden> Mon, 30 Mar 2020 17:05:32 +0200
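For reference, the cache-based import mentioned in the changelog above can be exercised by hand; as an illustration (assuming the default cache file path), importing only the pools recorded in the cache avoids scanning every pool that happens to be named rpool:

  zpool import -c /etc/zfs/zpool.cache -aN   # import all pools listed in the cache file, without mounting them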

Changed in zfs-linux (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Balint Reczey (rbalint) wrote :

Is there anything left to do in systemd?

Changed in systemd (Ubuntu):
status: New → Incomplete
Revision history for this message
Didier Roche-Tolomelli (didrocks) wrote :

Hey Balint. I only added the task after the ZFS upload (the upload was yesterday and I added the task this morning), so indeed there is some work needed, part of it in systemd.

Basically, systemd isn’t capable of mounting datasets when pool names are duplicated on a machine.
zfs-mount-generator generates .mount units that reference datasets by pool name (see the sketch below). For all pools matching the desired name, systemd needs to either:
- prefer the pool ID matching zpool.cache,
- check every pool for the dataset and import the first matching one (same dataset path), or
- make the .mount unit able to import by ID, with zfs-mount-generator upstream recording the pool ID somewhere in the unit file.
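For illustration, a unit produced by zfs-mount-generator looks roughly like the following (the dataset name and mountpoint are hypothetical, not copied from a real system); note that What= carries only the pool name, which is exactly what becomes ambiguous when two pools are both called rpool:

  # /run/systemd/generator/home.mount (illustrative sketch)
  [Unit]
  Before=local-fs.target

  [Mount]
  Where=/home
  What=rpool/USERDATA/home_abcdef
  Type=zfs
  Options=defaults,zfsutil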

Changed in systemd (Ubuntu):
status: Incomplete → Confirmed
status: Confirmed → Triaged
Balint Reczey (rbalint)
Changed in systemd (Ubuntu):
importance: Undecided → Medium
Revision history for this message
satmandu (satadru-umich) wrote :

I'm not sure which package to file a bug report against, but I have multiple pools, and after this last update bpool wouldn't mount properly due to files already present on /boot; as a result, update-grub wrote a grub.cfg with no info about my kernels. :/

Revision history for this message
Didier Roche-Tolomelli (didrocks) wrote :

This is probably because your bpool is not in the zfs cache file.

Either reinstall from the beta image which has a fix in the installer, or:
- clean up any files and directories (after unmounting /boot/grub and /boot/efi) under /boot (not /boot itself)
- zpool import bpool
- zpool set cachefile= bpool
- sudo mount -a (to remount /boot/grub and /boot/efi)
- update-grub

-> you shouldn’t have any issues on reboot anymore, and the system will be equivalent to a new install from the beta image.
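As a quick, illustrative sanity check afterwards (not part of the original instructions), you can confirm the pool is back on the default cache file, i.e. recorded in /etc/zfs/zpool.cache:

  zpool get cachefile bpool   # a value of "-" with SOURCE "default" means the default cache file is used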

tags: added: rls-ff-incoming
Revision history for this message
Steve Langasek (vorlon) wrote :

Didier, there is a long-standing grub2 task on this bug, but from the comment history it's not clear to me whether further changes are required in grub or what those are; can you have another look at this, please? (Given that the description is about "zfs-initramfs" failing, I would think this isn't a grub bug, but I'm not sure.)

Changed in grub2 (Ubuntu):
assignee: nobody → Didier Roche (didrocks)
Revision history for this message
Didier Roche-Tolomelli (didrocks) wrote :

I will have a look (I don’t remember if the grub task is due to the grub.cfg generation or to the grub code itself), but TBH this is low priority on my list (downgrading the bug task priority accordingly, as this is a multi-system corner case).

Changed in systemd (Ubuntu):
importance: Medium → Low
Changed in grub2 (Ubuntu):
importance: Medium → Low
tags: added: rls-ff-notfixing
removed: rls-ff-incoming
Revision history for this message
Dan Streetman (ddstreet) wrote :

please reopen if this is still an issue

Changed in systemd (Ubuntu):
status: Triaged → Invalid
Changed in grub2 (Ubuntu):
assignee: Didier Roche (didrocks) → nobody