Ubuntu 20.04.2 LTS kernel 5.11.0-25 zfs send | receive broken

Bug #1939177 reported by Tim K.
This bug affects 9 people
Affects             Status        Importance  Assigned to          Milestone
zfs-linux (Ubuntu)  Fix Released  High        Dimitri John Ledkov
  Focal             Fix Released  High        Unassigned

Bug Description

== SRU Justification Focal ==

[Impact]

https://github.com/openzfs/zfs/issues/12462

Ubuntu 20.04.2 LTS
Kernel: 5.11.0-25-generic #27~20.04.1-Ubuntu
zfs-0.8.3-1ubuntu12.12
zfs-kmod-2.0.2-1ubuntu5

Trying to run zfs send | receive and getting an error:

# zfs send 'rpool/home'@'autosnap_2020-08-01_00:59:01_monthly' | zfs receive -s -F 'nas/rpool_backup/home'
cannot receive: failed to read from stream
cannot receive new filesystem stream: dataset does not exist

This used to work before the recent Ubuntu kernel update from 5.8 to 5.11
Kernel 5.8 came with zfs-kmod-0.8.4-1ubuntu11.2

Ubuntu updates that broke it:

Upgrade: linux-headers-generic-hwe-20.04:amd64 (5.8.0.63.71~20.04.45, 5.11.0.25.27~20.04.10), linux-image-generic-hwe-20.04:amd64 (5.8.0.63.71~20.04.45, 5.11.0.25.27~20.04.10), linux-generic-hwe-20.04:amd64 (5.8.0.63.71~20.04.45, 5.11.0.25.27~20.04.10)

Sending the zfs send part to a file works, but then sending the file to zfs receive also fails. The dump file size seems reasonable but the contents may not be correct.
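
The mismatch reported above (0.8.x userspace against a 2.x kernel module) can be spotted from the output of zfs version. The following is a minimal sketch, assuming the two-line output format shown in this report; the function name zfs_series_check is made up for illustration and is not part of any ZFS tooling.

```shell
#!/bin/sh
# Hypothetical helper: reads `zfs version` output on stdin and reports
# whether userspace and kernel module are from the same major.minor series.
# Assumes the two-line format shown in this report:
#   zfs-0.8.3-1ubuntu12.12
#   zfs-kmod-2.0.2-1ubuntu5
zfs_series_check() {
    awk '
        # Kernel module line: keep only major.minor (e.g. "2.0").
        /^zfs-kmod-/ { sub(/^zfs-kmod-/, ""); split($0, k, "[.]"); kmod = k[1] "." k[2]; next }
        # Userspace line: keep only major.minor (e.g. "0.8").
        /^zfs-/      { sub(/^zfs-/, "");      split($0, u, "[.]"); user = u[1] "." u[2] }
        END {
            if (user == "" || kmod == "") { print "unknown"; exit 2 }
            if (user == kmod) { print "match " user; exit 0 }
            print "mismatch userspace=" user " kmod=" kmod; exit 1
        }
    '
}
```

On a live system this would be used as `zfs version | zfs_series_check`. The exit status is 0 for a match, 1 for a mismatch, and 2 if either line is missing. A mismatch only indicates the mixed configuration described here, not that send/receive will necessarily fail.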

[Test Plan]

1. create test pool and backup pool

sudo zpool create pool /dev/vdb1
sudo zpool create backup /dev/vdc1

2. populate pool with some files and create some snapshots

sudo zfs snapshot pool@now1

create some more files etc, make another snapshot

sudo zfs snapshot pool@now2

3. perform send/recv using -s option:

sudo zfs send pool@now1 | sudo zfs receive -vFs backup
sudo zfs send -i pool@now1 pool@now2 | sudo zfs receive -vFs backup

Without the fix, the receive fails when the -s option is used. With the fix it works fine. Test with the Focal 5.4 and 5.11 kernels to exercise the 0.8.x and 2.x kernel ZFS drivers.
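
The steps above can be sketched as a single script. This is only an illustration: the pool names and device paths are the ones from the test plan, while the dd commands used to populate the pool, the assumption that the pool is mounted at its default /pool mountpoint, and the run/test_plan helper names are assumptions added here. By default the script only prints the commands (DRY_RUN=1), since the real run creates pools on /dev/vdb1 and /dev/vdc1.

```shell
#!/bin/sh
# Sketch of the test plan. With DRY_RUN=1 (the default) commands are
# printed, not executed. Set DRY_RUN=0 only on a disposable test VM.
DRY_RUN="${DRY_RUN:-1}"

# run takes one command string so that pipelines can be passed through.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        sh -c "$*"
    fi
}

test_plan() {
    # 1. create test pool and backup pool
    run "sudo zpool create pool /dev/vdb1" &&
    run "sudo zpool create backup /dev/vdc1" &&
    # 2. populate pool and snapshot (file creation is an assumed example)
    run "sudo dd if=/dev/urandom of=/pool/file1 bs=1M count=16" &&
    run "sudo zfs snapshot pool@now1" &&
    run "sudo dd if=/dev/urandom of=/pool/file2 bs=1M count=16" &&
    run "sudo zfs snapshot pool@now2" &&
    # 3. full and incremental send/recv using the -s option
    run "sudo zfs send pool@now1 | sudo zfs receive -vFs backup" &&
    run "sudo zfs send -i pool@now1 pool@now2 | sudo zfs receive -vFs backup"
}

test_plan
```

Because the steps are chained with &&, a real run (DRY_RUN=0) stops at the first failing command, which on an affected system would be expected to be the first zfs receive.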

[Where problems could occur]

The main fix nullifies the deprecated action_handle option so that it is not checked. This allows the 0.8.x userspace to be forward compatible with the 2.x kernel ZFS driver, and because the option is already deprecated in 0.8.x it makes no difference to the 0.8.x kernel ZFS driver. Thus the risk from the action_handle patch is very small.

Also included is an upstream send/recv bug fix, 4910-Fix-EIO-after-resuming-receive-of-new-dataset-over-a.patch, which makes send/recv more resilient by making zfs receive always unmount and remount the destination, regardless of whether the stream is a new stream or a resumed stream. The change has been upstream for ~10 months and has minimal impact on current recv functionality.

Tags: zfs zfs-kmod
Tim K. (tkubnt)
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu):
status: New → Confirmed
versus167 (wingdvd-2008) wrote :

Same problem here. But I think the transfer is correct:

root@vs-uefi:/home/volker# zfs version
zfs-0.8.3-1ubuntu12.12
zfs-kmod-2.0.2-1ubuntu5

root@vs-uefi:/home/volker# zfs send -i vs2016/source@1 vs2016/source@2 | zfs receive -vFs zfshome/source
receiving incremental stream of vs2016/source@2 into zfshome/source@2
snap zfshome/source@2 already exists; ignoring
cannot receive: failed to read from stream
cannot receive incremental stream: dataset does not exist

root@vs-uefi:/home/volker# zfs list -t snapshot vs2016/source
NAME USED AVAIL REFER MOUNTPOINT
vs2016/source@1 72K - 6,07M -
vs2016/source@2 4,21G - 4,21G -

root@vs-uefi:/home/volker# zfs list -t snapshot zfshome/source
NAME USED AVAIL REFER MOUNTPOINT
zfshome/source@1 18K - 6,28M -
zfshome/source@2 0B - 4,25G -

Colin Ian King (colin-king) wrote :

Appears that the -s option does not work. Can you double check if the recv works when not using the -s option?

Tim K. (tkubnt) wrote :

FWIW I'm using it with sanoid/syncoid (https://github.com/jimsalterjrs/sanoid) which enables -s by default if supported by the zfs version.

syncoid has a --no-resume option that might work (but have yet to try).

Charles Hedrick (hedrick) wrote :

I've also seen this. I wondered if the problem is that the libraries and utilities haven't been updated. I doubt that ZFS 0.8.3 send and receive are intended to run on a ZFS 2.0.2 kernel module.

Tim K. (tkubnt) wrote :

Yeah good point, not sure whether the userspace tools version needs to match the kmod version.

Tim K. (tkubnt) wrote :

I asked in the zfs GitHub issue and the answer is that the versions must match. It might be an incompatibility between zfs-0.8.3-1ubuntu12.12 and zfs-kmod-2.0.2-1ubuntu5, though I'm not sure how easy it is to deal with this when using HWE.

Michael Albert (albertmichaelj) wrote :

I want to confirm that I'm also facing the same issue and that adding the --no-resume option does indeed fix the issue for syncoid.

It does seem like the correct solution is to update the ZFS tools in the HWE stack when the kernel is updated. However, there does not seem to be a way to do this (short of compiling from source, which I would really like to avoid). Is there any possibility of moving the LTS HWE stack to a model where the ZFS tools are updated when the kernel module is updated?

Tim K. (tkubnt) wrote :

I opened a new bug to keep the tools in sync with the module for HWE.
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1939210

Changed in zfs-linux (Ubuntu):
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
description: updated
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu Focal):
status: New → Confirmed
Brian Murray (brian-murray) wrote : Please test proposed package

Hello TimK., or anyone else affected,

Accepted zfs-linux into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.8.3-1ubuntu12.13 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in zfs-linux (Ubuntu Focal):
status: Confirmed → Fix Committed
Simon Déziel (sdeziel) wrote :

It seems the amd64 build failed and left no buildlog to analyze. However the package built fine locally and I can confirm that it fixes the problem. I won't mark it as verified due to the launchpad build failure that probably needs investigation. Thanks Colin!

Simon Déziel (sdeziel) wrote :

Just a quick update to let you know that my local build is still working fine on all 5 test machines.

Lyndon Lapierre (ljlapierre) wrote :

While hot-fixing the issue is fine, I feel like this won't be the last bug like this we run into as long as we choose to mix-and-match versions.

Wouldn't the better solution be adding HWE ZFS userland packages to match the kernel module?

Jason Cullen (jcullen86) wrote :

The amd64 version still isn't built after the build failure, how can we get that build restarted?
Cheers

Tim K. (tkubnt) wrote :

> While hot-fixing the issue is fine, I feel like this won't be the last bug like this we run into as long as we choose to mix-and-match versions.
> Wouldn't the better solution be adding HWE ZFS userland packages to match the kernel module?

My thoughts exactly. Vote here.
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1939210

ianrumford (p-ian-s) wrote :

Hello all,

Hope everyone is well and safe.

This is a plea to push along the resolution of this bug.

I'm heavily invested in zfs and this has been a huge problem for me.

Just like TimK I'm a sanoid/syncoid user and have seen the zfs recv
problem. Changing syncoid's perl to not use "-s" worked for me. But
that's not been an end to my problems.

Originally I thought I had "lost" my main 10TB zpool for, e.g., hardware
reasons, but it has not proved as simple as that. I can import and read from
the zpool but see zfs timeouts regularly in syslog. If I try to write
to this pool it hangs. zfs list hangs as well. Eventually zfs seems
completely "stopped" and I have to reboot.

Assuming my "tank" was trashed, I had been using my (syncoid) backups
to reestablish my files in a new zfs pool on new hardware. Which
worked fine (+1 syncoid).

I've been trying to reestablish backups for my new pool/hardware this
morning and find that whereas I can zfs send a 4.8GB snapshot OK, I cannot
send a 7.2GB (or larger) snapshot - I get the rather anodyne message
"Input/output error".

I have also seen hangs when trying to zfs destroy snapshots on one of
my other pools. Whether the hanging destroy is the root cause or just
a casualty of other zfs stuff going on, I've no information.

I realise that relating the general instability of my system is
pretty vague: it's proving hard to pick apart the problems and
demonstrate repeatable effects.

So I'm eager to try the proposed fix but, as Jason has said, there is
no amd64 version yet in focal-proposed.

Can I encourage you to make the fix available please?

Really appreciate everybody's effort.

Thanks in advance.

Changed in zfs-linux (Ubuntu):
assignee: Colin Ian King (colin-king) → nobody
assignee: nobody → Dimitri John Ledkov (xnox)
Colin Ian King (colin-king) wrote :

I've retriggered the -proposed build, it should be available in the next 24 hours

Lyndon Lapierre (ljlapierre) wrote :

The bugfix has been working as expected over the past 3 days on my 2 systems:

Intel NUC (kernel 5.11.0-37-generic)
GCP Instance (kernel 5.11.0-1020-gcp)

None of my syncoid jobs have failed, and I am no longer using the --no-resume flag (which stops syncoid from passing -s) as a workaround.

Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for zfs-linux has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.8.3-1ubuntu12.13

---------------
zfs-linux (0.8.3-1ubuntu12.13) focal; urgency=medium

  * Fix zfs receive -s when using ZFS 2.x kernel drivers (LP: #1939177)
    - 4910-Fix-EIO-after-resuming-receive-of-new-dataset-over-a.patch
    - 4911-compat-nullify-action-handle.patch

 -- Colin Ian King <email address hidden> Mon, 16 Aug 2021 15:55:52 +0100

Changed in zfs-linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Chris Halse Rogers (raof) wrote :

For some reason this doesn't seem to have the normal SRU verification tags, but this appears to be verified to me. Releasing.

Mathew Hodson (mhodson)
Changed in zfs-linux (Ubuntu):
status: Confirmed → Fix Released
Changed in zfs-linux (Ubuntu Focal):
importance: Undecided → High