Comment 9 for bug 1969247

Dimitri John Ledkov (xnox) wrote :

Given how zfs works, this looks normal and expected to me.

Also this is expected on btrfs, and on xfs with cow turned on.

zfs is a copy-on-write filesystem, so calling fallocate to "reserve" a large amount of free space doesn't make sense: the blocks of the created file are effectively immutable in zfs, and any writes to the file go to newly allocated space in the pool (subject to quota) rather than to the blocks that were "reserved".

There is an option to tune the behaviour of fallocate, documented in zfs(4): https://manpages.ubuntu.com/manpages/jammy/en/man4/zfs.4.html

zfs_fallocate_reserve_percent=110% (uint)

Since ZFS is a copy-on-write filesystem with snapshots, blocks cannot be preallocated for a file in order to guarantee that later writes will not run out of space. Instead, fallocate(2) space preallocation only checks that sufficient space is currently available in the pool or the user's project quota allocation, and then creates a sparse file of the requested size. The requested space is multiplied by zfs_fallocate_reserve_percent to allow additional space for indirect blocks and other internal metadata. Setting this to 0 disables support for fallocate(2) and causes it to return EOPNOTSUPP.
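To make the arithmetic in the man page excerpt concrete, here is a rough model of the check it describes. This is only a sketch of the documented behaviour, not the actual zfs code; the function name and the boolean return value are made up for illustration.

```python
import errno


def fallocate_check(requested_bytes: int, available_bytes: int,
                    reserve_percent: int = 110) -> bool:
    """Toy model of the space check described in zfs(4).

    Hypothetical helper, not part of zfs: returns True when the scaled
    request fits into the space that is currently available.
    """
    if reserve_percent == 0:
        # Per the man page, 0 disables fallocate(2) support entirely.
        raise OSError(errno.EOPNOTSUPP, "fallocate support disabled")
    # The request is scaled up to leave room for indirect blocks and metadata.
    needed = requested_bytes * reserve_percent // 100
    # Nothing is reserved here: this is only a point-in-time comparison.
    return needed <= available_bytes


if __name__ == "__main__":
    gib = 2 ** 30
    # With the default of 110, a 10 GiB request needs 11 GiB to be available.
    print(fallocate_check(10 * gib, 11 * gib))
    print(fallocate_check(10 * gib, 10 * gib))
```

Note that a successful check says nothing about later writes, which can still run out of space.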

This behaviour was added upstream in zfs-0.8.0-847-gf734301d22, which is why you don't see it with the bionic GA kernel but do observe it with bionic HWE kernels or newer.

Anything using fallocate on top of zfs should be aware that fallocate on zfs only proves that there is currently enough space (or quota) in the pool; it does not reserve that space. Thus mysql needs to gain zfs-specific knowledge w.r.t. this.

Alternatively, you can disable fallocate support on your system by adding a modprobe.d snippet that sets the zfs module option zfs_fallocate_reserve_percent to zero.
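As a rough sketch of what that could look like: the helper below renders the modprobe.d line and reads back the value the loaded module is using. The helper names are made up for illustration, and the sysfs path assumes the usual /sys/module/<module>/parameters/ layout with the zfs module loaded. Which file under /etc/modprobe.d/ the line goes into is up to you.

```python
from pathlib import Path
from typing import Optional

# Assumed location: standard sysfs layout for parameters of a loaded module.
PARAM_PATH = Path("/sys/module/zfs/parameters/zfs_fallocate_reserve_percent")


def modprobe_option_line(percent: int) -> str:
    """Render a modprobe.d 'options' line for the zfs module (hypothetical helper)."""
    if percent < 0:
        raise ValueError("percent must be non-negative")
    return f"options zfs zfs_fallocate_reserve_percent={percent}"


def current_reserve_percent(path: Path = PARAM_PATH) -> Optional[int]:
    """Return the value currently in effect, or None if the file is absent."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        # zfs module not loaded, or the parameter does not exist on this version.
        return None


if __name__ == "__main__":
    # Line to place in a file under /etc/modprobe.d/ to disable fallocate support.
    print(modprobe_option_line(0))
    print(current_reserve_percent())
```

The option is read when the module is loaded, so if zfs is loaded from the initramfs on your setup, the initramfs probably needs regenerating before the change takes effect.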