Comment 7 for bug 1587686

LeetMiniWheat (white-phoenix) wrote :

I appreciate you looking into this. I was only able to test your builds for about 5 hours on the generic kernel so far (I'm doing some hardware upgrades at the moment, but my current test system is torture-test stable).

My test hardware was 2x (Westmere) Intel Xeon E5620s (2 NUMA nodes) with 12GB (2GBx6) ECC RDIMMs on each CPU (24GB total), running ubuntu-server 16.04. ztest was run on the default /tmp. However, I had /tmp mounted on tmpfs with a 10G limit, though from what I could tell it was not exceeding that limit.
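For anyone wanting to rule out the tmpfs limit on their own test box, here is a minimal sketch of a check that could be run alongside ztest. The function names and the 90% threshold are made up for illustration; only `df -P` and `awk` are assumed to be available.

```shell
# Print how full the filesystem holding a path is, as a whole percentage.
# (fs_usage_pct is a hypothetical helper name, not part of ztest or ZFS.)
fs_usage_pct() {
    # df -P prints stable POSIX columns; field 5 of line 2 is the capacity, e.g. "37%".
    df -P "${1:-/tmp}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report whether usage is below a threshold (90 is an arbitrary example value).
check_fs_headroom() {
    pct=$(fs_usage_pct "$1")
    if [ "$pct" -ge "${2:-90}" ]; then
        echo "WARNING: $1 is ${pct}% full"
        return 1
    fi
    echo "OK: $1 is ${pct}% full"
}
```

Calling `check_fs_headroom /tmp` periodically during a run would show whether the mount ever gets close to its limit.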

I believe this issue becomes more apparent in 4.4.11 and 4.4.12 (and possibly 4.4.13 now) for some reason, since those were failing for me within a few hours with this "fix" applied, whereas the latest stable I compiled with the fix seemed okay. I think there are race conditions of some sort with newer kernels, especially since I saw different results on the lowlatency kernel a while back (on the same stable release).

I'll do some more testing if I have some time, and I want to test this on some other distros as well. However, I think the fix might not work on future kernel releases that integrate 4.4.11, 4.4.12, and 4.4.13, since some of those patches may have changed core functions in a way that uncovered ZFS bugs again.

It's still possible this somehow only affects my hardware/OS, or that I was compiling the kernel strangely. I was doing a git clone of master-next, checking out the latest stable (detached HEAD), and applying/committing the patch. My 4.4.11 and 4.4.12 builds were manually applied cleanly from upstream on top of xenial master-next (neither was merged into master-next at the time), so that could also have been a possible issue - there were a few redundant patches I skipped that were already in master-next, though.
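Roughly, the checkout-and-patch step looked like the sketch below. This is a reconstruction under assumptions rather than the exact commands I ran: the function name, repository directory, tag, and patch path are all placeholders, and it uses `git apply --index` followed by a commit where `git am` would work equally well for a mailbox-style patch.

```shell
# Check out a tag as a detached HEAD and commit a patch on top of it.
# Arguments (all hypothetical placeholders): repo directory, tag, absolute patch path.
apply_patch_on_tag() {
    repo=$1
    tag=$2
    patch=$3
    # Detach HEAD at the requested tag so no branch is moved.
    git -C "$repo" checkout --quiet --detach "$tag" || return 1
    # Apply the patch to both the working tree and the index.
    git -C "$repo" apply --index "$patch" || return 1
    # Record the result as a new commit on the detached HEAD.
    git -C "$repo" commit --quiet -m "Apply $(basename "$patch")"
}
```

The patch path should be absolute, because `git -C` changes directory before the path is resolved.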

However, the bug still stands on stock stable xenial kernel - and this patch seems to fix it (at least on generic, still unsure about lowlatency).

Compiling Debian/Ubuntu kernels from git is pretty complicated, though, with conflicting documentation. I was using these commands after checking out and applying the patch:
fakeroot debian/rules clean
fakeroot debian/rules updateconfigs
fakeroot debian/rules binary-headers binary-generic binary-perarch
(or binary-lowlatency for lowlatency builds)
I'm not using cloud-tools packages.

Anyway, I guess you can close this, and it can be reopened if I have time to attempt to reproduce the bug. It's not a critical patch, but it's queued for 0.6.5-release upstream, so there's probably no harm in including it in the Ubuntu kernel.

Thanks