BUG: kernel NULL pointer dereference, address: 0000000000000000

Bug #1900889 reported by Simon Déziel
This bug affects 2 people
Affects             Status        Importance  Assigned to     Milestone
zfs-linux (Ubuntu)  Fix Released  High        Colin Ian King
Focal               Invalid       High        Unassigned
Groovy              Invalid       High        Unassigned
Hirsute             Fix Released  High        Colin Ian King

Bug Description

While zfs send'ing from Bionic to Focal, my send/recv hung midway and I found this in the receiver's dmesg:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 94310 Comm: receive_writer Tainted: P O 5.4.0-52-generic #57-Ubuntu
Hardware name: System manufacturer System Product Name/C60M1-I, BIOS 0502 05/22/2014
RIP: 0010:abd_verify+0xa/0x40 [zfs]
Code: ff 85 c0 74 12 48 c7 03 00 00 00 00 48 c7 43 08 00 00 00 00 5b 5d c3 e8 04 ff ff ff eb e7 c3 90 55 48 89 e5 41 54 53 48 89 fb <8b> 3f e8 0f ff ff ff 85 c0 75 22 44 8b 63 1c 48 8b 7b 20 4d 85 e4
RSP: 0018:ffffb797c555baa8 EFLAGS: 00010286
RAX: 0000000000004000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
RBP: ffffb797c555bab8 R08: 0000000000000253 R09: 0000000000000000
R10: ffff953b56a17848 R11: 0000000000000000 R12: 0000000000004000
R13: ffff953ad201d280 R14: 0000000000004000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff953b56a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000151ab4000 CR4: 00000000000006f0
Call Trace:
 abd_borrow_buf+0x19/0x60 [zfs]
 abd_borrow_buf_copy+0x1a/0x50 [zfs]
 zio_crypt_copy_dnode_bonus+0x30/0x130 [zfs]
 arc_buf_untransform_in_place.isra.0+0x2b/0x40 [zfs]
 arc_buf_fill+0x1f0/0x4a0 [zfs]
 arc_untransform+0x22/0x90 [zfs]
 dbuf_read_verify_dnode_crypt+0xed/0x160 [zfs]
 ? atomic_cmpxchg+0x16/0x30 [zfs]
 dbuf_read_impl+0x3ea/0x610 [zfs]
 dbuf_read+0xcb/0x5f0 [zfs]
 ? arc_space_consume+0x54/0xe0 [zfs]
 ? do_raw_spin_unlock+0x9/0x10 [zfs]
 ? __raw_spin_unlock+0x9/0x10 [zfs]
 dmu_bonus_hold_by_dnode+0x92/0x190 [zfs]
 receive_object+0x442/0xae0 [zfs]
 ? __list_del_entry.isra.0+0x22/0x30 [zfs]
 ? atomic_dec+0xd/0x20 [spl]
 receive_process_record+0x170/0x1c0 [zfs]
 receive_writer_thread+0x9a/0x150 [zfs]
 ? receive_process_record+0x1c0/0x1c0 [zfs]
 thread_generic_wrapper+0x83/0xa0 [spl]
 kthread+0x104/0x140
 ? clear_bit+0x20/0x20 [spl]
 ? kthread_park+0x90/0x90
 ret_from_fork+0x22/0x40
Modules linked in: ip6table_filter ip6_tables xt_conntrack iptable_filter bpfilter zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) nls_iso8859_1 zlua(PO) eeepc_wmi asus_wmi sparse_keymap wmi_bmof video ccp radeon kvm r8169 realtek k10temp ttm i2c_piix4 drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt wmi sch_fq_codel nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 veth bridge 8021q garp mrp stp llc xt_tcpudp xt_owner xt_LOG nf_log_ipv6 nf_log_ipv4 nf_log_common drm ip_tables x_tables autofs4 btrfs libcrc32c xor zstd_compress raid6_pq hid_generic usbhid hid ahci libahci mac_hid
CR2: 0000000000000000
---[ end trace 374aa76997d6bc9b ]---
RIP: 0010:abd_verify+0xa/0x40 [zfs]
Code: ff 85 c0 74 12 48 c7 03 00 00 00 00 48 c7 43 08 00 00 00 00 5b 5d c3 e8 04 ff ff ff eb e7 c3 90 55 48 89 e5 41 54 53 48 89 fb <8b> 3f e8 0f ff ff ff 85 c0 75 22 44 8b 63 1c 48 8b 7b 20 4d 85 e4
RSP: 0018:ffffb797c555baa8 EFLAGS: 00010286
RAX: 0000000000004000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
RBP: ffffb797c555bab8 R08: 0000000000000253 R09: 0000000000000000
R10: ffff953b56a17848 R11: 0000000000000000 R12: 0000000000004000
R13: ffff953ad201d280 R14: 0000000000004000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff953b56a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000151ab4000 CR4: 00000000000006f0

The receiving side uses ZFS native encryption and had the key manually loaded before sending/receiving. The sending side is unencrypted. The recv hung after 611MiB out of the 990.4 MB delta.

Additional information:

The sending side is a laptop running Bionic:

$ uname -a
Linux simon-lemur 5.4.0-52-generic #57~18.04.1-Ubuntu SMP Thu Oct 15 14:04:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg -l| grep zfs
ii libzfs2linux 0.7.5-1ubuntu16.10 amd64 OpenZFS filesystem library for Linux
ii zfsutils-linux 0.7.5-1ubuntu16.10 amd64 command-line tools to manage OpenZFS filesystems

The receiving side is a small server running Focal:

$ uname -a
Linux ocelot 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg -l| grep zfs
ii libzfs2linux 0.8.3-1ubuntu12.4 amd64 OpenZFS filesystem library for Linux
ii zfs-zed 0.8.3-1ubuntu12.4 amd64 OpenZFS Event Daemon
ii zfsutils-linux 0.8.3-1ubuntu12.4 amd64 command-line tools to manage OpenZFS filesystems

Changed in zfs-linux (Ubuntu):
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Simon Déziel (sdeziel) wrote :

@Colin, if there's anything I can provide to you please let me know.

For what it's worth, rebooting the receiver (Focal) allowed me to resume the send, which then ran to completion.

Colin Ian King (colin-king) wrote :

I think this may be a race condition, in which case reproducing this issue and testing a fix may be problematic.

I've created a potential fix and tested it against our internal regression tests, so it may be worth trying it to see if the issue still occurs with the fix.

To try this out do the following:

sudo add-apt-repository ppa:colin-king/zfs-src-1900889
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install zfs-dkms

This will pull in the fixed version including the kernel ZFS driver - the DKMS ZFS driver may take several minutes to build. Once done, a reboot is required.
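
After the reboot, it can help to confirm that the module actually loaded is the test build rather than the stock one. The following is only a minimal Python sketch: it assumes the loaded module exposes its version string at /sys/module/zfs/version, and that the test build can be recognised by an "lp1900889" marker in that string. The function names and the marker check are illustrative, not part of any ZFS tooling.

```python
from pathlib import Path


def zfs_module_version(path="/sys/module/zfs/version"):
    """Return the version string of the loaded ZFS module.

    Returns None if the file cannot be read, e.g. because the module
    is not loaded. The default sysfs path is an assumption.
    """
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def is_test_build(version, marker="lp1900889"):
    """Guess whether a version string belongs to the test build.

    The marker is an assumed naming convention for the PPA package.
    """
    return version is not None and marker in version


if __name__ == "__main__":
    version = zfs_module_version()
    if version is None:
        print("ZFS module version not available (module not loaded?)")
    elif is_test_build(version):
        print(f"Test build loaded: {version}")
    else:
        print(f"Different build loaded: {version}")
```

If the stock version is still reported, the DKMS build may not have finished or the reboot may not have picked up the new module.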

Simon Déziel (sdeziel) wrote :

Indeed, I have no way of reproducing this reliably, unfortunately. I've installed the PPA you provided (thanks!):

# dmesg | grep ZFS
[ 18.284162] ZFS: Loaded module v0.8.3-1ubuntu12.5~lp1900889, ZFS pool version 5000, ZFS filesystem version 5

So far, no problem to report and I'll let you know how it goes over the next few days.

Simon Déziel (sdeziel) wrote :

FYI, still no problem with your package from the PPA. Thanks again

Colin Ian King (colin-king) wrote :

Thanks Simon. Let's let this soak test for a few more weeks and then I'll SRU this fix.

Simon Déziel (sdeziel) wrote :

@colin-king, it's been going well for the past month.

Colin Ian King (colin-king) wrote :

Thanks for reporting back. I'll get this into the stable release updates in a while.

Changed in zfs-linux (Ubuntu Groovy):
importance: Undecided → High
Changed in zfs-linux (Ubuntu Focal):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.8.4-1ubuntu15

---------------
zfs-linux (0.8.4-1ubuntu15) hirsute; urgency=medium

  * Fix null pointer dereference during zfs send (LP: #1900889)
    - 4701-enable-ARC-FILL-LOCKED-flag.patch

 -- Colin Ian King <email address hidden> Fri, 27 Nov 2020 12:22:22 +0000

Changed in zfs-linux (Ubuntu Hirsute):
status: New → Fix Released
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu Focal):
status: New → Confirmed
Changed in zfs-linux (Ubuntu Groovy):
status: New → Confirmed
Christian Castelli (voodoo81people) wrote :

Got the same just a sec ago:

---
apr 14 09:45:02 thecastles kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
apr 14 09:45:02 thecastles kernel: #PF: supervisor read access in kernel mode
apr 14 09:45:02 thecastles kernel: #PF: error_code(0x0000) - not-present page
apr 14 09:45:02 thecastles kernel: PGD 0 P4D 0
apr 14 09:45:02 thecastles kernel: Oops: 0000 [#1] SMP NOPTI
apr 14 09:45:02 thecastles kernel: CPU: 1 PID: 24409 Comm: ThreadPoolForeg Tainted: G OE 5.8.0-48-generic #54~20.04.1-Ubuntu
apr 14 09:45:02 thecastles kernel: Hardware name: LENOVO 90J0000VIX/36EE, BIOS O3TKT50A 09/01/2020
apr 14 09:45:02 thecastles kernel: RIP: 0010:unlink_anon_vmas+0x22/0x1b0
apr 14 09:45:02 thecastles kernel: Code: 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 4c 8d 6f 78 41 54 53 48 83 ec 08 48 8b 47 78 48 89 7d d0 <48> 8b 30 49 39 c5 0f 84 5f 01 00 00 4c 8d 70 f0 4c 8d 66 f0 31 db
apr 14 09:45:02 thecastles kernel: RSP: 0018:ffffa29303e0bba8 EFLAGS: 00010292
apr 14 09:45:02 thecastles kernel: RAX: 0000000000000000 RBX: ffff8bd2ada57520 RCX: 0000000000000000
apr 14 09:45:02 thecastles kernel: RDX: 00007f6ec0231000 RSI: ffff8bd57bb8d3d8 RDI: ffff8bd2ada57520
apr 14 09:45:02 thecastles kernel: RBP: ffffa29303e0bbd8 R08: 00007f6ec0231000 R09: ffffffff9ca83800
apr 14 09:45:02 thecastles kernel: R10: ffff8bd3a2289b00 R11: 0000000000000001 R12: 00007f6eaf97d000
apr 14 09:45:02 thecastles kernel: R13: ffff8bd2ada57598 R14: 0000000000000000 R15: ffff8bd2ada56410
apr 14 09:45:02 thecastles kernel: FS: 00007f6ea609f700(0000) GS:ffff8bd57ee40000(0000) knlGS:0000000000000000
apr 14 09:45:02 thecastles kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
apr 14 09:45:02 thecastles kernel: CR2: 0000000000000000 CR3: 000000020075e000 CR4: 00000000003406e0
apr 14 09:45:02 thecastles kernel: Call Trace:
apr 14 09:45:02 thecastles kernel: free_pgtables+0x93/0xf0
apr 14 09:45:02 thecastles kernel: exit_mmap+0xc7/0x1b0
apr 14 09:45:02 thecastles kernel: mmput+0x5d/0x130
apr 14 09:45:02 thecastles kernel: begin_new_exec+0x431/0x9d0
apr 14 09:45:02 thecastles kernel: load_elf_binary+0x145/0xdc0
apr 14 09:45:02 thecastles kernel: ? ima_bprm_check+0x89/0xb0
apr 14 09:45:02 thecastles kernel: exec_binprm+0x134/0x430
apr 14 09:45:02 thecastles kernel: __do_execve_file.isra.0+0x50d/0x7d0
apr 14 09:45:02 thecastles kernel: __x64_sys_execve+0x39/0x50
apr 14 09:45:02 thecastles kernel: do_syscall_64+0x49/0xc0
apr 14 09:45:02 thecastles kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
apr 14 09:45:02 thecastles kernel: RIP: 0033:0x7f6ec03192fb
apr 14 09:45:02 thecastles kernel: Code: Unable to access opcode bytes at RIP 0x7f6ec03192d1.
apr 14 09:45:02 thecastles kernel: RSP: 002b:00007f6ea609dad8 EFLAGS: 00000212 ORIG_RAX: 000000000000003b
apr 14 09:45:02 thecastles kernel: RAX: ffffffffffffffda RBX: 000009d4dfa88f80 RCX: 00007f6ec03192fb
apr 14 09:45:02 thecastles kernel: RDX: 000009d4de902380 RSI: 000009d4dfa88f80 RDI: 00007f6ea609dae0
apr 14 09:45:02 thecastles kernel: RBP: 00007f6ea609dc20 R08: 0000000000000011 R09: 00007f6ea609da26
apr 14 09:45:02 thecastles kernel: R10: 0000000000000000 R11: 0000000000...


Ronn (rong2604)
Changed in zfs-linux (Ubuntu Focal):
status: Confirmed → Incomplete
status: Incomplete → New
Changed in zfs-linux (Ubuntu Groovy):
status: Confirmed → New
Colin Ian King (colin-king) wrote :

@Christian, can you file your bug as a new bug report? The crash you are seeing is in a totally different part of the kernel from the one in the initial bug report.
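
One quick way to check whether two oopses share a crash site is to compare the faulting symbol and module on the kernel-mode RIP line. The sketch below is a rough illustration only: the regular expression is an assumption based on the RIP line format seen in the two traces in this report, and the helper name is hypothetical.

```python
import re

# Matches kernel-mode RIP lines such as
#   RIP: 0010:abd_verify+0xa/0x40 [zfs]
# The trailing "[module]" is absent when the symbol is in the core kernel.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def crash_site(oops_text):
    """Return (symbol, module) for the first symbolised RIP line.

    module is None when no module name is printed. Returns None if no
    symbolised RIP line is found (user-space RIP lines are skipped).
    """
    match = RIP_RE.search(oops_text)
    if match is None:
        return None
    return match.group("symbol"), match.group("module")


if __name__ == "__main__":
    original = "RIP: 0010:abd_verify+0xa/0x40 [zfs]"
    later = "kernel: RIP: 0010:unlink_anon_vmas+0x22/0x1b0"
    for label, text in (("original report", original), ("later comment", later)):
        symbol, module = crash_site(text)
        print(f"{label}: {symbol} in {module or 'core kernel'}")
```

For the two traces here, the first faults in abd_verify inside the zfs module, while the second faults in unlink_anon_vmas with no module listed, which is consistent with them being unrelated crashes.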

Changed in zfs-linux (Ubuntu Groovy):
status: New → Invalid
Changed in zfs-linux (Ubuntu Focal):
status: New → Invalid