Problem Details:
---------------

Helper scripts (create /zfs.img and mount it on /zfs;
and setup kprobe events for debug)

$ sudo ./zfs-mount.sh
$ sudo ./zfs-kprobes.sh

Print kprobe events to screen as we go:

$ sudo cat /sys/kernel/debug/tracing/trace_pipe &

Create file:
- allocates normal/file znode (flag=0x0)
- - its object number is obj=0x7
- - its znode pointer is zpp=0xffff8800a65f8000

$ touch /zfs/file
<...>-20059 [000] d... 6718.949684: p_zfs_mknode_0: (zfs_mknode+0x0/0xe10 [zfs]) flag=0x0 dzp=0xffff8802115b0000
touch-20059 [000] d... 6718.949791: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x7
touch-20059 [000] d... 6718.949806: r_zfs_znode_alloc_0: (zfs_mknode+0x8ae/0xe10 [zfs] <- zfs_znode_alloc) zpp=0xffff8800a65f8000

Set extended attribute on the file:
- allocates xattr directory znode (flag=0x2)
- - its parent znode is file znode (dzp=0xffff8800a65f8000)
- - its object number is obj=0x8
- - its znode pointer is zpp=0xffff8802111a8000
- allocates xattr znode (flag=0x0, inherits xattr bit from parent node)
- - its parent znode is xattr dir znode (dzp=0xffff8802111a8000)
- - its object number is obj=0x9
- - its znode pointer is zpp=0xffff8802111a8448

$ setfattr -n user.debug -v 1 /zfs/file
<...>-31701 [004] d... 6770.933127: p_zfs_mknode_0: (zfs_mknode+0x0/0xe10 [zfs]) flag=0x2 dzp=0xffff8800a65f8000
<...>-31701 [004] d... 6770.933287: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x8
<...>-31701 [004] d... 6770.933312: r_zfs_znode_alloc_0: (zfs_mknode+0x8ae/0xe10 [zfs] <- zfs_znode_alloc) zpp=0xffff8802111a8000
<...>-31701 [004] d... 6770.933414: p_zfs_mknode_0: (zfs_mknode+0x0/0xe10 [zfs]) flag=0x0 dzp=0xffff8802111a8000
<...>-31701 [004] d... 6770.933436: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x9
setfattr-31701 [004] d... 6770.933441: r_zfs_znode_alloc_0: (zfs_mknode+0x8ae/0xe10 [zfs] <- zfs_znode_alloc) zpp=0xffff8802111a8448

Remove file:
- Nothing more than zfs_zget() (i.e., "load to memory/get znode and
  inode for object number") on the file and xattr dir.
- No node removal yet (zfs_rmnode), nor its descendent functions.

$ rm /zfs/file
<...>-5240 [000] d... 6796.826938: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0xffff8802353a2000 obj=0x7
<...>-5240 [000] d... 6796.826967: r_zfs_zget_0: (zfs_dirent_lock+0x56c/0x6c0 [zfs] <- zfs_zget)
rm-5240 [000] d... 6796.827023: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0xffff8802353a2000 obj=0x8
rm-5240 [000] d... 6796.827030: r_zfs_zget_0: (zfs_remove+0x22b/0x4c0 [zfs] <- zfs_zget)

When dropping caches (e.g., inode LRU list)
- In one disposal list (i.e., call to dispose_list())
  - Evict/Dispose the xattr node (obj 0x9)
    - This iput()s its parent node (obj 0x8, the xattr dir node)
      thus dropping its last reference (allows it to be evicted)
      with zfs_iput_async().
- In another disposal list, before ZFS's async iput() task runs.
  - Evict/Dispose the xattr dir node (obj 0x8)
    - This iput()s its parent node (obj 0x7, the file node)
      thus dropping its last reference (allows it to be evicted).
- Then ZFS's async iput() task runs.
  - Evict/Dispose the file node (obj 0x7)
    - This triggers the node removal function, zfs_rmnode().
    - This zfs_zget()s the xattr dir node (obj 0x8), bringing it back,
      note it gets another znode pointer value zpp=0xffff8802115e0000
      and drops the reference to it with zfs_iput_async(), thus it's
      back again, and can/needs to be evicted/disposed again.

$ echo 2 | sudo tee /proc/sys/vm/drop_caches
...
tee-11196 [002] d... 6823.459967: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-11196 [002] d... 6823.459975: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0xffff8802111a8660
tee-11196 [002] d... 6823.459980: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0xffff8802111a8660
tee-11196 [002] d... 6823.459982: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0xffff8802111a8448 obj=0x9
tee-11196 [002] d... 6823.459994: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0xffff8802111a8218 obj=0x8
tee-11196 [002] d... 6823.460178: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-11196 [002] d... 6823.460895: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-11196 [002] d... 6823.461876: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-11196 [002] d... 6823.463307: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-11196 [002] d... 6823.463412: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-11196 [002] d... 6823.463414: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0xffff8802111a8218
tee-11196 [002] d... 6823.463415: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0xffff8802111a8218
tee-11196 [002] d... 6823.463416: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0xffff8802111a8000 obj=0x8
tee-11196 [002] d... 6823.463420: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0xffff8800a65f8218 obj=0x7
<...>-30411 [007] d... 6823.463530: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0xffff8800a65f8218
z_iput-30411 [007] d... 6823.463533: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0xffff8800a65f8218
z_iput-30411 [007] d... 6823.463535: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0xffff8800a65f8000 obj=0x7
z_iput-30411 [007] d... 6823.463540: p_zfs_rmnode_0: (zfs_rmnode+0x0/0x350 [zfs]) znode=0xffff8800a65f8000
z_iput-30411 [007] d... 6823.463598: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0xffff8802353a2000 obj=0x8
z_iput-30411 [007] d... 6823.463613: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x8
z_iput-30411 [007] d... 6823.463634: r_zfs_znode_alloc_0: (zfs_zget+0x1ae/0x230 [zfs] <- zfs_znode_alloc) zpp=0xffff8802115e0000
z_iput-30411 [007] d... 6823.463636: r_zfs_zget_0: (zfs_rmnode+0x249/0x350 [zfs] <- zfs_zget)
z_iput-30411 [007] d... 6823.463714: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0xffff8802115e0218 obj=0x8

When dropping the caches again,
- In one disposal list
  - Evict/Dispose the xattr dir node (obj=0x8)
    - This triggers the node removal function zfs_rmnode(), and its
      descendent function zfs_purgedir() for xattr dir nodes.
    - zfs_purgedir() calls zfs_zget() on the child/xattr node
      (obj=0x9), bringing it to memory (note it has another znode
      pointer, zpp=0xffff880234a58000).
- In another disposal list
  - Evict/Dispose the (brought back) xattr node.

$ echo 2 | sudo tee /proc/sys/vm/drop_caches
...
tee-890 [001] d... 6921.482840: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-890 [001] dN.. 6921.482847: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0xffff8802115e0218
tee-890 [001] d... 6921.483049: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0xffff8802115e0218
tee-890 [001] d... 6921.483140: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0xffff8802115e0000 obj=0x8
tee-890 [001] d... 6921.483243: p_zfs_rmnode_0: (zfs_rmnode+0x0/0x350 [zfs]) znode=0xffff8802115e0000
tee-890 [001] dN.. 6921.483255: p_zfs_purgedir_0: (zfs_purgedir+0x0/0x210 [zfs]) znode=0xffff8802115e0000
tee-890 [001] d... 6921.483491: p_zfs_zget_0: (zfs_zget+0x0/0x230 [zfs]) zsb=0xffff8802353a2000 obj=0x9
tee-890 [001] d... 6921.483595: p_zfs_znode_alloc_0: (zfs_znode_alloc+0x0/0x560 [zfs]) obj=0x9
tee-890 [001] d... 6921.483714: r_zfs_znode_alloc_0: (zfs_zget+0x1ae/0x230 [zfs] <- zfs_znode_alloc) zpp=0xffff880234a58000
tee-890 [001] d... 6921.484133: r_zfs_zget_0: (zfs_purgedir+0xb4/0x210 [zfs] <- zfs_zget)
tee-890 [001] d... 6921.484394: p_zfs_iput_async_0: (zfs_iput_async+0x0/0x60 [zfs]) inode=0xffff880234a58218 obj=0x9
tee-890 [001] d... 6921.484521: r_zfs_purgedir_0: (zfs_rmnode+0x260/0x350 [zfs] <- zfs_purgedir)
tee-890 [001] d... 6921.484973: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-890 [000] d... 6921.490662: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-890 [000] d... 6921.490734: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-890 [000] d... 6921.490791: p_dispose_list_0: (dispose_list+0x0/0x50)
tee-890 [000] d... 6921.490794: p_zpl_evict_inode_0: (zpl_evict_inode+0x0/0x60 [zfs]) inode=0xffff880234a58218
tee-890 [000] d... 6921.490796: p_zfs_inactive_0: (zfs_inactive+0x0/0x270 [zfs]) inode=0xffff880234a58218
tee-890 [000] d... 6921.490798: p_zfs_zinactive_0: (zfs_zinactive+0x0/0xe0 [zfs]) znode=0xffff880234a58000 obj=0x9
tee-890 [000] d... 6921.490802: p_zfs_rmnode_0: (zfs_rmnode+0x0/0x350 [zfs]) znode=0xffff880234a58000
tee-890 [000] d... 6921.493071: p_dispose_list_0: (dispose_list+0x0/0x50)

The problem would happen if, for some reason, the zfs_purgedir() call
for the xattr dir node (obj=0x8) calls zfs_zget() on the xattr node
(obj=0x9) while the latter has not yet been evicted/disposed (so that
dmu_buf_get_user() still returns non-NULL) but is positioned later on
this disposal list / marked for disposal (so that igrab() returns NULL
due to inode.i_state).

These two conditions create an infinite loop in zfs_zget(), which is
a deadlock, because:

1) it would only finish if dmu_buf_get_user() returns NULL, which only
   occurs if the _xattr inode_ goes through the disposal path
   (evict() -> zpl_evict_inode() -> zfs_inactive() -> zfs_zinactive()
   -> zfs_znode_dmu_fini() -> sa_handle_destroy()
   -> dmu_buf_remove_user())

2) and that is blocked waiting on the (looping) disposal of the
   _xattr dir inode_ (because the xattr inode is later in the disposal
   list), which is waiting on the disposal of the _xattr inode_.
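
The two conditions can be illustrated with a small toy model. This is
only a sketch in Python, not the ZFS or kernel code: the Inode class,
its 'freeing' and 'user_attached' fields, make_nodes() and the
max_retries bound are all hypothetical stand-ins. 'freeing' models
igrab() failing due to inode.i_state, and 'user_attached' models
dmu_buf_get_user() returning non-NULL. The real loop in zfs_zget() is
described above as unbounded; the bound here exists only so that the
sketch terminates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Inode:
    obj: int
    child: Optional["Inode"] = None  # node that purgedir would zget (assumed)
    freeing: bool = False            # models i_state marking for disposal
    user_attached: bool = True       # models dmu_buf_get_user() != NULL


def igrab(inode: Inode) -> Optional[Inode]:
    # Modeled behaviour: fails for an inode already marked for disposal.
    return None if inode.freeing else inode


def zget(inode: Inode, max_retries: int = 5) -> bool:
    # Toy retry loop; the bound is an assumption so the sketch terminates.
    for _ in range(max_retries):
        if not inode.user_attached:
            return True   # no stale znode left: a fresh one can be allocated
        if igrab(inode) is not None:
            return True   # existing znode/inode can be reused
    return False          # stand-in for "would loop forever"


def dispose_list(inodes: List[Inode]) -> Optional[int]:
    # Returns the obj whose eviction would hang, or None if all complete.
    for inode in inodes:
        inode.freeing = True          # every inode on the list is marked first
    for inode in inodes:
        if inode.child is not None and not zget(inode.child):
            return inode.obj
        inode.user_attached = False   # models dmu_buf_remove_user()
    return None


def make_nodes():
    xattr = Inode(obj=0x9)
    xattr_dir = Inode(obj=0x8, child=xattr)
    return xattr_dir, xattr


xattr_dir, xattr = make_nodes()
print(dispose_list([xattr_dir, xattr]))  # xattr dir first: prints 8

xattr_dir, xattr = make_nodes()
print(dispose_list([xattr, xattr_dir]))  # xattr first: prints None
```

With the xattr dir node (obj=0x8) ahead of the xattr node (obj=0x9) on
the same list, the modeled zget() can never succeed: the xattr node is
already marked for disposal but has not yet been evicted. With the
order reversed, the xattr node is evicted first and the modeled zget()
allocates a fresh znode, matching the non-problematic trace above.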