[RFC] Break out of zfs_zget early if unlinked znode #9583

Merged
merged 1 commit into from
Nov 15, 2019

@hrasiq hrasiq commented Nov 13, 2019

This patch is an RFC for a specific issue we've had Ubuntu users report
lately. From our testing, it looks like a valid fix but I'm unsure if it's the
most appropriate one. I would really appreciate suggestions on how to improve it
or any feedback about how to do it in a more fitting manner.
Thanks!

Motivation and Context

We've seen users hitting hangs and blocked tasks in Ubuntu, seemingly due to an
issue in the ZIL commit path. We usually see stack traces like the one below in
the kernel logs:

[72742.051703] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[72742.070429] mysqld          D    0  5713   2881 0x00000320
[72742.073220] Call Trace:
[72742.075305]  __schedule+0x24e/0x880
[72742.090436]  schedule+0x2c/0x80
[72742.090438]  schedule_preempt_disabled+0xe/0x10
[72742.090441]  __mutex_lock.isra.5+0x276/0x4e0
[72742.090547]  ? dmu_tx_destroy+0x105/0x130 [zfs]
[72742.090555]  __mutex_lock_slowpath+0x13/0x20
[72742.115374]  ? __mutex_lock_slowpath+0x13/0x20
[72742.132266]  mutex_lock+0x2f/0x40
[72742.134207]  zil_commit_impl+0x1b0/0x1b30 [zfs]
[72742.150428]  ? spl_kmem_alloc+0x115/0x180 [spl]
[72742.152622]  ? mutex_lock+0x12/0x40
[72742.154819]  ? zfs_refcount_add_many+0x9a/0x100 [zfs]
[72742.171450]  zil_commit+0xde/0x150 [zfs]
[72742.173687]  zfs_fsync+0x77/0xe0 [zfs]
[72742.175044]  zpl_fsync+0x80/0x110 [zfs]
[72742.191690]  vfs_fsync_range+0x51/0xb0
[72742.193876]  do_fsync+0x3d/0x70
[72742.195126]  SyS_fsync+0x10/0x20
[72742.211059]  do_syscall_64+0x73/0x130
[72742.214078]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2

The above was collected from an Ubuntu Bionic system running ZFSonLinux 0.8.1
and a mysql workload. After doing some investigation with the help of crash
dumps (excerpts below), we believe this might be a race between the
z_iput/evict thread and the ZFS writeback thread.

Crash dump analysis

The kworker thread that's currently doing the commit/writeback doesn't seem to
be stalled, but actively working on the writeback up until the hung_task_timeout
NMI:

crash> bt -s 23652
PID: 23652 TASK: ffffa04c304fc080 CPU: 5 COMMAND: "kworker/u48:2"
...
--- <NMI exception stack> ---
#5 kfree+0xd0 at ffffffff9a04ec60
#6 spl_kmem_free+0x33 at ffffffffc073db03 [spl]
#7 dbuf_hold_impl+0xaa at ffffffffc0cf43ca [zfs]
#8 dbuf_hold+0x33 at ffffffffc0cf4473 [zfs]
#9 dnode_hold_impl+0x163 at ffffffffc0d21913 [zfs]
#10 dnode_hold+0x1b at ffffffffc0d227fb [zfs]
#11 dmu_bonus_hold+0x35 at ffffffffc0d003e5 [zfs]
#12 sa_buf_hold+0xe at ffffffffc0d6e2ee [zfs]
#13 zfs_zget+0x108 at ffffffffc0e10188 [zfs]
#14 zfs_get_data+0x7e at ffffffffc0e0a30e [zfs]
#15 zil_commit_impl+0x135a at ffffffffc0e18c4a [zfs]
#16 zil_commit+0xde at ffffffffc0e194fe [zfs]
#17 zpl_writepages+0xd5 at ffffffffc0e325a5 [zfs]
#18 do_writepages+0x4b at ffffffff99fe888b
#19 __writeback_single_inode+0x45 at ffffffff9a0b3d35
#20 writeback_sb_inodes+0x1e1 at ffffffff9a0b44e1
#21 wb_writeback+0x107 at ffffffff9a0b4aa7
#22 wb_workfn+0xb3 at ffffffff9a0b52e3
#23 process_one_work+0x1de at ffffffff99eab95e
#24 worker_thread+0x32 at ffffffff99eabbd2
#25 kthread+0x121 at ffffffff99eb25d1
#26 ret_from_fork+0x35 at ffffffff9a800205 

It looks like the writeback thread is looping in this section of zfs_zget():

/*
 * If igrab() returns NULL the VFS has independently
 * determined the inode should be evicted and has
 * called iput_final() to start the eviction process.
 * The SA handle is still valid but because the VFS
 * requires that the eviction succeed we must drop
 * our locks and references to allow the eviction to
 * complete. The zfs_zget() may then be retried.
 *
 * This unlikely case could be optimized by registering
 * a sops->drop_inode() callback. The callback would
 * need to detect the active SA hold thereby informing
 * the VFS that this inode should not be evicted.
 */
if (igrab(ZTOI(zp)) == NULL) {
    mutex_exit(&zp->z_lock);
    sa_buf_rele(db, NULL);
    zfs_znode_hold_exit(zfsvfs, zh);
    /* inode might need this to finish evict */
    cond_resched();
    goto again;
}

I've fished around in the kworker's stack to see if we could dig out the inode
that would cause igrab() to return NULL:

crash> bt -sFF 23652
...
[ffffa04a99671d38:dmu_buf_impl_t]
crash> * dmu_buf_impl_t.db_user ffffa04a99671d38
db_user = 0xffffa04af3fd7900
crash> * sa_handle_t.sa_userp 0xffffa04af3fd7900
sa_userp = 0xffffa04acf9fd0c0
crash> * -o znode_t.z_inode 0xffffa04acf9fd0c0
typedef struct znode {
[ffffa04acf9fd2a8] struct inode z_inode;
} znode_t;
crash> * inode.i_state ffffa04acf9fd2a8
i_state = 0xa7
crash> eval -b 0xa7
...
bits set: 7 5 2 1 0

So, the inode lives at 0xffffa04acf9fd2a8 and its i_state is 0xa7. In include/linux/fs.h:

#define I_DIRTY_SYNC (1 << 0)
#define I_DIRTY_DATASYNC (1 << 1)
#define I_DIRTY_PAGES (1 << 2)
#define I_FREEING (1 << 5)
#define __I_SYNC 7
#define I_SYNC (1 << __I_SYNC)
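
As a cross-check of the decoding above, here is a minimal userspace sketch that maps an i_state value to the flag names from the fs.h excerpt. The decode_i_state() helper and its lookup table are hypothetical names made up for illustration, not kernel code, and only the six constants quoted above are known to it, so any other bit would go unreported.

```c
#include <stddef.h>
#include <string.h>

/* Flag values copied from the include/linux/fs.h excerpt above. */
#define I_DIRTY_SYNC		(1 << 0)
#define I_DIRTY_DATASYNC	(1 << 1)
#define I_DIRTY_PAGES		(1 << 2)
#define I_FREEING		(1 << 5)
#define __I_SYNC		7
#define I_SYNC			(1 << __I_SYNC)

/* Hypothetical lookup table: one entry per flag this sketch knows. */
const struct {
	unsigned long	bit;
	const char	*name;
} i_state_names[] = {
	{ I_DIRTY_SYNC,		"I_DIRTY_SYNC" },
	{ I_DIRTY_DATASYNC,	"I_DIRTY_DATASYNC" },
	{ I_DIRTY_PAGES,	"I_DIRTY_PAGES" },
	{ I_FREEING,		"I_FREEING" },
	{ I_SYNC,		"I_SYNC" },
};

/*
 * Write the names of the known flags set in 'state' into 'buf',
 * separated by '|'. Returns the number of known flags found.
 */
int
decode_i_state(unsigned long state, char *buf, size_t len)
{
	size_t i, n = sizeof (i_state_names) / sizeof (i_state_names[0]);
	int found = 0;

	if (len == 0)
		return (0);
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		if ((state & i_state_names[i].bit) == 0)
			continue;
		if (found > 0)
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, i_state_names[i].name, len - strlen(buf) - 1);
		found++;
	}
	return (found);
}
```

Called with 0xa7 and a large enough buffer, the helper reports five flags: I_DIRTY_SYNC, I_DIRTY_DATASYNC, I_DIRTY_PAGES, I_FREEING and I_SYNC, matching bits 0, 1, 2, 5 and 7.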

The important bit here is I_FREEING. This looks to be the case mentioned by
the comment above the igrab() call, as igrab() will return NULL if the inode
is currently being evicted (i.e. I_FREEING is set). Furthermore, the inode has
already been removed from i_sb_list (next and prev are equal, as for an emptied
list), but i_hash is still linked:

crash> * inode.i_sb_list ffffa04acf9fd2a8
i_sb_list = {
next = 0xffffa04acf9fd3c8,
prev = 0xffffa04acf9fd3c8
}
crash> * inode.i_hash ffffa04acf9fd2a8
i_hash = {
next = 0xffffa049340bc5a0,
pprev = 0xffffb534c1787148
}

Looking at the code in evict(), this suggests that this inode is currently sitting in
inode_wait_for_writeback(). If we check the z_iput task:

crash> bt -s 1033
PID: 1033 TASK: ffffa04c2ed95600 CPU: 1 COMMAND: "z_iput"
#0 __schedule+0x24e at ffffffff9a7b6e0e
#1 schedule+0x2c at ffffffff9a7b746c
#2 bit_wait+0x11 at ffffffff9a7b7e31
#3 __wait_on_bit+0x4c at ffffffff9a7b78dc
#4 __inode_wait_for_writeback+0xb9 at ffffffff9a0b04b9
#5 inode_wait_for_writeback+0x26 at ffffffff9a0b5216
#6 evict+0xb5 at ffffffff9a0a0dd5
#7 iput+0x19c at ffffffff9a0a123c
#8 taskq_thread+0x2e5 at ffffffffc0745305 [spl]
#9 kthread+0x121 at ffffffff99eb25d1
#10 ret_from_fork+0x35 at ffffffff9a800205

That looks suspiciously close to the evict() stack we should be seeing. It also
looks like z_iput is trying to evict our specific inode:

crash> bt -sFF 1033 | grep ffffa04acf9fd2a8
ffffb534c711fcc8: [ffffa04acf9fd2a8:dmaengine-unmap-128]
ffffb534c711fd48: [ffffa04acf9fd2a8:dmaengine-unmap-128]
ffffb534c711fd70: [ffffa04acf9fd2a8:dmaengine-unmap-128]
ffffb534c711fd90: [ffffa04acf9fd2a8:dmaengine-unmap-128]

It seems that the z_iput thread is trying to evict an inode, but is currently
waiting until the inode has gone through the writeback thread. The writeback
thread on the other hand seems to be locked waiting for the inode eviction to
complete (due to the igrab() loop).
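
To make the circular dependency concrete, here is a toy single-threaded model of it. This is not kernel code: model_inode, model_igrab(), model_evict_step() and simulate() are names invented for this sketch. It assumes, as the dumps suggest, that eviction cannot finish while writeback is in progress, and that writeback cannot finish until the lookup in zfs_zget() stops retrying.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the inode state that matters here. */
struct model_inode {
	bool freeing;	/* models I_FREEING: evict() has started */
	bool sync;	/* models I_SYNC: writeback is in progress */
	bool evicted;	/* eviction has completed */
};

/* Models igrab(): no new reference while the inode is being freed. */
bool
model_igrab(const struct model_inode *ip)
{
	return (!ip->freeing);
}

/* Models the z_iput side: evict() only completes once writeback is done. */
void
model_evict_step(struct model_inode *ip)
{
	if (ip->freeing && !ip->sync)
		ip->evicted = true;
}

/*
 * Models the writeback thread retrying its lookup. Writeback (and so
 * the sync flag) only ends once the lookup returns. Returns the number
 * of retries needed, or -1 if 'max_retries' passes made no progress.
 */
int
simulate(struct model_inode *ip, int max_retries)
{
	int tries;

	for (tries = 0; tries < max_retries; tries++) {
		if (ip->evicted || model_igrab(ip)) {
			ip->sync = false;	/* lookup returned */
			return (tries);
		}
		/* stands in for cond_resched(): let the evictor run */
		model_evict_step(ip);
	}
	return (-1);
}
```

With both freeing and sync set, as in the crash dump, simulate() exhausts any retry budget, because each side is waiting on the other. If sync is clear, the evictor completes on the first pass and the lookup returns on the next one.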

Description

We've noticed that these issues showed up on systems with slow mechanical
storage, and no SSD or caching devices. We've also noticed that the znodes
responsible for these hangs always had z_unlinked set, indicating that we were
trying to commit files that have since been deleted.

In the cases we've seen, zfs_get_data() blocks in zfs_zget() and never gets to
check whether the znode has been marked for deletion (in which case it would
return ENOENT and the writeback thread would continue). This patch detects,
inside zfs_zget(), that the znode has been marked for deletion and breaks out
early with ENOENT if that's the case.
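
The idea can be sketched as a bounded version of the retry loop. This is a simplified userspace model rather than the patch itself: model_znode and model_zget() are hypothetical names, the two booleans stand in for zp->z_unlinked and a failing igrab(), and the retry budget exists only so the model terminates (the real "goto again" loop is unbounded).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the znode state this sketch cares about. */
struct model_znode {
	bool z_unlinked;	/* models zp->z_unlinked */
	bool freeing;		/* inode is mid-eviction, so igrab() fails */
};

/*
 * Models the retry path of zfs_zget() with the proposed check added.
 * The znode state is held constant across iterations, which mirrors
 * the hang: nothing changes while the caller keeps retrying.
 * Returns 0 on success, ENOENT for an unlinked znode, or EAGAIN if the
 * retry budget ran out. '*retries' reports how many loops were spent.
 */
int
model_zget(const struct model_znode *zp, int max_retries, int *retries)
{
	int n;

	for (n = 0; n < max_retries; n++) {
		if (!zp->freeing) {		/* igrab() succeeded */
			*retries = n;
			return (0);
		}
		if (zp->z_unlinked) {		/* proposed early exit */
			*retries = n;
			return (ENOENT);
		}
		/* otherwise: drop locks, cond_resched(), goto again */
	}
	*retries = max_retries;
	return (EAGAIN);
}
```

In this model an unlinked znode that is being evicted returns ENOENT on the first pass instead of spinning, while a znode that is being evicted but not unlinked still retries as before.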

How Has This Been Tested?

We've been testing this patch in some Ubuntu environments with the Bionic 4.15
kernel and varied mysql workloads, and results look good so far. We haven't seen
the lockups anymore, and no major breakage of other ZFS functionality has been
observed. This patch has also been tested with the ZFS test suite, and no
obvious regressions have been noticed.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

@PrivatePuffin

Can we please have more of these awesomely documented PR's? 👍

@@ -1111,6 +1111,10 @@ zfs_zget(zfsvfs_t *zfsvfs, uint64_t obj_num, znode_t **zpp)
mutex_exit(&zp->z_lock);
sa_buf_rele(db, NULL);
zfs_znode_hold_exit(zfsvfs, zh);
/* if znode is already marked for deletion, break out early */

@PrivatePuffin PrivatePuffin Nov 13, 2019

This line throws a style tantrum, might want to rephrase or split
(basically it's too long, over 80 because it seems to count the tabs as multiple characters)

@hrasiq hrasiq Nov 13, 2019

Oops, thanks for the notice! I didn't realize tabs count as 8 spaces, but it should be addressed now.

I didn't know either, maybe rewrite the test a bit someday... Because this should've fitted fine.
So not really your fault either :)

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Nov 13, 2019
@behlendorf behlendorf left a comment

Your analysis and proposed fix are spot on, thank you for so clearly documenting the issue and getting to the root cause!

/* if znode is already marked for deletion, break out early */
if (zp->z_unlinked) {
    return (SET_ERROR(ENOENT));
}

@behlendorf behlendorf Nov 13, 2019

The fix looks good, I'd just suggest we restructure this block a little bit. What do you think of this (untested) version? It has a few small advantages:

  • This allows us to skip the igrab() entirely when zp->z_unlinked is set.
  • There's a single mutex_exit() / sa_buf_rele() / zfs_znode_hold_exit() path.
  • It aligns this logic a little more closely with the FreeBSD and illumos versions of this function (if you squint). All of the platforms need this same unlinked check, but the VFS integration specifics are different on each platform.
                mutex_enter(&zp->z_lock);
                ASSERT3U(zp->z_id, ==, obj_num);
                /*
                 * If zp->z_unlinked is set then the znode is marked
                 * for deletion and should not be discoverable.
                 *
                 * If igrab() returns NULL the VFS has independently
                 * determined the inode should be evicted and has
                 * called iput_final() to start the eviction process.
                 * The SA handle is still valid but because the VFS
                 * requires that the eviction succeed we must drop
                 * our locks and references to allow the eviction to
                 * complete.  The zfs_zget() may then be retried.
                 */
                if (zp->z_unlinked) {
                        err = SET_ERROR(ENOENT);
                } else if (igrab(ZTOI(zp)) == NULL) {
                        err = SET_ERROR(EAGAIN);
                        cond_resched();
                } else {
                        *zpp = zp;
                        err = 0;
                }

                mutex_exit(&zp->z_lock);
                sa_buf_rele(db, NULL);
                zfs_znode_hold_exit(zfsvfs, zh);

                if (err == EAGAIN)
                        goto again;

                return (err);

The fix looks good to me too, but I like the revised version better. I suggest we go with that and add an original-patch-by line to the commit.

I agree that we should have the if (zp->z_unlinked) err = ENOENT; code, like illumos. @behlendorf's way of doing that seems cleanest to me.

Thank you for the review! I think we should go with this version as well, it looks much cleaner and has the mentioned advantages over the original patch. Apologies for the initial "naive" fix, I'm still finding my way around the ZFS code and didn't stop to think and look at the illumos code...

I can respin the PR to be more aligned with this one, or we can go with @ryao's original-patch-by suggestion. Either one is fine by me, so please let me know how you would like to proceed!

@hrasiq

Great. We're regression testing this, and will double-check with the user/reporter specific workload that it's all good. Thanks!

@hrasiq How about: whoever gets it in for review first?
It's not worth the time debating who sends it in.

And don't worry about it, it's not a naive fix you did... People having feedback doesn't mean your code is bad or naive ;)

@mfoliveira What are you regression testing? The proposed changes in this Review or the original fix?

@Ornias1993 The proposed changes (reported to be untested). The original fix has been regression tested for the submission.

@Ornias1993 In that case, allow me to re-submit this one :)

@behlendorf behlendorf requested a review from ahrens November 13, 2019 21:37
@ahrens

ahrens commented Nov 14, 2019

@hrasiq I see that you know your way around a crash dump. You may be interested in sdb, a new debugger for the linux kernel, which has several extensions for debugging ZFS.
slides video repo

@codecov

codecov bot commented Nov 14, 2019

Codecov Report

Merging #9583 into master will increase coverage by <.01%.
The diff coverage is 62.5%.

@@            Coverage Diff             @@
##           master    #9583      +/-   ##
==========================================
+ Coverage   79.23%   79.23%   +<.01%     
==========================================
  Files         419      418       -1     
  Lines      123696   123686      -10     
==========================================
- Hits        98014    98008       -6     
+ Misses      25682    25678       -4
Flag Coverage Δ
#kernel 79.75% <62.5%> (-0.04%) ⬇️
#user 67.2% <ø> (+0.22%) ⬆️

If zp->z_unlinked is set, we're working with a znode that has been
marked for deletion. If that's the case, we can skip the "goto again"
loop and return ENOENT, as the znode should not be discovered.

Signed-off-by: Heitor Alves de Siqueira <halves@canonical.com>
@hrasiq

hrasiq commented Nov 14, 2019

I've integrated the changes suggested by @behlendorf, but moved the cond_resched(); to after we drop the locks and references. Please let me know what you think of this version, and if it needs any further changes or tweaking.
Thanks again for the feedback!

@behlendorf behlendorf left a comment

@hrasiq thanks for updating this, the refreshed version looks good to me.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Nov 15, 2019
@behlendorf behlendorf merged commit 41e1aa2 into openzfs:master Nov 15, 2019
@behlendorf

@mfoliveira @hrasiq merged, thanks again for running this issue down and verifying the fix! We'll make sure to get this backported for the next point release.

@mfoliveira

@behlendorf

It looks like igrab() must be checked before z_unlinked; otherwise fsetxattr() on an O_TMPFILE file is unhappy (test failure on tmpfile_001_pos).
I just submitted PR 9602 for that.

Sorry for the delay in getting back to this; had a holiday/travel/conference.

@PrivatePuffin

@mfoliveira Don't be sorry for taking just a few days to get a PR into an open source project... Take care! :)

behlendorf pushed a commit that referenced this pull request Nov 21, 2019
The changes in commit 41e1aa2 / PR #9583 introduced a regression on
tmpfile_001_pos: fsetxattr() on a O_TMPFILE file descriptor started
to fail with errno ENODATA:

    openat(AT_FDCWD, "/test", O_RDWR|O_TMPFILE, 0666) = 3
    <...>
    fsetxattr(3, "user.test", <...>, 64, 0) = -1 ENODATA

The originally proposed change on PR #9583 is not susceptible to it,
so just move the code/if-checks around back in that way, to fix it.

Reviewed-by: Pavel Snajdr <snajpa@snajpa.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Original-patch-by: Heitor Alves de Siqueira <halves@canonical.com>
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Closes #9602
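
A small model may help show why the order of the two checks matters. It is a userspace sketch with hypothetical names (model_znode, zget_unlinked_first(), zget_igrab_first()), not the code from either commit. It also assumes that an O_TMPFILE znode is live, so igrab() succeeds, while already having z_unlinked set. That assumption is consistent with the regression described above but is not spelled out in this thread.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two inputs to the decision. */
struct model_znode {
	bool z_unlinked;	/* models zp->z_unlinked */
	bool igrab_ok;		/* models igrab(ZTOI(zp)) != NULL */
};

/* Restructured ordering from review: z_unlinked is tested first. */
int
zget_unlinked_first(const struct model_znode *zp)
{
	if (zp->z_unlinked)
		return (ENOENT);
	if (!zp->igrab_ok)
		return (EAGAIN);	/* caller would "goto again" */
	return (0);
}

/*
 * Ordering of the original proposal: z_unlinked is only consulted
 * once igrab() has already failed.
 */
int
zget_igrab_first(const struct model_znode *zp)
{
	if (zp->igrab_ok)
		return (0);
	if (zp->z_unlinked)
		return (ENOENT);
	return (EAGAIN);		/* caller would "goto again" */
}
```

For a znode with both z_unlinked and igrab_ok set, the first function returns ENOENT and the second returns 0. For the case behind the original hang (unlinked, igrab() failing), both return ENOENT, so the igrab-first ordering keeps the fix while leaving live znodes reachable.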
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 26, 2019
If zp->z_unlinked is set, we're working with a znode that has been
marked for deletion. If that's the case, we can skip the "goto again"
loop and return ENOENT, as the znode should not be discovered.

Reviewed-by: Richard Yao <ryao@gentoo.org>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Heitor Alves de Siqueira <halves@canonical.com>
Closes openzfs#9583
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 26, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 27, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Dec 27, 2019
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
tonyhutter pushed a commit that referenced this pull request Jan 23, 2020
sdimitro pushed a commit to sdimitro/zfs that referenced this pull request Jan 31, 2020
sdimitro pushed a commit to sdimitro/zfs that referenced this pull request Jan 31, 2020
allanjude pushed a commit to KlaraSystems/zfs that referenced this pull request Apr 28, 2020