Skip to content

Commit

Permalink
ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()
Browse files Browse the repository at this point in the history
`lseek(SEEK_DATA | SEEK_HOLE)` are only accurate when the on-disk blocks
reflect all writes, i.e. when there are no dirty data blocks.  To ensure
this, if the target dnode is dirty, they wait for the open txg to be
synced, so we can call them "stabilizing operations".  If they cause
txg_wait_synced often, it can be detrimental to performance.

Typically, a group of files are all modified, and then SEEK_DATA/HOLE
are performed on them.  In this case, the first SEEK does a
txg_wait_synced(), and subsequent SEEKs don't need to wait, so
performance is good.

However, if a workload involves an interleaved metadata modification,
the subsequent SEEK may do a txg_wait_synced() unnecessarily.  For
example, if we do a `read()` syscall to each file before we do its SEEK.
This applies even with `relatime=on`, when the `read()` is the first
read after the last write.  The txg_wait_synced() is unnecessary because
the SEEK operations only care that the structure of the tree of indirect
and data blocks is up to date on disk.  They don't care about metadata
like the contents of the bonus or spill blocks.  (They also don't care
if an existing data block is modified, but this would be more involved
to filter out.)

This commit changes the behavior of SEEK_DATA/HOLE operations such that
they do not call txg_wait_synced() if there is only a pending change to
the bonus or spill block.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by:  Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes openzfs#13368 
Issue openzfs#14594 
Issue openzfs#14512 
Issue openzfs#14009
  • Loading branch information
ahrens authored and lundman committed Mar 17, 2023
1 parent 9e01527 commit 04de098
Showing 1 changed file with 16 additions and 5 deletions.
21 changes: 16 additions & 5 deletions module/zfs/dnode.c
Original file line number Diff line number Diff line change
Expand Up @@ -1764,20 +1764,29 @@ dnode_try_claim(objset_t *os, uint64_t object, int slots)
}

/*
* Checks if the dnode contains any uncommitted dirty records.
* Checks if the dnode might contain any uncommitted changes to data blocks.
* Dirty metadata (e.g. bonus buffer) does not count.
*/
boolean_t
dnode_is_dirty(dnode_t *dn)
{
mutex_enter(&dn->dn_mtx);

for (int i = 0; i < TXG_SIZE; i++) {
if (multilist_link_active(&dn->dn_dirty_link[i])) {
list_t *list = &dn->dn_dirty_records[i];
for (dbuf_dirty_record_t *dr = list_head(list);
dr != NULL; dr = list_next(list, dr)) {
if (dr->dr_dbuf == NULL ||
(dr->dr_dbuf->db_blkid != DMU_BONUS_BLKID &&
dr->dr_dbuf->db_blkid != DMU_SPILL_BLKID)) {
mutex_exit(&dn->dn_mtx);
return (B_TRUE);
}
}
if (dn->dn_free_ranges[i] != NULL) {
mutex_exit(&dn->dn_mtx);
return (B_TRUE);
}
}

mutex_exit(&dn->dn_mtx);

return (B_FALSE);
Expand Down Expand Up @@ -2640,7 +2649,9 @@ dnode_next_offset(dnode_t *dn, int flags, uint64_t *offset,
rw_enter(&dn->dn_struct_rwlock, RW_READER);

if (dn->dn_phys->dn_nlevels == 0) {
error = SET_ERROR(ESRCH);
if (!(flags & DNODE_FIND_HOLE)) {
error = SET_ERROR(ESRCH);
}
goto out;
}

Expand Down

0 comments on commit 04de098

Please sign in to comment.