Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test for-next (regular) #1268

Open
wants to merge 10,000 commits into
base: ci
Choose a base branch
from
Open

Test for-next (regular) #1268

wants to merge 10,000 commits into from

Conversation

kdave
Copy link
Member

@kdave kdave commented May 16, 2024

No description provided.

@kdave kdave force-pushed the test-for-next branch 3 times, most recently from 1e76edd to 6d347e7 Compare May 24, 2024 21:18
@kdave kdave changed the title Test for-next rc0 Test for-next rc1 May 28, 2024
@kdave kdave changed the title Test for-next rc1 Test for-next (6.10-rc2) Jun 3, 2024
@kdave kdave changed the title Test for-next (6.10-rc2) Test for-next (6.10-rc3) Jun 13, 2024
@kdave kdave force-pushed the test-for-next branch 2 times, most recently from 68f1b4b to 4a4d10c Compare June 24, 2024 15:33
@kdave kdave changed the title Test for-next (6.10-rc3) Test for-next (6.10-rc5) Jun 24, 2024
@kdave kdave force-pushed the test-for-next branch 4 times, most recently from a817d1a to dde24c7 Compare July 2, 2024 14:58
@kdave kdave changed the title Test for-next (6.10-rc5) Test for-next (regular) Jul 18, 2024
@kdave kdave force-pushed the test-for-next branch 3 times, most recently from d6b8894 to fc58d4b Compare August 5, 2024 16:49
@kdave kdave force-pushed the test-for-next branch 3 times, most recently from 3c3dc5e to 7b9aa31 Compare August 15, 2024 18:34
@kdave kdave force-pushed the test-for-next branch 2 times, most recently from 7f78dd5 to 00782aa Compare August 29, 2024 16:46
@kdave kdave force-pushed the test-for-next branch 2 times, most recently from 5278b7a to b7b74b3 Compare September 4, 2024 21:31
torvalds and others added 5 commits December 1, 2024 13:10
…kernel/git/trace/linux-trace

Pull bprintf() removal from Steven Rostedt:

 - Remove unused bprintf() function, that was added with the rest of the
   "bin-printf" functions.

   These are functions that are used by trace_printk() that allows to
   quickly save the format and arguments into the ring buffer without
   the expensive processing of converting numbers to ASCII. Then on
   output, at a much later time, the ring buffer is read and the string
   processing occurs then. The bprintf() was added for consistency but
   was never used. It can be safely removed.

* tag 'trace-printf-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  printf: Remove unused 'bprintf'
…nux/kernel/git/wsa/linux

Pull i2c component probing support from Wolfram Sang:
 "Add OF component probing.

  Some devices are designed and manufactured with some components having
  multiple drop-in replacement options. These components are often
  connected to the mainboard via ribbon cables, having the same signals
  and pin assignments across all options. These may include the display
  panel and touchscreen on laptops and tablets, and the trackpad on
  laptops. Sometimes which component option is used in a particular
  device can be detected by some firmware provided identifier, other
  times that information is not available, and the kernel has to try to
  probe each device.

  Instead of a delicate dance between drivers and device tree quirks,
  this change introduces a simple I2C component probe function. For a
  given class of devices on the same I2C bus, it will go through all of
  them, doing a simple I2C read transfer and see which one of them
  responds. It will then enable the device that responds"

* tag 'i2c-for-6.13-rc1-part3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  MAINTAINERS: fix typo in I2C OF COMPONENT PROBER
  of: base: Document prefix argument for of_get_next_child_with_prefix()
  i2c: Fix whitespace style issue
  arm64: dts: mediatek: mt8173-elm-hana: Mark touchscreens and trackpads as fail
  platform/chrome: Introduce device tree hardware prober
  i2c: of-prober: Add GPIO support to simple helpers
  i2c: of-prober: Add simple helpers for regulator support
  i2c: Introduce OF component probe function
  of: base: Add for_each_child_of_node_with_prefix()
  of: dynamic: Add of_changeset_update_prop_string
The continual trickle of small conversion patches is grating on me, and
is really not helping.  Just get rid of the 'remove_new' member
function, which is just an alias for the plain 'remove', and had a
comment to that effect:

  /*
   * .remove_new() is a relic from a prototype conversion of .remove().
   * New drivers are supposed to implement .remove(). Once all drivers are
   * converted to not use .remove_new any more, it will be dropped.
   */

This was just a tree-wide 'sed' script that replaced '.remove_new' with
'.remove', with some care taken to turn a subsequent tab into two tabs
to make things line up.

I did do some minimal manual whitespace adjustment for places that used
spaces to line things up.

Then I just removed the old (sic) .remove_new member function, and this
is the end result.  No more unnecessary conversion noise.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up the existing export namespace code along the same lines of
commit 33def84 ("treewide: Convert macro and uses of __section(foo)
to __section("foo")") and for the same reason, it is not desired for the
namespace argument to be a macro expansion itself.

Scripted using

  git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file;
  do
    awk -i inplace '
      /^#define EXPORT_SYMBOL_NS/ {
        gsub(/__stringify\(ns\)/, "ns");
        print;
        next;
      }
      /^#define MODULE_IMPORT_NS/ {
        gsub(/__stringify\(ns\)/, "ns");
        print;
        next;
      }
      /MODULE_IMPORT_NS/ {
        $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g");
      }
      /EXPORT_SYMBOL_NS/ {
        if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) {
  	if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ &&
  	    $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ &&
  	    $0 !~ /^my/) {
  	  getline line;
  	  gsub(/[[:space:]]*\\$/, "");
  	  gsub(/[[:space:]]/, "", line);
  	  $0 = $0 " " line;
  	}

  	$0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/,
  		    "\\1(\\2, \"\\3\")", "g");
        }
      }
      { print }' $file;
  done

Requested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc
Acked-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
masahir0y and others added 23 commits December 3, 2024 08:22
Since commit cdd30eb ("module: Convert symbol namespace to string
literal"), when MODULE_IMPORT_NS() is missing, 'make nsdeps' inserts
pointless code:

    MODULE_IMPORT_NS("ns");

Here, "ns" is not a namespace, but the variable in the semantic patch.
It must not be quoted. Instead, a string literal must be passed to
Coccinelle.

Fixes: cdd30eb ("module: Convert symbol namespace to string literal")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts the misconversions introduced by commit cdd30eb
("module: Convert symbol namespace to string literal").

The affected descriptions refer to MODULE_IMPORT_NS() tags in general,
rather than suggesting the use of the empty string ("") as the
namespace.

Fixes: cdd30eb ("module: Convert symbol namespace to string literal")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit cdd30eb ("module: Convert symbol namespace to string
literal") only converted MODULE_IMPORT_NS() and EXPORT_SYMBOL_NS(),
leaving DEFAULT_SYMBOL_NAMESPACE as a macro expansion.

This commit converts DEFAULT_SYMBOL_NAMESPACE in the same way to avoid
annoyance for the default namespace as well.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
…/xfs-linux

Pull xfs fixes from Carlos Maiolino:

 - Use xchg() in xlog_cil_insert_pcp_aggregate()

 - Fix ABBA deadlock on a race between mount and log shutdown

 - Fix quota softlimit incoherency on delalloc

 - Fix sparse inode limits on runt AG

 - remove unknown compat feature checks in SB write valdation

 - Eliminate a lockdep false positive

* tag 'xfs-fixes-6.13-rc2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay
  xfs: Use xchg() in xlog_cil_insert_pcp_aggregate()
  xfs: prevent mount and log shutdown race
  xfs: delalloc and quota softlimit timers are incoherent
  xfs: fix sparse inode limits on runt AG
  xfs: remove unknown compat feature check in superblock write validation
  xfs: eliminate lockdep false positives in xfs_attr_shortform_list
…rnel/git/jack/linux-fs

Pull quota and udf fixes from Jan Kara:
 "Two small UDF fixes for better handling of corrupted filesystem and a
  quota fix to fix handling of filesystem freezing"

* tag 'fs_for_v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Verify inode link counts before performing rename
  udf: Skip parent dir link count update if corrupted
  quota: flush quota_release_work upon quota writeback
…rnel/git/kdave/linux

Pull btrfs fixes from David Sterba:

 - add lockdep annotations for io_uring/encoded read integration, inode
   lock is held when returning to userspace

 - properly reflect experimental config option to sysfs

 - handle NULL root in case the rescue mode accepts invalid/damaged tree
   roots (rescue=ibadroot)

 - regression fix of a deadlock between transaction and extent locks

 - fix pending bio accounting bug in encoded read ioctl

 - fix NOWAIT mode when checking references for NOCOW files

 - fix use-after-free in a rb-tree cleanup in ref-verify debugging tool

* tag 'for-6.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix lockdep warnings on io_uring encoded reads
  btrfs: ref-verify: fix use-after-free after invalid ref action
  btrfs: add a sanity check for btrfs root in btrfs_search_slot()
  btrfs: don't loop for nowait writes when checking for cross references
  btrfs: sysfs: advertise experimental features only if CONFIG_BTRFS_EXPERIMENTAL=y
  btrfs: fix deadlock between transaction commits and extent locks
  btrfs: fix use-after-free in btrfs_encoded_read_endio()
Simplify the I/O completion path for encoded reads by using a
completion instead of a wait_queue.

Furthermore use refcount_t instead of atomic_t for reference counting the
private data.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We have been using the following check

   if (generation <= root->root_key.offset)

to make decisions about whether or not to visit a node during snapshot
delete.  This is because for normal subvolumes this is set to 0, and for
snapshots it's set to the creation generation.  The idea being that if
the generation of the node is less than or equal to our creation
generation then we don't need to visit that node, because it doesn't
belong to us, we can simply drop our reference and move on.

However reloc roots don't have their generation stored in
root->root_key.offset, instead that is the objectid of their
corresponding fs root.  This means we can incorrectly not walk into
nodes that need to be dropped when deleting a reloc root.

There are a variety of consequences to making the wrong choice in two
distinct areas.

visit_node_for_delete()

1. False positive.  We think we are newer than the block when we really
   aren't.  We don't visit the node and drop our reference to the node
   and carry on.  This would result in leaked space.
2. False negative.  We do decide to walk down into a block that we
   should have just dropped our reference to.  However this means that
   the child node will have refs > 1, so we will switch to
   UPDATE_BACKREF, and then the subsequent walk_down_proc() will notice
   that btrfs_header_owner(node) != root->root_key.objectid and it'll
   break out of the loop, and then walk_up_proc() will drop our reference,
   so this appears to be ok.

do_walk_down()

1. False positive.  We are in UPDATE_BACKREF and incorrectly decide that
   we are done and don't need to update the backref for our lower nodes.
   This is another case that simply won't happen with relocation, as we
   only have to do UPDATE_BACKREF if the node below us was shared and
   didn't have FULL_BACKREF set, and since we don't own that node
   because we're a reloc root we actually won't end up in this case.
2. False negative.  Again this is tricky because as described above, we
   simply wouldn't be here from relocation, because we don't own any of
   the nodes because we never set btrfs_header_owner() to the reloc root
   objectid, and we always use FULL_BACKREF, we never actually need to
   set FULL_BACKREF on any children.

Having spent a lot of time stressing relocation/snapshot delete recently
I've not seen this pop in practice.  But this is objectively incorrect,
so fix this to get the correct starting generation based on the root
we're dropping to keep me from thinking there's a problem here.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This helper is how we select the delayed ref to run once we've selected
the delayed ref head.  I need this exported to add a unit test for
delayed refs, and it's more natural home is in delayed-ref.c.  Rename it
to btrfs_select_delayed_ref and move it into delayed-ref.c.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The recent fix for a stupid mistake I made uncovered the fact that we
don't have adequate testing in the delayed refs code, as it took a
pretty extensive and long running stress test to uncover something that
a unit test would have uncovered right away.

Fix this by adding a delayed refs self test suite.  This will validate
that the btrfs_ref transformation does the correct thing, that we do the
correct thing when merging delayed refs, and that we get the delayed
refs in the order that we expect.  These are all crucial to how the
delayed refs operate.

I introduced various bugs (including the original bug) into the delayed
refs code to validate that these tests caught all of the shenanigans
that I could think of.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Use kmemdup() in btrfs_uring_encoded_read() rather than kmalloc() followed by
memcpy().

Link: https://lore.kernel.org/oe-kbuild-all/202411050846.GI8oh5IK-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Mark Harmstone <maharmstone@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function btrfs_get_extent() will only return an PTR_ERR() or a valid
extent map pointer. It will not return NULL.

Thus the usage of PTR_ERR_OR_ZERO() inside submit_one_sector() is not
needed, use plain PTR_ERR() instead, and that is the only usage of
PTR_ERR_OR_ZERO() after btrfs_get_extent().

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The variable ret is being initialized to zero and also later re-assigned
to zero. In both cases the assignment is redundant since the value is
never read after the assignment and hence they can be removed.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
The following reproducer can cause btrfs mount to fail:

  dev="/dev/test/scratch1"
  mnt1="/mnt/test"
  mnt2="/mnt/scratch"

  mkfs.btrfs -f $dev
  mount $dev $mnt1
  btrfs subvolume create $mnt1/subvol1
  btrfs subvolume create $mnt1/subvol2
  umount $mnt1

  mount $dev $mnt1 -o subvol=subvol1
  while mount -o remount,ro $mnt1; do mount -o remount,rw $mnt1; done &
  bg=$!

  while mount $dev $mnt2 -o subvol=subvol2; do umount $mnt2; done

  kill $bg
  wait
  umount -R $mnt1
  umount -R $mnt2

The script will fail with the following error:

  mount: /mnt/scratch: /dev/mapper/test-scratch1 already mounted on /mnt/test.
        dmesg(1) may have more information after failed mount system call.
  umount: /mnt/test: target is busy.
  umount: /mnt/scratch/: not mounted

And there is no kernel error message.

[CAUSE]
During the btrfs mount, to support mounting different subvolumes with
different RO/RW flags, we need to detect that and retry if needed:

  Retry with matching RO flags if the initial mount fail with -EBUSY.

The problem is, during that retry we do not hold any super block lock
(s_umount), this means there can be a remount process changing the RO
flags of the original fs super block.

If so, we can have an EBUSY error during retry.  And this time we treat
any failure as an error, without any retry and cause the above EBUSY
mount failure.

[FIX]
The current retry behavior is racy because we do not have a super block
thus no way to hold s_umount to prevent the race with remount.

Solve the root problem by allowing fc->sb_flags to mismatch from the
sb->s_flags at btrfs_get_tree_super().

Then at the re-entry point btrfs_get_tree_subvol(), manually check the
fc->s_flags against sb->s_flags, if it's a RO->RW mismatch, then
reconfigure with s_umount lock hold.

Reported-by: Enno Gotthold <egotthold@suse.com>
Reported-by: Fabian Vogt <fvogt@suse.com>
[ Special thanks for the reproducer and early analysis pointing to btrfs. ]
Fixes: f044b31 ("btrfs: handle the ro->rw transition for mounting different subvolumes")
Link: https://bugzilla.suse.com/show_bug.cgi?id=1231836
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 1460540 ("btrfs: initial fsverity support") introduced
fs-verity support for btrfs, but didn't add support for
FS_IOC_READ_VERITY_METADATA to directly query the Merkle tree,
descriptor and signature blocks for fs-verity enabled files.

Add the (trival) implementation: we just need to wire it through to the
fs-verity code, the same way as is done in the other two filesystems
which support this ioctl (ext4, f2fs). The fs-verity code already has
access to the required data.

This is also safe to backport to older stable trees (5.15+) if needed.

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Factor out a part of unpin_extent_range() that returns space back to the
space info, prioritizing global block reserve.  Also, move the "len"
variable into the loop to clarify we don't need to carry it beyond an
iteration.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Since commit e1e577a ("btrfs: store fs_info in space_info"), we have
the fs_info in a space_info. So, we can drop fs_info argument from
btrfs_update_space_info_*. There is no behavior change.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
On the zoned mode, once used and freed region is still not reusable after the
freeing. The underlying zone needs to be reset before reusing. Btrfs resets a
zone when it removes a block group, and then new block group is allocated on
the zones to reuse the zones. But, it is sometime too late to catch up with a
write side.

This commit introduces a new space-info reclaim method ZONE_RESET. That will
pick a block group from the unused list and reset its zone to reuse the
zone_unusable space. It is faster than removing the block group and re-creating
a new block group on the same zones.

For the first implementation, the ZONE_RESET is only applied to a block group
whose region is fully zone_unusable. Reclaiming partial zone_unusable block
group could be implemented later.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
… activation

When activating a swap file we acquire the root's snapshot drew lock and
then check if the root is dead, failing and returning with -EPERM if it's
dead but without unlocking the root's snapshot lock. Fix this by adding
the missing unlock.

Fixes: 60021bd ("btrfs: prevent subvol with swapfile from being deleted")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_drop_extents() calls BUG_ON() in case the counter of to be deleted
extents is greater than 0. But all of these code paths can handle errors,
so there's no need to crash the kernel. Instead WARN() that the condition
has been met and gracefully bail out.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
…drop_extents()

A data race occurs when the function `insert_ordered_extent_file_extent()`
and the function `btrfs_inode_safe_disk_i_size_write()` are executed
concurrently. The function `insert_ordered_extent_file_extent()` is not
locked when reading inode->disk_i_size, causing
`btrfs_inode_safe_disk_i_size_write()` to cause data competition when
writing inode->disk_i_size, thus affecting the value of `modify_tree`.

The specific call stack that appears during testing is as follows:
============DATA_RACE============
 btrfs_drop_extents+0x89a/0xa060 [btrfs]
 insert_reserved_file_extent+0xb54/0x2960 [btrfs]
 insert_ordered_extent_file_extent+0xff5/0x1760 [btrfs]
 btrfs_finish_one_ordered+0x1b85/0x36a0 [btrfs]
 btrfs_finish_ordered_io+0x37/0x60 [btrfs]
 finish_ordered_fn+0x3e/0x50 [btrfs]
 btrfs_work_helper+0x9c9/0x27a0 [btrfs]
 process_scheduled_works+0x716/0xf10
 worker_thread+0xb6a/0x1190
 kthread+0x292/0x330
 ret_from_fork+0x4d/0x80
 ret_from_fork_asm+0x1a/0x30
============OTHER_INFO============
 btrfs_inode_safe_disk_i_size_write+0x4ec/0x600 [btrfs]
 btrfs_finish_one_ordered+0x24c7/0x36a0 [btrfs]
 btrfs_finish_ordered_io+0x37/0x60 [btrfs]
 finish_ordered_fn+0x3e/0x50 [btrfs]
 btrfs_work_helper+0x9c9/0x27a0 [btrfs]
 process_scheduled_works+0x716/0xf10
 worker_thread+0xb6a/0x1190
 kthread+0x292/0x330
 ret_from_fork+0x4d/0x80
 ret_from_fork_asm+0x1a/0x30
=================================

The main purpose of the check of the inode's disk_i_size is to avoid
taking write locks on a btree path when we have a write at or beyond
eof, since in these cases we don't expect to find extent items in the
root to drop. However if we end up taking write locks due to a data
race on disk_i_size, everything is still correct, we only add extra
lock contention on the tree in case there's concurrency from other tasks.
If the race causes us to not take write locks when we actually need them,
then everything is functionally correct as well, since if we find out we
have extent items to drop and we took read locks (modify_tree set to 0),
we release the path and retry again with write locks.

Since this data race does not affect the correctness of the function,
it is a harmless data race, use data_race() to check inode->disk_i_size.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Otherwise it won't catch bios turned into regular writes by the block
level zone write plugging. The additional test it adds is for emulated
zone append.

Fixes: 9b1ce7f ("block: Implement zone append emulation")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Btrfs like other file systems can't really deal with I/O not aligned to
it's internal block size (which strangely is called sector size in
btrfs, for historical reasons), but the block layer split helper doesn't
even know about that.

Round down the split boundary so that all I/Os are aligned.

Fixes: d5e4377 ("btrfs: split zone append bios in btrfs_submit_bio")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.