Skip to content

Commit

Permalink
btrfs: Only check first key for committed tree blocks
Browse files Browse the repository at this point in the history
When looping btrfs/074 with many cpus (>= 8), it's possible to trigger
kernel warning due to first key verification:

[ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 btree_read_extent_buffer_pages+0x1ad/0x210
[ 4239.523830] Modules linked in:
[ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210
[ 4239.527101] Call Trace:
[ 4239.527251]  read_tree_block+0x42/0x70
[ 4239.527434]  read_node_slot+0xd2/0x110
[ 4239.527632]  push_leaf_right+0xad/0x1b0
[ 4239.527809]  split_leaf+0x4ea/0x700
[ 4239.527988]  ? leaf_space_used+0xbc/0xe0
[ 4239.528192]  ? btrfs_set_lock_blocking_rw+0x99/0xb0
[ 4239.528416]  btrfs_search_slot+0x8cc/0xa40
[ 4239.528605]  btrfs_insert_empty_items+0x71/0xc0
[ 4239.528798]  __btrfs_run_delayed_refs+0xa98/0x1680
[ 4239.529013]  btrfs_run_delayed_refs+0x10b/0x1b0
[ 4239.529205]  btrfs_commit_transaction+0x33/0xaf0
[ 4239.529445]  ? start_transaction+0xa8/0x4f0
[ 4239.529630]  btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0
[ 4239.529833]  btrfs_check_data_free_space+0x54/0xa0
[ 4239.530045]  btrfs_delalloc_reserve_space+0x25/0x70
[ 4239.531907]  btrfs_direct_IO+0x233/0x3d0
[ 4239.532098]  generic_file_direct_write+0xcb/0x170
[ 4239.532296]  btrfs_file_write_iter+0x2bb/0x5f4
[ 4239.532491]  aio_write+0xe2/0x180
[ 4239.532669]  ? lock_acquire+0xac/0x1e0
[ 4239.532839]  ? __might_fault+0x3e/0x90
[ 4239.533032]  do_io_submit+0x594/0x860
[ 4239.533223]  ? do_io_submit+0x594/0x860
[ 4239.533398]  SyS_io_submit+0x10/0x20
[ 4239.533560]  ? SyS_io_submit+0x10/0x20
[ 4239.533729]  do_syscall_64+0x75/0x1d0
[ 4239.533979]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 4239.534182] RIP: 0033:0x7f8519741697

The problem here is, at btree_read_extent_buffer_pages() we don't have
acquired read/write lock on that extent buffer, only basic info like
level/bytenr is reliable.

So race condition leads to such false alert.

However in current call site, it's impossible to acquire proper lock
without race window.
To fix the problem, we only verify first key for committed tree blocks
(whose generation is no larger than fs_info->last_trans_committed), so
the content of such tree blocks will not change and there is no need to
get read/write lock.

Reported-by: Nikolay Borisov <nborisov@suse.com>
Fixes: 581c176 ("btrfs: Validate child tree block's level and first key")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
  • Loading branch information
adam900710 authored and kdave committed Apr 13, 2018
1 parent 852eb3a commit 5d41be6
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions fs/btrfs/disk-io.c
Original file line number Diff line number Diff line change
Expand Up @@ -436,6 +436,14 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
if (!first_key)
return 0;

/*
* For live tree block (new tree blocks in current transaction),
* we need proper lock context to avoid race, which is impossible here.
* So we only checks tree blocks which is read from disk, whose
* generation <= fs_info->last_trans_committed.
*/
if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
return 0;
if (found_level)
btrfs_node_key_to_cpu(eb, &found_key, 0);
else
Expand Down

0 comments on commit 5d41be6

Please sign in to comment.