Commit ef07b74

fdmanana authored and kdave committed
btrfs: fix race between logging inode and checking if it was logged before
There's a race between checking if an inode was logged before and logging an inode that can cause us to mark an inode as not logged just after it was logged by a concurrent task:

1) We have inode X which was not logged before, neither in the current transaction nor in a past transaction since the inode was loaded into memory, so its ->logged_trans value is 0;

2) We are at transaction N;

3) Task A calls inode_logged() against inode X, sees that ->logged_trans is 0 and there is a log tree and so it proceeds to search in the log tree for an inode item for inode X. It doesn't see any, but before it sets ->logged_trans to N - 1...

4) Task B calls btrfs_log_inode() against inode X, logs the inode and sets ->logged_trans to N;

5) Task A now sets ->logged_trans to N - 1;

6) At this point anyone calling inode_logged() gets 0 (inode not logged) since ->logged_trans is greater than 0 and less than N, but our inode was really logged. As a consequence operations like rename, unlink and link that happen afterwards in the current transaction end up not updating the log when they should.

Fix this by ensuring that, when the inode item is not found in the log tree, inode_logged() only updates ->logged_trans if, after taking the inode's lock (spinlock struct btrfs_inode::lock), the ->logged_trans value is still zero, since the inode lock is what protects setting ->logged_trans at btrfs_log_inode().

Fixes: 0f8ce49 ("btrfs: avoid inode logging during rename and link when possible")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
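The interleaving described above can be replayed deterministically in userspace. The following is a minimal sketch under stated assumptions, not kernel code: struct toy_inode, toy_lock(), toy_unlock(), toy_log_inode(), old_mark_not_logged(), new_mark_not_logged() and replay_race() are hypothetical names, a C11 atomic_flag stands in for the spinlock btrfs_inode::lock, and the log tree search itself is left out. Both tasks are replayed on a single thread in the problematic order, so the result does not depend on scheduling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct btrfs_inode. */
struct toy_inode {
	atomic_flag lock;	/* models the spinlock btrfs_inode::lock */
	uint64_t logged_trans;	/* 0 means "not known yet" */
};

static void toy_lock(struct toy_inode *inode)
{
	while (atomic_flag_test_and_set_explicit(&inode->lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void toy_unlock(struct toy_inode *inode)
{
	atomic_flag_clear_explicit(&inode->lock, memory_order_release);
}

/* Task B: models only the ->logged_trans update done when logging. */
static void toy_log_inode(struct toy_inode *inode, uint64_t transid)
{
	toy_lock(inode);
	inode->logged_trans = transid;
	toy_unlock(inode);
}

/* Behaviour before the fix: unconditional store without the lock. */
static bool old_mark_not_logged(struct toy_inode *inode, uint64_t transid)
{
	inode->logged_trans = transid - 1;
	return false;
}

/* Behaviour after the fix, mirroring mark_inode_as_not_logged(). */
static bool new_mark_not_logged(struct toy_inode *inode, uint64_t transid)
{
	bool ret = false;

	toy_lock(inode);
	if (inode->logged_trans == 0)
		inode->logged_trans = transid - 1;
	else if (inode->logged_trans == transid)
		ret = true;	/* a concurrent task logged the inode */
	toy_unlock(inode);

	return ret;
}

/* Replays the bad interleaving and returns the final ->logged_trans. */
static uint64_t replay_race(bool use_fix, uint64_t transid)
{
	struct toy_inode inode = { .lock = ATOMIC_FLAG_INIT, .logged_trans = 0 };

	assert(transid > 1);	/* sketch assumption: transid - 1 stays above 0 */

	/* Task A has searched the log tree, found nothing, not stored yet. */
	toy_log_inode(&inode, transid);	/* task B logs the inode first */

	/* Task A resumes and records its (now stale) "not logged" result. */
	if (use_fix)
		new_mark_not_logged(&inode, transid);
	else
		old_mark_not_logged(&inode, transid);

	return inode.logged_trans;
}
```

With transaction ID 10, replay_race(false, 10) leaves ->logged_trans at 9, so the inode looks not logged, while replay_race(true, 10) leaves it at 10 because the store is skipped once the value is no longer zero.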
1 parent 5bb0087 commit ef07b74

1 file changed, +30 -6 lines changed
fs/btrfs/tree-log.c

Lines changed: 30 additions & 6 deletions
@@ -3340,6 +3340,31 @@ int btrfs_free_log_root_tree(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+static bool mark_inode_as_not_logged(const struct btrfs_trans_handle *trans,
+				     struct btrfs_inode *inode)
+{
+	bool ret = false;
+
+	/*
+	 * Do this only if ->logged_trans is still 0 to prevent races with
+	 * concurrent logging as we may see the inode not logged when
+	 * inode_logged() is called but it gets logged after inode_logged() did
+	 * not find it in the log tree and we end up setting ->logged_trans to a
+	 * value less than trans->transid after the concurrent logging task has
+	 * set it to trans->transid. As a consequence, subsequent rename, unlink
+	 * and link operations may end up not logging new names and removing old
+	 * names from the log.
+	 */
+	spin_lock(&inode->lock);
+	if (inode->logged_trans == 0)
+		inode->logged_trans = trans->transid - 1;
+	else if (inode->logged_trans == trans->transid)
+		ret = true;
+	spin_unlock(&inode->lock);
+
+	return ret;
+}
+
 /*
  * Check if an inode was logged in the current transaction. This correctly deals
  * with the case where the inode was logged but has a logged_trans of 0, which
@@ -3374,10 +3399,8 @@ static int inode_logged(const struct btrfs_trans_handle *trans,
 	 * transaction's ID, to avoid the search below in a future call in case
 	 * a log tree gets created after this.
 	 */
-	if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &inode->root->state)) {
-		inode->logged_trans = trans->transid - 1;
-		return 0;
-	}
+	if (!test_bit(BTRFS_ROOT_HAS_LOG_TREE, &inode->root->state))
+		return mark_inode_as_not_logged(trans, inode);
 
 	/*
 	 * We have a log tree and the inode's logged_trans is 0. We can't tell
@@ -3431,16 +3454,17 @@ static int inode_logged(const struct btrfs_trans_handle *trans,
 		 * Set logged_trans to a value greater than 0 and less then the
 		 * current transaction to avoid doing the search in future calls.
 		 */
-		inode->logged_trans = trans->transid - 1;
-		return 0;
+		return mark_inode_as_not_logged(trans, inode);
 	}
 
 	/*
 	 * The inode was previously logged and then evicted, set logged_trans to
 	 * the current transacion's ID, to avoid future tree searches as long as
 	 * the inode is not evicted again.
 	 */
+	spin_lock(&inode->lock);
 	inode->logged_trans = trans->transid;
+	spin_unlock(&inode->lock);
 
 	/*
 	 * If it's a directory, then we must set last_dir_index_offset to the
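
The comments in this change rely on ->logged_trans encoding three states: 0 (not known, a log tree search is needed), the current transaction's ID (logged in this transaction), and a value greater than 0 but less than the current transaction's ID (not logged in this transaction). The sketch below only illustrates that encoding in plain C. The names toy_log_state and toy_classify() are hypothetical, values above the current transaction ID are not discussed in the commit and are lumped into "not logged" here as an assumption, and this is not how inode_logged() itself is structured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the three meanings of ->logged_trans. */
enum toy_log_state {
	TOY_UNKNOWN,	/* logged_trans == 0: must search the log tree */
	TOY_LOGGED,	/* logged_trans == current transaction ID */
	TOY_NOT_LOGGED,	/* any other non-zero value, e.g. transid - 1 */
};

static enum toy_log_state toy_classify(uint64_t logged_trans, uint64_t transid)
{
	assert(transid > 0);	/* sketch assumption: 0 is reserved for "unknown" */

	if (logged_trans == transid)
		return TOY_LOGGED;
	if (logged_trans == 0)
		return TOY_UNKNOWN;
	return TOY_NOT_LOGGED;
}
```

Seen this way, the race turns a TOY_LOGGED inode into a TOY_NOT_LOGGED one, which is why the helper added above only moves the value away from 0 while holding the inode's lock.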
