Skip to content

Commit e4d1263

Browse files
fdmananagregkh
authored andcommitted
btrfs: avoid load/store tearing races when checking if an inode was logged
[ Upstream commit 986bf6e ] At inode_logged() we do a couple lockless checks for ->logged_trans, and these are generally safe except the second one in case we get a load or store tearing due to a concurrent call updating ->logged_trans (either at btrfs_log_inode() or later at inode_logged()). In the first case it's safe to compare to the current transaction ID since once ->logged_trans is set the current transaction, we never set it to a lower value. In the second case, where we check if it's greater than zero, we are prone to load/store tearing races, since we can have a concurrent task updating to the current transaction ID with store tearing for example, instead of updating with a single 64 bits write, to update with two 32 bits writes or four 16 bits writes. In that case the reading side at inode_logged() could see a positive value that does not match the current transaction and then return a false negative. Fix this by doing the second check while holding the inode's spinlock, add some comments about it too. Also add the data_race() annotation to the first check to avoid any reports from KCSAN (or similar tools) and comment about it. Fixes: 0f8ce49 ("btrfs: avoid inode logging during rename and link when possible") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
1 parent e1ed129 commit e4d1263

File tree

1 file changed

+21
-4
lines changed

1 file changed

+21
-4
lines changed

fs/btrfs/tree-log.c

Lines changed: 21 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -3351,15 +3351,32 @@ static int inode_logged(const struct btrfs_trans_handle *trans,
33513351
struct btrfs_key key;
33523352
int ret;
33533353

3354-
if (inode->logged_trans == trans->transid)
3354+
/*
3355+
* Quick lockless call, since once ->logged_trans is set to the current
3356+
* transaction, we never set it to a lower value anywhere else.
3357+
*/
3358+
if (data_race(inode->logged_trans) == trans->transid)
33553359
return 1;
33563360

33573361
/*
3358-
* If logged_trans is not 0, then we know the inode logged was not logged
3359-
* in this transaction, so we can return false right away.
3362+
* If logged_trans is not 0 and not trans->transid, then we know the
3363+
* inode was not logged in this transaction, so we can return false
3364+
* right away. We take the lock to avoid a race caused by load/store
3365+
* tearing with a concurrent btrfs_log_inode() call or a concurrent task
3366+
* in this function further below - an update to trans->transid can be
3367+
* teared into two 32 bits updates for example, in which case we could
3368+
* see a positive value that is not trans->transid and assume the inode
3369+
* was not logged when it was.
33603370
*/
3361-
if (inode->logged_trans > 0)
3371+
spin_lock(&inode->lock);
3372+
if (inode->logged_trans == trans->transid) {
3373+
spin_unlock(&inode->lock);
3374+
return 1;
3375+
} else if (inode->logged_trans > 0) {
3376+
spin_unlock(&inode->lock);
33623377
return 0;
3378+
}
3379+
spin_unlock(&inode->lock);
33633380

33643381
/*
33653382
* If no log tree was created for this root in this transaction, then

0 commit comments

Comments
 (0)