Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bucket garbage collection does not work correctly with MVCC enabled (and maybe even w/o it) #314

Closed
CuriousGeorgiy opened this issue Jan 17, 2022 · 0 comments
Assignees
Labels
bug Something isn't working teamS Scaling

Comments

@CuriousGeorgiy
Copy link
Member

Description

Bucket garbage collection assumes the fact that each increment to M.bucket_generation in the on_replace trigger bucket_generation_increment for box.space._bucket corresponds to a tuple with consts.BUCKET.GARBAGE or consts.BUCKET.SENT status that will be seen when querying box.space._bucket in the bucket garbage collection loop: this assumption does not hold with MVCC enabled — the transaction manager can show versions of tuples before the replace. It may also not hold even by default (with MVCC disabled): the replace which changed the bucket status could not yet get committed to the WAL at the time the bucket garbage collection fiber queries box.space._bucket — this hypothesis was not tested and requires further investigation.

Reproducers

Behaviour

The following console output (with custom log messages inserted for clarity) aims to demonstrate the system's behavior with MVCC enabled and the reproducer above:

�sA-1 | 2022-01-14 15:28:24.001 [84516] main/161/vshard.rebalancer_worker_1 I> in rebalancer before last bucket replace
sA-1 | 2022-01-14 15:28:24.001 [84516] main/161/vshard.rebalancer_worker_1 I> before txn_commit_stmt
sA-1 | 2022-01-14 15:28:24.001 [84516] main/161/vshard.rebalancer_worker_1 I> in bucket generation_increment
sA-1 | 2022-01-14 15:28:24.001 [84516] main/161/vshard.rebalancer_worker_1 I> after txn_commit_stmt
sA-1 | 2022-01-14 15:28:24.001 [84516] main/138/vshard.gc I> in gc last bucket=--- [300, 'sending', 'bbbbbbbb-0000-0000-0000-000000000000']
sA-1 | 2022-01-14 15:28:24.001 [84516] main/161/vshard.rebalancer_worker_1 I> in rebalancer after last bucket replace
sA-1 | 2022-01-14 15:28:24.001 [84516] main/161/vshard.rebalancer_worker_1 I> last_tuple=300
sA-1 | 2022-01-14 15:28:24.001 [84516] main/161/vshard.rebalancer_worker_1 I> last_bucket=--- [300, 'sent', 'bbbbbbbb-0000-0000-0000-000000000000']

As described above, the bucket garbage collector never enters its loop after this point (and hence never collects the last bucket with id=300), as the bucket generation state got updated, even though the garbage collector did not see the last replaced bucket in consts.BUCKET.SENT state.

Consideration

Consulted @yngvar-antonsson, @filonenko-mikhail and @Gerold103: all consider the described behaviour a bug in vshard.

@CuriousGeorgiy CuriousGeorgiy added the bug Something isn't working label Jan 17, 2022
@kyukhin kyukhin added teamC teamS Scaling and removed teamC labels Feb 11, 2022
@Gerold103 Gerold103 self-assigned this May 12, 2022
Gerold103 added a commit that referenced this issue May 20, 2022
The instance files storage.lua and router.lua load some modules to
use in _test.lua files in server:exec() functions. It works fine
as long as there aren't many libs and the _test.lua files don't
try to load the same ones.

In some future patches there appeared a problem that certain libs
are needed both in _test.lua files and on the instances as
globals to use them in :exec() calls. It can't be done because
exec requires the function have no upvalues. The function thinks
the libs referenced in it are from _test.lua file, not from the
instance, and treats them as upvalues.

A workaround is to make in exec() function body a hack like
    local mylib = _G.mylib
then _G is loaded on the instance and not treated as an upvalue.

This patch introduces a more generic solution - use 'i' prefix for
all global libraries loaded on the instances ('i' stands for
'instance').

Then in an :exec() call 'ifiber' would reference the
instance-local lib, while outside of exec the _test.lua file can
use just 'fiber' or 'lfiber' or whatever.

Needed for #314
Gerold103 added a commit that referenced this issue May 20, 2022
Server.lua in test/luatest_helpers/ is a fork from core Tarantool.
The future patches will need the new functions about getting and
waiting vclock.

This patch fetches the latest server.lua from Tarantool.

Needed for #314
Gerold103 added a commit that referenced this issue May 20, 2022
Bucket GC doesn't work properly with MVCC. This patch ports the
existing bucket GC test to luatest in order to extend it with
new mvcc-related cases in next commits.

Needed for #314
Gerold103 added a commit that referenced this issue May 20, 2022
Bucket garbage collector wakes up on _bucket generation change. It
used to happen on each update in _bucket space. When GC fiber
wakes up, it deletes the garbage and goes to sleep until next
generation change.

It didn't work with mvcc because the generation was bumped during
replace before commit, GC fiber woke up, saw no changes, and went
to sleep until next bump. Then _bucket update would commit and
there would be a SENT/GARBAGE bucket visible, but GC fiber
wouldn't even try to find it because the generation didn't change
since last wakeup.

The patch makes so the generation is bumped on commit to _bucket,
not on replace.

Closes #314
Gerold103 added a commit that referenced this issue May 21, 2022
The instance files storage.lua and router.lua load some modules to
use in _test.lua files in server:exec() functions. It works fine
as long as there aren't many libs and the _test.lua files don't
try to load the same ones.

In some future patches there appeared a problem that certain libs
are needed both in _test.lua files and on the instances as
globals to use them in :exec() calls. It can't be done because
exec requires the function have no upvalues. The function thinks
the libs referenced in it are from _test.lua file, not from the
instance, and treats them as upvalues.

A workaround is to make in exec() function body a hack like
    local mylib = _G.mylib
then _G is loaded on the instance and not treated as an upvalue.

This patch introduces a more generic solution - use 'i' prefix for
all global libraries loaded on the instances ('i' stands for
'instance').

Then in an :exec() call 'ifiber' would reference the
instance-local lib, while outside of exec the _test.lua file can
use just 'fiber' or 'lfiber' or whatever.

Needed for #314
Gerold103 added a commit that referenced this issue May 21, 2022
Server.lua in test/luatest_helpers/ is a fork from core Tarantool.
The future patches will need the new functions about getting and
waiting vclock.

This patch fetches the latest server.lua from Tarantool.

Needed for #314
Gerold103 added a commit that referenced this issue May 21, 2022
Bucket GC doesn't work properly with MVCC. This patch ports the
existing bucket GC test to luatest in order to extend it with
new mvcc-related cases in next commits.

Needed for #314
Gerold103 added a commit that referenced this issue May 21, 2022
Bucket garbage collector wakes up on _bucket generation change. It
used to happen on each update in _bucket space. When GC fiber
wakes up, it deletes the garbage and goes to sleep until next
generation change.

It didn't work with mvcc because the generation was bumped during
replace before commit, GC fiber woke up, saw no changes, and went
to sleep until next bump. Then _bucket update would commit and
there would be a SENT/GARBAGE bucket visible, but GC fiber
wouldn't even try to find it because the generation didn't change
since last wakeup.

The patch makes so the generation is bumped on commit to _bucket,
not on replace.

Closes #314
Gerold103 added a commit that referenced this issue May 23, 2022
Bucket GC doesn't work properly with MVCC. This patch ports the
existing bucket GC test to luatest in order to extend it with
new mvcc-related cases in next commits.

Needed for #314
Gerold103 added a commit that referenced this issue May 23, 2022
Bucket garbage collector wakes up on _bucket generation change. It
used to happen on each update in _bucket space. When GC fiber
wakes up, it deletes the garbage and goes to sleep until next
generation change.

It didn't work with mvcc because the generation was bumped during
replace before commit, GC fiber woke up, saw no changes, and went
to sleep until next bump. Then _bucket update would commit and
there would be a SENT/GARBAGE bucket visible, but GC fiber
wouldn't even try to find it because the generation didn't change
since last wakeup.

The patch makes so the generation is bumped on commit to _bucket,
not on replace.

Closes #314
Gerold103 added a commit that referenced this issue May 23, 2022
The instance files storage.lua and router.lua load some modules to
use in _test.lua files in server:exec() functions. It works fine
as long as there aren't many libs and the _test.lua files don't
try to load the same ones.

In some future patches there appeared a problem that certain libs
are needed both in _test.lua files and on the instances as
globals to use them in :exec() calls. It can't be done because
exec requires the function have no upvalues. The function thinks
the libs referenced in it are from _test.lua file, not from the
instance, and treats them as upvalues.

A workaround is to make in exec() function body a hack like
    local mylib = _G.mylib
then _G is loaded on the instance and not treated as an upvalue.

This patch introduces a more generic solution - use 'i' prefix for
all global libraries loaded on the instances ('i' stands for
'instance').

Then in an :exec() call 'ifiber' would reference the
instance-local lib, while outside of exec the _test.lua file can
use just 'fiber' or 'lfiber' or whatever.

Needed for #314
Gerold103 added a commit that referenced this issue May 23, 2022
Server.lua in test/luatest_helpers/ is a fork from core Tarantool.
The future patches will need the new functions about getting and
waiting vclock.

This patch fetches the latest server.lua from Tarantool.

Needed for #314
Gerold103 added a commit that referenced this issue May 23, 2022
Bucket GC doesn't work properly with MVCC. This patch ports the
existing bucket GC test to luatest in order to extend it with
new mvcc-related cases in next commits.

Needed for #314
@sergos sergos added the 3sp label May 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working teamS Scaling
Projects
None yet
Development

No branches or pull requests

4 participants