This isn't a merge race; it's a problem with how the model checks folds that truly freeze the keydir.
Fork a fold that runs for a while.
Do a lot of puts so that the keydir is frozen at epoch 47.
Do some more puts while folding a couple more times. The epoch counter creeps up to 50.
Do a delete. The thing that we're deleting is inside file 5. The keydir_remove deletes the key in the keydir at epoch 50.
A new fold starts (by the same pid that did the delete in the previous step, FWIW), and bitcask:open_fold_files() starts executing. The keydir epoch is now 51, but because we're really frozen, the keydir's pending_start_epoch is 47.
After all fold files are open, yes indeed, we're using a folding epoch of 47.
While folding file 5, we query the keydir for our key. Because the delete at epoch 51 isn't visible to a fold running at epoch 47, the keydir says that file 5's entry is current. Thus the key appears in the fold results seen by pid P, even though P had deleted that key just moments earlier.
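To make the visibility rule in the steps above concrete, here's a toy Python model. It is not bitcask's keydir: ToyKeydir, its methods, and the rule that each mutation bumps the epoch before stamping its entry are all assumptions made for illustration. It only shows why a fold pinned to epoch 47 still sees a key that was deleted at epoch 51.

```python
from dataclasses import dataclass, field

TOMBSTONE = None  # hypothetical marker meaning "key deleted"


@dataclass
class ToyKeydir:
    """Toy multi-version keydir; names and behavior are illustrative only."""
    epoch: int = 0
    # key -> list of (epoch, file_id or TOMBSTONE), oldest first
    entries: dict = field(default_factory=dict)

    def put(self, key, file_id):
        # Assumption: every mutation bumps the epoch, then stamps its entry.
        self.epoch += 1
        self.entries.setdefault(key, []).append((self.epoch, file_id))

    def remove(self, key):
        self.epoch += 1
        self.entries.setdefault(key, []).append((self.epoch, TOMBSTONE))

    def get(self, key, at_epoch):
        """Return the newest entry visible at at_epoch (None if deleted or absent)."""
        visible = [f for (e, f) in self.entries.get(key, []) if e <= at_epoch]
        return visible[-1] if visible else None


def run_scenario():
    kd = ToyKeydir(epoch=46)
    kd.put("key5", 5)               # epoch 47: key lives in file 5
    pending_start_epoch = kd.epoch  # keydir freezes here (47)
    for k in ("a", "b", "c"):       # more puts: epoch creeps up to 50
        kd.put(k, 6)
    kd.remove("key5")               # delete is stamped with epoch 51
    frozen_view = kd.get("key5", pending_start_epoch)  # what the new fold sees
    live_view = kd.get("key5", kd.epoch)               # what the model expects
    return frozen_view, live_view, kd.epoch
```

Calling run_scenario() gives a frozen view that still points at file 5 while the live view is a tombstone, which is the mismatch the model trips over.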
Good to hear it's a test model issue. I'm actually surprised we haven't seen more of these now that the keydir can freeze and cap the visibility of folds at hard-to-predict times without the EQC model accounting for it. shrug
I used to have such a check and would disregard results that happened during a known-frozen test case. sigh I haven't decided if I want to resurrect it or not. The model already knows that multiple folds are happening, which is a prerequisite for a frozen keydir.
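For what it's worth, the kind of check described above could look roughly like the Python sketch below. The real model is an EQC model, so this is only a sketch of the idea; check_fold and its parameters are hypothetical names, and "more than one active fold" is used as a stand-in for "the keydir may be frozen".

```python
def check_fold(expected_keys, fold_keys, active_folds):
    """Toy postcondition for a fold result (hypothetical, not the EQC model).

    expected_keys: keys the model believes are live right now
    fold_keys:     keys the fold actually returned
    active_folds:  number of folds the model knows are in flight
    """
    if active_folds > 1:
        # Multiple folds are a prerequisite for a frozen keydir, so the
        # fold may be reading an older epoch: disregard the result.
        return True
    # Otherwise the fold must match the model exactly.
    return set(fold_keys) == set(expected_keys)
```

The obvious cost is coverage: every fold that overlaps another one goes unchecked, so a real bug hiding in that window would slip through.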
Here's counterexample Cp8:
That seed of {1405,406317,535184} is nice & deterministic on my Mac. YMMV: substitute now() in its place and run until you find something that fails very consistently.
The last three ops of the main thread are:
The failure: key 5 should not be found by step 36's fold, but it is.
More research required. There's a possible race with a forked merge and/or with a merge at open time via bitcask:make_merge_file().