Need for partial merge #459
Currently merge is managed through `do_merge/9`:

```erlang
%% When both input key lists are exhausted, the merge is complete and the
%% accumulated manifest entries are returned
do_merge([], [], SinkLevel, _SinkB, _RP, NewSQN, _MaxSQN, _Opts, Additions) ->
    leveled_log:log(pc011, [NewSQN, SinkLevel, length(Additions)]),
    lists:reverse(Additions);
do_merge(KL1, KL2, SinkLevel, SinkB, RP, NewSQN, MaxSQN, OptsSST, Additions) ->
    FileName =
        leveled_penciller:sst_filename(
            NewSQN, SinkLevel, length(Additions)
        ),
    leveled_log:log(pc012, [NewSQN, FileName, SinkB]),
    TS1 = os:timestamp(),
    case leveled_sst:sst_newmerge(RP, FileName,
                                    KL1, KL2, SinkB, SinkLevel, MaxSQN,
                                    OptsSST) of
        empty ->
            %% No file was produced - loop round with empty lists to
            %% terminate
            leveled_log:log(pc013, [FileName]),
            do_merge(
                [], [],
                SinkLevel, SinkB,
                RP, NewSQN, MaxSQN,
                OptsSST,
                Additions
            );
        {ok, Pid, Reply, Bloom} ->
            %% A new SST file has been written; record a manifest entry
            %% for it and continue merging the remaining keys
            {{KL1Rem, KL2Rem}, SmallestKey, HighestKey} = Reply,
            Entry =
                leveled_pmanifest:new_entry(
                    SmallestKey, HighestKey, Pid, FileName, Bloom),
            leveled_log:log_timer(pc015, [], TS1),
            do_merge(
                KL1Rem, KL2Rem,
                SinkLevel, SinkB,
                RP, NewSQN, MaxSQN,
                OptsSST,
                [Entry|Additions]
            )
    end.
```

A `MaxAdditions` limit can be added so that if there is a loop around where either `KL1` or `KL2` is non-empty and `length(Additions) == MaxAdditions`, the standard loop can be terminated. In this case, there is still a need to write the remainders for both the higher and lower level. For the higher level this is trivial, as there was only one file passed into the merge at this level, so the remainder can be written using …. The update of the penciller manifest is then complicated by the truncation of the merge.
This change also needs to be back-ported to ….
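A minimal sketch of how such a termination clause might look, assuming a `MaxAdditions` argument is threaded through the loop (the guard shape and the `{partial, ...}` return value here are hypothetical illustrations, not the existing `leveled_pclerk` API):

```erlang
%% Hypothetical sketch only: stop the merge once MaxAdditions new files
%% have been written, handing the unmerged remainders back to the caller
%% so that they can be written out and the manifest change trimmed
do_merge(KL1, KL2, SinkLevel, _SinkB, _RP, NewSQN, _MaxSQN, _OptsSST,
            MaxAdditions, Additions)
        when KL1 =/= [] orelse KL2 =/= [],
             length(Additions) == MaxAdditions ->
    leveled_log:log(pc011, [NewSQN, SinkLevel, length(Additions)]),
    {partial, lists:reverse(Additions), KL1, KL2};
```

The existing clauses would then follow unchanged apart from passing `MaxAdditions` through on each recursion.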
When a level is over capacity in the ledger, a file is picked almost at random (in some cases grooming can be an influence), and that file is fully merged into the level below.

Say that between Level 4 and Level 5 there are 384 files at Level 4 and 2304 files at Level 5. That is an average of 6 files at Level 5 for every one at Level 4, so on average one would expect a merge to write into about 6 files below.
But what about extreme scenarios? Imagine a situation with three object buckets: `<<"B0">>`, `<<"B1">>` and `<<"B2">>`. At the start of the database all objects loaded had keys in `<<"B1">>`, but after a point, all updates are a 50/50 mix between buckets `<<"B0">>` and `<<"B2">>`.

There could now be a scenario where there are 2000 files at Level 5 which contain only `<<"B0">>` objects, and amongst the files at Level 4 there are files that contain only `<<"B0">>` objects, files that contain only `<<"B2">>` objects, and one file in between (as each level is ordered) which contains some `<<"B0">>` objects and some `<<"B2">>` objects.

If the file chosen for compaction is that mixed file, it will need to merge into more than 2000 files below, although most of those merge events will be taking keys continuously from the same (lower) side in succession. This will take a long time, and if updates are ongoing it will lead to a significant number of `slow_offer` backend pauses.

These events have been rare so far. However, running a perf_SUITE test with 32M objects (much bigger than normal) did trigger such an event, and a load which should have taken 10 minutes took 6 hours. Such severe fluctuations are very much against the principles of what leveled should achieve. So although this may be a very rare event, it should still be considered a HIGH priority bug.
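To make the fan-out in this scenario concrete, here is a small illustrative helper (not part of leveled, and ignoring the real manifest structure) that counts how many sink-level key ranges a candidate file's range overlaps:

```erlang
-module(merge_fanout).
-export([overlap_count/2]).

%% Illustrative only: ranges are {SmallestKey, HighestKey} pairs, and
%% keys compare with the standard Erlang term order, so binaries such
%% as <<"B0">> < <<"B1">> < <<"B2">> sort as expected
-spec overlap_count({term(), term()}, [{term(), term()}]) ->
    non_neg_integer().
overlap_count({SrcLow, SrcHigh}, SinkRanges) ->
    length(
        [ok || {SinkLow, SinkHigh} <- SinkRanges,
               SinkLow =< SrcHigh, SinkHigh >= SrcLow]).
```

With 2000 sink ranges that all sit inside `<<"B0">>`, a single source range running from a `<<"B0">>` key to a `<<"B2">>` key overlaps every one of them, so choosing that one file for compaction forces the whole level below to be rewritten in a single merge.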