
Need for partial merge #459

Closed · martinsumner opened this issue Nov 26, 2024 · 2 comments · Fixed by #460

martinsumner (Owner) commented Nov 26, 2024

When a level in the ledger is over capacity, a file is picked almost at random (in some cases grooming can influence the choice), and that file is fully merged into the level below.

Take Level 4 and Level 5, say: there are 384 files at Level 4 and 2304 files at Level 5, an average of 6 files at Level 5 for every one at Level 4. So on average one would expect a merge to write into 6 files below.

But consider an extreme scenario. Imagine a situation with three object buckets <<"B0">>, <<"B1">> and <<"B2">>. At the start of the database's life all objects loaded had keys in <<"B1">>, but after a point all updates are a 50/50 mix across buckets <<"B0">> and <<"B2">>.

There could now be a scenario where there are 2000 files at Level 5 which contain only <<"B1">> objects, and amongst the files at Level 4 there are files that contain only <<"B0">> objects, files that contain only <<"B2">> objects - and one file in between (as each level is key-ordered) which contains some <<"B0">> objects and some <<"B2">> objects, and whose key range therefore spans every <<"B1">> file below.

If the file chosen for compaction is that mixed file, it will need to merge into more than 2000 files below - even though most of those merge steps will simply be taking keys continuously from the same (lower) side in succession. This will take a long time, and if updates are ongoing it will lead to a significant number of slow_offer backend pauses.
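
To make the scale of the problem concrete, here is a small illustrative function (not leveled code - key ranges are modelled as simple {LowKey, HighKey} tuples) counting how many sink-level files the key span of a single source file overlaps:

%% Illustrative only - not taken from leveled. A merge must rewrite
%% every sink file whose key range overlaps the source file's range.
%% In the healthy case this is ~6 files; a source file spanning
%% <<"B0">> to <<"B2">> overlaps every <<"B1">>-only sink file, so in
%% the scenario above it would return > 2000.
overlap_count({SrcLow, SrcHigh}, SinkRanges) ->
    length(
        [R || {SinkLow, SinkHigh} = R <- SinkRanges,
              SinkHigh >= SrcLow, SinkLow =< SrcHigh]).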

Such events have been rare so far, but running a perf_SUITE test with 32M objects (much larger than normal) did trigger one ... and a load which should have taken 10 minutes took 6 hours. Such severe fluctuations are very much against the principles of what leveled should achieve, so although this may be a very rare event, it should still be considered a HIGH priority bug.

martinsumner (Owner, Author) commented

Currently the merge is managed through the do_merge/9 loop in leveled_pclerk:

do_merge([], [], SinkLevel, _SinkB, _RP, NewSQN, _MaxSQN, _Opts, Additions) ->
    leveled_log:log(pc011, [NewSQN, SinkLevel, length(Additions)]),
    lists:reverse(Additions);
do_merge(KL1, KL2, SinkLevel, SinkB, RP, NewSQN, MaxSQN, OptsSST, Additions) ->
    FileName =
        leveled_penciller:sst_filename(
            NewSQN, SinkLevel, length(Additions)
        ),
    leveled_log:log(pc012, [NewSQN, FileName, SinkB]),
    TS1 = os:timestamp(),
    case leveled_sst:sst_newmerge(RP, FileName,
                                    KL1, KL2, SinkB, SinkLevel, MaxSQN,
                                    OptsSST) of
        empty ->
            leveled_log:log(pc013, [FileName]),
            do_merge(
                [], [],
                SinkLevel, SinkB,
                RP, NewSQN, MaxSQN,
                OptsSST, 
                Additions
            );                        
        {ok, Pid, Reply, Bloom} ->
            {{KL1Rem, KL2Rem}, SmallestKey, HighestKey} = Reply,
            Entry =
                leveled_pmanifest:new_entry(
                    SmallestKey, HighestKey, Pid, FileName, Bloom),
            leveled_log:log_timer(pc015, [], TS1),
            do_merge(
                KL1Rem, KL2Rem,
                SinkLevel, SinkB,
                RP, NewSQN, MaxSQN,
                OptsSST,
                [Entry|Additions]
            )
    end.

A MaxAdditions limit can be added so that when the loop comes around with either KL1 or KL2 non-empty and length(Additions) == MaxAdditions, the standard loop can be terminated early.
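
As a sketch, such a guard could be added as a new clause between the base clause and the standard recursive clause (?MAX_ADDITIONS and complete_partial_merge/9 are illustrative names here, not the actual implementation):

%% Sketch only. The base clause above matches when both key lists are
%% empty, so reaching this clause means at least one of KL1 and KL2 is
%% non-empty - cut the loop short and hand over to a (hypothetical)
%% helper that writes the remainders for both levels.
do_merge(KL1, KL2, SinkLevel, SinkB, RP, NewSQN, MaxSQN, OptsSST, Additions)
        when length(Additions) == ?MAX_ADDITIONS ->
    complete_partial_merge(
        KL1, KL2, SinkLevel, SinkB, RP, NewSQN, MaxSQN, OptsSST, Additions);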

In this case, there is still a need to write the remainders for both the higher and lower level.

For the higher level this is trivial, as only one file was passed into the merge at that level - so the remainder can be written using leveled_sst:sst_newmerge/8 with an empty KL2. For the lower level this is harder, as we need to terminate when we reach an sst_pointer to a next file - so that all subsequent files are not re-written, but are left unmodified.
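
For the higher-level remainder, a minimal sketch of that trivial case, mirroring the loop above (writing the remainder back at SinkLevel - 1, passing false for the basement flag, and the filename sequence number are assumptions for illustration):

%% Sketch only: flush the unmerged higher-level keys (KL1Rem) into a
%% single new file by merging against an empty KL2. Note sst_newmerge
%% may also return 'empty', which would need handling as in the loop.
SrcLevel = SinkLevel - 1,
RemFileName = leveled_penciller:sst_filename(NewSQN, SrcLevel, 0),
{ok, RemPid, RemReply, RemBloom} =
    leveled_sst:sst_newmerge(
        RP, RemFileName,
        KL1Rem, [],        % empty KL2 - nothing to merge against
        false,             % assumption: this level is not the basement
        SrcLevel, MaxSQN, OptsSST),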

The update of the penciller manifest is then complicated by the truncation of the merge.

martinsumner (Owner, Author) commented

This change needs to be back-ported to develop-3.1 to be added into Riak KV 3.2.3.
