Do background compaction when the ratio of stable data covered by delete range is too large #2416

Merged: 32 commits, merged Aug 23, 2021
Changes from 24 commits
Commits (32)
aa3dca4
Add tests to ensure the behavior of DeleteRange
JaySon-Huang Dec 30, 2020
e0d27b0
Cleanup some unreasonable include
JaySon-Huang May 26, 2021
6177426
Use macro to define enums
JaySon-Huang May 26, 2021
9c4f3a6
Merge branch 'master' into compact_by_delete_range
lidezhu Jul 19, 2021
adc1534
small fix
lidezhu Jul 19, 2021
e72f1c0
add more log
lidezhu Jul 20, 2021
32b9ab5
small fix
lidezhu Jul 20, 2021
b5cf9de
fix cannot gc bug
lidezhu Jul 20, 2021
e2c25d0
hack to test
lidezhu Jul 20, 2021
4eb4d00
add more log for test
lidezhu Jul 20, 2021
15a8af0
add log for debug
lidezhu Jul 20, 2021
64e4843
add more log for debug
lidezhu Jul 21, 2021
a2acfb7
try fix not merge
lidezhu Jul 21, 2021
73a51a1
try to merge all emtpy segments
lidezhu Jul 21, 2021
d780216
Merge branch 'master' into compact_by_delete_range
lidezhu Aug 4, 2021
60c6c65
remove extra log
lidezhu Aug 4, 2021
1f3f7e2
small fix
lidezhu Aug 4, 2021
b0498ff
fix conflict
lidezhu Aug 6, 2021
df90c4f
format code
lidezhu Aug 6, 2021
2aef8ba
fix compile
lidezhu Aug 6, 2021
bdc779d
remove macro
lidezhu Aug 12, 2021
1aa065a
remove comment in unit test
lidezhu Aug 12, 2021
cc32f91
small improvement for gtest comment
lidezhu Aug 12, 2021
96ec774
remove extra comment
lidezhu Aug 12, 2021
a826854
Update dbms/src/Storages/DeltaMerge/Delta/Snapshot.cpp
lidezhu Aug 20, 2021
a5f9081
fix comment
lidezhu Aug 23, 2021
9fcd6f4
Merge branch 'compact_by_delete_range' of github.com:JaySon-Huang/tic…
lidezhu Aug 23, 2021
de06502
Merge branch 'master' into compact_by_delete_range
lidezhu Aug 23, 2021
c676302
format code
lidezhu Aug 23, 2021
0757d30
fix typo
lidezhu Aug 23, 2021
bd457f0
Merge branch 'master' into compact_by_delete_range
lidezhu Aug 23, 2021
4e7a39b
Merge branch 'master' into compact_by_delete_range
lidezhu Aug 23, 2021
1 change: 1 addition & 0 deletions .gitignore
@@ -251,6 +251,7 @@ website/presentations
build_docker
docker/builder/tics
release-centos7/tiflash
release-centos7/tiflash-*
release-centos7/build-release
release-darwin/tiflash
release-darwin/build-release
1 change: 1 addition & 0 deletions dbms/src/Interpreters/Settings.h
@@ -261,6 +261,7 @@ struct Settings
M(SettingUInt64, dt_bg_gc_check_interval, 600, "Background gc thread check interval, the unit is second.")\
M(SettingInt64, dt_bg_gc_max_segments_to_check_every_round, 15, "Max segments to check in every gc round, value less than or equal to 0 means gc no segments.")\
M(SettingFloat, dt_bg_gc_ratio_threhold_to_trigger_gc, 1.2, "Trigger segment's gc when the ratio of invalid version exceed this threhold. Values smaller than or equal to 1.0 means gc all segments")\
+M(SettingFloat, dt_bg_gc_delta_delete_ratio_to_trigger_gc, 0.8, "Trigger segment's gc when the ratio of delta delete range to stable exceeds this ratio.")\
M(SettingUInt64, dt_insert_max_rows, 0, "Max rows of insert blocks when write into DeltaTree Engine. By default 0 means no limit.")\
M(SettingBool, dt_enable_rough_set_filter, true, "Whether to parse where expression as Rough Set Index filter or not.") \
M(SettingBool, dt_raw_filter_range, true, "Do range filter or not when read data in raw mode in DeltaTree Engine.")\
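The new `dt_bg_gc_delta_delete_ratio_to_trigger_gc` setting (default `0.8`) is the threshold for the compaction this PR adds. A minimal sketch of the comparison it controls follows, assuming the ratio is the approximate number of stable rows covered by the delta's delete ranges divided by the stable layer's valid rows. The helper name `shouldCompactByDeleteRange` and its parameters are hypothetical; the real check lives in the `DeltaMergeStore.cpp` diff, which is not rendered on this page.

```cpp
#include <cstddef>

// Hypothetical helper, not the code from DeltaMergeStore.cpp: returns true when
// the fraction of stable rows covered by the delta's delete ranges exceeds
// `delete_ratio_threshold` (the role played by
// dt_bg_gc_delta_delete_ratio_to_trigger_gc, default 0.8).
bool shouldCompactByDeleteRange(std::size_t stable_valid_rows,
                                std::size_t rows_covered_by_delete_range,
                                double delete_ratio_threshold)
{
    // An empty stable layer has nothing to reclaim.
    if (stable_valid_rows == 0)
        return false;
    const double ratio = static_cast<double>(rows_covered_by_delete_range)
        / static_cast<double>(stable_valid_rows);
    // The setting says "exceeds", so use a strict comparison.
    return ratio > delete_ratio_threshold;
}
```

With the default threshold, a segment whose stable layer holds 1,000 rows of which 900 fall inside the delete range would be compacted, while one with 500 covered rows would not.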
2 changes: 2 additions & 0 deletions dbms/src/Storages/DeltaMerge/Delta/DeltaValueSpace.h
@@ -319,6 +319,8 @@ class DeltaValueSnapshot : public std::enable_shared_from_this<DeltaValueSnapsho
size_t getBytes() const { return bytes; }
size_t getDeletes() const { return deletes; }

+RowKeyRange getSquashDeleteRange() const;

const auto & getStorageSnapshot() { return storage_snap; }
const auto & getSharedDeltaIndex() { return shared_delta_index; }
};
13 changes: 12 additions & 1 deletion dbms/src/Storages/DeltaMerge/Delta/Snapshot.cpp
@@ -122,11 +122,22 @@ DeltaSnapshotPtr DeltaValueSpace::createSnapshot(const DMContext & context, bool
return snap;
}

+RowKeyRange DeltaValueSnapshot::getSquashDeleteRange() const
+{
+RowKeyRange squashed_delete_range = RowKeyRange::newNone(is_common_handle, rowkey_column_size);
+for (auto iter = packs.cbegin(); iter != packs.cend(); ++iter)
+{
+const auto & pack = *iter;
+if (auto dp_delete = pack->tryToDeleteRange(); dp_delete)
+squashed_delete_range = squashed_delete_range.merge(dp_delete->getDeleteRange());
+}
+return squashed_delete_range;
+}

// ================================================
// DeltaValueReader
// ================================================


DeltaValueReader::DeltaValueReader(const DMContext & context,
const DeltaSnapshotPtr & delta_snap_,
const ColumnDefinesPtr & col_defs_,
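`getSquashDeleteRange()` folds every delete-range pack in the delta snapshot into a single `RowKeyRange`. The sketch below mirrors that loop with a simplified half-open `int64` handle range in place of `RowKeyRange`. `HandleRange`, `Pack` and `squashDeleteRanges` are hypothetical stand-ins, and the sketch assumes `merge` returns the smallest range covering both inputs; that behavior is modeled here, not confirmed from `RowKeyRange.h`.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for RowKeyRange: a half-open handle range [start, end).
struct HandleRange
{
    std::int64_t start = 0;
    std::int64_t end = 0;

    static HandleRange newNone() { return {0, 0}; }
    bool none() const { return start >= end; }

    // Assumed semantics: the smallest range that covers both inputs.
    HandleRange merge(const HandleRange & other) const
    {
        if (none())
            return other;
        if (other.none())
            return *this;
        return {std::min(start, other.start), std::max(end, other.end)};
    }
};

// Hypothetical delta pack: either written rows (no delete range) or a delete range.
struct Pack
{
    std::optional<HandleRange> delete_range;
};

// Same shape as getSquashDeleteRange(): start from "none" and merge every delete range.
HandleRange squashDeleteRanges(const std::vector<Pack> & packs)
{
    HandleRange squashed = HandleRange::newNone();
    for (const auto & pack : packs)
        if (pack.delete_range)
            squashed = squashed.merge(*pack.delete_range);
    return squashed;
}
```

Because the result is a covering range, two disjoint deletes such as `[10, 20)` and `[50, 60)` squash to `[10, 60)`, so a covered-row count derived from it can overestimate how much stable data was actually deleted.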
145 changes: 85 additions & 60 deletions dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp

Large diffs are not rendered by default.

46 changes: 26 additions & 20 deletions dbms/src/Storages/DeltaMerge/DeltaMergeStore.h
@@ -172,9 +172,9 @@ class DeltaMergeStore : private boost::noncopyable

enum TaskRunThread
{
-Thread_BG_Thread_Pool,
-Thread_FG,
-Thread_BG_GC,
+BackgroundThreadPool,
+Foreground,
+BackgroundGCThread,
};

static std::string toString(ThreadType type)
@@ -204,21 +204,6 @@ class DeltaMergeStore : private boost::noncopyable
}
}

-static std::string toString(TaskRunThread type)
-{
-switch (type)
-{
-case Thread_BG_Thread_Pool:
-return "BackgroundThreadPool";
-case Thread_FG:
-return "Foreground";
-case Thread_BG_GC:
-return "BackgroundGCThread";
-default:
-return "Unknown";
-}
-}

static std::string toString(TaskType type)
{
switch (type)
@@ -240,6 +225,21 @@ class DeltaMergeStore : private boost::noncopyable
}
}

+static std::string toString(TaskRunThread type)
+{
+switch (type)
+{
+case BackgroundThreadPool:
+return "BackgroundThreadPool";
+case Foreground:
+return "Foreground";
+case BackgroundGCThread:
+return "BackgroundGCThread";
+default:
+return "Unknown";
+}
+}

struct BackgroundTask
{
TaskType type;
@@ -414,13 +414,19 @@ class DeltaMergeStore : private boost::noncopyable

SegmentPair segmentSplit(DMContext & dm_context, const SegmentPtr & segment, bool is_foreground);
void segmentMerge(DMContext & dm_context, const SegmentPtr & left, const SegmentPtr & right, bool is_foreground);
-SegmentPtr segmentMergeDelta(DMContext & dm_context, const SegmentPtr & segment, const TaskRunThread thread);
+SegmentPtr segmentMergeDelta(DMContext & dm_context,
+const SegmentPtr & segment,
+const TaskRunThread thread,
+SegmentSnapshotPtr segment_snap = nullptr);

bool updateGCSafePoint();

bool handleBackgroundTask(bool heavy);

-bool isSegmentValid(const SegmentPtr & segment);
+// isSegmentValid should be protected by lock on `read_write_mutex`
+inline bool isSegmentValid(std::shared_lock<std::shared_mutex> &, const SegmentPtr & segment) { return doIsSegmentValid(segment); }
+inline bool isSegmentValid(std::unique_lock<std::shared_mutex> &, const SegmentPtr & segment) { return doIsSegmentValid(segment); }
+bool doIsSegmentValid(const SegmentPtr & segment);

void restoreStableFiles();
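The new `isSegmentValid` overloads take a `std::shared_lock` or `std::unique_lock` on a `std::shared_mutex` that the caller must already hold, so every call site has to produce a lock object before it can check a segment. A minimal sketch of that pattern follows. `Store`, `Segment`, `addSegment` and the map-based validity check are hypothetical simplifications, and the `assert` that the lock owns `read_write_mutex` is an extra debug check added for illustration, not part of the diff.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>

struct Segment
{
    int id;
};
using SegmentPtr = std::shared_ptr<Segment>;

class Store
{
public:
    std::shared_mutex read_write_mutex;

    // Both overloads demand a lock the caller already holds; the lock is only
    // inspected by the debug assert, its main job is to document the contract.
    bool isSegmentValid(std::shared_lock<std::shared_mutex> & lock, const SegmentPtr & segment)
    {
        assert(lock.owns_lock() && lock.mutex() == &read_write_mutex);
        return doIsSegmentValid(segment);
    }
    bool isSegmentValid(std::unique_lock<std::shared_mutex> & lock, const SegmentPtr & segment)
    {
        assert(lock.owns_lock() && lock.mutex() == &read_write_mutex);
        return doIsSegmentValid(segment);
    }

    void addSegment(const SegmentPtr & segment)
    {
        std::unique_lock lock(read_write_mutex);
        segments[segment->id] = segment;
    }

private:
    // Hypothetical rule: valid means this exact object is still registered under its id.
    bool doIsSegmentValid(const SegmentPtr & segment)
    {
        auto it = segments.find(segment->id);
        return it != segments.end() && it->second == segment;
    }

    std::map<int, SegmentPtr> segments;
};
```

The lock parameter cannot prove at compile time that the right mutex is held, since any lock of the matching type is accepted; the runtime `assert` narrows that gap in debug builds.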

1 change: 1 addition & 0 deletions dbms/src/Storages/DeltaMerge/RowKeyRange.h
@@ -1,4 +1,5 @@
#pragma once
#include <Columns/ColumnString.h>
#include <Core/Types.h>
#include <Functions/FunctionHelpers.h>
#include <IO/WriteHelpers.h>
4 changes: 2 additions & 2 deletions dbms/src/Storages/DeltaMerge/Segment.h
@@ -138,13 +138,13 @@ class Segment : private boost::noncopyable
size_t expected_block_size = DEFAULT_BLOCK_SIZE);

/// Return a stream which is suitable for exporting data.
-/// reorgize_block: put those rows with the same pk rows into the same block or not.
+/// reorganize_block: put those rows with the same pk rows into the same block or not.
BlockInputStreamPtr getInputStreamForDataExport(const DMContext & dm_context,
const ColumnDefines & columns_to_read,
const SegmentSnapshotPtr & segment_snap,
const RowKeyRange & data_range,
size_t expected_block_size = DEFAULT_BLOCK_SIZE,
-bool reorgnize_block = true) const;
+bool reorganize_block = true) const;

BlockInputStreamPtr getInputStreamRaw(const DMContext & dm_context,
const ColumnDefines & columns_to_read,
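The `reorganize_block` flag of `getInputStreamForDataExport` controls whether rows with the same primary key are kept in the same output block. One way to honor that while still cutting blocks near `expected_block_size` is to postpone a cut until the primary key changes, as in this sketch over a pk-sorted vector of handles. `splitIntoBlocks` is a hypothetical helper for illustration, not TiFlash's export stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Split pk-sorted handles into blocks of roughly `expected_block_size` rows.
// When `reorganize_block` is true, a block boundary is never placed between two
// rows sharing the same pk, so all versions of a key end up in one block.
std::vector<std::vector<std::int64_t>> splitIntoBlocks(const std::vector<std::int64_t> & sorted_pks,
                                                       std::size_t expected_block_size,
                                                       bool reorganize_block)
{
    std::vector<std::vector<std::int64_t>> blocks;
    std::vector<std::int64_t> current;
    for (std::int64_t pk : sorted_pks)
    {
        const bool full = !current.empty() && current.size() >= expected_block_size;
        const bool same_pk_as_prev = !current.empty() && current.back() == pk;
        // Cut only when the block is full and, if reorganizing, the pk has changed.
        if (full && !(reorganize_block && same_pk_as_prev))
        {
            blocks.push_back(std::move(current));
            current.clear();
        }
        current.push_back(pk);
    }
    if (!current.empty())
        blocks.push_back(std::move(current));
    return blocks;
}
```

With `expected_block_size = 2`, the keys `1, 1, 2, 2, 2, 3` become `[1, 1] [2, 2, 2] [3]` when `reorganize_block` is true, but `[1, 1] [2, 2] [2, 3]` when it is false.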
13 changes: 10 additions & 3 deletions dbms/src/Storages/DeltaMerge/StableValueSpace.cpp
@@ -333,15 +333,22 @@ SkippableBlockInputStreamPtr StableValueSpace::Snapshot::getInputStream(const DM

RowsAndBytes StableValueSpace::Snapshot::getApproxRowsAndBytes(const DMContext & context, const RowKeyRange & range)
{
-if (valid_rows == 0)
+// Avoid unnessary reading IO
+if (valid_rows == 0 || range.none())
return {0, 0};

size_t match_packs = 0;
size_t total_match_rows = 0;
size_t total_match_bytes = 0;
+// Usually, this method will be called for some "cold" key ranges. Loading the index
+// into cache may pollute the cache and make the hot index cache invalid. Set the
+// index cache to nullptr so that the cache won't be polluted.
+// TODO: We can use the cache if the index happens to exist in the cache, but
+// don't refill the cache if the index does not exist.
for (auto & f : stable->files)
{
-auto filter = DMFilePackFilter::loadFrom(f,
-context.db_context.getGlobalContext().getMinMaxIndexCache(),
+auto filter = DMFilePackFilter::loadFrom(f, //
+nullptr,
context.hash_salt,
range,
RSOperatorPtr{},
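`getApproxRowsAndBytes` now returns early when the range is empty (`range.none()`) and loads pack filters with a `nullptr` index cache, so probing cold key ranges does not evict hot min-max indexes. The sketch below shows the general idea of estimating rows in a key range from per-pack min/max statistics. `PackStat` and `approxRowsAndBytes` are hypothetical, and counting every overlapping pack in full is a simplification chosen for the sketch, not necessarily how `DMFilePackFilter` results are turned into an estimate.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-pack statistics, standing in for a DMFile pack's min-max index.
struct PackStat
{
    std::int64_t min_handle; // smallest handle in the pack (inclusive)
    std::int64_t max_handle; // largest handle in the pack (inclusive)
    std::size_t rows;
    std::size_t bytes;
};

using RowsAndBytes = std::pair<std::size_t, std::size_t>;

// Estimate rows/bytes inside [start, end) by summing every pack whose
// [min, max] interval intersects the range. Partly covered packs count in full.
RowsAndBytes approxRowsAndBytes(const std::vector<PackStat> & packs,
                                std::size_t valid_rows,
                                std::int64_t start,
                                std::int64_t end)
{
    // Mirrors the early return in the diff: no data or an empty range means no IO.
    if (valid_rows == 0 || start >= end)
        return {0, 0};

    std::size_t rows = 0;
    std::size_t bytes = 0;
    for (const auto & p : packs)
    {
        const bool overlaps = p.max_handle >= start && p.min_handle < end;
        if (overlaps)
        {
            rows += p.rows;
            bytes += p.bytes;
        }
    }
    return {rows, bytes};
}
```

Dividing the returned row count by the stable layer's valid rows gives the kind of coverage ratio that the new GC threshold is compared against.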
3 changes: 2 additions & 1 deletion dbms/src/Storages/DeltaMerge/convertColumnTypeHelpers.cpp
@@ -16,7 +16,8 @@ namespace DB
namespace ErrorCodes
{
extern const int BAD_ARGUMENTS;
-}
+extern const int NOT_IMPLEMENTED;
+} // namespace ErrorCodes

namespace DM
{