-
Notifications
You must be signed in to change notification settings - Fork 7.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move parts between storage volumes according to configured TTL expressions #7420
Conversation
lol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand why we have so different logic for common TTL and MoveTTL. Common TTL is just special case of MoveTTL.
dbms/src/Parsers/ASTTTLElement.h
Outdated
class ASTTTLElement : public IAST | ||
{ | ||
public: | ||
enum DestinationType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MoveDestinationType
was more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it's here? It used not only with TTLs. Let's make it separate enum class
,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
if (json.has("moves")) | ||
{ | ||
JSON moves = json["moves"]; | ||
for (auto move : moves) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const &
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_ttl_table(start_cluster): | ||
|
||
@pytest.mark.parametrize("delete_suffix", [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we haven't any test for TTL moves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes so far.
if (!parser_string_literal.parse(pos, ast_space_name, expected)) | ||
return false; | ||
|
||
destination_name = ast_space_name->as<ASTLiteral &>().value.get<const String &>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the bad cast of there will be a number. Better tryGet
and apparent exception in case of error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I can not get the idea, it looks like we do not do that in way more other places.
{ | ||
if (seen_delete_ttl) | ||
{ | ||
throw Exception("Too many DELETE ttls", ErrorCodes::BAD_TTL_EXPRESSION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
auto new_ttl_entry = create_ttl_entry(ttl_element.children[0]); | ||
if (!only_check) | ||
{ | ||
MoveTTLEntry entry = { new_ttl_entry.expression, new_ttl_entry.result_column, ttl_element.destination_type, ttl_element.destination_name }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need assignment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
for (const auto & part : data_parts) | ||
{ | ||
String reason; | ||
/// Don't report message to log, because logging is excessive | ||
if (!can_move(part, &reason)) | ||
continue; | ||
|
||
const auto ttl_entries_end = part->storage.move_ttl_entries_by_name.end(); | ||
auto best_ttl_entry_it = ttl_entries_end; | ||
time_t max_max_ttl = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
???
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto reservation = volume_ptr->reserve(part->bytes_on_disk); | ||
if (reservation) | ||
{ | ||
parts_to_move.emplace_back(part, std::move(reservation)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if can_move == false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated link: https://github.com/ClickHouse/ClickHouse/pull/7420/files/7a171b46f59b7fecd5286161c83a3a3aa199c47b#r337920407
In this case we will go into this code. And part won't be moved by any of policy or table TTL.
if (it != moves_ttl.begin()) | ||
writeString(",", out); | ||
|
||
writeString("{\"expression\":", out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we store expression here? It should be stored in table definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This thing is basically an additional protection level, to not make wrong moves. Expressions from here are not being analysed for anything but rather used by keys for min/max values. If someone ALTER TTL expressions, data will not be moved according to stale TTLs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok. Have some minor issues. Now waiting for tests.
@@ -75,13 +75,12 @@ void buildScatterSelector( | |||
} | |||
|
|||
/// Computes ttls and updates ttl infos | |||
void updateTTL(const MergeTreeData::TTLEntry & ttl_entry, MergeTreeDataPart::TTLInfos & ttl_infos, Block & block, const String & column_name) | |||
template <typename TTLEntry> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need template here? Isn't TTLEntry a single class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. In previous version it wasn't :-P
namespace DB | ||
{ | ||
|
||
enum class TTLDestinationType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename this enum to something like PartDestinationType
, because it is used not only for TTL. And in cases when it is used for TTL, structures that contain it already have TTL
prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
…disk selection in `ReplicatedMergeTree`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost done. Need to fix naming, add comments and make small changes in code,
ttl_entry.expression->execute(block); | ||
|
||
auto & ttl_info = (column_name.empty() ? ttl_infos.table_ttl : ttl_infos.columns_ttl[column_name]); | ||
remove_column = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we remove columns after execution? Looks like side effect of this function. At least we need to update comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we don't want to spoil block with extra columns. In this case merge would complain about extra columns (different table structure).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we just store expression result in block, than use to update ttl_info, and after that we remove it. OK.
@@ -78,6 +89,9 @@ void ReplicatedMergeTreeTableMetadata::write(WriteBuffer & out) const | |||
if (!ttl_table.empty()) | |||
out << "ttl: " << ttl_table << "\n"; | |||
|
|||
if (!ttl_move.empty()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not at the end of the list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to group similar things together (ttl
and move_ttl
).
return {}; | ||
} | ||
|
||
bool MergeTreeData::TTLEntry::isPartInDestination(const DiskSpace::StoragePolicyPtr & policy, const MergeTreeDataPart & part) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it can be a static method of MergeTreeData with three arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it be the better solution? Idea was to separate scanning of ttl_entry
fields, because they contain non-trivial logic of selecting something depending on entry type. I do not want to spread this things to MergeTreeData
in order to have locality of data and logic of TTLEntry
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
DiskSpace::SpacePtr destination_ptr = ttl_entry->getDestination(storage_policy); | ||
if (!destination_ptr) | ||
{ | ||
if (ttl_entry->destination_type == PartDestinationType::VOLUME) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this possible? Disks/Volumes configuration cannot be changed without server restart. So why we don't check it during the creation of TTL expression? And here the exception is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is planned to allow user to change disks/volumes without restart, that shall be one of upcoming pull requests. Typically, we call this method when inserting and merging data. And it shall reserve any space even if space by TTL is unavailable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en.
For changelog. Remove if this is non-significant change.
Category (leave one):
Short description (up to few sentences):
Move parts between storage volumes according to configured TTL expressions.
Detailed description (optional):
...