Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move parts between storage volumes according to configured TTL expressions #7420

Merged
merged 50 commits into from
Dec 11, 2019
Merged

Move parts between storage volumes according to configured TTL expressions #7420

merged 50 commits into from
Dec 11, 2019

Conversation

excitoon
Copy link
Contributor

@excitoon excitoon commented Oct 22, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en.

For changelog. Remove if this is non-significant change.

Category (leave one):

  • New Feature

Short description (up to few sentences):

Move parts between storage volumes according to configured TTL expressions.

Detailed description (optional):

...

@amosbird
Copy link
Collaborator

I hereby not agree

lol

@alexey-milovidov alexey-milovidov changed the title Ttls Move parts between storage volumes according to configured TTL expressions. Oct 22, 2019
@alesapin alesapin self-assigned this Oct 22, 2019
Copy link
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why we have so different logic for common TTL and MoveTTL. Common TTL is just special case of MoveTTL.

class ASTTTLElement : public IAST
{
public:
enum DestinationType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MoveDestinationType was more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it's here? It used not only with TTLs. Let's make it separate enum class,

Copy link
Contributor Author

@excitoon excitoon Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

if (json.has("moves"))
{
JSON moves = json["moves"];
for (auto move : moves)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const &

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def test_ttl_table(start_cluster):

@pytest.mark.parametrize("delete_suffix", [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we haven't any test for TTL moves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes so far.

if (!parser_string_literal.parse(pos, ast_space_name, expected))
return false;

destination_name = ast_space_name->as<ASTLiteral &>().value.get<const String &>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the bad cast of there will be a number. Better tryGet and apparent exception in case of error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I can not get the idea, it looks like we do not do that in way more other places.

dbms/src/Parsers/ExpressionElementParsers.cpp Show resolved Hide resolved
{
if (seen_delete_ttl)
{
throw Exception("Too many DELETE ttls", ErrorCodes::BAD_TTL_EXPRESSION);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear message.

Copy link
Contributor Author

@excitoon excitoon Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

auto new_ttl_entry = create_ttl_entry(ttl_element.children[0]);
if (!only_check)
{
MoveTTLEntry entry = { new_ttl_entry.expression, new_ttl_entry.result_column, ttl_element.destination_type, ttl_element.destination_name };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need assignment?

Copy link
Contributor Author

@excitoon excitoon Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

for (const auto & part : data_parts)
{
String reason;
/// Don't report message to log, because logging is excessive
if (!can_move(part, &reason))
continue;

const auto ttl_entries_end = part->storage.move_ttl_entries_by_name.end();
auto best_ttl_entry_it = ttl_entries_end;
time_t max_max_ttl = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a mistake.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto reservation = volume_ptr->reserve(part->bytes_on_disk);
if (reservation)
{
parts_to_move.emplace_back(part, std::move(reservation));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if can_move == false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated link: https://github.com/ClickHouse/ClickHouse/pull/7420/files/7a171b46f59b7fecd5286161c83a3a3aa199c47b#r337920407
In this case we will go into this code. And part won't be moved by any of policy or table TTL.

if (it != moves_ttl.begin())
writeString(",", out);

writeString("{\"expression\":", out);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we store expression here? It should be stored in table definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This thing is basically an additional protection level, to not make wrong moves. Expressions from here are not being analysed for anything but rather used by keys for min/max values. If someone ALTER TTL expressions, data will not be moved according to stale TTLs.

@excitoon excitoon requested a review from alesapin November 11, 2019 12:19
@CurtizJ CurtizJ self-requested a review November 18, 2019 09:35
Copy link
Member

@CurtizJ CurtizJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok. Have some minor issues. Now waiting for tests.

@@ -75,13 +75,12 @@ void buildScatterSelector(
}

/// Computes ttls and updates ttl infos
void updateTTL(const MergeTreeData::TTLEntry & ttl_entry, MergeTreeDataPart::TTLInfos & ttl_infos, Block & block, const String & column_name)
template <typename TTLEntry>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need template here? Isn't TTLEntry a single class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. In previous version it wasn't :-P

namespace DB
{

enum class TTLDestinationType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename this enum to something like PartDestinationType, because it is used not only for TTL. And in cases when it is used for TTL, structures that contain it already have TTL prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

@excitoon excitoon requested a review from CurtizJ December 1, 2019 15:21
@excitoon excitoon changed the title Move parts between storage volumes according to configured TTL expressions. Move parts between storage volumes according to configured TTL expressions Dec 2, 2019
Copy link
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost done. Need to fix naming, add comments and make small changes in code,

dbms/src/Storages/MergeTree/BackgroundProcessingPool.h Outdated Show resolved Hide resolved
dbms/src/Storages/MergeTree/MergeTreeData.cpp Outdated Show resolved Hide resolved
dbms/src/Storages/MergeTree/MergeTreeData.cpp Outdated Show resolved Hide resolved
dbms/src/Storages/MergeTree/MergeTreeData.cpp Outdated Show resolved Hide resolved
dbms/src/Storages/MergeTree/MergeTreeData.h Outdated Show resolved Hide resolved
dbms/src/Storages/MergeTree/MergeTreeData.h Show resolved Hide resolved
ttl_entry.expression->execute(block);

auto & ttl_info = (column_name.empty() ? ttl_infos.table_ttl : ttl_infos.columns_ttl[column_name]);
remove_column = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we remove columns after execution? Looks like side effect of this function. At least we need to update comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we don't want to spoil block with extra columns. In this case merge would complain about extra columns (different table structure).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we just store expression result in block, than use to update ttl_info, and after that we remove it. OK.

@@ -78,6 +89,9 @@ void ReplicatedMergeTreeTableMetadata::write(WriteBuffer & out) const
if (!ttl_table.empty())
out << "ttl: " << ttl_table << "\n";

if (!ttl_move.empty())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not at the end of the list?

Copy link
Contributor Author

@excitoon excitoon Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to group similar things together (ttl and move_ttl).

return {};
}

bool MergeTreeData::TTLEntry::isPartInDestination(const DiskSpace::StoragePolicyPtr & policy, const MergeTreeDataPart & part) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like it can be a static method of MergeTreeData with three arguments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be the better solution? Idea was to separate scanning of ttl_entry fields, because they contain non-trivial logic of selecting something depending on entry type. I do not want to spread this things to MergeTreeData in order to have locality of data and logic of TTLEntry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

DiskSpace::SpacePtr destination_ptr = ttl_entry->getDestination(storage_policy);
if (!destination_ptr)
{
if (ttl_entry->destination_type == PartDestinationType::VOLUME)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this possible? Disks/Volumes configuration cannot be changed without server restart. So why we don't check it during the creation of TTL expression? And here the exception is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is planned to allow user to change disks/volumes without restart, that shall be one of upcoming pull requests. Typically, we call this method when inserting and merging data. And it shall reserve any space even if space by TTL is unavailable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@excitoon excitoon requested a review from alesapin December 5, 2019 08:05
Copy link
Member

@alesapin alesapin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@alesapin alesapin merged commit b708f86 into ClickHouse:master Dec 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants