
Introduce an internal option to enable vertical compaction #1917

Conversation

pracucci
Contributor

In Cortex, we're currently working on an experimental TSDB-storage layer based on Thanos 🎉❤️. The next step would be introducing the Thanos compactor in Cortex too, but to do that we would need a way to allow vertical compaction (due to how replication works, Cortex produces overlapping blocks).

According to our experiments, enabling vertical compaction is just a matter of disabling the checks Thanos does to make sure blocks don't overlap. In this PR I'm suggesting to introduce an internal option (not exposed to Thanos users) to allow Cortex to enable it.

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Introduce an internal option to enable vertical compaction

Verification

Manual tests.

@brancz
Member

brancz commented Dec 20, 2019

This is just as useful for people running the Thanos receive component with replication factor >1, so I think it would be useful to have this regardless. The larger question I have is how this differs from #1276 (I realize #1276 is more, but it is capable of the same, no?).

@pracucci
Contributor Author

The larger question I would have is how is this different from #1276 (I realize #1276 is more, but it is capable of the same, no?)?

Thanks @brancz for your quick feedback. My take on #1276 :

  1. Looks exciting, but I suspect it may take some time to get it production-grade
  2. On the Cortex side we don't really need offline deduplication (we have series replicated with the exact same timestamp-value pairs), so I think we would feel more comfortable staying with TSDB vertical compaction

What's your take?

@brancz
Member

brancz commented Dec 20, 2019

I think that's generally fair. If we go ahead with something here, though, I think it at least needs to work with duplicate blocks due to thanos receive replication, which would just mean that a labelset (usually a single label, but in theory it can be multiple, for example when replicating to multiple regions) is removed from the candidate blocks.

Member

@bwplotka bwplotka left a comment


LGTM, but do you mind adding a log/metric for when vertical compaction is actually triggered/enabled?

The difference between vertical compaction and offline deduplication has been well described many times already, but once again:

  • Strictly same timestamps vs realistically different scrape times.
  • External-label aware deduplication.

It looks to me that offline deduplication should indeed fix all cases, and this PR is a temporary solution working only for Cortex ingesters. For data replicated by the remote-write receiver we still ensure different external labels, so this does not help us out of the box.

Since the change is not very disruptive, LGTM, but I think we need to take a closer look at #1276 🙈 @smalldirector who is doing amazing work on maintaining the branch with the offline dedup.

@smalldirector by any chance do you have any feedback on running your code for a longer time? My assumption was that you already use it in production, or not? (: Also commented on that PR about the way we can move forward.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci force-pushed the add-flag-to-compactor-to-allow-overlapping-blocks branch from c3770d9 to 7c15a0f Compare December 23, 2019 13:09
@pracucci
Contributor Author

I think it at least needs to work with duplicate blocks due to thanos receive replication, which would just mean that a labelset

@brancz I see your point. What do you think about keeping the two things separate, with me working on it in a follow-up PR? I would like to gather some specs first, while we can move on with this PR to unblock the compactor on the Cortex side.

do you mind adding log/metric for when is the actual vertical compaction is triggered/enabled?

@bwplotka Makes sense. An example of the compaction-related metrics now exposed is:

prometheus_tsdb_compactions_total 30
prometheus_tsdb_vertical_compactions_total 18

thanos_compact_group_compactions_total{group="0@11146230207840493432"} 1
thanos_compact_group_compactions_total{group="0@5679675083797525161"} 29
thanos_compact_group_vertical_compactions_total{group="0@11146230207840493432"} 1
thanos_compact_group_vertical_compactions_total{group="0@5679675083797525161"} 17

Regarding the log, the TSDB compactor already emits one (see below), so I've just added an extra key-value pair to the "compacted blocks" debug log, but I'm open to suggestions.

level=warn ts=2019-12-23T13:04:49.673716Z caller=compact.go:678 msg="found overlapping blocks during compaction" ulid=[REDACTED]

What's your take?

@pracucci
Contributor Author

The failing tests look unrelated. Please advise in case I've misread them.

Member

@bwplotka bwplotka left a comment


LGTM 👍

@bwplotka
Member

Rerunning the workflow for you.

Thanks for the metrics, this should do. The mentioned log entry is fine as well. (:

@bwplotka bwplotka merged commit 39f8623 into thanos-io:master Dec 23, 2019
@pracucci pracucci deleted the add-flag-to-compactor-to-allow-overlapping-blocks branch December 24, 2019 13:03