Compact: Offline deduplication #1276
Conversation
@bwplotka I submitted the PR for the offline deduplication function; please help add reviewers and review the code. Thanks.
Thanks @smalldirector, we definitely need that (: Is there any way you can split this PR into smaller chunks? It will improve the reviewing process majorly!
@bwplotka Good question. I tried to split the PR before, however I didn't figure out a good way. To my understanding, we need to keep the functionality working in the PR before it is merged, however my PR only provides one function
@bwplotka The dedup function supports both raw metrics and downsampled metrics. Should I split it into two separate PRs (one for raw metrics, another for downsampled metrics)?
Force-pushed 87e94eb to 62cc9bf
Awesome! Thanks for this great work, it looks solid. I think this might work from the first glance, but I would love to make this a bit less complex, smaller in LOCs, and less memory-consuming before merging.
TL;DR: Reuse stuff! (:
You are right that it is hard to split this - but I think we can really decrease those code lines majorly by reusing more code.
I proposed various options to reuse code that is already there, so we can have only one way of doing things, e.g.:
- deduplicating (both offline and online) (iterators)
- syncing
- downloading block
- specifying time ranges
- specifying aggr type
It will reduce the number of lines for this PR a lot IMO. Also we probably need to use iterators to stream the whole operation - otherwise it will take too much memory ): I know it is hard work to make this happen, but happy to help with that as well.
We could then split it: making sure Syncer is generic enough to be used by the deduper, and then the rest - we could start with that.
pkg/compact/dedup/replica.go (outdated)
	return NewReplicas(s.labelName, result)
}

func (s *ReplicaSyncer) download(ctx context.Context, id ulid.ULID) (*metadata.Meta, error) {
Not a fan of adding yet another Download method. Same with Syncer. Can we reuse those?
We already have a problem with having multiple Syncers, hence #1335.
pkg/compact/dedup/replica.go (outdated)
	}
}

func (s *ReplicaSyncer) Sync(ctx context.Context) (Replicas, error) {
We already have this for the compactor. Can we reuse it somehow?
Same as above.
pkg/compact/dedup/reader.go (outdated)
	postings index.Postings
}

func NewBlockReader(logger log.Logger, resolution int64, blockDir string) (*BlockReader, error) {
Why do we need this? We already have tsdb.BlockReader.
When trying to get the sorted postings for one block, it needs to call four functions: OpenBlock(), Index(), Postings(), SortedPostings(). This BlockReader just provides a convenient way to access one block's postings and chunk reader.
pkg/compact/dedup/reader.go (outdated)
// samples from non-downsample blocks
RawSample SampleType = iota
// samples from downsample blocks, map to downsample AggrType
CountSample
Can we just reuse AggrType? This will avoid those mappings and simplify code (:
OK, let me figure out a way to handle raw samples with AggrType.
pkg/compact/dedup/reader.go (outdated)
	return chks, nil
}

type SampleReader struct {
Hm. We deduplicate in the Iterators interface currently... Wonder if we can reuse the same algorithm as here: https://github.com/thanos-io/thanos/blob/master/pkg/query/iter.go#L375
Again - to reuse code to reduce the overall complexity of understanding the Thanos codebase (: and to gain benefits from improvements by changing just a single place (:
Actually this SampleReader doesn't do the deduplication logic. It just provides a function to read the samples within the specified time range.
And it uses DedupSeriesIterator to do the dedup logic (thanos/pkg/compact/dedup/merger.go, line 421 in 62cc9bf):
merge := func(st SampleType) []*Sample {
So both online and offline deduplication use the same penalty algorithm.
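To make the shared algorithm concrete, here is a minimal, self-contained sketch of penalty-style deduplication of two replicas of one series. The Sample type, the fixed penalty, and the merge function are illustrative assumptions for this sketch, not Thanos' actual DedupSeriesIterator (which additionally adapts the penalty to the observed scrape interval):

```go
package main

import "fmt"

// Sample is a timestamped value, standing in for Thanos' sample type.
type Sample struct {
	T int64
	V float64
}

// dedupMerge merges two replicas of the same series, both sorted by time.
// It is a simplified sketch of the deduplication idea: take the earliest
// pending sample, and drop any sample that falls within `penalty`
// milliseconds of the last emitted one, treating it as a duplicate scrape
// of the same datapoint from the other replica.
func dedupMerge(a, b []Sample, penalty int64) []Sample {
	out := make([]Sample, 0, len(a))
	last := int64(-1 << 62)
	i, j := 0, 0
	for i < len(a) || j < len(b) {
		var s Sample
		switch {
		case j >= len(b) || (i < len(a) && a[i].T <= b[j].T):
			s, i = a[i], i+1
		default:
			s, j = b[j], j+1
		}
		if s.T-last >= penalty { // far enough from the last emitted point: keep it
			out = append(out, s)
			last = s.T
		}
	}
	return out
}

func main() {
	a := []Sample{{0, 1}, {30, 1}, {60, 1}}
	b := []Sample{{1, 1}, {31, 1}, {61, 1}, {91, 1}} // replica b scrapes 1ms later
	fmt.Println(len(dedupMerge(a, b, 15)))           // prints 4
}
```

Because both paths funnel through one merge routine like this, any improvement to the penalty logic would benefit online and offline dedup at once.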
pkg/compact/dedup/merger.go (outdated)
	"github.com/thanos-io/thanos/pkg/query"
)

type TimeWindow struct {
We already have tsdb.TimeRange{}. Can we reuse it?
Good to know tsdb has TimeRange. Will change to use it.
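As a small illustration of why a dedicated TimeWindow type is redundant once a generic range with an overlap check exists, here is a minimal stand-in. The TimeRange below is a hypothetical sketch with illustrative field names, not the actual tsdb type:

```go
package main

import "fmt"

// TimeRange is a minimal [MinTime, MaxTime) range in milliseconds,
// mirroring the min/max-time pair carried by TSDB block metadata.
// Illustrative stand-in only, not the real tsdb.TimeRange.
type TimeRange struct {
	MinTime, MaxTime int64
}

// Overlaps reports whether two half-open ranges share any instant.
func (r TimeRange) Overlaps(o TimeRange) bool {
	return r.MinTime < o.MaxTime && o.MinTime < r.MaxTime
}

func main() {
	a := TimeRange{0, 7200000}        // 2h block
	b := TimeRange{3600000, 10800000} // overlapping 2h block from another replica
	c := TimeRange{10800000, 18000000}
	fmt.Println(a.Overlaps(b), a.Overlaps(c)) // prints: true false
}
```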
pkg/compact/dedup/merger.go (outdated)
type BlockGroups []*BlockGroup

func NewBlockGroups(replicas Replicas) BlockGroups {
This might be doable in an Iterator as well. I think it would be more understandable if we had just one method/algorithm for this - for both offline and online deduplication. Online deduplication is available here: https://github.com/thanos-io/thanos/blob/master/pkg/query/iter.go#L304
The BlockGroups type provides the function to group all the blocks within the specified time range. It doesn't do any deduplication here.
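To illustrate the grouping-without-deduplication role of BlockGroups, here is a self-contained sketch that buckets blocks from all replicas into overlapping time windows. The types and names are hypothetical simplifications, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// BlockMeta is a pared-down block descriptor: its replica label value and
// time range. Field names are illustrative, not Thanos' actual metadata type.
type BlockMeta struct {
	Replica          string
	MinTime, MaxTime int64
}

// groupByWindow buckets blocks from all replicas into groups of overlapping
// time windows, so each group can later be deduplicated into one output
// block. No deduplication happens at this stage.
func groupByWindow(blocks []BlockMeta) [][]BlockMeta {
	sort.Slice(blocks, func(i, j int) bool { return blocks[i].MinTime < blocks[j].MinTime })
	var groups [][]BlockMeta
	var end int64
	for _, b := range blocks {
		// Start a new group when this block does not overlap the current window.
		if len(groups) == 0 || b.MinTime >= end {
			groups = append(groups, nil)
			end = b.MaxTime
		} else if b.MaxTime > end {
			end = b.MaxTime // extend the current window
		}
		groups[len(groups)-1] = append(groups[len(groups)-1], b)
	}
	return groups
}

func main() {
	blocks := []BlockMeta{
		{"r0", 0, 100}, {"r1", 0, 100}, // same 0-100 window, two replicas
		{"r0", 100, 200}, {"r1", 100, 200},
	}
	fmt.Println(len(groupByWindow(blocks))) // prints 2
}
```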
@bwplotka Thanks for your first-round review. I totally agree that we should leverage more of the existing code. For the time range and aggr type, I made the code changes. For syncing and downloading blocks, as I mentioned above, #1335 will refactor those functions; if I do it after #1335, it will save us a lot of effort. The offline deduplication logic already follows the same dedup algorithm as online deduplication, so no code changes are needed there.
Cool @smalldirector, so is it ready for a second review iteration? (:
Force-pushed 0d51ad0 to 78c280d
We need to get back to this finally (: cc @GiedriusS @povilasv @brancz Sorry for the delay.
Force-pushed 395dcce to 0031d8e
Rebased the code for the upcoming review.
cmd/thanos/compact.go (outdated)
f := func() error {
	if isEnableDedup(enableDedup, dedupReplicaLabel) {
Suggested change:
-	if isEnableDedup(enableDedup, dedupReplicaLabel) {
+	if enableDedup && len(dedupReplicaLabel) > 0 {
cmd/thanos/compact.go (outdated)
@@ -463,3 +479,7 @@ func generateIndexCacheFile(
}
return nil
}

func isEnableDedup(enableDedup bool, dedupReplicaLabel string) bool {
Consider getting rid of this one-line function; IMO there is no point in abstracting a single line of code :)
Ok, I can remove this function.
@@ -665,7 +665,7 @@ type RetryError struct {
	err error
}

-func retry(err error) error {
+func Retry(err error) error {
Does this need to be an exported method? If yes, add a comment, but I would keep it internal if possible.
As this function is used from another package, I need to make it public. I will add a comment for this method.
pkg/compact/dedup/dedup.go (outdated)
	"github.com/thanos-io/thanos/pkg/objstore"
)

type BucketDeduper struct {
We are missing comments on exported structs / methods / functions.
I will review all the code and see whether I can make some exported structs/methods/functions internal. If not, I will add comments for them. Thanks.
Hi @smalldirector,
At first, thank you for developing this feature. I tested your feature in my test environment and I will keep it running. If I encounter issues, I'll let you know.

Used code
I've done a merge with Thanos v0.8.1.

Before de-duplication
I used

During de-duplication
For both Prometheus setups I noticed a lot of warn messages of the kind

De-duplication of resolution 0 (raw files) and resolution 300000 was done without warning or error messages.

After de-duplication
After de-duplication succeeded, the downsampling algorithm started to create new blocks for 300000 (a third block for 300000). I have no idea why that was happening; maybe you or another developer knows more. I interrupted this process, deleted all blocks of resolution 300000 and 3600000, and then restarted the compactor with downsampling again. After downsampling overnight, everything looks nearly perfect.

Proposal to change from user perspective
This would be just an awesome thing to have. Looking forward to it!
Force-pushed 0031d8e to cfb819c
@Reamer The logic for throwing the warning log
Signed-off-by: shuaizhang <shuaizhang@ebay.com>
Signed-off-by: Alan Zhang <shuai.xyz@gmail.com>
Force-pushed ae51c4d to f37dac1
@Reamer Merged the latest code and fixed the issues.
Thanks for maintaining it!
I think it's a lot of code to maintain and support, but there is a way to simplify this by reusing more from Prometheus.
Yes, we need this grouping code, which you did, which is amazing, but for deduping we might need to just:
- Abstract chunks.MergeOverlappingChunks(chks) in Prometheus and allow injecting custom func(chks []Meta) ([]Meta, error) mergers. This requires a change in upstream which I think Prometheus maintainers will accept (:
- Implement a merger based on the deduplication logic we have here.
What do you think? (: Would love to have this in Thanos finally ❤️
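The proposed injection point could look roughly like the sketch below: a pluggable merger function type that a caller hands to the overlapping-chunk merge path. All names here (ChunkMeta, Merger, mergeOverlapping) are hypothetical stand-ins for the suggested Prometheus change, not existing APIs:

```go
package main

import "fmt"

// ChunkMeta stands in for the tsdb chunk metadata type; illustrative only.
type ChunkMeta struct {
	MinTime, MaxTime int64
}

// Merger is the proposed injection point: a pluggable function deciding how
// overlapping chunks are merged. Thanos could register a penalty-based
// deduplicating merger here.
type Merger func(chks []ChunkMeta) ([]ChunkMeta, error)

// mergeOverlapping sketches how a compaction path could delegate to an
// injected Merger instead of hard-coding one merge strategy.
func mergeOverlapping(chks []ChunkMeta, merge Merger) ([]ChunkMeta, error) {
	if len(chks) < 2 {
		return chks, nil // nothing to merge
	}
	return merge(chks)
}

func main() {
	// A trivial injected merger that collapses everything into one covering chunk.
	collapse := func(chks []ChunkMeta) ([]ChunkMeta, error) {
		out := chks[0]
		for _, c := range chks[1:] {
			if c.MinTime < out.MinTime {
				out.MinTime = c.MinTime
			}
			if c.MaxTime > out.MaxTime {
				out.MaxTime = c.MaxTime
			}
		}
		return []ChunkMeta{out}, nil
	}
	merged, _ := mergeOverlapping([]ChunkMeta{{0, 50}, {25, 100}}, collapse)
	fmt.Println(len(merged), merged[0].MinTime, merged[0].MaxTime) // prints: 1 0 100
}
```

The design point is that the merge strategy becomes data, so a deduplicating merger can live in Thanos while the merge plumbing stays in Prometheus.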
Thanks for the suggestion.
This should be fine as we hide downsampling chunks under the common Prometheus chunk interface (:
Can anyone help make this change in Prometheus upstream?
Happy New Year!
In what sense help? I can help, but it's as easy as proposing a change to inject an interface/function for
Hi, I get a merge conflict with v0.10.1. @smalldirector, can you please rebase your code onto the current master? Thanks for your work. Reamer
@Reamer @smalldirector We looked into the details and we believe we can do 90% of this PR by just reusing existing components (with small upstream changes). Cc @metalmatze. We will spend some time next week on this to show what we mean (: No work will be wasted, as the grouping etc. can be reused from the code here!
@bwplotka Curious: would you just add to this PR and move the feature forward, or would this PR be merged in and then modified? We would like to see this feature in place, as we see great value in it. @smalldirector thank you for continuing to work on this.
The feature is literally 100 lines of code if we integrate it all with the TSDB code and reuse more, so I would say another PR. Are you ok with this @smalldirector? Looks like Shuai has been inactive recently as well. Thank you for your hard work on this; I think we learned a lot from this implementation. Will close for now. cc @metalmatze, who is working on this this week.
Waiting for this feature. @bwplotka, is there another PR already?
@metalmatze is working on the first iteration, and I am preparing the interface upstream, maybe as part of this: prometheus/prometheus#5882
@bwplotka @metalmatze is there an update on this one?
Yes, we are close to enabling vertical compaction (which is a bit different from offline): #2250. Then for offline, it indeed requires prometheus/prometheus#6990, which is cloooooooose to being done (:
Changes
As discussed in #1014, this implements a new feature in the compactor which provides an offline deduplication function for data from different replicas.
Offline deduplication follows the same design as query deduplication. The user needs to specify the replica label via the config dedup.replica-label before enabling it, and the function uses the current query deduplication algorithm (the penalty algorithm) to merge data points coming from different replicas. Offline deduplication works at the bucket level in remote storage, so the user needs to ensure that all replica data is written to the same bucket.
Verification
The figures below compare one sample metric before dedup and after dedup. It defines one internal replica label _agg_replica_ to represent the merged data. In the figure below, each block size is around 3GB and we have two replicas of each block. As it uses a streaming read/write approach to operate on blocks, no OOM exceptions occur.