Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore common label-values in ProxyResponseHeap sort function #6299

Merged

Conversation

wallee94
Copy link
Contributor

@wallee94 wallee94 commented Apr 19, 2023

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Change ProxyResponseHeap.Less function to compare only the label-values that are not in both series.
Some components are sorting the response before adding external labels, returning sets that might not be sorted after that. Because common label-values don't make a difference when comparing two label sets, ignoring these common external labels can help sort the response in the heap again.

I described the bug and how to replicate in #6257 (comment)

Verification

I added a few tests for ProxyResponseHeap that I used to replicate the bug and test the changes.

@wallee94 wallee94 force-pushed the change-proxy-response-heap-less-func branch from 258fdcd to 422865b Compare April 20, 2023 00:08
Signed-off-by: Walther Lee <walthere.lee@gmail.com>
@wallee94 wallee94 force-pushed the change-proxy-response-heap-less-func branch from 422865b to d75fbe5 Compare April 20, 2023 00:18
Copy link
Contributor

@fpetkovski fpetkovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a stab at this!

I think a better way would be to exclude external labels from the comparison function since they are the ones that affect the sorting order. Because they are know in advance, we can construct a set of labels once per series request instead of once per comparison. The latter will likely be far more expensive.

@GiedriusS any thoughts on this solution?

pkg/store/proxy_heap_test.go Outdated Show resolved Hide resolved
@douglascamata
Copy link
Contributor

Is this and #6296 achieving the same outcome, but through different means? 🤔

@fpetkovski
Copy link
Contributor

I think we need both for this issue

Signed-off-by: Walther Lee <walthere.lee@gmail.com>
@wallee94
Copy link
Contributor Author

Ah I didn't realize the first time that st.LabelSets() were the external labels in the store. I added a field to store them in advance.
Also, did I break the end-to-end tests? 🤔

@@ -22,6 +22,132 @@ func TestRmLabelsCornerCases(t *testing.T) {
}), labels.Labels{})
}

type testingRespSet struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse the mocks and stubs from the testutil package? You can see an example usage here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find an implementation of respSet in testutil, but I changed it to use one from proxy_head.go.

pkg/store/proxy_heap.go Outdated Show resolved Hide resolved
Signed-off-by: Walther Lee <walthere.lee@gmail.com>
Signed-off-by: Walther Lee <walthere.lee@gmail.com>
Signed-off-by: Walther Lee <walthere.lee@gmail.com>
@wallee94 wallee94 requested a review from fpetkovski April 26, 2023 15:20
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
@fpetkovski
Copy link
Contributor

fpetkovski commented Apr 27, 2023

Thanks for the changes 👍

I ran the proxy benchmark locally using go test -bench BenchmarkProxySeries ./pkg/store/... -run none -count 5 -benchmem and noticed that this implementation introduces a significant memory and CPU regression. I would suggest the following changes to reduce the impact a bit: wallee94#1. Here are the results from the main branch, this branch and the proposed changes:

thanos $ benchstat main.out branch.out branch-optimized.out                                           
name \ time/op                                                                          main.out     branch.out    branch-optimized.out
ProxySeries/1000000SeriesWith1Samples/4_client_with_1_samples,_250000_series_each-8      126ms ± 3%    250ms ±10%            159ms ± 2%
ProxySeries/1000000SeriesWith1Samples/single_client_with_1_samples,_1000000_series-8     177ms ± 1%    179ms ± 3%            168ms ± 0%
ProxySeries/100000SeriesWith100Samples/4_client_with_25_samples,_25000_series_each-8    13.5ms ± 4%   25.3ms ± 2%           16.5ms ± 2%
ProxySeries/100000SeriesWith100Samples/single_client_with_100_samples,_100000_series-8  16.9ms ± 0%   17.6ms ± 3%           17.9ms ± 1%
ProxySeries/1SeriesWith10000000Samples/4_client_with_2500000_samples,_1_series_each-8   16.4µs ± 2%   16.8µs ± 2%           16.4µs ± 1%
ProxySeries/1SeriesWith10000000Samples/single_client_with_10000000_samples,_1_series-8  7.53µs ± 1%   7.82µs ± 1%           7.74µs ± 2%

name \ alloc/op                                                                         main.out     branch.out    branch-optimized.out
ProxySeries/1000000SeriesWith1Samples/4_client_with_1_samples,_250000_series_each-8     83.9MB ± 0%  243.9MB ± 0%           83.9MB ± 0%
ProxySeries/1000000SeriesWith1Samples/single_client_with_1_samples,_1000000_series-8    83.4MB ± 0%   83.4MB ± 0%           83.4MB ± 0%
ProxySeries/100000SeriesWith100Samples/4_client_with_25_samples,_25000_series_each-8    7.64MB ± 0%  23.64MB ± 0%           7.64MB ± 0%
ProxySeries/100000SeriesWith100Samples/single_client_with_100_samples,_100000_series-8  8.21MB ± 0%   8.21MB ± 0%           8.21MB ± 0%
ProxySeries/1SeriesWith10000000Samples/4_client_with_2500000_samples,_1_series_each-8   7.85kB ± 0%   8.81kB ± 0%           8.23kB ± 0%
ProxySeries/1SeriesWith10000000Samples/single_client_with_10000000_samples,_1_series-8  3.52kB ± 0%   3.71kB ± 0%           3.77kB ± 0%

name \ allocs/op                                                                        main.out     branch.out    branch-optimized.out
ProxySeries/1000000SeriesWith1Samples/4_client_with_1_samples,_250000_series_each-8        302 ± 0%  2500311 ± 0%              308 ± 0%
ProxySeries/1000000SeriesWith1Samples/single_client_with_1_samples,_1000000_series-8       152 ± 0%      154 ± 0%              154 ± 0%
ProxySeries/100000SeriesWith100Samples/4_client_with_25_samples,_25000_series_each-8       251 ± 0%   250259 ± 0%              257 ± 0%
ProxySeries/100000SeriesWith100Samples/single_client_with_100_samples,_100000_series-8     132 ± 0%      134 ± 0%              134 ± 0%
ProxySeries/1SeriesWith10000000Samples/4_client_with_2500000_samples,_1_series_each-8      130 ± 0%      146 ± 0%              136 ± 0%
ProxySeries/1SeriesWith10000000Samples/single_client_with_10000000_samples,_1_series-8    78.0 ± 0%     80.0 ± 0%             80.0 ± 0%

If we want to completely the impact of ignoring external labels, we could probably add them in a separate field when sending data from stores, but for now we should be okay with a small regression.

fpetkovski and others added 4 commits April 27, 2023 15:47
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
…-func

Reuse buffers for label comparison
Signed-off-by: Walther Lee <walthere.lee@gmail.com>
@wallee94
Copy link
Contributor Author

Oh wow, I didn't expect so much memory regression. The new changes look great to me, I've merged them and the latest changes in main. Thanks again @fpetkovski !

Signed-off-by: Walther Lee <walthere.lee@gmail.com>
@fpetkovski
Copy link
Contributor

@wallee94 have you checked whether this change solves your problem?

@wallee94
Copy link
Contributor Author

wallee94 commented May 4, 2023

@fpetkovski Yes, I deployed the branch today to our dev cluster. We were seeing duplicated series from the ruler and sidecars, and this change fixes the issue

@fpetkovski fpetkovski merged commit 8d0eb07 into thanos-io:main May 4, 2023
fpetkovski added a commit to fpetkovski/thanos that referenced this pull request Jun 16, 2023
In thanos-io#6299 we changed the comparison
algorithm in the store proxy to ignore store-specific external labels.
This causes problems during long rollouts because old stores and new stores
will sort data differently, leading to incorrectly deduplicated results.

This commit adds a new flag to the store info message to indicate
whether the store sorts data with or without external labels.
The querier uses this flag to decide whether to resort the store response set
before starting a merge.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
fpetkovski added a commit to fpetkovski/thanos that referenced this pull request Jun 16, 2023
In thanos-io#6299 we changed the comparison
algorithm in the store proxy to ignore store-specific external labels.
This causes problems during long rollouts because old stores and new stores
will sort data differently, leading to incorrectly deduplicated results.

This commit adds a new flag to the store info message to indicate
whether the store sorts data with or without external labels.
The querier uses this flag to decide whether to resort the store response set
before starting a merge.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants