Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

receive: Optimized receive; fixed long term overallocation #3334

Closed
wants to merge 5 commits into from

Conversation

bwplotka
Copy link
Member

@bwplotka bwplotka commented Oct 17, 2020

Fixes: #3265

Changes:

  • Added benchmarks
  • Leveraged AddFast and removed debug prints on each sample
  • Fixed over alloc by using normal Labels for remote write. After benchmarks, it's clear it does not give much.
  • Optimized hash function.

Overall before vs after:

✗ benchstat -delta-test=none _dev/diffa.out _dev/difff.out
name                                                                                   old time/op    new time/op    delta
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/OK-12                         21.7µs ± 0%    20.5µs ± 0%    -5.52%
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/conflict_errors-12            35.0µs ± 0%    30.7µs ± 0%   -12.20%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/OK-12                        72.4ms ± 0%    31.9ms ± 0%   -55.94%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/conflict_errors-12            1.09s ± 0%     0.02s ± 0%   -98.61%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/OK-12                           26.5µs ± 0%    38.0µs ± 0%   +43.61%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/conflict_errors-12              38.0µs ± 0%    27.6µs ± 0%   -27.44%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/OK-12                           140ms ± 0%      28ms ± 0%   -80.27%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/conflict_errors-12              1.30s ± 0%     0.01s ± 0%   -98.93%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/OK-12                 18.4ms ± 0%    16.7ms ± 0%    -8.97%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/conflict_errors-12     162ms ± 0%      15ms ± 0%   -90.71%

name                                                                                   old alloc/op   new alloc/op   delta
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/OK-12                         6.70kB ± 0%    6.32kB ± 0%    -5.76%
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/conflict_errors-12            15.7kB ± 0%     9.2kB ± 0%   -41.22%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/OK-12                        17.4MB ± 0%    15.3MB ± 0%   -12.22%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/conflict_errors-12            639MB ± 0%      21MB ± 0%   -96.71%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/OK-12                           10.9kB ± 0%     8.8kB ± 0%   -19.57%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/conflict_errors-12              20.4kB ± 0%    11.7kB ± 0%   -42.73%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/OK-12                           228MB ± 0%      16MB ± 0%   -93.07%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/conflict_errors-12              895MB ± 0%      21MB ± 0%   -97.65%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/OK-12                 32.5MB ± 0%    33.0MB ± 0%    +1.33%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/conflict_errors-12    79.8MB ± 0%    33.0MB ± 0%   -58.66%

name                                                                                   old allocs/op  new allocs/op  delta
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/OK-12                           48.0 ± 0%      66.0 ± 0%   +37.50%
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/conflict_errors-12               126 ± 0%       106 ± 0%   -15.87%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/OK-12                         5.12k ± 0%     5.13k ± 0%    +0.23%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/conflict_errors-12            4.00M ± 0%     0.00M ± 0%  -100.00%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/OK-12                             50.0 ± 0%      67.0 ± 0%   +34.00%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/conflict_errors-12                 127 ± 0%       108 ± 0%   -14.96%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/OK-12                            105k ± 0%        5k ± 0%   -95.13%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/conflict_errors-12              4.10M ± 0%     0.00M ± 0%  -100.00%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/OK-12                   55.0 ± 0%      46.0 ± 0%   -16.36%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/conflict_errors-12       119 ± 0%        87 ± 0%   -26.89%

NOTE that HandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_of_samples/OK-12 and error, were exploding (too slow), so can only compare with Add Fast method:

 HandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/OK-12                    5.10k ± 0%     5.10k ± 0%   +0.02%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/conflict_errors-12        500k ± 0%        0k ± 0%  -99.97%

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
GOROOT=/home/bwplotka/.gvm/gos/go1.15 #gosetup
GOPATH=/home/bwplotka/Repos/thanosgopath #gosetup
/home/bwplotka/.gvm/gos/go1.15/bin/go test -c -o /tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive github.com/thanos-io/thanos/pkg/receive #gosetup
/tmp/___BenchmarkHandlerReceiveHTTP_in_github_com_thanos_io_thanos_pkg_receive -test.v -test.bench ^\QBenchmarkHandlerReceiveHTTP\E$ -test.run ^$
goos: linux
goarch: amd64
pkg: github.com/thanos-io/thanos/pkg/receive
BenchmarkHandlerReceiveHTTP
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_single_sample
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/OK-12      	   66613	     17432 ns/op	    6382 B/op	      47 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/conflict_errors-12         	   65179	     20922 ns/op	    9392 B/op	      90 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/OK
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/OK-12                     	      51	  22250679 ns/op	14286524 B/op	    5118 allocs/op
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/conflict_errors
BenchmarkHandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/conflict_errors-12        	      31	  47359942 ns/op	45323821 B/op	  500126 allocs/op
BenchmarkHandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample
BenchmarkHandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/OK
BenchmarkHandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/OK-12                        	   33147	     30462 ns/op	   10578 B/op	      49 allocs/op
BenchmarkHandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/conflict_errors
BenchmarkHandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/conflict_errors-12           	   53385	     24087 ns/op	   13606 B/op	      92 allocs/op
BenchmarkHandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples
BenchmarkHandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/OK
BenchmarkHandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/OK-12                       	      49	  24593572 ns/op	15436524 B/op	    5141 allocs/op
BenchmarkHandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/conflict_errors
BenchmarkHandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/conflict_errors-12          	      25	  52976766 ns/op	45329167 B/op	  500129 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/OK
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/OK-12              	      60	  27261861 ns/op	32538875 B/op	      54 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/conflict_errors
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/conflict_errors-12 	      78	  15315152 ns/op	32540624 B/op	      97 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/OK
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/OK-12                	      33	  35595450 ns/op	47171972 B/op	    5100 allocs/op
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/conflict_errors
BenchmarkHandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/conflict_errors-12   	      21	  58917325 ns/op	78629564 B/op	  500125 allocs/op
PASS

Process finished with exit code 0


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Tried DeepCopy but not worth it:

```
benchstat -delta-test=none _dev/diffc.out _dev/diffd.out
name                                                                                   old time/op    new time/op    delta
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/OK-12                         18.5µs ± 0%    17.8µs ± 0%   -3.89%
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/conflict_errors-12            22.5µs ± 0%    20.9µs ± 0%   -7.03%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/OK-12                        25.0ms ± 0%    24.3ms ± 0%   -2.62%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/conflict_errors-12           54.2ms ± 0%    52.3ms ± 0%   -3.35%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/OK-12                           33.9µs ± 0%    26.6µs ± 0%  -21.46%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/conflict_errors-12              24.3µs ± 0%    24.6µs ± 0%   +1.43%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/OK-12                          24.4ms ± 0%    24.8ms ± 0%   +1.36%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/conflict_errors-12             52.4ms ± 0%    48.8ms ± 0%   -7.03%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/OK-12                 15.2ms ± 0%    15.0ms ± 0%   -1.37%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/conflict_errors-12    15.9ms ± 0%    15.2ms ± 0%   -4.08%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/OK-12                   38.7ms ± 0%    39.6ms ± 0%   +2.28%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/conflict_errors-12      67.7ms ± 0%    64.8ms ± 0%   -4.34%

name                                                                                   old alloc/op   new alloc/op   delta
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/OK-12                         7.35kB ± 0%    7.66kB ± 0%   +4.25%
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/conflict_errors-12            10.4kB ± 0%    10.7kB ± 0%   +2.94%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/OK-12                        14.9MB ± 0%    15.3MB ± 0%   +2.56%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/conflict_errors-12           45.3MB ± 0%    45.3MB ± 0%   -0.00%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/OK-12                           11.9kB ± 0%    12.2kB ± 0%   +2.79%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/conflict_errors-12              14.9kB ± 0%    15.2kB ± 0%   +2.18%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/OK-12                          15.9MB ± 0%    17.7MB ± 0%  +11.38%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/conflict_errors-12             45.3MB ± 0%    45.3MB ± 0%   -0.00%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/OK-12                 43.0MB ± 0%    43.0MB ± 0%   -0.00%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/conflict_errors-12    43.0MB ± 0%    43.0MB ± 0%   -0.01%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/OK-12                   61.2MB ± 0%    60.8MB ± 0%   -0.72%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/conflict_errors-12      89.1MB ± 0%    89.1MB ± 0%   -0.00%

name                                                                                   old allocs/op  new allocs/op  delta
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/OK-12                           67.0 ± 0%      68.0 ± 0%   +1.49%
HandlerReceiveHTTP/typical_labels_under_1KB,_single_sample/conflict_errors-12               110 ± 0%       111 ± 0%   +0.91%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/OK-12                         5.14k ± 0%     5.14k ± 0%   +0.02%
HandlerReceiveHTTP/typical_labels_under_1KB,_2MB_of_samples/conflict_errors-12             500k ± 0%      500k ± 0%   -0.00%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/OK-12                             69.0 ± 0%      70.0 ± 0%   +1.45%
HandlerReceiveHTTP/bigger_labels_over_1KB,_single_sample/conflict_errors-12                 112 ± 0%       113 ± 0%   +0.89%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/OK-12                           5.17k ± 0%     5.18k ± 0%   +0.17%
HandlerReceiveHTTP/bigger_labels_over_1KB,_2MB_of_samples/conflict_errors-12               500k ± 0%      500k ± 0%   +0.00%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/OK-12                   57.0 ± 0%      57.0 ± 0%    0.00%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_single_sample/conflict_errors-12       100 ± 0%       100 ± 0%    0.00%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/OK-12                    5.11k ± 0%     5.11k ± 0%   -0.08%
HandlerReceiveHTTP/extremely_large_label_value_10MB,_2MB_samples/conflict_errors-12        500k ± 0%      500k ± 0%    0.00%
```
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
@bwplotka
Copy link
Member Author

Probably we can find more optimization, so far I can see this (inuse):
image

(allocs):
image

@bwplotka bwplotka requested a review from yeya24 October 17, 2020 19:51
if len(tenant) == 0 {
http.Error(w, "no tenant ID supplied", http.StatusBadRequest)
return
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this if and line 319?

terrors "github.com/prometheus/prometheus/tsdb/errors"
"github.com/thanos-io/thanos/pkg/runutil"
"github.com/thanos-io/thanos/pkg/store/labelpb"
"github.com/thanos-io/thanos/pkg/testutil"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please group order these imports.

"golang.org/x/sync/errgroup"

"github.com/thanos-io/thanos/pkg/runutil"
"github.com/thanos-io/thanos/pkg/store/labelpb"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary to change this.

if r.ContentLength >= 0 {
compressed.Grow(int(r.ContentLength))
}
_, err := io.Copy(compressed, r.Body)
Copy link
Contributor

@yeya24 yeya24 Oct 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious how much does it optimize? It is not very clear compared to a simple one line ioutil.Readall

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will comment (:

@@ -524,7 +534,7 @@ func (h *Handler) fanoutForward(pctx context.Context, tenant string, replicas ma
b.attempt++
dur := h.expBackoff.ForAttempt(b.attempt)
b.nextAllowed = time.Now().Add(dur)
level.Debug(h.logger).Log("msg", "target unavailable backing off", "for", dur)
level.Debug(h.logger).Log(append(logTags, "msg", "msg", "target unavailable backing off", "for", dur)...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
level.Debug(h.logger).Log(append(logTags, "msg", "msg", "target unavailable backing off", "for", dur)...)
level.Debug(h.logger).Log(append(logTags, "msg", "target unavailable backing off", "for", dur)...)

@kakkoyun kakkoyun changed the title received: Optimized receive; fixed long term overallocation receive: Optimized receive; fixed long term overallocation Dec 16, 2020
@stale
Copy link

stale bot commented Feb 14, 2021

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

receive: v0.16.0-rc.0 regression. Memory leak.
3 participants