
Trace ID aware load-balancing exporter - 2/4 #1349

Conversation

@jpkrohling (Member) commented on Oct 22, 2020:

Description:

  • Added the consistent hash ring to the load balancing exporter.

Link to tracking Issue: Partially solves open-telemetry/opentelemetry-collector#1724, next step after #1348

Testing: unit tests

Documentation: godoc

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

@jpkrohling requested a review from a team on October 22, 2020 13:17
@jpkrohling (Member, Author) commented:

@owais This PR is only for the second commit, the first commit is being handled in another PR.

@jpkrohling (Member, Author) commented:

PR rebased, this is ready for review.

@jpkrohling (Member, Author) commented:

@owais, @bogdandrutu could you please review?

@codecov (bot) commented on Oct 28, 2020:

Codecov Report

Merging #1349 into master will increase coverage by 0.05%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1349      +/-   ##
==========================================
+ Coverage   88.79%   88.84%   +0.05%     
==========================================
  Files         341      342       +1     
  Lines       16679    16739      +60     
==========================================
+ Hits        14810    14872      +62     
+ Misses       1403     1402       -1     
+ Partials      466      465       -1     
Flag Coverage Δ
#integration 70.80% <ø> (ø)
#unit 88.12% <100.00%> (+0.05%) ⬆️


Impacted Files Coverage Δ
...porter/loadbalancingexporter/consistent_hashing.go 100.00% <100.00%> (ø)
receiver/k8sclusterreceiver/watcher.go 97.64% <0.00%> (+2.35%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jpkrohling (Member, Author) commented:

ping @owais, @bogdandrutu

@bogdandrutu (Member) left a comment:


It feels to me that this is too complicated for what you're trying to achieve.

I would first do this with a simple slice/array (a sketch follows after this list):

  • Make the maximum number of buckets/entries a power of 2 to simplify the search.
    • hash(id) & (maxBuckets - 1) gives you the index of the bucket/entry in the list.
  • You have N endpoints, each with a weight:
    • You can simply allocate contiguous blocks.
    • total_weight = sum of weight[i] over all endpoints.
    • Each endpoint gets max_buckets / total_weight * weight[i] buckets.
    • Use an interval tree (which can be implemented as a slice) to determine the interval that the trace ID's position falls into.
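
A minimal Go sketch of the approach described above. The names (maxBuckets, endpoint, bucketRing) and the CRC-32 hash are assumptions chosen for the example; this is not code from this PR:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

const maxBuckets = 1 << 10 // power of 2, so hash & (maxBuckets-1) selects a bucket

type endpoint struct {
	addr   string
	weight int
}

// bucketRing stores, for each endpoint, the exclusive upper bucket boundary of
// its contiguous block; boundaries are ascending, so a lookup is a binary search.
type bucketRing struct {
	boundaries []int
	endpoints  []endpoint
}

func newBucketRing(endpoints []endpoint) bucketRing {
	total := 0
	for _, e := range endpoints {
		total += e.weight
	}
	boundaries := make([]int, len(endpoints))
	acc := 0
	for i, e := range endpoints {
		acc += maxBuckets * e.weight / total // block size proportional to weight
		boundaries[i] = acc
	}
	boundaries[len(boundaries)-1] = maxBuckets // absorb integer-division leftovers
	return bucketRing{boundaries: boundaries, endpoints: endpoints}
}

// endpointFor hashes the trace ID, masks it into a bucket, and finds the
// contiguous block (interval) that contains that bucket.
func (r bucketRing) endpointFor(traceID []byte) string {
	bucket := int(crc32.ChecksumIEEE(traceID)) & (maxBuckets - 1)
	i := sort.SearchInts(r.boundaries, bucket+1) // first boundary greater than bucket
	return r.endpoints[i].addr
}

func main() {
	ring := newBucketRing([]endpoint{{"backend-1", 1}, {"backend-2", 3}})
	fmt.Println(ring.endpointFor([]byte("example-trace-id")))
}
```

As the follow-up comment below points out, the trade-off is that re-deriving the blocks when the endpoint list changes shifts most bucket-to-endpoint assignments.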

@jpkrohling (Member, Author) commented on Oct 29, 2020:

> hash(id) & (maxBuckets - 1) gives you the id of the bucket/entry in the list

If I'm understanding your proposal correctly, an even better solution would then be jumphash: it would be easier on memory and is O(1), from what I remember. I discussed this in the original proposal with @joe-elliott, and the problem with both approaches is that a change in the list of backends would result in the entire spectrum being shuffled, which is what we try to avoid with this implementation following Karger et al. The strongest reason to use Karger et al. is that only 1/n of the spectrum is changed, where n is the number of backends. Given that this is part of the scalable tail-based sampling solution, we should really, really avoid impacting all the trace IDs.
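
For context, a minimal Go sketch of a Karger-style hash ring with virtual nodes, illustrating why only the slice of the space owned by a removed backend gets remapped. The names (hashRing, virtualNodes, the FNV-1a hash, the backend labels) are illustrative assumptions and not the consistent_hashing.go implementation added in this PR:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

const virtualNodes = 100 // replicas per backend to spread load around the ring

type ringEntry struct {
	pos     uint32
	backend string
}

type hashRing []ringEntry

func position(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func newHashRing(backends []string) hashRing {
	var ring hashRing
	for _, b := range backends {
		for i := 0; i < virtualNodes; i++ {
			ring = append(ring, ringEntry{position(fmt.Sprintf("%s-%d", b, i)), b})
		}
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].pos < ring[j].pos })
	return ring
}

// backendFor walks clockwise from the trace ID's position to the next virtual node.
func (r hashRing) backendFor(traceID string) string {
	p := position(traceID)
	i := sort.Search(len(r), func(i int) bool { return r[i].pos >= p })
	if i == len(r) { // wrap around the ring
		i = 0
	}
	return r[i].backend
}

func main() {
	before := newHashRing([]string{"a", "b", "c", "d"})
	after := newHashRing([]string{"a", "b", "c"}) // backend "d" removed

	moved := 0
	for i := 0; i < 10000; i++ {
		id := fmt.Sprintf("trace-%d", i)
		if before.backendFor(id) != after.backendFor(id) {
			moved++
		}
	}
	// Only the IDs previously owned by "d" (roughly 1/4 of the space) should move.
	fmt.Printf("%d of 10000 trace IDs changed backend\n", moved)
}
```

Running the sketch shows roughly a quarter of the sampled trace IDs changing backend when one of four backends is removed, matching the 1/n behavior argued for above.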

@jpkrohling force-pushed the jpkrohling/1724-loadbalancing-processor-2 branch from 4a60c67 to a1a9376 on October 29, 2020 19:35
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling force-pushed the jpkrohling/1724-loadbalancing-processor-2 branch from a1a9376 to e201142 on October 29, 2020 19:40
@bogdandrutu (Member) commented:

@jpkrohling this is a very interesting point. I would like to understand what behavior we want when the backend list is updated. Documenting that somewhere in the readme would be a good starting point for working out the best solution.

@jpkrohling (Member, Author) commented:

It is already part of the readme:

> This load balancer is especially useful for backends configured with tail-based samplers, which make a decision based on the view of the full trace.
>
> When a list of backends is updated, around 1/n of the space will be changed, so that the same trace ID might be directed to a different backend, where n is the number of backends. This should be stable enough for most cases, and the higher the number of backends, the less disruption it should cause. Still, if routing stability is important for your use case and your list of backends is constantly changing, consider using the groupbytrace processor. This way, traces are dispatched atomically to this exporter, and the same decision about the backend is made for the trace as a whole.

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/master/exporter/loadbalancingexporter/README.md

@jpkrohling (Member, Author) commented on Oct 30, 2020:

@bogdandrutu are there any further concerns? I would suggest going with this implementation, as it's what people have read in the proposal already and are expecting from this feature. If we see that this isn't suitable for a general use case, we can change just this part afterwards.

For instance, if we think that in most scenarios the list of backends is going to be static, we can add a jumphash implementation of the hash ring, or a simple traceID mod N (see the sketch below). My belief so far is that people are going to run collectors in elastic environments, like Kubernetes with collectors scaled via an HPA, so the number of backends will vary over time.
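
For reference, the jumphash alternative mentioned here, ported to Go from the published pseudocode in Lamping & Veach, "A Fast, Minimal Memory, Consistent Hash Algorithm". The trace-ID-to-key derivation in main is an illustrative assumption:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// jumpHash maps a 64-bit key to a bucket index in [0, numBuckets) using O(1)
// memory. When numBuckets changes, only about 1/n of keys move, but buckets are
// addressed by index, so it fits best when the backend list only grows or
// shrinks at the end rather than changing arbitrarily.
func jumpHash(key uint64, numBuckets int) int {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(1<<31) / float64((key>>33)+1)))
	}
	return int(b)
}

func main() {
	// Illustrative only: derive a uint64 key from the first 8 bytes of a trace ID.
	traceID := []byte{0x4b, 0xf9, 0x2f, 0x35, 0x77, 0xb3, 0x4d, 0xa6, 0xa3, 0xce, 0x92, 0x9d, 0x0e, 0x0e, 0x47, 0x36}
	key := binary.BigEndian.Uint64(traceID[:8])
	fmt.Println("backend index:", jumpHash(key, 4))
}
```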

@jpkrohling (Member, Author) commented:

ping @bogdandrutu, @owais, @tigrannajaryan

@annanay25 (Contributor) left a comment:


Reviewed the consistent hash ring, great implementation @jpkrohling!

tigrannajaryan pushed a commit that referenced this pull request on Nov 10, 2020:
* Added the backend resolver
* Added the metrics definitions

**Link to tracking Issue:** Partially solves open-telemetry/opentelemetry-collector#1724, next step after #1349

**Testing:** unit tests

**Documentation:** godoc

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
dyladan referenced this pull request in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
…ngrade (#1349)

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>