Trace ID aware load-balancing exporter - 2/4 #1349
@owais This PR covers only the second commit; the first commit is being handled in another PR.
c0a9993 to f21cd37
PR rebased; this is ready for review.
@owais, @bogdandrutu could you please review?
Codecov Report
@@            Coverage Diff             @@
##           master    #1349      +/-   ##
==========================================
+ Coverage   88.79%   88.84%   +0.05%
==========================================
  Files         341      342       +1
  Lines       16679    16739      +60
==========================================
+ Hits        14810    14872      +62
+ Misses       1403     1402       -1
+ Partials      466      465       -1
ping @owais, @bogdandrutu
It feels to me that this is too complicated for what you are trying to achieve.
I would first do this with a simple slice/array:
- Make the maximum number of buckets/entries a power of 2 to simplify the search: hash(id) & (maxBuckets - 1) gives you the index of the bucket/entry in the list.
- You have N endpoints, each with a weight.
- You can simply allocate contiguous blocks:
  - total_weight = sum of weight[i] over all endpoints
  - Each endpoint gets max_buckets / total_weight * weight[i] buckets
- Use an interval tree (can be implemented as a slice) to determine the interval that contains the trace_id position.
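The slice-based proposal above could be sketched roughly as follows. This is only an illustration of the reviewer's idea, not code from the PR: the names `endpoint`, `bucketTable`, `newBucketTable`, `bucketFor` and `ownerOf` are hypothetical, FNV-1a is an arbitrary choice of hash, and the "interval tree" is reduced to a sorted slice of block boundaries searched with a binary search. It assumes `maxBuckets` is a power of two and the total weight is positive.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// endpoint is a hypothetical backend with a relative weight.
type endpoint struct {
	name   string
	weight int
}

// bucketTable stores, per endpoint, the exclusive upper bound of its
// contiguous block of buckets.
type bucketTable struct {
	names []string
	ends  []int
	mask  uint32
}

// newBucketTable assumes maxBuckets is a power of two and that the total
// weight is positive (both are assumptions of this sketch).
func newBucketTable(endpoints []endpoint, maxBuckets int) bucketTable {
	total := 0
	for _, e := range endpoints {
		total += e.weight
	}
	tbl := bucketTable{mask: uint32(maxBuckets - 1)}
	cum := 0
	for _, e := range endpoints {
		cum += e.weight
		tbl.names = append(tbl.names, e.name)
		// Cumulative split, so integer rounding never leaves a gap.
		tbl.ends = append(tbl.ends, maxBuckets*cum/total)
	}
	return tbl
}

// bucketFor hashes the trace ID (FNV-1a, chosen only for illustration)
// and masks it down to a bucket index.
func (tbl bucketTable) bucketFor(traceID []byte) int {
	h := fnv.New32a()
	h.Write(traceID)
	return int(h.Sum32() & tbl.mask)
}

// ownerOf finds the first block whose upper bound lies above the bucket.
func (tbl bucketTable) ownerOf(bucket int) string {
	return tbl.names[sort.SearchInts(tbl.ends, bucket+1)]
}

func main() {
	tbl := newBucketTable([]endpoint{{"backend-a", 1}, {"backend-b", 3}}, 8)
	b := tbl.bucketFor([]byte("0123456789abcdef"))
	fmt.Println(b, tbl.ownerOf(b))
}
```

With weights 1 and 3 over 8 buckets, the first endpoint owns buckets 0 and 1 and the second owns buckets 2 through 7. Note that changing the endpoint list moves the block boundaries, which is the reshuffling concern raised in the reply.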
If I'm understanding your proposal correctly, an even better solution would be to use jumphash. It would be easier on memory and is O(1), from what I remember. I discussed this in the original proposal with @joe-elliott, and the problem with both approaches is that a change in the list of backends would cause the entire spectrum to be reshuffled, which is what we are trying to avoid with this implementation, following Karger et al. The strongest reason to use Karger et al. is that only 1/N of the spectrum changes, where N is the number of backends. Given that this is part of the scalable tail-based sampling solution, we should really, really avoid impacting all the trace IDs.
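To make the property described above concrete, here is a minimal consistent hash ring sketch in the style of Karger et al. It is not the implementation in this PR: `hashRing`, `newHashRing`, `endpointFor` and `countMoves` are hypothetical names, CRC32 and the number of virtual nodes per backend are arbitrary assumptions, and the ring is assumed to be non-empty. The helper `countMoves` checks that, when one backend is added, every trace ID that changes owner moves to the new backend and nowhere else.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// ringItem is one virtual node: a position on the ring and its backend.
type ringItem struct {
	pos      uint32
	endpoint string
}

// hashRing is a hypothetical ring; items are kept sorted by position.
type hashRing struct{ items []ringItem }

// newHashRing places virtualNodes points per endpoint on the ring.
func newHashRing(endpoints []string, virtualNodes int) *hashRing {
	r := &hashRing{}
	for _, e := range endpoints {
		for i := 0; i < virtualNodes; i++ {
			pos := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s-%d", e, i)))
			r.items = append(r.items, ringItem{pos, e})
		}
	}
	sort.Slice(r.items, func(i, j int) bool {
		if r.items[i].pos != r.items[j].pos {
			return r.items[i].pos < r.items[j].pos
		}
		// Tie-break on name so ordering is deterministic on collisions.
		return r.items[i].endpoint < r.items[j].endpoint
	})
	return r
}

// endpointFor returns the owner of the first virtual node at or after the
// trace ID's position, wrapping around at the end of the ring.
func (r *hashRing) endpointFor(traceID []byte) string {
	pos := crc32.ChecksumIEEE(traceID)
	i := sort.Search(len(r.items), func(i int) bool { return r.items[i].pos >= pos })
	if i == len(r.items) {
		i = 0
	}
	return r.items[i].endpoint
}

// countMoves reports how many of n synthetic trace IDs change owner between
// two rings, and how many of those land on a backend other than added.
func countMoves(before, after *hashRing, added string, n int) (moved, elsewhere int) {
	for i := 0; i < n; i++ {
		id := []byte(fmt.Sprintf("trace-%d", i))
		was, now := before.endpointFor(id), after.endpointFor(id)
		if was != now {
			moved++
			if now != added {
				elsewhere++
			}
		}
	}
	return moved, elsewhere
}

func main() {
	before := newHashRing([]string{"backend-1", "backend-2", "backend-3"}, 100)
	after := newHashRing([]string{"backend-1", "backend-2", "backend-3", "backend-4"}, 100)
	moved, elsewhere := countMoves(before, after, "backend-4", 10000)
	fmt.Printf("moved=%d of 10000, moved to an old backend=%d\n", moved, elsewhere)
}
```

The share of trace IDs that move depends on the hash and the number of virtual nodes, so the sketch prints the counts rather than asserting a particular fraction.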
4a60c67 to a1a9376
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
a1a9376 to e201142
@jpkrohling this is a very interesting point. I would like to understand what behavior we want when the backend list is updated. Documenting that somewhere in the readme would be a good starting point for finding the best solution.
It is already part of the readme:
@bogdandrutu are there any further concerns? I would suggest going with this implementation, as it's what people have already read in the proposal and are expecting from this feature. If we see that this isn't suitable for the general use case, we can change just this part afterwards. For instance: if we think that in most scenarios the list of backends is going to be static, we can add a jumphash implementation of the hashring, or a simple
ping @bogdandrutu, @owais, @tigrannajaryan
Reviewed the consistent hash ring, great implementation @jpkrohling!
* Added the backend resolver
* Added the metrics definitions

**Link to tracking Issue:** Partially solves open-telemetry/opentelemetry-collector#1724, next step after #1349

**Testing:** unit tests

**Documentation:** godoc

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
…ngrade (#1349) Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
Description:
Link to tracking Issue: Partially solves open-telemetry/opentelemetry-collector#1724, next step after #1348
Testing: unit tests
Documentation: godoc
Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>