Distributor accept multiple HA Tracker pairs in the same request #6278
@eduardscaueru For me, #6278 looks better. I feel it's more readable, and it doesn't change the function parameters. Can you add an e2e test to make sure
@SungJin1212 @friedrichg I added an e2e test for this feature; let me know if it is OK. Regarding the other approach, PR 6279, I feel it is more efficient when multiple time series with the same HA pair arrive in the same request. In most cases where this happens, the client cannot aggregate the samples and push them together per pair, so each time series carries only one sample/data point, and a call to the KV store is made for every one of them. The precomputed map of valid HA pairs is efficient because, if a pair is already in the map, no further KV store call is made, and checking whether a time series is valid is an O(1) map lookup. I know I said in the issue that this PR is more suitable, but given the scenario above I lean more towards PR 6279.
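A minimal Go sketch of the memoisation idea described in this comment, assuming a hypothetical `checkReplica` callback in place of the HA tracker's KV-store lookup and simplified `series`/`haPair` types (none of these are Cortex APIs). It only shows that each distinct (cluster, replica) pair costs one lookup and every later series with the same pair costs an O(1) map hit; it ignores replica failover and timeouts, which the real HA tracker handles.

```go
package main

import "fmt"

// haPair identifies an HA replica group (hypothetical type for illustration).
type haPair struct {
	cluster, replica string
}

// series is a simplified stand-in for a remote-write time series.
type series struct {
	cluster, replica string
	samples          int
}

// filterByElectedReplica keeps only series whose (cluster, replica) pair is
// accepted. checkReplica stands in for the KV-store lookup; results are
// memoised in a map, so it runs at most once per distinct pair.
func filterByElectedReplica(in []series, checkReplica func(haPair) bool) (kept []series, kvCalls int) {
	valid := make(map[haPair]bool)
	for _, s := range in {
		p := haPair{s.cluster, s.replica}
		ok, seen := valid[p]
		if !seen {
			ok = checkReplica(p) // only the first occurrence hits the "KV store"
			kvCalls++
			valid[p] = ok
		}
		if ok {
			kept = append(kept, s)
		}
	}
	return kept, kvCalls
}

func main() {
	elected := map[string]string{"c1": "r1", "c2": "r2"}
	check := func(p haPair) bool { return elected[p.cluster] == p.replica }

	in := []series{
		{"c1", "r1", 1}, {"c1", "r2", 1}, {"c1", "r1", 1},
		{"c2", "r2", 1}, {"c2", "r2", 1},
	}
	kept, calls := filterByElectedReplica(in, check)
	fmt.Println(len(kept), calls) // 4 3
}
```

Five series with three distinct pairs trigger three lookups instead of five; the gain grows with the number of single-sample series per pair.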
@eduardscaueru
@SungJin1212 Sorry, I thought that the runtime config would be enabled via the CLI flag for it. Could you please guide me on what needs to be done to enable it?
@eduardscaueru Cortex watches a runtime config file and periodically applies it while running.
@eduardscaueru
I am thinking of the case where the remote write source is not Prometheus, as described in #6256, and data points will be mixed in the same request.
@SungJin1212 I think I get what you mean now. So after the assertions in the e2e test, the runtime config should override the
@eduardscaueru
@SungJin1212 Hmm... then would a change to the runtime_config_test.go test work?
@eduardscaueru
pkg/distributor/distributor.go
if err != nil {
	// discard sample
	d.dedupedSamples.WithLabelValues(userID, cluster).Add(float64(len(ts.Samples) + len(ts.Histograms)))
	continue
}
Errors should be handled differently (CAS error, TooManyReplicaGroupsError).
Do you want to set the status 400 Bad Request for the response if there is a time series with TooManyReplicaGroupsError?
@SungJin1212 I added the experimental flag, but I still could not find how to set the runtime config. I only saw a runtime config for the store bucket, and I tried to integrate it as in one of the query frontend tests, but without success. Could you please point me to a file or test that already does it? Or an object that I should create?
@eduardscaueru
@SungJin1212 Thank you for your help. Just wanted to point out that I refactored #6279 so that it does not change any function parameters and handles errors differently.
Sorry for the wait. Looks good to me.
Can you add the feature to
weight: 6
The implementation makes sense to me. Thanks!
Still need to fix the docs.
Sorry, I forgot to add the
This is unrelated. Have you rebased onto the latest master? That might help it go through.
Nope. Let me do that.
@yeya24 Updated to the latest changes. Should I squash the commits?
…A pairs (cluster, replica) in the same request/batch. This can be enabled with a new flag, accept_mixed_ha_samples, and will take effect only if accept_ha_samples is set to true. Fixed test by reducing the number of ingesters to 2 and the replication factor to 2. Added config reference. Do not remove the replica label if the cluster label is not present. Added more mixed HA replica tests with no cluster and replica labels and with the cluster label only. Added e2e test for mixed HA samples in the same request. Refactored distributor mixed HA samples logic. Added experimental flag for accept_mixed_ha_samples. Handled ReplicasNotMatchError and TooManyReplicaGroupsError differently.
Signed-off-by: eduardscaueru <edi_scaueru@yahoo.com>
@eduardscaueru It is better to keep separate commits for easier review; GitHub lets us squash all commits when merging the PR.
Ack. I assumed the PR was ready to merge and squashed them, sorry.
What this PR does:
Added a new implementation that makes the distributor accept multiple HA pairs (cluster, replica) in the same request/batch. This can be enabled with a new flag, accept_mixed_ha_samples, and takes effect only if accept_ha_samples is set to true.
This implementation checks every time series in the request for both the cluster and replica labels. If both are present, it checks the KV store to see whether the replica matches the elected replica. If it does not, the time series is discarded (not added to validatedTimeseries) and processing continues with the rest of the batch.
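The per-series check described above can be sketched as follows. This is an illustration, not the PR's code: `TimeSeries`, `Label`, `filterMixed` and the `isElected` callback (standing in for the KV-store comparison against the elected replica) are hypothetical, and the label names `cluster` and `__replica__` are assumed defaults for the HA cluster and replica labels. When either flag is off the sketch simply returns the input; the existing single-pair path is not modelled.

```go
package main

import "fmt"

// Label and TimeSeries are simplified, hypothetical stand-ins.
type Label struct{ Name, Value string }

type TimeSeries struct {
	Labels  []Label
	Samples int
}

// Assumed HA label names (configurable in practice).
const (
	clusterLabel = "cluster"
	replicaLabel = "__replica__"
)

func labelValue(ls []Label, name string) string {
	for _, l := range ls {
		if l.Name == name {
			return l.Value
		}
	}
	return ""
}

// filterMixed keeps a series unless it carries both HA labels and its replica
// is not the elected one for its cluster; series missing either label pass
// through unchecked. Mixed mode requires both flags to be enabled.
func filterMixed(acceptHA, acceptMixed bool, series []TimeSeries,
	isElected func(cluster, replica string) bool) (kept []TimeSeries, dropped int) {
	if !acceptHA || !acceptMixed {
		return series, 0
	}
	for _, ts := range series {
		cluster := labelValue(ts.Labels, clusterLabel)
		replica := labelValue(ts.Labels, replicaLabel)
		if cluster != "" && replica != "" && !isElected(cluster, replica) {
			dropped += ts.Samples // discarded; the rest of the batch continues
			continue
		}
		kept = append(kept, ts)
	}
	return kept, dropped
}

func main() {
	elected := map[string]string{"c1": "a", "c2": "b"}
	isElected := func(c, r string) bool { return elected[c] == r }

	in := []TimeSeries{
		{Labels: []Label{{"cluster", "c1"}, {"__replica__", "a"}}, Samples: 1},
		{Labels: []Label{{"cluster", "c1"}, {"__replica__", "b"}}, Samples: 1},
		{Labels: []Label{{"cluster", "c2"}, {"__replica__", "b"}}, Samples: 1},
		{Labels: []Label{{"__name__", "up"}}, Samples: 1},
	}
	kept, dropped := filterMixed(true, true, in, isElected)
	fmt.Println(len(kept), dropped) // 3 1
}
```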
It also ensures that the replica label is not removed when the cluster label is missing.
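A hedged sketch of the label-handling rule above, using a hypothetical `stripReplicaLabel` helper and the same assumed label names (`cluster`, `__replica__`): the replica label is dropped only when the cluster label is also present; otherwise the labels are returned untouched.

```go
package main

import "fmt"

// Label is a simplified, hypothetical stand-in for a Prometheus label pair.
type Label struct{ Name, Value string }

func hasLabel(ls []Label, name string) bool {
	for _, l := range ls {
		if l.Name == name && l.Value != "" {
			return true
		}
	}
	return false
}

// stripReplicaLabel removes replicaLabel only if clusterLabel is present.
// Per the PR description, a series without the cluster label keeps its
// replica label. A new slice is returned so the input is not mutated.
func stripReplicaLabel(ls []Label, clusterLabel, replicaLabel string) []Label {
	if !hasLabel(ls, clusterLabel) {
		return ls
	}
	out := make([]Label, 0, len(ls))
	for _, l := range ls {
		if l.Name != replicaLabel {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	withCluster := []Label{{"__name__", "up"}, {"__replica__", "a"}, {"cluster", "c1"}}
	replicaOnly := []Label{{"__name__", "up"}, {"__replica__", "a"}}
	fmt.Println(stripReplicaLabel(withCluster, "cluster", "__replica__")) // [{__name__ up} {cluster c1}]
	fmt.Println(stripReplicaLabel(replicaOnly, "cluster", "__replica__")) // [{__name__ up} {__replica__ a}]
}
```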
Which issue(s) this PR fixes:
Fixes #6256
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]