Why do you have samples that are scraped only by the standby Prometheus? I
think the idea is to have two Prometheus instances scraping exactly the same
metrics.
The reason we accept samples from only one replica is that the TSDB rejects
duplicate samples (samples with the same timestamp for the same series but
with different values).
Alan Diego
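
The duplicate-sample rule mentioned above can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToySeriesStore` and `DuplicateSampleError` are hypothetical names invented for illustration, not part of Cortex or Prometheus, and a real TSDB applies further checks that are not modelled here.

```python
class DuplicateSampleError(Exception):
    """Raised when a series already holds a different value at that timestamp."""


class ToySeriesStore:
    """Hypothetical stand-in for a TSDB: one value per (series, timestamp)."""

    def __init__(self):
        self._samples = {}  # (series, timestamp) -> value

    def append(self, series, timestamp, value):
        key = (series, timestamp)
        # Same series, same timestamp, different value: reject the write.
        if key in self._samples and self._samples[key] != value:
            raise DuplicateSampleError(
                f"{series}@{timestamp}: have {self._samples[key]}, got {value}"
            )
        self._samples[key] = value


store = ToySeriesStore()
store.append('up{instance="a"}', 1000, 1.0)      # sample from one replica

try:
    store.append('up{instance="a"}', 1000, 0.0)  # other replica, conflicting value
    rejected = False
except DuplicateSampleError:
    rejected = True
```

In this model, writing both replicas' samples into the same series would produce rejected writes whenever their values disagree, which is why only one replica's stream is kept.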
On Mon, Nov 6, 2023 at 1:08 AM aleskxyz wrote:
Hi,
Cortex selects a leader from the cluster of HA Prometheus instances to
retrieve samples. Imagine a network partition in which each Prometheus can
scrape data from only some instances. With the current Cortex design, only
samples from the elected Prometheus are written to long-term storage, and
samples from the other Prometheus instances are discarded, resulting in gaps
for samples that are scraped only by the standby Prometheus.
Does Cortex have a solution for this, or can it handle this situation
like Thanos, which deduplicates data at query time?
Thanks.
So the problem is the failover time?
The default value is 15 seconds, and it is configurable: ha_tracker_update_timeout
Alan Diego
On Mon, Nov 6, 2023 at 10:33 AM aleskxyz wrote:
Thanks for your reply!
As I said above, we may see this inconsistency in the case of a network
partition.
Imagine we have two Prometheus instances in two different racks, both of
them scraping all instances.
When the internal connection between the two racks is disrupted, the active
Prometheus cannot scrape resources in the other rack, but the local
Prometheus of that rack is still working.
Thanks
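
The rack scenario above can be walked through with a toy model of leader-based acceptance. This is a rough sketch, not Cortex's implementation: `ToyHATracker`, the replica names `p1`/`p2`, the targets `a`/`b`, the 15-second push interval, and the 30-second timeout are all hypothetical values chosen for illustration, and it assumes both replicas can still reach the storage backend during the partition.

```python
class ToyHATracker:
    """Hypothetical model: accept one elected replica, fail over when it goes quiet."""

    def __init__(self, failover_timeout):
        self.failover_timeout = failover_timeout
        self.elected = None
        self.last_seen = None  # last time the elected replica sent samples

    def accept(self, replica, now):
        if self.elected is None or replica == self.elected:
            self.elected, self.last_seen = replica, now
            return True
        # A standby is accepted only after the elected replica has been silent
        # for longer than the failover timeout.
        if now - self.last_seen > self.failover_timeout:
            self.elected, self.last_seen = replica, now
            return True
        return False


# During the partition each replica reaches only the target in its own rack.
reachable = {"p1": ["a"], "p2": ["b"]}

tracker = ToyHATracker(failover_timeout=30)  # illustrative value
stored = set()
for now in range(0, 120, 15):                # both replicas push every 15 s
    for replica in ("p1", "p2"):
        if tracker.accept(replica, now):
            stored.update(reachable[replica])
```

In this model `p1` keeps sending samples for target `a`, so it never looks dead and no failover happens. Samples for target `b`, which only `p2` can scrape, are dropped for the whole partition regardless of how the timeout is tuned.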