
Stateless ruler restores alert state #5230

Merged (4 commits) on Nov 11, 2022


@yeya24 (Contributor) commented Mar 13, 2022

Signed-off-by: Ben Ye <ben.ye@bytedance.com>

  • I added a CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

The idea is still to reuse the upstream Prometheus code and implement a queryable backed by the Thanos Queriers.
That way, the Prometheus rule manager can fetch the ALERTS_FOR_STATE series from the Thanos Queriers and restore the alert state.

Verification

E2E test added.

@yeya24 force-pushed the restore-alert-for-state branch 2 times, most recently from 3e2375c to cdc28e3 (March 13, 2022 09:33)
@yeya24 changed the title from "WIP: stateless ruler restores alert state" to "Stateless ruler restores alert state" (Mar 14, 2022)
@yeya24 force-pushed the restore-alert-for-state branch from 2481793 to 0a7e3a2 (March 14, 2022 07:47)
@GiedriusS (Member) commented:

It would be great to fix this issue. Is this ready for review?

Review thread on cmd/thanos/rule.go (outdated, resolved)
@matej-g (Collaborator) left a comment:

@yeya24 is anything still missing from this PR? It's looking good to me overall. It would also be great to run it against the alert compliance test to make sure alerting now also works properly with the stateless ruler.

testutil.Ok(t, e2e.StartAndWaitReady(rulers[1]))

// Wait for 4 * rule evaluation interval to make sure the alert state is restored.
time.Sleep(time.Second * 8)
@GiedriusS (Member) commented Jul 11, 2022:

Is there any way we could use metrics for synchronization instead of hard-coding a time duration? My concern is that this will be flaky.

@yeya24 (Contributor, Author) replied:

Yeah... I am still looking for a metric that indicates the alert states have been restored, but I haven't managed to find one.

@yeya24 (Contributor, Author) commented Jul 11, 2022:

Updated to wait for 4 rule evaluation intervals. Let's see if it is more stable this time...

// Wait for 4 * rule evaluation interval to make sure the alert state is restored.
time.Sleep(time.Second * 8)
// Wait until the alert is firing on the second ruler.
for {
A member commented:

I suggest using runutil.Repeat here

@yeya24 (Contributor, Author) replied:

Updated.

logger log.Logger
promClients []*promclient.Client
queryClients []*httpconfig.Client
restoreIgnoreLabels []string
@GiedriusS (Member) commented Jul 11, 2022:

Perhaps we could make this a bit more generic and rename it to ignoredLabelNames?

@yeya24 (Contributor, Author) replied:

Updated.
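Assuming the renamed ignoredLabelNames field lists label names to drop from series returned by the queriers before they are compared with the ruler's own alert labels (for example, a hypothetical replica label added on the query path), the comparison could look like the sketch below. withoutLabels and sameLabels are hypothetical helpers, not code from this PR.

```go
package main

// withoutLabels returns a copy of lset with every label whose name appears in
// ignored removed. The input map is left untouched.
func withoutLabels(lset map[string]string, ignored []string) map[string]string {
	skip := make(map[string]struct{}, len(ignored))
	for _, n := range ignored {
		skip[n] = struct{}{}
	}
	out := make(map[string]string, len(lset))
	for n, v := range lset {
		if _, ok := skip[n]; !ok {
			out[n] = v
		}
	}
	return out
}

// sameLabels reports whether a and b are equal once the ignored names are dropped.
func sameLabels(a, b map[string]string, ignored []string) bool {
	a, b = withoutLabels(a, ignored), withoutLabels(b, ignored)
	if len(a) != len(b) {
		return false
	}
	for n, v := range a {
		if bv, ok := b[n]; !ok || bv != v {
			return false
		}
	}
	return true
}
```

A generic name fits this shape: the helper does not care why a label is ignored, only which names to skip.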

@@ -189,6 +190,8 @@ func (qc *queryConfig) registerFlag(cmd extkingpin.FlagClause) *queryConfig {
Default("POST").EnumVar(&qc.httpMethod, "GET", "POST")
cmd.Flag("query.sd-dns-resolver", "Resolver to use. Possible options: [golang, miekgdns]").
Default("golang").Hidden().StringVar(&qc.dnsSDResolver)
cmd.Flag("query.default-step", "Default range query step to use. This is only used in stateless Ruler and alert state restoration.").
@GiedriusS (Member) commented Jul 11, 2022:

Why is this so small by default? 🤔

@yeya24 (Contributor, Author) commented Jul 12, 2022:

I am not sure whether we want to expose this to users at all. The step can simply be calculated as max((maxt - mint) / 250, 1s), but in the E2E test we might still want to override it.

For the default value of 1s, I just reused the default query step from query.go. Do you have a suggested value?

@yeya24 (Contributor, Author) commented Jul 12, 2022:

Quoting @matej-g: "@yeya24 is anything still missing from this PR? It's looking good to me overall. Would be also great to run it against the alert compliance test to make sure alerting will now also properly work with stateless ruler."

Thanks for the review, Matej. I am still struggling to fix the E2E test, whose results are flaky. Also, unit tests for the new queryable are still missing.

@michael-burt commented:

FWIW, I deployed this branch and verified that ALERTS & ALERTS_FOR_STATE are being remote-written by the Ruler.

@sdufel (Contributor) commented Sep 7, 2022:

Is there anything left to do here?

@yeya24 force-pushed the restore-alert-for-state branch from 24859f5 to a41ea98 (October 17, 2022 03:36)
@yeya24 (Contributor, Author) commented Oct 17, 2022:

Sorry for the delay. I will focus on this PR next week to fix the broken E2E test.

@benjaminhuo (Contributor) commented:

@yeya24 It would be great if we could release v0.28.2 with this important fix!

@yeya24 force-pushed the restore-alert-for-state branch 3 times, most recently from 1f86600 to d865211 (October 31, 2022 08:02)
@yeya24 (Contributor, Author) commented Oct 31, 2022:

@GiedriusS @matej-g This is ready for another review now. The E2E test finally passes.

@matej-g (Collaborator) commented Oct 31, 2022:

@yeya24 have you also tried to run it through the compliance test?

@matej-g previously approved these changes Oct 31, 2022
@matej-g (Collaborator) left a comment:

This looks good to me, thanks @yeya24! It would be great to also see the results of the compliance test, but this PR has been open long enough already; if we still find any compliance discrepancies, we can iterate on this.

Review thread on cmd/thanos/rule.go (resolved)
@yeya24 (Contributor, Author) commented Nov 1, 2022:

Quoting @matej-g: "This looks good to me, thanks @yeya24! It would be great to also see results of the compliance test, but this has been opened already long enough and if we still find any discrepancy with compliance, we can iterate on this."

Yeah, let's probably iterate on this. I ran the compliance test against both the stateful and the stateless ruler, and both runs produced errors like the ones below:

18:57:39 alert_generator_compliance_tester-exec: ------------------------------------------
18:57:39 alert_generator_compliance_tester-exec: The following rule groups failed the API and metrics check:
18:57:39 alert_generator_compliance_tester-exec: Group Name: ZeroFor_SmallFor
18:57:39 alert_generator_compliance_tester-exec: Error 1: error in alert reception
18:57:39 alert_generator_compliance_tester-exec: Group Name: NewAlerts_OrderCheck
18:57:39 alert_generator_compliance_tester-exec: Error 1: error in alert reception
18:57:39 alert_generator_compliance_tester-exec: Group Name: PendingAndFiringAndResolved
18:57:39 alert_generator_compliance_tester-exec: Error 1: error in alert reception
18:57:39 alert_generator_compliance_tester-exec: ------------------------------------------
18:57:39 alert_generator_compliance_tester-exec: The following rule groups faced alert reception issues:
18:57:39 alert_generator_compliance_tester-exec: Group Name: NewAlerts_OrderCheck
18:57:39 alert_generator_compliance_tester-exec: Reason: Missed some alerts that were expected (time is approx)
18:57:39 alert_generator_compliance_tester-exec: 1: Expected time: 2022-11-01T01:46:49.312541133Z, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 2: Expected time: 2022-11-01T01:46:49.312541133Z, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 3: Expected time: 2022-11-01T01:46:49.312541133Z, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 4: Expected time: 2022-11-01T01:46:49.312541133Z, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 5: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 6: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 7: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 8: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 9: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 10: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 11: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 12: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 13: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 14: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 15: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 16: Expected time: 2022-11-01T01:46:49.293688716Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 17: Expected time: 2022-11-01T01:49:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, State: resolved, Resend: false
18:57:39 alert_generator_compliance_tester-exec: 18: Expected time: 2022-11-01T01:50:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 19: Expected time: 2022-11-01T01:51:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 20: Expected time: 2022-11-01T01:52:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 21: Expected time: 2022-11-01T01:53:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 22: Expected time: 2022-11-01T01:49:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, State: resolved, Resend: false
18:57:39 alert_generator_compliance_tester-exec: 23: Expected time: 2022-11-01T01:50:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 24: Expected time: 2022-11-01T01:51:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 25: Expected time: 2022-11-01T01:52:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 26: Expected time: 2022-11-01T01:53:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 27: Expected time: 2022-11-01T01:51:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: resolved, Resend: false
18:57:39 alert_generator_compliance_tester-exec: 28: Expected time: 2022-11-01T01:52:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 29: Expected time: 2022-11-01T01:53:13.125Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: Reason: Alerts mismatch while received at right time
18:57:39 alert_generator_compliance_tester-exec: 1: At 2022-11-01T01:54:22.381599466Z, Expected State: resolved, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, Error: mismatch in EndsAt, expected range: [2022-11-01T01:51:13.125Z, 2022-11-01T01:51:43.125Z], got: 2022-11-01T01:53:54.239864092Z
18:57:39 alert_generator_compliance_tester-exec: 2: At 2022-11-01T01:54:22.381599466Z, Expected State: resolved, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, Error: mismatch in EndsAt, expected range: [2022-11-01T01:49:13.125Z, 2022-11-01T01:49:43.125Z], got: 2022-11-01T01:53:54.239864092Z
18:57:39 alert_generator_compliance_tester-exec: 3: At 2022-11-01T01:54:22.392155591Z, Expected State: resolved, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, Error: mismatch in EndsAt, expected range: [2022-11-01T01:49:13.125Z, 2022-11-01T01:49:43.125Z], got: 2022-11-01T01:53:54.239864092Z
18:57:39 alert_generator_compliance_tester-exec: Reason: Unexpected alerts (Example: alerts that we didn't expect OR received outside expected time range OR duplicate alerts)
18:57:39 alert_generator_compliance_tester-exec: 1: At 2022-11-01T01:47:54.275808091Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="two"}, Annotations: {description="This should produce more alerts later"}, StartsAt: 2022-11-01T01:44:24.239864092Z, EndsAt: 2022-11-01T01:51:54.239864092Z, GeneratorURL: /graph?g0.expr=%7B__name__%3D%22alert_generator_test_suite%22%2Calertname%3D%22NewAlerts_OrderCheck_Rule1%22%2Crulegroup%3D%22NewAlerts_OrderCheck%22%7D+%3E+10&g0.tab=1
18:57:39 alert_generator_compliance_tester-exec: 2: At 2022-11-01T01:47:54.275808091Z, Labels: {alertname="NewAlerts_OrderCheck_Rule1", foo="bar", rulegroup="NewAlerts_OrderCheck", variant="one"}, Annotations: {description="This should produce more alerts later"}, StartsAt: 2022-11-01T01:38:24.239864092Z, EndsAt: 2022-11-01T01:51:54.239864092Z, GeneratorURL: /graph?g0.expr=%7B__name__%3D%22alert_generator_test_suite%22%2Calertname%3D%22NewAlerts_OrderCheck_Rule1%22%2Crulegroup%3D%22NewAlerts_OrderCheck%22%7D+%3E+10&g0.tab=1
18:57:39 alert_generator_compliance_tester-exec: 3: At 2022-11-01T01:47:54.290132341Z, Labels: {alertname="NewAlerts_OrderCheck_Rule2", alertstate="firing", ba_dum="tss", foo="baz", rulegroup="NewAlerts_OrderCheck"}, Annotations: {description="Based on ALERTS. Old alertname was NewAlerts_OrderCheck_Rule1. foo was bar."}, StartsAt: 2022-11-01T01:44:24.239864092Z, EndsAt: 2022-11-01T01:51:54.239864092Z, GeneratorURL: /graph?g0.expr=%28ALERTS%7Balertname%3D%22NewAlerts_OrderCheck_Rule1%22%2Calertstate%3D%22firing%22%2Cfoo%3D%22bar%22%2Crulegroup%3D%22NewAlerts_OrderCheck%22%2Cvariant%3D%22one%22%7D+%2B+ignoring+%28variant%29+ALERTS%7Balertname%3D%22NewAlerts_OrderCheck_Rule1%22%2Calertstate%3D%22firing%22%2Cfoo%3D%22bar%22%2Crulegroup%3D%22NewAlerts_OrderCheck%22%2Cvariant%3D%22two%22%7D%29+%3D%3D+2&g0.tab=1
18:57:39 alert_generator_compliance_tester-exec: Group Name: PendingAndFiringAndResolved
18:57:39 alert_generator_compliance_tester-exec: Reason: Missed some alerts that were expected (time is approx)
18:57:39 alert_generator_compliance_tester-exec: 1: Expected time: 2022-11-01T01:49:29.359412388Z, Labels: {alertname="PendingAndFiringAndResolved_SimpleAlert", foo="bar", rulegroup="PendingAndFiringAndResolved"}, Annotations: {description="SimpleAlert is firing", summary="The value is 19 19"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 2: Expected time: 2022-11-01T01:49:29.359412388Z, Labels: {alertname="PendingAndFiringAndResolved_SimpleAlert", foo="bar", rulegroup="PendingAndFiringAndResolved"}, Annotations: {description="SimpleAlert is firing", summary="The value is 19 19"}, State: firing, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 3: Expected time: 2022-11-01T01:50:13.125Z, Labels: {alertname="PendingAndFiringAndResolved_SimpleAlert", foo="bar", rulegroup="PendingAndFiringAndResolved"}, Annotations: {description="SimpleAlert is firing", summary="The value is 19 19"}, State: resolved, Resend: false
18:57:39 alert_generator_compliance_tester-exec: 4: Expected time: 2022-11-01T01:51:13.125Z, Labels: {alertname="PendingAndFiringAndResolved_SimpleAlert", foo="bar", rulegroup="PendingAndFiringAndResolved"}, Annotations: {description="SimpleAlert is firing", summary="The value is 19 19"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 5: Expected time: 2022-11-01T01:52:13.125Z, Labels: {alertname="PendingAndFiringAndResolved_SimpleAlert", foo="bar", rulegroup="PendingAndFiringAndResolved"}, Annotations: {description="SimpleAlert is firing", summary="The value is 19 19"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 6: Expected time: 2022-11-01T01:53:13.125Z, Labels: {alertname="PendingAndFiringAndResolved_SimpleAlert", foo="bar", rulegroup="PendingAndFiringAndResolved"}, Annotations: {description="SimpleAlert is firing", summary="The value is 19 19"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: Reason: Alerts mismatch while received at right time
18:57:39 alert_generator_compliance_tester-exec: 1: At 2022-11-01T01:54:32.453025512Z, Expected State: resolved, Labels: {alertname="PendingAndFiringAndResolved_SimpleAlert", foo="bar", rulegroup="PendingAndFiringAndResolved"}, Annotations: {description="SimpleAlert is firing", summary="The value is 19 19"}, Error: mismatch in EndsAt, expected range: [2022-11-01T01:50:13.125Z, 2022-11-01T01:50:43.125Z], got: 2022-11-01T01:54:04.301194476Z
18:57:39 alert_generator_compliance_tester-exec: Group Name: ZeroFor_SmallFor
18:57:39 alert_generator_compliance_tester-exec: Reason: Missed some alerts that were expected (time is approx)
18:57:39 alert_generator_compliance_tester-exec: 1: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 2: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 3: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 4: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 5: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 6: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 7: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 8: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 9: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 10: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 11: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 12: Expected time: 2022-11-01T01:42:21.540960676Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 13: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 14: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 15: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 16: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 17: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 18: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 19: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 20: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 21: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 22: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 23: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: 24: Expected time: 2022-11-01T01:42:21.555753467Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, State: resolved, Resend: true
18:57:39 alert_generator_compliance_tester-exec: Reason: Unexpected alerts (Example: alerts that we didn't expect OR received outside expected time range OR duplicate alerts)
18:57:39 alert_generator_compliance_tester-exec: 1: At 2022-11-01T01:43:26.539105717Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, StartsAt: 2022-11-01T01:35:26.508362253Z, EndsAt: 2022-11-01T01:38:26.508362253Z, GeneratorURL: /graph?g0.expr=%7B__name__%3D%22alert_generator_test_suite%22%2Calertname%3D%22ZeroFor_SmallFor_ZeroFor%22%2Crulegroup%3D%22ZeroFor_SmallFor%22%7D+%3E+10&g0.tab=1
18:57:39 alert_generator_compliance_tester-exec: 2: At 2022-11-01T01:43:26.549773134Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, StartsAt: 2022-11-01T01:35:56.508362253Z, EndsAt: 2022-11-01T01:38:26.508362253Z, GeneratorURL: /graph?g0.expr=%7B__name__%3D%22alert_generator_test_suite%22%2Calertname%3D%22ZeroFor_SmallFor_SmallFor%22%2Crulegroup%3D%22ZeroFor_SmallFor%22%7D+%3E+13&g0.tab=1
18:57:39 alert_generator_compliance_tester-exec: 3: At 2022-11-01T01:44:26.540890551Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, StartsAt: 2022-11-01T01:35:26.508362253Z, EndsAt: 2022-11-01T01:38:26.508362253Z, GeneratorURL: /graph?g0.expr=%7B__name__%3D%22alert_generator_test_suite%22%2Calertname%3D%22ZeroFor_SmallFor_ZeroFor%22%2Crulegroup%3D%22ZeroFor_SmallFor%22%7D+%3E+10&g0.tab=1
18:57:39 alert_generator_compliance_tester-exec: 4: At 2022-11-01T01:44:26.554866551Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, StartsAt: 2022-11-01T01:35:56.508362253Z, EndsAt: 2022-11-01T01:38:26.508362253Z, GeneratorURL: /graph?g0.expr=%7B__name__%3D%22alert_generator_test_suite%22%2Calertname%3D%22ZeroFor_SmallFor_SmallFor%22%2Crulegroup%3D%22ZeroFor_SmallFor%22%7D+%3E+13&g0.tab=1
18:57:39 alert_generator_compliance_tester-exec: 5: At 2022-11-01T01:45:56.558574259Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, StartsAt: 2022-11-01T01:35:26.508362253Z, EndsAt: 2022-11-01T01:38:26.508362253Z, GeneratorURL: /graph?g0.expr=%7B__name__%3D%22alert_generator_test_suite%22%2Calertname%3D%22ZeroFor_SmallFor_ZeroFor%22%2Crulegroup%3D%22ZeroFor_SmallFor%22%7D+%3E+10&g0.tab=1
18:57:39 alert_generator_compliance_tester-exec: 6: At 2022-11-01T01:45:56.572273884Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, StartsAt: 2022-11-01T01:35:56.508362253Z, EndsAt: 2022-11-01T01:38:26.508362253Z, GeneratorURL: /graph?g0.expr=%7B__name__%3D%22alert_generator_test_suite%22%2Calertname%3D%22ZeroFor_SmallFor_SmallFor%22%2Crulegroup%3D%22ZeroFor_SmallFor%22%7D+%3E+13&g0.tab=1
18:57:39 alert_generator_compliance_tester-exec: 7: At 2022-11-01T01:47:56.533435967Z, Labels: {alertname="ZeroFor_SmallFor_ZeroFor", foo="bar", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should immediately fire", template_query_test="Args are: foo bar 99. first_id:101,101:1,102:2,103:3,", template_test="1.049M 1Mi 2m 15s 95.9% 2022-01-25 12:36:43 +0000 UTC"}, StartsAt: 2022-11-01T01:35:26.508362253Z, EndsAt: 2022-11-01T01:38:26.508362253Z, GeneratorURL: /graph?g0.expr=%7B__name__%3D%22alert_generator_test_suite%22%2Calertname%3D%22ZeroFor_SmallFor_ZeroFor%22%2Crulegroup%3D%22ZeroFor_SmallFor%22%7D+%3E+10&g0.tab=1
18:57:39 alert_generator_compliance_tester-exec: 8: At 2022-11-01T01:47:56.548741675Z, Labels: {alertname="ZeroFor_SmallFor_SmallFor", ba_dum="tss", rulegroup="ZeroFor_SmallFor"}, Annotations: {description="This should fire after an interval", template_test="This Part IS TESTING the strings. ::1 127.0.0.1. 7815. replaced text. ."}, StartsAt: 2022-11-01T01:35:56.508362253Z, EndsAt: 2022-11-01T01:38:26.508362253Z, GeneratorURL: /graph?g0.expr=%7B__name__%3D%22alert_generator_test_suite%22%2Calertname%3D%22ZeroFor_SmallFor_SmallFor%22%2Crulegroup%3D%22ZeroFor_SmallFor%22%7D+%3E+13&g0.tab=1
    compatibility_test.go:254: compatibility_test.go:254:
        
         unexpected error: exit status 1

Need more investigation on this.

@benjaminhuo
Contributor

benjaminhuo commented Nov 10, 2022

Can we merge this PR now? @yeya24 @GiedriusS @matej-g

matej-g previously approved these changes Nov 10, 2022
Collaborator

matej-g left a comment

Let's just fix the changelog conflict and get this in, cc @yeya24

Ben Ye and others added 3 commits November 10, 2022 09:19
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
Signed-off-by: Ben Ye <benye@amazon.com>
yeya24 force-pushed the restore-alert-for-state branch from 6d3d6fb to b3b97d7 on November 10, 2022 17:20
@yeya24
Contributor Author

yeya24 commented Nov 11, 2022

I will merge it then and follow up later.

yeya24 merged commit 7d2585d into thanos-io:main Nov 11, 2022
yeya24 deleted the restore-alert-for-state branch November 11, 2022 08:39
frezes pushed a commit to frezes/thanos that referenced this pull request Nov 11, 2022
* stateless ruler restores alert state

Signed-off-by: Ben Ye <benye@amazon.com>

* update e2e

Signed-off-by: Ben Ye <benye@amazon.com>

* update compatibility test

Signed-off-by: Ben Ye <benye@amazon.com>

* update changelog

Signed-off-by: Ben Ye <benye@amazon.com>

Signed-off-by: Ben Ye <benye@amazon.com>
Co-authored-by: Ben Ye <ben.ye@bytedance.com>
@ahurtaud
Contributor

Hello,

I am not able to restore alert state from my setup.

After looking at the e2e tests from the PR, I suspect that the test TestStatelessRulerAlertStateRestore is not working. As far as I can tell, the value (epoch timestamp) of the ALERTS_FOR_STATE metric was never retrieved properly by either ruler-1 or ruler-2. Am I correct?

Has anyone succeeded in restoring their alert state?
Thank you,

Logs from e2e:

17:26:54 querier-1: level=warn name=querier-1 ts=2022-11-10T17:26:54.745559199Z caller=proxy.go:282 component=proxy request="min_time:1668100914744 max_time:1668101214744 matchers:<name:\"__name__\" value:\"ALERTS_FOR_STATE\" > aggregates:COUNT aggregates:SUM " err="No StoreAPIs matched for this query" stores=
17:26:59 querier-1: level=warn name=querier-1 ts=2022-11-10T17:26:59.746848487Z caller=proxy.go:282 component=proxy request="min_time:1668100919745 max_time:1668101219745 matchers:<name:\"__name__\" value:\"ALERTS_FOR_STATE\" > aggregates:COUNT aggregates:SUM " err="No StoreAPIs matched for this query" stores="store Addr: stateless-state-receive-1:9091 LabelSets: {receive=\"receive-1\", tenant_id=\"default-tenant\"} Mint: 9223372036854775807 Maxt: 9223372036854775807 filtered out: does not have data within this time period: [1668100919745,1668101219745]. Store time ranges: [9223372036854775807,9223372036854775807]"

Edit: it works!
Finding the right --restore-ignored-label flags is not easy, but once they are set correctly, the feature just works! Shout out to @yeya24 for the implementation <3 !
Regarding the e2e tests, I still think there might be an issue (TBC).
I am also hitting a panic when opening the ruler /alerts page. I will open an issue for this on Friday.
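A likely reason the --restore-ignored-label flags matter, as we understand the upstream restore code: it only accepts an ALERTS_FOR_STATE series whose label set matches the alert's own labels exactly, while series read back through the querier can carry extra labels added along the write path (for example `receive` and `tenant_id`, as in the receiver's LabelSets in the log above). The sketch below illustrates ignoring such labels before comparing; `stripLabels`, `sameLabels` and the `HighLatency` alert are hypothetical, and this is not necessarily how Thanos implements the flag.

```go
package main

import "fmt"

// stripLabels returns a copy of lbls without the ignored label names.
// Hypothetical helper for illustration; the input map is not modified.
func stripLabels(lbls map[string]string, ignored []string) map[string]string {
	out := make(map[string]string, len(lbls))
	for name, value := range lbls {
		out[name] = value
	}
	for _, name := range ignored {
		delete(out, name)
	}
	return out
}

// sameLabels reports whether two label sets contain exactly the same pairs.
func sameLabels(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for name, value := range a {
		if other, ok := b[name]; !ok || other != value {
			return false
		}
	}
	return true
}

func main() {
	// Labels of the series the ruler expects to find (hypothetical alert).
	want := map[string]string{"__name__": "ALERTS_FOR_STATE", "alertname": "HighLatency"}
	// The same series as returned through the querier, with the receiver's
	// external labels attached.
	got := map[string]string{
		"__name__":  "ALERTS_FOR_STATE",
		"alertname": "HighLatency",
		"receive":   "receive-1",
		"tenant_id": "default-tenant",
	}
	fmt.Println(sameLabels(want, got)) // false
	fmt.Println(sameLabels(want, stripLabels(got, []string{"receive", "tenant_id"}))) // true
}
```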

ngraham20 pushed a commit to ngraham20/thanos that referenced this pull request May 18, 2023
7 participants