E2E Tests: Improve Prometheus configurability and make query tests more resilient #5181
- Move prom config method to shared e2ethanos package
- Make scraping Prometheus instance optional
Signed-off-by: Matej Gera <matejgera@gmail.com>
Thanks for tackling test flakiness! 🙂
👍🏽 thanks! just minor optional nit - up to you
👍 let's fix the conflicts
Done, should be good to go now!
Changes
I have noticed that some of the query E2E tests are prone to flakiness: I've seen a couple of CI failures (e.g. https://github.com/thanos-io/thanos/runs/5262364980?check_suite_focus=true#step:5:383) as well as failed runs locally.
The tests fail due to an out of bounds error upon trying to remote write, excerpt:
I suspect this is because, in some test cases, we remote write time series with timestamps in the past. With the current default configuration, however, Prometheus also scrapes itself. I suspect that on occasion, if Prometheus is scraped before the remote write happens, the minimum valid time is set to 'now', which causes remote write requests with past timestamps to return an out of bounds error.
The mitigation is to make self-scraping of the test Prometheus instance optional, since for the test cases mentioned it might interfere with remote write requests. Additionally, I moved the Prometheus config method from the query test file to the e2ethanos package, since it's used across multiple E2E tests and fits better there. Lastly, I slightly bumped the wait time for minio to be ready, since I'm still occasionally seeing errors when minio is not ready to accept requests.
Verification
Ran the E2E tests locally multiple times with success (as opposed to hitting the out of bounds error before).