test_ts_of_lsn_api flakyness #5768

Closed
koivunej opened this issue Nov 2, 2023 · 6 comments
Labels
a/test/flaky Area: related to flaky tests a/test Area: related to testing c/storage/pageserver Component: storage: pageserver

Comments

@koivunej
Member

koivunej commented Nov 2, 2023

I've seen this once: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-5756/6730946357/index.html#suites/2568f9ab62eb9e71321fab6263eed23e/269955fd321927a9:

test_runner/regress/test_lsn_mapping.py:236: in test_ts_of_lsn_api
    assert timestamp >= before_timestamp, "before_timestamp before timestamp"
E   AssertionError: before_timestamp before timestamp
E   assert datetime.datetime(2023, 11, 2, 10, 13, 28, 106532, tzinfo=datetime.timezone.utc) >= datetime.datetime(2023, 11, 2, 10, 13, 28, 208426, tzinfo=datetime.timezone.utc)

It does sound like it is non-trivial to fix.
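For reference, the size of the gap in the assertion message can be checked directly. This is only a small sketch that copies the two values from the `AssertionError` output; the variable names mirror the ones in the assertion, and nothing here comes from the test itself.

```python
from datetime import datetime, timezone

# Values copied verbatim from the assertion message in the CI report.
timestamp = datetime(2023, 11, 2, 10, 13, 28, 106532, tzinfo=timezone.utc)
before_timestamp = datetime(2023, 11, 2, 10, 13, 28, 208426, tzinfo=timezone.utc)

# The assertion expected timestamp >= before_timestamp; measure how far off it was.
gap = before_timestamp - timestamp
gap_ms = gap.total_seconds() * 1000
print(f"timestamp is {gap_ms:.3f} ms earlier than before_timestamp")
```

With these inputs the script prints `timestamp is 101.894 ms earlier than before_timestamp`.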

@koivunej koivunej added c/storage/pageserver Component: storage: pageserver a/test Area: related to testing a/test/flaky Area: related to flaky tests labels Nov 2, 2023
@arpad-m
Member

arpad-m commented Nov 2, 2023

In the CI run of #5497, the tests were already so flaky that it was more or less impossible to merge the PR, so I lowered the flakiness by adding sleeps. I think one way to reduce the occurrence of test failures is to increase the sleep duration. See the "Make test more robust" commit in #5497.
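To illustrate the general idea of spacing events with sleeps, here is a rough sketch. It is not the code from `test_lsn_mapping.py` or from #5497; `record_events`, `min_gap_seconds`, and the 20 ms value are hypothetical names and numbers chosen only for this example. A larger sleep widens the minimum spacing between recorded timestamps, which leaves more room for clock jitter before an ordering check can fail.

```python
import time
from datetime import datetime, timezone


def record_events(count, gap_seconds):
    """Record `count` UTC wall-clock timestamps, sleeping between them (hypothetical helper)."""
    stamps = []
    for _ in range(count):
        stamps.append(datetime.now(timezone.utc))
        time.sleep(gap_seconds)
    return stamps


def min_gap_seconds(stamps):
    """Smallest spacing between consecutive timestamps, in seconds."""
    return min((b - a).total_seconds() for a, b in zip(stamps, stamps[1:]))


# Assumed value for illustration; the real test's sleep durations are not shown in this thread.
stamps = record_events(count=4, gap_seconds=0.02)
print(f"minimum spacing: {min_gap_seconds(stamps) * 1000:.1f} ms")
```

Raising `gap_seconds` makes the test slower but increases the margin, which is the trade-off behind "increase the sleep duration".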

@koivunej
Member Author

koivunej commented Nov 2, 2023

"Make test more robust" commit

f6946e9: the difference between the timestamps is longer than 5ms, almost 10ms. This could still be related to the sleep; I did not check the test.

@jcsp
Collaborator

jcsp commented May 2, 2024

In the last 3 days, this has failed 6 times.

@problame
Contributor

problame commented May 6, 2024

The test is fundamentally timing-sensitive. Arpad will start a thread in #team-storage.

arpad-m added a commit that referenced this issue May 6, 2024
Changes parameters to fix the flakiness of `test_ts_of_lsn_api`. Already
now, the amount of flakiness of the test is pretty low. With this, it's
even lower.

cc #5768
@jcsp
Collaborator

jcsp commented May 7, 2024

Let's check again in 1 week for ongoing failures

conradludgate pushed a commit that referenced this issue May 8, 2024
Changes parameters to fix the flakiness of `test_ts_of_lsn_api`. Already
now, the amount of flakiness of the test is pretty low. With this, it's
even lower.

cc #5768
@arpad-m
Member

arpad-m commented May 20, 2024

14 days after the patch merged: not a single flaky occurrence. The last one was at 2024-05-03 14:37:10.

There was a large gap with no occurrences between April 13 and April 26, but we are (slightly) past the size of that gap, so I think this has been resolved. Will revisit if it starts becoming a problem again.

@arpad-m arpad-m closed this as completed May 20, 2024