Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add endpoint for querying time info for lsn #5497

Merged
merged 28 commits into from
Oct 19, 2023
Merged

Add endpoint for querying time info for lsn #5497

merged 28 commits into from
Oct 19, 2023

Conversation

arpad-m
Copy link
Member

@arpad-m arpad-m commented Oct 7, 2023

Problem

See #5468.

Summary of changes

Add a new get_timestamp_of_lsn endpoint, returning the timestamp
associated with the given lsn.

Fixes #5468.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@arpad-m arpad-m requested review from a team as code owners October 7, 2023 07:51
@arpad-m arpad-m requested review from shanyp and save-buffer and removed request for a team October 7, 2023 07:51
it isnt really an issue as it's in a test but whatever,
couldn't find ways to disable it for the tests dir.
@github-actions
Copy link

github-actions bot commented Oct 10, 2023

2322 tests run: 2207 passed, 0 failed, 115 skipped (full report)


Flaky tests (2)

Postgres 16

  • test_pageserver_recovery: release

Postgres 14

  • test_competing_branchings_from_loading_race_to_ok_or_err: debug

Code coverage (full report)

  • functions: 52.9% (8339 of 15753 functions)
  • lines: 80.7% (48552 of 60182 lines)

The comment gets automatically updated with the latest test results
f6946e9 at 2023-10-19T02:08:35.574Z :recycle:

@arpad-m
Copy link
Member Author

arpad-m commented Oct 11, 2023

@xenomote I have implemented this. Is this what you were asking for and does it fit your use cases?

@xenomote
Copy link

Is this what you were asking for and does it fit your use cases?

yea seems good to me, I can update the go client once this is merged in 👍

Copy link
Collaborator

@jcsp jcsp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern about this is how large that vector of timestamps will grow, since I don't know how many SLRU blocks we can expect to see. I've asked for some guidance in #postgres -- maybe this is no issue at all, but from the existing code it isn't obvious.

We could avoid storing the whole list if we didn't output the median, so if this becomes a problem in practice we might need to do that.

@shanyp
Copy link
Contributor

shanyp commented Oct 18, 2023

@arpad-m if the median is not a requirement and imposes issues, lets drop it

@arpad-m arpad-m requested a review from knizhnik October 18, 2023 15:37
@arpad-m
Copy link
Member Author

arpad-m commented Oct 18, 2023

This test failure is interesting:

2023-10-18T16:58:07.0716962Z [gw6] linux -- Python 3.9.2 /github/home/.cache/pypoetry/virtualenvs/neon-_pxWMzVK-py3.9/bin/python
2023-10-18T16:58:07.0717704Z test_runner/regress/test_lsn_mapping.py:160: in test_ts_of_lsn_api
2023-10-18T16:58:07.0718462Z     timestamp = datetime.fromisoformat(result).replace(tzinfo=timezone.utc)
2023-10-18T16:58:07.0719183Z E   ValueError: Invalid isoformat string: '2023-10-18T16:50:28.990633000Z'

With:

2023-10-18T16:58:07.0754908Z 2023-10-18 16:50:31.684 INFO [test_lsn_mapping.py:158] result: 2023-10-18T16:50:28.990633000Z, after_ts: 2023-10-18 16:50:28.993508+00:00

On the other hand, on my local box it works, with:

2023-10-18 19:12:50.341 INFO [test_lsn_mapping.py:158] result: 2023-10-18T17:12:49.492124000Z, after_ts: 2023-10-18 17:12:49.501023+00:00

Apparently parsing time strings with Z in their name is a feature that's available since Python 3.11, which I have on my local box, but CI has Python 3.9... python/cpython#92177

pageserver/src/http/openapi_spec.yml Outdated Show resolved Hide resolved
pageserver/src/http/openapi_spec.yml Outdated Show resolved Hide resolved
@arpad-m
Copy link
Member Author

arpad-m commented Oct 19, 2023

For rerun attempt 2 for the latest commit "merge branch 'main' into arpad/ts_by_lsn", of the 6 regress tests, 2 failed.

logs for regress-tests (debug v15):

2023-10-18T19:35:53.6388101Z E   requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://localhost:30597/v1/tenant/02e588451d194f16bec8efc9679d0b0e/timeline/7c89ab12c4c81339777a1b96d99fe8c2/get_timestamp_of_lsn?lsn=0/14B6000
2023-10-18T19:35:53.6389524Z 
2023-10-18T19:35:53.6389820Z The above exception was the direct cause of the following exception:
2023-10-18T19:35:53.6390421Z test_runner/regress/test_lsn_mapping.py:153: in test_ts_of_lsn_api
2023-10-18T19:35:53.6390842Z     result = client.timeline_get_timestamp_of_lsn(
2023-10-18T19:35:53.6391315Z test_runner/fixtures/pageserver/http.py:461: in timeline_get_timestamp_of_lsn
2023-10-18T19:35:53.6391737Z     self.verbose_error(res)
2023-10-18T19:35:53.6392057Z test_runner/fixtures/pageserver/http.py:167: in verbose_error
2023-10-18T19:35:53.6392616Z     raise PageserverApiException(msg, res.status_code) from e
2023-10-18T19:35:53.6393259Z E   fixtures.pageserver.http.PageserverApiException

and regress-tests (release v15):

2023-10-18T19:33:42.2903323Z E   requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://localhost:25062/v1/tenant/51e028d4977dcedcef9e40f63a1e4919/timeline/be64fdd2c046133cfe0dbe31255d95a3/get_timestamp_of_lsn?lsn=0/14B6000
2023-10-18T19:33:42.2904239Z 
2023-10-18T19:33:42.2904444Z The above exception was the direct cause of the following exception:
2023-10-18T19:33:42.2904944Z test_runner/regress/test_lsn_mapping.py:153: in test_ts_of_lsn_api
2023-10-18T19:33:42.2905507Z     result = client.timeline_get_timestamp_of_lsn(
2023-10-18T19:33:42.2906091Z test_runner/fixtures/pageserver/http.py:461: in timeline_get_timestamp_of_lsn
2023-10-18T19:33:42.2906608Z     self.verbose_error(res)
2023-10-18T19:33:42.2907010Z test_runner/fixtures/pageserver/http.py:167: in verbose_error
2023-10-18T19:33:42.2907719Z     raise PageserverApiException(msg, res.status_code) from e
2023-10-18T19:33:42.2908244Z E   fixtures.pageserver.http.PageserverApiException

So the same error two times.

The rerun attempt 1 for the same commit showed different failures.

For regress-tests (debug v14):

2023-10-18T18:46:14.1431409Z [gw4] linux -- Python 3.9.2 /github/home/.cache/pypoetry/virtualenvs/neon-_pxWMzVK-py3.9/bin/python
2023-10-18T18:46:14.1432251Z test_runner/regress/test_lsn_mapping.py:168: in test_ts_of_lsn_api
2023-10-18T18:46:14.1432984Z     assert timestamp >= before_timestamp, "before_timestamp before timestamp"
2023-10-18T18:46:14.1433643Z E   AssertionError: before_timestamp before timestamp
2023-10-18T18:46:14.1434736Z E   assert datetime.datetime(2023, 10, 18, 18, 39, 6, 879910, tzinfo=datetime.timezone.utc) >= datetime.datetime(2023, 10, 18, 18, 39, 6, 908394, tzinfo=datetime.timezone.utc)

For regress-tests (debug v15), regress-tests (release v15) we have the 404 error again, and for regress-tests (release v16) we have the assertion failure again:

2023-10-18T18:45:06.5601640Z [gw6] linux -- Python 3.9.2 /github/home/.cache/pypoetry/virtualenvs/neon-_pxWMzVK-py3.9/bin/python
2023-10-18T18:45:06.5606381Z test_runner/regress/test_lsn_mapping.py:168: in test_ts_of_lsn_api
2023-10-18T18:45:06.5608107Z     assert timestamp >= before_timestamp, "before_timestamp before timestamp"
2023-10-18T18:45:06.5608482Z E   AssertionError: before_timestamp before timestamp
2023-10-18T18:45:06.5609027Z E   assert datetime.datetime(2023, 10, 18, 18, 38, 8, 467197, tzinfo=datetime.timezone.utc) >= datetime.datetime(2023, 10, 18, 18, 38, 8, 469206, tzinfo=datetime.timezone.utc)

We can get this PR probably merged by pressing on the "retry" button a few times but the test would still be extremely flaky.

Therefore, I have added a commit that adds a few sleeps and don't consider the first lsn (where we likely encounter the 404). let's see if this helps.

@arpad-m arpad-m merged commit f842b22 into main Oct 19, 2023
@arpad-m arpad-m deleted the arpad/ts_by_lsn branch October 19, 2023 02:50
arpad-m added a commit that referenced this pull request Nov 9, 2023
The `get_timestamp_of_lsn` pageserver endpoint has been added in #5497,
but the yml it added was wrong: the lsn is expected in hex format, not
in integer (decimal) format.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API to request the timestamp for given LSN
6 participants