Add parameter to /vcf endpoint to force async writes to complete before returning response #75

Merged
ehclark merged 8 commits into main from issue-65 on Jan 23, 2024

@ehclark ehclark commented Jan 8, 2024

The new Snowflake object store writes VRS objects to the database asynchronously to allow better response times. However, since searches and fetches only read committed database state, a read issued shortly after a write may not reflect the most recently submitted VRS objects.

The current default behavior, and default behavior moving forward, is for all writes to complete before the response is returned to the client. To enable writes to complete asynchronously after the response has been returned, clients can add the allow_async_writes=yes parameter to the request.

See also #65
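
A minimal sketch of the behavior described above, assuming a background writer thread fed by a queue. `AsyncStore`, `handle_vcf`, and the in-memory `committed` dict are hypothetical stand-ins for illustration, not the AnyVar implementation; only the `wait_for_writes` name and the `allow_async_writes` parameter come from this PR.

```python
import queue
import threading


class AsyncStore:
    """Hypothetical in-memory store that commits writes on a background thread."""

    def __init__(self) -> None:
        self.committed = {}              # the only state that reads can see
        self._pending = queue.Queue()    # writes accepted but not yet committed
        threading.Thread(target=self._writer, daemon=True).start()

    def _writer(self) -> None:
        while True:
            key, value = self._pending.get()
            self.committed[key] = value  # stands in for the database commit
            self._pending.task_done()

    def put(self, key: str, value: str) -> None:
        """Queue a write and return immediately."""
        self._pending.put((key, value))

    def wait_for_writes(self) -> None:
        """Block until every queued write has been committed."""
        self._pending.join()


def handle_vcf(store: AsyncStore, records: dict, allow_async_writes: bool = False) -> int:
    """Hypothetical handler: queue all records, then wait unless the client opted out."""
    for key, value in records.items():
        store.put(key, value)
    if not allow_async_writes:
        store.wait_for_writes()
    return len(records)
```

With the default `allow_async_writes=False`, a read of `store.committed` issued right after `handle_vcf` returns sees every record. With `allow_async_writes=True`, the same read may or may not see them yet.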

jsstevenson previously approved these changes Jan 11, 2024
with self.batch_thread.cond:
    if (
        len(
            list(filter(lambda x: x is batch, self.batch_thread.pending_batch_list))
Contributor

Double-checking that `is` is appropriate here -- the object doesn't get copied or moved around while being held on the queue?

Contributor Author

Yes, this should be a strict identity check. The empty batch is essentially a sentinel object, and batch objects are not modified as they move through the queue.
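
To illustrate why identity rather than equality matters for a sentinel, here is a small sketch. It assumes batches are plain lists and the pending queue is a plain list; the names `real_empty_batch`, `sentinel`, and `is_pending` are hypothetical and only `pending_batch_list` echoes the diff.

```python
# Hypothetical stand-ins: each batch is a list of rows.
real_empty_batch = []   # an ordinary batch that happens to be empty
sentinel = []           # a distinct empty batch used only as a marker
pending_batch_list = [["row-1"], real_empty_batch, sentinel]

# Equality cannot tell the two empty batches apart.
equal_matches = [b for b in pending_batch_list if b == sentinel]

# Identity matches only the sentinel object itself.
identity_matches = [b for b in pending_batch_list if b is sentinel]


def is_pending(batch, pending):
    """Return True while this exact batch object is still queued."""
    return any(b is batch for b in pending)
```

Here `equal_matches` contains both empty lists, while `identity_matches` contains only the sentinel. The identity check stays valid as long as the queue holds references to the same objects and never copies them, which is the condition confirmed in the reply above.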

@@ -127,6 +127,10 @@ def __delitem__(self, name: str) -> None:
            cur.execute("DELETE FROM vrs_objects WHERE vrs_id = %s;", [name])
        self.conn.commit()

    def wait_for_writes(self):
        """Return true once any currently pending database modifications have been completed."""
        pass
Contributor

I don't think the `pass` is necessary. Is there an existing issue to do this for the Postgres backend?

Suggested change
pass

Contributor

IMO we shouldn't worry about implementing it PG-side until there's a use case/demand for doing something like queueing. I think there's a world in which it could be preferable to retain the PG implementation as a more streamlined read/write option.

That said, it might be good to note in the docstring that this method is just a stub to retain compatibility with the base class, but shouldn't do anything unless we decide to implement queueing logic for the postgres writer.

Contributor

Then I don't think it should be an `abstractmethod` in the `_Storage` class; it should only exist in the Snowflake backend.

Contributor Author

I think the `wait_for_writes()` method should remain on `_Storage` because it is called from `anyvar.restapi.main`, which does not know which backend it is using. The REST API method implementations only call methods on the `_Storage` base class. Without it on the base class, calling `/vcf` with `allow_async_write=no` results in a 500 error if the backend store is Postgres.

I cleaned up the comments and removed the pass line.
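
The argument for keeping the method on the base class can be sketched as follows. All class and function names here (`Storage`, `SyncBackend`, `QueueingBackend`, `NoWaitBackend`, `finish_request`) are hypothetical stand-ins, not the actual AnyVar classes; the sketch only assumes that the REST layer calls `wait_for_writes()` without knowing the backend.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical stand-in for the _Storage base class."""

    @abstractmethod
    def wait_for_writes(self) -> None:
        """Block until pending writes are committed."""


class SyncBackend(Storage):
    """Stand-in for a backend that commits on every write."""

    def wait_for_writes(self) -> None:
        """No-op stub: nothing is ever pending. A docstring alone is a valid body."""


class QueueingBackend(Storage):
    """Stand-in for a backend that queues writes."""

    def __init__(self) -> None:
        self.pending = ["batch-1"]

    def wait_for_writes(self) -> None:
        self.pending.clear()  # placeholder for real blocking logic


class NoWaitBackend:
    """A backend that lacks the method entirely."""


def finish_request(storage, allow_async_write: bool) -> str:
    """Hypothetical REST-layer helper that is unaware of the backend type."""
    if not allow_async_write:
        storage.wait_for_writes()
    return "ok"
```

`finish_request` works unchanged with both `SyncBackend` and `QueueingBackend`. Passing `NoWaitBackend()` with `allow_async_write=False` raises `AttributeError`, which a web framework would typically surface as a 500 response.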

Contributor

@ehclark that makes sense. Thanks

@@ -202,6 +202,30 @@ def __delitem__(self, name: str) -> None:
            cur.execute(f"DELETE FROM {self.table_name} WHERE vrs_id = ?;", [name])  # nosec B608
        self.conn.commit()

    def wait_for_writes(self):
        """Return true once any currently pending database modifications have been completed."""
Contributor

I might be missing something, but I don't think this method returns true.

src/anyvar/storage/snowflake.py (outdated, resolved)
src/anyvar/storage/snowflake.py (outdated, resolved)

ehclark commented Jan 23, 2024

@jsstevenson Could you do another review when you get a chance? Thanks!

@jsstevenson jsstevenson left a comment

👍 lgtm

@ehclark ehclark merged commit d0e55c0 into main Jan 23, 2024
5 checks passed
@ehclark ehclark deleted the issue-65 branch January 23, 2024 13:26
Successfully merging this pull request may close these issues.

Add a parameter to the /vcf endpoint that ensures all async VRS object writes are completed before returning
3 participants