-
Notifications
You must be signed in to change notification settings - Fork 168
feat(experimental): flush the last chunk in append method #1699
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @Pulkit0110, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the asynchronous object writing mechanism to improve data persistence guarantees. By shifting the responsibility of flushing the final data chunk from the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the AsyncAppendableObjectWriter to flush the last chunk of data within the append method itself, rather than waiting for the close method. This change improves the writer's behavior by ensuring data is persisted more promptly. The implementation correctly handles flushing based on both the flush interval and whether a chunk is the last in an append call. The close and finalize methods have been updated to reflect this new behavior, and the stream is now correctly closed within finalize. The accompanying test changes, including a new system test, effectively validate the new logic. I have one suggestion to slightly refactor the flushing logic for improved clarity.
google/cloud/storage/_experimental/asyncio/async_appendable_object_writer.py
Show resolved
Hide resolved
|
|
||
| if is_last_chunk: | ||
| request.state_lookup = True | ||
| request.flush = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also set self.bytes_appended_since_last_flush to 0
| async def finalize(self) -> _storage_v2.Object: | ||
| """Finalizes the Appendable Object. | ||
| Note: Once finalized no more data can be appended. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since during our discussion there was confusion between close and finalize. Can you add one more sentence in Note, to avoid confusion b/w us and users or anyone.
Please note this method is different from `close`, if `.close()` is called data may still be appended to object at a later point in time by opening with generation number. (i.e. `open(..., generation=<object_generation_number>)` . However if `.finalize()` is called no more data can be appended to the object.
| assert final_object.size == total_size | ||
|
|
||
| # Verify the full content of the object. | ||
| full_data = small_data + large_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of downloading , should we just match checksum ? it'll be faster (& our system tests will be faster)
| _storage_v2.BidiWriteObjectRequest(finish_write=True) | ||
| ) | ||
| mock_stream.recv.assert_awaited_once() | ||
| mock_stream.close.assert_awaited_once() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add two more asserts
assert writer._is_stream_open is False
assert writer.offset is None
| This method sends the provided `data` to the GCS server in chunks. | ||
| and persists data in GCS at every `_MAX_BUFFER_SIZE_BYTES` bytes by | ||
| calling `self.simple_flush`. | ||
| sending `flush=true`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method sends the provided `data` to the GCS server in chunks and persists data in GCS at every `_MAX_BUFFER_SIZE_BYTES` bytes or at the last chunk whichever is earlier. Persisting is done by setting `flush=True` on request.
| request.state_lookup = True | ||
| request.flush = True | ||
|
|
||
| await self.write_obj_stream.send(request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to add one writer_obj_stream.recv() call because we're sending state_lookup=True in last request.
otherwise gRPC client side buffer will gets filled up in scenario where user just continuously calls appends.
Earlier the last chunk was being flushed while calling the close() method. Now it will be done inside the append method itself.