Provide high bandwidth performance for bulk frame writes. #236
Comments
@b-butler, any thoughts or suggestions?
On second thought, this does not need to be API breaking. Make the new methods … remains with the current behavior, but tools can opt in to the new API.
I like the proposed option of creating two new functions which …
Yes, the buffer frame method would increment the frame counter. Maybe a better name is …
Unfortunately, the implementation is not as simple as I wrote in the description. The commit call must not write out data for a partial frame that has not been ended; otherwise, a killed job might leave an incomplete frame at the end of the file. Therefore, we will need another index buffer to track the buffered index entries separately from those buffered for the current (partial) frame.
On second thought, I would like to provide high performance for all users. GSD versions 2.0 through 2.8.0 buffered output (flushed at the OS's discretion), so we could release a 2.9 (or call it 3.0, if that is a concern) that buffers output internally and requires either a …
One challenge with this is that … The HOOMD tutorials that write a gsd file and read it back in the same notebook will need to call …
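The partial-frame requirement in the comment above can be illustrated with a minimal sketch, assuming a single in-memory array of index entries split by a boundary: entries before n_ended belong to frames that have been ended and may be committed, while entries after it belong to the open frame and stay buffered. The names struct index_entry, struct index_buffer, index_add(), index_end_frame() and index_commit() are hypothetical and do not correspond to gsd's actual data structures; the out array stands in for the on-disk index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INDEX_CAP 256

/* Hypothetical index entry; fields are illustrative, not gsd's layout. */
struct index_entry {
    uint64_t frame;    /* frame the chunk belongs to */
    uint64_t location; /* byte offset of the chunk data in the file */
    uint16_t id;       /* chunk name id */
};

/* entries[0, n_ended) belong to ended frames and may be committed;
   entries[n_ended, n_total) belong to the open (partial) frame. */
struct index_buffer {
    struct index_entry entries[INDEX_CAP];
    size_t n_total;
    size_t n_ended;
};

/* Record an index entry for the open frame. Returns 0, or -1 if full. */
static int index_add(struct index_buffer *ib, struct index_entry e) {
    if (ib->n_total == INDEX_CAP)
        return -1;
    ib->entries[ib->n_total++] = e;
    return 0;
}

/* Ending a frame moves the boundary: every buffered entry is now committable. */
static void index_end_frame(struct index_buffer *ib) {
    ib->n_ended = ib->n_total;
}

/* Copy committable entries into out (a stand-in for the on-disk index) and
   keep only the open frame's entries buffered. Returns the number copied. */
static size_t index_commit(struct index_buffer *ib, struct index_entry *out) {
    assert(ib->n_ended <= ib->n_total);
    size_t n = ib->n_ended;
    size_t partial = ib->n_total - n;
    memcpy(out, ib->entries, n * sizeof *out);
    memmove(ib->entries, ib->entries + n, partial * sizeof *ib->entries);
    ib->n_total = partial;
    ib->n_ended = 0;
    return n;
}
```

With this layout, a commit issued in the middle of a frame writes only the index entries of frames that were ended, so a job killed before its next end_frame call cannot leave a half-described frame in the committed index.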
Completed in #237.
Description
Allow gsd to provide high bandwidth writes while ensuring file integrity.
Proposed solution
Split gsd_end_frame() into gsd_end_frame() and gsd_commit().
Call gsd_commit() in gsd_close().
The new gsd_end_frame() will only increment the frame counter (gsd/gsd/gsd.c, lines 1875 to 1887 in aad91d2).
The new gsd_commit() will flush the buffers and sync the file (gsd/gsd/gsd.c, lines 1889 to 1954 in aad91d2).
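The proposed split can be sketched as follows, assuming a POSIX system and a single fixed-size buffer: writer_end_frame() only updates counters, writer_commit() writes the data of all ended frames and issues one fflush() plus one fsync() for the whole batch, and writer_close() commits before closing. struct writer and all writer_* functions are hypothetical stand-ins, not the gsd C API.

```c
#define _POSIX_C_SOURCE 200809L /* exposes fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define WRITER_CAP 4096

/* Hypothetical writer handle (not gsd's struct gsd_handle). */
struct writer {
    FILE *file;
    char buf[WRITER_CAP];
    size_t size;         /* bytes buffered, including the open frame */
    size_t ended_size;   /* leading bytes that belong to ended frames */
    unsigned long frame; /* number of frames ended so far */
    unsigned long syncs; /* number of fsync() calls issued */
};

/* Buffer chunk data for the open frame; performs no file I/O. */
static int writer_write_chunk(struct writer *w, const void *data, size_t n) {
    if (n > WRITER_CAP - w->size)
        return -1; /* a real writer would commit or grow the buffer here */
    memcpy(w->buf + w->size, data, n);
    w->size += n;
    return 0;
}

/* Proposed end_frame: bookkeeping only, no file I/O and no fsync(). */
static void writer_end_frame(struct writer *w) {
    w->ended_size = w->size;
    w->frame++;
}

/* Proposed commit: write all ended frames, then flush and fsync once. */
static int writer_commit(struct writer *w) {
    assert(w->file != NULL); /* the writer must still be open */
    if (w->ended_size == 0)
        return 0; /* nothing new to make durable */
    if (fwrite(w->buf, 1, w->ended_size, w->file) != w->ended_size)
        return -1;
    if (fflush(w->file) != 0 || fsync(fileno(w->file)) != 0)
        return -1;
    w->syncs++;
    /* Keep the open (partial) frame buffered for a later commit. */
    size_t partial = w->size - w->ended_size;
    memmove(w->buf, w->buf + w->ended_size, partial);
    w->size = partial;
    w->ended_size = 0;
    return 0;
}

/* Proposed close: commit any ended frames, then close the file. */
static int writer_close(struct writer *w) {
    int rc = writer_commit(w);
    if (fclose(w->file) != 0)
        rc = -1;
    w->file = NULL;
    return rc;
}
```

In this sketch two ended frames cost a single fsync() when committed together. Data written after the last end_frame call is never committed, so closing the file discards an unfinished frame rather than writing it.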
Additional context
With this API, the caller can push many frames into the in-memory buffer and flush them all at an appropriate time (e.g. after buffering a full batch). Otherwise, the remaining buffer will be written when the file is closed.
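The caller-side batching described above can be sketched with a mock writer that counts commits instead of doing I/O, assuming each commit that has pending frames costs one fsync. struct mock_writer, mock_end_frame(), mock_commit(), mock_close() and write_trajectory() are hypothetical, and the fixed batch_size is just one illustrative choice of "an appropriate time".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock writer: counts frames and commits instead of doing I/O. */
struct mock_writer {
    size_t frames_ended;     /* frames finished with end_frame */
    size_t frames_committed; /* frames made durable by a commit */
    size_t commits;          /* commits that did work (one fsync each) */
};

static void mock_end_frame(struct mock_writer *w) {
    w->frames_ended++;
}

/* Commit makes all ended frames durable; a no-op if nothing is pending. */
static void mock_commit(struct mock_writer *w) {
    assert(w->frames_committed <= w->frames_ended);
    if (w->frames_committed == w->frames_ended)
        return;
    w->frames_committed = w->frames_ended;
    w->commits++;
}

/* As proposed, closing the file commits whatever is still buffered. */
static void mock_close(struct mock_writer *w) {
    mock_commit(w);
}

/* Caller policy: end every frame, but commit only once per batch_size frames. */
static void write_trajectory(struct mock_writer *w, size_t n_frames,
                             size_t batch_size) {
    assert(batch_size > 0);
    for (size_t i = 0; i < n_frames; i++) {
        /* ... write this frame's chunks into the in-memory buffer ... */
        mock_end_frame(w);
        if ((i + 1) % batch_size == 0)
            mock_commit(w);
    }
    mock_close(w);
}
```

Writing 1000 frames with batch_size = 100 issues 10 commits instead of 1000; with batch_size = 300, the final partial batch is committed by mock_close(), for 4 commits in total.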
#232 introduced a per-frame fsync call to ensure data integrity in the file. This lowers performance on high-latency filesystems: on Frontier's Orion, realistic HOOMD-blue workloads are limited to ~10 frames per second written to the file. Commenting out the fsync gives ~10x performance improvements, depending on system size, because the writes are then bandwidth limited. The proposed API lets the caller regain this bandwidth efficiency by batching many frames in memory and then writing the buffer with gsd_commit(), which ensures data integrity for the whole batch.
As an API breaking change, this should be introduced in gsd 3.0.
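Why batching recovers bandwidth can be seen from a simple cost model, which is an assumption for illustration rather than a measurement: each frame has size s bytes, the filesystem sustains write bandwidth B, and each commit (flush plus fsync) adds a fixed latency L. Committing once every K frames spreads that latency over the batch:

```latex
\begin{aligned}
t_{\mathrm{frame}}(K) &= \frac{s}{B} + \frac{L}{K}, \\
R(K) &= \frac{1}{t_{\mathrm{frame}}(K)} = \frac{1}{s/B + L/K}, \\
\frac{R(K)}{R(1)} &= \frac{s/B + L}{s/B + L/K} \;\longrightarrow\; 1 + \frac{L B}{s} \quad \text{as } K \to \infty.
\end{aligned}
```

With per-frame fsync (K = 1), a rate of ~10 frames per second means each frame takes about 0.1 s in total. As K grows, the rate approaches the bandwidth limit B/s, and the achievable speedup is larger for small frames (small s) on high-latency filesystems (large L), consistent with the gain depending on system size.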