gh-117151: optimize algorithm to grow the buffer size for readall() on files #131052
+17
−19
Continuing my PRs to optimize buffers.

File `readall()` sets the buffer to the file size. `new_buffersize` is used to grow the buffer gradually. We need to optimize that latter case.

That `new_buffersize` function was written 16 years ago and has had little optimization since. It reads the file with an 8 kB buffer to start with and grows in steps of around 8 kB, so it does a lot of small, inefficient writes. After 65 kB it grows in steps of 12.5%, which is still minuscule.
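To get a feel for how slowly that policy grows, here is a rough Python model of the behaviour described above. It is a sketch of the description only, not the C code in CPython: the names `SMALLCHUNK`, `CUTOFF`, `next_size` and `count_resizes` are made up for illustration, and the cutoff is assumed to be 64 KiB (65536 bytes).

```python
# Rough model of the growth policy as described in this PR text.
# All names and constants here are assumptions for illustration only.
SMALLCHUNK = 8 * 1024   # assumed initial buffer size and small step
CUTOFF = 64 * 1024      # assumed point where growth switches to 12.5%


def next_size(current: int) -> int:
    """Return the next buffer size under the modelled policy."""
    if current > CUTOFF:
        addend = current >> 3   # grow by 12.5% of the current size
    else:
        addend = SMALLCHUNK     # grow by a fixed 8 KiB step
    return current + addend


def count_resizes(target: int) -> int:
    """Count how many times the buffer grows before it can hold `target` bytes."""
    size = SMALLCHUNK
    resizes = 0
    while size < target:
        size = next_size(size)
        resizes += 1
    return resizes


if __name__ == "__main__":
    for mib in (1, 16, 256, 1024):
        print(f"{mib} MiB -> {count_resizes(mib * 1024 * 1024)} resizes")
```

Running it for a few target sizes shows how many resize steps the modelled policy needs before the buffer can hold the whole input.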
I spent some days looking into this function (as part of the attached ticket on optimizing buffers). The attached PR is what I could come up with to optimize it.
Considerations and gotchas:

Is it worth explaining all of that in comments? The existing code is sparse on explanation, and I had to go through a fair number of tickets and a lot of debugging to understand the history.
See code below to benchmark with different file sizes on your machine.
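As a stand-in for such a benchmark, here is a minimal sketch, not the script used for this PR. It reads from a pipe, on the assumption that a pipe reports no usable size and so `readall()` has to grow its buffer gradually. The helper names `_feed` and `time_readall` are hypothetical, and the timings include the cost of the writer thread filling the pipe.

```python
import io
import os
import threading
import time


def _feed(fd: int, payload: bytes) -> None:
    """Write the whole payload to the pipe in chunks, then close the write end."""
    view = memoryview(payload)
    try:
        while view:
            written = os.write(fd, view[:65536])
            view = view[written:]
    finally:
        os.close(fd)


def time_readall(size: int) -> tuple[int, float]:
    """Push `size` bytes through a pipe; return (bytes read, seconds in readall)."""
    payload = bytes(size)
    read_fd, write_fd = os.pipe()
    writer = threading.Thread(target=_feed, args=(write_fd, payload))
    writer.start()
    # FileIO takes ownership of read_fd and closes it on exit.
    with io.FileIO(read_fd, "rb") as raw:
        start = time.perf_counter()
        data = raw.readall()
        elapsed = time.perf_counter() - start
    writer.join()
    return len(data), elapsed


if __name__ == "__main__":
    for mib in (1, 16, 64):
        n, seconds = time_readall(mib * 1024 * 1024)
        print(f"{mib:>3} MiB: read {n} bytes in {seconds * 1000:.1f} ms")
```

Comparing the numbers before and after the patch, on the same machine and with several runs per size, gives a rough idea of the effect.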
This PR is a draft, to discuss the fix before I spend more time on it and to get a CI build passing.