Skip to content

gh-129005: Align FileIO.readall allocation #129458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 30, 2025

Conversation

cmaloney
Copy link
Contributor

@cmaloney cmaloney commented Jan 29, 2025

Both _io and _pyio now use a pre-allocated buffer of length bufsize, fill it using a os.readinto / _Py_read, and have matching "expand buffer" logic.

On my machine (Linux, Debug build) this takes: ./python -m test -M8g -uall test_largefile -m test_large_read -v from ~3.7 seconds to ~3.3 seconds

_pyio still uses 2x the memory, there are two remaining copies

  1. the bytes(result) currently copies. I'd like to either just rely on "duck typing" / bytearray is close enough to bytes, or would need to do something similar to C++ "move" semantics where the bytes could take ownership of the data buffer from the bytearray without copying... Not sure what is most Pythonic
  2. _pyio.BufferedIO._read_unlocked in the read-all case where no buffer has been allocated always does return buf[:pos] + chunk which causes another copy. Patch for that case / "if buf is length 0, just return" coming shortly.

Both now use a pre-allocated buffer of length `bufsize`, fill it using
a readinto, and have matching "expand buffer" logic.

On my machine this takes:

`./python -m test -M8g -uall test_largefile -m test_large_read -v`
from ~3.7 seconds to ~3.4 seconds
@vstinner vstinner enabled auto-merge (squash) January 30, 2025 10:50
@vstinner
Copy link
Member

LGTM. I enabled auto-merge.

@vstinner vstinner merged commit f927204 into python:main Jan 30, 2025
39 checks passed
@cmaloney cmaloney deleted the cmalnoey/pyio_fileio_readall_2 branch January 30, 2025 21:57
cmaloney added a commit to cmaloney/cpython that referenced this pull request Feb 2, 2025
vstinner pushed a commit that referenced this pull request Feb 2, 2025
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Feb 7, 2025
Both now use a pre-allocated buffer of length `bufsize`, fill it using
a readinto(), and have matching "expand buffer" logic.

On my machine this takes:

`./python -m test -M8g -uall test_largefile -m test_large_read -v`

from ~3.7 seconds to ~3.4 seconds.
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Feb 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants