Speed up open().read() pattern by reducing the number of system calls #120754
Startup time is always an important thing to optimize for.
I found the source of the remaining overhead. In the read-whole-file case there seems to be a lot of unnecessary work (allocate an additional buffer and lock, do the seek, build a list of "chunks" for the read that get joined, with special fast cases for "only one chunk", etc.) which gets discarded shortly after. There is no patch yet that removes the BufferedIO/TextIO overhead; while it is likely possible to rewrite, it touches a lot of code to remove the remaining seek, and BufferedIO has its own implementation for the "absolute position".

Summary:

```
# main
read_file_large: Mean +- std dev: 24.0 us +- 0.4 us

# cmaloney/readall_faster, removes a stat (#120755):
read_file_large: Mean +- std dev: 21.2 us +- 0.6 us

# No isatty call (local only currently, no PR yet):
read_file_large: Mean +- std dev: 20.2 us +- 0.3 us
```

That reduces the time for reading a single large file. I built a little larger benchmark to more closely match the code I originally ran into this with: "read all the .rst and .py in the cpython repo".

```python
import pyperf
from pathlib import Path


def read_all(all_paths):
    for p in all_paths:
        p.read_bytes()


def read_file(path_obj):
    path_obj.read_text()


all_rst = list(Path("Doc").glob("**/*.rst"))
all_py = list(Path(".").glob("**/*.py"))
assert all_rst, "Should have found .rst files"
assert all_py, "Should have found .py source files"

runner = pyperf.Runner()
runner.bench_func("read_file_small", read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func("read_file_large", read_file, Path("Doc/c-api/typeobj.rst"))
runner.bench_func("read_all_rst", read_all, all_rst)
runner.bench_func("read_all_py", read_all, all_py)
```

On my Linux box I'm iterating on top of my existing PR code, both removing the extra stat and the isatty call.
…20755)

This reduces the system call count of a simple program[0] that reads all the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my Linux system, 5813 -> 4875 on my macOS).

This always reduces the number of `fstat()` calls, and reduces seek calls most of the time. Stat was called twice: once at open (to error early on directories), and a second time to get the size of the file so the whole file can be read in one read. Now the size is cached from the first call.

The code keeps an optimization that if the user had previously read a lot of data, the current position is subtracted from the number of bytes to read. That is somewhat expensive, so it is only done on larger files; otherwise just try to read the extra bytes and resize the PyBytes as needed.

I built a little test program to validate the behavior + assumptions around relative costs, then ran it under `strace` to get a log of the system calls. Full samples below[1]. After the changes, this is everything in one `filename.read_text()`:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```

This does make some tradeoffs:

1. If the file size changes between open() and readall(), this will still get all the data but might make more read calls.
2. I experimented with avoiding the stat + cached result for small files in general, but on my dev workstation at least that tended to reduce performance compared to using the fstat().

[0]
```python
from pathlib import Path

nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```

[1] Before, small file:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```

After, small file:
```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1) = 0
close(3) = 0
```

Before, large file:
```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```

After, large file:
```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1) = 0
close(3) = 0
```

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
Could this be reopened? My plan is to keep working on simplifying what happens in this case, and I would like to attach the follow-ups to this issue (strace PR follow-up test, removing seeks, possibly removing isatty).
Sometimes a large file is truncated (test_largefile). While estimated_size is used as an estimate (the read will still get the number of bytes in the file), when it is much larger than the actual size of the data it can result in a significant over-allocation and sometimes lead to a MemoryError / running out of memory. This brings the C implementation in line with the Python _pyio implementation.
For POSIX, TTYs are never regular files, so if the interpreter knows the file is regular it doesn't need to do an additional system call to check if the file is a TTY.

The `open()` Python builtin requires a `stat` call at present in order to ensure the file being opened isn't a directory. That result includes the file mode, which tells us whether it is a regular file. A number of attributes from the stat are currently stashed one-off; move to stashing the whole object rather than just individual members.

The stat object is reasonably large, and currently the `stat_result.st_size` member cannot be modified from Python, which is needed by the `_pyio` implementation, so make the whole stat object optional. In the `_io` implementation this makes handling a stat failure simpler. At present there is no explicit user call to clear it, but if one is needed (ex. a program which has a lot of open FileIO objects and the memory becomes a problem) it would be straightforward to add. Ideally we would be able to clear it automatically (the values are generally used during I/O object initialization and not after; after a `write` they are no longer useful in current cases).

It is a fairly common pattern to scan a directory, look at the `stat` results (ex. has this file changed?), and then open/read the file. In this PR I didn't update open's API to allow passing in a stat result to use, but that could be beneficial for some cases (ex. `importlib`).

With this change, on my Linux machine, reading a small plain-text file is down to 6 system calls:

```
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR) = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1) = 0
close(3) = 0
```
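A tiny sketch of that check using the stdlib `stat` module (the helper name is illustrative; the PR does this inside `_io`/`_pyio` with the stat result cached at open time):

```python
import os
import stat

def is_a_tty(fd, cached_stat):
    # POSIX: a regular file can never be a TTY, so a cached fstat()
    # result lets us skip the isatty()/ioctl() system call entirely.
    if cached_stat is not None and stat.S_ISREG(cached_stat.st_mode):
        return False
    return os.isatty(fd)  # fall back to asking the OS

# Usage sketch:
# fd = os.open("read_one.py", os.O_RDONLY)
# is_a_tty(fd, os.fstat(fd))
```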
See also: gh-90102
open_code is used to read code files in full, and the `.read()` case has an optimized readall path, so constructing a BufferedIO, its lock, its buffer, etc. is unnecessary work. Interpreter startup (ex. pymain_run_file) uses a different codepath which relies on unmarshalling individual bytes; see run_pyc_file. That path is not changed in this PR.
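As a rough illustration of why the buffered layer is skippable here (my sketch, not the patch itself): opening in binary mode with `buffering=0` yields a raw `FileIO`, whose no-argument `read()` goes straight to the optimized readall path without constructing a `BufferedReader`:

```python
import io

def read_code_bytes(path):
    # buffering=0 returns a raw FileIO: no BufferedReader, no lock, no
    # intermediate buffer; read() with no size argument uses readall().
    with io.open(path, "rb", buffering=0) as f:
        return f.read()
```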
Performed an audit of `fileio.c` and `_pyio` and made sure that any time the fd changes, the stat result, if set, is also cleared/changed. There's one case where it's not cleared: where code would clear it in __init__, keep the memory allocated and just do another fstat with the existing memory.
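The invariant from that audit, restated as a Python sketch (the `_RawFile` class and attribute names are hypothetical; the real change is in C):

```python
import os

class _RawFile:
    def __init__(self, fd):
        self._fd = fd
        self._stat = os.fstat(fd)  # cached at open time

    def _replace_fd(self, new_fd):
        # Audit invariant: whenever the fd changes, refresh (or drop)
        # the cached stat result so it never describes the wrong file.
        self._fd = new_fd
        self._stat = os.fstat(new_fd)
```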
Planning to close this issue once the two outstanding PRs are closed. There are more improvements I think are possible, but they're large enough in scope that the individual changes should likely get their own Discourse thread / GitHub issue / etc. In particular I'm looking at:

```python
# Read full contents of a list of files
@unleash_dragons(gather_io=True)  # aspirational API, doesn't exist today
def read_configs(all_paths):
    for p in all_paths:
        p.read_bytes()
```

Currently that does a lot of round-trips from Python -> OS -> Python; pipelining / parallelizing those requests (one io_uring wait call) significantly improves performance.
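The stdlib has no io_uring interface yet, but the round-trip cost can be approximated today by overlapping the blocking reads in threads; a sketch of that idea (a thread-pool approximation, not the batched-submission design the comment describes):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def read_configs(all_paths):
    # Each read_bytes() releases the GIL while blocked in the OS, so a
    # small pool overlaps the waits, similar in spirit to submitting a
    # batch of reads and waiting once.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(Path.read_bytes, all_paths))
```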
For the startup "read 4 times" case, I looked a bit and it's really complex to unwind for likely minimal performance gain (the OS file cache saves quite a bit). It's a combination of supporting https://peps.python.org/pep-0263/ and the tokenizer taking a `FILE*`. Some additional complexity in refactoring is that there is a long-standing hook, io.open_code: it opens through Python's I/O, which then needs to become a `FILE*`, and the API is somewhat exposed in the C API (https://docs.python.org/3/c-api/file.html#c.PyFile_FromFd).
GH-113977, GH-120754: Remove unbounded reads from zipfile. A read without a size may read an unbounded amount of data and allocate unbounded-size buffers. Move to capped-size reads to prevent potential issues. Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
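The capped-read pattern the commit describes, as a hedged sketch (the cap value and helper name are illustrative, not zipfile's actual constants):

```python
_MAX_READ = 1 << 16  # 64 KiB per read; illustrative cap

def read_exactly(fp, n):
    # Read n bytes in bounded chunks rather than one unbounded read(),
    # so a corrupt or oversized size field can't force a huge allocation.
    chunks = []
    while n > 0:
        chunk = fp.read(min(n, _MAX_READ))
        if not chunk:
            raise EOFError("truncated data")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)
```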
…pythonGH-122101) pythonGH-113977, pythonGH-120754: Remove unbounded reads from zipfile Read without a size may read an unbounded amount of data + allocate unbounded size buffers. Move to capped size reads to prevent potential issues. (cherry picked from commit 556dc9b) Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com> Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
…122101) (#126347) gh-113977, gh-120754: Remove unbounded reads from zipfile (GH-122101) GH-113977, GH-120754: Remove unbounded reads from zipfile Read without a size may read an unbounded amount of data + allocate unbounded size buffers. Move to capped size reads to prevent potential issues. (cherry picked from commit 556dc9b) Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com> Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
…122101) (#126348) gh-113977, gh-120754: Remove unbounded reads from zipfile (GH-122101) GH-113977, GH-120754: Remove unbounded reads from zipfile Read without a size may read an unbounded amount of data + allocate unbounded size buffers. Move to capped size reads to prevent potential issues. (cherry picked from commit 556dc9b) Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com> Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Congrats :-)
Thanks for your help in reviewing, helping me learn CPython, and getting the patches landed; sorry there were quite so many follow-up patches, but I think we got there :). I was measuring 3.13 vs. 3.14 main "read a whole file" performance, and it's generally 10-15% faster reading OS-cached files on macOS and Linux for an LTO-optimized build. Nice little wins while keeping overall API guarantees / stability :)

Details:

- Python 3.14 commit: 7d7d56d (HEAD -> main, origin/main, origin/HEAD)
- Python 3.13 commit: 60403a5 (tag: v3.13.0)
- Linux runs: Python 3.13 and Python 3.14a0+ (gh-120754 done)
- macOS runs: v3.13.0 and v3.14.0a0+ commit 34653bb (HEAD -> main, origin/main, origin/HEAD)

Benchmark script:

```python
import pyperf
from pathlib import Path

all_rst = list(Path("Doc").glob("**/*.rst"))
all_py = list(Path(".").glob("**/*.py"))
assert all_rst, "Should have found rst files. Is the current directory a cpython checkout?"
assert all_py, "Should have found python source files. Is the current directory a cpython checkout?"


def read_all_bytes(all_paths):
    for p in all_paths:
        p.read_bytes()


def read_all_text(all_paths):
    for p in all_paths:
        p.read_text()


def read_file(path_obj):
    path_obj.read_text()


def read_bytes(path_obj):
    path_obj.read_bytes()


# print(f"all_rst={len(all_rst)}|all_py={len(all_py)}")
# Use a fixed number of files across python versions.
# File sizes may still vary a bit, but this helps a lot for stability.
all_rst = all_rst[:200]
all_py = all_py[:3000]
assert len(all_rst) == 200
assert len(all_py) == 3000

runner = pyperf.Runner()
runner.bench_func('read_file_small', read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func('read_file_large', read_file, Path("Doc/c-api/typeobj.rst"))
runner.bench_func('read_all_rst_bytes', read_all_bytes, all_rst)
runner.bench_func('read_all_py_bytes', read_all_bytes, all_py)
# NOTE: No test to read all `.py` as text as some contain invalid utf-8 byte sequences.
runner.bench_func('read_all_rst_text', read_all_text, all_rst)
```
Maybe you should document this optimization at: https://docs.python.org/dev/whatsnew/3.14.html#optimizations
…n#125166) Performed an audit of `fileio.c` and `_pyio` and made sure anytime the fd changes the stat result, if set, is also cleared/changed. There's one case where it's not cleared, if code would clear it in __init__, keep the memory allocated and just do another fstat with the existing memory.
…n().read(), Take 2 (python#123413)
…pythonGH-122101) pythonGH-113977, pythonGH-120754: Remove unbounded reads from zipfile Read without a size may read an unbounded amount of data + allocate unbounded size buffers. Move to capped size reads to prevent potential issues. Co-authored-by: Daniel Hillier <daniel.hillier@gmail.com> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Feature or enhancement

Proposal:

I came across some seemingly redundant `fstat()` and `lseek()` calls when working on a tool that scanned a directory of lots of small YAML files and loaded their contents as config. In tracing, I found most execution time wasn't in the Python interpreter but in system calls (on top of NFS in that case, which made some I/O calls particularly slow). I've been experimenting with a program that reads all `.rst` files in the CPython `Doc` directory to try and remove some of those redundant system calls.

Test Program: (collapsed in the original issue)

In my experimentation, some tweaks to fileio can remove over 10% of the system calls the test program makes when scanning the whole `Doc` folder for `.rst` files, on both macOS and Linux (I don't have a Windows machine to measure on).

Current State (9 system calls)

Currently, on my Linux machine, reading a whole `.rst` file with the above code produces a series of nine system calls. (strace listing collapsed in the original issue)

Target State (~~7~~ 5 system calls)

It would be nice to get it down to the following (for small files; large files get an additional seek, see the caveat in the PR). (strace listing collapsed in the original issue)
In a number of cases (ex. importing modules) there is often an `fstat` followed immediately by an open / read of the file (which typically does another `fstat` internally), but that is an extension point and I want to keep that out of scope for now; the sketch below shows the shape of that pattern.
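The duplicated-stat pattern in question, sketched with `os.scandir` (illustrative only; the issue doesn't propose an API change here, it just notes the redundancy):

```python
import os

# Scan a directory, use the cached stat from the scan, then read files.
# On Linux, entry.stat() usually needs no extra syscall, but open() will
# still fstat() the same file again internally.
for entry in os.scandir("Doc"):
    if entry.is_file() and entry.name.endswith(".rst"):
        size_hint = entry.stat().st_size
        with open(entry.path, "rb") as f:
            data = f.read()
```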
Questions rattling around in my head around this

Some of these are likely better for Discourse / longer-form discussion; happy to start threads there as appropriate.

1. Interpreter startup (ex. `python simple.py`, where the file contains `print("Hello, World!")`) currently reads `simple.py` in full at least 4 times and does over 5 seeks. I have been pulling on that thread, but it interacts with importlib as well as how the Python compiler currently works, and I'm still trying to get my head around it. Would removing more of those overheads be something of interest / should I keep working to get my head around it?
2. The `_blksize` member of fileio was added in bpo-21679. It is not used much as far as I can tell, either via its reflection `_blksize` in Python or in the code. The only usage I can find is https://github.com/python/cpython/blob/main/Modules/_io/_iomodule.c#L365-L374, where we could just query for it when needed to save some storage on all fileio objects; a sketch of that follows this list. The behavior of using the stat-returned st_blksize is part of the docs, so it doesn't feel like we can fully remove it.
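What "query for it when needed" could look like (a sketch; `_io` would do this in C, and the fallback value is an assumption):

```python
import io
import os

def blocksize_for(fd):
    # Look up st_blksize at the point of use instead of stashing a
    # _blksize member on every FileIO object.
    try:
        bs = os.fstat(fd).st_blksize
    except AttributeError:  # st_blksize is POSIX-only
        bs = 0
    return bs if bs > 1 else io.DEFAULT_BUFFER_SIZE
```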
Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere.

Links to previous discussion of this feature:

No response

Linked PRs

- `io` `open()` and `.read()` optimization to what's new #126466