
GH-120754: Remove isatty call during regular open #121593

Closed · wants to merge 5 commits

Conversation

@cmaloney (Contributor) commented Jul 10, 2024

For POSIX, TTYs are never regular files, so if the interpreter knows the file is regular it doesn't need to do an additional system call to check if the file is a TTY.
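That reasoning can be checked from Python. A minimal sketch (not CPython's internal code) showing that a file whose mode satisfies `S_ISREG` is never a TTY, so the `isatty()` call adds no information:

```python
import os
import stat
import tempfile

# Open a regular file and inspect its mode from the fstat result.
fd, path = tempfile.mkstemp()
try:
    st = os.fstat(fd)
    # On POSIX a regular file can never be a terminal device, so when
    # S_ISREG(st_mode) is true the isatty() system call is redundant.
    assert stat.S_ISREG(st.st_mode)
    assert not os.isatty(fd)
finally:
    os.close(fd)
    os.unlink(path)
```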

The `open()` Python builtin currently requires a `stat` call in order to ensure the file being opened isn't a directory. That result includes the file mode, which tells us whether it is a regular file. A number of attributes from the stat result are currently stashed individually; this change moves to stashing the whole object rather than individual members.

The stat object is reasonably large, and the `stat_result.st_size` member currently cannot be modified from Python, which the `_pyio` implementation needs, so the whole stat object is made optional. In the `_io` implementation this simplifies handling a stat failure. At present there is no explicit user call to clear it, but if one is needed (ex. a program which has a lot of open FileIO objects and the memory becomes a problem) it would be straightforward to add. Ideally it would be cleared automatically: the values are generally used during I/O object initialization and not after, and after a `write` they are no longer useful in the current cases.

It is a fairly common pattern to scan a directory, look at the `stat` results (ex. has this file changed), and then open/read the file. In this PR I didn't update `open`'s API to allow passing in a stat result to use, but that could be beneficial for some cases (ex. `importlib`).
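For illustration, a sketch of that scan-then-read pattern using `os.scandir`, which already caches a stat result on each `DirEntry` (the helper name `read_changed_files` is hypothetical; today `open()` still performs its own `fstat`, which an API accepting a stat result could avoid):

```python
import os

def read_changed_files(directory, last_seen_mtime):
    """Read the contents of files modified after last_seen_mtime.

    DirEntry.stat() reuses data fetched while scanning the directory,
    but open() still does its own fstat; passing the stat result into
    open() (not implemented in this PR) could skip that duplicate call.
    """
    contents = {}
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file() and entry.stat().st_mtime > last_seen_mtime:
                with open(entry.path, "rb") as f:
                    contents[entry.name] = f.read()
    return contents
```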

With this change, on my Linux machine, reading a small plain-text file is down to 6 system calls:

```
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```

Performance

On my Mac:

Benchmark script (`bm_readall.py`, run as `python bm_readall.py`):

```python
import pyperf
from pathlib import Path


def read_all(all_paths):
    for p in all_paths:
        p.read_bytes()


def read_file(path_obj):
    path_obj.read_text()


all_rst = list(Path("Doc").glob("**/*.rst"))
all_py = list(Path(".").glob("**/*.py"))
assert all_rst, "Should have found rst files"
assert all_py, "Should have found python source files"


runner = pyperf.Runner()
runner.bench_func("read_file_small", read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func("read_file_large", read_file, Path("Doc/c-api/typeobj.rst"))

runner.bench_func("read_all_rst", read_all, all_rst)
runner.bench_func("read_all_py", read_all, all_py)
```

`main`:

```
.....................
read_file_small: Mean +- std dev: 7.89 us +- 0.06 us
.....................
read_file_large: Mean +- std dev: 21.1 us +- 0.3 us
.....................
read_all_rst: Mean +- std dev: 4.17 ms +- 0.07 ms
.....................
read_all_py: Mean +- std dev: 19.3 ms +- 0.2 ms
```

`cmaloney/stash_fstat`:

```
.....................
read_file_small: Mean +- std dev: 7.47 us +- 0.05 us
.....................
read_file_large: Mean +- std dev: 20.5 us +- 0.3 us
.....................
read_all_rst: Mean +- std dev: 3.92 ms +- 0.07 ms
.....................
read_all_py: Mean +- std dev: 18.4 ms +- 0.2 ms
```

On Linux the performance change is in the noise on my machine. For both macOS and Linux I suspect removing the remaining `lseek` will provide a bit more performance swing based on profiles, but it touches very differently shaped code than the system call removals so far.

@cmaloney cmaloney changed the title GH-120754: Remove isatty call during regular read GH-120754: Remove isatty call during regular open Jul 10, 2024
@cmaloney (Contributor Author)

The WASI and x86 Windows failures are around a large `readall` on a large zip file; I'm working on figuring out what I changed/broke for those cases.

@cmaloney cmaloney marked this pull request as draft July 11, 2024 05:14
@serhiy-storchaka (Member)

See also #90102.

@cmaloney (Contributor Author) commented Jul 11, 2024

@serhiy-storchaka
Cool! I hadn't realized there were a couple of previous attempts at this. This doesn't change the behavior of the `isatty` call at all, and the isatty check is only optimized during `__init__`, so I think it is safe with respect to what didn't quite work in #112495 (changes to the underlying file/fd after open); in this PR the stat and the skipped isatty happen inside the same single library call.

I don't have a strong preference between my "is regular" check, `S_ISCHR` (#112495), and size. If one of them is best to use, or a combination, I'm happy to implement that.
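For illustration, a small sketch of the candidate predicates from the `stat` module (this PR uses the "is regular" check; `S_ISCHR` would instead flag character devices such as TTYs, and the two file-type bits are mutually exclusive):

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
try:
    mode = os.fstat(fd).st_mode
    # The file-type bits in st_mode identify exactly one type: a file
    # is regular, or a directory, or a character device, etc.
    assert stat.S_ISREG(mode)        # the check used in this PR
    assert not stat.S_ISCHR(mode)    # the check from #112495
finally:
    os.close(fd)
    os.unlink(path)
```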

@cmaloney cmaloney marked this pull request as ready for review July 11, 2024 07:25
@cmaloney (Contributor Author) commented Jul 11, 2024

The test_zipimport failures highlighted a couple of issues which I think should be solved, but separately from this PR (which keeps the behavior from before it):

  1. If the size is large, this code path (with slight, hard-to-notice changes, which does break buildbots) allocates larger than the maximum byte string size:

     ```c
     if ((size_t)size > (size_t)PY_SSIZE_T_MAX - PyBytesObject_SIZE) {
         PyErr_SetString(PyExc_OverflowError,
                         "byte string is too large");
         return NULL;
     }
     ```

     vs.

     ```c
     #if defined(MS_WINDOWS) || defined(__APPLE__)
     /* On Windows, the count parameter of read() is an int (bpo-9015, bpo-9611).
        On macOS 10.13, read() and write() with more than INT_MAX bytes
        fail with EINVAL (bpo-24658). */
     #  define _PY_READ_MAX  INT_MAX
     #  define _PY_WRITE_MAX INT_MAX
     #else
     /* write() should truncate the input to PY_SSIZE_T_MAX bytes,
        but it's safer to do it ourselves to have portable behaviour */
     #  define _PY_READ_MAX  PY_SSIZE_T_MAX
     #  define _PY_WRITE_MAX PY_SSIZE_T_MAX
     #endif
     ```

     This is currently worked around by "if the size is large, always use a dynamically resized buffer", and the end reads wind up being small (ex. 1 byte for test_largefile). That points at two things I would like to solve with separate PRs (this PR keeps the same behavior around `_PY_READ_MAX` + estimated size as before):

     1. It would be good to have a test in test_fileio around this behavior, rather than relying on other modules. Specifically: on a very large / max-size file, seek to near the end of the file and then do a `.read()`; also truncate the file and then do a read.
     2. The zipfile module relies on `read` without a size after seeking to a certain distance from the end of the file, and discards the result if the size doesn't match what it expects exactly. That would work more efficiently as a fixed-size read.
  2. In the CPython codebase, doing a seek after open and then a readall / read without a size occurs multiple times, which likely means it is common elsewhere as well.
     1. Behavior should be improved for that (likely either seek and truncate should invalidate the cached size, or some form of "a large readall must call fstat to get up-to-date info").
     2. A test should be added that we don't over-allocate, which can lead to an OOM on smaller-memory machines in these cases (that is what broke the bot in gh-120754: Update estimated_size in C truncate #121357).
  3. The dynamic buffer resizing doesn't pay attention to the maximum PyBytes size and should be improved: currently it tries allocating `_PY_READ_MAX`, which is bigger than the biggest possible bytes object, resulting in a "byte string is too large" failure.
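As an illustration of the fixed-size read suggestion for zipfile: the End of Central Directory record is at least 22 bytes, so a bounded read suffices. A simplified sketch (not zipfile's actual code; it assumes no archive comment, which real code must scan backwards for):

```python
import io
import os
import struct
import zipfile

EOCD_SIZE = 22  # minimum End of Central Directory record size
EOCD_SIGNATURE = b"PK\x05\x06"

def read_eocd(f):
    """Read the EOCD with a bounded read instead of an unbounded one.

    Assumes no archive comment; real code would scan backwards over up
    to 64 KiB + 22 bytes to allow for one.
    """
    f.seek(-EOCD_SIZE, os.SEEK_END)
    data = f.read(EOCD_SIZE)  # fixed-size read: no oversized allocation
    if data[:4] != EOCD_SIGNATURE:
        raise ValueError("EOCD not found (archive comment present?)")
    # signature, disk numbers, entry counts, central directory size and
    # offset, comment length
    return struct.unpack("<4s4H2LH", data)

# Usage: build a tiny zip in memory and locate its EOCD.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")
fields = read_eocd(buf)
```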

cmaloney added a commit to cmaloney/cpython that referenced this pull request Jul 21, 2024
In the process of speeding up readall, a number of related tests
(ex. large file tests in test_zipfile) found problems with the
change I was making. This adds I/O tests to specifically cover these
cases, to help ensure they don't regress and hopefully make debugging
easier.

This is part of the improvements from
python#121593 (comment)
@cmaloney (Contributor Author)

Created PRs to improve

  1. zipfile to not do unbounded reads -- gh-113977, gh-120754: Remove unbounded reads from zipfile #122101
  2. explicit I/O tests around seek + readall on a large file (and comment on the existing truncate + readall) -- GH-120754: Add more tests around seek + readall #122103

hauntsaninja pushed a commit that referenced this pull request Jul 24, 2024
In the process of speeding up readall, a number of related tests
(ex. large file tests in test_zipfile) found problems with the
change I was making. This adds I/O tests to specifically cover these
cases, to help ensure they don't regress and hopefully make debugging
easier.

This is part of the improvements from
#121593 (comment)
nohlson pushed a commit to nohlson/cpython that referenced this pull request Jul 24, 2024
In the process of speeding up readall, a number of related tests
(ex. large file tests in test_zipfile) found problems with the
change I was making. This adds I/O tests to specifically cover these
cases, to help ensure they don't regress and hopefully make debugging
easier.

This is part of the improvements from
python#121593 (comment)
@vstinner (Member) left a comment


This PR is hard to review since it changes multiple things at once. Would it be possible to only introduce "stat_atopen" and use it to get the block size? (Create a first smaller PR.)

@cmaloney (Contributor Author)

@vstinner I have opened a new PR, GH-123412, which contains just the refactor to a `stat_atopen` member and the memory-allocation management of that new member. It has two commits, one for `_io` and a second for `_pyio`, trying to keep those in sync while letting the changes be reviewed side by side. I'll work on a new PR on top of that to skip checking `isatty` for common regular files.

3 participants