
GH-120754: Remove isatty call during regular open #121593

Closed · wants to merge 5 commits

Conversation

@cmaloney (Contributor) commented Jul 10, 2024

For POSIX, TTYs are never regular files, so if the interpreter knows the file is regular it doesn't need to do an additional system call to check if the file is a TTY.
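That reasoning can be checked from Python. A minimal sketch (not CPython's internal code) showing that a file whose mode satisfies `S_ISREG` is never a TTY, so the `isatty()` call adds no information:

```python
import os
import stat
import tempfile

# Open a regular file and inspect its mode from the fstat result.
fd, path = tempfile.mkstemp()
try:
    st = os.fstat(fd)
    # On POSIX a regular file can never be a terminal device, so when
    # S_ISREG(st_mode) is true the isatty() system call is redundant.
    assert stat.S_ISREG(st.st_mode)
    assert not os.isatty(fd)
finally:
    os.close(fd)
    os.unlink(path)
```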

The `open()` Python builtin currently requires a `stat` call in order to ensure the file being opened isn't a directory. That result includes the file mode, which tells us whether it is a regular file. A number of attributes from the stat result are currently stashed individually; this change moves to stashing the whole object rather than individual members.

The stat object is reasonably large, and the `stat_result.st_size` member currently cannot be modified from Python, which the `_pyio` implementation needs, so the whole stat object is made optional. In the `_io` implementation this simplifies handling a stat failure. At present there is no explicit user call to clear it, but if one is needed (ex. a program which has a lot of open FileIO objects and the memory becomes a problem) it would be straightforward to add. Ideally it would be cleared automatically: the values are generally used during I/O object initialization and not after, and after a `write` they are no longer useful in the current cases.

It is a fairly common pattern to scan a directory, look at the `stat` results (ex. has this file changed), and then open/read the file. In this PR I didn't update `open`'s API to allow passing in a stat result to use, but that could be beneficial for some cases (ex. `importlib`).
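For illustration, a sketch of that scan-then-read pattern using `os.scandir`, which already caches a stat result on each `DirEntry` (the helper name `read_changed_files` is hypothetical; today `open()` still performs its own `fstat`, which an API accepting a stat result could avoid):

```python
import os

def read_changed_files(directory, last_seen_mtime):
    """Read the contents of files modified after last_seen_mtime.

    DirEntry.stat() reuses data fetched while scanning the directory,
    but open() still does its own fstat; passing the stat result into
    open() (not implemented in this PR) could skip that duplicate call.
    """
    contents = {}
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file() and entry.stat().st_mtime > last_seen_mtime:
                with open(entry.path, "rb") as f:
                    contents[entry.name] = f.read()
    return contents
```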

With this change, on my Linux machine, reading a small plain-text file is down to 6 system calls:

```
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```

Performance

On my Mac:

Benchmark script (`bm_readall.py`, run as `python bm_readall.py`):

```python
import pyperf
from pathlib import Path


def read_all(all_paths):
    for p in all_paths:
        p.read_bytes()


def read_file(path_obj):
    path_obj.read_text()


all_rst = list(Path("Doc").glob("**/*.rst"))
all_py = list(Path(".").glob("**/*.py"))
assert all_rst, "Should have found rst files"
assert all_py, "Should have found python source files"


runner = pyperf.Runner()
runner.bench_func("read_file_small", read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func("read_file_large", read_file, Path("Doc/c-api/typeobj.rst"))

runner.bench_func("read_all_rst", read_all, all_rst)
runner.bench_func("read_all_py", read_all, all_py)
```

`main`:

```
.....................
read_file_small: Mean +- std dev: 7.89 us +- 0.06 us
.....................
read_file_large: Mean +- std dev: 21.1 us +- 0.3 us
.....................
read_all_rst: Mean +- std dev: 4.17 ms +- 0.07 ms
.....................
read_all_py: Mean +- std dev: 19.3 ms +- 0.2 ms
```

`cmaloney/stash_fstat`:

```
.....................
read_file_small: Mean +- std dev: 7.47 us +- 0.05 us
.....................
read_file_large: Mean +- std dev: 20.5 us +- 0.3 us
.....................
read_all_rst: Mean +- std dev: 3.92 ms +- 0.07 ms
.....................
read_all_py: Mean +- std dev: 18.4 ms +- 0.2 ms
```

On Linux the performance change is in the noise on my machine. For both macOS and Linux I suspect removing the remaining `lseek` will provide a bit more performance swing based on profiles, but it touches very differently shaped code than the system call removals so far.

@cmaloney cmaloney changed the title GH-120754: Remove isatty call during regular read GH-120754: Remove isatty call during regular open Jul 10, 2024
@cmaloney (Contributor Author)

The WASI and x86 Windows failures are around a large `readall` on a large zip file; I'm working on figuring out what I changed/broke for those cases.

@cmaloney cmaloney marked this pull request as draft July 11, 2024 05:14
@serhiy-storchaka (Member)

See also #90102.

@cmaloney (Contributor Author) commented Jul 11, 2024

@serhiy-storchaka
Cool! I hadn't realized there were a couple of previous attempts at this. This doesn't change the behavior of the `isatty` call at all, and the isatty check is only optimized during `__init__`, so I think it is safe with respect to what didn't quite work in #112495 (changes to the underlying file/fd after open); in this PR the stat and the skipped isatty happen inside the same single library call.

I don't have a strong preference between my "is regular" check, `S_ISCHR` (#112495), and size. If one of them is best to use, or a combination, I'm happy to implement that.
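For illustration, a small sketch of the candidate predicates from the `stat` module (this PR uses the "is regular" check; `S_ISCHR` would instead flag character devices such as TTYs, and the two file-type bits are mutually exclusive):

```python
import os
import stat
import tempfile

fd, path = tempfile.mkstemp()
try:
    mode = os.fstat(fd).st_mode
    # The file-type bits in st_mode identify exactly one type: a file
    # is regular, or a directory, or a character device, etc.
    assert stat.S_ISREG(mode)        # the check used in this PR
    assert not stat.S_ISCHR(mode)    # the check from #112495
finally:
    os.close(fd)
    os.unlink(path)
```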

@cmaloney cmaloney marked this pull request as ready for review July 11, 2024 07:25
@cmaloney (Contributor Author) commented Jul 11, 2024

The test_zipimport failures highlighted a couple of issues which I think should be solved, but separately from this PR (which keeps the behavior from before it):

  1. If the size is large, this code path (with slight, hard-to-notice changes, which does break buildbots) allocates larger than the maximum byte string size:

     ```c
     if ((size_t)size > (size_t)PY_SSIZE_T_MAX - PyBytesObject_SIZE) {
         PyErr_SetString(PyExc_OverflowError,
                         "byte string is too large");
         return NULL;
     }
     ```

     vs.

     ```c
     #if defined(MS_WINDOWS) || defined(__APPLE__)
     /* On Windows, the count parameter of read() is an int (bpo-9015, bpo-9611).
        On macOS 10.13, read() and write() with more than INT_MAX bytes
        fail with EINVAL (bpo-24658). */
     #  define _PY_READ_MAX  INT_MAX
     #  define _PY_WRITE_MAX INT_MAX
     #else
     /* write() should truncate the input to PY_SSIZE_T_MAX bytes,
        but it's safer to do it ourselves to have portable behaviour */
     #  define _PY_READ_MAX  PY_SSIZE_T_MAX
     #  define _PY_WRITE_MAX PY_SSIZE_T_MAX
     #endif
     ```

     This is currently worked around by "if the size is large, always use a dynamically resized buffer", and the end reads wind up being small (ex. 1 byte for test_largefile). That points at two things I would like to solve with separate PRs (this PR keeps the same behavior around `_PY_READ_MAX` + estimated size as before):

     1. It would be good to have a test in test_fileio around this behavior, rather than relying on other modules. Specifically: on a very large / max-size file, seek to near the end of the file and then do a `.read()`; also truncate the file and then do a read.
     2. The zipfile module relies on `read` without a size after seeking to a certain distance from the end of the file, and discards the result if the size doesn't match what it expects exactly. That would work more efficiently as a fixed-size read.
  2. In the CPython codebase, doing a seek after open and then a readall / read without a size occurs multiple times, which likely means it is common elsewhere as well.
     1. Behavior should be improved for that (likely either seek and truncate should invalidate the cached size, or some form of "a large readall must call fstat to get up-to-date info").
     2. A test should be added that we don't over-allocate, which can lead to an OOM on smaller-memory machines in these cases (that is what broke the bot in gh-120754: Update estimated_size in C truncate #121357).
  3. The dynamic buffer resizing doesn't pay attention to the maximum PyBytes size and should be improved: currently it tries allocating `_PY_READ_MAX`, which is bigger than the biggest possible bytes object, resulting in a "byte string is too large" failure.
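As an illustration of the fixed-size read suggestion for zipfile: the End of Central Directory record is at least 22 bytes, so a bounded read suffices. A simplified sketch (not zipfile's actual code; it assumes no archive comment, which real code must scan backwards for):

```python
import io
import os
import struct
import zipfile

EOCD_SIZE = 22  # minimum End of Central Directory record size
EOCD_SIGNATURE = b"PK\x05\x06"

def read_eocd(f):
    """Read the EOCD with a bounded read instead of an unbounded one.

    Assumes no archive comment; real code would scan backwards over up
    to 64 KiB + 22 bytes to allow for one.
    """
    f.seek(-EOCD_SIZE, os.SEEK_END)
    data = f.read(EOCD_SIZE)  # fixed-size read: no oversized allocation
    if data[:4] != EOCD_SIGNATURE:
        raise ValueError("EOCD not found (archive comment present?)")
    # signature, disk numbers, entry counts, central directory size and
    # offset, comment length
    return struct.unpack("<4s4H2LH", data)

# Usage: build a tiny zip in memory and locate its EOCD.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")
fields = read_eocd(buf)
```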

cmaloney added a commit to cmaloney/cpython that referenced this pull request Jul 21, 2024
In the process of speeding up readall, a number of related tests
(ex. large file tests in test_zipfile) found problems with the
change I was making. This adds I/O tests to specifically cover these
cases, to help ensure they don't regress and hopefully make debugging
easier.

This is part of the improvements from
python#121593 (comment)
@cmaloney (Contributor Author)

Created PRs to improve

  1. zipfile to not do unbounded reads -- gh-113977, gh-120754: Remove unbounded reads from zipfile #122101
  2. explicit I/O tests around seek + readall on a large file (and comment on the existing truncate + readall) -- GH-120754: Add more tests around seek + readall #122103

hauntsaninja pushed a commit that referenced this pull request Jul 24, 2024
In the process of speeding up readall, a number of related tests
(ex. large file tests in test_zipfile) found problems with the
change I was making. This adds I/O tests to specifically cover these
cases, to help ensure they don't regress and hopefully make debugging
easier.

This is part of the improvements from
#121593 (comment)
nohlson pushed a commit to nohlson/cpython that referenced this pull request Jul 24, 2024
In the process of speeding up readall, a number of related tests
(ex. large file tests in test_zipfile) found problems with the
change I was making. This adds I/O tests to specifically cover these
cases, to help ensure they don't regress and hopefully make debugging
easier.

This is part of the improvements from
python#121593 (comment)
@vstinner (Member) left a comment


This PR is hard to review since it changes multiple things at once. Would it be possible to only introduce "stat_atopen" and use it to get the block size? (Create a first smaller PR.)

@cmaloney (Contributor Author)

@vstinner I have opened a new PR, GH-123412, which contains just the refactor to a `stat_atopen` member and the memory-allocation management of that new member. It has two commits, one for `_io` and a second for `_pyio`, trying to keep those in sync while letting the changes be reviewed side by side. I'll work on a new PR on top of that to skip checking `isatty` for common regular files.

3 participants