-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The io module doesn't support non-blocking files #57531
Comments
According to the the documentation, BufferedReader.read() and BufferedWriter.write() should raise io.BlockingIOError if the file is in non-blocking mode and the operation cannot succeed without blocking. However, BufferedReader.read() returns None (which is what RawIOBase.read() is documented as doing), and BufferedWriter.write() raises IOError with a message like
I tested this on Linux with Python 2.6, 2.7 and 3.x. Attached is a unit test. |
BufferedReader.readinto() should also raise BlockingIOError according to the docs. Updated unittest checks for that also. BTW, The documentation for BufferedIOBase.read() says that BlockingIOError should be raised if nothing can be read in non-blocking mode. BufferedReader inherits from BufferedIOBase and overrides the read() method. This is the documentation for BufferedReader.read(): read([n])
Read and return n bytes, or if n is not given or negative,
until EOF or if the read call would block in non-blocking mode. This sentence is complete gobbledygook, and it makes no mention of what should happen if nothing can be read in non-blocking mode. So I presume behaviour for BufferedReader.read() should match the documented behaviour for BufferedIOBase.read(). |
Wierdly, it looks like BlockingIO is not raised anywhere in the code for the C implementation of io. Even more wierdly, in the Python implementation of io, BlockingIOError is only ever raised by except clauses which have already caught BlockingIOError. So, of course, these clauses are dead code. The only code in CPython which can ever succesfully raise BlockingIOError is MockNonBlockWriterIO.write() in test/test_io.py. I don't know what the correct behaviour is for flush() and close() if you get EAGAIN. I think flush() should raise an error rather than blocking, and that close() should delegate to self.raw.close() before raising the error. The docs say that read(), readinto() and write() can raise BlockingIOError. But what should readall() and readline() do? Should we just try to emulate whatever Python's old libc IO system did (with BlockingIOError replacing IOError(EAGAIN))? |
That would explain why it isn't raised :) This is a hairy issue: read(n) is documented as returning either n bytes or nothing. But what if less than n bytes are available non-blocking? Currently we return a partial read. readline() behaviour is especially problematic: >>> fcntl.fcntl(r, fcntl.F_SETFL, os.O_NDELAY)
0
>>> rf = open(r, mode='rb')
>>> os.write(w, b'xy')
2
>>> rf.read(3)
b'xy'
>>> os.write(w, b'xy')
2
>>> rf.readline()
b'xy' We should probably raise BlockingIOError in these cases, but that complicates the implementation quite a bit: where do we buffer the partial data? The internal (fixed size) buffer might not be large enough. write() is a bit simpler, since BlockingIOError has a "characters_written" attribute which is meant to inform you of the partial success: we can just reuse that. That said, BlockingIOError could grow a "partial_read" attribute containing the read result... Of course, we may also question whether it's useful to use buffered I/O objects around non-blocking file descriptors; if you do non-blocking I/O, you generally want to be in control, which means not having any implicit buffer between you and the OS. (this may be a topic for python-dev) |
Indeed. Performing partial read/write may sound imperfect, but using buffered I/O around non-blockind FD is definitely not a good idea. Note that Java's BufferedInputStream and ReadableByteChannel also return partial reads. So I'm somewhat inclined to keep the current behavior (it would however probably be a good idea to update the documentation to warn about this limitation, though). |
But what about the buggy readline() behaviour? |
Apparently, they are specified to, even for blocking streams (which I find a bit weird, and the language in the docs seems deliberately vague). Python's buffered read(), though, is specified to return the requested number of bytes (unless EOF happens). |
No one has suggested raising BlockingIOError and DISCARDING the data when a partial read has occurred. The docs seem to imply that the partially read data should be returned since they only say that BlockingIOError should be raised if there is NOTHING to read. Clearly this should all be spelt out properly. That leaves the question of whether, when there is NOTHING to read, BlockingIOError should be raised (as the docs say) or None should be returned (as is done now). I don't mind either way as long as the docs match reality. The part which really needs addressing is partial writes. Currently, if a write fails with EAGAIN then IOError is raised and there is no way to work out how much data was written/buffered. The docs say that BlockingIOError should be raised with the e.args[2] set to indicate the number of bytes written/buffered. This at least should be fixed. I will work on a patch. |
Just tell people that if the return value is a string which does not end in '\n' then it might caused by EOF or EAGAIN. They can just call readline() again to check which. |
The third arg of BlockingIOError is used in two quite different ways. In write(s) it indicates the number of bytes of s which have been "consumed" (ie written to the raw file or buffered). But in flush() and flush_unlocked() (in _pyio) it indicates the number of bytes from the internal buffer which have been written to the raw file. I think this explains the following comment in write():
I don't think flush() should try to tell us how many bytes were flushed: we only need to know whether we need to try again. |
""" The specified number of bytes have been read, As I understand it, it will return the number of bytes asked, unless EOF or EAGAIN/EWOULDBLOCK. It would seem reasonable to me to add the same note for non-blocking FDs to Python's read().
Sounds reasonable.
The problem is that if we raise BlockingIOError, we can only buffer a limited amount of data.
Agreed.
I don't have a string feeling: if we don't raise BlockingIOError on partial reads, then it probably makes sense to keep None. |
Currently a BlockingIOError exception raised by flush() sets Also, the docs say that the raw file wrapped by It would simplify matters a lot to insist that raw files implement BTW, when I try to change characters_written of an existing |
But then what's the point of using buffered I/O at all? If it can't (actually, raw I/O readline() is probably buggy as well) |
Well, ideally it should be an UnsupportedOperation, but that's an option. The only think I didn't like about this is that we should ideally raise this error upon the first method call, not when - and if - we receive EAGAIN.
Yeah, I know. |
Discarding data rarely is worse than always throwing an exception. |
The attached patch makes BufferedWrite.write() raise BlockingIOError when the raw file is non-blocking and the write would block. |
Now that I think about it, it's probably the best solution: |
Testing the patch a bit more thoroughly, I found that data received from the readable end of the pipe can be corrupted by the C implementation. This seems to be because two of the previously dormant codepaths did not properly maintain the necessary invariants. I got the failures to go away by adding
in two places. However, I really do not know what all the attributes mean. (Should self->raw_pos also be modified...?) Someone familiar with the code would need to check whether things are being done properly. This new patch adds some XXX comments in places in bufferedio.c which I am unsure about. |
Hi,
Ouch. Were they only non-blocking codepaths?
raw_pos is the position which the underlying raw stream is currently at. Another comment: you set errno to EAGAIN, but it is not sure that was |
Yes.
Do you mean self->raw_pos should give the same answer as self.raw.tell()? (But that seems to be the definition of self->abs_pos.) Or is it the buffer offset which corresponds to self.raw.tell()? |
The latter. |
Here is an updated patch which uses the real errno. It also gets rid of the restore_pos argument of _bufferedwriter_flush_unlocked() which is always set to false -- |
Thanks again. Just a nit: the tests should be in MiscIOTest, since they don't directly instantiate the individual classes. Also, perhaps it would be nice to check that the exception's "errno" attribute is EAGAIN. |
Done. |
Thanks. Who should I credit? "sbt"? |
Yeah, thanks. |
New changeset ac2c4c62b486 by Antoine Pitrou in branch '3.2': New changeset 3cd1985ed04f by Antoine Pitrou in branch 'default': New changeset e84e17643eeb by Antoine Pitrou in branch '2.7': |
Generally I doubt anyone is using the non-blocking semantics of the Python 3 I/O stack. People doing non-blocking I/O generally do it with sockets instead, which tend to reproduce quite literally the POSIX behaviour and error codes. |
Yes, your claim is confirmed by the fact that there have been little interest in this issue since 2011. Still, non-blocking behavior is incorrectly specified in the docs and is inconsistent (as investigated by Martin). And obscure errors like in my example or in bpo-13858 show that I/O stack is confused too. To prevent people from tripping on it, would you consider recommending against usage of I/O stack for non-blocking operations in io module docs? |
Yes, I think adding a note in the docs is reasonable. |
bpo-35762 was opened specifically about Izbyshev’s case: TextIOWrapper behaviour with a non-blocking file. Calling “os.fdopen” with mode='r' (text mode) returns a TextIOWrapper object. |
I closed bpo-35762 as a duplicate of this issue: subprocess.Popen with universal_newlines and nonblocking streams fails with "can't concat NoneType to bytes". |
I closed bpo-26292 as a duplicate of this issue: Raw I/O writelines() broken for non-blocking I/O. |
I closed bpo-24560 as a duplicate of this issue: codecs.StreamReader doesn't work with nonblocking streams: TypeError: can't concat bytes to NoneType. |
See also bpo-32561: Add API to io objects for non-blocking reads/writes. |
TextIOWrapper, and maybe also BufferedRead, may raise an exception if the underlying file descriptor is configured in non-blocking mode. It may require an additional syscall() to query the FD properties, which may slowdown the creation of file objects in Python :-/ |
I have experienced both ״TypeError: can't concat NoneType to bytes״, and the fact BufferedIO returns None. @pitrou @izbyshev contrary to your belief, I think there is at least some interest in this issue. Every few months another ticket is opened about a different aspect of the same underlying problem. |
Going through the same TypeError using 3.7 today, sample code:
the file read by the process is also manipulated by another process on Linux at the same time, while the code caught the IOError only. Wasn't expecting the |
The ticket #80050 also relates to these problems. While updating Rsfile (https://github.com/pakal/rsfile) to Python3.12, I got very confused too by the interactions between io and non-blocking pipes. Maybe the best would be indeed to prevent using buffer/text layers of io with non-blocking pipes. If an additional syscall to check for the status of fileno is a performance problem, maybe this can be let as "behaviour undefined", and just advised-against in docs? |
@pakal |
I encountered the same problem process = subprocess.Popen(["bash"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
os.set_blocking(process.stdout.fileno(), False)
process.stdout.read() This leads to
It seems that this issue is more complicated than I imagined, but In any case, a I think the crux of the issue lies in whether If it is a pure wrapper, it should only add decode/encode capabilities to the read/write functions. When encountering other situations, it should return an empty bytes (which should decode to an empty str) or None, or it should raise an error wrapper that returns/raises the output of rawbuffer unchanged, in order to maintain the same behavior as rawbuffer. If it is a class with wrapping capabilities, it should have a consistent behavior, appropriately handling the return/raise of the raw buffer at EOF to maintain consistency in behavior. I lean towards the class with wrapping capabilities option, as it reduces the hassle for downstream developers. However, for compatibility reasons, the pure wrapper option is also an enticing choice; it is harmless because it does not change the original behavior of TextIOWrapper. It merely chooses to return None in situations that would originally cause a TypeError. For existing code, this might just postpone the occurrence of TypeError, but it enables subsequent developers to handle situations where TypeError arises appropriately (rather than having to use try: ... expect TypeError: ...). This will not affect legacy code if they originally had no issues with TypeError. |
#122933 (to fix gh-109523) adds more documentation around behavior + raise a |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: