-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BrokenPipe sometimes not reported #37807
Comments
I believe what's happening here is in Seems kinda weird to me to ignore that error, we should probably do something like this instead: if n != i + 1 || { self.flush()?; true } {
return Ok(n)
} That is, propagate an error in flush if we wrote the entire line. |
Well... maybe not. So apparently this behavior changed in #32107 to fix #32085. cc @Stebalien, thoughts? |
Interesting. The new code doesn't satisfy the "write-xor-error guarantee" either, though. It does up to three operations on the underlying Maybe the guarantee should be weakened to a reasonable-effort sort of thing. It's important in some systems-y cases where the user knows the type of the writer (e.g. a raw |
This also interacts with the documented semantics of
That's probably just an erroneous clause targeted at, for example, file descriptors. With buffered readers and writers (like a line writer) I think we fundamentally just can't uphold that guarantee. Nominating for @rust-lang/libs discussion as I think we'll want to reword that documentation. |
In some cases, An abstract interface can't reasonably guarantee that any specific downstream source has received and acknowledged the data. On Unix, even if you do a But an abstract interface can and should be expected to let its user keep track of where it needs to continue its work, in the case of an intermittent error condition like |
That clause is actually pretty important. Otherwise, you can't be sure of the state of a writer after it returns an error. For example, the following could would write "abc" twice if interrupted: LineWriter::write_all("abc\ncde\n"); When I wrote that patch, I was thinking that the next write after a failed flush would error out but obviously didn't think it though. So, a simple fix would be to record that a flush didn't happen and try to flush before writing anything else (i.e., on the next call to write). Unfortunately, this does mean that writing a newline wouldn't guarantee a flush but I don't think there's much we can do about that at this point. I believe a Alternatively, as we haven't actually written anything to the underlying file yet, |
@jimblandy yeah it's definitely true that it's useful to have this guarantee, and it may just not be possible to relax the guarantee. @Stebalien yeah on second though retrying the flush on the next |
My point is to flush before doing anything else if the previous flush-on-newline failed. Only then do we write iff that flush succeeds. If the flush fails, return an error without writing anything (no papering over written bytes). However, the previously written bytes will not have been flushed so the documentation needs to be updated to say that we try to flush on newline but that flush may fail silently in some cases (and that the user should call |
@Stebalien yeah that sounds good to me! |
Since |
Yeah that also seems reasonable to me, we could have |
Right, |
Oh, it turns out my hunch was wrong. |
Previously the `LineWriter` could successfully write some bytes but then fail to report that it has done so. Additionally, an erroneous flush after a successful write was permanently ignored. This commit fixes these two issues by (a) maintaining a `need_flush` flag to indicate whether a flush should be the first operation in `LineWriter::write` and (b) avoiding returning an error once some bytes have been successfully written. Closes rust-lang#37807
I've opened a PR as #38062 |
std: Fix partial writes in LineWriter Previously the `LineWriter` could successfully write some bytes but then fail to report that it has done so. Additionally, an erroneous flush after a successful write was permanently ignored. This commit fixes these two issues by (a) maintaining a `need_flush` flag to indicate whether a flush should be the first operation in `LineWriter::write` and (b) avoiding returning an error once some bytes have been successfully written. Closes #37807
…nieu Substantial refactor to the design of LineWriter # Preamble This is the first in a series of pull requests designed to move forward with rust-lang#60673 (and the related [5 year old FIXME](https://github.com/rust-lang/rust/blob/ea7181b5f7a888c2cf969ae86de7207fa5fb40aa/src/libstd/io/stdio.rs#L459-L461)), which calls for an update to `Stdout` such that it can be block-buffered rather than line-buffered under certain circumstances (such as a `tty`, or a user setting the mode with a function call). This pull request refactors the logic `LineWriter` into a `LineWriterShim`, which operates on a `BufWriter` by mutable reference, such that it is easy to invoke the line-writing logic on an existing `BufWriter` without having to construct a new `LineWriter`. Additionally, fixes rust-lang#72721 ## A note on flushing Because the word **flush** tends to be pretty overloaded in this discussion, I'm going to use the word **unbuffered** to refer to a `BufWriter` sending its data to the wrapped writer via `write`, without calling `flush` on it, and I'll be using **flushed** when referring to sending data via flush, which recursively writes the data all the way to the final sink. For example, given a `T = BufWriter<BufWriter<File>>`, saying that `T` **unbuffers** its data means that it is sent to the inner `BufWriter`, but not necessarily to the `File`, whereas saying that `T` **flushes** its data means that causes it (via `Write::flush`) to be delivered all the way to `File`. # Goals Once it became clear (for reasons described below) that the best way to approach this would involve refactoring `LineWriter` to work more directly on `BufWriter`'s internals, I established the following design goals for the refactor: - Do not duplicate logic with `BufWriter`. It's great at buffering and then unbuffering data, so use the existing logic as much as possible. - Minimize superfluous copying of data into `BufWriter`'s buffer. - Eliminate calls to `BufWriter::flush` and instead do the same thing as `BufWriter::write`, which is to only write to the wrapped writer (rather than flushing all the way down to the final data sink). - Uphold the "at-most 1 write of new data" convention of `Write::write` - Minimize or eliminate dropping errors (that is, eliminate the parts of the old design that threw away errors because `write` *must* report if any bytes were written) - As much as possible, attempt to fully flush completed lines, and *not* flush partial lines. One of the advantages of this design is that, so long as we don't encounter lines larger than the `BufWriter`'s capacity, partial lines will never be unbuffered, while completed lines will *always* be unbuffered (with subsequent calls to `LineWriter::write` retrying failed writes before processing new data. # Design There are two major & related parts of the design. First, a new internal stuct, `LineWriterShim`, is added. This struct implements all of the actual logic of line-writing in a `Write` implementation, but it only operates on an `&mut BufWriter`. This means that this shim can be constructed on-the-fly to apply line writing logic to an existing `BufWriter`. This is in fact how `LineWriter` has been updated to operate, and it is also how `Stdout` is being updated in my [development branch](https://github.com/Lucretiel/rust/tree/stdout-block-buffer) to switch which mode it wants to use at runtime. [An example of how this looks in practice](https://github.com/Lucretiel/rust/blob/f24f272df674dc7fa8941b97b45f41ad08b2199b/src/libstd/io/stdio.rs#L479-L484 ) The second major part of the design that the line-buffering logic, implemented in `LineWriterShim`, has been updated to work slightly more directly on the internals of `BufWriter`. Mostly it makes us of the public interface—particularly `buffer()` and `get_mut()`—but it also controls the flushing of the buffer with `flush_buf` rather than `flush`, and it writes to the buffer infallibly with a new `write_to_buffer` method. This has several advantages: - Data no longer has to round trip through the `BufWriter`'s buffer. If the user provides a complete line, that line is written directly to the inner writer (after ensuring the existing buffer is flushed). - The conventional contract of `write`—that at-most 1 attempt to write new data is made—is much more cleanly upheld, because we don't have to perform fallible flushes and perform semi-complicated logic of trying to pretend errors at different stages didn't happen. Instead, after attempting to write lines directly to the buffer, we can infallibly add trailing data to the buffer without allowing any attempts to continue writing it to the `inner` writer. - Perhaps most importantly, `LineWriter` *no longer performs a full flush on every line.* This makes its behavior much more consistent with `BufWriter`, which unbuffers data to its inner writer, without trying to flush it all the way to the final device. Previously, `LineWriter` had no choice but to use `flush` to ensure that the lines were unbuffered, but by writing directly to `inner` via `get_mut()` (when appropriate), we can use a more correct behavior. ## New(ish) line buffering logic The logic for line writing has been cleaned up, as described above. It now follows this algorithm for `write`, with minor adjustments for `write_all` and `write_vectored`: - Does our input data contain a newline? - If no: - simply use the regular `BufWriter::write` to write it; this will append it to the buffer and/or flush it as necessary based on how full the buffer is and how much input data there is. - additionally, if the current buffer ends with `'\n'`, attempt to immediately flush it with `flush_buf` before calling `BufWriter::write` This reproduces the old `needs_flush` behavior and ensures completed lines are flushed as soon as possible. The reason we only check if the buffer *ends* with `'\n'` is discussed later. - If yes: - First, `flush_buf` - Then use `bufwriter.get_mut().write()` to write the input data directly to the underlying writer, up to the last newline. Make at most one attempt at this. - If it errors, return the error - If it succeeds with a full write, add the remaining data (between the last newline and the end of the input) to the buffer. In order to uphold the "at-most 1 attempt to write new data" convention, no attempts are made to write this data to the inner writer (though obviously a subsequent write may immediately flush it, e.g., if it totally filled the buffer's capacity. - If it only partially succeeds, buffer the data only up to the last newline. We do this to try to avoid writing partial lines to the inner writer where possible (that is, whenever the lines are shorter than the total buffer capacity). While it was not my intention for this behavior to diverge from this existing `LineWriter` algorithm, this updated design emerged very naturally once `LineWriter` wasn't burdened with having to only operate via `BufWriter::flush`. There essentially two main changes to observable behavior: - `flush` is no longer used to unbuffer lines. The are only written to the writer wrapped by `LineWriter`; this inner writer might do its own buffering. This change makes `LineWriter` consistent with the behavior of `BufWriter`. This is probably the most obvious user-visible change; it's the one I most expect to provoke issue reports, if any are provoked. - Unless a line exceeds the capacity of the buffer, partial lines are not unbuffered (without the user manually calling flush). This is a less surprising behavior, and is enabled because `LineWriter` now has more precise control of what data is buffered and when it is unbuffered. I'd be surprised if anyone is relying on `LineWriter` unbuffering or flushing *partial* lines that are shorter than the capacity, so I'm not worried about this one. None of these changes are inconsistent with any published documentation of `LineWriter`. Nonetheless, like all changes with user-facing behavior changes, this design will obviously have to be very carefully scrutinized. # Alternative designs and design rationalle The initial goal of this project was to provide a way for the `LineWriter` logic to be operable directly on a `BufWriter`, so that the updated `Stdout` doesn't need to do something convoluted like `enum { BufWriter, LineWriter }` (which ends up being ~~impossible~~ difficult to transition between states after being constructed). The design went through several iterations before arriving at the current draft. The major first version simply involved adding methods like `write_line_buffered` to `BufWriter`; these would contain the actual logic of line-buffered writing, and would additionally have the advantages (described above) of operating directly on the internals of `BufWriter`. The idea was that `LineWriter` would simply call these methods, and the updated `Stdout` would use either `BufWriter::write` or `BufWriter::write_line_buffered`, depending on what mode it was in. The major issue with this design is that it loses the ability to take advantage of the `io::Write` trait, which provides several useful default implementations of the various io methods, such as `write_fmt` and `write_all`, just using the core methods. For this reason, the `write_line_buffered` design was retained, but moved into a separate struct called `LineWriterShim` which operates on an `&mut LineWriter`. As part of this move, the logic was lightly retooled to not touch the innards of `BufWriter` directly, but instead to make use of the unexported helper methods like `flush_buf`. The other design evolutions were mostly related to answering questions like "how much data should be buffered", "how should partial line writes be handled", etc. As much as possible I tried to answer these by emulating the current `LineWriter` logic (which, for example, retries partial line writes on subsequent calls to `write`) while still meeting the refactor design goals. # Next steps ~Currently, this design fails a few `LineWriter` tests, mostly because they expect `LineWriter` to *fully* flush its content. There are also some changes to the way that `LineWriter` buffers data *after* writing completed lines, aimed at ensuring that partial lines are not unbuffered prematurely. I want to make sure I fully understand the intent behind these tests before I either update the test or update this design so that they pass.~ However, in the meantime I wanted to get this published so that feedback could start to accumulate on it. There's a lot of errata around how I arrived at this design that didn't really fit in this overlong document, so please ask questions about anything that confusing or unclear and hopefully I can explain more of the rationale that led to it. # Test updates This design required some tests to be updated; I've research the intent behind these tests (mostly via `git blame`) and updated them appropriately. Those changes are cataloged here. - `test_line_buffer_fail_flush`: This test was added as a regression test for rust-lang#32085, and is intended to assure that an errors from `flush` aren't propagated when preceded by a successful `write`. Because type of issue is no longer possible, because `write` calls `buffer.get_mut().write()` instead of `buffer.write(); buffer.flush();`, I'm simply removing this test entirely. Other, similar error invariants related to errors during write-retrying are handled in other test cases. - `erroneous_flush_retried`: This test was added as a regression test for rust-lang#37807, and was intended to ensure that flush-retrying (via `needs_flush`) and error-ignoring were being handled correctly (ironically, this issue was caused by the flush-error-ignoring, above). Half of that issue is not possible by design with this refactor, because we no longer make fallible i/o calls that might produce errors we have to ignore after unbuffering lines. The `should_flush` behavior is captured by checking for a trailing newline in the `LineWriter` buffer; this test now checks that behavior. - `line_vectored`: changes here were pretty minor, mostly related to when partial lines are or aren't written. The old implementation of `write_vectored` used very complicated logic to precisely determine the location of the last newline and precisely write up to that point; this required doing several consecutive fallible writes, with all the complex error handling or ignoring issues that come with it. The updated design does at-most one write of a subset of total buffers (that is, it doesn't split in the middle of a buffer), even if that means writing partial lines. One of the major advantages of the new design is that the underlying vectored write operation on the device can be taken advantage of, even with small writes, so long as they include a newline; previously these were unconditionally buffered then written. - `line_vectored_partial_and_errors`: Pretty similiar to `line_vectored`, above; this test is for basic error recovery in `write_vectored` for vectored writes. As previously discussed, the mocked behavior being tested for (errors ignored under certain circumstances) no occurs, so I've simplified the test while doing my best to retain its spirit.
…nieu Substantial refactor to the design of LineWriter # Preamble This is the first in a series of pull requests designed to move forward with rust-lang#60673 (and the related [5 year old FIXME](https://github.com/rust-lang/rust/blob/ea7181b5f7a888c2cf969ae86de7207fa5fb40aa/src/libstd/io/stdio.rs#L459-L461)), which calls for an update to `Stdout` such that it can be block-buffered rather than line-buffered under certain circumstances (such as a `tty`, or a user setting the mode with a function call). This pull request refactors the logic `LineWriter` into a `LineWriterShim`, which operates on a `BufWriter` by mutable reference, such that it is easy to invoke the line-writing logic on an existing `BufWriter` without having to construct a new `LineWriter`. Additionally, fixes rust-lang#72721 ## A note on flushing Because the word **flush** tends to be pretty overloaded in this discussion, I'm going to use the word **unbuffered** to refer to a `BufWriter` sending its data to the wrapped writer via `write`, without calling `flush` on it, and I'll be using **flushed** when referring to sending data via flush, which recursively writes the data all the way to the final sink. For example, given a `T = BufWriter<BufWriter<File>>`, saying that `T` **unbuffers** its data means that it is sent to the inner `BufWriter`, but not necessarily to the `File`, whereas saying that `T` **flushes** its data means that causes it (via `Write::flush`) to be delivered all the way to `File`. # Goals Once it became clear (for reasons described below) that the best way to approach this would involve refactoring `LineWriter` to work more directly on `BufWriter`'s internals, I established the following design goals for the refactor: - Do not duplicate logic with `BufWriter`. It's great at buffering and then unbuffering data, so use the existing logic as much as possible. - Minimize superfluous copying of data into `BufWriter`'s buffer. - Eliminate calls to `BufWriter::flush` and instead do the same thing as `BufWriter::write`, which is to only write to the wrapped writer (rather than flushing all the way down to the final data sink). - Uphold the "at-most 1 write of new data" convention of `Write::write` - Minimize or eliminate dropping errors (that is, eliminate the parts of the old design that threw away errors because `write` *must* report if any bytes were written) - As much as possible, attempt to fully flush completed lines, and *not* flush partial lines. One of the advantages of this design is that, so long as we don't encounter lines larger than the `BufWriter`'s capacity, partial lines will never be unbuffered, while completed lines will *always* be unbuffered (with subsequent calls to `LineWriter::write` retrying failed writes before processing new data. # Design There are two major & related parts of the design. First, a new internal stuct, `LineWriterShim`, is added. This struct implements all of the actual logic of line-writing in a `Write` implementation, but it only operates on an `&mut BufWriter`. This means that this shim can be constructed on-the-fly to apply line writing logic to an existing `BufWriter`. This is in fact how `LineWriter` has been updated to operate, and it is also how `Stdout` is being updated in my [development branch](https://github.com/Lucretiel/rust/tree/stdout-block-buffer) to switch which mode it wants to use at runtime. [An example of how this looks in practice](https://github.com/Lucretiel/rust/blob/f24f272df674dc7fa8941b97b45f41ad08b2199b/src/libstd/io/stdio.rs#L479-L484 ) The second major part of the design that the line-buffering logic, implemented in `LineWriterShim`, has been updated to work slightly more directly on the internals of `BufWriter`. Mostly it makes us of the public interface—particularly `buffer()` and `get_mut()`—but it also controls the flushing of the buffer with `flush_buf` rather than `flush`, and it writes to the buffer infallibly with a new `write_to_buffer` method. This has several advantages: - Data no longer has to round trip through the `BufWriter`'s buffer. If the user provides a complete line, that line is written directly to the inner writer (after ensuring the existing buffer is flushed). - The conventional contract of `write`—that at-most 1 attempt to write new data is made—is much more cleanly upheld, because we don't have to perform fallible flushes and perform semi-complicated logic of trying to pretend errors at different stages didn't happen. Instead, after attempting to write lines directly to the buffer, we can infallibly add trailing data to the buffer without allowing any attempts to continue writing it to the `inner` writer. - Perhaps most importantly, `LineWriter` *no longer performs a full flush on every line.* This makes its behavior much more consistent with `BufWriter`, which unbuffers data to its inner writer, without trying to flush it all the way to the final device. Previously, `LineWriter` had no choice but to use `flush` to ensure that the lines were unbuffered, but by writing directly to `inner` via `get_mut()` (when appropriate), we can use a more correct behavior. ## New(ish) line buffering logic The logic for line writing has been cleaned up, as described above. It now follows this algorithm for `write`, with minor adjustments for `write_all` and `write_vectored`: - Does our input data contain a newline? - If no: - simply use the regular `BufWriter::write` to write it; this will append it to the buffer and/or flush it as necessary based on how full the buffer is and how much input data there is. - additionally, if the current buffer ends with `'\n'`, attempt to immediately flush it with `flush_buf` before calling `BufWriter::write` This reproduces the old `needs_flush` behavior and ensures completed lines are flushed as soon as possible. The reason we only check if the buffer *ends* with `'\n'` is discussed later. - If yes: - First, `flush_buf` - Then use `bufwriter.get_mut().write()` to write the input data directly to the underlying writer, up to the last newline. Make at most one attempt at this. - If it errors, return the error - If it succeeds with a full write, add the remaining data (between the last newline and the end of the input) to the buffer. In order to uphold the "at-most 1 attempt to write new data" convention, no attempts are made to write this data to the inner writer (though obviously a subsequent write may immediately flush it, e.g., if it totally filled the buffer's capacity. - If it only partially succeeds, buffer the data only up to the last newline. We do this to try to avoid writing partial lines to the inner writer where possible (that is, whenever the lines are shorter than the total buffer capacity). While it was not my intention for this behavior to diverge from this existing `LineWriter` algorithm, this updated design emerged very naturally once `LineWriter` wasn't burdened with having to only operate via `BufWriter::flush`. There essentially two main changes to observable behavior: - `flush` is no longer used to unbuffer lines. The are only written to the writer wrapped by `LineWriter`; this inner writer might do its own buffering. This change makes `LineWriter` consistent with the behavior of `BufWriter`. This is probably the most obvious user-visible change; it's the one I most expect to provoke issue reports, if any are provoked. - Unless a line exceeds the capacity of the buffer, partial lines are not unbuffered (without the user manually calling flush). This is a less surprising behavior, and is enabled because `LineWriter` now has more precise control of what data is buffered and when it is unbuffered. I'd be surprised if anyone is relying on `LineWriter` unbuffering or flushing *partial* lines that are shorter than the capacity, so I'm not worried about this one. None of these changes are inconsistent with any published documentation of `LineWriter`. Nonetheless, like all changes with user-facing behavior changes, this design will obviously have to be very carefully scrutinized. # Alternative designs and design rationalle The initial goal of this project was to provide a way for the `LineWriter` logic to be operable directly on a `BufWriter`, so that the updated `Stdout` doesn't need to do something convoluted like `enum { BufWriter, LineWriter }` (which ends up being ~~impossible~~ difficult to transition between states after being constructed). The design went through several iterations before arriving at the current draft. The major first version simply involved adding methods like `write_line_buffered` to `BufWriter`; these would contain the actual logic of line-buffered writing, and would additionally have the advantages (described above) of operating directly on the internals of `BufWriter`. The idea was that `LineWriter` would simply call these methods, and the updated `Stdout` would use either `BufWriter::write` or `BufWriter::write_line_buffered`, depending on what mode it was in. The major issue with this design is that it loses the ability to take advantage of the `io::Write` trait, which provides several useful default implementations of the various io methods, such as `write_fmt` and `write_all`, just using the core methods. For this reason, the `write_line_buffered` design was retained, but moved into a separate struct called `LineWriterShim` which operates on an `&mut LineWriter`. As part of this move, the logic was lightly retooled to not touch the innards of `BufWriter` directly, but instead to make use of the unexported helper methods like `flush_buf`. The other design evolutions were mostly related to answering questions like "how much data should be buffered", "how should partial line writes be handled", etc. As much as possible I tried to answer these by emulating the current `LineWriter` logic (which, for example, retries partial line writes on subsequent calls to `write`) while still meeting the refactor design goals. # Next steps ~Currently, this design fails a few `LineWriter` tests, mostly because they expect `LineWriter` to *fully* flush its content. There are also some changes to the way that `LineWriter` buffers data *after* writing completed lines, aimed at ensuring that partial lines are not unbuffered prematurely. I want to make sure I fully understand the intent behind these tests before I either update the test or update this design so that they pass.~ However, in the meantime I wanted to get this published so that feedback could start to accumulate on it. There's a lot of errata around how I arrived at this design that didn't really fit in this overlong document, so please ask questions about anything that confusing or unclear and hopefully I can explain more of the rationale that led to it. # Test updates This design required some tests to be updated; I've research the intent behind these tests (mostly via `git blame`) and updated them appropriately. Those changes are cataloged here. - `test_line_buffer_fail_flush`: This test was added as a regression test for rust-lang#32085, and is intended to assure that an errors from `flush` aren't propagated when preceded by a successful `write`. Because type of issue is no longer possible, because `write` calls `buffer.get_mut().write()` instead of `buffer.write(); buffer.flush();`, I'm simply removing this test entirely. Other, similar error invariants related to errors during write-retrying are handled in other test cases. - `erroneous_flush_retried`: This test was added as a regression test for rust-lang#37807, and was intended to ensure that flush-retrying (via `needs_flush`) and error-ignoring were being handled correctly (ironically, this issue was caused by the flush-error-ignoring, above). Half of that issue is not possible by design with this refactor, because we no longer make fallible i/o calls that might produce errors we have to ignore after unbuffering lines. The `should_flush` behavior is captured by checking for a trailing newline in the `LineWriter` buffer; this test now checks that behavior. - `line_vectored`: changes here were pretty minor, mostly related to when partial lines are or aren't written. The old implementation of `write_vectored` used very complicated logic to precisely determine the location of the last newline and precisely write up to that point; this required doing several consecutive fallible writes, with all the complex error handling or ignoring issues that come with it. The updated design does at-most one write of a subset of total buffers (that is, it doesn't split in the middle of a buffer), even if that means writing partial lines. One of the major advantages of the new design is that the underlying vectored write operation on the device can be taken advantage of, even with small writes, so long as they include a newline; previously these were unconditionally buffered then written. - `line_vectored_partial_and_errors`: Pretty similiar to `line_vectored`, above; this test is for basic error recovery in `write_vectored` for vectored writes. As previously discussed, the mocked behavior being tested for (errors ignored under certain circumstances) no occurs, so I've simplified the test while doing my best to retain its spirit.
…nieu Substantial refactor to the design of LineWriter # Preamble This is the first in a series of pull requests designed to move forward with rust-lang#60673 (and the related [5 year old FIXME](https://github.com/rust-lang/rust/blob/ea7181b5f7a888c2cf969ae86de7207fa5fb40aa/src/libstd/io/stdio.rs#L459-L461)), which calls for an update to `Stdout` such that it can be block-buffered rather than line-buffered under certain circumstances (such as a `tty`, or a user setting the mode with a function call). This pull request refactors the logic `LineWriter` into a `LineWriterShim`, which operates on a `BufWriter` by mutable reference, such that it is easy to invoke the line-writing logic on an existing `BufWriter` without having to construct a new `LineWriter`. Additionally, fixes rust-lang#72721 ## A note on flushing Because the word **flush** tends to be pretty overloaded in this discussion, I'm going to use the word **unbuffered** to refer to a `BufWriter` sending its data to the wrapped writer via `write`, without calling `flush` on it, and I'll be using **flushed** when referring to sending data via flush, which recursively writes the data all the way to the final sink. For example, given a `T = BufWriter<BufWriter<File>>`, saying that `T` **unbuffers** its data means that it is sent to the inner `BufWriter`, but not necessarily to the `File`, whereas saying that `T` **flushes** its data means that causes it (via `Write::flush`) to be delivered all the way to `File`. # Goals Once it became clear (for reasons described below) that the best way to approach this would involve refactoring `LineWriter` to work more directly on `BufWriter`'s internals, I established the following design goals for the refactor: - Do not duplicate logic with `BufWriter`. It's great at buffering and then unbuffering data, so use the existing logic as much as possible. - Minimize superfluous copying of data into `BufWriter`'s buffer. - Eliminate calls to `BufWriter::flush` and instead do the same thing as `BufWriter::write`, which is to only write to the wrapped writer (rather than flushing all the way down to the final data sink). - Uphold the "at-most 1 write of new data" convention of `Write::write` - Minimize or eliminate dropping errors (that is, eliminate the parts of the old design that threw away errors because `write` *must* report if any bytes were written) - As much as possible, attempt to fully flush completed lines, and *not* flush partial lines. One of the advantages of this design is that, so long as we don't encounter lines larger than the `BufWriter`'s capacity, partial lines will never be unbuffered, while completed lines will *always* be unbuffered (with subsequent calls to `LineWriter::write` retrying failed writes before processing new data. # Design There are two major & related parts of the design. First, a new internal stuct, `LineWriterShim`, is added. This struct implements all of the actual logic of line-writing in a `Write` implementation, but it only operates on an `&mut BufWriter`. This means that this shim can be constructed on-the-fly to apply line writing logic to an existing `BufWriter`. This is in fact how `LineWriter` has been updated to operate, and it is also how `Stdout` is being updated in my [development branch](https://github.com/Lucretiel/rust/tree/stdout-block-buffer) to switch which mode it wants to use at runtime. [An example of how this looks in practice](https://github.com/Lucretiel/rust/blob/f24f272df674dc7fa8941b97b45f41ad08b2199b/src/libstd/io/stdio.rs#L479-L484 ) The second major part of the design that the line-buffering logic, implemented in `LineWriterShim`, has been updated to work slightly more directly on the internals of `BufWriter`. Mostly it makes us of the public interface—particularly `buffer()` and `get_mut()`—but it also controls the flushing of the buffer with `flush_buf` rather than `flush`, and it writes to the buffer infallibly with a new `write_to_buffer` method. This has several advantages: - Data no longer has to round trip through the `BufWriter`'s buffer. If the user provides a complete line, that line is written directly to the inner writer (after ensuring the existing buffer is flushed). - The conventional contract of `write`—that at-most 1 attempt to write new data is made—is much more cleanly upheld, because we don't have to perform fallible flushes and perform semi-complicated logic of trying to pretend errors at different stages didn't happen. Instead, after attempting to write lines directly to the buffer, we can infallibly add trailing data to the buffer without allowing any attempts to continue writing it to the `inner` writer. - Perhaps most importantly, `LineWriter` *no longer performs a full flush on every line.* This makes its behavior much more consistent with `BufWriter`, which unbuffers data to its inner writer, without trying to flush it all the way to the final device. Previously, `LineWriter` had no choice but to use `flush` to ensure that the lines were unbuffered, but by writing directly to `inner` via `get_mut()` (when appropriate), we can use a more correct behavior. ## New(ish) line buffering logic The logic for line writing has been cleaned up, as described above. It now follows this algorithm for `write`, with minor adjustments for `write_all` and `write_vectored`: - Does our input data contain a newline? - If no: - simply use the regular `BufWriter::write` to write it; this will append it to the buffer and/or flush it as necessary based on how full the buffer is and how much input data there is. - additionally, if the current buffer ends with `'\n'`, attempt to immediately flush it with `flush_buf` before calling `BufWriter::write` This reproduces the old `needs_flush` behavior and ensures completed lines are flushed as soon as possible. The reason we only check if the buffer *ends* with `'\n'` is discussed later. - If yes: - First, `flush_buf` - Then use `bufwriter.get_mut().write()` to write the input data directly to the underlying writer, up to the last newline. Make at most one attempt at this. - If it errors, return the error - If it succeeds with a full write, add the remaining data (between the last newline and the end of the input) to the buffer. In order to uphold the "at-most 1 attempt to write new data" convention, no attempts are made to write this data to the inner writer (though obviously a subsequent write may immediately flush it, e.g., if it totally filled the buffer's capacity. - If it only partially succeeds, buffer the data only up to the last newline. We do this to try to avoid writing partial lines to the inner writer where possible (that is, whenever the lines are shorter than the total buffer capacity). While it was not my intention for this behavior to diverge from this existing `LineWriter` algorithm, this updated design emerged very naturally once `LineWriter` wasn't burdened with having to only operate via `BufWriter::flush`. There essentially two main changes to observable behavior: - `flush` is no longer used to unbuffer lines. The are only written to the writer wrapped by `LineWriter`; this inner writer might do its own buffering. This change makes `LineWriter` consistent with the behavior of `BufWriter`. This is probably the most obvious user-visible change; it's the one I most expect to provoke issue reports, if any are provoked. - Unless a line exceeds the capacity of the buffer, partial lines are not unbuffered (without the user manually calling flush). This is a less surprising behavior, and is enabled because `LineWriter` now has more precise control of what data is buffered and when it is unbuffered. I'd be surprised if anyone is relying on `LineWriter` unbuffering or flushing *partial* lines that are shorter than the capacity, so I'm not worried about this one. None of these changes are inconsistent with any published documentation of `LineWriter`. Nonetheless, like all changes with user-facing behavior changes, this design will obviously have to be very carefully scrutinized. # Alternative designs and design rationalle The initial goal of this project was to provide a way for the `LineWriter` logic to be operable directly on a `BufWriter`, so that the updated `Stdout` doesn't need to do something convoluted like `enum { BufWriter, LineWriter }` (which ends up being ~~impossible~~ difficult to transition between states after being constructed). The design went through several iterations before arriving at the current draft. The major first version simply involved adding methods like `write_line_buffered` to `BufWriter`; these would contain the actual logic of line-buffered writing, and would additionally have the advantages (described above) of operating directly on the internals of `BufWriter`. The idea was that `LineWriter` would simply call these methods, and the updated `Stdout` would use either `BufWriter::write` or `BufWriter::write_line_buffered`, depending on what mode it was in. The major issue with this design is that it loses the ability to take advantage of the `io::Write` trait, which provides several useful default implementations of the various io methods, such as `write_fmt` and `write_all`, just using the core methods. For this reason, the `write_line_buffered` design was retained, but moved into a separate struct called `LineWriterShim` which operates on an `&mut LineWriter`. As part of this move, the logic was lightly retooled to not touch the innards of `BufWriter` directly, but instead to make use of the unexported helper methods like `flush_buf`. The other design evolutions were mostly related to answering questions like "how much data should be buffered", "how should partial line writes be handled", etc. As much as possible I tried to answer these by emulating the current `LineWriter` logic (which, for example, retries partial line writes on subsequent calls to `write`) while still meeting the refactor design goals. # Next steps ~Currently, this design fails a few `LineWriter` tests, mostly because they expect `LineWriter` to *fully* flush its content. There are also some changes to the way that `LineWriter` buffers data *after* writing completed lines, aimed at ensuring that partial lines are not unbuffered prematurely. I want to make sure I fully understand the intent behind these tests before I either update the test or update this design so that they pass.~ However, in the meantime I wanted to get this published so that feedback could start to accumulate on it. There's a lot of errata around how I arrived at this design that didn't really fit in this overlong document, so please ask questions about anything that confusing or unclear and hopefully I can explain more of the rationale that led to it. # Test updates This design required some tests to be updated; I've research the intent behind these tests (mostly via `git blame`) and updated them appropriately. Those changes are cataloged here. - `test_line_buffer_fail_flush`: This test was added as a regression test for rust-lang#32085, and is intended to assure that an errors from `flush` aren't propagated when preceded by a successful `write`. Because type of issue is no longer possible, because `write` calls `buffer.get_mut().write()` instead of `buffer.write(); buffer.flush();`, I'm simply removing this test entirely. Other, similar error invariants related to errors during write-retrying are handled in other test cases. - `erroneous_flush_retried`: This test was added as a regression test for rust-lang#37807, and was intended to ensure that flush-retrying (via `needs_flush`) and error-ignoring were being handled correctly (ironically, this issue was caused by the flush-error-ignoring, above). Half of that issue is not possible by design with this refactor, because we no longer make fallible i/o calls that might produce errors we have to ignore after unbuffering lines. The `should_flush` behavior is captured by checking for a trailing newline in the `LineWriter` buffer; this test now checks that behavior. - `line_vectored`: changes here were pretty minor, mostly related to when partial lines are or aren't written. The old implementation of `write_vectored` used very complicated logic to precisely determine the location of the last newline and precisely write up to that point; this required doing several consecutive fallible writes, with all the complex error handling or ignoring issues that come with it. The updated design does at-most one write of a subset of total buffers (that is, it doesn't split in the middle of a buffer), even if that means writing partial lines. One of the major advantages of the new design is that the underlying vectored write operation on the device can be taken advantage of, even with small writes, so long as they include a newline; previously these were unconditionally buffered then written. - `line_vectored_partial_and_errors`: Pretty similiar to `line_vectored`, above; this test is for basic error recovery in `write_vectored` for vectored writes. As previously discussed, the mocked behavior being tested for (errors ignored under certain circumstances) no occurs, so I've simplified the test while doing my best to retain its spirit.
…nieu Substantial refactor to the design of LineWriter # Preamble This is the first in a series of pull requests designed to move forward with rust-lang#60673 (and the related [5 year old FIXME](https://github.com/rust-lang/rust/blob/ea7181b5f7a888c2cf969ae86de7207fa5fb40aa/src/libstd/io/stdio.rs#L459-L461)), which calls for an update to `Stdout` such that it can be block-buffered rather than line-buffered under certain circumstances (such as a `tty`, or a user setting the mode with a function call). This pull request refactors the logic `LineWriter` into a `LineWriterShim`, which operates on a `BufWriter` by mutable reference, such that it is easy to invoke the line-writing logic on an existing `BufWriter` without having to construct a new `LineWriter`. Additionally, fixes rust-lang#72721 ## A note on flushing Because the word **flush** tends to be pretty overloaded in this discussion, I'm going to use the word **unbuffered** to refer to a `BufWriter` sending its data to the wrapped writer via `write`, without calling `flush` on it, and I'll be using **flushed** when referring to sending data via flush, which recursively writes the data all the way to the final sink. For example, given a `T = BufWriter<BufWriter<File>>`, saying that `T` **unbuffers** its data means that it is sent to the inner `BufWriter`, but not necessarily to the `File`, whereas saying that `T` **flushes** its data means that causes it (via `Write::flush`) to be delivered all the way to `File`. # Goals Once it became clear (for reasons described below) that the best way to approach this would involve refactoring `LineWriter` to work more directly on `BufWriter`'s internals, I established the following design goals for the refactor: - Do not duplicate logic with `BufWriter`. It's great at buffering and then unbuffering data, so use the existing logic as much as possible. - Minimize superfluous copying of data into `BufWriter`'s buffer. - Eliminate calls to `BufWriter::flush` and instead do the same thing as `BufWriter::write`, which is to only write to the wrapped writer (rather than flushing all the way down to the final data sink). - Uphold the "at-most 1 write of new data" convention of `Write::write` - Minimize or eliminate dropping errors (that is, eliminate the parts of the old design that threw away errors because `write` *must* report if any bytes were written) - As much as possible, attempt to fully flush completed lines, and *not* flush partial lines. One of the advantages of this design is that, so long as we don't encounter lines larger than the `BufWriter`'s capacity, partial lines will never be unbuffered, while completed lines will *always* be unbuffered (with subsequent calls to `LineWriter::write` retrying failed writes before processing new data. # Design There are two major & related parts of the design. First, a new internal stuct, `LineWriterShim`, is added. This struct implements all of the actual logic of line-writing in a `Write` implementation, but it only operates on an `&mut BufWriter`. This means that this shim can be constructed on-the-fly to apply line writing logic to an existing `BufWriter`. This is in fact how `LineWriter` has been updated to operate, and it is also how `Stdout` is being updated in my [development branch](https://github.com/Lucretiel/rust/tree/stdout-block-buffer) to switch which mode it wants to use at runtime. [An example of how this looks in practice](https://github.com/Lucretiel/rust/blob/f24f272df674dc7fa8941b97b45f41ad08b2199b/src/libstd/io/stdio.rs#L479-L484 ) The second major part of the design that the line-buffering logic, implemented in `LineWriterShim`, has been updated to work slightly more directly on the internals of `BufWriter`. Mostly it makes us of the public interface—particularly `buffer()` and `get_mut()`—but it also controls the flushing of the buffer with `flush_buf` rather than `flush`, and it writes to the buffer infallibly with a new `write_to_buffer` method. This has several advantages: - Data no longer has to round trip through the `BufWriter`'s buffer. If the user provides a complete line, that line is written directly to the inner writer (after ensuring the existing buffer is flushed). - The conventional contract of `write`—that at-most 1 attempt to write new data is made—is much more cleanly upheld, because we don't have to perform fallible flushes and perform semi-complicated logic of trying to pretend errors at different stages didn't happen. Instead, after attempting to write lines directly to the buffer, we can infallibly add trailing data to the buffer without allowing any attempts to continue writing it to the `inner` writer. - Perhaps most importantly, `LineWriter` *no longer performs a full flush on every line.* This makes its behavior much more consistent with `BufWriter`, which unbuffers data to its inner writer, without trying to flush it all the way to the final device. Previously, `LineWriter` had no choice but to use `flush` to ensure that the lines were unbuffered, but by writing directly to `inner` via `get_mut()` (when appropriate), we can use a more correct behavior. ## New(ish) line buffering logic The logic for line writing has been cleaned up, as described above. It now follows this algorithm for `write`, with minor adjustments for `write_all` and `write_vectored`: - Does our input data contain a newline? - If no: - simply use the regular `BufWriter::write` to write it; this will append it to the buffer and/or flush it as necessary based on how full the buffer is and how much input data there is. - additionally, if the current buffer ends with `'\n'`, attempt to immediately flush it with `flush_buf` before calling `BufWriter::write` This reproduces the old `needs_flush` behavior and ensures completed lines are flushed as soon as possible. The reason we only check if the buffer *ends* with `'\n'` is discussed later. - If yes: - First, `flush_buf` - Then use `bufwriter.get_mut().write()` to write the input data directly to the underlying writer, up to the last newline. Make at most one attempt at this. - If it errors, return the error - If it succeeds with a full write, add the remaining data (between the last newline and the end of the input) to the buffer. In order to uphold the "at-most 1 attempt to write new data" convention, no attempts are made to write this data to the inner writer (though obviously a subsequent write may immediately flush it, e.g., if it totally filled the buffer's capacity. - If it only partially succeeds, buffer the data only up to the last newline. We do this to try to avoid writing partial lines to the inner writer where possible (that is, whenever the lines are shorter than the total buffer capacity). While it was not my intention for this behavior to diverge from this existing `LineWriter` algorithm, this updated design emerged very naturally once `LineWriter` wasn't burdened with having to only operate via `BufWriter::flush`. There essentially two main changes to observable behavior: - `flush` is no longer used to unbuffer lines. The are only written to the writer wrapped by `LineWriter`; this inner writer might do its own buffering. This change makes `LineWriter` consistent with the behavior of `BufWriter`. This is probably the most obvious user-visible change; it's the one I most expect to provoke issue reports, if any are provoked. - Unless a line exceeds the capacity of the buffer, partial lines are not unbuffered (without the user manually calling flush). This is a less surprising behavior, and is enabled because `LineWriter` now has more precise control of what data is buffered and when it is unbuffered. I'd be surprised if anyone is relying on `LineWriter` unbuffering or flushing *partial* lines that are shorter than the capacity, so I'm not worried about this one. None of these changes are inconsistent with any published documentation of `LineWriter`. Nonetheless, like all changes with user-facing behavior changes, this design will obviously have to be very carefully scrutinized. # Alternative designs and design rationalle The initial goal of this project was to provide a way for the `LineWriter` logic to be operable directly on a `BufWriter`, so that the updated `Stdout` doesn't need to do something convoluted like `enum { BufWriter, LineWriter }` (which ends up being ~~impossible~~ difficult to transition between states after being constructed). The design went through several iterations before arriving at the current draft. The major first version simply involved adding methods like `write_line_buffered` to `BufWriter`; these would contain the actual logic of line-buffered writing, and would additionally have the advantages (described above) of operating directly on the internals of `BufWriter`. The idea was that `LineWriter` would simply call these methods, and the updated `Stdout` would use either `BufWriter::write` or `BufWriter::write_line_buffered`, depending on what mode it was in. The major issue with this design is that it loses the ability to take advantage of the `io::Write` trait, which provides several useful default implementations of the various io methods, such as `write_fmt` and `write_all`, just using the core methods. For this reason, the `write_line_buffered` design was retained, but moved into a separate struct called `LineWriterShim` which operates on an `&mut LineWriter`. As part of this move, the logic was lightly retooled to not touch the innards of `BufWriter` directly, but instead to make use of the unexported helper methods like `flush_buf`. The other design evolutions were mostly related to answering questions like "how much data should be buffered", "how should partial line writes be handled", etc. As much as possible I tried to answer these by emulating the current `LineWriter` logic (which, for example, retries partial line writes on subsequent calls to `write`) while still meeting the refactor design goals. # Next steps ~Currently, this design fails a few `LineWriter` tests, mostly because they expect `LineWriter` to *fully* flush its content. There are also some changes to the way that `LineWriter` buffers data *after* writing completed lines, aimed at ensuring that partial lines are not unbuffered prematurely. I want to make sure I fully understand the intent behind these tests before I either update the test or update this design so that they pass.~ However, in the meantime I wanted to get this published so that feedback could start to accumulate on it. There's a lot of errata around how I arrived at this design that didn't really fit in this overlong document, so please ask questions about anything that confusing or unclear and hopefully I can explain more of the rationale that led to it. # Test updates This design required some tests to be updated; I've research the intent behind these tests (mostly via `git blame`) and updated them appropriately. Those changes are cataloged here. - `test_line_buffer_fail_flush`: This test was added as a regression test for rust-lang#32085, and is intended to assure that an errors from `flush` aren't propagated when preceded by a successful `write`. Because type of issue is no longer possible, because `write` calls `buffer.get_mut().write()` instead of `buffer.write(); buffer.flush();`, I'm simply removing this test entirely. Other, similar error invariants related to errors during write-retrying are handled in other test cases. - `erroneous_flush_retried`: This test was added as a regression test for rust-lang#37807, and was intended to ensure that flush-retrying (via `needs_flush`) and error-ignoring were being handled correctly (ironically, this issue was caused by the flush-error-ignoring, above). Half of that issue is not possible by design with this refactor, because we no longer make fallible i/o calls that might produce errors we have to ignore after unbuffering lines. The `should_flush` behavior is captured by checking for a trailing newline in the `LineWriter` buffer; this test now checks that behavior. - `line_vectored`: changes here were pretty minor, mostly related to when partial lines are or aren't written. The old implementation of `write_vectored` used very complicated logic to precisely determine the location of the last newline and precisely write up to that point; this required doing several consecutive fallible writes, with all the complex error handling or ignoring issues that come with it. The updated design does at-most one write of a subset of total buffers (that is, it doesn't split in the middle of a buffer), even if that means writing partial lines. One of the major advantages of the new design is that the underlying vectored write operation on the device can be taken advantage of, even with small writes, so long as they include a newline; previously these were unconditionally buffered then written. - `line_vectored_partial_and_errors`: Pretty similiar to `line_vectored`, above; this test is for basic error recovery in `write_vectored` for vectored writes. As previously discussed, the mocked behavior being tested for (errors ignored under certain circumstances) no occurs, so I've simplified the test while doing my best to retain its spirit.
…nieu Substantial refactor to the design of LineWriter # Preamble This is the first in a series of pull requests designed to move forward with rust-lang#60673 (and the related [5 year old FIXME](https://github.com/rust-lang/rust/blob/ea7181b5f7a888c2cf969ae86de7207fa5fb40aa/src/libstd/io/stdio.rs#L459-L461)), which calls for an update to `Stdout` such that it can be block-buffered rather than line-buffered under certain circumstances (such as a `tty`, or a user setting the mode with a function call). This pull request refactors the logic `LineWriter` into a `LineWriterShim`, which operates on a `BufWriter` by mutable reference, such that it is easy to invoke the line-writing logic on an existing `BufWriter` without having to construct a new `LineWriter`. Additionally, fixes rust-lang#72721 ## A note on flushing Because the word **flush** tends to be pretty overloaded in this discussion, I'm going to use the word **unbuffered** to refer to a `BufWriter` sending its data to the wrapped writer via `write`, without calling `flush` on it, and I'll be using **flushed** when referring to sending data via flush, which recursively writes the data all the way to the final sink. For example, given a `T = BufWriter<BufWriter<File>>`, saying that `T` **unbuffers** its data means that it is sent to the inner `BufWriter`, but not necessarily to the `File`, whereas saying that `T` **flushes** its data means that causes it (via `Write::flush`) to be delivered all the way to `File`. # Goals Once it became clear (for reasons described below) that the best way to approach this would involve refactoring `LineWriter` to work more directly on `BufWriter`'s internals, I established the following design goals for the refactor: - Do not duplicate logic with `BufWriter`. It's great at buffering and then unbuffering data, so use the existing logic as much as possible. - Minimize superfluous copying of data into `BufWriter`'s buffer. - Eliminate calls to `BufWriter::flush` and instead do the same thing as `BufWriter::write`, which is to only write to the wrapped writer (rather than flushing all the way down to the final data sink). - Uphold the "at-most 1 write of new data" convention of `Write::write` - Minimize or eliminate dropping errors (that is, eliminate the parts of the old design that threw away errors because `write` *must* report if any bytes were written) - As much as possible, attempt to fully flush completed lines, and *not* flush partial lines. One of the advantages of this design is that, so long as we don't encounter lines larger than the `BufWriter`'s capacity, partial lines will never be unbuffered, while completed lines will *always* be unbuffered (with subsequent calls to `LineWriter::write` retrying failed writes before processing new data. # Design There are two major & related parts of the design. First, a new internal stuct, `LineWriterShim`, is added. This struct implements all of the actual logic of line-writing in a `Write` implementation, but it only operates on an `&mut BufWriter`. This means that this shim can be constructed on-the-fly to apply line writing logic to an existing `BufWriter`. This is in fact how `LineWriter` has been updated to operate, and it is also how `Stdout` is being updated in my [development branch](https://github.com/Lucretiel/rust/tree/stdout-block-buffer) to switch which mode it wants to use at runtime. [An example of how this looks in practice](https://github.com/Lucretiel/rust/blob/f24f272df674dc7fa8941b97b45f41ad08b2199b/src/libstd/io/stdio.rs#L479-L484 ) The second major part of the design that the line-buffering logic, implemented in `LineWriterShim`, has been updated to work slightly more directly on the internals of `BufWriter`. Mostly it makes us of the public interface—particularly `buffer()` and `get_mut()`—but it also controls the flushing of the buffer with `flush_buf` rather than `flush`, and it writes to the buffer infallibly with a new `write_to_buffer` method. This has several advantages: - Data no longer has to round trip through the `BufWriter`'s buffer. If the user provides a complete line, that line is written directly to the inner writer (after ensuring the existing buffer is flushed). - The conventional contract of `write`—that at-most 1 attempt to write new data is made—is much more cleanly upheld, because we don't have to perform fallible flushes and perform semi-complicated logic of trying to pretend errors at different stages didn't happen. Instead, after attempting to write lines directly to the buffer, we can infallibly add trailing data to the buffer without allowing any attempts to continue writing it to the `inner` writer. - Perhaps most importantly, `LineWriter` *no longer performs a full flush on every line.* This makes its behavior much more consistent with `BufWriter`, which unbuffers data to its inner writer, without trying to flush it all the way to the final device. Previously, `LineWriter` had no choice but to use `flush` to ensure that the lines were unbuffered, but by writing directly to `inner` via `get_mut()` (when appropriate), we can use a more correct behavior. ## New(ish) line buffering logic The logic for line writing has been cleaned up, as described above. It now follows this algorithm for `write`, with minor adjustments for `write_all` and `write_vectored`: - Does our input data contain a newline? - If no: - simply use the regular `BufWriter::write` to write it; this will append it to the buffer and/or flush it as necessary based on how full the buffer is and how much input data there is. - additionally, if the current buffer ends with `'\n'`, attempt to immediately flush it with `flush_buf` before calling `BufWriter::write` This reproduces the old `needs_flush` behavior and ensures completed lines are flushed as soon as possible. The reason we only check if the buffer *ends* with `'\n'` is discussed later. - If yes: - First, `flush_buf` - Then use `bufwriter.get_mut().write()` to write the input data directly to the underlying writer, up to the last newline. Make at most one attempt at this. - If it errors, return the error - If it succeeds with a full write, add the remaining data (between the last newline and the end of the input) to the buffer. In order to uphold the "at-most 1 attempt to write new data" convention, no attempts are made to write this data to the inner writer (though obviously a subsequent write may immediately flush it, e.g., if it totally filled the buffer's capacity. - If it only partially succeeds, buffer the data only up to the last newline. We do this to try to avoid writing partial lines to the inner writer where possible (that is, whenever the lines are shorter than the total buffer capacity). While it was not my intention for this behavior to diverge from this existing `LineWriter` algorithm, this updated design emerged very naturally once `LineWriter` wasn't burdened with having to only operate via `BufWriter::flush`. There essentially two main changes to observable behavior: - `flush` is no longer used to unbuffer lines. The are only written to the writer wrapped by `LineWriter`; this inner writer might do its own buffering. This change makes `LineWriter` consistent with the behavior of `BufWriter`. This is probably the most obvious user-visible change; it's the one I most expect to provoke issue reports, if any are provoked. - Unless a line exceeds the capacity of the buffer, partial lines are not unbuffered (without the user manually calling flush). This is a less surprising behavior, and is enabled because `LineWriter` now has more precise control of what data is buffered and when it is unbuffered. I'd be surprised if anyone is relying on `LineWriter` unbuffering or flushing *partial* lines that are shorter than the capacity, so I'm not worried about this one. None of these changes are inconsistent with any published documentation of `LineWriter`. Nonetheless, like all changes with user-facing behavior changes, this design will obviously have to be very carefully scrutinized. # Alternative designs and design rationalle The initial goal of this project was to provide a way for the `LineWriter` logic to be operable directly on a `BufWriter`, so that the updated `Stdout` doesn't need to do something convoluted like `enum { BufWriter, LineWriter }` (which ends up being ~~impossible~~ difficult to transition between states after being constructed). The design went through several iterations before arriving at the current draft. The major first version simply involved adding methods like `write_line_buffered` to `BufWriter`; these would contain the actual logic of line-buffered writing, and would additionally have the advantages (described above) of operating directly on the internals of `BufWriter`. The idea was that `LineWriter` would simply call these methods, and the updated `Stdout` would use either `BufWriter::write` or `BufWriter::write_line_buffered`, depending on what mode it was in. The major issue with this design is that it loses the ability to take advantage of the `io::Write` trait, which provides several useful default implementations of the various io methods, such as `write_fmt` and `write_all`, just using the core methods. For this reason, the `write_line_buffered` design was retained, but moved into a separate struct called `LineWriterShim` which operates on an `&mut LineWriter`. As part of this move, the logic was lightly retooled to not touch the innards of `BufWriter` directly, but instead to make use of the unexported helper methods like `flush_buf`. The other design evolutions were mostly related to answering questions like "how much data should be buffered", "how should partial line writes be handled", etc. As much as possible I tried to answer these by emulating the current `LineWriter` logic (which, for example, retries partial line writes on subsequent calls to `write`) while still meeting the refactor design goals. # Next steps ~Currently, this design fails a few `LineWriter` tests, mostly because they expect `LineWriter` to *fully* flush its content. There are also some changes to the way that `LineWriter` buffers data *after* writing completed lines, aimed at ensuring that partial lines are not unbuffered prematurely. I want to make sure I fully understand the intent behind these tests before I either update the test or update this design so that they pass.~ However, in the meantime I wanted to get this published so that feedback could start to accumulate on it. There's a lot of errata around how I arrived at this design that didn't really fit in this overlong document, so please ask questions about anything that confusing or unclear and hopefully I can explain more of the rationale that led to it. # Test updates This design required some tests to be updated; I've research the intent behind these tests (mostly via `git blame`) and updated them appropriately. Those changes are cataloged here. - `test_line_buffer_fail_flush`: This test was added as a regression test for rust-lang#32085, and is intended to assure that an errors from `flush` aren't propagated when preceded by a successful `write`. Because type of issue is no longer possible, because `write` calls `buffer.get_mut().write()` instead of `buffer.write(); buffer.flush();`, I'm simply removing this test entirely. Other, similar error invariants related to errors during write-retrying are handled in other test cases. - `erroneous_flush_retried`: This test was added as a regression test for rust-lang#37807, and was intended to ensure that flush-retrying (via `needs_flush`) and error-ignoring were being handled correctly (ironically, this issue was caused by the flush-error-ignoring, above). Half of that issue is not possible by design with this refactor, because we no longer make fallible i/o calls that might produce errors we have to ignore after unbuffering lines. The `should_flush` behavior is captured by checking for a trailing newline in the `LineWriter` buffer; this test now checks that behavior. - `line_vectored`: changes here were pretty minor, mostly related to when partial lines are or aren't written. The old implementation of `write_vectored` used very complicated logic to precisely determine the location of the last newline and precisely write up to that point; this required doing several consecutive fallible writes, with all the complex error handling or ignoring issues that come with it. The updated design does at-most one write of a subset of total buffers (that is, it doesn't split in the middle of a buffer), even if that means writing partial lines. One of the major advantages of the new design is that the underlying vectored write operation on the device can be taken advantage of, even with small writes, so long as they include a newline; previously these were unconditionally buffered then written. - `line_vectored_partial_and_errors`: Pretty similiar to `line_vectored`, above; this test is for basic error recovery in `write_vectored` for vectored writes. As previously discussed, the mocked behavior being tested for (errors ignored under certain circumstances) no occurs, so I've simplified the test while doing my best to retain its spirit.
…nieu Substantial refactor to the design of LineWriter # Preamble This is the first in a series of pull requests designed to move forward with rust-lang#60673 (and the related [5 year old FIXME](https://github.com/rust-lang/rust/blob/ea7181b5f7a888c2cf969ae86de7207fa5fb40aa/src/libstd/io/stdio.rs#L459-L461)), which calls for an update to `Stdout` such that it can be block-buffered rather than line-buffered under certain circumstances (such as a `tty`, or a user setting the mode with a function call). This pull request refactors the logic `LineWriter` into a `LineWriterShim`, which operates on a `BufWriter` by mutable reference, such that it is easy to invoke the line-writing logic on an existing `BufWriter` without having to construct a new `LineWriter`. Additionally, fixes rust-lang#72721 ## A note on flushing Because the word **flush** tends to be pretty overloaded in this discussion, I'm going to use the word **unbuffered** to refer to a `BufWriter` sending its data to the wrapped writer via `write`, without calling `flush` on it, and I'll be using **flushed** when referring to sending data via flush, which recursively writes the data all the way to the final sink. For example, given a `T = BufWriter<BufWriter<File>>`, saying that `T` **unbuffers** its data means that it is sent to the inner `BufWriter`, but not necessarily to the `File`, whereas saying that `T` **flushes** its data means that causes it (via `Write::flush`) to be delivered all the way to `File`. # Goals Once it became clear (for reasons described below) that the best way to approach this would involve refactoring `LineWriter` to work more directly on `BufWriter`'s internals, I established the following design goals for the refactor: - Do not duplicate logic with `BufWriter`. It's great at buffering and then unbuffering data, so use the existing logic as much as possible. - Minimize superfluous copying of data into `BufWriter`'s buffer. - Eliminate calls to `BufWriter::flush` and instead do the same thing as `BufWriter::write`, which is to only write to the wrapped writer (rather than flushing all the way down to the final data sink). - Uphold the "at-most 1 write of new data" convention of `Write::write` - Minimize or eliminate dropping errors (that is, eliminate the parts of the old design that threw away errors because `write` *must* report if any bytes were written) - As much as possible, attempt to fully flush completed lines, and *not* flush partial lines. One of the advantages of this design is that, so long as we don't encounter lines larger than the `BufWriter`'s capacity, partial lines will never be unbuffered, while completed lines will *always* be unbuffered (with subsequent calls to `LineWriter::write` retrying failed writes before processing new data. # Design There are two major & related parts of the design. First, a new internal stuct, `LineWriterShim`, is added. This struct implements all of the actual logic of line-writing in a `Write` implementation, but it only operates on an `&mut BufWriter`. This means that this shim can be constructed on-the-fly to apply line writing logic to an existing `BufWriter`. This is in fact how `LineWriter` has been updated to operate, and it is also how `Stdout` is being updated in my [development branch](https://github.com/Lucretiel/rust/tree/stdout-block-buffer) to switch which mode it wants to use at runtime. [An example of how this looks in practice](https://github.com/Lucretiel/rust/blob/f24f272df674dc7fa8941b97b45f41ad08b2199b/src/libstd/io/stdio.rs#L479-L484 ) The second major part of the design that the line-buffering logic, implemented in `LineWriterShim`, has been updated to work slightly more directly on the internals of `BufWriter`. Mostly it makes us of the public interface—particularly `buffer()` and `get_mut()`—but it also controls the flushing of the buffer with `flush_buf` rather than `flush`, and it writes to the buffer infallibly with a new `write_to_buffer` method. This has several advantages: - Data no longer has to round trip through the `BufWriter`'s buffer. If the user provides a complete line, that line is written directly to the inner writer (after ensuring the existing buffer is flushed). - The conventional contract of `write`—that at-most 1 attempt to write new data is made—is much more cleanly upheld, because we don't have to perform fallible flushes and perform semi-complicated logic of trying to pretend errors at different stages didn't happen. Instead, after attempting to write lines directly to the buffer, we can infallibly add trailing data to the buffer without allowing any attempts to continue writing it to the `inner` writer. - Perhaps most importantly, `LineWriter` *no longer performs a full flush on every line.* This makes its behavior much more consistent with `BufWriter`, which unbuffers data to its inner writer, without trying to flush it all the way to the final device. Previously, `LineWriter` had no choice but to use `flush` to ensure that the lines were unbuffered, but by writing directly to `inner` via `get_mut()` (when appropriate), we can use a more correct behavior. ## New(ish) line buffering logic The logic for line writing has been cleaned up, as described above. It now follows this algorithm for `write`, with minor adjustments for `write_all` and `write_vectored`: - Does our input data contain a newline? - If no: - simply use the regular `BufWriter::write` to write it; this will append it to the buffer and/or flush it as necessary based on how full the buffer is and how much input data there is. - additionally, if the current buffer ends with `'\n'`, attempt to immediately flush it with `flush_buf` before calling `BufWriter::write` This reproduces the old `needs_flush` behavior and ensures completed lines are flushed as soon as possible. The reason we only check if the buffer *ends* with `'\n'` is discussed later. - If yes: - First, `flush_buf` - Then use `bufwriter.get_mut().write()` to write the input data directly to the underlying writer, up to the last newline. Make at most one attempt at this. - If it errors, return the error - If it succeeds with a full write, add the remaining data (between the last newline and the end of the input) to the buffer. In order to uphold the "at-most 1 attempt to write new data" convention, no attempts are made to write this data to the inner writer (though obviously a subsequent write may immediately flush it, e.g., if it totally filled the buffer's capacity. - If it only partially succeeds, buffer the data only up to the last newline. We do this to try to avoid writing partial lines to the inner writer where possible (that is, whenever the lines are shorter than the total buffer capacity). While it was not my intention for this behavior to diverge from this existing `LineWriter` algorithm, this updated design emerged very naturally once `LineWriter` wasn't burdened with having to only operate via `BufWriter::flush`. There essentially two main changes to observable behavior: - `flush` is no longer used to unbuffer lines. The are only written to the writer wrapped by `LineWriter`; this inner writer might do its own buffering. This change makes `LineWriter` consistent with the behavior of `BufWriter`. This is probably the most obvious user-visible change; it's the one I most expect to provoke issue reports, if any are provoked. - Unless a line exceeds the capacity of the buffer, partial lines are not unbuffered (without the user manually calling flush). This is a less surprising behavior, and is enabled because `LineWriter` now has more precise control of what data is buffered and when it is unbuffered. I'd be surprised if anyone is relying on `LineWriter` unbuffering or flushing *partial* lines that are shorter than the capacity, so I'm not worried about this one. None of these changes are inconsistent with any published documentation of `LineWriter`. Nonetheless, like all changes with user-facing behavior changes, this design will obviously have to be very carefully scrutinized. # Alternative designs and design rationalle The initial goal of this project was to provide a way for the `LineWriter` logic to be operable directly on a `BufWriter`, so that the updated `Stdout` doesn't need to do something convoluted like `enum { BufWriter, LineWriter }` (which ends up being ~~impossible~~ difficult to transition between states after being constructed). The design went through several iterations before arriving at the current draft. The major first version simply involved adding methods like `write_line_buffered` to `BufWriter`; these would contain the actual logic of line-buffered writing, and would additionally have the advantages (described above) of operating directly on the internals of `BufWriter`. The idea was that `LineWriter` would simply call these methods, and the updated `Stdout` would use either `BufWriter::write` or `BufWriter::write_line_buffered`, depending on what mode it was in. The major issue with this design is that it loses the ability to take advantage of the `io::Write` trait, which provides several useful default implementations of the various io methods, such as `write_fmt` and `write_all`, just using the core methods. For this reason, the `write_line_buffered` design was retained, but moved into a separate struct called `LineWriterShim` which operates on an `&mut LineWriter`. As part of this move, the logic was lightly retooled to not touch the innards of `BufWriter` directly, but instead to make use of the unexported helper methods like `flush_buf`. The other design evolutions were mostly related to answering questions like "how much data should be buffered", "how should partial line writes be handled", etc. As much as possible I tried to answer these by emulating the current `LineWriter` logic (which, for example, retries partial line writes on subsequent calls to `write`) while still meeting the refactor design goals. # Next steps ~Currently, this design fails a few `LineWriter` tests, mostly because they expect `LineWriter` to *fully* flush its content. There are also some changes to the way that `LineWriter` buffers data *after* writing completed lines, aimed at ensuring that partial lines are not unbuffered prematurely. I want to make sure I fully understand the intent behind these tests before I either update the test or update this design so that they pass.~ However, in the meantime I wanted to get this published so that feedback could start to accumulate on it. There's a lot of errata around how I arrived at this design that didn't really fit in this overlong document, so please ask questions about anything that confusing or unclear and hopefully I can explain more of the rationale that led to it. # Test updates This design required some tests to be updated; I've research the intent behind these tests (mostly via `git blame`) and updated them appropriately. Those changes are cataloged here. - `test_line_buffer_fail_flush`: This test was added as a regression test for rust-lang#32085, and is intended to assure that an errors from `flush` aren't propagated when preceded by a successful `write`. Because type of issue is no longer possible, because `write` calls `buffer.get_mut().write()` instead of `buffer.write(); buffer.flush();`, I'm simply removing this test entirely. Other, similar error invariants related to errors during write-retrying are handled in other test cases. - `erroneous_flush_retried`: This test was added as a regression test for rust-lang#37807, and was intended to ensure that flush-retrying (via `needs_flush`) and error-ignoring were being handled correctly (ironically, this issue was caused by the flush-error-ignoring, above). Half of that issue is not possible by design with this refactor, because we no longer make fallible i/o calls that might produce errors we have to ignore after unbuffering lines. The `should_flush` behavior is captured by checking for a trailing newline in the `LineWriter` buffer; this test now checks that behavior. - `line_vectored`: changes here were pretty minor, mostly related to when partial lines are or aren't written. The old implementation of `write_vectored` used very complicated logic to precisely determine the location of the last newline and precisely write up to that point; this required doing several consecutive fallible writes, with all the complex error handling or ignoring issues that come with it. The updated design does at-most one write of a subset of total buffers (that is, it doesn't split in the middle of a buffer), even if that means writing partial lines. One of the major advantages of the new design is that the underlying vectored write operation on the device can be taken advantage of, even with small writes, so long as they include a newline; previously these were unconditionally buffered then written. - `line_vectored_partial_and_errors`: Pretty similiar to `line_vectored`, above; this test is for basic error recovery in `write_vectored` for vectored writes. As previously discussed, the mocked behavior being tested for (errors ignored under certain circumstances) no occurs, so I've simplified the test while doing my best to retain its spirit.
Substantial refactor to the design of LineWriter # Preamble This is the first in a series of pull requests designed to move forward with rust-lang#60673 (and the related [5 year old FIXME](https://github.com/rust-lang/rust/blob/ea7181b5f7a888c2cf969ae86de7207fa5fb40aa/src/libstd/io/stdio.rs#L459-L461)), which calls for an update to `Stdout` such that it can be block-buffered rather than line-buffered under certain circumstances (such as a `tty`, or a user setting the mode with a function call). This pull request refactors the logic `LineWriter` into a `LineWriterShim`, which operates on a `BufWriter` by mutable reference, such that it is easy to invoke the line-writing logic on an existing `BufWriter` without having to construct a new `LineWriter`. Additionally, fixes rust-lang#72721 ## A note on flushing Because the word **flush** tends to be pretty overloaded in this discussion, I'm going to use the word **unbuffered** to refer to a `BufWriter` sending its data to the wrapped writer via `write`, without calling `flush` on it, and I'll be using **flushed** when referring to sending data via flush, which recursively writes the data all the way to the final sink. For example, given a `T = BufWriter<BufWriter<File>>`, saying that `T` **unbuffers** its data means that it is sent to the inner `BufWriter`, but not necessarily to the `File`, whereas saying that `T` **flushes** its data means that causes it (via `Write::flush`) to be delivered all the way to `File`. # Goals Once it became clear (for reasons described below) that the best way to approach this would involve refactoring `LineWriter` to work more directly on `BufWriter`'s internals, I established the following design goals for the refactor: - Do not duplicate logic with `BufWriter`. It's great at buffering and then unbuffering data, so use the existing logic as much as possible. - Minimize superfluous copying of data into `BufWriter`'s buffer. - Eliminate calls to `BufWriter::flush` and instead do the same thing as `BufWriter::write`, which is to only write to the wrapped writer (rather than flushing all the way down to the final data sink). - Uphold the "at-most 1 write of new data" convention of `Write::write` - Minimize or eliminate dropping errors (that is, eliminate the parts of the old design that threw away errors because `write` *must* report if any bytes were written) - As much as possible, attempt to fully flush completed lines, and *not* flush partial lines. One of the advantages of this design is that, so long as we don't encounter lines larger than the `BufWriter`'s capacity, partial lines will never be unbuffered, while completed lines will *always* be unbuffered (with subsequent calls to `LineWriter::write` retrying failed writes before processing new data. # Design There are two major & related parts of the design. First, a new internal stuct, `LineWriterShim`, is added. This struct implements all of the actual logic of line-writing in a `Write` implementation, but it only operates on an `&mut BufWriter`. This means that this shim can be constructed on-the-fly to apply line writing logic to an existing `BufWriter`. This is in fact how `LineWriter` has been updated to operate, and it is also how `Stdout` is being updated in my [development branch](https://github.com/Lucretiel/rust/tree/stdout-block-buffer) to switch which mode it wants to use at runtime. [An example of how this looks in practice](https://github.com/Lucretiel/rust/blob/f24f272df674dc7fa8941b97b45f41ad08b2199b/src/libstd/io/stdio.rs#L479-L484 ) The second major part of the design that the line-buffering logic, implemented in `LineWriterShim`, has been updated to work slightly more directly on the internals of `BufWriter`. Mostly it makes us of the public interface—particularly `buffer()` and `get_mut()`—but it also controls the flushing of the buffer with `flush_buf` rather than `flush`, and it writes to the buffer infallibly with a new `write_to_buffer` method. This has several advantages: - Data no longer has to round trip through the `BufWriter`'s buffer. If the user provides a complete line, that line is written directly to the inner writer (after ensuring the existing buffer is flushed). - The conventional contract of `write`—that at-most 1 attempt to write new data is made—is much more cleanly upheld, because we don't have to perform fallible flushes and perform semi-complicated logic of trying to pretend errors at different stages didn't happen. Instead, after attempting to write lines directly to the buffer, we can infallibly add trailing data to the buffer without allowing any attempts to continue writing it to the `inner` writer. - Perhaps most importantly, `LineWriter` *no longer performs a full flush on every line.* This makes its behavior much more consistent with `BufWriter`, which unbuffers data to its inner writer, without trying to flush it all the way to the final device. Previously, `LineWriter` had no choice but to use `flush` to ensure that the lines were unbuffered, but by writing directly to `inner` via `get_mut()` (when appropriate), we can use a more correct behavior. ## New(ish) line buffering logic The logic for line writing has been cleaned up, as described above. It now follows this algorithm for `write`, with minor adjustments for `write_all` and `write_vectored`: - Does our input data contain a newline? - If no: - simply use the regular `BufWriter::write` to write it; this will append it to the buffer and/or flush it as necessary based on how full the buffer is and how much input data there is. - additionally, if the current buffer ends with `'\n'`, attempt to immediately flush it with `flush_buf` before calling `BufWriter::write` This reproduces the old `needs_flush` behavior and ensures completed lines are flushed as soon as possible. The reason we only check if the buffer *ends* with `'\n'` is discussed later. - If yes: - First, `flush_buf` - Then use `bufwriter.get_mut().write()` to write the input data directly to the underlying writer, up to the last newline. Make at most one attempt at this. - If it errors, return the error - If it succeeds with a full write, add the remaining data (between the last newline and the end of the input) to the buffer. In order to uphold the "at-most 1 attempt to write new data" convention, no attempts are made to write this data to the inner writer (though obviously a subsequent write may immediately flush it, e.g., if it totally filled the buffer's capacity. - If it only partially succeeds, buffer the data only up to the last newline. We do this to try to avoid writing partial lines to the inner writer where possible (that is, whenever the lines are shorter than the total buffer capacity). While it was not my intention for this behavior to diverge from this existing `LineWriter` algorithm, this updated design emerged very naturally once `LineWriter` wasn't burdened with having to only operate via `BufWriter::flush`. There essentially two main changes to observable behavior: - `flush` is no longer used to unbuffer lines. The are only written to the writer wrapped by `LineWriter`; this inner writer might do its own buffering. This change makes `LineWriter` consistent with the behavior of `BufWriter`. This is probably the most obvious user-visible change; it's the one I most expect to provoke issue reports, if any are provoked. - Unless a line exceeds the capacity of the buffer, partial lines are not unbuffered (without the user manually calling flush). This is a less surprising behavior, and is enabled because `LineWriter` now has more precise control of what data is buffered and when it is unbuffered. I'd be surprised if anyone is relying on `LineWriter` unbuffering or flushing *partial* lines that are shorter than the capacity, so I'm not worried about this one. None of these changes are inconsistent with any published documentation of `LineWriter`. Nonetheless, like all changes with user-facing behavior changes, this design will obviously have to be very carefully scrutinized. # Alternative designs and design rationalle The initial goal of this project was to provide a way for the `LineWriter` logic to be operable directly on a `BufWriter`, so that the updated `Stdout` doesn't need to do something convoluted like `enum { BufWriter, LineWriter }` (which ends up being ~~impossible~~ difficult to transition between states after being constructed). The design went through several iterations before arriving at the current draft. The major first version simply involved adding methods like `write_line_buffered` to `BufWriter`; these would contain the actual logic of line-buffered writing, and would additionally have the advantages (described above) of operating directly on the internals of `BufWriter`. The idea was that `LineWriter` would simply call these methods, and the updated `Stdout` would use either `BufWriter::write` or `BufWriter::write_line_buffered`, depending on what mode it was in. The major issue with this design is that it loses the ability to take advantage of the `io::Write` trait, which provides several useful default implementations of the various io methods, such as `write_fmt` and `write_all`, just using the core methods. For this reason, the `write_line_buffered` design was retained, but moved into a separate struct called `LineWriterShim` which operates on an `&mut LineWriter`. As part of this move, the logic was lightly retooled to not touch the innards of `BufWriter` directly, but instead to make use of the unexported helper methods like `flush_buf`. The other design evolutions were mostly related to answering questions like "how much data should be buffered", "how should partial line writes be handled", etc. As much as possible I tried to answer these by emulating the current `LineWriter` logic (which, for example, retries partial line writes on subsequent calls to `write`) while still meeting the refactor design goals. # Next steps ~Currently, this design fails a few `LineWriter` tests, mostly because they expect `LineWriter` to *fully* flush its content. There are also some changes to the way that `LineWriter` buffers data *after* writing completed lines, aimed at ensuring that partial lines are not unbuffered prematurely. I want to make sure I fully understand the intent behind these tests before I either update the test or update this design so that they pass.~ However, in the meantime I wanted to get this published so that feedback could start to accumulate on it. There's a lot of errata around how I arrived at this design that didn't really fit in this overlong document, so please ask questions about anything that confusing or unclear and hopefully I can explain more of the rationale that led to it. # Test updates This design required some tests to be updated; I've research the intent behind these tests (mostly via `git blame`) and updated them appropriately. Those changes are cataloged here. - `test_line_buffer_fail_flush`: This test was added as a regression test for rust-lang#32085, and is intended to assure that an errors from `flush` aren't propagated when preceded by a successful `write`. Because type of issue is no longer possible, because `write` calls `buffer.get_mut().write()` instead of `buffer.write(); buffer.flush();`, I'm simply removing this test entirely. Other, similar error invariants related to errors during write-retrying are handled in other test cases. - `erroneous_flush_retried`: This test was added as a regression test for rust-lang#37807, and was intended to ensure that flush-retrying (via `needs_flush`) and error-ignoring were being handled correctly (ironically, this issue was caused by the flush-error-ignoring, above). Half of that issue is not possible by design with this refactor, because we no longer make fallible i/o calls that might produce errors we have to ignore after unbuffering lines. The `should_flush` behavior is captured by checking for a trailing newline in the `LineWriter` buffer; this test now checks that behavior. - `line_vectored`: changes here were pretty minor, mostly related to when partial lines are or aren't written. The old implementation of `write_vectored` used very complicated logic to precisely determine the location of the last newline and precisely write up to that point; this required doing several consecutive fallible writes, with all the complex error handling or ignoring issues that come with it. The updated design does at-most one write of a subset of total buffers (that is, it doesn't split in the middle of a buffer), even if that means writing partial lines. One of the major advantages of the new design is that the underlying vectored write operation on the device can be taken advantage of, even with small writes, so long as they include a newline; previously these were unconditionally buffered then written. - `line_vectored_partial_and_errors`: Pretty similiar to `line_vectored`, above; this test is for basic error recovery in `write_vectored` for vectored writes. As previously discussed, the mocked behavior being tested for (errors ignored under certain circumstances) no occurs, so I've simplified the test while doing my best to retain its spirit.
Try piping this program's stdout to
head
. After the first 10 lines,head
will exit. The pipe is now broken, so the next write should fail, and we should getthread 'main' panicked at 'failed printing to stdout: Broken pipe (os error 32)'
Instead, the Rust program keeps running indefinitely. GDB says
SIGPIPE
is "received" (but I guess ignored) by the process every time it tries to write.Feels like a race condition: if you remove the
sleep
or just change it to be short,println!
panics as expected.tested on Linux,
rustc 1.15.0-nightly (0ed951993 2016-11-14)
The text was updated successfully, but these errors were encountered: