-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Substantial refactor to the design of LineWriter #72808
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
Way outside my wheelhouse, and I currently have little time for reviews anyway. r? @BurntSushi |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, amazing work! I really appreciate the detailed write up. I found myself agreeing with everything you were saying and like the rationale. I think all of my comments are fairly minor.
I don't do many reviews in std, so it would probably be wise to get someone else to give this a sanity check as well. cc @rust-lang/libs
I want to make sure I fully understand the intent behind these tests before I either update the test or update this design so that they pass.
Unfortunately I don't know the answer to this. This might require a git blame
to ping the person who wrote them.
I've been doing git blame and it's been very helpful; usually the commit links to the specific github issue being resolved, which makes it easy to discover the specific intents. I'm cataloging that stuff in the "Tests" section. |
All failing pre-existing tests have been updated, and the updates have been noted in the "Tests" section of the pull request description. Also added several new tests. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@BurntSushi can you detach #60673 from this issue? I don't want it to be incorrectly closed if/when this is merged. |
@Lucretiel you might be able to unattach it by editing your comment to break github's pattern matching, e.g. misspell |
That fixed it, thank you! |
r? @Amanieu |
src/libstd/io/buffered.rs
Outdated
|
||
// This is what we're going to try to write directly to the inner | ||
// writer. The rest will be buffered, if nothing goes wrong. | ||
let lines = &buf[..newline_idx]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can just write the entire buf
directly to the inner writer. AFAIK LineWriter
doesn't make any guarantees that partial lines are actually buffered, it only guarantees that newlines will be written out immediately instead of being buffered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When there are multiple concurrent writers to a pipe it is still useful to buffer up whole lines to avoid interleaving text. This is a common problem with multithreaded programs printing to stdout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly considered that. However, in my opinion, taking steps to only write whole lines is the least surprising behavior; my expectation as a client of this API would be that (outside of very-long lines) it only flushes whole lines, regardless of the shape of the input.
Received the review, will get to it as soon as I'm able |
I should clarify and correct that the target in question appears to be
Which appears to bind inside sys::unix to os::emscripten pub const EBADF: ::c_int = 8; HOWEVER that was a recent change libc = { version = "0.2.51", default-features = false, features = ['rustc-dep-of-std'] } So I opened #75515 to bump it to the latest. Mind, that shouldn't affect it due to the Cargo.lock using 0.2.74 already, but at this point I'm not sure we don't have to check the world isn't going mad. |
@Lucretiel the test cannot fail on current master, because current master is always a commit that has passed tests. I'm really not the right person to talk to about this, though. |
Just to note here: it appears (via @tmiasko) that this test is currently succeeding improperly, with a description and fix in #75532. The core of the issue is that currently, |
Bump minor version of emsdk to 1.38.47 Release Notes: ``` v1.38.47: 10/02/2019 -------------------- - Add support for FETCH API in WASM backend. This doesn't support FETCH in the main thread (`USE_FETCH_WORKER=0` is enforced). rust-lang#9490 - Redefine errno values to be consistent with wasi. This will let us avoid needing to convert the values back and forth as we use more wasi APIs. This is an ABI change, which should not be noticeable from user code unless you use errno defines (like EAGAIN) *and* keep around binaries compiled with an older version that you link against. In that case, you should rebuild them. See rust-lang#9545. - Removed build option `-s ONLY_MY_CODE` as we now have much better solutions for that, like building to a wasm object file or using `STANDALONE_WASM` etc. (see https://github.com/emscripten-core/emscripten/wiki/WebAssembly-Standalone). - Emscripten now supports the config file (.emscripten) being placed in the emscripten directory rather that the current user's home directory. See rust-lang#9543 ``` Motivated by changes to errno values which are currently out of sync with those in libc crate which uses wasi values already. Helps with rust-lang#72808 and rust-lang#75532.
Bump minor version of emsdk to 1.38.47 Release Notes: ``` v1.38.47: 10/02/2019 -------------------- - Add support for FETCH API in WASM backend. This doesn't support FETCH in the main thread (`USE_FETCH_WORKER=0` is enforced). rust-lang#9490 - Redefine errno values to be consistent with wasi. This will let us avoid needing to convert the values back and forth as we use more wasi APIs. This is an ABI change, which should not be noticeable from user code unless you use errno defines (like EAGAIN) *and* keep around binaries compiled with an older version that you link against. In that case, you should rebuild them. See rust-lang#9545. - Removed build option `-s ONLY_MY_CODE` as we now have much better solutions for that, like building to a wasm object file or using `STANDALONE_WASM` etc. (see https://github.com/emscripten-core/emscripten/wiki/WebAssembly-Standalone). - Emscripten now supports the config file (.emscripten) being placed in the emscripten directory rather that the current user's home directory. See rust-lang#9543 ``` Motivated by changes to errno values which are currently out of sync with those in libc crate which uses wasi values already. Helps with rust-lang#72808 and rust-lang#75532.
library/std/src/io/buffered.rs
Outdated
// Flush existing content to prepare for our write | ||
self.buffer.flush_buf()?; | ||
|
||
// This is what we're going to try to write directly to the inner | ||
// writer. The rest will be buffered, if nothing goes wrong. | ||
let (lines, tail) = buf.split_at(newline_idx + 1); | ||
|
||
// Write `lines` directly to the inner writer, bypassing the buffer. | ||
self.inner_mut().write_all(lines)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using println!
, one desirable property is that the formatted output is
written using a single syscall. To take advantage of write atomicity, and avoid
interleaving the output with that from other processes that might be sharing
the same stdout.
The implementation here performs two syscalls whenever there are any formatting
operations in println!
. The presence of terminal \n
triggers the flush of
existing buffer content and then the final string with a newline is flushed
separately.
For example, println!("Hello {}!", "world");
, leads to:
write(1, "Hello world", 11) = 11
write(1, "!\n", 2) = 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It's a good point. The problem is this:
Suppose that, as a solution (in your example), we buffer the second print, so that we can do the whole thing in a single write. At that point, we are obligated to return an Ok(2), because we have "successfully" written 2 bytes (by adding them to the buffer). This means that we can't actually attempt to send the byte to the underlying device, because if the device returns an error, we'd have to ignore that error (or save it and return it on the next write), which (along with extra, unnecessary data buffering) is something I was trying to avoid.
One workaround to this would be to buffer the data, attempt a write, and if the write returns an error, remove that data from the buffer. However, in that case, if the first write returns Ok(12) (representing "Hello World!") and the second write returns an error, there's no way to report a consistent result to the caller without also suppressing that error.
That being said, this problem probably can be addressed in the write_all
specialization (used by default write_fmt
), since it's not necessary to try to preserve a recoverable / knowable state in the event of an error there, like it is with write
. I'll see about retooling write_all
a bit to address this concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! I've updated write_all
to address this concern, and added a test verifying the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing this! I expected that doing so might be challenging in
context of other goals. Fortunately, the fact that formatting goes through
write_all
made it quite manageable. Nice!
`LineWriter::write_all` now only emits a single write when writing a newline when there's already buffered data.
@bors r+ |
📌 Commit c91e764 has been approved by |
☀️ Test successful - checks-actions, checks-azure |
🤯🤯🤯🤯 we actually got the thing merged! |
It's not quite big enough to warrant inclusion in the perf triage report for this week, but the |
Oh, neat! If I had to guess it probably had to do more with the |
Preamble
This is the first in a series of pull requests designed to move forward with #60673 (and the related 5 year old FIXME), which calls for an update to
Stdout
such that it can be block-buffered rather than line-buffered under certain circumstances (such as atty
, or a user setting the mode with a function call). This pull request refactors the logicLineWriter
into aLineWriterShim
, which operates on aBufWriter
by mutable reference, such that it is easy to invoke the line-writing logic on an existingBufWriter
without having to construct a newLineWriter
.Additionally, fixes #72721
A note on flushing
Because the word flush tends to be pretty overloaded in this discussion, I'm going to use the word unbuffered to refer to a
BufWriter
sending its data to the wrapped writer viawrite
, without callingflush
on it, and I'll be using flushed when referring to sending data via flush, which recursively writes the data all the way to the final sink.For example, given a
T = BufWriter<BufWriter<File>>
, saying thatT
unbuffers its data means that it is sent to the innerBufWriter
, but not necessarily to theFile
, whereas saying thatT
flushes its data means that causes it (viaWrite::flush
) to be delivered all the way toFile
.Goals
Once it became clear (for reasons described below) that the best way to approach this would involve refactoring
LineWriter
to work more directly onBufWriter
's internals, I established the following design goals for the refactor:BufWriter
. It's great at buffering and then unbuffering data, so use the existing logic as much as possible.BufWriter
's buffer.BufWriter::flush
and instead do the same thing asBufWriter::write
, which is to only write to the wrapped writer (rather than flushing all the way down to the final data sink).Write::write
write
must report if any bytes were written)BufWriter
's capacity, partial lines will never be unbuffered, while completed lines will always be unbuffered (with subsequent calls toLineWriter::write
retrying failed writes before processing new data.Design
There are two major & related parts of the design.
First, a new internal stuct,
LineWriterShim
, is added. This struct implements all of the actual logic of line-writing in aWrite
implementation, but it only operates on an&mut BufWriter
. This means that this shim can be constructed on-the-fly to apply line writing logic to an existingBufWriter
. This is in fact howLineWriter
has been updated to operate, and it is also howStdout
is being updated in my development branch to switch which mode it wants to use at runtime.An example of how this looks in practice
The second major part of the design that the line-buffering logic, implemented in
LineWriterShim
, has been updated to work slightly more directly on the internals ofBufWriter
. Mostly it makes us of the public interface—particularlybuffer()
andget_mut()
—but it also controls the flushing of the buffer withflush_buf
rather thanflush
, and it writes to the buffer infallibly with a newwrite_to_buffer
method. This has several advantages:BufWriter
's buffer. If the user provides a complete line, that line is written directly to the inner writer (after ensuring the existing buffer is flushed).write
—that at-most 1 attempt to write new data is made—is much more cleanly upheld, because we don't have to perform fallible flushes and perform semi-complicated logic of trying to pretend errors at different stages didn't happen. Instead, after attempting to write lines directly to the buffer, we can infallibly add trailing data to the buffer without allowing any attempts to continue writing it to theinner
writer.LineWriter
no longer performs a full flush on every line. This makes its behavior much more consistent withBufWriter
, which unbuffers data to its inner writer, without trying to flush it all the way to the final device. Previously,LineWriter
had no choice but to useflush
to ensure that the lines were unbuffered, but by writing directly toinner
viaget_mut()
(when appropriate), we can use a more correct behavior.New(ish) line buffering logic
The logic for line writing has been cleaned up, as described above. It now follows this algorithm for
write
, with minor adjustments forwrite_all
andwrite_vectored
:BufWriter::write
to write it; this will append it to the buffer and/or flush it as necessary based on how full the buffer is and how much input data there is.'\n'
, attempt to immediately flush it withflush_buf
before callingBufWriter::write
This reproduces the oldneeds_flush
behavior and ensures completed lines are flushed as soon as possible. The reason we only check if the buffer ends with'\n'
is discussed later.flush_buf
bufwriter.get_mut().write()
to write the input data directly to the underlying writer, up to the last newline. Make at most one attempt at this.While it was not my intention for this behavior to diverge from this existing
LineWriter
algorithm, this updated design emerged very naturally onceLineWriter
wasn't burdened with having to only operate viaBufWriter::flush
. There essentially two main changes to observable behavior:flush
is no longer used to unbuffer lines. The are only written to the writer wrapped byLineWriter
; this inner writer might do its own buffering. This change makesLineWriter
consistent with the behavior ofBufWriter
. This is probably the most obvious user-visible change; it's the one I most expect to provoke issue reports, if any are provoked.LineWriter
now has more precise control of what data is buffered and when it is unbuffered. I'd be surprised if anyone is relying onLineWriter
unbuffering or flushing partial lines that are shorter than the capacity, so I'm not worried about this one.None of these changes are inconsistent with any published documentation of
LineWriter
. Nonetheless, like all changes with user-facing behavior changes, this design will obviously have to be very carefully scrutinized.Alternative designs and design rationalle
The initial goal of this project was to provide a way for the
LineWriter
logic to be operable directly on aBufWriter
, so that the updatedStdout
doesn't need to do something convoluted likeenum { BufWriter, LineWriter }
(which ends up beingimpossibledifficult to transition between states after being constructed). The design went through several iterations before arriving at the current draft.The major first version simply involved adding methods like
write_line_buffered
toBufWriter
; these would contain the actual logic of line-buffered writing, and would additionally have the advantages (described above) of operating directly on the internals ofBufWriter
. The idea was thatLineWriter
would simply call these methods, and the updatedStdout
would use eitherBufWriter::write
orBufWriter::write_line_buffered
, depending on what mode it was in.The major issue with this design is that it loses the ability to take advantage of the
io::Write
trait, which provides several useful default implementations of the various io methods, such aswrite_fmt
andwrite_all
, just using the core methods. For this reason, thewrite_line_buffered
design was retained, but moved into a separate struct calledLineWriterShim
which operates on an&mut LineWriter
. As part of this move, the logic was lightly retooled to not touch the innards ofBufWriter
directly, but instead to make use of the unexported helper methods likeflush_buf
.The other design evolutions were mostly related to answering questions like "how much data should be buffered", "how should partial line writes be handled", etc. As much as possible I tried to answer these by emulating the current
LineWriter
logic (which, for example, retries partial line writes on subsequent calls towrite
) while still meeting the refactor design goals.Next steps
Currently, this design fails a fewLineWriter
tests, mostly because they expectLineWriter
to fully flush its content. There are also some changes to the way thatLineWriter
buffers data after writing completed lines, aimed at ensuring that partial lines are not unbuffered prematurely. I want to make sure I fully understand the intent behind these tests before I either update the test or update this design so that they pass.However, in the meantime I wanted to get this published so that feedback could start to accumulate on it. There's a lot of errata around how I arrived at this design that didn't really fit in this overlong document, so please ask questions about anything that confusing or unclear and hopefully I can explain more of the rationale that led to it.
Test updates
This design required some tests to be updated; I've research the intent behind these tests (mostly via
git blame
) and updated them appropriately. Those changes are cataloged here.test_line_buffer_fail_flush
: This test was added as a regression test for Don't return an error after a partial write inLineWriter
#32085, and is intended to assure that an errors fromflush
aren't propagated when preceded by a successfulwrite
. Because type of issue is no longer possible, becausewrite
callsbuffer.get_mut().write()
instead ofbuffer.write(); buffer.flush();
, I'm simply removing this test entirely. Other, similar error invariants related to errors during write-retrying are handled in other test cases.erroneous_flush_retried
: This test was added as a regression test for BrokenPipe sometimes not reported #37807, and was intended to ensure that flush-retrying (vianeeds_flush
) and error-ignoring were being handled correctly (ironically, this issue was caused by the flush-error-ignoring, above). Half of that issue is not possible by design with this refactor, because we no longer make fallible i/o calls that might produce errors we have to ignore after unbuffering lines. Theshould_flush
behavior is captured by checking for a trailing newline in theLineWriter
buffer; this test now checks that behavior.line_vectored
: changes here were pretty minor, mostly related to when partial lines are or aren't written. The old implementation ofwrite_vectored
used very complicated logic to precisely determine the location of the last newline and precisely write up to that point; this required doing several consecutive fallible writes, with all the complex error handling or ignoring issues that come with it. The updated design does at-most one write of a subset of total buffers (that is, it doesn't split in the middle of a buffer), even if that means writing partial lines. One of the major advantages of the new design is that the underlying vectored write operation on the device can be taken advantage of, even with small writes, so long as they include a newline; previously these were unconditionally buffered then written.line_vectored_partial_and_errors
: Pretty similiar toline_vectored
, above; this test is for basic error recovery inwrite_vectored
for vectored writes. As previously discussed, the mocked behavior being tested for (errors ignored under certain circumstances) no occurs, so I've simplified the test while doing my best to retain its spirit.