Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs on safety of BufWriter are misleading #136025

Open
rgr21 opened this issue Jan 24, 2025 · 8 comments
Open

Docs on safety of BufWriter are misleading #136025

rgr21 opened this issue Jan 24, 2025 · 8 comments
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@rgr21
Copy link

rgr21 commented Jan 24, 2025

Location

/// It can be excessively inefficient to work directly with something that

Summary

This one is long winded. Sorry. A reader may decide this represents an actual code bug. I believe it is at least a documentation bug. I may just be rediscovering something every other programmer already knows.

let mut file = std::fs::File::create("test")?;
file.write_all(b"Hello, world!")?;
file.flush()?; // This does nothing. It is implemented as return Ok()
file.sync_all()?; // This issues an fsync
drop(file); // Close happens here. Docs correctly say that errors are ignored, sync_all *must* have happened to force the fsync.

In my scenario I am writing a file to an NFS mount from Linux. The syscalls are of the pattern open+write+write+...+close.
Close can fail, and that can prevent data from previous writes (which reported success) from being actually saved. I have no way of seeing the output of close. So I need to call sync_all, giving me a safe open+write+write+...+fsync+close.

/// It can be excessively inefficient to work directly with something that

/// It can be excessively inefficient to work directly with something that
/// implements [Write]. [...]
///
/// BufWriter<W> can improve the speed of programs that make small and
/// repeated write calls to the same file or network socket. [...]
///
/// It is critical to call [flush] before BufWriter<W> is dropped. Though
/// dropping will attempt to flush the contents of the buffer, any errors
/// that happen in the process of dropping will be ignored. Calling [flush]
/// ensures that the buffer is empty and thus dropping will not even attempt
/// file operations.

This really advertises itself for the use case

let mut buf = BufWriter::new(std::fs::File::create("test")?)

and then gives guidance on how to use it safely. This talks about errors during drop being lost (correct), says that calling flush is critical (true, but in a misleading way as the call to the inner file.flush is pointless). It does talk about files, not any old Write, steering me further towards madness.

Because I need to call

buf.into_inner()?.sync_all()?;

to avoid risk of silent data loss that is otherwise unobservable to the application.

I propose changing the wording at

/// It is critical to call [`flush`] before `BufWriter<W>` is dropped. Though
to indicate that flush is necessary but not sufficient, and perhaps also adding as an example use of into_inner().

@rgr21 rgr21 added the A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools label Jan 24, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 24, 2025
@rgr21
Copy link
Author

rgr21 commented Jan 24, 2025

Related: #98338

@the8472
Copy link
Member

the8472 commented Jan 24, 2025

They address two different kinds of problems.

flush ensures that bytes make it from the userspace buffer to the OS/environment (assuming there's something doing actual IO behind the Write impls) so that even if your program gets killed the OS can still perform writeback at its leisure.
It's also necessary to make data visible to other parts of the system.

sync_all is about telling the OS to perform the writeback now and report failures. This addresses a different cause of failures, such as power loss or network filesystem disconnects. It's also a lot slower.

@rgr21
Copy link
Author

rgr21 commented Jan 24, 2025

Yes. I think File API/docs make this clear. I think BufWriter is somewhat misleading. In my scenario the OS has reported an error to the application. The application has followed the docs on BufWriter, but has not seen that error and has lost data as a consequence.
The application has not also followed the docs on File, the application is at fault here (given the choice by Rust to not expose that close/drop error). I think we can make it easier for others in future.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jan 24, 2025

Dropping a BufWriter without flushing and checking for errors drops errors from write calls. Observing these errors with manual flushing is important whether or not you also call sync_all. It's also important for e.g. BufWriter<&File>, where dropping the BufWriter doesn't close the file, or for BufWrite<&mut [u8]> (dubious but can occur in generic code) where there's no file to begin with but writes can still fail.

It's true that, *if *you want to call sync_all on every file before closing it, you also need to do so if the file is owned by a BufWriter because dropping a BufWriter<T> also drops the underlying T. But when you put it like that, it seems fairly redundant in the larger Rust context. Perhaps a brief note about it could be added to the docs, but it's not particularly BufWriter-specific, that's just a corollary of ownership/RAII.

@the8472
Copy link
Member

the8472 commented Jan 24, 2025

It's unclear to me whether you care about durability or about error reporting here.

If your goal is durability then you need sync_all anyway. This has nothing to do with flush or bufwriter, neither promise durability because the distinction is not a part of the generic Write of BufWriter types but specific to File IO.
That some NFS impls have close act like sync_all is implementation-specific and not guaranteed, AIUI mounting with nocto will disable this.

If you don't need durability but still want to report errors (which is kind of weird, since in this case you'd also miss writeback errors after closing a file...) then we could add a File::close(self) -> Result, but see prior discussion in #98338.

@rgr21
Copy link
Author

rgr21 commented Jan 24, 2025

Fair question. To be awkward, there is a tradeoff between speed and correctness and ergonomics. I think ergonomics could be improved with no loss of performance. Or: there is a class of failure that isn't visible without an fsync. But there is a class of failure that is and we miss that opportunity.

(Yes, my users are using NFS with async - docs and tests indicate sync on close happens.)

https://github.com/pola-rs/polars/blob/1993d59c31ca5632f2f8fbdd7af413eeafac6e69/crates/polars-python/src/dataframe/io.rs#L480 as a example from broader ecosystem. This pattern is both common and I think encouraged by current docs. Flush is called correctly somewhere, then close happens on drop.
The OS reports an error 'your data may not have been written', the user is not given chance to deal with this.

Maybe I am just flawed in expectation: I generally expect Rust error handling to stick it in my face and make sure I have dealt with it, compiler enforcing that I have done so. The underlying decision to hide errors from close makes this one counterintuitive. Close docs do say it returns errors and that these can indicate failure to write.
I don't propose changing this - too late (even if it is right) - but a minor doc tweak will help.

@the8472
Copy link
Member

the8472 commented Jan 24, 2025

Or: there is a class of failure that isn't visible without an fsync. But there is a class of failure that is and we miss that opportunity.

No, not really. It's just that on NFS with cto specifically close also acts as fsync, with all the performance penalties. You get the error reporting of fsync at the price of fsync.

If you're saying "if we already paid the cost in this case, we might as well surface an error", then yeah, ok. But that's just nice-to-have diagnostics, not guaranteed behavior on which you can build reliable applications.

The underlying decision to hide errors from close makes this one counterintuitive

If you never called fsync then, generically, close is allowed to do nothing and defer all the work, which means deferring all the IO errors to where the program cannot see them. So by closing without sync you already told the OS you don't care about errors here, best effort is enough. Rust piggybacks on your signaled intent here.

I don't propose changing this - too late (even if it is right) - but a minor doc tweak will help.

Well, BufWriter<W> is generic for all kinds of writers, so we don't really want to document File-specific stuff there. So any tweaks (I'm not sure what exactly you're proposing) would have to be careful there.

@saethlin saethlin added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 25, 2025
@rgr21
Copy link
Author

rgr21 commented Feb 1, 2025

How about master...rgr21:rust:patch-1 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants