-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sparse-file handling for the Linux implementation of std::fs::copy() #55909
Conversation
… sparse aware parts.
…provided testing tmpdir impl.
r? @dtolnay (rust_highfive has picked a reviewer for you, use r? to override) |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
…ly on copy_file_range().
I can‘t really review right now since I‘m on vacation. |
Reassigning because I am swamped this week: |
Afaik every call to |
I have posted some ideas on internals how this could be optimized further, but that's out of scope for this PR. |
Good point, I've cleaned it up. |
|
||
// Slightly modified version of io::copy() that only copies a set amount of bytes. | ||
fn copy_bytes_uspace(mut reader: &File, mut writer: &File, nbytes: usize) -> io::Result<u64> { | ||
const BLKSIZE: usize = 4 * 1024; // Assume 4k blocks on disk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a reasonable default for io::copy
that handles generic Read
and Write
types but for fs::copy
we know we're doing syscalls and can probably do better. E.g. gnu cp has bloated up over the years and now does 128KiB or stat.blksize, whichever is larger.
This is probably due to ever-growing throughput (thus more syscalls per unit of time) and larger true block sizes, e.g. on flash and raid storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and of course it only does this for large files.
On the other hand the rust std default buffer size was dropped to rust 8k in #32695 due to problems with jemalloc. But that might not be relevant here since jemalloc is not the default anymore and this would only apply to copying large files.
GNU cp does |
Thanks for the PR @tarka! Unfortunately I'm very swamped right now and this is a quite a large amount of code that I won't have time to review. Others from @rust-lang/libs may be interested in this as well. I'm not entirely certain if we want to have such complicated code for this in the standard library, this may be crossing into the boundary of "should start on crates.io and maybe move some time afterwards". If no one else gets around to this I will try to get around to it in the coming weeks |
let out_meta = outfd.metadata()?; | ||
|
||
let (is_sparse, is_xmount) = copy_parms(&in_meta, &out_meta)?; | ||
let uspace = is_xmount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check may soon be too strict in new kernels, cross-device clone_file_range might become supported e.g. between NFS mounts or multiple subvolumes of btrfs systems.
Instead one should always try copy_file_range
and fall back to regular copy on the first EXDEV
.
Thanks @alexcrichton, understood. @the8472 Thanks for the pointers. Based on Alex's comments I might hold off on any changes until it's decided if this is appropriate for std. That said, even if this is accepted into std I think there is a place for an external library with more advanced features and tunables. In particular with xcp I had to duplicate most of what was in |
Ping from triage. @rust-lang/libs can someone review this? |
r? @sfackler |
I don't think I'm really convinced that the significant extra complexity here is worth it. I'd probably agree with @alexcrichton that this may be best served starting externally. |
ping from triage. Closing this since it is better to have this started externally. |
This PR adds sparse-file support to the Linux implementation of
std::fs:copy()
. The current implementation uses the Linux sys-callcopy_file_range()
, which has a number of benefits over user-space copying, but it will expand 'holes' in any sparse files if not explicitly checked-for. This PR also adds a user-space copy function as a fall back for cases wherecopy_file_range()
is unavailable or cannot handle the operation.There are a few judgement-call changes made to support the additional code; I'm happy to take pointers on style if this doesn't fit with the overall Rust codebase:
#[cfg()]
s, minimise nesting, and simplify testing.It's worth noting that the user-space copy and sparse-hole seek code could be ported to other Unix-like OSs that support sparse-files, however I didn't want to get too into the weeds.
cc @nicokoch who added the original
copy_file_range()
implementation.