std: Make file copy ops use zero-copy mechanisms #6516

Merged 5 commits on Oct 9, 2020
Changes from 2 commits
10 changes: 10 additions & 0 deletions lib/std/c/darwin.zig
@@ -18,6 +18,16 @@ pub extern "c" fn _dyld_get_image_header(image_index: u32) ?*mach_header;
pub extern "c" fn _dyld_get_image_vmaddr_slide(image_index: u32) usize;
pub extern "c" fn _dyld_get_image_name(image_index: u32) [*:0]const u8;

pub const COPYFILE_ACL = 1 << 0;
pub const COPYFILE_STAT = 1 << 1;
pub const COPYFILE_XATTR = 1 << 2;
pub const COPYFILE_DATA = 1 << 3;

pub const copyfile_state_t = *@Type(.Opaque);
pub extern "c" fn copyfile_state_alloc() copyfile_state_t;
pub extern "c" fn copyfile_state_free(state: copyfile_state_t) c_int;

Contributor Author

Yeah, but the alloc/free are now unused and can get the boot before/after merging.

pub extern "c" fn fcopyfile(from: fd_t, to: fd_t, state: ?copyfile_state_t, flags: u32) c_int;

pub extern "c" fn @"realpath$DARWIN_EXTSN"(noalias file_name: [*:0]const u8, noalias resolved_name: [*]u8) ?[*:0]u8;

pub extern "c" fn __getdirentries64(fd: c_int, buf_ptr: [*]u8, buf_len: usize, basep: *i64) isize;
2 changes: 1 addition & 1 deletion lib/std/fs.zig
@@ -1823,7 +1823,7 @@ pub const Dir = struct {
var atomic_file = try dest_dir.atomicFile(dest_path, .{ .mode = mode });
defer atomic_file.deinit();

- try atomic_file.file.writeFileAll(in_file, .{ .in_len = size });
+ try os.copy_file(in_file.handle, atomic_file.file.handle, .{});

Member

The docs for in_len mention:

    /// If the size of the source file is known, passing the size here will save one syscall.
   in_len: ?u64 = null,

Did we gain a redundant fstat syscall in the case that sendfile has to be used? I know we've had this discussion before, and your position is that it doesn't matter in practice, but this is fundamental to zig's premise as a language and standard library, that it does the "optimal" interaction with the OS. I think it's an important sign of implementation quality for an strace / ProcMon session to contain only necessary syscalls.

Contributor Author

> Did we gain a redundant fstat syscall in the case that sendfile has to be used?

No, both copy_file_range and sendfile are asked to read/write as many bytes as possible at each iteration. This saves a stat() (which is done anyway when the file mode is copied over; I'd expect copyFile to copy the timestamps too, but that's a different topic).
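
To illustrate the point about skipping the size query, here is a minimal sketch (not the PR's code) of a loop that keeps requesting the largest possible chunk and stops at the first zero-length transfer, so the source length is never needed. `FakeSource`, `transfer`, and `copyAll` are hypothetical names standing in for a descriptor pair and a `copy_file_range`/`sendfile` call, and the syntax assumes a recent Zig release rather than the 2020 compiler this diff targets.

```zig
const std = @import("std");

/// Hypothetical stand-in for a pair of descriptors: hands out up to `max`
/// bytes per call and reports 0 once the source is exhausted.
const FakeSource = struct {
    remaining: usize,
    calls: usize = 0,

    fn transfer(self: *FakeSource, max: usize) error{Fake}!usize {
        self.calls += 1;
        const n = if (self.remaining < max) self.remaining else max;
        self.remaining -= n;
        return n;
    }
};

/// Drain `src` without knowing its length up front.
fn copyAll(src: anytype, max_chunk: usize) !u64 {
    var total: u64 = 0;
    while (true) {
        const n = try src.transfer(max_chunk);
        // A zero-length transfer signals end of input; no size query needed.
        if (n == 0) return total;
        total += n;
    }
}
```

With 10 bytes remaining and a 4-byte chunk, the loop issues four calls (4, 4, 2, 0 bytes): the final zero-length call takes the place of the up-front size query.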

return atomic_file.finish();
}

104 changes: 94 additions & 10 deletions lib/std/os.zig
@@ -4981,19 +4981,15 @@ pub const CopyFileRangeError = error{
pub fn copy_file_range(fd_in: fd_t, off_in: u64, fd_out: fd_t, off_out: u64, len: usize, flags: u32) CopyFileRangeError!usize {
const use_c = std.c.versionCheck(.{ .major = 2, .minor = 27, .patch = 0 }).ok;

- // TODO support for other systems than linux
- const try_syscall = comptime std.Target.current.os.isAtLeast(.linux, .{ .major = 4, .minor = 5 }) != false;
-
- if (use_c or try_syscall) {
+ if (std.Target.current.os.tag == .linux and
+ (use_c or has_copy_file_range_syscall.get() != 0))
+ {
const sys = if (use_c) std.c else linux;

var off_in_copy = @bitCast(i64, off_in);
var off_out_copy = @bitCast(i64, off_out);

const rc = sys.copy_file_range(fd_in, &off_in_copy, fd_out, &off_out_copy, len, flags);

- // TODO avoid wasting a syscall every time if kernel is too old and returns ENOSYS https://github.com/ziglang/zig/issues/1018

switch (sys.getErrno(rc)) {
0 => return @intCast(usize, rc),
EBADF => unreachable,
@@ -5005,9 +5001,14 @@ pub fn copy_file_range(fd_in: fd_t, off_in: u64, fd_out: fd_t, off_out: u64, len
EOVERFLOW => return error.Unseekable,
EPERM => return error.PermissionDenied,
ETXTBSY => return error.FileBusy,
- EINVAL => {}, // these may not be regular files, try fallback
- EXDEV => {}, // support for cross-filesystem copy added in Linux 5.3, use fallback
- ENOSYS => {}, // syscall added in Linux 4.5, use fallback
+ // these may not be regular files, try fallback
+ EINVAL => {},
+ // support for cross-filesystem copy added in Linux 5.3, use fallback
+ EXDEV => {},
+ // syscall added in Linux 4.5, use fallback
+ ENOSYS => {
+ has_copy_file_range_syscall.set(0);
+ },
else => |err| return unexpectedErrno(err),
}
}
@@ -5021,6 +5022,89 @@ pub fn copy_file_range(fd_in: fd_t, off_in: u64, fd_out: fd_t, off_out: u64, len
return pwrite(fd_out, buf[0..amt_read], off_out);
}

var has_copy_file_range_syscall = std.atomic.Int(u1).init(1);

pub const CopyFileOptions = struct {};

pub const CopyFileError = error{
BadFileHandle,
SystemResources,
FileTooBig,
InputOutput,
IsDir,
OutOfMemory,
NoSpaceLeft,
Unseekable,
PermissionDenied,
FileBusy,
} || FStatError || SendFileError;

/// Transfer all the data between two file descriptors in the most efficient way.
/// No metadata is transferred over.
pub fn copy_file(fd_in: fd_t, fd_out: fd_t, options: CopyFileOptions) CopyFileError!void {

Member

What is the "posix layer" API function that this represents? It looks like the only one that has such a function is Darwin, in which case this should be called fcopyfile and not have an options parameter.

Contributor Author

There's no POSIX equivalent, we can just move this into fs as a private helper and call it a day.

Member

Sounds good 👍

Member

btw if you haven't seen it: #5019

Contributor Author

> btw if you haven't seen it: #5019

IMO we don't need a posix layer, we only need a std.os that does things (e.g. copyFile, writeData) rather than trying to paper over all the differences between different OSs. If one wants a POSIX-specific or Windows-specific function, they're more than welcome to write it out explicitly, but a full-blown POSIX compatibility layer is not something that belongs in a stdlib.

Member

I agree that creating a full blown posix compatibility layer is not a goal in and of itself. But it's an implementation detail of providing a std that does things (e.g. std.fs). It's code that we would end up writing - in fact did end up writing - as an implementation detail of the higher level cross platform abstractions. "posix layer" is just a way to think about how it is organized.

if (comptime std.Target.current.isDarwin()) {
const rc = system.fcopyfile(fd_in, fd_out, null, system.COPYFILE_DATA);
switch (errno(rc)) {
0 => return,
EINVAL => unreachable,
ENOMEM => return error.SystemResources,
// The source file was not a directory, symbolic link, or regular file.
// Try with the fallback path before giving up.
ENOTSUP => {},
else => |err| return unexpectedErrno(err),
}
}

if (std.Target.current.os.tag == .linux) {
// Try copy_file_range first as that works at the FS level and is the
// most efficient method (if available).
if (has_copy_file_range_syscall.get() != 0) {

Member

This should use the target version range to determine which state we are in:

  • the target OS version range guarantees kernel will have copy_file_range. Skip the runtime check.
  • the target OS version range guarantees kernel will not have copy_file_range. Skip the runtime check.
  • the target OS version range spans a kernel version that has the syscall and a version that does not have the syscall. Do the runtime check.
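
The three cases above can be sketched as a small classifier. This is an illustration only: `Version`, `Range`, `Availability`, and `classify` are hypothetical names rather than std APIs, the 4.5 threshold is the kernel version quoted in the diff's comments, and the syntax assumes a recent Zig release.

```zig
const std = @import("std");

/// Hypothetical helper types for illustration; not part of std.
const Version = struct {
    major: u32,
    minor: u32,

    fn lessThan(a: Version, b: Version) bool {
        if (a.major != b.major) return a.major < b.major;
        return a.minor < b.minor;
    }
};

const Range = struct { min: Version, max: Version };

const Availability = enum { always, never, runtime_check };

/// Classify a target version range against the version that introduced
/// a feature.
fn classify(range: Range, introduced: Version) Availability {
    // Even the oldest targeted kernel already has the feature.
    if (!range.min.lessThan(introduced)) return .always;
    // Even the newest targeted kernel predates the feature.
    if (range.max.lessThan(introduced)) return .never;
    // The range straddles the introduction point: decide at run time.
    return .runtime_check;
}
```

Only the `runtime_check` result would need the `ENOSYS` probe; the other two outcomes are known when the program is compiled.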

Contributor Author

Sure thing, it's enough to preload has_copy_file_range_syscall with 0 if the version check fails.

cfr_loop: while (true) {
// The kernel checks `file_pos+count` for overflow, use a 32 bit
// value so that the syscall won't return EINVAL except for
// impossibly large files.
const rc = linux.copy_file_range(fd_in, null, fd_out, null, math.maxInt(u32), 0);
switch (errno(rc)) {
0 => {},
EBADF => return error.BadFileHandle,
EFBIG => return error.FileTooBig,
EIO => return error.InputOutput,
EISDIR => return error.IsDir,
ENOMEM => return error.OutOfMemory,
ENOSPC => return error.NoSpaceLeft,
EOVERFLOW => return error.Unseekable,
EPERM => return error.PermissionDenied,
ETXTBSY => return error.FileBusy,
// These may not be regular files, try fallback
EINVAL => break :cfr_loop,
// Support for cross-filesystem copy added in Linux 5.3, use fallback
EXDEV => break :cfr_loop,
// Syscall added in Linux 4.5, use fallback
ENOSYS => {
has_copy_file_range_syscall.set(0);
break :cfr_loop;
},
else => |err| return unexpectedErrno(err),
}
// Terminate when no data was copied
if (rc == 0) return;
}
// This point is reached when an error occurred, hopefully no data
// was transferred yet
}
}

// Sendfile is a zero-copy mechanism iff the OS supports it, otherwise the
// fallback code will copy the contents chunk by chunk.
const empty_iovec = [0]iovec_const{};
var offset: u64 = 0;
sendfile_loop: while (true) {
const amt = try sendfile(fd_out, fd_in, offset, 0, &empty_iovec, &empty_iovec, 0);
if (amt == 0) break :sendfile_loop;
offset += amt;
}
}

pub const PollError = error{
/// The kernel had no space to allocate file descriptor tables.
SystemResources,