Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ReaderFrom/WriterTo for Linux (splice) #29

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cpuguy83
Copy link
Member

On Linux, we can use splice to optimize copies to happen in kernel
when copying to other files.
Since this is already a pipe, this should always work when copying
to/from another file.

@cpuguy83 cpuguy83 changed the title Add ReaderFrom/WriterTo for Linux Add ReaderFrom/WriterTo for Linux (splice) Nov 21, 2020
@cpuguy83 cpuguy83 force-pushed the add_readerfrom_writerto branch 5 times, most recently from 0a81acb to abf9d52 Compare November 21, 2020 06:32
// splice not supported on kernel
atomic.StoreInt32(&spliceSupported, 0)
return true
case syscall.EINVAL, syscall.EOPNOTSUPP, syscall.EPERM:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would syscall.EOPNOTSUPP here indicate the process is unable to perform the syscall regardless of whether the kernel supports it? If this doesn't depend on input would it make sense to treat it like ENOSYS?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think EOPNOTSUP is more about the transport than the actual system.
For instance, I'd expect EOPNOTSUP if the read side is a unix socket which does not support splicing... but I have not tested this... it may actually be worth adding tests for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the specific case may actually be EINVAL...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, its not super clear when these standard return values would be returned in this case and better not to set a global value

raw_linux.go Outdated Show resolved Hide resolved
raw_linux.go Outdated Show resolved Hide resolved
raw_linux_test.go Outdated Show resolved Hide resolved
@cpuguy83 cpuguy83 force-pushed the add_readerfrom_writerto branch 2 times, most recently from 6e8e675 to 2e799de Compare July 30, 2021 00:39
On Linux, we can use `splice` to optimize copies to happen in kernel
when copying to other files.
Since this is already a pipe, this should always work when copying
to/from another file.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@cpuguy83
Copy link
Member Author

Updated this, added some new test cases and improved the implementation slightly:

  1. EINTR handling was breaking out of the loop instead of just retrying - fixed
  2. Before the max amount that could be copied before the new methods would return was 1<<62... which is a lot... but is not expected since we want to copy until EOF (splice copies 0 bytes), so the internal copy takes a special value, -1, to mean copy to EOF. This is important because we have special handling for when the reader passed in to ReadFrom is an *io.LimitedReader and this was originally mixing "amount to copy" with "amount remaining to copy".

@cpuguy83
Copy link
Member Author

I'm going to do some more testing on this as well.

@cpuguy83
Copy link
Member Author

FYI, ended up making a new repo to play around with this a bit more, benchmarks, etc: https://github.com/cpuguy83/pipes
Somewhat suprisingly I'm seeing an approx 2x speedup using splice.

I've made some changes to the implementation that make it a bit cleaner which I'll move here as well.

@thaJeztah
Copy link
Member

Somewhat suprisingly I'm seeing an approx 2x speedup using splice.

Silly question; the benchmark in the readme doesn't show a delta; is that because some option wasn't set?

(Performance improvement sounds great though!)

@cpuguy83
Copy link
Member Author

Hmm I'm not sure.
I hadn't used benchstat before...
benchcmp shows the delta.

@thaJeztah
Copy link
Member

@cpuguy83 @kzys @dmcgowan what's the status on this one? Do we want to have this merged and included in a release?

I went looking what changes are in main that are not yet released (following #32 (comment)), so was looking if we wanted to include this PR as well if we would be doing a new release.

@kzys
Copy link
Member

kzys commented Feb 19, 2022

Sorry. I have missed the ping. Let me take a look next week.

Copy link
Member

@samuelkarp samuelkarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions and nits, but otherwise LGTM.

Comment on lines +47 to +54
select {
case <-f.opened:
return f.readFrom(r)
default:
}
select {
case <-f.opened:
return f.readFrom(r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of trying to read from <-f.opened twice here?

raw_linux.go Show resolved Hide resolved
Comment on lines +75 to +77
if !ok {
return copyBuffer(f.file, r)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this error case also need to handle lr != nil?

Suggested change
if !ok {
return copyBuffer(f.file, r)
}
if !ok {
if lr != nil {
r = lr
}
return copyBuffer(f.file, r)
}

remain = spliceMax
}

// Hear the RawConn Read/Write methods allow us to utilize the go runtime
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
// Hear the RawConn Read/Write methods allow us to utilize the go runtime
// Here the RawConn Read/Write methods allow us to utilize the go runtime

Comment on lines +199 to +206
case nil:
handled = true
if n == 0 {
// At EOF
return true
}
case unix.EINTR:
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it'd be slightly more readable to have an explicit continue in here.

Suggested change
case nil:
handled = true
if n == 0 {
// At EOF
return true
}
case unix.EINTR:
continue
case nil:
handled = true
if n == 0 {
// At EOF
return true
}
continue
case unix.EINTR:
continue

Also, what do you think about reordering the switch to order successful cases before errors and have all the errors grouped together? Something like nil, unix.EINTR, unix.EAGAIN, unix.ENOSYS, syscall.EINVAL, syscall.EOPNOTSUPP, syscall.EPERM, default?

Comment on lines +113 to +121
select {
case <-f.opened:
return f.writeTo(w)
default:
}

select {
case <-f.opened:
return f.writeTo(w)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here about the repeated read from f.opened.

data := strings.Repeat("This is a test, this is only a test.", 1000)

// For these test cases we only call ReadFrom and validate there is no error and the
// amouont of data it copied is what we put into it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
// amouont of data it copied is what we put into it.
// amount of data it copied is what we put into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Needs Update
Development

Successfully merging this pull request may close these issues.

5 participants