Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve copyZeroAlloc for os.File and net.TCPConn #1893

Merged
merged 1 commit into from
Nov 8, 2024

Conversation

ksw2000
Copy link
Contributor

@ksw2000 ksw2000 commented Nov 4, 2024

Improve performance of copyZeroAlloc function

Resolves #1889

it looks like go's io.CopyBuffer always uses the WriterTo interface if it's available, much like copyZeroAlloc
However, File and TCPConn unfortunately have a fallback in their WriterTo that calls io.Copy if sendfile isn't possible, which allocates an implicit buffer

zero-copy sendfile can only be triggered when copying between os.File and net.TCPConn. Since the function confirming zero-copy sendfile is a private function, we can use ReadFrom or WriteTo only in the specific scenario.

In the original copyZeroAlloc implementation, both os.File and net.TCPConn have a WriteTo method, so the function calls WriteTo. However, zero-copy in Go is enabled only when copying between os.File and net.TCPConn, and certain conditions are met; otherwise, io.CopyBuffer will be used.

See the table below for my implementation

  • o: our copyBuffer by using sync.Pool
  • r: ReadFrom()
  • w: WriteTo()
w\r *File *TCPConn writeTo other
*File o r w o
*TCPConn w,r o w o
readFrom r r w r
other o o w o

The benchmark tests are shown below

Linux

goos: linux
goarch: amd64
pkg: github.com/valyala/fasthttp
cpu: QEMU Virtual CPU version 2.5+
                                       │ old_linux.txt │            new_linux.txt            │
                                       │    sec/op     │   sec/op     vs base                │
CopyZeroAllocOSFileToBytesBuffer-8        1.802µ ±  3%   1.303µ ± 2%  -27.69% (p=0.000 n=25)
CopyZeroAllocBytesBufferToOSFile-8        1.066µ ± 17%   1.048µ ± 1%   -1.69% (p=0.043 n=25)
CopyZeroAllocOSFileToStringsBuilder-8     9.477µ ±  0%   1.345µ ± 2%  -85.81% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8    1.031µ ±  1%   1.092µ ± 4%   +5.92% (p=0.000 n=25)
CopyZeroAllocOSFileToOSFile-8            12.132µ ±  1%   2.386µ ± 2%  -80.33% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8            2.009µ ±  2%   1.995µ ± 2%        ~ (p=0.733 n=25)
CopyZeroAllocNetConnToOSFile-8            21.86µ ±  2%   20.21µ ± 1%   -7.56% (p=0.000 n=25)
geomean                                   3.728µ         2.121µ       -43.11%

                                       │ old_linux.txt  │               new_linux.txt               │
                                       │      B/op      │     B/op      vs base                     │
CopyZeroAllocOSFileToBytesBuffer-8         40.00 ± 0%       0.00 ±  0%  -100.00% (p=0.000 n=25)
CopyZeroAllocBytesBufferToOSFile-8         0.000 ± 0%      0.000 ±  0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToStringsBuilder-8    32.04Ki ± 0%     0.00Ki ±  0%  -100.00% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8     0.000 ± 0%      0.000 ±  0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToOSFile-8            32.06Ki ± 0%     0.00Ki ±  0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8             96.00 ± 0%      96.00 ±  0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocNetConnToOSFile-8            16.000 ± 6%      8.000 ± 12%   -50.00% (p=0.000 n=25)
geomean                                               ²                 ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                                       │ old_linux.txt │              new_linux.txt              │
                                       │   allocs/op   │ allocs/op   vs base                     │
CopyZeroAllocOSFileToBytesBuffer-8        4.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocBytesBufferToOSFile-8        0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToStringsBuilder-8     5.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8    0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToOSFile-8             8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8            6.000 ± 0%     6.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocNetConnToOSFile-8            2.000 ± 0%     1.000 ± 0%   -50.00% (p=0.000 n=25)
geomean                                              ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

Windows

goos: windows
goarch: amd64
pkg: github.com/valyala/fasthttp
cpu: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
                                       │  old_win.txt  │             new_win.txt              │
                                       │    sec/op     │    sec/op     vs base                │
CopyZeroAllocOSFileToBytesBuffer-8        4.347µ ±  7%   4.220µ ± 11%        ~ (p=0.211 n=25)
CopyZeroAllocBytesBufferToOSFile-8        1.408µ ± 12%   1.460µ ±  7%        ~ (p=0.427 n=25)
CopyZeroAllocOSFileToStringsBuilder-8    17.448µ ±  5%   3.613µ ±  9%  -79.29% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8    1.324µ ±  8%   1.257µ ±  6%   -5.06% (p=0.024 n=25)
CopyZeroAllocOSFileToOSFile-8            19.953µ ±  8%   4.846µ ±  7%  -75.71% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8            18.18µ ±  8%   18.22µ ±  7%        ~ (p=0.405 n=25)
CopyZeroAllocNetConnToOSFile-8            74.75µ ±  2%   68.10µ ±  3%   -8.90% (p=0.000 n=25)
geomean                                   8.720µ         5.579µ        -36.02%

                                       │  old_win.txt   │                new_win.txt                │
                                       │      B/op      │     B/op      vs base                     │
CopyZeroAllocOSFileToBytesBuffer-8         8.000 ± 0%       0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocBytesBufferToOSFile-8         0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToStringsBuilder-8    32.01Ki ± 0%      0.00Ki ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8     9.000 ± 0%       0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToOSFile-8            32.02Ki ± 0%      0.00Ki ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8           32.02Ki ± 0%     32.02Ki ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocNetConnToOSFile-8           32.02Ki ± 0%     32.02Ki ± 0%    -0.00% (p=0.012 n=25)
geomean                                               ²                 ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                                       │ old_win.txt  │               new_win.txt               │
                                       │  allocs/op   │ allocs/op   vs base                     │
CopyZeroAllocOSFileToBytesBuffer-8       1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocBytesBufferToOSFile-8       0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToStringsBuilder-8    2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8   2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToOSFile-8            3.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8           3.000 ± 0%     3.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocNetConnToOSFile-8           3.000 ± 0%     3.000 ± 0%         ~ (p=1.000 n=25) ¹
geomean                                             ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

Improve performance of `copyZeroAlloc` function

```
goos: linux
goarch: amd64
pkg: github.com/valyala/fasthttp
cpu: QEMU Virtual CPU version 2.5+
                                       │   old6.txt    │              new7.txt               │
                                       │    sec/op     │   sec/op     vs base                │
CopyZeroAllocOSFileToBytesBuffer-8        1.802µ ±  3%   1.303µ ± 2%  -27.69% (p=0.000 n=25)
CopyZeroAllocBytesBufferToOSFile-8        1.066µ ± 17%   1.048µ ± 1%   -1.69% (p=0.043 n=25)
CopyZeroAllocOSFileToStringsBuilder-8     9.477µ ±  0%   1.345µ ± 2%  -85.81% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8    1.031µ ±  1%   1.092µ ± 4%   +5.92% (p=0.000 n=25)
CopyZeroAllocOSFileToOSFile-8            12.132µ ±  1%   2.386µ ± 2%  -80.33% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8            2.009µ ±  2%   1.995µ ± 2%        ~ (p=0.733 n=25)
CopyZeroAllocNetConnToOSFile-8            21.86µ ±  2%   20.21µ ± 1%   -7.56% (p=0.000 n=25)
geomean                                   3.728µ         2.121µ       -43.11%

                                       │    old6.txt    │                 new7.txt                  │
                                       │      B/op      │     B/op      vs base                     │
CopyZeroAllocOSFileToBytesBuffer-8         40.00 ± 0%       0.00 ±  0%  -100.00% (p=0.000 n=25)
CopyZeroAllocBytesBufferToOSFile-8         0.000 ± 0%      0.000 ±  0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToStringsBuilder-8    32.04Ki ± 0%     0.00Ki ±  0%  -100.00% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8     0.000 ± 0%      0.000 ±  0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToOSFile-8            32.06Ki ± 0%     0.00Ki ±  0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8             96.00 ± 0%      96.00 ±  0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocNetConnToOSFile-8            16.000 ± 6%      8.000 ± 12%   -50.00% (p=0.000 n=25)
geomean                                               ²                 ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                                       │   old6.txt   │                new7.txt                 │
                                       │  allocs/op   │ allocs/op   vs base                     │
CopyZeroAllocOSFileToBytesBuffer-8       4.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocBytesBufferToOSFile-8       0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToStringsBuilder-8    5.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8   0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToOSFile-8            8.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8           6.000 ± 0%     6.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocNetConnToOSFile-8           2.000 ± 0%     1.000 ± 0%   -50.00% (p=0.000 n=25)
geomean                                             ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean
```

```
goos: windows
goarch: amd64
pkg: github.com/valyala/fasthttp
cpu: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
                                       │  old_win.txt  │             new_win.txt              │
                                       │    sec/op     │    sec/op     vs base                │
CopyZeroAllocOSFileToBytesBuffer-8        4.347µ ±  7%   4.220µ ± 11%        ~ (p=0.211 n=25)
CopyZeroAllocBytesBufferToOSFile-8        1.408µ ± 12%   1.460µ ±  7%        ~ (p=0.427 n=25)
CopyZeroAllocOSFileToStringsBuilder-8    17.448µ ±  5%   3.613µ ±  9%  -79.29% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8    1.324µ ±  8%   1.257µ ±  6%   -5.06% (p=0.024 n=25)
CopyZeroAllocOSFileToOSFile-8            19.953µ ±  8%   4.846µ ±  7%  -75.71% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8            18.18µ ±  8%   18.22µ ±  7%        ~ (p=0.405 n=25)
CopyZeroAllocNetConnToOSFile-8            74.75µ ±  2%   68.10µ ±  3%   -8.90% (p=0.000 n=25)
geomean                                   8.720µ         5.579µ        -36.02%

                                       │  old_win.txt   │                new_win.txt                │
                                       │      B/op      │     B/op      vs base                     │
CopyZeroAllocOSFileToBytesBuffer-8         8.000 ± 0%       0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocBytesBufferToOSFile-8         0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToStringsBuilder-8    32.01Ki ± 0%      0.00Ki ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8     9.000 ± 0%       0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToOSFile-8            32.02Ki ± 0%      0.00Ki ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8           32.02Ki ± 0%     32.02Ki ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocNetConnToOSFile-8           32.02Ki ± 0%     32.02Ki ± 0%    -0.00% (p=0.012 n=25)
geomean                                               ²                 ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean

                                       │ old_win.txt  │               new_win.txt               │
                                       │  allocs/op   │ allocs/op   vs base                     │
CopyZeroAllocOSFileToBytesBuffer-8       1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocBytesBufferToOSFile-8       0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocOSFileToStringsBuilder-8    2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocIOLimitedReaderToOSFile-8   2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToOSFile-8            3.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=25)
CopyZeroAllocOSFileToNetConn-8           3.000 ± 0%     3.000 ± 0%         ~ (p=1.000 n=25) ¹
CopyZeroAllocNetConnToOSFile-8           3.000 ± 0%     3.000 ± 0%         ~ (p=1.000 n=25) ¹
geomean                                             ²               ?                       ² ³
¹ all samples are equal
² summaries must be >0 to compute geomean
³ ratios must be >0 to compute geomean
```
@ksw2000 ksw2000 marked this pull request as ready for review November 4, 2024 07:01
@erikdubbelboer erikdubbelboer merged commit f6ba4ab into valyala:master Nov 8, 2024
15 checks passed
@erikdubbelboer
Copy link
Collaborator

Thanks! Super nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

copyZeroAlloc allocates when interface uses genericWriteTo
2 participants