Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add buffering to stdout when it's not a terminal. #885

Merged
merged 4 commits into from
Nov 26, 2021

Conversation

tmccombs
Copy link
Collaborator

@tmccombs tmccombs commented Nov 15, 2021

Follow-up of #736

@tmccombs tmccombs changed the title Tty buffer Add buffering to stdout when it's not a terminal. Nov 15, 2021
@sharkdp
Copy link
Owner

sharkdp commented Nov 15, 2021

Thank you for bringing this up to date! It would be good to see some benchmark results (regression.sh tests and maybe some benchmarks with output activated for TTY behavior) in order to gain confidence in the changes. Let me know if you need help.

src/output.rs Outdated Show resolved Hide resolved
src/walk.rs Show resolved Hide resolved
src/walk.rs Outdated Show resolved Hide resolved
src/walk.rs Outdated Show resolved Hide resolved
@tavianator
Copy link
Collaborator

By the way, you could add

Co-authored-by: sourlemon207 <jw1756@protonmail.com>

to the commit message to attribute it to both you and the original author.

@tmccombs
Copy link
Collaborator Author

tmccombs commented Nov 16, 2021

I tried running the the rust-lang repository and got:

fd regression benchmark

No pattern

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang' 245.2 ± 2.7 241.6 249.7 1.00
./fd-feature --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang' 269.5 ± 4.4 263.3 278.1 1.10 ± 0.02

Simple pattern

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 343.1 ± 4.6 338.4 355.0 1.00 ± 0.01
./fd-feature '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 342.6 ± 1.8 340.6 346.4 1.00

Simple pattern (-HI)

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 242.9 ± 3.7 238.7 251.2 1.00
./fd-feature -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 248.9 ± 5.3 242.5 258.1 1.02 ± 0.03

File extension

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang' 250.9 ± 4.4 246.0 258.4 1.00
./fd-feature -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang' 253.3 ± 2.9 249.3 258.6 1.01 ± 0.02

File type

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang' 245.2 ± 4.6 240.8 255.7 1.00
./fd-feature -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang' 251.1 ± 4.9 246.5 261.0 1.02 ± 0.03

Cold cache

Command Mean [s] Min [s] Max [s] Relative
./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 28.951 ± 0.725 28.224 29.674 1.01 ± 0.03
./fd-feature -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 28.654 ± 0.318 28.339 28.975 1.00

interestingly, it looks like this branch is actually a little bit slower, I'm not really sure why.

@tmccombs
Copy link
Collaborator Author

tmccombs commented Nov 16, 2021

I switch back to using channel instead of sync_channel than it is, at least not any slower:

fd regression benchmark

No pattern

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang' 250.0 ± 2.4 245.9 253.7 1.00
./fd-feature --hidden --no-ignore '' '/home/thayne/bulk-home/devel/rust-lang' 254.4 ± 1.4 251.7 255.9 1.02 ± 0.01

Simple pattern

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 342.2 ± 3.1 336.3 347.9 1.00
./fd-feature '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 342.7 ± 2.7 340.3 349.0 1.00 ± 0.01

Simple pattern (-HI)

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 242.6 ± 3.1 239.2 249.0 1.00
./fd-feature -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 244.9 ± 3.8 239.2 253.7 1.01 ± 0.02

File extension

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang' 253.8 ± 3.0 249.2 259.3 1.00
./fd-feature -HI --extension jpg '' '/home/thayne/bulk-home/devel/rust-lang' 254.7 ± 3.2 250.6 260.4 1.00 ± 0.02

File type

Command Mean [ms] Min [ms] Max [ms] Relative
./fd-master -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang' 243.4 ± 4.5 238.9 255.0 1.00
./fd-feature -HI --type l '' '/home/thayne/bulk-home/devel/rust-lang' 244.8 ± 3.0 240.9 249.1 1.01 ± 0.02

Cold cache

Command Mean [s] Min [s] Max [s] Relative
./fd-master -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 29.176 ± 0.388 28.729 29.432 1.00 ± 0.02
./fd-feature -HI '.*[0-9]\.jpg$' '/home/thayne/bulk-home/devel/rust-lang' 29.120 ± 0.375 28.754 29.504 1.00

@sharkdp
Copy link
Owner

sharkdp commented Nov 16, 2021

Ok, thank you. And do we have a significant improvement for searches that need to write a lot of results to stdout? Something like fd -HI?

@sharkdp
Copy link
Owner

sharkdp commented Nov 16, 2021

Have you played around with different buffer sizes? https://doc.rust-lang.org/stable/std/io/struct.BufWriter.html#method.with_capacity

@tmccombs
Copy link
Collaborator Author

Have you played around with different buffer sizes?

I have not

I'm wondering if it might be that the bottleneck, at least on my system, is reading from disk, so even if it writes to stdout faster, it doesn't impact the overall run time.

So I tried changing it to write the output to a file, instead of whatever hyperfine does with the output by default, and got much better results:

❯ hyperfine --warmup 5 './fd-master -HI "" ~/bulk-home/devel/rust-lang/ > test1' './fd-feature -HI "" ~/bulk-home/devel/rust-lang/ > test2' --shell bash                                                at 00:14:26 
Benchmark 1: ./fd-master -HI "" ~/bulk-home/devel/rust-lang/ > test1
  Time (mean ± σ):     559.2 ms ±  91.6 ms    [User: 813.3 ms, System: 878.3 ms]
  Range (min … max):   511.7 ms … 817.3 ms    10 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: ./fd-feature -HI "" ~/bulk-home/devel/rust-lang/ > test2
  Time (mean ± σ):     326.5 ms ±  17.4 ms    [User: 781.8 ms, System: 681.6 ms]
  Range (min … max):   301.9 ms … 347.3 ms    10 runs
 
Summary
  './fd-feature -HI "" ~/bulk-home/devel/rust-lang/ > test2' ran
    1.71 ± 0.30 times faster than './fd-master -HI "" ~/bulk-home/devel/rust-lang/ > test1'

tmccombs and others added 4 commits November 26, 2021 19:40
This is based on the work of sharkdp#736 by @sourlemon207.

I've added the suggestion I recommended on that PR.
Co-authored-by: sourlemon207 <jw1756@protonmail.com>
@sharkdp
Copy link
Owner

sharkdp commented Nov 26, 2021

So I tried changing it to write the output to a file, instead of whatever hyperfine does with the output by default, and got much better results:

Using hyperfine, it should write to /dev/null by default (related: sharkdp/hyperfine#377). I would have naively assumed that this wouldn't change performance!?

@sharkdp
Copy link
Owner

sharkdp commented Nov 26, 2021

I ran some benchmarks myself. I see quite significant improvements when a lot of results (1.6 million) are printed:

Benchmark 1: ./fd-master --hidden --no-ignore '' '/home/shark'
  Time (mean ± σ):      1.392 s ±  0.025 s    [User: 4.129 s, System: 4.597 s]
  Range (min … max):    1.372 s …  1.454 s    10 runs
 
Benchmark 2: ./fd-feature --hidden --no-ignore '' '/home/shark'
  Time (mean ± σ):      1.094 s ±  0.037 s    [User: 3.769 s, System: 4.273 s]
  Range (min … max):    1.072 s …  1.193 s    10 runs
 
Summary
  './fd-feature --hidden --no-ignore '' '/home/shark'' ran
    1.27 ± 0.05 times faster than './fd-master --hidden --no-ignore '' '/home/shark''

In combination with the changes by @tavianator, this will make colorized searches almost 5x faster (compared to 8.2.1) 🥳 Very cool!

Benchmark 1: ./fd-8.2.1 --color=always '' '/home/shark'
  Time (mean ± σ):      2.736 s ±  0.034 s    [User: 2.001 s, System: 2.149 s]
  Range (min … max):    2.706 s …  2.820 s    10 runs
 
Benchmark 2: ./fd-feature --color=always '' '/home/shark'
  Time (mean ± σ):     595.9 ms ±   5.8 ms    [User: 1085.6 ms, System: 916.4 ms]
  Range (min … max):   584.9 ms … 606.8 ms    10 runs
 
Summary
  './fd-feature --color=always '' '/home/shark'' ran
    4.59 ± 0.07 times faster than './fd-8.2.1 --color=always '' '/home/shark''

@tavianator
Copy link
Collaborator

tavianator commented Nov 26, 2021

A write() to /dev/null doesn't even look at the data, so reducing the number of write()s just reduces syscall overhead which may be hidden by parallelism anyway.

@sharkdp
Copy link
Owner

sharkdp commented Nov 26, 2021

A write() to /dev/null doesn't even look at the data, so reducing the number of write()s just reduces syscall overhead which may be hidden by parallelism anyway.

Interesting. I was secretly hoping that you would explain what's going on 😄 - thanks!

So I guess this means that we would get slightly more realistic benchmarking results if we pipe the output to a file?

@sharkdp
Copy link
Owner

sharkdp commented Nov 26, 2021

Have you played around with different buffer sizes?

I have not

I tried out various buffer sizes and it doesn't make a huge difference as long as the buffer size is larger than a few 100 B. The default is 8 KiB which seems reasonable.

@sharkdp
Copy link
Owner

sharkdp commented Nov 26, 2021

So I guess this means that we would get slightly more realistic benchmarking results if we pipe the output to a file?

If the output goes to an actual file (on tmpfs), the advantage is even greater:

Benchmark 1: ./fd-master --hidden --no-ignore '' '/home/shark' > /tmp/test.out
  Time (mean ± σ):      1.863 s ±  0.005 s    [User: 4.058 s, System: 5.137 s]
  Range (min … max):    1.857 s …  1.875 s    10 runs
 
Benchmark 2: fd --hidden --no-ignore '' '/home/shark' > /tmp/test.out
  Time (mean ± σ):      1.087 s ±  0.005 s    [User: 3.716 s, System: 4.357 s]
  Range (min … max):    1.078 s …  1.098 s    10 runs
 
Summary
  'fd --hidden --no-ignore '' '/home/shark' > /tmp/test.out' ran
    1.71 ± 0.01 times faster than './fd-master --hidden --no-ignore '' '/home/shark' > /tmp/test.out'

Even more so, if we are writing to an SSD:

Benchmark 1: ./fd-master --hidden --no-ignore '' '/home/shark' > /home/shark/test.out
  Time (mean ± σ):      3.376 s ±  0.064 s    [User: 4.138 s, System: 6.629 s]
  Range (min … max):    3.318 s …  3.525 s    10 runs
 
Benchmark 2: fd --hidden --no-ignore '' '/home/shark' > /home/shark/test.out
  Time (mean ± σ):      1.480 s ±  0.299 s    [User: 3.748 s, System: 4.469 s]
  Range (min … max):    1.184 s …  2.137 s    10 runs
 
Summary
  'fd --hidden --no-ignore '' '/home/shark' > /home/shark/test.out' ran
    2.28 ± 0.46 times faster than './fd-master --hidden --no-ignore '' '/home/shark' > /home/shark/test.out'

pv confirms the same:

▶ ./fd-master --hidden --no-ignore '' '/home/shark' | pv -a > /dev/null
[70,7MiB/s]

▶ ./fd-feature-885 --hidden --no-ignore '' '/home/shark' | pv -a > /dev/null
[ 145MiB/s]

@sharkdp
Copy link
Owner

sharkdp commented Nov 26, 2021

I see no reason to delay merging this.

@tavianator
Copy link
Collaborator

So I guess this means that we would get slightly more realistic benchmarking results if we pipe the output to a file?

Depends what you're trying to be realistic about :). Writing to /dev/null is a pretty good proxy for the "maximum possible" throughput of a program, which is a reasonable thing to benchmark.

You might also try writing to a file, but that complicates things. Depending on how much you write, the data might only hit the page cache, or the flash cache of a hard drive, etc. The filesystem, hardware, etc. will all affect the performance and make it harder to interpret the results.

The right thing to do depends on what you actually want to measure. I think mostly people care about interactive use, where writing to the TTY is probably the biggest bottleneck. The current benchmarks don't really capture this well.

Another common use is probably piping fd to some other program. For this case it might be better to do something like fd | cat >/dev/null instead of fd >/dev/null so that the writes actually go through a pipe.

@sharkdp sharkdp merged commit f219da4 into sharkdp:master Nov 26, 2021
@sharkdp
Copy link
Owner

sharkdp commented Nov 26, 2021

I see no reason to delay merging this. Thank you @sourlemon207 @tmccombs @tavianator.

I'll create a ticket to address the shortcomings of the current benchmark set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants