Add option to cat that accepts the name of a file that contains a list of input CSV data file names. #1293

Closed
derekmahar opened this issue Sep 8, 2023 · 23 comments · Fixed by #1496
Labels
enhancement New feature or request. Once marked with this label, its in the backlog.

Comments

@derekmahar

To work around the command-line length limit of command shells, please allow cat to accept an option that names a file containing a list of input CSV file names, one per line. This option would be similar to the --infile-list string option that csvtk concat accepts:

      --infile-list string     file of input files list (one file per line), if given, they are appended to files from cli arguments

(See the output of command csvtk concat --help for the most current syntax of csvtk concat.)
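For illustration, here is a rough way to check whether a list of file names would fit on a single command line. This is only a sketch: the helper name args_fit is hypothetical, and because the system limit reported by getconf ARG_MAX also has to hold the environment, the comparison is an estimate rather than an exact test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the file names listed on stdin
# (one per line) would fit on a single command line.
# ARG_MAX also has to hold the environment, so this is only an estimate.
args_fit() {
  limit=$(getconf ARG_MAX)
  # wc -c counts every name plus its newline, which approximates the
  # terminating byte each argument needs.
  used=$(wc -c | tr -d ' ')
  if [ "$used" -lt "$limit" ]; then
    echo "fits: $used of $limit bytes"
  else
    echo "too long: $used of $limit bytes"
  fi
}

# Example: 100,000 generated names of the form data/file_000001.csv.
seq 1 100000 | awk '{ printf "data/file_%06d.csv\n", $1 }' | args_fit
```

When the names do not fit, the usual alternatives are batching with xargs or passing the list in a file, which is what this request asks for.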

@jqnatividad added the enhancement label Sep 8, 2023
@jqnatividad
Collaborator

This is a good feature to have. Added to the backlog.

@jqnatividad
Collaborator

jqnatividad commented Nov 3, 2023

Thinking more about this, something like --infile-list should be a qsv-wide feature, so that every command that accepts multiple input files supports it, not just cat.

It should also be implemented for headers & sqlp.

@derekmahar
Author

If not too much effort, and for commands like cat where it would make sense, please consider implementing --infile-list as a streaming or "lazy" list so that a very large list of input file name arguments consumes less memory than it does now. In July 2022, I tested xsv cat rows with a large number of CSV file arguments and found that not only was xsv cat rows much slower than ordinary cat, but it also consumed a lot of memory because the data structure which represents each file argument is relatively large. If I recall correctly, before processing the input files, xsv cat rows first reads all input file name argument strings into a list of this "heavy" argument data structure instead of just reading the original file name argument list directly. My guess is that other commands also use this argument data structure. Couldn't xsv cat rows (and other commands) just read the input file name argument list directly without building this internal argument list?
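As a rough illustration of the "lazy" idea, the sketch below reads one file name at a time from a list file instead of expanding every name into an argument list first. The function name cat_csv_from_list is hypothetical, the header is assumed to be exactly one physical line, and every file is assumed to end with a newline. It shows the streaming shape only; it starts one process per file, so it is not meant to be fast.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: concatenate the CSV files named in a list file,
# reading one name at a time rather than building the whole list in memory.
cat_csv_from_list() {
  first=1
  while IFS= read -r name; do
    [ -n "$name" ] || continue        # skip blank lines
    if [ "$first" -eq 1 ]; then
      cat -- "$name"                  # keep the header of the first file
      first=0
    else
      tail -n +2 -- "$name"           # drop the header of later files
    fi
  done < "$1"
}

# Small demonstration with two one-row files.
dir=$(mktemp -d)
printf 'id\n1\n' > "$dir/a.csv"
printf 'id\n2\n' > "$dir/b.csv"
printf '%s\n' "$dir/a.csv" "$dir/b.csv" > "$dir/files.txt"
cat_csv_from_list "$dir/files.txt"    # prints id, 1 and 2 on separate lines
rm -r "$dir"
```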

@jqnatividad
Collaborator

I'm afraid changing cat rows logic to lazy loading is not that straightforward and will require a rewrite that I'm not sure is worth the payoff.

The large config data structure that it loads for the CSV reading configuration of each input file is really not that big. Per VSCode, it's only 112 bytes per file.

How many files were you benchmarking BTW? Was it thousands of files?

Also, qsv cat rows will always be slower than regular cat as it's parsing the files as CSV, with special handling for ignoring the header. cat just appends the files together.
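The difference can be seen with two tiny files. This is a sketch only: the awk version is line-based and would mishandle a quoted field containing a newline, which is exactly the kind of case a real CSV parser has to handle.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
printf 'name,qty\napple,1\n' > "$dir/a.csv"
printf 'name,qty\npear,2\n'  > "$dir/b.csv"

# Plain cat: byte-for-byte append, so the header appears twice.
plain=$(cat "$dir/a.csv" "$dir/b.csv")

# Header-aware: FNR is the line number within the current file and NR the
# overall line number, so this drops line 1 of every file except the first.
aware=$(awk 'FNR == 1 && NR != 1 { next } { print }' "$dir/a.csv" "$dir/b.csv")

printf '%s\n' "$plain" | wc -l   # 4 lines: the header is repeated
printf '%s\n' "$aware" | wc -l   # 3 lines: one header, two data rows
rm -r "$dir"
```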

Anyway, I'll still consider streamlining it while implementing --infile-list, as I can see a way to reuse the same config for each input file...

@derekmahar
Author

derekmahar commented Nov 8, 2023

> The large config data structure that it loads for the CSV reading configuration of each input file is really not that big. Per VSCode, it's only 112 bytes per file.

If not file name argument processing, might the slower CSV file concatenation be due to slower file input-output handling? Excessive CSV data validation?

> How many files were you benchmarking BTW? Was it thousands of files?

I concatenated thousands of files. Project benchmark_cat_csv compares the CSV file concatenation performance of my naive custom shell script cat_csv_custom, csvtk, mlr, qsv, and xsv. In the test run below, which concatenates 100,000 CSV files that each contain a single row of one column, qsv completed the test more than 2.5 times faster than xsv, but csvtk was almost seven times faster and mlr more than 17 times faster than qsv.

$ ./run_tests 100000
Benchmark 1: ./run_test ./cat_csv_csvtk 100000
  Time (mean ± σ):      2.950 s ±  0.034 s    [User: 4.620 s, System: 0.941 s]
  Range (min … max):    2.897 s …  3.024 s    10 runs

Benchmark 2: ./run_test ./cat_csv_csvtk_2 100000
  Time (mean ± σ):      3.400 s ±  0.032 s    [User: 4.645 s, System: 1.588 s]
  Range (min … max):    3.341 s …  3.452 s    10 runs

Benchmark 3: ./run_test ./cat_csv_custom 100000
  Time (mean ± σ):     431.4 ms ±   1.6 ms    [User: 159.8 ms, System: 308.1 ms]
  Range (min … max):   428.8 ms … 434.1 ms    10 runs

Benchmark 4: ./run_test ./cat_csv_custom_2 100000
  Time (mean ± σ):     468.5 ms ±   3.4 ms    [User: 166.1 ms, System: 340.0 ms]
  Range (min … max):   464.0 ms … 473.2 ms    10 runs

Benchmark 5: ./run_test ./cat_csv_mlr 100000
  Time (mean ± σ):      1.141 s ±  0.007 s    [User: 0.774 s, System: 0.645 s]
  Range (min … max):    1.130 s …  1.153 s    10 runs

Benchmark 6: ./run_test ./cat_csv_mlr_2 100000
  Time (mean ± σ):      1.219 s ±  0.008 s    [User: 0.751 s, System: 0.740 s]
  Range (min … max):    1.211 s …  1.237 s    10 runs

Benchmark 7: ./run_test ./cat_csv_qsv 100000
  Time (mean ± σ):     20.105 s ±  0.028 s    [User: 13.985 s, System: 6.358 s]
  Range (min … max):   20.067 s … 20.149 s    10 runs

Benchmark 8: ./run_test ./cat_csv_xsv 100000
  Time (mean ± σ):     52.460 s ±  0.129 s    [User: 45.097 s, System: 7.395 s]
  Range (min … max):   52.295 s … 52.737 s    10 runs

Summary
  './run_test ./cat_csv_custom 100000' ran
    1.09 ± 0.01 times faster than './run_test ./cat_csv_custom_2 100000'
    2.64 ± 0.02 times faster than './run_test ./cat_csv_mlr 100000'
    2.83 ± 0.02 times faster than './run_test ./cat_csv_mlr_2 100000'
    6.84 ± 0.08 times faster than './run_test ./cat_csv_csvtk 100000'
    7.88 ± 0.08 times faster than './run_test ./cat_csv_csvtk_2 100000'
   46.60 ± 0.19 times faster than './run_test ./cat_csv_qsv 100000'
  121.59 ± 0.55 times faster than './run_test ./cat_csv_xsv 100000'
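The hyperfine summary reports every tool relative to cat_csv_custom, so ratios between the other tools have to be worked out from the mean times. A small sketch of that arithmetic, using the means reported above (the helper name ratio is hypothetical):

```shell
#!/usr/bin/env bash
# Mean wall-clock times in seconds, copied from the hyperfine output above.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

ratio 52.460 20.105   # xsv / qsv    -> 2.61
ratio 20.105 2.950    # qsv / csvtk  -> 6.82
ratio 20.105 1.141    # qsv / mlr    -> 17.62
```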

Note that all of the shell scripts with the "_2" suffix invoke a sub-shell in order to handicap them in a way similar to shell scripts cat_csv_qsv and cat_csv_xsv. This is to eliminate the possibility that copying shell script arguments slowed down the qsv and xsv scripts. Script cat_csv_xsv cheats a little by invoking tail to delete CSV headers because xsv doesn't have a command similar to qsv behead or csvtk del-header.

> Also, qsv cat rows will always be slower than regular cat as it's parsing the files as CSV, with special handling for ignoring the header. cat just appends the files together.

I was oversimplifying the implementation of my custom shell script. It actually uses Bash read, GNU cat, xargs, and tail. The custom script is at the end of the following shell pipeline:

find data -type f -name '*.csv' |
  sort |
  head -n 5 |
  (read -r first; cat "$first"; xargs -r tail --lines=+2 --quiet)
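For anyone who wants to try the idea on sample data, here is a self-contained sketch of the same approach wrapped in a function. The name cat_csv_custom_sketch is hypothetical, GNU tail and xargs are assumed (for --lines, --quiet and -r), and file names are assumed to contain no whitespace or quotes, since xargs would split on them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the pipeline above: reads CSV file names on
# stdin, one per line, and prints the concatenation with a single header.
cat_csv_custom_sketch() {
  IFS= read -r first || return 0   # empty list: nothing to do
  cat -- "$first"                  # first file, header included
  # xargs hands the remaining names to tail in batches that fit the
  # command-line limit; +2 starts at line 2, --quiet hides file banners.
  xargs -r tail --lines=+2 --quiet
}

dir=$(mktemp -d)
for i in 1 2 3; do printf 'id\n%s\n' "$i" > "$dir/$i.csv"; done
find "$dir" -type f -name '*.csv' | sort | cat_csv_custom_sketch
rm -r "$dir"
```

The read consumes only the first line of the pipe, which leaves the rest of the list for xargs.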

@derekmahar
Author

By the way, here are the versions of the various tools that I used in the performance tests.

$ ./versions
csvtk v0.28.1
mlr 6.9.0
qsv 0.118.0-mimalloc-apply;fetch;foreach;generate;geocode;Luau 0.601;python-3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0];to;polars-0.34.2;self_update-8-8;12.49 GiB-0 B-14.90 GiB-15.61 GiB (x86_64-unknown-linux-gnu compiled with Rust 1.73.0) compiled
xsv 0.13.0
GNU tools
GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
cat (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjorn Granlund and Richard M. Stallman.
tail (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, Ian Lance Taylor,
and Jim Meyering.
xargs (GNU findutils) 4.8.0
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Eric B. Decker, James Youngman, and Kevin Dalley.

@jqnatividad
Collaborator

Thanks for pulling together the benchmark_cat_csv project and including qsv in it.

I just tweaked cat_rows a bit to squeeze out more performance by moving the header handling outside the main hot loop, amortizing the rdr allocation, inlining some functions, and doubling the default write buffer size to 512k.

I'll still work on the --infile-list argument, but would be interested in how qsv performs now with these changes.

@jqnatividad
Collaborator

Just added a --flexible option, which makes it faster still by turning off column-count validation.

@derekmahar
Author

derekmahar commented Nov 9, 2023

Unfortunately, the qsv cat rows --flexible option and the changes to qsv cat rows did not improve its CSV file concatenation performance. (Actually, the mean test completion time was just over 1 s higher than in the original test.)

$ hyperfine --warmup 3 './run_test ./cat_csv_qsv 100000' './run_test ./cat_csv_qsv_flexible 100000'
Benchmark 1: ./run_test ./cat_csv_qsv 100000
  Time (mean ± σ):     21.383 s ±  0.061 s    [User: 14.396 s, System: 7.228 s]
  Range (min … max):   21.294 s … 21.491 s    10 runs

Benchmark 2: ./run_test ./cat_csv_qsv_flexible 100000
  Time (mean ± σ):     21.606 s ±  0.057 s    [User: 14.537 s, System: 7.305 s]
  Range (min … max):   21.511 s … 21.669 s    10 runs

Summary
  './run_test ./cat_csv_qsv 100000' ran
    1.01 ± 0.00 times faster than './run_test ./cat_csv_qsv_flexible 100000'
$ cat cat_csv_qsv_flexible
#!/usr/bin/env bash
read first
qsv cat rows --flexible $first
xargs --no-run-if-empty sh -c 'qsv cat rows --flexible "$@" | qsv behead' cat_csv_skip_header
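Two details of the xargs line are easy to miss. The word after the quoted script (cat_csv_skip_header above) becomes $0, so the first file name is not swallowed, and xargs may split a long list into several invocations, which is presumably why every batch is piped through behead. A small sketch of both effects, with hypothetical file names and a batch size forced to two:

```shell
#!/usr/bin/env bash
# Five names, at most two per invocation, to force several batches.
# The word after the quoted script ("batch") becomes $0; the names
# supplied by xargs become "$@".
show_batches() {
  printf '%s\n' a.csv b.csv c.csv d.csv e.csv |
    xargs -n 2 sh -c 'echo "$0 received $#: $*"' batch
}

show_batches
# batch received 2: a.csv b.csv
# batch received 2: c.csv d.csv
# batch received 1: e.csv
```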

@jqnatividad
Collaborator

jqnatividad commented Nov 9, 2023

Interesting. There should be a noticeable performance bump...

Can you try running it with qsvlite, making sure to build it with CPU optimizations in release mode?

export CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'
cargo build --release --locked -F lite

@derekmahar
Author

qsvlite made no difference.

$ hyperfine --warmup 3 './run_test ./cat_csv_qsvlite 100000' './run_test ./cat_csv_qsvlite_flexible 100000'
Benchmark 1: ./run_test ./cat_csv_qsvlite 100000
  Time (mean ± σ):     20.638 s ±  0.029 s    [User: 13.804 s, System: 6.907 s]
  Range (min … max):   20.602 s … 20.678 s    10 runs

Benchmark 2: ./run_test ./cat_csv_qsvlite_flexible 100000
  Time (mean ± σ):     20.765 s ±  0.044 s    [User: 13.857 s, System: 6.980 s]
  Range (min … max):   20.683 s … 20.822 s    10 runs

Summary
  './run_test ./cat_csv_qsvlite 100000' ran
    1.01 ± 0.00 times faster than './run_test ./cat_csv_qsvlite_flexible 100000'
$ cat cat_csv_qsvlite
#!/usr/bin/env bash
read first
qsvlite cat rows $first
xargs --no-run-if-empty sh -c 'qsvlite cat rows "$@" | qsvlite behead' cat_csv_skip_header
$ cat cat_csv_qsvlite_flexible
#!/usr/bin/env bash
read first
qsvlite cat rows --flexible $first
xargs --no-run-if-empty sh -c 'qsvlite cat rows --flexible "$@" | qsvlite behead' cat_csv_skip_header
$ qsvlite --version
qsvlite 0.118.0-mimalloc--8-8;12.49 GiB-0 B-14.88 GiB-15.61 GiB (x86_64-unknown-linux-gnu compiled with Rust 1.73.0) compiled
$ echo $CARGO_BUILD_RUSTFLAGS
-C target-cpu=native

@jqnatividad
Collaborator

I specified qsvlite since it's faster to build. I just wanted to make sure CPU optimizations were enabled and it was built in release mode...

If you're so inclined, @derekmahar, you can compile qsv with the samply profile and use samply to see where the bottlenecks are.

export CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'
cargo build --profile release-samply --locked -F lite

@derekmahar
Author

How much difference should CARGO_BUILD_RUSTFLAGS='-C target-cpu=native' make?

@jqnatividad
Collaborator

jqnatividad commented Nov 9, 2023

Depends on the platform.

But you can find out which additional CPU features are enabled here:

https://github.com/jqnatividad/qsv/blob/master/docs/PERFORMANCE.md#cpu-optimization
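One general way to see the difference locally, independent of what the linked page describes, is to compare the target features rustc reports for the generic and native CPU targets. This is a sketch that assumes rustc is installed; the helper name list_features is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the target features rustc enables for a given
# -C target-cpu value, one per line.
list_features() {
  rustc --print cfg -C "target-cpu=$1" |
    sed -n 's/^target_feature="\(.*\)"$/\1/p' | sort
}

if command -v rustc >/dev/null 2>&1; then
  tmp=$(mktemp -d)
  list_features generic > "$tmp/generic"
  list_features native  > "$tmp/native"
  # comm -13 keeps the lines that appear only in the second file, i.e. the
  # features that target-cpu=native adds on this machine.
  comm -13 "$tmp/generic" "$tmp/native"
  rm -r "$tmp"
else
  echo "rustc not found; skipping" >&2
fi
```

A longer list suggests more room for the compiler to use newer instructions, but how much that helps a given workload still has to be measured.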

jqnatividad added a commit that referenced this issue Nov 9, 2023
… `--flexible` option is enabled

in connection with #1293
@derekmahar
Author

derekmahar commented Nov 9, 2023

> Just added a --flexible option, which makes it faster still by turning off column-count validation.

--flexible did make a significant difference (about 2.3 times faster) on my HP Spectre x360 (Intel i7-8565U at 1.992GHz, SSD) running Ubuntu in Windows Subsystem for Linux 2:

$ ./run_tests 100000
Benchmark 1: ./run_test ./cat_csv_awk 100000
  Time (mean ± σ):      2.518 s ±  0.971 s    [User: 1.541 s, System: 1.133 s]
  Range (min … max):    1.414 s …  3.803 s    10 runs

Benchmark 2: ./run_test ./cat_csv_csvtk 100000
  Time (mean ± σ):      8.385 s ±  0.197 s    [User: 11.015 s, System: 3.808 s]
  Range (min … max):    8.123 s …  8.724 s    10 runs

Benchmark 3: ./run_test ./cat_csv_csvtk_subshell 100000
  Time (mean ± σ):     11.923 s ±  0.814 s    [User: 13.867 s, System: 6.801 s]
  Range (min … max):   11.022 s … 13.229 s    10 runs

Benchmark 4: ./run_test ./cat_csv_custom 100000
  Time (mean ± σ):     930.7 ms ± 108.4 ms    [User: 368.0 ms, System: 616.7 ms]
  Range (min … max):   788.9 ms … 1061.9 ms    10 runs

Benchmark 5: ./run_test ./cat_csv_custom_subshell 100000
  Time (mean ± σ):     835.1 ms ±  21.6 ms    [User: 440.1 ms, System: 438.5 ms]
  Range (min … max):   804.4 ms … 871.4 ms    10 runs

Benchmark 6: ./run_test ./cat_csv_goawk 100000
  Time (mean ± σ):     878.3 ms ±  46.1 ms    [User: 440.3 ms, System: 482.4 ms]
  Range (min … max):   817.6 ms … 966.1 ms    10 runs

Benchmark 7: ./run_test ./cat_csv_mlr 100000
  Time (mean ± σ):      3.128 s ±  0.145 s    [User: 2.250 s, System: 2.229 s]
  Range (min … max):    2.915 s …  3.409 s    10 runs

Benchmark 8: ./run_test ./cat_csv_mlr_subshell 100000
  Time (mean ± σ):      3.431 s ±  0.376 s    [User: 2.410 s, System: 2.409 s]
  Range (min … max):    3.101 s …  4.417 s    10 runs

Benchmark 9: ./run_test ./cat_csv_qsv 100000
  Time (mean ± σ):     44.597 s ±  1.669 s    [User: 34.652 s, System: 10.306 s]
  Range (min … max):   42.515 s … 46.957 s    10 runs

Benchmark 10: ./run_test ./cat_csv_qsv_flexible 100000
  Time (mean ± σ):     19.450 s ±  0.949 s    [User: 15.616 s, System: 4.219 s]
  Range (min … max):   18.671 s … 22.009 s    10 runs

Benchmark 11: ./run_test ./cat_csv_xsv 100000
  Time (mean ± σ):     108.793 s ± 12.043 s    [User: 90.884 s, System: 17.915 s]
  Range (min … max):   97.630 s … 126.524 s    10 runs

Summary
  ./run_test ./cat_csv_custom_subshell 100000 ran
    1.05 ± 0.06 times faster than ./run_test ./cat_csv_goawk 100000
    1.11 ± 0.13 times faster than ./run_test ./cat_csv_custom 100000
    3.01 ± 1.17 times faster than ./run_test ./cat_csv_awk 100000
    3.75 ± 0.20 times faster than ./run_test ./cat_csv_mlr 100000
    4.11 ± 0.46 times faster than ./run_test ./cat_csv_mlr_subshell 100000
   10.04 ± 0.35 times faster than ./run_test ./cat_csv_csvtk 100000
   14.28 ± 1.04 times faster than ./run_test ./cat_csv_csvtk_subshell 100000
   23.29 ± 1.29 times faster than ./run_test ./cat_csv_qsv_flexible 100000
   53.40 ± 2.43 times faster than ./run_test ./cat_csv_qsv 100000
  130.28 ± 14.81 times faster than ./run_test ./cat_csv_xsv 100000

I ran the original test on an AMD Ryzen 9 6900HX at 3.293GHz, NVMe drive.

@derekmahar
Author

> If you're so inclined, @derekmahar, you can compile qsv with the samply profile and use samply to see where the bottlenecks are.

I'm not certain, but I think I can't use samply because I'm running qsv and my benchmarks only in command line shells in Windows Subsystem for Linux 2 and on remote Linux servers on my home network. I may be mistaken, but I think that a process that runs in WSL 2 can't launch a local browser on its Windows host. If WSL 2 can launch a browser in the host, I don't know how to do it.

jqnatividad added a commit that referenced this issue Dec 26, 2023
".infile-list" files are qsv's flavor of csvtk's "infile-list" support, as per #1293

In our implementation, providing a file with the ".infile-list" extension to a command that supports it (currently, `sqlp` and `to`) makes the command read that file as its list of input files.

Will add ".infile-list" support to the `cat` and `headers` commands as well
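A minimal sketch of how such a list file might be produced and used. The helper name make_infile_list and the file name inputs.infile-list are hypothetical, and the qsv invocation is an assumption based on the commit message above; check qsv cat --help for the syntax your version actually accepts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the CSV files under a directory, sorted, to a
# list file (one name per line) whose name ends in ".infile-list".
make_infile_list() {
  find "$1" -type f -name '*.csv' | sort > "$2"
}

dir=$(mktemp -d)
printf 'id\n1\n' > "$dir/a.csv"
printf 'id\n2\n' > "$dir/b.csv"
make_infile_list "$dir" "$dir/inputs.infile-list"
cat "$dir/inputs.infile-list"

# Assumed invocation; runs only if qsv is installed.
if command -v qsv >/dev/null 2>&1; then
  qsv cat rows "$dir/inputs.infile-list"
fi
rm -r "$dir"
```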
@derekmahar
Author

Thank you for implementing this feature!

@derekmahar
Author

derekmahar commented Dec 26, 2023

By the way, qsv cat rows using an input file list is about 7 times faster than reading CSV file name arguments from the command line (276.9 ms versus 2.057 s for 10,000 files), which reinforces my hypothesis that qsv's command line argument list reader is not very efficient.

Benchmark 1: ./run_test ./cat_csv_awk 10000
  Time (mean ± σ):     171.2 ms ±   3.4 ms    [User: 125.6 ms, System: 56.1 ms]
  Range (min … max):   167.9 ms … 182.0 ms    17 runs
 
Benchmark 2: ./run_test ./cat_csv_csvtk 10000
  Time (mean ± σ):     412.3 ms ±   7.4 ms    [User: 529.8 ms, System: 143.5 ms]
  Range (min … max):   401.9 ms … 429.6 ms    10 runs
 
Benchmark 3: ./run_test ./cat_csv_csvtk_subshell 10000
  Time (mean ± σ):     450.3 ms ±  11.1 ms    [User: 579.1 ms, System: 163.7 ms]
  Range (min … max):   438.0 ms … 475.5 ms    10 runs
 
Benchmark 4: ./run_test ./cat_csv_custom 10000
  Time (mean ± σ):     145.0 ms ±   0.6 ms    [User: 106.2 ms, System: 46.5 ms]
  Range (min … max):   143.8 ms … 145.9 ms    20 runs
 
Benchmark 5: ./run_test ./cat_csv_custom_subshell 10000
  Time (mean ± σ):     148.5 ms ±   1.9 ms    [User: 105.1 ms, System: 52.0 ms]
  Range (min … max):   146.3 ms … 155.5 ms    20 runs
 
Benchmark 6: ./run_test ./cat_csv_goawk 10000
  Time (mean ± σ):     150.3 ms ±   2.0 ms    [User: 117.0 ms, System: 41.7 ms]
  Range (min … max):   148.3 ms … 155.3 ms    19 runs
 
Benchmark 7: ./run_test ./cat_csv_mlr 10000
  Time (mean ± σ):     219.9 ms ±   1.7 ms    [User: 170.5 ms, System: 80.3 ms]
  Range (min … max):   217.9 ms … 222.8 ms    13 runs
 
Benchmark 8: ./run_test ./cat_csv_mlr_subshell 10000
  Time (mean ± σ):     227.5 ms ±   2.4 ms    [User: 169.3 ms, System: 89.3 ms]
  Range (min … max):   224.5 ms … 232.4 ms    13 runs
 
Benchmark 9: ./run_test ./cat_csv_qsv 10000
  Time (mean ± σ):      2.057 s ±  0.013 s    [User: 1.368 s, System: 0.724 s]
  Range (min … max):    2.027 s …  2.078 s    10 runs
 
Benchmark 10: ./run_test ./cat_csv_qsv_infile_list 10000
  Time (mean ± σ):     276.9 ms ±   0.9 ms    [User: 234.4 ms, System: 50.5 ms]
  Range (min … max):   275.8 ms … 278.3 ms    10 runs
 
Benchmark 11: ./run_test ./cat_csv_qsv_flexible 10000
  Time (mean ± σ):      2.072 s ±  0.015 s    [User: 1.394 s, System: 0.714 s]
  Range (min … max):    2.054 s …  2.097 s    10 runs
 
Benchmark 12: ./run_test ./cat_csv_xsv 10000
  Time (mean ± σ):      4.673 s ±  0.021 s    [User: 3.906 s, System: 0.775 s]
  Range (min … max):    4.636 s …  4.700 s    10 runs
 
Summary
  './run_test ./cat_csv_custom 10000' ran
    1.02 ± 0.01 times faster than './run_test ./cat_csv_custom_subshell 10000'
    1.04 ± 0.01 times faster than './run_test ./cat_csv_goawk 10000'
    1.18 ± 0.02 times faster than './run_test ./cat_csv_awk 10000'
    1.52 ± 0.01 times faster than './run_test ./cat_csv_mlr 10000'
    1.57 ± 0.02 times faster than './run_test ./cat_csv_mlr_subshell 10000'
    1.91 ± 0.01 times faster than './run_test ./cat_csv_qsv_infile_list 10000'
    2.84 ± 0.05 times faster than './run_test ./cat_csv_csvtk 10000'
    3.11 ± 0.08 times faster than './run_test ./cat_csv_csvtk_subshell 10000'
   14.19 ± 0.11 times faster than './run_test ./cat_csv_qsv 10000'
   14.30 ± 0.12 times faster than './run_test ./cat_csv_qsv_flexible 10000'
   32.24 ± 0.20 times faster than './run_test ./cat_csv_xsv 10000'

@jqnatividad
Collaborator

Thanks @derekmahar for compiling these benchmarks!

The docopt parser is super-convenient, which is why I chose to stay with it (see #463 for more details), and it's good to have a baseline to keep improving its performance.

It's also good to know that qsv, when given an input file list, is faster than csvtk and just a tad slower than mlr. I may still be able to squeeze some more performance from cat.

Do you mind sharing your benchmark so I can use it for tuning?

@derekmahar
Author

derekmahar commented Dec 27, 2023

benchmark_cat_csv

@derekmahar
Author

derekmahar commented Dec 27, 2023

> The docopt parser is super-convenient, which is why I chose to stay with it (see #463 for more details), and it's good to have a baseline to keep improving its performance.

qsv's poor performance (aside from the input file list argument) in this benchmark may be an example of docopt/docopt.rs#207 to which you referred in #463.

> It's also good to know that qsv, when given an input file list, is faster than csvtk and just a tad slower than mlr. I may still be able to squeeze some more performance from cat.

It's encouraging to know that the sluggishness of qsv cat rows is due almost entirely to its command line parser and not its core CSV parser.

@jqnatividad
Collaborator

Hi @derekmahar , just wanted to give you a heads-up that I tweaked qsv-docopt a bit.

Hopefully, it'll perform better on your command line parsing benchmarks...

@derekmahar
Author

Your tweaks to qsv-docopt improved qsv's performance by about 4% (mean time down from 2.057 s to 1.969 s):

Benchmark 1: ./run_test ./cat_csv_awk 10000
  Time (mean ± σ):     169.5 ms ±   1.3 ms    [User: 124.1 ms, System: 55.8 ms]
  Range (min … max):   167.9 ms … 173.7 ms    17 runs
 
Benchmark 2: ./run_test ./cat_csv_csvtk 10000
  Time (mean ± σ):     425.1 ms ±  16.1 ms    [User: 553.3 ms, System: 152.8 ms]
  Range (min … max):   410.2 ms … 462.1 ms    10 runs
 
Benchmark 3: ./run_test ./cat_csv_csvtk_subshell 10000
  Time (mean ± σ):     437.9 ms ±   7.4 ms    [User: 520.8 ms, System: 193.9 ms]
  Range (min … max):   430.6 ms … 455.0 ms    10 runs
 
Benchmark 4: ./run_test ./cat_csv_custom 10000
  Time (mean ± σ):     145.6 ms ±   1.3 ms    [User: 106.9 ms, System: 46.7 ms]
  Range (min … max):   143.9 ms … 148.3 ms    20 runs
 
Benchmark 5: ./run_test ./cat_csv_custom_subshell 10000
  Time (mean ± σ):     148.6 ms ±   0.8 ms    [User: 106.8 ms, System: 49.8 ms]
  Range (min … max):   147.0 ms … 150.0 ms    20 runs
 
Benchmark 6: ./run_test ./cat_csv_goawk 10000
  Time (mean ± σ):     150.4 ms ±   0.7 ms    [User: 114.8 ms, System: 43.9 ms]
  Range (min … max):   149.1 ms … 151.8 ms    19 runs
 
Benchmark 7: ./run_test ./cat_csv_mlr 10000
  Time (mean ± σ):     222.3 ms ±   3.9 ms    [User: 163.8 ms, System: 90.7 ms]
  Range (min … max):   217.7 ms … 232.2 ms    13 runs
 
Benchmark 8: ./run_test ./cat_csv_mlr_subshell 10000
  Time (mean ± σ):     230.1 ms ±   2.4 ms    [User: 172.9 ms, System: 89.5 ms]
  Range (min … max):   227.3 ms … 235.5 ms    13 runs
 
Benchmark 9: ./run_test ./cat_csv_qsv 10000
  Time (mean ± σ):      1.969 s ±  0.013 s    [User: 1.322 s, System: 0.683 s]
  Range (min … max):    1.945 s …  1.992 s    10 runs
 
Benchmark 10: ./run_test ./cat_csv_qsv_infile_list 10000
  Time (mean ± σ):     274.3 ms ±   2.2 ms    [User: 229.6 ms, System: 53.2 ms]
  Range (min … max):   271.7 ms … 279.5 ms    10 runs
 
Benchmark 11: ./run_test ./cat_csv_qsv_flexible 10000
  Time (mean ± σ):      1.978 s ±  0.011 s    [User: 1.320 s, System: 0.694 s]
  Range (min … max):    1.966 s …  1.998 s    10 runs
 
Benchmark 12: ./run_test ./cat_csv_xsv 10000
  Time (mean ± σ):      4.640 s ±  0.035 s    [User: 3.865 s, System: 0.783 s]
  Range (min … max):    4.585 s …  4.703 s    10 runs
 
Summary
  './run_test ./cat_csv_custom 10000' ran
    1.02 ± 0.01 times faster than './run_test ./cat_csv_custom_subshell 10000'
    1.03 ± 0.01 times faster than './run_test ./cat_csv_goawk 10000'
    1.16 ± 0.01 times faster than './run_test ./cat_csv_awk 10000'
    1.53 ± 0.03 times faster than './run_test ./cat_csv_mlr 10000'
    1.58 ± 0.02 times faster than './run_test ./cat_csv_mlr_subshell 10000'
    1.88 ± 0.02 times faster than './run_test ./cat_csv_qsv_infile_list 10000'
    2.92 ± 0.11 times faster than './run_test ./cat_csv_csvtk 10000'
    3.01 ± 0.06 times faster than './run_test ./cat_csv_csvtk_subshell 10000'
   13.52 ± 0.15 times faster than './run_test ./cat_csv_qsv 10000'
   13.58 ± 0.14 times faster than './run_test ./cat_csv_qsv_flexible 10000'
   31.86 ± 0.38 times faster than './run_test ./cat_csv_xsv 10000'
