Add option to cat that accepts the name of a file that contains a list of input CSV data file names. #1293

derekmahar · 2023-09-08T17:16:04Z

To work around the command line length limit of command shells, please allow cat to accept an option that names a file containing a list of input CSV data file names, one per line. This option would be similar to the --infile-list string option that csvtk concat accepts:

      --infile-list string     file of input files list (one file per line), if given, they are appended to files from cli arguments

(See the output of command csvtk concat --help for the most current syntax of csvtk concat.)
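As a rough illustration of the limit that motivates this request, the sketch below compares the space a long list of file names would need with the system's `ARG_MAX` value. `getconf ARG_MAX` is a standard POSIX query; the `names_bytes` helper and the `data/NNNNNN.csv` naming pattern are made up for this example. The real limit also depends on the size of the environment and on per-argument overhead, so treat the result as an estimate only.

```shell
#!/usr/bin/env bash
# Sketch: estimate whether a list of file names fits on one command line.
# names_bytes is a hypothetical helper, not part of qsv or csvtk.

# Sum the length of every line on stdin, plus one byte per line for the
# NUL terminator that each argument needs. Assumes ASCII file names.
names_bytes() {
  awk '{ total += length($0) + 1 } END { print total + 0 }'
}

# Upper bound, in bytes, on the arguments plus environment passed to exec.
limit=$(getconf ARG_MAX)

# 100,000 invented file names, generated without touching the file system.
needed=$(seq 1 100000 | awk '{ printf "data/%06d.csv\n", $1 }' | names_bytes)

echo "ARG_MAX is $limit bytes; the file names need about $needed bytes"
```

A file that lists the names sidesteps this limit entirely, because the names never pass through the argument vector.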

jqnatividad · 2023-09-08T19:12:00Z

This is a good feature to have. Added to the backlog.

jqnatividad · 2023-11-03T12:49:18Z

Thinking more about this, something like --infile-list should be a qsv-wide feature, so that every command that accepts multiple input files supports it, not just cat.

It should also be implemented for headers & sqlp.

derekmahar · 2023-11-03T19:11:52Z

If not too much effort, and for commands like cat where it would make sense, please consider implementing --infile-list as a streaming or "lazy" list so that a very large list of input file name arguments consumes less memory than it does now. In July 2022, I tested xsv cat rows with a large number of CSV file arguments and found that not only was xsv cat rows much slower than ordinary cat, but it also consumed a lot of memory because the data structure which represents each file argument is relatively large. If I recall correctly, before processing the input files, xsv cat rows first reads all input file name argument strings into a list of this "heavy" argument data structure instead of just reading the original file name argument list directly. My guess is that other commands also use this argument data structure. Couldn't xsv cat rows (and other commands) just read the input file name argument list directly without building this internal argument list?
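The "lazy list" idea can be sketched in shell: read one file name at a time from a list and process it before reading the next, so that only a single name is held in memory. This is not how qsv or xsv work internally; `cat_csv_streaming` is a hypothetical name, and the function treats files as plain lines rather than parsing CSV.

```shell
#!/usr/bin/env bash
# Sketch: concatenate the CSV files named on stdin (one name per line),
# keeping the header of the first file only.
# Only one file name is held in memory at any time.
cat_csv_streaming() {
  local first=1 name
  while IFS= read -r name; do
    [ -n "$name" ] || continue      # skip blank lines in the list
    if [ "$first" -eq 1 ]; then
      cat -- "$name"                # first file: keep its header
      first=0
    else
      tail -n +2 -- "$name"         # later files: drop the header line
    fi
  done
}

# Example: cat_csv_streaming < files.txt > combined.csv
```

The loop starts one process per file, so it trades speed for a small, constant memory footprint.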

jqnatividad · 2023-11-04T17:04:39Z

I'm afraid changing cat rows logic to lazy loading is not that straightforward and will require a rewrite that I'm not sure is worth the payoff.

The large config data structure that it loads for the CSV reading configuration of each input file is really not that big. Per VSCode, it's only 112 bytes per file.

How many files were you benchmarking BTW? Was it thousands of files?

Also, qsv cat rows will always be slower than regular cat as it's parsing the files as CSV, with special handling for ignoring the header. cat just appends the files together.

Anyway, I'll consider streamlining it while implementing --infile-list just the same as I can see just reusing the same config for each input file...
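The difference between plain cat and header-aware concatenation can be shown with a small awk filter. This is a line-based sketch under the assumption that every file starts with a one-line header; it does no CSV parsing or validation, so it is much less work than what qsv cat rows does. `cat_rows_sketch` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: line-based, so it does not validate CSV or handle quoted
# fields that contain newlines.
cat_rows_sketch() {
  # NR counts lines across all files; FNR restarts at 1 for each file.
  # Print the very first line (the header) and every non-header line.
  awk 'FNR > 1 || NR == 1' "$@"
}

tmp=$(mktemp -d)
printf 'id,name\n1,a\n' > "$tmp/one.csv"
printf 'id,name\n2,b\n' > "$tmp/two.csv"

cat "$tmp/one.csv" "$tmp/two.csv"              # header appears twice
cat_rows_sketch "$tmp/one.csv" "$tmp/two.csv"  # header appears once
rm -r "$tmp"
```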

derekmahar · 2023-11-08T21:44:43Z

> The large config data structure that it loads for the CSV reading configuration of each input file is really not that big. Per VSCode, it's only 112 bytes per file.

If not file name argument processing, might the slower CSV file concatenation be due to slower file input-output handling? Excessive CSV data validation?

> How many files were you benchmarking BTW? Was it thousands of files?

I concatenated thousands of files. Project benchmark_cat_csv compares the CSV file concatenation performance of my naive custom CSV concatenation shell script cat_csv_custom, csvtk, mlr, qsv, and xsv. In the test run below, which concatenates 100,000 CSV files that each contain a single row of one column, qsv completed the test more than 2.5 times faster than xsv, but csvtk was almost seven times faster and mlr more than 17 times faster than qsv.

$ ./run_tests 100000
Benchmark 1: ./run_test ./cat_csv_csvtk 100000
  Time (mean ± σ):      2.950 s ±  0.034 s    [User: 4.620 s, System: 0.941 s]
  Range (min … max):    2.897 s …  3.024 s    10 runs

Benchmark 2: ./run_test ./cat_csv_csvtk_2 100000
  Time (mean ± σ):      3.400 s ±  0.032 s    [User: 4.645 s, System: 1.588 s]
  Range (min … max):    3.341 s …  3.452 s    10 runs

Benchmark 3: ./run_test ./cat_csv_custom 100000
  Time (mean ± σ):     431.4 ms ±   1.6 ms    [User: 159.8 ms, System: 308.1 ms]
  Range (min … max):   428.8 ms … 434.1 ms    10 runs

Benchmark 4: ./run_test ./cat_csv_custom_2 100000
  Time (mean ± σ):     468.5 ms ±   3.4 ms    [User: 166.1 ms, System: 340.0 ms]
  Range (min … max):   464.0 ms … 473.2 ms    10 runs

Benchmark 5: ./run_test ./cat_csv_mlr 100000
  Time (mean ± σ):      1.141 s ±  0.007 s    [User: 0.774 s, System: 0.645 s]
  Range (min … max):    1.130 s …  1.153 s    10 runs

Benchmark 6: ./run_test ./cat_csv_mlr_2 100000
  Time (mean ± σ):      1.219 s ±  0.008 s    [User: 0.751 s, System: 0.740 s]
  Range (min … max):    1.211 s …  1.237 s    10 runs

Benchmark 7: ./run_test ./cat_csv_qsv 100000
  Time (mean ± σ):     20.105 s ±  0.028 s    [User: 13.985 s, System: 6.358 s]
  Range (min … max):   20.067 s … 20.149 s    10 runs

Benchmark 8: ./run_test ./cat_csv_xsv 100000
  Time (mean ± σ):     52.460 s ±  0.129 s    [User: 45.097 s, System: 7.395 s]
  Range (min … max):   52.295 s … 52.737 s    10 runs

Summary
  './run_test ./cat_csv_custom 100000' ran
    1.09 ± 0.01 times faster than './run_test ./cat_csv_custom_2 100000'
    2.64 ± 0.02 times faster than './run_test ./cat_csv_mlr 100000'
    2.83 ± 0.02 times faster than './run_test ./cat_csv_mlr_2 100000'
    6.84 ± 0.08 times faster than './run_test ./cat_csv_csvtk 100000'
    7.88 ± 0.08 times faster than './run_test ./cat_csv_csvtk_2 100000'
   46.60 ± 0.19 times faster than './run_test ./cat_csv_qsv 100000'
  121.59 ± 0.55 times faster than './run_test ./cat_csv_xsv 100000'
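The pairwise ratios between tools can be recomputed from the mean times that hyperfine reports above. The `speedup` helper is a hypothetical name introduced for this sketch.

```shell
#!/usr/bin/env bash
# speedup SLOW FAST prints how many times faster FAST is than SLOW,
# given two mean run times in the same unit.
speedup() {
  # LC_ALL=C keeps the decimal point a "." regardless of locale.
  LC_ALL=C awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

speedup 52.460 20.105   # xsv vs qsv:   prints 2.61
speedup 20.105 2.950    # qsv vs csvtk: prints 6.82
speedup 20.105 1.141    # qsv vs mlr:   prints 17.62
```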

Note that all of the shell scripts with the "_2" suffix invoke a sub-shell in order to handicap them in a way similar to shell scripts cat_csv_qsv and cat_csv_xsv. This is to eliminate the possibility that copying shell script arguments slowed down the qsv and xsv scripts. Script cat_csv_xsv cheats a little by invoking tail to delete CSV headers because xsv doesn't have a command similar to qsv behead or csvtk del-header.

> Also, qsv cat rows will always be slower than regular cat as it's parsing the files as CSV, with special handling for ignoring the header. cat just appends the files together.

I was oversimplifying the implementation of my custom shell script. It actually uses Bash read, GNU cat, xargs, and tail. The custom script is at the end of the following shell pipeline:

find data -type f -name '*.csv' |
  sort |
  head -n 5 |
  (read first; cat $first; xargs -r tail --lines=+2 --quiet)
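The same approach can be written as a function with quoting added so that file names containing spaces survive. This is a sketch, not the script from benchmark_cat_csv: `cat_csv_custom_sketch` is a hypothetical name, and `xargs -d` and `tail --quiet` are GNU extensions.

```shell
#!/usr/bin/env bash
# Sketch: file names arrive on stdin, one per line.
cat_csv_custom_sketch() {
  local first
  IFS= read -r first || return 0   # empty list: nothing to print
  cat -- "$first"                  # first file, header included
  # The remaining names go to tail in large batches. GNU tail prints each
  # file from line 2 onward, and --quiet suppresses the "==> file <==" banners.
  # -d '\n' makes xargs split on newlines only, so names may contain spaces.
  xargs -r -d '\n' tail --quiet --lines=+2 --
}

# Example: find data -type f -name '*.csv' | sort | cat_csv_custom_sketch
```

Because xargs passes many names to each tail invocation, far fewer processes are started than with one tail per file.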

derekmahar · 2023-11-08T23:02:26Z

By the way, here are the versions of the various tools that I used in the performance tests.

$ ./versions
csvtk v0.28.1
mlr 6.9.0
qsv 0.118.0-mimalloc-apply;fetch;foreach;generate;geocode;Luau 0.601;python-3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0];to;polars-0.34.2;self_update-8-8;12.49 GiB-0 B-14.90 GiB-15.61 GiB (x86_64-unknown-linux-gnu compiled with Rust 1.73.0) compiled
xsv 0.13.0
GNU tools
GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
cat (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjorn Granlund and Richard M. Stallman.
tail (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, Ian Lance Taylor,
and Jim Meyering.
xargs (GNU findutils) 4.8.0
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Eric B. Decker, James Youngman, and Kevin Dalley.

jqnatividad · 2023-11-09T10:57:41Z

Thanks for pulling together the benchmark_cat_csv project and including qsv in it.

I just tweaked cat_rows a bit to squeeze out more performance by moving the header handling outside the main hot loop, amortizing the rdr allocation, inlining some functions, and doubling the default write buffer size to 512k.

I'll still work on the --infile-list argument, but would be interested in how qsv performs now with these changes.

jqnatividad · 2023-11-09T12:41:52Z

Just added --flexible option which makes it even faster still by turning off col-count validation.

derekmahar · 2023-11-09T13:22:59Z

Unfortunately, the qsv cat rows --flexible option and the changes to qsv cat rows did not improve its CSV file concatenation performance. (Actually, the mean test completion time was just over 1 s higher than in the original test.)

$ hyperfine --warmup 3 './run_test ./cat_csv_qsv 100000' './run_test ./cat_csv_qsv_flexible 100000'
Benchmark 1: ./run_test ./cat_csv_qsv 100000
  Time (mean ± σ):     21.383 s ±  0.061 s    [User: 14.396 s, System: 7.228 s]
  Range (min … max):   21.294 s … 21.491 s    10 runs

Benchmark 2: ./run_test ./cat_csv_qsv_flexible 100000
  Time (mean ± σ):     21.606 s ±  0.057 s    [User: 14.537 s, System: 7.305 s]
  Range (min … max):   21.511 s … 21.669 s    10 runs

Summary
  './run_test ./cat_csv_qsv 100000' ran
    1.01 ± 0.00 times faster than './run_test ./cat_csv_qsv_flexible 100000'

$ cat cat_csv_qsv_flexible
#!/usr/bin/env bash
read first
qsv cat rows --flexible $first
xargs --no-run-if-empty sh -c 'qsv cat rows --flexible "$@" | qsv behead' cat_csv_skip_header
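For readers unfamiliar with the `xargs ... sh -c '... "$@"' name` pattern used in the script above: the word after the inline script becomes `$0`, and xargs may run the inline script several times, each time with a batch of arguments in `"$@"`. The sketch below makes the batching visible by forcing small batches with `-n 4`; `show_batches` and the `batch` placeholder are names invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: show how xargs splits its input into batches for sh -c.
show_batches() {
  # -n 4 forces small batches so the effect is visible; without it, xargs
  # packs as many arguments into each run as the system limit allows.
  xargs -n 4 sh -c 'echo "$0: $# args: $*"' batch
}

seq 1 10 | show_batches
# batch: 4 args: 1 2 3 4
# batch: 4 args: 5 6 7 8
# batch: 2 args: 9 10
```

Each batch is a separate process, which is why the script above strips the header from the output of every batch rather than only once.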

jqnatividad · 2023-11-09T14:07:06Z

Interesting. There should be a noticeable performance bump...

Can you try running it with qsvlite, making sure to build it with CPU optimizations in release mode?

export CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'
cargo build --release --locked -F lite

derekmahar · 2023-11-09T14:39:12Z

qsvlite made no difference.

$ hyperfine --warmup 3 './run_test ./cat_csv_qsvlite 100000' './run_test ./cat_csv_qsvlite_flexible 100000'
Benchmark 1: ./run_test ./cat_csv_qsvlite 100000
  Time (mean ± σ):     20.638 s ±  0.029 s    [User: 13.804 s, System: 6.907 s]
  Range (min … max):   20.602 s … 20.678 s    10 runs

Benchmark 2: ./run_test ./cat_csv_qsvlite_flexible 100000
  Time (mean ± σ):     20.765 s ±  0.044 s    [User: 13.857 s, System: 6.980 s]
  Range (min … max):   20.683 s … 20.822 s    10 runs

Summary
  './run_test ./cat_csv_qsvlite 100000' ran
    1.01 ± 0.00 times faster than './run_test ./cat_csv_qsvlite_flexible 100000'

$ cat cat_csv_qsvlite
#!/usr/bin/env bash
read first
qsvlite cat rows $first
xargs --no-run-if-empty sh -c 'qsvlite cat rows "$@" | qsvlite behead' cat_csv_skip_header
$ cat cat_csv_qsvlite_flexible
#!/usr/bin/env bash
read first
qsvlite cat rows --flexible $first
xargs --no-run-if-empty sh -c 'qsvlite cat rows --flexible "$@" | qsvlite behead' cat_csv_skip_header
$ qsvlite --version
qsvlite 0.118.0-mimalloc--8-8;12.49 GiB-0 B-14.88 GiB-15.61 GiB (x86_64-unknown-linux-gnu compiled with Rust 1.73.0) compiled
$ echo $CARGO_BUILD_RUSTFLAGS
-C target-cpu=native

jqnatividad · 2023-11-09T16:06:51Z

I specified qsvlite since it's faster to build. I just wanted to make sure CPU optimizations were enabled and it was built in release mode...

If you're so inclined @derekmahar , you can actually compile qsv with a samply profile and actually see where the bottlenecks are using samply.

export CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'
cargo build --profile release-samply --locked -F lite

derekmahar · 2023-11-09T16:27:57Z

How much difference should CARGO_BUILD_RUSTFLAGS='-C target-cpu=native' make?

jqnatividad · 2023-11-09T19:27:38Z

Depends on the platform.

But you can find out which additional CPU features are enabled here:

https://github.com/jqnatividad/qsv/blob/master/docs/PERFORMANCE.md#cpu-optimization

derekmahar · 2023-11-09T20:51:39Z

> Just added --flexible option which makes it even faster still by turning off col-count validation.

--flexible did make a significant difference running on my HP Spectre x360 (Intel i7-8565U at 1.992GHz, SSD) in Ubuntu for Windows Subsystem for Linux 2:

$ ./run_tests 100000
Benchmark 1: ./run_test ./cat_csv_awk 100000
  Time (mean ± σ):      2.518 s ±  0.971 s    [User: 1.541 s, System: 1.133 s]
  Range (min … max):    1.414 s …  3.803 s    10 runs

Benchmark 2: ./run_test ./cat_csv_csvtk 100000
  Time (mean ± σ):      8.385 s ±  0.197 s    [User: 11.015 s, System: 3.808 s]
  Range (min … max):    8.123 s …  8.724 s    10 runs

Benchmark 3: ./run_test ./cat_csv_csvtk_subshell 100000
  Time (mean ± σ):     11.923 s ±  0.814 s    [User: 13.867 s, System: 6.801 s]
  Range (min … max):   11.022 s … 13.229 s    10 runs

Benchmark 4: ./run_test ./cat_csv_custom 100000
  Time (mean ± σ):     930.7 ms ± 108.4 ms    [User: 368.0 ms, System: 616.7 ms]
  Range (min … max):   788.9 ms … 1061.9 ms    10 runs

Benchmark 5: ./run_test ./cat_csv_custom_subshell 100000
  Time (mean ± σ):     835.1 ms ±  21.6 ms    [User: 440.1 ms, System: 438.5 ms]
  Range (min … max):   804.4 ms … 871.4 ms    10 runs

Benchmark 6: ./run_test ./cat_csv_goawk 100000
  Time (mean ± σ):     878.3 ms ±  46.1 ms    [User: 440.3 ms, System: 482.4 ms]
  Range (min … max):   817.6 ms … 966.1 ms    10 runs

Benchmark 7: ./run_test ./cat_csv_mlr 100000
  Time (mean ± σ):      3.128 s ±  0.145 s    [User: 2.250 s, System: 2.229 s]
  Range (min … max):    2.915 s …  3.409 s    10 runs

Benchmark 8: ./run_test ./cat_csv_mlr_subshell 100000
  Time (mean ± σ):      3.431 s ±  0.376 s    [User: 2.410 s, System: 2.409 s]
  Range (min … max):    3.101 s …  4.417 s    10 runs

Benchmark 9: ./run_test ./cat_csv_qsv 100000
  Time (mean ± σ):     44.597 s ±  1.669 s    [User: 34.652 s, System: 10.306 s]
  Range (min … max):   42.515 s … 46.957 s    10 runs

Benchmark 10: ./run_test ./cat_csv_qsv_flexible 100000
  Time (mean ± σ):     19.450 s ±  0.949 s    [User: 15.616 s, System: 4.219 s]
  Range (min … max):   18.671 s … 22.009 s    10 runs

Benchmark 11: ./run_test ./cat_csv_xsv 100000
  Time (mean ± σ):     108.793 s ± 12.043 s    [User: 90.884 s, System: 17.915 s]
  Range (min … max):   97.630 s … 126.524 s    10 runs

Summary
  ./run_test ./cat_csv_custom_subshell 100000 ran
    1.05 ± 0.06 times faster than ./run_test ./cat_csv_goawk 100000
    1.11 ± 0.13 times faster than ./run_test ./cat_csv_custom 100000
    3.01 ± 1.17 times faster than ./run_test ./cat_csv_awk 100000
    3.75 ± 0.20 times faster than ./run_test ./cat_csv_mlr 100000
    4.11 ± 0.46 times faster than ./run_test ./cat_csv_mlr_subshell 100000
   10.04 ± 0.35 times faster than ./run_test ./cat_csv_csvtk 100000
   14.28 ± 1.04 times faster than ./run_test ./cat_csv_csvtk_subshell 100000
   23.29 ± 1.29 times faster than ./run_test ./cat_csv_qsv_flexible 100000
   53.40 ± 2.43 times faster than ./run_test ./cat_csv_qsv 100000
  130.28 ± 14.81 times faster than ./run_test ./cat_csv_xsv 100000

I ran the original test on an AMD Ryzen 9 6900HX at 3.293GHz, NVMe drive.

derekmahar · 2023-11-12T21:21:27Z

> If you're so inclined @derekmahar , you can actually compile qsv with a samply profile and actually see where the bottlenecks are using samply.

I'm not certain, but I think I can't use samply because I'm running qsv and my benchmarks only in command line shells in Windows Subsystem for Linux 2 and on remote Linux servers on my home network. I may be mistaken, but I think that a process that runs in WSL 2 can't launch a local browser on its Windows host. If WSL 2 can launch a browser in the host, I don't know how to do it.

".infile-list" files are qsv's flavor of csvtk's "--infile-list" support, as per #1293. In our implementation, passing a file with the ".infile-list" extension to a command that supports it (currently `sqlp` and `to`) makes the command read that file as its list of input files. ".infile-list" support will be added to the `cat` and `headers` commands as well.
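A list file of this kind can be produced with standard tools. The sketch below writes one file name per line into a file whose name ends in ".infile-list". `make_infile_list` and the file names are hypothetical, and the qsv invocation in the comment is an assumption based on the description above, not a verified command line.

```shell
#!/usr/bin/env bash
# Sketch: write a sorted list of the CSV files under a directory to a
# list file, one file name per line.
make_infile_list() {
  local dir=$1 out=$2
  find "$dir" -type f -name '*.csv' | sort > "$out"
}

# Hypothetical usage (requires qsv; the exact command form is an assumption):
#   make_infile_list data inputs.infile-list
#   qsv cat rows inputs.infile-list > combined.csv
```

Since the names travel in a file rather than on the command line, the list can grow well past the shell's argument length limit.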

derekmahar · 2023-12-26T16:47:43Z

Thank you for implementing this feature!

derekmahar · 2023-12-26T20:37:26Z

By the way, qsv cat rows using an input file list is about 7.4 times faster than reading CSV file name arguments from the command line (276.9 ms versus 2.057 s in the run below), which reinforces my hypothesis that qsv's command line argument list reader is not very efficient.

Benchmark 1: ./run_test ./cat_csv_awk 10000
  Time (mean ± σ):     171.2 ms ±   3.4 ms    [User: 125.6 ms, System: 56.1 ms]
  Range (min … max):   167.9 ms … 182.0 ms    17 runs
 
Benchmark 2: ./run_test ./cat_csv_csvtk 10000
  Time (mean ± σ):     412.3 ms ±   7.4 ms    [User: 529.8 ms, System: 143.5 ms]
  Range (min … max):   401.9 ms … 429.6 ms    10 runs
 
Benchmark 3: ./run_test ./cat_csv_csvtk_subshell 10000
  Time (mean ± σ):     450.3 ms ±  11.1 ms    [User: 579.1 ms, System: 163.7 ms]
  Range (min … max):   438.0 ms … 475.5 ms    10 runs
 
Benchmark 4: ./run_test ./cat_csv_custom 10000
  Time (mean ± σ):     145.0 ms ±   0.6 ms    [User: 106.2 ms, System: 46.5 ms]
  Range (min … max):   143.8 ms … 145.9 ms    20 runs
 
Benchmark 5: ./run_test ./cat_csv_custom_subshell 10000
  Time (mean ± σ):     148.5 ms ±   1.9 ms    [User: 105.1 ms, System: 52.0 ms]
  Range (min … max):   146.3 ms … 155.5 ms    20 runs
 
Benchmark 6: ./run_test ./cat_csv_goawk 10000
  Time (mean ± σ):     150.3 ms ±   2.0 ms    [User: 117.0 ms, System: 41.7 ms]
  Range (min … max):   148.3 ms … 155.3 ms    19 runs
 
Benchmark 7: ./run_test ./cat_csv_mlr 10000
  Time (mean ± σ):     219.9 ms ±   1.7 ms    [User: 170.5 ms, System: 80.3 ms]
  Range (min … max):   217.9 ms … 222.8 ms    13 runs
 
Benchmark 8: ./run_test ./cat_csv_mlr_subshell 10000
  Time (mean ± σ):     227.5 ms ±   2.4 ms    [User: 169.3 ms, System: 89.3 ms]
  Range (min … max):   224.5 ms … 232.4 ms    13 runs
 
Benchmark 9: ./run_test ./cat_csv_qsv 10000
  Time (mean ± σ):      2.057 s ±  0.013 s    [User: 1.368 s, System: 0.724 s]
  Range (min … max):    2.027 s …  2.078 s    10 runs
 
Benchmark 10: ./run_test ./cat_csv_qsv_infile_list 10000
  Time (mean ± σ):     276.9 ms ±   0.9 ms    [User: 234.4 ms, System: 50.5 ms]
  Range (min … max):   275.8 ms … 278.3 ms    10 runs
 
Benchmark 11: ./run_test ./cat_csv_qsv_flexible 10000
  Time (mean ± σ):      2.072 s ±  0.015 s    [User: 1.394 s, System: 0.714 s]
  Range (min … max):    2.054 s …  2.097 s    10 runs
 
Benchmark 12: ./run_test ./cat_csv_xsv 10000
  Time (mean ± σ):      4.673 s ±  0.021 s    [User: 3.906 s, System: 0.775 s]
  Range (min … max):    4.636 s …  4.700 s    10 runs
 
Summary
  './run_test ./cat_csv_custom 10000' ran
    1.02 ± 0.01 times faster than './run_test ./cat_csv_custom_subshell 10000'
    1.04 ± 0.01 times faster than './run_test ./cat_csv_goawk 10000'
    1.18 ± 0.02 times faster than './run_test ./cat_csv_awk 10000'
    1.52 ± 0.01 times faster than './run_test ./cat_csv_mlr 10000'
    1.57 ± 0.02 times faster than './run_test ./cat_csv_mlr_subshell 10000'
    1.91 ± 0.01 times faster than './run_test ./cat_csv_qsv_infile_list 10000'
    2.84 ± 0.05 times faster than './run_test ./cat_csv_csvtk 10000'
    3.11 ± 0.08 times faster than './run_test ./cat_csv_csvtk_subshell 10000'
   14.19 ± 0.11 times faster than './run_test ./cat_csv_qsv 10000'
   14.30 ± 0.12 times faster than './run_test ./cat_csv_qsv_flexible 10000'
   32.24 ± 0.20 times faster than './run_test ./cat_csv_xsv 10000'

jqnatividad · 2023-12-27T05:56:59Z

Thanks @derekmahar for compiling these benchmarks!

The docopt parser is super-convenient, which is why I chose to stay with it (see #463 for more details), and it's good to have a baseline to keep improving its performance.

It's also good to know that qsv is faster than csvtk and just a tad slower than mlr. I may still be able to squeeze some more performance from cat.

Do you mind sharing your benchmark so I can use it for tuning?

derekmahar · 2023-12-27T08:36:15Z

benchmark_cat_csv

derekmahar · 2023-12-27T08:55:10Z

> The docopt parser is super-convenient, which is why I chose to stay with it (see #463 for more details), and it's good to have a baseline to keep improving its performance.

qsv's poor performance (aside from the input file list argument) in this benchmark may be an example of docopt/docopt.rs#207 to which you referred in #463.

> It's also good to know that qsv is faster than csvtk and just a tad slower than mlr. I may still be able to squeeze some more performance from cat.

It's encouraging to know that the sluggishness of qsv cat rows is due almost entirely to its command line parser and not its core CSV parser.

jqnatividad · 2023-12-28T20:53:13Z

Hi @derekmahar , just wanted to give you a heads-up that I tweaked qsv-docopt a bit.

Hopefully, it'll perform better on your command line parsing benchmarks...

derekmahar · 2023-12-29T13:23:14Z

Your tweaks to qsv-docopt improved qsv's performance by about 4% (the mean time for cat_csv_qsv fell from 2.057 s to 1.969 s):

Benchmark 1: ./run_test ./cat_csv_awk 10000
  Time (mean ± σ):     169.5 ms ±   1.3 ms    [User: 124.1 ms, System: 55.8 ms]
  Range (min … max):   167.9 ms … 173.7 ms    17 runs
 
Benchmark 2: ./run_test ./cat_csv_csvtk 10000
  Time (mean ± σ):     425.1 ms ±  16.1 ms    [User: 553.3 ms, System: 152.8 ms]
  Range (min … max):   410.2 ms … 462.1 ms    10 runs
 
Benchmark 3: ./run_test ./cat_csv_csvtk_subshell 10000
  Time (mean ± σ):     437.9 ms ±   7.4 ms    [User: 520.8 ms, System: 193.9 ms]
  Range (min … max):   430.6 ms … 455.0 ms    10 runs
 
Benchmark 4: ./run_test ./cat_csv_custom 10000
  Time (mean ± σ):     145.6 ms ±   1.3 ms    [User: 106.9 ms, System: 46.7 ms]
  Range (min … max):   143.9 ms … 148.3 ms    20 runs
 
Benchmark 5: ./run_test ./cat_csv_custom_subshell 10000
  Time (mean ± σ):     148.6 ms ±   0.8 ms    [User: 106.8 ms, System: 49.8 ms]
  Range (min … max):   147.0 ms … 150.0 ms    20 runs
 
Benchmark 6: ./run_test ./cat_csv_goawk 10000
  Time (mean ± σ):     150.4 ms ±   0.7 ms    [User: 114.8 ms, System: 43.9 ms]
  Range (min … max):   149.1 ms … 151.8 ms    19 runs
 
Benchmark 7: ./run_test ./cat_csv_mlr 10000
  Time (mean ± σ):     222.3 ms ±   3.9 ms    [User: 163.8 ms, System: 90.7 ms]
  Range (min … max):   217.7 ms … 232.2 ms    13 runs
 
Benchmark 8: ./run_test ./cat_csv_mlr_subshell 10000
  Time (mean ± σ):     230.1 ms ±   2.4 ms    [User: 172.9 ms, System: 89.5 ms]
  Range (min … max):   227.3 ms … 235.5 ms    13 runs
 
Benchmark 9: ./run_test ./cat_csv_qsv 10000
  Time (mean ± σ):      1.969 s ±  0.013 s    [User: 1.322 s, System: 0.683 s]
  Range (min … max):    1.945 s …  1.992 s    10 runs
 
Benchmark 10: ./run_test ./cat_csv_qsv_infile_list 10000
  Time (mean ± σ):     274.3 ms ±   2.2 ms    [User: 229.6 ms, System: 53.2 ms]
  Range (min … max):   271.7 ms … 279.5 ms    10 runs
 
Benchmark 11: ./run_test ./cat_csv_qsv_flexible 10000
  Time (mean ± σ):      1.978 s ±  0.011 s    [User: 1.320 s, System: 0.694 s]
  Range (min … max):    1.966 s …  1.998 s    10 runs
 
Benchmark 12: ./run_test ./cat_csv_xsv 10000
  Time (mean ± σ):      4.640 s ±  0.035 s    [User: 3.865 s, System: 0.783 s]
  Range (min … max):    4.585 s …  4.703 s    10 runs
 
Summary
  './run_test ./cat_csv_custom 10000' ran
    1.02 ± 0.01 times faster than './run_test ./cat_csv_custom_subshell 10000'
    1.03 ± 0.01 times faster than './run_test ./cat_csv_goawk 10000'
    1.16 ± 0.01 times faster than './run_test ./cat_csv_awk 10000'
    1.53 ± 0.03 times faster than './run_test ./cat_csv_mlr 10000'
    1.58 ± 0.02 times faster than './run_test ./cat_csv_mlr_subshell 10000'
    1.88 ± 0.02 times faster than './run_test ./cat_csv_qsv_infile_list 10000'
    2.92 ± 0.11 times faster than './run_test ./cat_csv_csvtk 10000'
    3.01 ± 0.06 times faster than './run_test ./cat_csv_csvtk_subshell 10000'
   13.52 ± 0.15 times faster than './run_test ./cat_csv_qsv 10000'
   13.58 ± 0.14 times faster than './run_test ./cat_csv_qsv_flexible 10000'
   31.86 ± 0.38 times faster than './run_test ./cat_csv_xsv 10000'

jqnatividad added the enhancement label Sep 8, 2023

derekmahar mentioned this issue Nov 7, 2023

Add concat option "--del-header". shenwei356/csvtk#258 (Closed)

jqnatividad mentioned this issue Nov 9, 2023

cat: faster cat rows #1407 (Merged)

jqnatividad mentioned this issue Nov 9, 2023

cat: make cat rows faster still by adding --flexible option #1408 (Merged)

jqnatividad added a commit that referenced this issue Nov 9, 2023

cat: also read CSV data flexibly (do not validate col count) when `--flexible` option is enabled in connection with #1293 (2d3a969)

jqnatividad mentioned this issue Dec 26, 2023

refactor commands that accept multiple input files to use improved process_input helper #1496 (Merged)

jqnatividad closed this as completed in #1496 Dec 26, 2023
