A small utility to display some metadata about the fields in a CSV (or otherwise delimited) data file. This came about because I got tired of trying to work out the field lengths in a very large source data file, and I also wanted to learn some Rust.
This is just here while I tinker with some "applied" learning of Rust, so don't trust it to do what it says on the tin.
You should probably be using XSV (https://github.com/BurntSushi/xsv) instead. I am borrowing ideas from it and implementing them here because, again, this is a "hands-on" place for me to do some learnin'.
1. git clone git@github.com:sullivant/csvinfo.git
2. cd csvinfo
3. cargo build --release
4. cp ./target/release/csvinfo ~/bin (or somewhere else on your PATH)
```
$ ./csvinfo --help
CSV Utils 0.4.3
Thomas Sullivan <sullivan.t@gmail.com>
Shows some info on CSV files.

USAGE:
    csvinfo [FLAGS] [OPTIONS] <file>

FLAGS:
    -h, --help       Prints help information
    -q, --quotes     When passed, data is quoted.
    -s, --skip       When used, skips the first record (header)
    -V, --version    Prints version information

OPTIONS:
    -d, --delim <delim>        Sets the field delimiter to use (example: -d '|'), default is ','
    -m, --max <max_records>    When provided, will stop gathering data after N records

ARGS:
    <file>    Sets the input file to use
```
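The options in the help output suggest a simple record pipeline. Below is a std-only sketch of one way --delim, --quotes, --skip and --max could shape the records. The `Options` struct and `records` function are hypothetical names, not csvinfo's own code (which parses these flags with clap). It assumes --quotes means "strip one pair of surrounding double quotes from each field" and that --max counts records after the header has been skipped.

```rust
/// Hypothetical mirror of the command-line options shown in --help.
struct Options {
    delim: char,                // -d / --delim, default ','
    quoted: bool,               // -q / --quotes (assumed: strip surrounding quotes)
    skip_header: bool,          // -s / --skip
    max_records: Option<usize>, // -m / --max (assumed: counted after the header skip)
}

/// Splits each line on the delimiter, optionally stripping one pair of
/// surrounding double quotes from every field. Naive: a delimiter inside
/// a quoted value is still treated as a field separator.
fn records<'a>(input: &'a str, opts: &'a Options) -> impl Iterator<Item = Vec<String>> + 'a {
    input
        .lines()
        .skip(if opts.skip_header { 1 } else { 0 })
        .take(opts.max_records.unwrap_or(usize::MAX))
        .map(move |line| {
            line.split(opts.delim)
                .map(|field| {
                    if opts.quoted {
                        // Only strip when the field both starts and ends with a quote.
                        field
                            .strip_prefix('"')
                            .and_then(|f| f.strip_suffix('"'))
                            .unwrap_or(field)
                            .to_string()
                    } else {
                        field.to_string()
                    }
                })
                .collect::<Vec<String>>()
        })
}

fn main() {
    let data = "Type|Value\n\"abc\"|1\n\"de\"|2.5\n";
    let opts = Options { delim: '|', quoted: true, skip_header: true, max_records: Some(1) };
    for rec in records(data, &opts) {
        println!("{:?}", rec); // prints ["abc", "1"]
    }
}
```

Splitting with `str::split` keeps the sketch short, but it means a delimiter embedded in a quoted value will still break the field in two.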
```
$ ./csvinfo ../tmp/file_of_data.csv -d'|'
175646 records in file (| delim).
Field  Max Len  (  %int  %float   %char )  Empty?  Title
1      5        (  0.00    0.00  100.00 )          Type
2      6        ( 25.00   25.00   50.00 )          Value
```
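The per-field columns in the sample output (max length, plus the share of values that look like integers, floats or plain text) could be accumulated roughly like this. It is a sketch with hypothetical names (`FieldStats`, `field_stats`), not csvinfo's implementation. It assumes each non-empty value lands in exactly one bucket (integer first, then float, otherwise char) and that empty values count toward the percentage denominator; csvinfo may compute its numbers differently.

```rust
/// Hypothetical per-column accumulator.
#[derive(Default, Debug)]
struct FieldStats {
    max_len: usize,
    ints: usize,
    floats: usize,
    chars: usize,
    empties: usize,
}

impl FieldStats {
    /// Records one value. Each non-empty value lands in exactly one bucket:
    /// integer first, then float, otherwise "char".
    fn observe(&mut self, value: &str) {
        // Length in characters, not bytes, so a multi-byte UTF-8 char counts once.
        self.max_len = self.max_len.max(value.chars().count());
        if value.is_empty() {
            self.empties += 1;
        } else if value.parse::<i64>().is_ok() {
            self.ints += 1;
        } else if value.parse::<f64>().is_ok() {
            self.floats += 1;
        } else {
            self.chars += 1;
        }
    }

    /// Percentage of all observed values (empties included) in a bucket.
    fn pct(&self, count: usize) -> f64 {
        let total = self.ints + self.floats + self.chars + self.empties;
        if total == 0 { 0.0 } else { 100.0 * count as f64 / total as f64 }
    }
}

/// Builds one `FieldStats` per column from already-split records.
/// Ragged rows are tolerated: new columns are added as they appear.
fn field_stats(records: &[Vec<&str>]) -> Vec<FieldStats> {
    let mut stats: Vec<FieldStats> = Vec::new();
    for record in records {
        for (i, value) in record.iter().enumerate() {
            if stats.len() <= i {
                stats.resize_with(i + 1, FieldStats::default);
            }
            stats[i].observe(value);
        }
    }
    stats
}

fn main() {
    // Four made-up values for a single column.
    let records = vec![vec!["1"], vec!["2.5"], vec!["abc"], vec!["xyz"]];
    let stats = field_stats(&records);
    let s = &stats[0];
    // prints: 1 3 ( 25.00 25.00 50.00 ) empty=false
    println!(
        "{} {} ( {:.2} {:.2} {:.2} ) empty={}",
        1, s.max_len, s.pct(s.ints), s.pct(s.floats), s.pct(s.chars), s.empties > 0
    );
}
```

One catch with this approach: Rust's `f64` parser also accepts strings such as `inf` and `NaN`, so a text column containing those words would be counted as floats here.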
- Use the crate "clap" as a way to pass CLI parameters
- Allow any single character to be passed as the delimiter
- Allow for quoted values
- Allow for field names to be gathered from header data instead of "field 1, field 2..."
- Test cases
- Prettier looking CLI output
- Which are always numeric?
- Which have empty vals?
- Trim extra spaces (e.g. "Name", "Age","Location")
- Files without headers: autogen names instead of using first vals
- Decide if we want to allow for mixed quoted values (some quoted, some not)
- Process escaped delimiters
- Status bar on CLI while waiting/processing
- Add more "metadata" to the output; instead of listing every field, maybe bucket them by size? (Wide files look odd in the results.)
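Several of the items above (quoted values, mixed quoting, escaped delimiters, trimming extra spaces) come down to splitting a line while tracking whether you are inside quotes. Here is a small sketch of that idea. `split_quoted` and `finish` are hypothetical helpers, and the code assumes RFC 4180-style quoting: a field may be wrapped in double quotes, a doubled quote ("") inside it is a literal quote, and delimiters inside quotes do not split the field. Quoted fields that span multiple lines are not handled.

```rust
/// Splits one line into fields, honoring double-quoted values.
/// Unquoted fields are trimmed; quoted fields keep their inner spaces.
fn split_quoted(line: &str, delim: char) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false; // currently inside "..."
    let mut was_quoted = false; // the current field opened with a quote
    let mut chars = line.chars().peekable();

    while let Some(c) = chars.next() {
        if in_quotes {
            if c == '"' {
                if chars.peek() == Some(&'"') {
                    current.push('"'); // "" is an escaped quote
                    chars.next();
                } else {
                    in_quotes = false; // closing quote
                }
            } else {
                current.push(c); // delimiters are literal inside quotes
            }
        } else if c == delim {
            fields.push(finish(&current, was_quoted));
            current.clear();
            was_quoted = false;
        } else if c == '"' && !was_quoted && current.trim().is_empty() {
            current.clear(); // drop spaces before the opening quote
            in_quotes = true;
            was_quoted = true;
        } else if was_quoted && c.is_whitespace() {
            // Skip spaces between a closing quote and the next delimiter.
        } else {
            current.push(c); // mixed/unquoted content is kept as-is
        }
    }
    fields.push(finish(&current, was_quoted));
    fields
}

/// Quoted values are returned verbatim; unquoted values are trimmed.
fn finish(field: &str, was_quoted: bool) -> String {
    if was_quoted { field.to_string() } else { field.trim().to_string() }
}

fn main() {
    // The spacing example from the list above.
    println!("{:?}", split_quoted("\"Name\", \"Age\",\"Location\"", ','));
    // An embedded delimiter, an escaped quote, and a padded unquoted value.
    println!("{:?}", split_quoted("\"a|b\"|\"say \"\"hi\"\"\"| 42 ", '|'));
}
```

For anything beyond tinkering, BurntSushi's `csv` crate (which XSV is built on) already handles quoting, escapes and headers.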