csvs (CSV Sql) is a command-line tool that simplifies working with CSV or TSV files by enabling SQL queries through an embedded SQLite engine. It is ideal for data analysts and developers who need SQL's flexibility to manage text-based data efficiently.
Run advanced SQL queries, including JOIN
, GROUP BY
, SUM()
, or COUNT()
on CSV data.
Gain unparalleled flexibility to query, filter, sort, group, and combine data compared to traditional spreadsheet tools.
csvs also supports regular expressions in SQL queries. Refer to Regular Expressions Document.
Eliminate encoding issues with automatic detection of character encodings. Avoid garbled text and broken queries effortlessly.
csvs scans CSV rows to determine the most appropriate data types for SQLite tables. This dynamic analysis ensures compatibility and precision, even when handling nullable fields.
See Decide Data Type for Each Column and Validating Number Document for details.
Combine data from multiple CSV or TSV files by creating a temporary SQLite database using --in-file
. Easily perform
SQL joins across files in seconds.
Common use cases:
- Merging datasets from separate files.
- Cross-referencing data using SQL joins.
Export query results as:
- CSV or TSV: Ideal for data sharing or further processing.
- SQLite Database: Retain results as
.db
files for future queries.
Control delimiters, headers, and quoting styles to suit your needs.
Explore datasets interactively without specifying queries upfront. Features include:
- Browsing imported tables.
- Ad-hoc query execution.
- Previewing and saving query results.
Execute multiple SQL statements in a single command. Transform and query data across multiple steps, with only the final result displayed.
- Display help:
csvs --help
Start csvs in interactive mode when neither --query
nor --source
is specified. This mode allows you to:
- View imported tables.
- Preview table content.
- Save query results to files interactively.
- Display version:
csvs --version
- Select specific fields from
./data/address.csv
and save the results topicked.csv
:
csvs -i ./data/address.csv -q 'SELECT "city","town","phone" FROM "address.csv"' -o picked.csv
- Process CSV data from
STDIN
:
csvs -q 'SELECT "city","town","phone" FROM "stdin"' < ./data/address.csv > picked.csv
or
cat ./data/address.csv | csvs -q 'SELECT "city","town","phone" FROM "stdin"' > picked.csv
- Perform SQL joins across multiple files:
csvs -i ./left.csv -i ./right.tsv -q 'SELECT * FROM "left.csv" AS l JOIN "right.tsv" AS r ON l."name"=r."name"'
- Leverage SQLite functions like
UPPER()
,COUNT()
, etc., and export results to a SQLite database:
csvs -i people.csv -q 'SELECT "city",COUNT(*) FROM "people.csv" GROUP BY "city" ORDER BY COUNT(*) DESC' --out-database out.db
- Start in interactive mode:
csvs -i MOCK_DATA.csv
File names provided with --in-file
map directly to SQLite tables (e.g. ./sample/address.csv
becomes "address.csv"
,
and data.2024.csv
becomes "data.2024.csv"
).
Columns or table names with spaces, punctuation, or reserved words must be quoted. Example:
SELECT "first name", "last name"
FROM "contacts.csv"
If specified, column names default to "c1", "c2", "c3", etc., for header-less CSV files.
Separate SQL statements with semicolons to execute multiple queries in sequence. Only the result of the final query is displayed.
Example:
SELECT "first name"
FROM "contacts.csv";
SELECT "age"
FROM "contacts.csv";
See Build Guide
- Interactive mode cannot be invoked when CSV data is provided via
STDIN
. Use--in-file
to specify CSV files instead. - Large files may require significant RAM since csvs loads entire files into memory when
--out-database
is not specified. - CSV files with names starting with
sqlite_
cannot be used with--in-file
due to SQLite's reserved naming convention.
csvs relies on the following open-source projects:
- SQLite
- anyhow
- chardetng
- clap
- clap-help
- csv
- encoding_rs
- encoding_rs_rw
- indicatif
- lazy-regex
- ratatui
- r2d2
- r2d2_sqlite
- rusqlite
- smashquote
- sqlparser
- tracing
- tracing-logfmt
- tuirealm
- And many more! See the full list in the source repository.
csvs is licensed under the MIT license.