csvtools: Tools for exploring and modifying delimited files before importing them for data analysis
This package contains command-line scripts that help me explore and modify delimited files (CSV) before importing them for data analysis. They aren’t meant to replace the awesome csvkit package; they just cover common tasks that I find myself doing.
These scripts depend on standard Unix command-line utilities such as awk, sed, bash, and python. Some rely on R.
count_delimiter_report.sh
: Are there the same number of delimiters in each row? Count the delimiters in each row of a file and summarize the result. This can reveal line breaks within a data field (leading to ‘short’ lines) and delimiter characters within a data field (leading to extra delimiters in a line). It could also be that the data generator was sloppy, so some lines have more or fewer delimiters than others.

csvconvert.py
: Convert a delimited file from one delimiter to another; defaults to converting CSV to pipe-delimited.

repair_linebreaks_in_fields.py
: Remove line breaks within data fields (so each line has at least the same number of delimiters as line 1).

csvdescribe.py
: Describe a delimited file: variable name, variable type, max length.

generate_sas_import.R
: Create a SAS program that imports a delimited file from the description generated by csvdescribe.py.
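
To make the first and third checks concrete, here is a minimal Python sketch of the idea. It is not the code in count_delimiter_report.sh or repair_linebreaks_in_fields.py; the function names `count_delimiters` and `repair_linebreaks` are hypothetical, and the sketch assumes a naive character count that ignores quoting.

```python
from collections import Counter


def count_delimiters(lines, delimiter=","):
    """Map each delimiter count to the number of lines that have it.

    Hypothetical helper: a plain character count, so delimiters inside
    quoted fields are counted too.
    """
    return Counter(line.rstrip("\n").count(delimiter) for line in lines)


def repair_linebreaks(lines, delimiter=","):
    """Join consecutive lines until each record has at least as many
    delimiters as the first line (assumed to be a clean header)."""
    lines = [line.rstrip("\n") for line in lines]
    if not lines:
        return []
    expected = lines[0].count(delimiter)
    repaired, pending = [], []
    for line in lines:
        pending.append(line)
        joined = " ".join(pending)  # replace the stray line break with a space
        if joined.count(delimiter) >= expected:
            repaired.append(joined)
            pending = []
    if pending:
        # A trailing fragment that never reached the expected count is kept as-is.
        repaired.append(" ".join(pending))
    return repaired


if __name__ == "__main__":
    sample = ["id,name,comment", "1,Ann,fine", "2,Bob", "Jr,ok", "3,Cy,a,b"]
    for n_delims, n_lines in sorted(count_delimiters(sample).items()):
        print(f"{n_lines} line(s) with {n_delims} delimiter(s)")
    for record in repair_linebreaks(sample):
        print(record)
```

A file is consistent when the summary has a single entry. Lines with fewer delimiters than the header suggest a line break inside a field, and lines with more suggest a delimiter inside a field; the join step cannot fix the latter.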
- These scripts are released under the GPL 3.0 license.