Convert CSV files to Avro like a boss.
$ gem install csv2avro
Or, if you prefer to live on the edge, clone this repository and build it from source.
$ csv2avro --schema ./spec/support/schema.avsc ./spec/support/data.csv
This will process the data.csv file and create a data.avro file, plus a data.bad file containing a report of the bad rows.
You can override the location of the bad rows report with the --bad-rows [BAD_ROWS] option.
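To try the command without the repository's spec files, the sketch below creates a tiny schema and a matching CSV. The ./csv2avro-demo directory, the record name, and the id and name fields are made up for illustration, and it assumes CSV headers are matched to schema field names; the real samples live under spec/support.

```shell
# Hypothetical sample inputs; every name and path here is illustrative only.
mkdir -p ./csv2avro-demo

# A minimal Avro record schema with two fields.
cat > ./csv2avro-demo/schema.avsc <<'EOF'
{
  "type": "record",
  "name": "person",
  "fields": [
    { "name": "id",   "type": "int" },
    { "name": "name", "type": "string" }
  ]
}
EOF

# A CSV whose header row uses the same names as the schema fields.
cat > ./csv2avro-demo/data.csv <<'EOF'
id,name
1,Alice
2,Bob
EOF

# Run the converter only if it is installed, sending the bad rows
# report to an explicit location with --bad-rows.
if command -v csv2avro >/dev/null 2>&1; then
  csv2avro --schema ./csv2avro-demo/schema.avsc \
           --bad-rows ./csv2avro-demo/rejects.bad \
           ./csv2avro-demo/data.csv
fi
```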
$ cat ./spec/support/data.csv | csv2avro --schema ./spec/support/schema.avsc --bad-rows ./spec/support/data.bad > ./spec/support/data.avro
This will process the input stream and push the Avro data to the output stream. When working with streams, you must specify the --bad-rows location.
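Because the Avro data goes to the output stream, the bad rows report is easy to overlook in a pipeline. The sketch below wraps the streaming invocation in a small helper that prints a notice on standard error when the report is non-empty. convert_stream is a hypothetical name, not part of csv2avro, and whether a report file is created for clean input is not assumed either way.

```shell
# Sketch: wrap the streaming invocation and surface the bad rows report.
# convert_stream is a hypothetical helper, not part of csv2avro.
convert_stream() {
  schema=$1
  bad_rows=$2

  # stdin -> csv2avro -> stdout, using only the documented flags.
  # Whatever exit status csv2avro returns is passed on unchanged.
  csv2avro --schema "$schema" --bad-rows "$bad_rows" || return $?

  # -s is true only when the file exists and is non-empty, so this check
  # is safe whether or not a report is written for clean input.
  if [ -s "$bad_rows" ]; then
    echo "bad rows reported in $bad_rows" >&2
  fi
}
```

It could then be used as `cat data.csv | convert_stream schema.avsc data.bad > data.avro`, keeping the notice out of the Avro output because it is written to standard error.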
$ aws s3 cp s3://csv-bucket/transactions.csv - | csv2avro --schema ./transactions.avsc --bad-rows ./transactions.bad | aws s3 cp - s3://avro-bucket/transactions.avro
This will stream your file from AWS S3, convert the data, and push it back to S3. For more information, please check the AWS CLI documentation.
$ gunzip -c ./spec/support/data.csv.gz | csv2avro --schema ./spec/support/schema.avsc --bad-rows ./spec/support/data.bad > ./spec/support/data.avro
This will decompress the file and convert it to Avro, leaving the original file intact.
For a full list of available options, run csv2avro --help
$ csv2avro --help
Version 1.3.0 of CSV2Avro
Usage: csv2avro [options] [file]
-s, --schema SCHEMA A file containing the Avro schema. This value is required.
-b, --bad-rows [BAD_ROWS] The output location of the bad rows report file.
-d, --delimiter [DELIMITER] Field delimiter. If none specified, then comma is used as the delimiter.
-l, --line-ending [LINE_ENDING] Line ending character used as row separator in CSV parsing
-a, --array-delimiter [ARRAY_DELIMITER] Array field delimiter. If none specified, then comma is used as the delimiter.
-D, --write-defaults Write default values.
-c, --stdout Output will go to the standard output stream, leaving files intact.
-h, --help Prints help
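The delimiter options listed above can be combined for input that is not comma-separated. The following is a sketch only: the pipe-delimited events.psv file, its id and tags columns, and the events.avsc schema are hypothetical, and the schema would need an array field matching the tags column.

```shell
# Hypothetical pipe-delimited input with a semicolon-separated array column.
cat > ./events.psv <<'EOF'
id|tags
1|red;green
2|blue
EOF

# Run only when the tool and a matching schema (hypothetical ./events.avsc)
# are both present.
if command -v csv2avro >/dev/null 2>&1 && [ -f ./events.avsc ]; then
  csv2avro --schema ./events.avsc \
           --delimiter '|' \
           --array-delimiter ';' \
           --bad-rows ./events.bad \
           ./events.psv
fi
```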
- Fork it ( https://github.com/sspinc/csv2avro/fork )
- Create your feature branch ( git checkout -b my-new-feature )
- Commit your changes ( git commit -am 'Add some feature' )
- Push to the branch ( git push origin my-new-feature )
- Create a new Pull Request