Csv2parquet

Converts CSV to Parquet as fast as I can, using Rust.

Why?

I was wasting a lot of time waiting for my slow machine to read a bunch of CSV files, so I got curious how much I could speed this up.

Why Rust?

Modern C++ is getting harder to keep up with. My C/C++ style is basically C plus the STL, without the features of recent C++ standards. It was easier to learn Rust from scratch than to improve the dated C style I have been using for more than 10 years.

Implementation details

The current Rust parquet implementation does not have a buffered column writer like the C++ version, so I had to implement my own, along with copy-and-paste adapted versions of the row group writer and related pieces.
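To illustrate the idea, here is a minimal sketch of what buffering columns could look like. It assumes "buffered" means that values for every column are accumulated in memory as rows arrive and are written out column by column only when the row group is flushed. The names `BufferedColumn` and `BufferedRowGroup` are hypothetical and are not taken from this repository or from the parquet crate. The sketch only handles `i64` values stored as 8-byte little-endian integers and does not emit page headers or metadata, so its output is not a valid Parquet file.

```rust
use std::io::{self, Write};

/// Hypothetical in-memory buffer for one column of a row group.
pub struct BufferedColumn {
    bytes: Vec<u8>,
    num_values: usize,
}

/// Hypothetical row group that keeps every column buffered until `flush`.
pub struct BufferedRowGroup {
    columns: Vec<BufferedColumn>,
}

impl BufferedRowGroup {
    /// Create a row group with `num_columns` empty column buffers.
    pub fn new(num_columns: usize) -> Self {
        let columns = (0..num_columns)
            .map(|_| BufferedColumn { bytes: Vec::new(), num_values: 0 })
            .collect();
        BufferedRowGroup { columns }
    }

    /// Append one row. Each value goes to its own column buffer,
    /// so rows can be consumed in the order a CSV reader produces them.
    pub fn push_row(&mut self, row: &[i64]) {
        assert_eq!(row.len(), self.columns.len(), "row width must match column count");
        for (col, value) in self.columns.iter_mut().zip(row) {
            col.bytes.extend_from_slice(&value.to_le_bytes());
            col.num_values += 1;
        }
    }

    /// Number of rows buffered so far.
    pub fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.num_values)
    }

    /// Write the buffered columns one after another, then reset the buffers
    /// for the next row group. Returns the number of bytes written.
    pub fn flush<W: Write>(&mut self, out: &mut W) -> io::Result<usize> {
        let mut written = 0;
        for col in &mut self.columns {
            out.write_all(&col.bytes)?;
            written += col.bytes.len();
            col.bytes.clear();
            col.num_values = 0;
        }
        Ok(written)
    }
}
```

With this shape, the CSV reader can call `push_row` for each record, and `flush` is called once enough rows have accumulated for a row group.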

Future improvements

Currently the buffer is just a `Vec<u8>`, so a lot of copying happens as more and more data is written and the vector has to grow. It would be nice to find a linked-list style buffer that supports append without copying the data already written.
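One possible shape for such a buffer is a list of fixed-capacity chunks, sketched below. `ChunkedBuffer` and its methods are hypothetical names for illustration, not an existing crate or anything in this repository. The sketch keeps the chunks in a `Vec` instead of a true linked list, on the assumption that growing the outer `Vec` is cheap because it only moves the small chunk handles, never the bytes inside them.

```rust
use std::io::{self, Write};

/// Hypothetical append-only byte buffer made of fixed-capacity chunks.
/// Bytes that have been written are never moved by later appends.
pub struct ChunkedBuffer {
    chunks: Vec<Vec<u8>>,
    chunk_capacity: usize,
    len: usize,
}

impl ChunkedBuffer {
    /// Create an empty buffer whose chunks hold `chunk_capacity` bytes each.
    pub fn new(chunk_capacity: usize) -> Self {
        assert!(chunk_capacity > 0, "chunk capacity must be non-zero");
        ChunkedBuffer { chunks: Vec::new(), chunk_capacity, len: 0 }
    }

    /// Append `data`, filling the last chunk and allocating new chunks as needed.
    pub fn append(&mut self, mut data: &[u8]) {
        while !data.is_empty() {
            // Start a new chunk if there is none yet or the last one is full.
            let need_new = match self.chunks.last() {
                Some(last) => last.len() == self.chunk_capacity,
                None => true,
            };
            if need_new {
                self.chunks.push(Vec::with_capacity(self.chunk_capacity));
            }
            let last = self.chunks.last_mut().unwrap();
            // Never write past the chunk capacity, so the chunk never reallocates.
            let room = self.chunk_capacity - last.len();
            let take = room.min(data.len());
            last.extend_from_slice(&data[..take]);
            self.len += take;
            data = &data[take..];
        }
    }

    /// Total number of bytes stored across all chunks.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Number of chunks currently allocated.
    pub fn num_chunks(&self) -> usize {
        self.chunks.len()
    }

    /// Write all chunks to `out` in the order they were appended.
    pub fn write_to<W: Write>(&self, out: &mut W) -> io::Result<()> {
        for chunk in &self.chunks {
            out.write_all(chunk)?;
        }
        Ok(())
    }
}
```

The trade-off is that the data is no longer contiguous, so anything that needs a single `&[u8]` slice has to go through `write_to` or copy the chunks together once at the end.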