csv-parser

A fast, multi-threaded CSV parser with a focus on handling huge files.

Features

Quick: Loads a 400MB CSV (5 float columns, 10M rows) in 8 seconds (on an i7-4790K). Papa Parse, which claims to be the fastest CSV parser in the browser, took twice as long to parse the same file on the same system when parsing everything as strings. With type parsing enabled, it took over a minute.
Supports preprocessing the data: Loading all data first and processing it later can waste RAM on columns you don't actually need. Instead, you can specify generator functions that create custom columns from the parsed input data, so the input can be discarded immediately afterwards to reduce memory usage.
Data is returned in chunks: Especially for very large files (multiple GB), you may want to work with the data that is already available before everything is parsed. This also makes it possible to consume infinite data streams.
Sensible data storage: All scalar data is stored as typed arrays with ArrayBuffers/SharedArrayBuffers as underlying storage. This has multiple advantages:
- Lower memory footprint: You can choose the byte size of each value in your buffer. Depending on your requirements, this can drastically reduce memory usage compared to an array of numbers (each being a 64-bit float). Even a Float64Array tends to use less memory, as each number in the equivalent Array<number> typically takes up more than 8 bytes.
- Easier usage of low-level interfaces, such as sending data to the GPU with WebGL.
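- SharedArrayBuffer allows multiple threads (workers) to access the data without duplicating it. This can be enabled by setting the sharedArrayBuffer option to true. Browsers only expose SharedArrayBuffer on cross-origin isolated pages, so you'll need to add two security headers when hosting your website (typically Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp).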
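
The storage points above can be illustrated without the parser itself. The following is a minimal sketch, not the library's API: the helper name deriveColumn and the sample values are hypothetical. It writes a derived column into a Float32Array (4 bytes per value instead of 8) and, where available, backs it with a SharedArrayBuffer so that a worker could read it without copying.

```typescript
// Hypothetical helper, not part of csv-parser: computes one derived column
// from two parsed input columns and stores it in a compact typed array.
function deriveColumn(
    x: Float64Array,
    y: Float64Array,
    shared: boolean
): Float32Array {
    const length = Math.min(x.length, y.length);
    const byteLength = length * Float32Array.BYTES_PER_ELEMENT;
    // SharedArrayBuffer is only defined on cross-origin isolated pages
    // (and in Node.js); fall back to a plain ArrayBuffer otherwise.
    const buffer =
        shared && typeof SharedArrayBuffer !== 'undefined'
            ? new SharedArrayBuffer(byteLength)
            : new ArrayBuffer(byteLength);
    const out = new Float32Array(buffer);
    for (let i = 0; i < length; i++) {
        // Example derivation: Euclidean distance from the origin.
        out[i] = Math.hypot(x[i], y[i]);
    }
    return out;
}

// Sample input columns (made-up values for illustration).
const x = new Float64Array([3, 6, 5]);
const y = new Float64Array([4, 8, 12]);
const distance = deriveColumn(x, y, true);
```

Once the derived column exists, the two input columns can be dropped, which is the memory saving the preprocessing feature aims for. Here the result occupies 12 bytes, whereas keeping both Float64Array inputs would take 48.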

Usage

See the apps directory for multiple example implementations using the parser.

Note regarding huge files on Google Chrome

Chrome limits each tab to 4GB of RAM, even on 64-bit machines. When parsing big files, your tab may therefore crash with a STATUS_BREAKPOINT error message. Example: When parsing a 2GB file with five 32-bit float columns and 50M rows, the parsed arrays measure roughly 1GB, but memory consumption during parsing can approach 4GB because of intermediate values being created. As a workaround, you can use Firefox, which allows a tab to use more than 4GB of RAM.
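
The 1GB figure in the example follows directly from the column layout. As a rough sketch (the helper name parsedSizeInBytes is hypothetical and not part of the parser), the size of the parsed arrays can be estimated like this:

```typescript
// Hypothetical helper for a back-of-the-envelope estimate; not part of csv-parser.
function parsedSizeInBytes(
    columns: number,
    bytesPerValue: number,
    rows: number
): number {
    return columns * bytesPerValue * rows;
}

// 5 columns of 32-bit floats (4 bytes each) with 50M rows.
const bytes = parsedSizeInBytes(5, Float32Array.BYTES_PER_ELEMENT, 50_000_000);
const gigabytes = bytes / 1e9;
console.log(`${gigabytes} GB`); // prints "1 GB"
```

This only covers the final arrays. The intermediate values created during parsing come on top, which is why the total can get close to Chrome's 4GB limit.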

