Added support for multiple line endings in a single file #5

Open
wants to merge 5 commits into
base: master
from

Conversation

Lukazar

@Lukazar Lukazar commented Apr 3, 2014

Hey!

I had a use case where a user uploads a CSV file, which I then parse and convert to JSON. Since a user might submit a Windows CSV (\r\n line endings) or a *nix CSV (\n line endings), I needed to allow an "or" case when matching against this.endLine.

Therefore I introduced a bit of code to allow this. endLine can now be an array. To maintain backward compatibility, I did not mandate an array to be passed. I found this useful and thought perhaps it should be included.
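As an illustration of the idea described above, here is a minimal sketch of accepting either a single string or an array of line endings. It is not the code from this pull request: the helper names `normalizeEndLine`, `matchEndLine`, and `splitLines` are hypothetical, and the sketch works on a whole string rather than on a stream.

```javascript
// Hypothetical helpers for illustration only; not the code from this PR.

// Accept a string (the old behaviour) or an array of strings (the new one).
// Longer terminators are sorted first so '\r\n' is preferred over '\r' or '\n'.
function normalizeEndLine(endLine) {
  if (endLine === undefined) return ['\n'];
  const list = Array.isArray(endLine) ? endLine.slice() : [endLine];
  return list.sort((a, b) => b.length - a.length);
}

// Return the terminator that starts at `pos` in `text`, or null if none does.
function matchEndLine(text, pos, endLines) {
  for (const candidate of endLines) {
    if (text.startsWith(candidate, pos)) return candidate;
  }
  return null;
}

// Split `text` into lines using any of the configured terminators.
function splitLines(text, endLine) {
  const endLines = normalizeEndLine(endLine);
  const lines = [];
  let start = 0;
  let pos = 0;
  while (pos < text.length) {
    const hit = matchEndLine(text, pos, endLines);
    if (hit !== null) {
      lines.push(text.slice(start, pos));
      pos += hit.length;
      start = pos;
    } else {
      pos += 1;
    }
  }
  if (start < text.length) lines.push(text.slice(start));
  return lines;
}
```

With these assumptions, `splitLines(data, '\n')` keeps working for existing callers, while `splitLines(data, ['\r\n', '\n'])` handles a file that mixes both endings.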

Cheers

@lbdremy
Owner

lbdremy commented Apr 26, 2014

Hi @Lukazar,

Sorry for the delay. I had a quick look, and it seems the change makes parsing much slower than before (roughly six times slower in the benchmark runs below).

before:

lbdremy@precision:~/workspace/nodejs/csv-stream$ node bench/parse.js 
232 ms
43107.59482758621 bytes/ms
226 ms
44252.04424778761 bytes/ms
226 ms
44252.04424778761 bytes/ms
227 ms
44057.101321585906 bytes/ms
226 ms
44252.04424778761 bytes/ms
225 ms
44448.72 bytes/ms
227 ms
44057.101321585906 bytes/ms
229 ms
43672.323144104805 bytes/ms
226 ms

after:

lbdremy@precision:~/workspace/nodejs/csv-stream$ node bench/parse.js 
1364 ms
7332.08357771261 bytes/ms
1363 ms
7337.462949376376 bytes/ms
1362 ms
7342.850220264318 bytes/ms
1363 ms
7337.462949376376 bytes/ms
1365 ms
7326.712087912088 bytes/ms
1364 ms
7332.08357771261 bytes/ms
1361 ms

I need to spend more time on it to work out a fix, but in the meantime, if you want to have fun with it, you can have a look at what I did the first time to track down performance issues, in this gist: https://gist.github.com/lbdremy/3805481. It might help.

@Lukazar
Author

Lukazar commented Apr 26, 2014

Ah, right you are! It is almost certainly the Array functions. I bet switching the lookup around line 82 to a hash will get some of that time back. I'll play around with it at some point. Thanks for the perf dump, that will help!

Cheers
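The hash idea mentioned above could look something like the following sketch. It is an assumption about the approach, not the actual change: `buildEndLineTable` and `endLineAt` are hypothetical names, and nothing here has been measured against bench/parse.js. The point is that the table is built once, so the per-character check in the common case is a single property lookup instead of an array scan.

```javascript
// Hypothetical sketch; not taken from the PR and not benchmarked.

// Build the lookup once, outside the per-character loop.
// Terminators are grouped by their first character, longest first.
function buildEndLineTable(endLines) {
  const table = Object.create(null); // no inherited keys to collide with
  for (const terminator of endLines) {
    if (terminator.length === 0) continue; // ignore empty terminators
    const first = terminator[0];
    if (table[first] === undefined) table[first] = [];
    table[first].push(terminator);
  }
  for (const first in table) {
    table[first].sort((a, b) => b.length - a.length);
  }
  return table;
}

// Return the terminator starting at `pos`, or null.
// For ordinary characters this is one property lookup and an early return.
function endLineAt(text, pos, table) {
  const candidates = table[text[pos]];
  if (candidates === undefined) return null;
  for (const terminator of candidates) {
    if (text.startsWith(terminator, pos)) return terminator;
  }
  return null;
}
```

For example, with `const table = buildEndLineTable(['\n', '\r\n'])`, calling `endLineAt(chunk, i, table)` inside the parse loop returns the matched terminator or `null`. Whether this recovers the lost throughput would need to be checked with the benchmark.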

@fireridlle

I feel like a necromancer here :) but will this be implemented?

3 participants