Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken result when decoding large files #104

Closed
ghost opened this issue Sep 3, 2015 · 8 comments
Closed

Broken result when decoding large files #104

ghost opened this issue Sep 3, 2015 · 8 comments

Comments

@ghost
Copy link

ghost commented Sep 3, 2015

With iconv-lite I encountered large files to cause problems on decoding. There aren't any thrown errors, but the decoded result is broken at least by stripped lines at the file's end.
I cut it down to a mcve: The issue is given on file too-large.html (507kb), whereby file large.html (501kb) decodes and encodes correctly.

I also added a zip-archive including the folder node_modules.

For my project I switched to module node-iconv.

@ghost ghost changed the title Broken result with decoding large files Broken result when decoding large files Sep 3, 2015
@ashtuchkin
Copy link
Owner

That's a large work, thank you so much for doing it! I'll look into it asap.

@ashtuchkin
Copy link
Owner

Just tried it and I have a different too-large-conv.html produced. What version of node do you use?

@ashtuchkin
Copy link
Owner

Tried node versions 0.12.4, 0.11.13, 0.12.7, io.js v3.3.0, all worked well (too-large.html and too-large-conv.html were the same). Maybe architecture is different? Are you using x86 or x64? Can you give more info on your machine specs?

@ghost
Copy link
Author

ghost commented Sep 4, 2015

The problem is still given, Windows 10 (updated from Windows 7), Node v0.12.7 x64. Also I tried Node versions 0.12.4, 0.11.13, both x86 and x64.
On a notebook I too wasn't able to reproduce the raised behaviour. I attempted to run the decoding with Node v012.7 x64 on fresh installed operating systems Ubuntu 14.04.3 and Windows 10.

Do you have any other hints to look for?

@ashtuchkin
Copy link
Owner

Hmm.. All my test machines are mac/Ubuntu, so probably related to Windows.

My guess is that line endings are changed somewhere on load/save file, on
Node level or on OS level.

Can you write a function that checks line endings in Buffer-s before and
after iconv-lite conversion?

Another test - can you skip conversion for 'too-large', just load - copy
buffer - save?

Third test - can you try io.js latest? They use newer version of V8 & libuv
that could have this issue resolved.
On Sep 4, 2015 4:27 AM, "bit-seq" notifications@github.com wrote:

The problem is still given, Windows 10 (updated from Windows 7), Node
v0.12.7 x64.
On a notebook I too wasn't able to reproduce the raised behaviour. I
attempted to run the decoding with Node v012.7 x64 on fresh installed
operating systems Ubuntu 14.04.3 and Windows 10.

Any hints to look for?


Reply to this email directly or view it on GitHub
#104 (comment)
.

@ghost
Copy link
Author

ghost commented Sep 10, 2015

A good hint to mention line-endings to cause additional trouble. However, they are just involved as they increased file-size by another character per line-break. On my Windows machine with option autocrlf true, Git for Windows checks out Windows-style CRLF and commits Unix-style LF line-endings.

With an increased file-size the issue is reproducible on machines treating line-endings Unix-style.
I added lines to the example files to again have too-large.html (508kb) causing problems, whereby file large.html (504kb) decodes and encodes correctly. There's some debug output to generate files highlighting the content of every step and the according line-endings. On a unix or ios system there won't be any CRLFs on a checkout, however, by unzipping the archive the first files of the chain should contain CRLFs. Calling the function eolUnix forces LF line-endings even for CRLF files.

Node v0.12.7 still shows the affected behaviour. The result is correct using io.js v3.0.0 and node v4.0.0.

@ashtuchkin
Copy link
Owner

Ok, so I found the problem here. The issue is a node bug and documented as node#1024. Affected Node versions: v0.11.7 - v1.4.3. Fixing commit: 1640dedb3b2

A bit more context: when decoding, iconv-lite fills out ucs2-encoded Buffer that then is converted to JS string. The linked issue prevented correct conversion at this step.

The fix is to just upgrade node. Thanks for creating a reproducible test case.

@ghost
Copy link
Author

ghost commented Sep 28, 2015

Thank you for your effort!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant