-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
string_decoder: RangeError introduced by 6.2.1 #7308
Comments
I've got a reproducing example now: 'use strict';
const StringDecoder = require('string_decoder').StringDecoder;
const url = 'https://raw.githubusercontent.com/CindyJS/CindyJS/0d04d273c3/ref/img/RostS5.png';
let chunks = [];
require('https').get(url, res => {
if (res.statusCode !== 200) throw Error(res.statusMessage);
res.on('data', d => chunks.push(d));
res.on('end', haveImg);
});
function haveImg() {
const buf = Buffer.concat(chunks);
if (buf.length !== 181909) throw Error("Unexpected size: " + buf.length);
const decoder = new StringDecoder('utf8');
decoder.write(buf.slice(0, 65536));
decoder.write(buf.slice(65536, 131072));
decoder.write(buf.slice(131072, 181909));
decoder.end();
} Yes, I'm feeding a PNG file to an UTF-8 decoder, so I don't expect the result to make much sense. Mostly replacement characters. I'd not expect a throw, though. The chunking appears to be relevant here; feeding the whole file to the decoder in one piece will not throw. |
And a smaller reproducing example: 'use strict';
const decoder = new (require('string_decoder').StringDecoder)('utf8');
decoder.write(Buffer.from('c9b5a9', 'hex'));
decoder.write(Buffer.from('41', 'hex')); |
Yes, I see it now. I will have a PR to fix this soon. I am not sure why the exception message is pointing to the wrong text though (and even the wrong source file). |
@mscdex My first guess was that |
This commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete character. Fixes: nodejs#7308
There have been problems with utf8 decoding in cases where the input was invalid. Some cases would give different results depending on chunking, while others even led to exceptions. This commit simplifies the code in several ways, reducing the risk of breakage. Most importantly, the `text` method is not only used for bulk conversion of a single chunk, but also for the conversion of the mini buffer `lastChar` while handling characters spanning chunks. That should make most of the problems due to incorrect handling of special cases disappear. Secondly, that `text` method is now independent of encoding. The encoding-dependent method `complete` now has a single well-defined task: determine the buffer position up to which the input consists of complete characters. The actual conversion is handled in a central location, leading to a cleaner and leaner internal interface. Thirdly, we no longer try to remember just how many bytes we'll need for the next complete character. We simply try to fill up the `nextChar` buffer and perform a conversion on that. This reduces the number of internal counter variables from two to one, namely `partial` which indicates the number of bytes currently stored in `nextChar`. A possible drawback of this approach is that there is chance of degraded performance if input arrives one byte at a time and is from a script using long utf8 sequences. As a benefit, though, this allows us to emit a U+FFFD replacement character sooner in cases where the first byte of an utf8 sequence is not followed by the expected number of continuation bytes. Fixes: nodejs#7308
This commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete character. Fixes: nodejs#7308 PR-URL: nodejs#7310 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
This commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete character. Fixes: #7308 PR-URL: #7310 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
This commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete character. Fixes: #7308 PR-URL: #7310 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>
The update from 6.2.0 to 6.2.1 broke tests (on Travis and elsewhere) for CindyJS:
The fact that the quoted line is a comment is highly confusing. It makes it hard to get a sound idea as to what's actually going on here. So I bisected the problem to 79ad172 for PR #6777.
So far I haven't managed to come up with a small reproducing example for this. The issue is perfectly reproducible in CindyJS/CindyJS@c685ce4, though. Just run
node make forbidden
in the top-level directory of a git clone (not a mere tarball download):nvm install v6.2.1 git clone https://github.com/CindyJS/CindyJS.git cd CindyJS git checkout c685ce4 node make forbidden
It will check groups of files against forbidden regular expressions. The high-level task specification is in
make/build.js
, with the actual command implementation inmake/commands.js
. If I only operate on files matching globs, I don't see this error message, so the code runninggit ls-files
is probably involved somehow. But even when I extracted those parts into a smaller separate script, I couldn't reproduce the issue, so I'm not sure what other components play a role here.Note that there will be an error message about an attempt to re-evaluate the
fs
module, which is due to the outdated version ofgraceful-fs
used byunzip
. That is unrelated; uncommenting therequire("unzip")
line makes it go away while still reproducing the issue. I'll probably switch to something like yauzl soon.The text was updated successfully, but these errors were encountered: