string_decoder: RangeError introduced by 6.2.1 #7308

gagern · 2016-06-15T12:59:03Z

Version: 6.2.1
Platform: Linux 4.6.1-gentoo x86_64 AuthenticAMD GNU/Linux
Subsystem: string_decoder

The update from 6.2.0 to 6.2.1 broke tests (on Travis and elsewhere) for CindyJS:

_stream_readable.js:65
  // not happen before the first write call.

RangeError: out of range index
    at RangeError (native)
    at StringDecoder.fillLast (string_decoder.js:94:9)
    at StringDecoder.write (string_decoder.js:73:14)
    at readableAddChunk (_stream_readable.js:160:31)
    at ReadStream.Readable.push (_stream_readable.js:130:10)
    at onread (fs.js:1774:12)
    at FSReqWrap.wrapper [as oncomplete] (fs.js:675:17)

The fact that the quoted line is a comment is highly confusing. It makes it hard to get a sound idea as to what's actually going on here. So I bisected the problem to 79ad172 for PR #6777.

So far I haven't managed to come up with a small reproducing example for this. The issue is perfectly reproducible in CindyJS/CindyJS@c685ce4, though. Just run node make forbidden in the top-level directory of a git clone (not a mere tarball download):

nvm install v6.2.1
git clone https://github.com/CindyJS/CindyJS.git
cd CindyJS
git checkout c685ce4
node make forbidden

It will check groups of files against forbidden regular expressions. The high-level task specification is in make/build.js, with the actual command implementation in make/commands.js. If I only operate on files matching globs, I don't see this error message, so the code running git ls-files is probably involved somehow. But even when I extracted those parts into a smaller separate script, I couldn't reproduce the issue, so I'm not sure what other components play a role here.

Note that there will be an error message about an attempt to re-evaluate the fs module, which is due to the outdated version of graceful-fs used by unzip. That is unrelated; uncommenting the require("unzip") line makes it go away while still reproducing the issue. I'll probably switch to something like yauzl soon.

The text was updated successfully, but these errors were encountered:

gagern · 2016-06-15T14:53:39Z

I've got a reproducing example now:

'use strict';
const StringDecoder = require('string_decoder').StringDecoder;
const url = 'https://raw.githubusercontent.com/CindyJS/CindyJS/0d04d273c3/ref/img/RostS5.png';
let chunks = [];
require('https').get(url, res => {
  if (res.statusCode !== 200) throw Error(res.statusMessage);
  res.on('data', d => chunks.push(d));
  res.on('end', haveImg);
});
function haveImg() {
  const buf = Buffer.concat(chunks);
  if (buf.length !== 181909) throw Error("Unexpected size: " + buf.length);
  const decoder = new StringDecoder('utf8');
  decoder.write(buf.slice(0, 65536));
  decoder.write(buf.slice(65536, 131072));
  decoder.write(buf.slice(131072, 181909));
  decoder.end();
}

Yes, I'm feeding a PNG file to an UTF-8 decoder, so I don't expect the result to make much sense. Mostly replacement characters. I'd not expect a throw, though. The chunking appears to be relevant here; feeding the whole file to the decoder in one piece will not throw.

gagern · 2016-06-15T15:15:26Z

And a smaller reproducing example:

'use strict';
const decoder = new (require('string_decoder').StringDecoder)('utf8');
decoder.write(Buffer.from('c9b5a9', 'hex'));
decoder.write(Buffer.from('41', 'hex'));

mscdex · 2016-06-15T15:25:15Z

Yes, I see it now. I will have a PR to fix this soon.

I am not sure why the exception message is pointing to the wrong text though (and even the wrong source file).

gagern · 2016-06-15T15:59:30Z

@mscdex My first guess was that Q.longStackSupport=true might be involved somehow, but that wasn't the case. And the reproducing example didn't reproduce that aspect of the problem either. Feels like another bug, probably unrelated to what we have here. Tracking the source of that location information through v8 internal code is no fun, though…

This commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete character. Fixes: nodejs#7308

There have been problems with utf8 decoding in cases where the input was invalid. Some cases would give different results depending on chunking, while others even led to exceptions. This commit simplifies the code in several ways, reducing the risk of breakage. Most importantly, the `text` method is not only used for bulk conversion of a single chunk, but also for the conversion of the mini buffer `lastChar` while handling characters spanning chunks. That should make most of the problems due to incorrect handling of special cases disappear. Secondly, that `text` method is now independent of encoding. The encoding-dependent method `complete` now has a single well-defined task: determine the buffer position up to which the input consists of complete characters. The actual conversion is handled in a central location, leading to a cleaner and leaner internal interface. Thirdly, we no longer try to remember just how many bytes we'll need for the next complete character. We simply try to fill up the `nextChar` buffer and perform a conversion on that. This reduces the number of internal counter variables from two to one, namely `partial` which indicates the number of bytes currently stored in `nextChar`. A possible drawback of this approach is that there is chance of degraded performance if input arrives one byte at a time and is from a script using long utf8 sequences. As a benefit, though, this allows us to emit a U+FFFD replacement character sooner in cases where the first byte of an utf8 sequence is not followed by the expected number of continuation bytes. Fixes: nodejs#7308

This commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete character. Fixes: nodejs#7308 PR-URL: nodejs#7310 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>

This commit fixes an issue when extra utf8 continuation bytes appear at the end of a chunk of data, causing miscalculations to be made when checking how many bytes are needed to decode a complete character. Fixes: #7308 PR-URL: #7310 Reviewed-By: Colin Ihrig <cjihrig@gmail.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Fedor Indutny <fedor.indutny@gmail.com>

This was referenced Jun 15, 2016

string_decoder: rewrite implementation #6777

Merged

Set travis node version to 6.2.0 CindyJS/CindyJS#374

Merged

mscdex added the string_decoder Issues and PRs related to the string_decoder subsystem. label Jun 15, 2016

gagern mentioned this issue Jun 15, 2016

Warn developers about node 6.2.1 CindyJS/CindyJS#378

Merged

mscdex mentioned this issue Jun 15, 2016

string_decoder: fix bad utf8 character handling #7310

Merged

3 tasks

gagern mentioned this issue Jun 16, 2016

string_decoder: fix handling of malformed utf8 #7318

Closed

3 tasks

mscdex closed this as completed in #7310 Jun 24, 2016

tmcw mentioned this issue Apr 12, 2017

Error when run from CI documentationjs/documentation#668

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

string_decoder: RangeError introduced by 6.2.1 #7308

string_decoder: RangeError introduced by 6.2.1 #7308

gagern commented Jun 15, 2016

gagern commented Jun 15, 2016

gagern commented Jun 15, 2016

mscdex commented Jun 15, 2016

gagern commented Jun 15, 2016

string_decoder: RangeError introduced by 6.2.1 #7308

string_decoder: RangeError introduced by 6.2.1 #7308

Comments

gagern commented Jun 15, 2016

gagern commented Jun 15, 2016

gagern commented Jun 15, 2016

mscdex commented Jun 15, 2016

gagern commented Jun 15, 2016