Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surprising behavior when chunk size > maxLength option #23

Closed
IronSavior opened this issue Dec 11, 2018 · 6 comments
Closed

Surprising behavior when chunk size > maxLength option #23

IronSavior opened this issue Dec 11, 2018 · 6 comments

Comments

@IronSavior
Copy link
Contributor

My expectation of the maxLength option is that the stream will error when buffering of any one line grows the buffer to exceed the given length. I understand this may not match the true intent of this option, but I'd like to ask for clarification. The surprising behavior occurs when processing a chunk larger than the maxLength option, but also contains no lines that individually exceed maxLength.

This is more easily demonstrated in this test:

test('chunk size > maxLength', function (t) {
  t.plan(2)

  var input = split({ maxLength: 2 })

  input.pipe(strcb(function (err, list) {
    t.error(err)
    t.deepEqual(list, ['a', 'b'])
  }))

  input.end("a\nb") // Two lines, each length 1 in single chunk of length 3
})

This is surprising because it should be very common for the typical chunk size to be much greater than the typical line length. The maxLength option could therefore only be useful when set to at least twice the typical chunk size. Since a stream can't control the size of chunks it receives, this seems like a logic error.

Could you please comment?

@IronSavior
Copy link
Contributor Author

In the main transform function, the buffer length is checked before pushing any fully terminated lines. Would it make more sense to check the length of the buffer after pushing terminated lines?

@IronSavior
Copy link
Contributor Author

This seems to pass all current tests as well as the one in this issue:

function transform (chunk, enc, cb) {
  this[kLast] += this[kDecoder].write(chunk)

  var list = this[kLast].split(this.matcher)

  this[kLast] = list.pop()

  for (var i = 0; i < list.length; i++) {
    push(this, this.mapper(list[i]))
  }

  if (this[kLast].length > this.maxLength) {
    return cb(new Error('maximum buffer reached'))
  }
  
  cb()
}

@mcollina
Copy link
Owner

Sure, would you like to send a PR?

@IronSavior
Copy link
Contributor Author

I can do that. With the change described above, it delays the error until the buffer (meaning this[kLast]) is not able to be flushed/reduced to below maxLength between chunks. That should be less surprising as a mechanism to limit buffer growth, but it still doesn't operate on each line--this is acceptable in my specific use case.

I happened to stumble upon this because I was adding an alternative overflow behavior for my use case where the stream seeks to the start of the next line rather than erroring-out when maxLength is violated. The net result is that the stream would effectively filter out any lines that resulted in buffer overflows instead of throwing.

The behavior is described in this test:

test('maximum buffer limit w/skip', function (t) {
  t.plan(2)

  var input = split({ maxLength: 2, skipOverflow: true })

  input.pipe(strcb(function (err, list) {
    t.error(err)
    t.deepEqual(list, ['a', 'b', 'c'])
  }))

  input.write('a\n123')
  input.write('456')
  input.write('789\nb\nc')
  input.end()
})

Would you be interested in a PR for this alternative overflow behavior as well?

@mcollina
Copy link
Owner

go for it

IronSavior added a commit to IronSavior/split2 that referenced this issue Dec 12, 2018
Also adds a test to catch surprising maxLength behavior described in mcollina#23
IronSavior added a commit to IronSavior/split2 that referenced this issue Dec 12, 2018
Also addresses surprising maxLength behavior described in mcollina#23
IronSavior added a commit to IronSavior/split2 that referenced this issue Dec 12, 2018
Also addresses surprising maxLength behavior described in mcollina#23
IronSavior added a commit to IronSavior/split2 that referenced this issue Dec 12, 2018
Also addresses surprising maxLength behavior described in mcollina#23
IronSavior added a commit to IronSavior/split2 that referenced this issue Dec 12, 2018
Also addresses surprising maxLength behavior described in mcollina#23
IronSavior added a commit to IronSavior/split2 that referenced this issue Dec 12, 2018
Also addresses surprising maxLength behavior described in mcollina#23
@mcollina
Copy link
Owner

Fixed in 3.1.0 and #24.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants