doc: document switchover from measured bytes to chars after setEncoding
This commit documents an edge-case behavior in readable streams. It is
expected that non-object streams are measured in bytes against the
highWaterMark. However, it was discovered in issue #6798 that after calling
.setEncoding() on the stream, it will thereafter begin to measure the buffer's
length in characters.
jalafel committed Aug 31, 2017
1 parent 365c245 commit 6e8ecc2
Showing 1 changed file with 20 additions and 5 deletions.
25 changes: 20 additions & 5 deletions doc/api/stream.md
@@ -68,8 +68,8 @@ buffer that can be retrieved using `writable._writableState.getBuffer()` or

The amount of data potentially buffered depends on the `highWaterMark` option
passed into the stream's constructor. For normal streams, the `highWaterMark`
option specifies a total number of bytes. For streams operating in object mode,
the `highWaterMark` specifies a total number of objects.
option specifies a [total number of bytes][hwm-gotcha]. For streams operating
in object mode, the `highWaterMark` specifies a total number of objects.

Data is buffered in Readable streams when the implementation calls
[`stream.push(chunk)`][stream-push]. If the consumer of the Stream does not
@@ -1517,9 +1517,9 @@ constructor and implement the `readable._read()` method.
#### new stream.Readable([options])

* `options` {Object}
* `highWaterMark` {number} The maximum number of bytes to store in
the internal buffer before ceasing to read from the underlying
resource. Defaults to `16384` (16kb), or `16` for `objectMode` streams
* `highWaterMark` {number} The maximum [number of bytes][hwm-gotcha] to store
in the internal buffer before ceasing to read from the underlying resource.
Defaults to `16384` (16kb), or `16` for `objectMode` streams
* `encoding` {string} If specified, then buffers will be decoded to
strings using the specified encoding. Defaults to `null`
* `objectMode` {boolean} Whether this stream should behave
@@ -2157,6 +2157,19 @@ object mode has an interesting side effect. Because it *is* a call to
However, because the argument is an empty string, no data is added to the
readable buffer so there is nothing for a user to consume.

### `highWaterMark` discrepency after calling `readable.setEncoding()`

Calling `readable.setEncoding()` changes how the `highWaterMark` is applied to
streams in non-object mode.

Typically, the size of the current buffer is measured against the
`highWaterMark` in _bytes_. However, after `setEncoding()` is called, the
comparison function will begin to measure the buffer's size in _characters_.

This is not a problem in common cases with `latin1` or `ascii`, where each
character occupies a single byte. However, be mindful of this behavior when
working with strings that could contain multi-byte characters.
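
A minimal sketch of the difference, assuming a Node.js version that provides
`readable.readableLength`. The `makeStream` helper is hypothetical and the tiny
`highWaterMark` of 4 is chosen only for illustration. The same 6-byte chunk is
pushed into two streams, one of which has had `setEncoding('utf8')` called on it:

```javascript
const { Readable } = require('stream');

// Hypothetical helper: builds a non-object-mode Readable with a no-op read(),
// optionally calling setEncoding() on it before any data is pushed.
function makeStream(encoding) {
  const readable = new Readable({ highWaterMark: 4, read() {} });
  if (encoding) readable.setEncoding(encoding);
  return readable;
}

// Two euro signs: 2 characters, but 6 bytes when encoded as UTF-8.
const chunk = Buffer.from('\u20ac\u20ac', 'utf8');

// Without setEncoding(), the buffered size is counted in bytes.
const raw = makeStream();
const rawHasRoom = raw.push(chunk);

// After setEncoding('utf8'), the chunk is decoded to a string and the
// buffered size is counted in characters instead.
const decoded = makeStream('utf8');
const decodedHasRoom = decoded.push(chunk);

console.log('bytes:', raw.readableLength, 'push() returned', rawHasRoom);
console.log('chars:', decoded.readableLength, 'push() returned', decodedHasRoom);
```

In this sketch the first stream should report a buffered length of 6, so
`push()` returns `false` because the `highWaterMark` of 4 has been reached. The
decoded stream should report a length of 2 for the same data, so `push()`
returns `true`.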

[`'data'`]: #stream_event_data
[`'drain'`]: #stream_event_drain
[`'end'`]: #stream_event_end
@@ -2195,6 +2208,8 @@ readable buffer so there is nothing for a user to consume.
[fs write streams]: fs.html#fs_class_fs_writestream
[http-incoming-message]: http.html#http_class_http_incomingmessage
[zlib]: zlib.html
[hwm-gotcha]: #stream_highWaterMark_discrepency_after_calling_readable_setencoding
[Readable]: #stream_class_stream_readable
[stream-_flush]: #stream_transform_flush_callback
[stream-_read]: #stream_readable_read_size_1
[stream-_transform]: #stream_transform_transform_chunk_encoding_callback