-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
doc: document switchover from measured bytes to chars after setEncoding #13442
Conversation
doc/api/stream.md
Outdated
@@ -66,7 +66,7 @@ buffer that can be retrieved using `writable._writableState.getBuffer()` or | |||
|
|||
The amount of data potentially buffered depends on the `highWaterMark` option | |||
passed into the streams constructor. For normal streams, the `highWaterMark` | |||
option specifies a total number of bytes. For streams operating in object mode, | |||
option specifies a [total number of bytes][hwm-gotcha]. For streams operating in object mode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this line exceeds 80 characters limit.
doc/api/stream.md
Outdated
@@ -101,6 +101,9 @@ Because data may be written to the socket at a faster or slower rate than data | |||
is received, it is important for each side to operate (and buffer) independently | |||
of the other. | |||
|
|||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe these empty lines are unintentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
doc/api/stream.md
Outdated
@@ -1965,6 +1968,19 @@ has an interesting side effect. Because it *is* a call to | |||
However, because the argument is an empty string, no data is added to the | |||
readable buffer so there is nothing for a user to consume. | |||
|
|||
### `highWaterMark` discrepency after calling `readable.setEncoding()` | |||
|
|||
The use of `readable.setEncoding()` will change the behaviour of how the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are unofficially targeting US English, so s/behaviour/behavior.
doc/api/stream.md
Outdated
comparison function will begin to measure the buffer's size in characters. | ||
|
||
This is not a problem in common cases with `utf8` or `ascii`. But it is | ||
advised to be mindful about this behaviour when working with long strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto here.
doc/api/stream.md
Outdated
The use of `readable.setEncoding()` will change the behaviour of how the | ||
`highWaterMark` operates in non-object mode. | ||
|
||
Atypically, the size of the current buffer is measured against the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 'atypically' here should be 'typically'?
Also, the first line of the commit message is too long. |
0231823
to
9205f3f
Compare
@jessicaquynh This needs to be rebased, sorry for the delay. |
Ping @jessicaquynh this needs a rebase |
@BridgeAR Sorry! Didn't realize that was on me. Thank you! |
doc/api/stream.md
Outdated
the internal buffer before ceasing to read from the underlying | ||
resource. Defaults to `16384` (16kb), or `16` for `objectMode` streams | ||
* `encoding` {string} If specified, then buffers will be decoded to | ||
* `highWaterMark` {Number} The maximum [number of bytes][hwm-gotcha] to store |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the casing for the parameter type should stay the same.
doc/api/stream.md
Outdated
* `highWaterMark` {Number} The maximum [number of bytes][hwm-gotcha] to store | ||
in the internal buffer before ceasing to read from the underlying resource. | ||
Defaults to `16384` (16kb), or `16` for `objectMode` streams | ||
* `encoding` {String} If specified, then buffers will be decoded to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
doc/api/stream.md
Outdated
|
||
Typically, the size of the current buffer is measured against the | ||
`highWaterMark` in _bytes_. However, after `setEncoding()` is called, the | ||
comparison function will begin to measure the buffer's size in characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps italicize the word 'characters' here as well since it is done for the word 'bytes' in the previous sentence.
doc/api/stream.md
Outdated
`highWaterMark` in _bytes_. However, after `setEncoding()` is called, the | ||
comparison function will begin to measure the buffer's size in characters. | ||
|
||
This is not a problem in common cases with `utf8` or `ascii`. But it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence seems a little confusing to me since we're comparing a multi-byte encoding and a single-byte encoding here. The way the sentence reads to me is that 'ascii' and 'utf8' are common cases and I do not need to worry about them, which is not true for multi-byte utf8 characters. If we're going to name some example encodings, you might replace 'utf8' with 'latin1' instead.
doc/api/stream.md
Outdated
|
||
This is not a problem in common cases with `utf8` or `ascii`. But it is | ||
advised to be mindful about this behavior when working with long strings | ||
with special characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to be more specific about what 'special' means, which is 'multi-byte'.
Also, the length of the string has no bearing on this particular problem. You could have a single character that spans 4 bytes for utf8 for example.
52246a0
to
d0094d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with the comment addressed.
doc/api/stream.md
Outdated
|
||
This is not a problem in common cases with `latin1` or `ascii`. But it is | ||
advised to be mindful about this behavior when working strings that contain | ||
multi-byte characters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last sentence does not seem to be complete. I guess there is a with
missing after "working". I would also add a could
after "that" as not every string would likely be an issue.
This commit documents and edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue nodejs#6798 that after calling .setEncoding() on the stream, it will thereafter begin to measure the buffer's length in characters.
Landed in 89f2074 @jessicaquynh thanks a lot for your contribution |
This commit documents and edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue thereafter begin to measure the buffer's length in characters. PR-URL: #13442 Refs: #6798 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
This commit documents and edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue thereafter begin to measure the buffer's length in characters. PR-URL: #13442 Refs: #6798 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
This commit documents and edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue thereafter begin to measure the buffer's length in characters. PR-URL: #13442 Refs: #6798 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
This commit documents and edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue thereafter begin to measure the buffer's length in characters. PR-URL: #13442 Refs: #6798 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
This commit documents and edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue thereafter begin to measure the buffer's length in characters. PR-URL: nodejs#13442 Refs: nodejs#6798 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
This commit documents and edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue thereafter begin to measure the buffer's length in characters. PR-URL: #13442 Refs: #6798 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
This commit documents and edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue thereafter begin to measure the buffer's length in characters. PR-URL: #13442 Refs: #6798 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
This commit documents and edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue thereafter begin to measure the buffer's length in characters. PR-URL: #13442 Refs: #6798 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Checklist
Ref: #6798
This is documentation PR to denote the change of measurement against the highWaterMark in non-object streams after
.setEncoding()
is called. It changes from bytes to character.Affected core subsystem(s)
doc