doc: document bytes to chars after setEncoding

This commit documents and edge-case behavior in readable streams. It is expected that non-object streams are measured in bytes against the highWaterMark. However, it was discovered in issue thereafter begin to measure the buffer's length in characters. PR-URL: https://github.com/nodejs/node/pull/13442 Refs: https://github.com/nodejs/node/issues/6798 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Vse Mozhet Byt <vsemozhetbyt@gmail.com> Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
8 years ago · feb6863a5c
1 changed files with 20 additions and 5 deletions
--- a/doc/api/stream.md
+++ b/doc/api/stream.md
@ -66,8 +66,8 @@ buffer that can be retrieved using `writable._writableState.getBuffer()` or

 The amount of data potentially buffered depends on the `highWaterMark` option
 passed into the streams constructor. For normal streams, the `highWaterMark`
-option specifies a total number of bytes. For streams operating in object mode,
-the `highWaterMark` specifies a total number of objects.
+option specifies a [total number of bytes][hwm-gotcha]. For streams operating
+in object mode, the `highWaterMark` specifies a total number of objects.

 Data is buffered in Readable streams when the implementation calls
 [`stream.push(chunk)`][stream-push]. If the consumer of the Stream does not
@ -1415,9 +1415,9 @@ constructor and implement the `readable._read()` method.
 #### new stream.Readable([options])

 * `options` {Object}
-  * `highWaterMark` {number} The maximum number of bytes to store in
-    the internal buffer before ceasing to read from the underlying
-    resource. Defaults to `16384` (16kb), or `16` for `objectMode` streams
+  * `highWaterMark` {number} The maximum [number of bytes][hwm-gotcha] to store
+    in the internal buffer before ceasing to read from the underlying resource.
+    Defaults to `16384` (16kb), or `16` for `objectMode` streams
  * `encoding` {string} If specified, then buffers will be decoded to
    strings using the specified encoding. Defaults to `null`
  * `objectMode` {boolean} Whether this stream should behave
@ -2031,6 +2031,19 @@ has an interesting side effect. Because it *is* a call to
 However, because the argument is an empty string, no data is added to the
 readable buffer so there is nothing for a user to consume.

+### `highWaterMark` discrepency after calling `readable.setEncoding()`
+
+The use of `readable.setEncoding()` will change the behavior of how the
+`highWaterMark` operates in non-object mode.
+
+Typically, the size of the current buffer is measured against the
+`highWaterMark` in _bytes_. However, after `setEncoding()` is called, the
+comparison function will begin to measure the buffer's size in _characters_.
+
+This is not a problem in common cases with `latin1` or `ascii`. But it is
+advised to be mindful about this behavior when working with strings that could
+contain multi-byte characters.
+
 [`'data'`]: #stream_event_data
 [`'drain'`]: #stream_event_drain
 [`'end'`]: #stream_event_end
@ -2065,6 +2078,8 @@ readable buffer so there is nothing for a user to consume.
 [http-incoming-message]: http.html#http_class_http_incomingmessage
 [Readable]: #stream_class_stream_readable
 [zlib]: zlib.html
+[hwm-gotcha]: #stream_highWaterMark_discrepency_after_calling_readable_setencoding
+[Readable]: #stream_class_stream_readable
 [stream-_flush]: #stream_transform_flush_callback
 [stream-_read]: #stream_readable_read_size_1
 [stream-_transform]: #stream_transform_transform_chunk_encoding_callback