In some cases, the http CONNECT/Upgrade API is unshifting an empty
bodyHead buffer onto the socket.
Normally, stream.unshift(chunk) does not set state.reading=false.
However, this check was not being done for the case when the chunk was
empty (either `''` or `Buffer(0)`), and as a result, it was causing the
socket to think that a read had completed, and to stop providing data.
This bug is not limited to http or web sockets, but rather would affect
any parser that unshifts data back onto the source stream without being
very careful to never unshift an empty chunk. Since the intent of
unshift is to *not* change the state.reading property, this is a bug.
Fixes#5557FixesLearnBoost/socket.io#1242
Pretty much everything assumes strings to be utf-8, but crypto
traditionally used binary strings, so we need to keep the default
that way until most users get off of that pattern.
If there is an encoding, and we do 'stream.push(chunk, enc)', and the
encoding argument matches the stated encoding, then we're converting from
a string, to a buffer, and then back to a string. Of course, this is a
completely pointless bit of work, so it's best to avoid it when we know
that we can do so safely.
Fix#5272
The consumption of a readable stream is a dance with 3 partners.
1. The specific stream Author (A)
2. The Stream Base class (B), and
3. The Consumer of the stream (C)
When B calls the _read() method that A implements, it sets a 'reading'
flag, so that parallel calls to _read() can be avoided. When A calls
stream.push(), B knows that it's safe to start calling _read() again.
If the consumer C is some kind of parser that wants in some cases to
pass the source stream off to some other party, but not before "putting
back" some bit of previously consumed data (as in the case of Node's
websocket http upgrade implementation). So, stream.unshift() will
generally *never* be called by A, but *only* called by C.
Prior to this patch, stream.unshift() *also* unset the state.reading
flag, meaning that C could indicate the end of a read, and B would
dutifully fire off another _read() call to A. This is inappropriate.
In the case of fs streams, and other variably-laggy streams that don't
tolerate overlapped _read() calls, this causes big problems.
Also, calling stream.shift() after the 'end' event did not raise any
kind of error, but would cause very strange behavior indeed. Calling it
after the EOF chunk was seen, but before the 'end' event was fired would
also cause weird behavior, and could lead to data being lost, since it
would not emit another 'readable' event.
This change makes it so that:
1. stream.unshift() does *not* set state.reading = false
2. stream.unshift() is allowed up until the 'end' event.
3. unshifting onto a EOF-encountered and zero-length (but not yet
end-emitted) stream will defer the 'end' event until the new data is
consumed.
4. pushing onto a EOF-encountered stream is now an error.
So, if you read(), you have that single tick to safely unshift() data
back into the stream, even if the null chunk was pushed, and the length
was 0.
In cases where a stream may have data added to the read queue before the
user adds a 'readable' event, there is never any indication that it's
time to start reading.
True, there's already data there, which the user would get if they
checked However, as we use 'readable' event listening as the signal to
start the flow of data with a read(0) call internally, we ought to
trigger the same effect (ie, emitting a 'readable' event) even if the
'readable' listener is added after the first emission.
To avoid confusing weirdness, only the *first* 'readable' event listener
is granted this privileged status. After we've started the flow (or,
alerted the consumer that the flow has started) we don't need to start
it again. At that point, it's the consumer's responsibility to consume
the stream.
Closes#5141
Also, set paused=false *before* calling resume(). Otherwise,
there's an edge case where an immediately-emitted chunk might make
it call pause() again incorrectly.
This solves the problem of calling `readable.pipe(writable)` after the
readable stream has already emitted 'end', as often is the case when
writing simple HTTP proxies.
The spirit of streams2 is that things will work properly, even if you
don't set them up right away on the first tick.
This approach breaks down, however, because pipe()ing from an ended
readable will just do nothing. No more data will ever arrive, and the
writable will hang open forever never being ended.
However, that does not solve the case of adding a `on('end')` listener
after the stream has received the EOF chunk, if it was the first chunk
received (and thus, length was 0, and 'end' got emitted). So, with
this, we defer the 'end' event emission until the read() function is
called.
Also, in pipe(), if the source has emitted 'end' already, we call the
cleanup/onend function on nextTick. Piping from an already-ended stream
is thus the same as piping from a stream that is in the process of
ending.
Updates many tests that were relying on 'end' coming immediately, even
though they never read() from the req.
Fix#4942
In the function that pre-emptively fills the Readable queue, it relies
on a recursion through:
stream.push(chunk) ->
maybeReadMore(stream, state) ->
if (not reading more and < hwm) stream.read(0) ->
stream._read() ->
stream.push(chunk) -> repeat.
Since this was only calling read() a single time, and then relying on a
future nextTick to collect more data, it ends up causing a nextTick
recursion error (and potentially a RangeError, even) if you have a very
high highWaterMark, and are getting very small chunks pushed
synchronously in _read (as happens with TLS, or many simple test
streams).
This change implements a new approach, so that read(0) is called
repeatedly as long as it is effective (that is, the length keeps
increasing), and thus quickly fills up the buffer for streams such as
these, without any stacks overflowing.
Now that highWaterMark increases when there are large reads, this
greatly reduces the number of calls necessary to _read(size), assuming
that _read actually respects the size argument.
When a readable listener is added, call read(0) so that data will flow in, up to
the high water mark.
Otherwise, it's somewhat confusing that you have to listen for readable,
and ALSO call read() (when it will certainly return null) just to get some
data out of the stream.
See: #4720
Ability to return just the length of listeners for a given type, using
EventEmitter.listenerCount(emitter, event). This will be a lot cheaper
than creating a copy of the listeners array just to check its length.
This makes it so that `stream.push(chunk)` is the only way to signal the
end of reading, removing the confusing disparity between the
callback-style _read method, and the fact that most real-world streams
do not have a 1:1 corollation between the "please give me data" event,
and the actual arrival of a chunk of data.
It is still possible, of course, to implement a `CallbackReadable` on
top of this. Simply provide a method like this as the callback:
function readCallback(er, chunk) {
if (er)
stream.emit('error', er);
else
stream.push(chunk);
}
However, *only* fs streams actually would behave in this way, so it
makes not a lot of sense to make TCP, TLS, HTTP, and all the rest have
to bend into this uncomfortable paradigm.
A primary motivation of this is to make the onread function more
inline-friendly, but also to make it more easy to explore not having
onread at all, in favor of always using push() to signal the end of
reading.
The Readable and Writable classes will nextTick certain things
if in sync mode. The sync flag gets unset after a call to _read
or _write. However, most of these behaviors should also be
deferred until nextTick if no reads have been made (for example,
the automatic '_read up to hwm' behavior on Readable.push(chunk))
Set the sync flag to true in the constructor, so that it will not
trigger an immediate 'readable' event, call to _read, before the
user has had a chance to set a _read method implementation.
There are cases where a push() call would return true, even though
the thing being pushed was in fact way way larger than the high
water mark, simply because the 'needReadable' was already set, and
would not get unset until nextTick.
In some cases, this could lead to an infinite loop of pushing data
into the buffer, never getting to the 'readable' event which would
unset the needReadable flag.
Fix by splitting up the emitReadable function, so that it always
sets the flag on this tick, even if it defers until nextTick to
actually emit the event.
Also, if we're not ending or already in the process of reading, it
now calls read(0) if we're below the high water mark. Thus, the
highWaterMark value is the intended amount to buffer up to, and it
is smarter about hitting the target.
It seems like a good idea on the face of it, but lowWaterMarks are
actually not useful, and in practice should always be set to zero.
It would be worthwhile for writers if we actually did some kind of
writev() type of thing, but actually this just delays calling write()
and the overhead of doing a bunch of Buffer copies is not worth the
slight benefit of calling write() fewer times.
This is causing the CryptoStreams to get into an awful state when
there is a tight loop calling connection.write(chunk) waiting for
a false return.
Because CryptoStreams use read(0) to cycle data, this was causing
the encrypted side to pull way too much data in from the cleartext
side, since the read(0) would make it always call _read.
The unfortunate side effect, fixed in the next patch, is that
CryptoStreams don't automatically cycle when the Socket drains.
Otherwise sockets that are 'finish'ed won't be unpiped and `writing to
ended stream` error will arise.
This might sound unrealistic, but it happens in net.js. When
`socket.allowHalfOpen === false`, EOF will cause `.destroySoon()` call which
ends the writable side of net.Socket.
Those values, if passed to the _read() cb, will not signal an EOF. Only
null or undefined will mark the end of data, and trigger the end event.
However, great care must be taken if you are returning an empty string
or buffer! There must be some other thing somewhere that will trigger
a read() call, because there will never be a readable event fired later.
This is in preparation for CryptoStreams being ported to streams2, where
it is safe to simply stop reading, because the crypto cycle process will
cause it to read(0) again at some future date.
We detect for non-string and non-buffer values in onread and
turn the stream into an "objectMode" stream.
If we are in "objectMode" mode then howMuchToRead will
always return 1, state.length will always have 1 appended
to it when there is a new item and fromList always takes
the first value from the list.
This means that for object streams, the n in read(n) is
ignored and read() will always return a single value
Fixed a bug with unpipe where the pipe would break because
the flowing state was not reset to false.
Fixed a bug with sync cb(null, null) in _read which would
forget to end the readable stream
Problem 1: If stream.push() triggers a 'readable' event, and the user
calls `read(n)` with some n > the highWaterMark, then the push() will
return false (indicating that they should not push any more), but no
future 'readable' event is coming (because we're above the
highWaterMark).
Solution: return true from push() when needReadable is set.
Problem 2: A read(n) for n != 0, after the stream had encountered an
EOF, would not trigger the 'end' event if the EOF was pushed in
synchronously by the _read() function.
Solution: Check for ended in stream.read() and schedule an end event if
the length now equals 0.
Fix#4585
There was previously an assert() in there, but this part of the code is
so high-volume that the added cost made a measurable dent in http_simple.
Just checking inline is fine, though, and prevents a lot of potential
hazards.
Say that a stream's current read queue has 101 bytes in it, and the
underlying resource has ended (ie, reached EOF).
If you do something like this:
stream.read(100); // leave a byte behind
stream.read(0); // read(0) for some reason
then the read(0) will get 0 from the howMuchToRead function. Since the
stream was ended, this was incorrectly treating the 0 as a "there is no
more in the buffer", and emitting 'end' before that last byte was read.
Why have the read(0) in the first place? We do this in some cases to
trigger the last few bytes of a net socket (such as a child process's
stdio pipes). This was causing issues when piping a `git archive` job
to a file: the resulting tarball was incomplete, because it occasionally
was not getting the last chunk.
When switching into compatibility mode by setting `data` event listener,
`_read()` method will be called immediately. If method implementation
invokes callback in the same tick - all emitted `data` events will be
discarded, because `data` listener wasn't set yet.