Increase the performance and simplify the logic of Buffer#write{U}Int*
and Buffer#read{U}Int* methods by placing the byte manipulation code
directly inline.
Also improve the speed of buffer-write benchmarks by creating a new
call directly to each method by using Function() instead of calling by
buff[fn].
Signed-off-by: Trevor Norris <trev.norris@gmail.com>
Since the SlabAllocator was removed the buffer length/offset is no
longer sent to the onread callback. The benchmarks have been updated to
reflect that.
This commit adds an optimization to the HTTP client that makes it
possible to:
* Pack the headers and the first chunk of the request body into a
single write().
* Pack the chunk header and the chunk itself into a single write().
Because only one write() system call is issued instead of several,
the chances of data ending up in a single TCP packet are phenomenally
higher: the benchmark with `type=buf size=32` jumps from 50 req/s to
7,500 req/s, a 150-fold increase.
This commit removes the check from e4b716ef that pushes binary encoded
strings into the slow path. The commit log mentions that:
We were assuming that any string can be concatenated safely to
CRLF. However, for hex, base64, or binary encoded writes, this
is not the case, and results in sending the incorrect response.
For hex and base64 strings that's certainly true but binary strings
are 'das Ding an sich': string.length is the same before and after
decoding.
Fixes#5528.
The benchmark compare would drop the last run of the binary pairs. So
when they were only run once an error would arise because no data was
generated for the second binary.
The benefits of the hot-path optimization below start to fall off when
the buffer size gets up near 128KB, because the cost of the copy is more
than the cost of the extra write() call. Switch to the write/end method
at that point.
Heuristics and magic numbers are awful, but slow http responses are
worse.
Fix#4975
This will run the benchmarks the number of times specified by NODE_BENCH_RUNS,
to attempt to reduce variability.
If the number of runs is high enough, it'll also throw out the top and bottom
quartiles, since that's where the outliers will be.
It's not very fancy statistics-fu, but it's better than nothing.
Also, linted this file. It had tabs in it. TABS!
For throughput benchmarks, run with just 5s durations rather than 1s and 3s.
For startup benchmark, run with just a single 1s duration, since it's very
consistent anyway.