If the string is external then the length can be quickly retrieved. This
is especially faster for large strings that are being treated as UTF8.
Also, if the string is external then there's no need for a full
String::WriteUtf8 operation. A simple memcpy will do.
* Change calls to String::New() and String::NewSymbol() to their
respective one-byte, two-byte and UTF-8 counterparts.
* Add a FIXED_ONE_BYTE_STRING macro that takes a string literal and
turns it into a v8::Local<v8::String>.
* Add helper functions that make v8::String::NewFromOneByte() easier to
work with. Said function expects a `const uint8_t*` but almost every
call site deals with `const char*` or `const unsigned char*`. Helps
us avoid doing reinterpret_casts all over the place.
* Code that handles file system paths keeps using UTF-8 for backwards
compatibility reasons. At least now the use of UTF-8 is explicit.
* Remove v8::String::NewSymbol() entirely. Almost all call sites were
effectively minor de-optimizations. If you create a string only once,
there is no point in making it a symbol. If you are create the same
string repeatedly, it should probably be cached in a persistent
handle.
Performs a quick, non-exhaustive check on the input string to see if
it's compatible with the specified string encoding.
Curently it only checks that hex strings have a length that is a
multiple of two.
Prior, strings would first be converted to a Buffer before being written
to disk. Now the intermediary step has been removed.
Other changes of note:
* Class member "must_free" was added to req_wrap so to track if the
memory needs to be manually cleaned up after use.
* External String Resource support, so the memory will be used directly
instead of copying out the data.
* Docs have been updated to reflect that if position is not a number
then it will assume null. Previously it specified the argument must be
null, but that was not how the code worked. An attempt was made to
only support == null, but there were too many tests that assumed !=
number would be enough.
* Docs update show some of the write/writeSync arguments are optional.
Memory allocations are now done through smalloc. The Buffer cc class has
been removed completely, but for backwards compatibility have left the
namespace as Buffer.
The .parent attribute is only set if the Buffer is a slice of an
allocation. Which is then set to the alloc object (not a Buffer).
The .offset attribute is now a ReadOnly set to 0, for backwards
compatibility. I'd like to remove it in the future (pre v1.0).
A few alterations have been made to how arguments are either coerced or
thrown. All primitives will now be coerced to their respective values,
and (most) all out of range index requests will throw.
The indexes that are coerced were left for backwards compatibility. For
example: Buffer slice operates more like Array slice, and coerces
instead of throwing out of range indexes. This may change in the future.
The reason for wanting to throw for out of range indexes is because
giving js access to raw memory has high potential risk. To mitigate that
it's easier to make sure the developer is always quickly alerted to the
fact that their code is attempting to access beyond memory bounds.
Because SlowBuffer will be deprecated, and simply returns a new Buffer
instance, all tests on SlowBuffer have been removed.
Heapdumps will now show usage under "smalloc" instead of "Buffer".
ParseArrayIndex was added to node_internals to support proper uint
argument checking/coercion for external array data indexes.
SlabAllocator had to be updated since handle_ no longer exists.
When large strings are used they cause v8's GC to spend a lot more time
cleaning up. In these cases it's much faster to use external string
resources.
UTF8 strings do not use external string resources because only one and
two byte external strings are supported.
EXTERN_APEX is the value at which v8's GC overtakes performance.
The following table has the type and buffer size that use to encode the
strings as rough estimates of the percentage of performance gain from
this patch (UTF8 is missing because they cannot be externalized).
encoding 128KB 1MB 5MB
-----------------------------
ASCII 58% 208% 250%
HEX 15% 74% 86%
BASE64 11% 74% 71%
UCS2 2% 225% 398%
BINARY 2234% 1728% 2305%
BINARY is so much faster across the board because of using the new v8
WriteOneByte API.
v8 has a new API to write out strings to memory. This has been
implemented.
One other change of note is BINARY encoded strings have a new
implementation. This has improved performance substantially.
Because of variations in different base64 implementation, it's been
decided to strip all padding from the end of a base64 string and
calculate its size from that.
The default encoding is 'buffer'. When the input is a string, treat it
as 'binary'. Fixes the following assertion:
node: ../src/string_bytes.cc:309: static size_t
node::StringBytes::StorageSize(v8::Handle<v8::Value>, node::encoding):
Assertion `0 && "buffer encoding specified but string provided"'
failed.
Introduced in 64fc34b2.
Fixes#5482.