v8 silently returns an empty handle, without deleting our data, if the
string length is above String::kMaxLength.
Fixes: https://github.com/nodejs/node/issues/1374
PR-URL: https://github.com/nodejs/node/pull/2402
Reviewed-By: trevnorris - Trevor Norris <trev.norris@gmail.com>
Reviewed-By: indutny - Fedor Indutny <fedor.indutny@gmail.com>
Reviewed-By: bnoordhuis - Ben Noordhuis <info@bnoordhuis.nl>
Amended by @rvagg to change author date from
"1970-08-16 16:09:02 +0200"
to
"2015-08-16 16:09:02 +0200"
as per discussion @ https://github.com/nodejs/node/issues/2713
Address comments and deprecations left in source files. These changes
include:
* Remove the deprecated API.
* Change Buffer::New() that did a copy of the data to Buffer::Copy()
* Change Buffer::Use() to Buffer::New()
PR-URL: https://github.com/nodejs/io.js/pull/1825
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Instead of aborting in case of internal failure, return an empty
Local<Object>. Using the MaybeLocal<T> API, users must check their
return values.
PR-URL: https://github.com/nodejs/io.js/pull/1825
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Make the inner loop execute fewer compare-and-branch executions per
processed byte, resulting in a 50% or more speedup.
This coincidentally fixes an out-of-bounds read:
    while (unbase64(*src) < 0 && src < srcEnd)
should have read:
    while (src < srcEnd && unbase64(*src) < 0)
But this commit removes the offending code altogether.
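The ordering issue described above can be illustrated in isolation. This is
only a sketch: unbase64() here is a hypothetical stand-in lookup and
SkipInvalid() is an invented helper name, neither is the code from the tree.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a base64 lookup: returns the 6-bit value of a
// base64 digit, or -1 for anything else (whitespace, padding, garbage).
inline int unbase64(unsigned char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

// Advances src past characters that are not base64 digits without ever
// dereferencing src_end. The range check comes first: && short-circuits,
// so *src is only evaluated when src < src_end holds.
inline const char* SkipInvalid(const char* src, const char* src_end) {
  while (src < src_end && unbase64(static_cast<unsigned char>(*src)) < 0)
    ++src;
  return src;
}
```

With the operands swapped, the final iteration would read *src_end before
noticing that the range is exhausted.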
Fixes: https://github.com/nodejs/io.js/issues/2166
PR-URL: https://github.com/nodejs/io.js/pull/2193
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
StringBytes::Write() did a plain memcpy() when is_extern is true but
that's wrong when the source is a two-byte string and the destination
a one-byte or UTF-8 string.
The impact is limited to strings > 1,031,913 bytes because those are
normally the only strings that are externalized, although the use of
the 'externalize strings' extension (--expose_externalize_string) can
also trigger it.
This commit also cleans up the bytes versus characters confusion in
StringBytes::Write() because that was closely intertwined with the
UCS-2 encoding regression. One wasn't fixable without the other.
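A V8-free sketch of why a plain memcpy() is wrong for a two-byte source and
a one-byte destination. WriteOneByte() is a hypothetical helper, and keeping
the low byte of each code unit is an assumption made for illustration, not
necessarily what StringBytes::Write() does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Writes at most dst_len characters of a two-byte (UCS-2) source into a
// one-byte destination, keeping the low byte of each code unit. Returns
// the number of characters written. A memcpy() of the source would copy
// the high bytes into the output as well and confuse bytes with
// characters when computing how much to copy.
inline size_t WriteOneByte(const uint16_t* src, size_t src_chars,
                           char* dst, size_t dst_len) {
  size_t n = src_chars < dst_len ? src_chars : dst_len;
  for (size_t i = 0; i < n; ++i)
    dst[i] = static_cast<char>(src[i] & 0xFF);
  return n;
}
```

Note that the loop bound is a character count while dst_len is a byte
count; for a one-byte destination the two coincide, which is exactly the
distinction a memcpy() over the source bytes loses.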
Fixes: https://github.com/iojs/io.js/issues/1024
Fixes: https://github.com/joyent/node/issues/8683
PR-URL: https://github.com/iojs/io.js/pull/1042
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Make StringBytes::GetExternalParts() return the byte length for two-byte
strings, not the character length. Its callers operate on bytes, not
characters.
This also fixes StringBytes::Size() reporting only half of the actual
number of bytes for external two-byte strings.
PR-URL: https://github.com/iojs/io.js/pull/1042
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Introduced in joyent/node v0.10 as a backwards compatibility measure.
It's an ugly hack, and allowing invalid UTF-8 is not a good idea in the
first place. Remove it.
PR-URL: https://github.com/iojs/io.js/pull/1042
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Large external two-byte strings reported their character length instead
of their byte length, throwing off the garbage collector heuristic by
a factor of two.
PR-URL: https://github.com/iojs/io.js/pull/1042
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Mark several methods "override" in order to remove build warnings.
PR-URL: https://github.com/iojs/io.js/pull/531
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
The copyright and license notice is already in the LICENSE file. There
is no justifiable reason to also require that it be included in every
file, since the individual files are not individually distributed except
as part of the entire package.
Due to a recent V8 upgrade, more methods require Isolate as an argument.
PR-URL: https://github.com/iojs/io.js/pull/244
Reviewed-by: Ben Noordhuis <info@bnoordhuis.nl>
Move the big endian to little endian conversion logic for UCS2 input
from src/string_bytes.cc to src/node_buffer.cc; StringSlice() is the
only function that actually needs it and with this commit, a second
copy is avoided on big endian architectures.
Introduce two-byte overloads of node::Encode() and StringBytes::Encode()
that ensure that the input is suitably aligned.
Revisits commit 535fec8 from yesterday.
Seen with g++ 4.9.2 on x86_64 Linux: a SIGSEGV is generated when the
input to v8::String::NewFromTwoByte() is not suitably aligned.
g++ 4.9.2 emits SSE instructions for copy loops. That requires aligned
input but that was something StringBytes::Encode() did not enforce until
now. Make a properly aligned copy before handing off the input to V8.
We could, as an optimization, check that the pointer is aligned on a
two-byte boundary but that is technically still UB; pointers-to-char
are allowed to alias other pointers but the reverse is not true:
a pointer-to-uint16_t that aliases a pointer-to-char is in violation
of the pointer aliasing rules.
See https://code.google.com/p/v8/issues/detail?id=3694
Fixes segfaulting test simple/test-stream2-writable.
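The aligned-copy approach described above can be sketched without V8.
AlignedCopy() is a hypothetical helper name; the real change lives in
StringBytes::Encode() and may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies `bytes` bytes of possibly unaligned UCS-2 data into storage that
// is correctly aligned for uint16_t. memcpy() makes no alignment
// assumptions about its source and does not run afoul of the aliasing
// rules, unlike casting the char pointer to a uint16_t pointer.
inline std::vector<uint16_t> AlignedCopy(const char* data, size_t bytes) {
  std::vector<uint16_t> out(bytes / sizeof(uint16_t));
  if (!out.empty())
    std::memcpy(out.data(), data, out.size() * sizeof(uint16_t));
  return out;
}
```

The result's data() pointer is what would then be handed to an API that
expects suitably aligned two-byte input.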
PR-URL: https://github.com/iojs/io.js/pull/127
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
* v8::Platform has a new MonotonicallyIncreasingTime() method,
implement it.
* The ASCII apocalypse continues with the replacement of external
ASCII strings with external one byte strings.
The previous commits fixed oversights in destructors that should have
been marked virtual but weren't. This commit marks destructors from
derived classes with the override keyword.
Now that we are building with C++11 features enabled, replace use
of NULL with nullptr.
The benefit of using nullptr is that it can never be confused for
an integral type because it does not support implicit conversions
to integral types except boolean - unlike NULL, which is typically
defined as an integer literal such as `0`.
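A small illustration of the difference, using two hypothetical overloads
(Describe() is not a function from the tree):

```cpp
#include <cassert>
#include <cstring>

// Two overloads that NULL can confuse. Depending on how the
// implementation defines NULL, Describe(NULL) may pick the int overload
// or fail to compile as ambiguous. nullptr always selects the pointer
// overload because it has no conversion to int.
inline const char* Describe(int) { return "int"; }
inline const char* Describe(const char*) { return "pointer"; }

inline const char* DescribeNullptr() { return Describe(nullptr); }
inline const char* DescribeZero() { return Describe(0); }
```

Describe(NULL) is deliberately not called here because its outcome is
implementation-dependent.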
Mechanically replace assert() statements with UNREACHABLE(), CHECK(),
or CHECK_{EQ,NE,LT,GT,LE,GE}() statements.
The exceptions are src/node.h and src/node_object_wrap.h because they
are public headers.
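For readers unfamiliar with the macro family, a simplified sketch follows.
These definitions are hypothetical and written for illustration only; the
actual macros in the tree may differ. The sketch assumes the checks stay
active regardless of NDEBUG, which is the main behavioral difference from
assert().

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical, simplified definitions. Unlike assert(), nothing here is
// compiled out when NDEBUG is defined.
#define CHECK(expr)                                                     \
  do {                                                                  \
    if (!(expr)) {                                                      \
      std::fprintf(stderr, "%s:%d: CHECK(%s) failed\n",                 \
                   __FILE__, __LINE__, #expr);                          \
      std::abort();                                                     \
    }                                                                   \
  } while (0)

#define CHECK_EQ(a, b) CHECK((a) == (b))
#define CHECK_NE(a, b) CHECK((a) != (b))
#define CHECK_LT(a, b) CHECK((a) < (b))
#define CHECK_GE(a, b) CHECK((a) >= (b))
#define UNREACHABLE() std::abort()

// Example use: preconditions that abort with a message when violated.
inline int Halve(int n) {
  CHECK_GE(n, 0);
  CHECK_EQ(n % 2, 0);
  return n / 2;
}
```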
PR-URL: https://github.com/node-forward/node/pull/16
Reviewed-By: Fedor Indutny <fedor@indutny.com>
Previously v8's WriteUtf8 function would produce invalid utf-8 output
when encountering unmatched surrogate code units [1]. The new
REPLACE_INVALID_UTF8 option fixes that by replacing invalid code points
with the unicode replacement character.
[1]: JS Strings are defined as arrays of 16 bit unsigned integers. There
is no unicode enforcement, so one can easily end up with invalid unicode
code unit sequences inside a string.
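The replacement behavior can be sketched as a standalone UTF-16 to UTF-8
encoder. This is not v8's implementation; Utf16ToUtf8() and AppendUtf8()
are hypothetical names, and the sketch only shows the idea of substituting
U+FFFD for unmatched surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding of one code point (at most U+10FFFF).
inline void AppendUtf8(std::string* out, uint32_t cp) {
  if (cp < 0x80) {
    out->push_back(static_cast<char>(cp));
  } else if (cp < 0x800) {
    out->push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else if (cp < 0x10000) {
    out->push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  } else {
    out->push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Encodes UTF-16 code units as UTF-8. A lead surrogate followed by a
// trail surrogate is combined into one code point; any other surrogate
// is unmatched and is replaced with U+FFFD.
inline std::string Utf16ToUtf8(const uint16_t* src, size_t len) {
  std::string out;
  for (size_t i = 0; i < len; ++i) {
    uint32_t cp = src[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      if (i + 1 < len && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
        ++i;  // the trail surrogate has been consumed
      } else {
        cp = 0xFFFD;  // lead surrogate without a trail
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // trail surrogate without a lead
    }
    AppendUtf8(&out, cp);
  }
  return out;
}
```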
64-bit constants are keyed for x64 platforms only; add PowerPC-based
platform constants.
Node's "ucs2" encoding wants LE character data stored in the Buffer, so
we need to reorder on BE platforms. See
http://nodejs.org/api/buffer.html for Node's "ucs2" encoding
specification.
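One way to get little-endian storage on any host is to write each code
unit byte by byte instead of swapping after the fact. The sketch below
takes that route, which differs from a conditional swap on BE platforms;
WriteUcs2LE() is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stores UCS-2 code units as little-endian bytes regardless of host byte
// order. Shifts and masks operate on values rather than on memory
// layout, so the same code is correct on LE and BE machines.
inline void WriteUcs2LE(const uint16_t* src, size_t chars,
                        unsigned char* dst) {
  for (size_t i = 0; i < chars; ++i) {
    dst[2 * i] = static_cast<unsigned char>(src[i] & 0xFF);   // low byte
    dst[2 * i + 1] = static_cast<unsigned char>(src[i] >> 8); // high byte
  }
}
```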
Signed-off-by: Timothy J Fontaine <tjfontaine@gmail.com>
Don't call DecodeWrite() with a Buffer as its argument because it in
turn calls StringBytes::Write() and that method expects a Local<String>.
"Why then does that function take a Local<Value>?" I hear you ask.
Good question but I don't have the answer. I added a CHECK for good
measure and what do you know, all of a sudden a large number of crypto
tests started failing.
Calling DecodeWrite(BINARY) on a buffer is nonsensical anyway: if you
want the contents of the buffer, just copy out the data, there is no
need to decode it - and that's exactly what this commit does.
Fixes a great many instances of the following run-time error in debug
builds:
FATAL ERROR: v8::String::Cast() Could not convert to string
Make calls to v8::Isolate::AdjustAmountOfExternalAllocatedMemory() take
special care when negating 32-bit unsigned types like size_t.
Before this commit, values were negated before they got promoted to
64 bits, meaning that on 32-bit architectures, a value like 42 got
cast to 4294967254 instead of -42.
That in turn made the garbage collector start scavenging like crazy
because it thought the system was out of memory.
That's bad enough but calls to AdjustAmountOfExternalAllocatedMemory()
were made from weak callbacks, i.e. at a time when the garbage collector
was already busy. It triggered asserts in debug builds and caused
random crashes and memory corruption in release builds.
The behavior in release builds is arguably a V8 bug and should perhaps
be reported upstream.
Partially fixes #7309 but requires further bug fixes to src/smalloc.cc
that I'll address in a follow-up commit.
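The negate-before-promote problem described above is easy to reproduce in
isolation. In this sketch uint32_t stands in for size_t on a 32-bit
architecture, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Wrong order: the negation happens in 32-bit unsigned arithmetic and
// wraps modulo 2^32 before the value is widened to 64 bits.
inline int64_t NegateThenWiden(uint32_t size) {
  uint32_t negated = 0u - size;
  return static_cast<int64_t>(negated);
}

// Right order: widen to a signed 64-bit type first, then negate.
inline int64_t WidenThenNegate(uint32_t size) {
  return -static_cast<int64_t>(size);
}
```

For an input of 42 the first function yields 4294967254 and the second
yields -42, matching the numbers in the commit message.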
The variable isn't actually used uninitialized but g++ 4.8 doesn't know
that. Set it to NULL to silence the following compiler warning:
../src/string_bytes.cc:247:29: warning: 'data' may be used
uninitialized in this function [-Wmaybe-uninitialized]
unsigned a = hex2bin(src[i * 2 + 0]);
^
../src/string_bytes.cc:299:15: note: 'data' was declared here
const char* data;
^
This commit removes the simple/test-event-emitter-memory-leak test for
being unreliable with the new garbage collector: the memory pressure
exerted by the test case is too low for the garbage collector to kick
in. It can be made to work again by limiting the heap size with the
--max_old_space_size=x flag but that won't be very reliable across
platforms and architectures.
If the string is external then the length can be quickly retrieved. This
is especially beneficial for large strings that are being treated as UTF8.
Also, if the string is external then there's no need for a full
String::WriteUtf8 operation. A simple memcpy will do.
* Change calls to String::New() and String::NewSymbol() to their
respective one-byte, two-byte and UTF-8 counterparts.
* Add a FIXED_ONE_BYTE_STRING macro that takes a string literal and
turns it into a v8::Local<v8::String>.
* Add helper functions that make v8::String::NewFromOneByte() easier to
work with. Said function expects a `const uint8_t*` but almost every
call site deals with `const char*` or `const unsigned char*`. Helps
us avoid doing reinterpret_casts all over the place.
* Code that handles file system paths keeps using UTF-8 for backwards
compatibility reasons. At least now the use of UTF-8 is explicit.
* Remove v8::String::NewSymbol() entirely. Almost all call sites were
effectively minor de-optimizations. If you create a string only once,
there is no point in making it a symbol. If you create the same
string repeatedly, it should probably be cached in a persistent
handle.
Performs a quick, non-exhaustive check on the input string to see if
it's compatible with the specified string encoding.
Currently it only checks that hex strings have a length that is a
multiple of two.
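A minimal sketch of such a check, without V8. The Encoding enum and the
LooksValidFor() name are hypothetical; only the even-length rule for hex
comes from the description above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real encoding enumeration.
enum Encoding { kHex, kUtf8, kBase64 };

// Quick, non-exhaustive validity check: every byte needs exactly two hex
// digits, so a hex string of odd length cannot be valid. All other
// encodings are accepted without inspection.
inline bool LooksValidFor(size_t length, Encoding enc) {
  if (enc == kHex && length % 2 != 0)
    return false;
  return true;
}
```

The check deliberately looks only at the length; it does not verify that
the characters themselves are hex digits.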
Previously, strings were first converted to a Buffer before being written
to disk. That intermediary step has now been removed.
Other changes of note:
* Class member "must_free" was added to req_wrap to track whether the
memory needs to be manually cleaned up after use.
* External String Resource support, so the memory will be used directly
instead of copying out the data.
* Docs have been updated to reflect that if position is not a number
then it will assume null. Previously it specified the argument must be
null, but that was not how the code worked. An attempt was made to
only support == null, but there were too many tests that assumed !=
number would be enough.
* Docs updated to show that some of the write/writeSync arguments are
optional.
Memory allocations are now done through smalloc. The Buffer cc class has
been removed completely, but the namespace has been left as Buffer for
backwards compatibility.
The .parent attribute is only set if the Buffer is a slice of an
allocation, in which case it is set to the alloc object (not a Buffer).
The .offset attribute is now a ReadOnly set to 0, for backwards
compatibility. I'd like to remove it in the future (pre v1.0).
A few alterations have been made to how arguments are either coerced or
thrown. All primitives will now be coerced to their respective values,
and almost all out-of-range index requests will throw.
The indexes that are still coerced were left for backwards compatibility.
For example: Buffer slice operates more like Array slice, and coerces
out-of-range indexes instead of throwing. This may change in the future.
The reason for wanting to throw for out-of-range indexes is that
giving js access to raw memory has high potential risk. To mitigate that
it's easier to make sure the developer is always quickly alerted to the
fact that their code is attempting to access beyond memory bounds.
Because SlowBuffer will be deprecated, and simply returns a new Buffer
instance, all tests on SlowBuffer have been removed.
Heapdumps will now show usage under "smalloc" instead of "Buffer".
ParseArrayIndex was added to node_internals to support proper uint
argument checking/coercion for external array data indexes.
SlabAllocator had to be updated since handle_ no longer exists.
When large strings are used they cause v8's GC to spend a lot more time
cleaning up. In these cases it's much faster to use external string
resources.
UTF8 strings do not use external string resources because only one and
two byte external strings are supported.
EXTERN_APEX is the string size at which the cost of v8's GC overtakes
the cost of using an external string resource.
The following table lists, per encoding and buffer size, rough estimates
of the percentage of performance gain from this patch (UTF8 is missing
because UTF8 strings cannot be externalized).
encoding   128KB    1MB    5MB
--------------------------------
ASCII        58%   208%   250%
HEX          15%    74%    86%
BASE64       11%    74%    71%
UCS2          2%   225%   398%
BINARY     2234%  1728%  2305%
BINARY is so much faster across the board because it uses the new v8
WriteOneByte API.
v8 has a new API to write out strings to memory. This has been
implemented.
One other change of note is BINARY encoded strings have a new
implementation. This has improved performance substantially.
Because base64 implementations vary in how they pad, all padding is
stripped from the end of a base64 string and its size is calculated from
what remains.
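The size calculation can be sketched as follows. Base64DecodedSize() is a
hypothetical name and this is not the code from the tree; it only shows
the idea of ignoring trailing '=' characters and deriving the decoded
length from the remaining digits, each of which carries 6 bits.

```cpp
#include <cassert>
#include <cstddef>

// Returns the number of bytes a base64 string decodes to, ignoring any
// amount of trailing '=' padding. Every full group of 4 digits yields 3
// bytes; a trailing group of 2 or 3 digits yields 1 or 2 bytes, and a
// single leftover digit does not carry enough bits for a byte.
inline size_t Base64DecodedSize(const char* src, size_t len) {
  while (len > 0 && src[len - 1] == '=')
    --len;
  return (len / 4) * 3 + (len % 4) * 3 / 4;
}
```

Because the padding is stripped first, padded and unpadded spellings of
the same data give the same size.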