Provide an (initially experimental) implementation of the WHATWG Encoding
Standard API (`TextDecoder` and `TextEncoder`). The is the same API
implemented on the browser side.
By default, with small-icu, only the UTF-8, UTF-16le and UTF-16be decoders
are supported. With full-icu enabled, every encoding other than iso-8859-16
is supported.
This provides a basic test, but does not include the full web platform
tests. Note: many of the web platform tests for this would fail by default
because we ship with small-icu by default.
A process warning will be emitted on first use to indicate that the
API is still experimental. No runtime flag is required to use the
feature.
Refs: https://encoding.spec.whatwg.org/
PR-URL: https://github.com/nodejs/node/pull/13644
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
- Categorize all nonspacing marks (Mn) and enclosing marks (Me) as
0-width
- Categorize all spacing marks (Mc) as non-0-width.
- Treat soft hyphens (a format character Cf) as non-0-width.
- Do not treat all unassigned code points as 0-width; instead, let ICU
select the default for that character per UAX #11.
- Avoid getting the General_Category of a character multiple times as it
is an intensive operation.
Refs: http://unicode.org/reports/tr11/
PR-URL: https://github.com/nodejs/node/pull/13918
Reviewed-By: James M Snell <jasnell@gmail.com>
PR-URL: https://github.com/nodejs/node/pull/13085
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Ron Korving <ron@ronkorving.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Daniel Bevenius <daniel.bevenius@gmail.com>
This commit contains three separate changes:
- Always return a string from ToUnicode no matter if an error occurred.
- Disable CheckHyphens boolean flag. This flag will soon be enabled in
the URL Standard, but is implemented manually by selectively ignoring
certain errors.
- Enable CheckBidi boolean flag per URL Standard update.
This allows domain names with hyphens at 3 and 4th position, as well as
those with leading and trailing hyphens. They are technically invalid,
but seen in the wild.
Tests are updated and simplified accordingly.
PR-URL: https://github.com/nodejs/node/pull/12966
Fixes: https://github.com/nodejs/node/issues/12965
Refs: https://github.com/whatwg/url/pull/309
Reviewed-By: Daijiro Wachi <daijiro.wachi@gmail.com>
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Call uc_init() after u_setDataDirectory() to find out if the data
directory is actually valid.
This commit removes parallel/test-intl-no-icu-data, added in commit
46345b9 ("src: make --icu-data-dir= switch testable"). It no longer
works now that an invalid --icu-data-dir= argument is rejected.
Coverage is now provided by parallel/test-icu-data-dir.
Fixes: https://github.com/nodejs/node/issues/13043
Refs: https://github.com/nodejs/node-gyp/issues/1199
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Steven R Loomis <srloomis@us.ibm.com>
Use `static` definitions and anonymous namespaces to reduce the
number of symbols that are exported from the `node` binary.
PR-URL: https://github.com/nodejs/node/pull/12366
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
* We use these functions that are declared in <unicode/ustring.h>
u_strFromUTF8()
u_strToUTF8()
* At present, <unicode/ustring.h> is indirectly included, but this will
likely change in future ICUs. Adding this header has been the right
thing to do for many years, so it is backwards compatible.
Fixes: https://github.com/nodejs/node/issues/11753
PR-URL: https://github.com/nodejs/node/issues/11754
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Old behavior can be restored using a special `lenient` mode, as used in
the legacy URL parser.
PR-URL: https://github.com/nodejs/node/pull/11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
- Templatize AsBuffer() and create a generic version for inclusion in
the Buffer class
- Use MaybeStackBuffer::storage()
- If possible, avoid double conversion in ToASCII()/ToUnicode()
- More descriptive assertion error in tests
PR-URL: https://github.com/nodejs/node/pull/11464
Reviewed-By: Steven R Loomis <srloomis@us.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Move some code around so we can properly test whether the switch
actually does anything.
PR-URL: https://github.com/nodejs/node/pull/11255
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Mutations of the environment can invalidate pointers to environment
variables, so make `secure_getenv()` copy them out instead of returning
pointers.
PR-URL: https://github.com/nodejs/node/pull/11051
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Sam Roberts <vieuxtech@gmail.com>
Fix `buffer.transcode()` for transcoding from single-byte character
encodings to UCS2.
These would previously perform out-of-bounds reads due to an
incorrect `sizeof()` expression.
Fixes: https://github.com/nodejs/node/issues/9834
PR-URL: https://github.com/nodejs/node/pull/9838
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
* Adds process.versions.cldr, .tz, and .unicode
* Changes how process.versions.icu is loaded
* Lazy loads the process.versions.* values for these
* add an exception to util.js
to cause 'node -p process.versions' to still work
* update process.version docs
Fixes: https://github.com/nodejs/node/issues/9237
Add buffer.transcode(source, from, to) method. Primarily uses ICU
to transcode a buffer's content from one of Node.js' supported
encodings to another.
Originally part of a proposal to add a new unicode module. Decided
to refactor the approach towrds individual PRs without a new module.
Refs: https://github.com/nodejs/node/pull/8075
PR-URL: https://github.com/nodejs/node/pull/9038
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Rather than the pseudo-wcwidth impl used currently, use the ICU
character properties database to calculate string width and
determine if a character is full width or not. This allows the
algorithm to correctly identify emoji's as full width, ensures
the algorithm will continue to fucntion properly as new unicode
codepoints are added, and it's faster.
This was originally part of a proposal to add a new unicode module,
but has been split out.
Refs: https://github.com/nodejs/node/pull/8075
PR-URL: https://github.com/nodejs/node/pull/9040
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Steven R Loomis <srloomis@us.ibm.com>
Implements WHATWG URL support. Example:
```
var u = new url.URL('http://example.org');
```
Currently passing all WHATWG url parsing tests and all but two of the
setter tests. The two setter tests are intentionally skipped for now
but will be revisited.
PR-URL: https://github.com/nodejs/node/pull/7448
Reviewed-By: Ilkka Myller <ilkka.myller@nodefield.com>
ICU has a punycode implementation built in. Use it instead of the
javascript implementation because it's much faster.
PR-URL: https://github.com/nodejs/node/pull/7355
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
The copyright and license notice is already in the LICENSE file. There
is no justifiable reason to also require that it be included in every
file, since the individual files are not individually distributed except
as part of the entire package.
Now that we are building with C++11 features enabled, replace use
of NULL with nullptr.
The benefit of using nullptr is that it can never be confused for
an integral type because it does not support implicit conversions
to integral types except boolean - unlike NULL, which is defined
as a literal `0`.
The two main goals of this change are:
- To make it easier to build the Intl option using ICU (particularly,
using a newer ICU than v8/Chromium's version)
- To enable a much smaller ICU build with only English support The goal
here is to get node.js binaries built this way by default so that the
Intl API can be used. Additional data can be added at execution time
(see Readme and wiki)
More details are at https://github.com/joyent/node/pull/7719
In particular, this change adds the "--with-intl=" configure option to
provide more ways of building "Intl":
- "full-icu" picks up an ICU from deps/icu
- "small-icu" is similar, but builds only English
- "system-icu" uses pkg-config to find an installed ICU
- "none" does nothing (no Intl)
For Windows builds, the "full-icu" or "small-icu" options are added to
vcbuild.bat.
Note that the existing "--with-icu-path" option is not removed from
configure, but may not be used alongside the new option.
Wiki changes have already been made on
https://github.com/joyent/node/wiki/Installation
and a new page created at
https://github.com/joyent/node/wiki/Intl
(marked as provisional until this change lands.)
Summary of changes:
* README.md : doc updates
* .gitignore : added "deps/icu" as this is the location where ICU is
unpacked to.
* Makefile : added the tools/icu/* files to cpplint, but excluded a
problematic file.
* configure : added the "--with-intl" option mentioned above.
Calculate at config time the list of ICU source files to use and data
packaging options.
* node.gyp : add the new files src/node_i18n.cc/.h as well as ICU
linkage.
* src/node.cc : add call into
node::i18n::InitializeICUDirectory(icu_data_dir) as well as new
--icu-data-dir option and NODE_ICU_DATA env variable to configure ICU
data loading. This loading is only relevant in the "small"
configuration.
* src/node_i18n.cc : new source file for the above Initialize..
function, to setup ICU as needed.
* tools/icu : new directory with some tools needed for this build.
* tools/icu/icu-generic.gyp : new .gyp file that builds ICU in some new
ways, both on unix/mac and windows.
* tools/icu/icu-system.gyp : new .gyp file to build node against a
pkg-config detected ICU.
* tools/icu/icu_small.json : new config file for the "English-only" small
build.
* tools/icu/icutrim.py : new tool for trimming down ICU data. Reads the
above .json file.
* tools/icu/iculslocs.cc : new tool for repairing ICU data manifests
after trim operation.
* tools/icu/no-op.cc : dummy file to force .gyp into using a C++ linker.
* vcbuild.bat : added small-icu and full-icu options, to call into
configure.
* Fixed toolset dependencies, see
https://github.com/joyent/node/pull/7719#issuecomment-54641687
Note that because of a bug in gyp {CC,CXX}_host must also be set.
Otherwise gcc/g++ will be used by default for part of the build.
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
Reviewed-by: Fedor Indutny <fedor@indutny.com>