This patch adds a fast path for parsing of simple path-only URLs, as commonly
found in HTTP requests received by a server.
Benchmark results [ms], before / after patch:
/foo/bar 0.008956 0.000418 (fast path used)
http://example.com/ 0.011426 0.011437 (normal slow path, no change)
In a simple 'ab' benchmark of a single-threaded web server, this patch
increases the request rate from around 6400 to 7400 req/s.
Reviewed-By: Fedor Indutny <fedor@indutny.com>
See https://code.google.com/p/chromium/issues/detail?id=25916
Parse URLs with backslashes the same as web browsers, by replacing all
backslashes with forward slashes, except those that occur after the
first # character.
Manual rebase of 9520ade
Signed-off-by: Trevor Norris <trev.norris@gmail.com>
When using url.parse(), path and pathname usually return '/' when there
is no path available. However when you have a protocol that contains
non-lowercase letters and the input string does not have a trailing
slash, both path and pathname will be undefined.
In cases where there are multiple @-chars in a url, Node currently
parses the hostname and auth sections differently than web browsers.
This part of the bug is serious, and should be landed in v0.10, and
also ported to v0.8, and releases made as soon as possible.
The less serious issue is that there are many other sorts of malformed
urls which Node either accepts when it should reject, or interprets
differently than web browsers. For example, `http://a.com*foo` is
interpreted by Node like `http://a.com/*foo` when web browsers treat
this as `http://a.com%3Bfoo/`.
In general, *only* the `hostEndingChars` should be the characters that
delimit the host portion of the URL. Most of the current `nonHostChars`
that appear in the hostname should be escaped, but some of them (such as
`;` and `%` when it does not introduce a hex pair) should raise an
error.
We need to have a broader discussion about whether it's best to throw in
these cases, and potentially break extant programs, or return an object
that has every field set to `null` so that any attempt to read the
hostname/auth/etc. will appear to be empty.
`url.format` should escape ? and # chars in pathname, and # chars in
search, because they change the semantics of the operation otherwise.
Don't escape % chars, or anything else. (see: #4082)
added a .path property = .pathname + .search for use with http.request
And tests to verify everything.
With the tests, I changed over to deepEqual, and I would note the comment on the test
['.//g', 'f:/a', 'f://g'], which I think is a fundamental problem
This supersedes pull 1596
The file:// protocol *always* has a hostname; it's frequently
abbreviated as an empty string, which represents 'localhost'
implicitly.
According to RFC 1738 (http://tools.ietf.org/html/rfc1738):
A file URL takes the form:
file://<host>/<path>
where <host> is the fully qualified domain name of the system on
which the <path> is accessible...
As a special case, <host> can be the string "localhost" or the empty
string; this is interpreted as 'the machine from which the URL is
being interpreted'.
The change for #954 introduced a regression that would cause
the url parser to fail on special chars found in the auth
segment. Fix that, and also don't create invalid urls when
format() is called on an object containing an auth member
containing '@' characters or delimiters.
1. Allow single-quotes in urls, but escape them.
2. Add comments about which RFCs we're following for guidance.
3. Handle any invalid character in the hostname portion.
4. lcase protocol and hostname portions, since they are
case-insensitive.
This does 3 things:
1. Delimiters and "unwise" characters are never included in the
hostname or path.
2. url.format will sanitize string URLs that are passed to it.
3. The parsed url's 'href' member will be the sanitized url, which may
not match the argument to url.parse.
Now the path module can be adapted to support windows paths without breaking
the url module. It also allows the undocumented keepBlanks flag to be
removed from path.join and path.normalizeArray.
1. Express desired path.join behavior in tests.
2. Update fs.realpath to reflect new path.join behavior
3. Update url.resolve() to use new path.join behavior.
Note that "//" is still a special indicator for the hostname, and this does
not change the parsing of mailto: and other "slashless" url schemes. It
does however remove some oddness in url.parse(req.url) which is the most
common use-case for the url.parse function.
Also, make a slight change from original on url-module to put the
spacePattern into the function. On closer inspection, it turns out that the
nonlocal-var cost is higher than the compiling-a-regexp cost.
Also, documentation.