From dbdea02a993f7fc40bccdac8a6eec47c67a37a3b Mon Sep 17 00:00:00 2001
From: James M Snell
Date: Fri, 20 May 2016 13:11:09 -0700
Subject: [PATCH] doc: general improvements to url.md copy

General cleanup and restructuring of the doc. Added additional
detail to how URLs are serialized.

PR-URL: https://github.com/nodejs/node/pull/6904
Reviewed-By: Robert Jefe Lindstaedt
Reviewed-By: Anna Henningsen
Reviewed-By: Sakthipriyan Vairamani
Reviewed-By: Benjamin Gruenbaum
Reviewed-By: Brian White
---
 doc/api/url.md | 273 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 191 insertions(+), 82 deletions(-)

diff --git a/doc/api/url.md b/doc/api/url.md
index f8c2980bd5..d269927280 100644
--- a/doc/api/url.md
+++ b/doc/api/url.md
@@ -2,139 +2,248 @@

     Stability: 2 - Stable

-This module has utilities for URL resolution and parsing.
-Call `require('url')` to use it.
+The `url` module provides utilities for URL resolution and parsing. It can be
+accessed using:

-## URL Parsing
+```js
+const url = require('url');
+```

-Parsed URL objects have some or all of the following fields, depending on
-whether or not they exist in the URL string. Any parts that are not in the URL
-string will not be in the parsed object. Examples are shown for the URL
+## URL Strings and URL Objects

-`'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'`
+A URL string is a structured string containing multiple meaningful components.
+When parsed, a URL object is returned containing properties for each of these
+components.

-* `href`: The full URL that was originally parsed. Both the protocol and host are lowercased.
+The following details each of the components of a parsed URL. The example
+`'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'` is used to
+illustrate each.

-    Example: `'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'`
+```
++---------------------------------------------------------------------------+
+|                                    href                                   |
++----------++-----------+-----------------+-------------------------+-------+
+| protocol ||   auth    |      host       |           path          | hash  |
+|          ||           +----------+------+----------+--------------+       |
+|          ||           | hostname | port | pathname |    search    |       |
+|          ||           |          |      |          +-+------------+       |
+|          ||           |          |      |          | |    query   |       |
+"  http:   // user:pass @ host.com : 8080   /p/a/t/h ?  query=string  #hash "
+|          ||           |          |      |          | |            |       |
++----------++-----------+----------+------+----------+-+------------+-------+
+(all spaces in the "" line should be ignored -- they're purely for formatting)
+```

-* `protocol`: The request protocol, lowercased.
+### urlObject.href

-    Example: `'http:'`
+The `href` property is the full URL string that was parsed with both the
+`protocol` and `host` components converted to lower-case.

-* `slashes`: The protocol requires slashes after the colon.
+For example: `'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'`

-    Example: true or false
+### urlObject.protocol

-* `host`: The full lowercased host portion of the URL, including port
-  information.
+The `protocol` property identifies the URL's lower-cased protocol scheme.

-    Example: `'host.com:8080'`
+For example: `'http:'`

-* `auth`: The authentication information portion of a URL.
+### urlObject.slashes

-    Example: `'user:pass'`
+The `slashes` property is a `boolean` with a value of `true` if two ASCII
+forward-slash characters (`/`) are required following the colon in the
+`protocol`.

-* `hostname`: Just the lowercased hostname portion of the host.
+### urlObject.host

-    Example: `'host.com'`
+The `host` property is the full lower-cased host portion of the URL, including
+the `port` if specified.

-* `port`: The port number portion of the host.
+For example: `'host.com:8080'`

-    Example: `'8080'`
+### urlObject.auth

-* `pathname`: The path section of the URL, that comes after the host and
-  before the query, including the initial slash if present. No decoding is
-  performed.
+The `auth` property is the username and password portion of the URL, also
+referred to as "userinfo". This string subset follows the `protocol` and
+double slashes (if present) and precedes the `host` component, delimited by an
+ASCII "at sign" (`@`). The format of the string is `{username}[:{password}]`,
+with the `[:{password}]` portion being optional.

-    Example: `'/p/a/t/h'`
+For example: `'user:pass'`

-* `search`: The 'query string' portion of the URL, including the leading
-  question mark.
+### urlObject.hostname

-    Example: `'?query=string'`
+The `hostname` property is the lower-cased host name portion of the `host`
+component *without* the `port` included.

-* `path`: Concatenation of `pathname` and `search`. No decoding is performed.
+For example: `'host.com'`

-    Example: `'/p/a/t/h?query=string'`
+### urlObject.port

-* `query`: Either the 'params' portion of the query string, or a
-  querystring-parsed object.
+The `port` property is the numeric port portion of the `host` component.

-    Example: `'query=string'` or `{'query':'string'}`
+For example: `'8080'`

-* `hash`: The 'fragment' portion of the URL including the pound-sign.
+### urlObject.pathname

-    Example: `'#hash'`
+The `pathname` property consists of the entire path section of the URL. This
+is everything following the `host` (including the `port`) and before the start
+of the `query` or `hash` components, delimited by either the ASCII question
+mark (`?`) or hash (`#`) characters.

-### Escaped Characters
+For example: `'/p/a/t/h'`

-Spaces (`' '`) and the following characters will be automatically escaped in the
-properties of URL objects:
+No decoding of the path string is performed.

-```
-< > " ` \r \n \t { } | \ ^ '
-```
+### urlObject.search
+
+The `search` property consists of the entire "query string" portion of the
+URL, including the leading ASCII question mark (`?`) character.
+
+For example: `'?query=string'`
+
+No decoding of the query string is performed.
+
+### urlObject.path
+
+The `path` property is a concatenation of the `pathname` and `search`
+components.
+
+For example: `'/p/a/t/h?query=string'`
+
+No decoding of the `path` is performed.
+
+### urlObject.query
+
+The `query` property is either the "params" portion of the query string
+(everything *except* the leading ASCII question mark (`?`)), or an object
+returned by the [`querystring`][] module's `parse()` method.

----
+For example: `'query=string'` or `{'query': 'string'}`

-The following methods are provided by the URL module:
+If returned as a string, no decoding of the query string is performed. If
+returned as an object, both keys and values are decoded.

-## url.format(urlObj)
+### urlObject.hash
+
+The `hash` property consists of the "fragment" portion of the URL including
+the leading ASCII hash (`#`) character.
+
+For example: `'#hash'`
+
+## url.format(urlObject)

-Take a parsed URL object, and return a formatted URL string.
-
-Here's how the formatting process works:
-
-* `href` will be ignored.
-* `path` will be ignored.
-* `protocol` is treated the same with or without the trailing `:` (colon).
-  * The protocols `http`, `https`, `ftp`, `gopher`, `file` will be
-    postfixed with `://` (colon-slash-slash) as long as `host`/`hostname` are present.
-  * All other protocols `mailto`, `xmpp`, `aim`, `sftp`, `foo`, etc will
-    be postfixed with `:` (colon).
-* `slashes` set to `true` if the protocol requires `://` (colon-slash-slash)
-  * Only needs to be set for protocols not previously listed as requiring
-    slashes, such as `mongodb://localhost:8000/`, or if `host`/`hostname` are absent.
-* `auth` will be used if present.
-* `hostname` will only be used if `host` is absent.
-* `port` will only be used if `host` is absent.
-* `host` will be used in place of `hostname` and `port`.
-* `pathname` is treated the same with or without the leading `/` (slash).
-* `query` (object; see `querystring`) will only be used if `search` is absent.
-* `search` will be used in place of `query`.
-  * It is treated the same with or without the leading `?` (question mark).
-* `hash` is treated the same with or without the leading `#` (pound sign, anchor).
-
-## url.parse(urlStr[, parseQueryString][, slashesDenoteHost])
+* `urlObject` {Object} A URL object (either as returned by `url.parse()` or
+  constructed otherwise).
+
+The `url.format()` method processes the given URL object and returns a formatted
+URL string.
+
+The formatting process essentially operates as follows:
+
+* A new empty string `result` is created.
+* If `urlObject.protocol` is a string, it is appended as-is to `result`.
+* Otherwise, if `urlObject.protocol` is not `undefined` and is not a string, an
+  [`Error`][] is thrown.
+* For all string values of `urlObject.protocol` that *do not end* with an ASCII
+  colon (`:`) character, the literal string `:` will be appended to `result`.
+* If the `urlObject.slashes` property is true, `urlObject.protocol` begins
+  with one of `http`, `https`, `ftp`, `gopher`, or `file`, or
+  `urlObject.protocol` is `undefined`, the literal string `//` will be appended
+  to `result`.
+* If the value of the `urlObject.auth` property is truthy, and either
+  `urlObject.host` or `urlObject.hostname` is not `undefined`, the value of
+  `urlObject.auth` will be coerced into a string and appended to `result`
+  followed by the literal string `@`.
+* If the `urlObject.host` property is `undefined` then:
+  * If the `urlObject.hostname` is a string, it is appended to `result`.
+  * Otherwise, if `urlObject.hostname` is not `undefined` and is not a string,
+    an [`Error`][] is thrown.
+  * If the `urlObject.port` property value is truthy, and `urlObject.hostname`
+    is not `undefined`:
+    * The literal string `:` is appended to `result`, and
+    * The value of `urlObject.port` is coerced to a string and appended to
+      `result`.
+* Otherwise, if the `urlObject.host` property value is truthy, the value of
+  `urlObject.host` is coerced to a string and appended to `result`.
+* If the `urlObject.pathname` property is a string that is not an empty string:
+  * If the `urlObject.pathname` *does not start* with an ASCII forward slash
+    (`/`), then the literal string `/` is appended to `result`.
+  * The value of `urlObject.pathname` is appended to `result`.
+* Otherwise, if `urlObject.pathname` is not `undefined` and is not a string, an
+  [`Error`][] is thrown.
+* If the `urlObject.search` property is `undefined` and if the `urlObject.query`
+  property is an `Object`, the literal string `?` is appended to `result`
+  followed by the output of calling the [`querystring`][] module's `stringify()`
+  method passing the value of `urlObject.query`.
+* Otherwise, if `urlObject.search` is a string:
+  * If the value of `urlObject.search` *does not start* with the ASCII question
+    mark (`?`) character, the literal string `?` is appended to `result`.
+  * The value of `urlObject.search` is appended to `result`.
+* Otherwise, if `urlObject.search` is not `undefined` and is not a string, an
+  [`Error`][] is thrown.
+* If the `urlObject.hash` property is a string:
+  * If the value of `urlObject.hash` *does not start* with the ASCII hash (`#`)
+    character, the literal string `#` is appended to `result`.
+  * The value of `urlObject.hash` is appended to `result`.
+* Otherwise, if the `urlObject.hash` property is not `undefined` and is not a
+  string, an [`Error`][] is thrown.
+* `result` is returned.
+
+
+## url.parse(urlString[, parseQueryString[, slashesDenoteHost]])

-Take a URL string, and return an object.
-
-Pass `true` as the second argument to also parse the query string using the
-`querystring` module. If `true` then the `query` property will always be
-assigned an object, and the `search` property will always be a (possibly
-empty) string. If `false` then the `query` property will not be parsed or
-decoded. Defaults to `false`.
+* `urlString` {string} The URL string to parse.
+* `parseQueryString` {boolean} If `true`, the `query` property will always
+  be set to an object returned by the [`querystring`][] module's `parse()`
+  method. If `false`, the `query` property on the returned URL object will be an
+  unparsed, undecoded string. Defaults to `false`.
+* `slashesDenoteHost` {boolean} If `true`, the first token after the literal
+  string `//` and preceding the next `/` will be interpreted as the `host`.
+  For instance, given `//foo/bar`, the result would be
+  `{host: 'foo', pathname: '/bar'}` rather than `{pathname: '//foo/bar'}`.
+  Defaults to `false`.

-Pass `true` as the third argument to treat `//foo/bar` as
-`{ host: 'foo', pathname: '/bar' }` rather than
-`{ pathname: '//foo/bar' }`. Defaults to `false`.
+The `url.parse()` method takes a URL string, parses it, and returns a URL
+object.

 ## url.resolve(from, to)

-Take a base URL, and a href URL, and resolve them as a browser would for
-an anchor tag. Examples:
+* `from` {string} The base URL being resolved against.
+* `to` {string} The HREF URL being resolved.
+
+The `url.resolve()` method resolves a target URL relative to a base URL in a
+manner similar to that of a Web browser resolving an anchor tag HREF.
+
+For example:

 ```js
 url.resolve('/one/two/three', 'four')         // '/one/two/four'
 url.resolve('http://example.com/', '/one')    // 'http://example.com/one'
 url.resolve('http://example.com/one', '/two') // 'http://example.com/two'
 ```
+
+## Escaped Characters
+
+URLs are only permitted to contain a certain range of characters. Spaces (`' '`)
+and the following characters will be automatically escaped in the
+properties of URL objects:
+
+```
+< > " ` \r \n \t { } | \ ^ '
+```
+
+For example, the ASCII space character (`' '`) is encoded as `%20`. The ASCII
+less-than (`<`) character is encoded as `%3C`.
+
+
+[`Error`]: errors.html#errors_class_error
+[`querystring`]: querystring.html