doc: general improvements to url.md copy
General cleanup and restructuring of the doc. Added
additional detail to how URLs are serialized.
jasnell committed May 25, 2016
1 parent 7c3e920 commit 1361514
Showing 1 changed file with 191 additions and 82 deletions.
273 changes: 191 additions & 82 deletions doc/api/url.md
@@ -2,139 +2,248 @@

Stability: 2 - Stable

This module has utilities for URL resolution and parsing.
Call `require('url')` to use it.
The `url` module provides utilities for URL resolution and parsing. It can be
accessed using:

## URL Parsing
```js
const url = require('url');
```

Parsed URL objects have some or all of the following fields, depending on
whether or not they exist in the URL string. Any parts that are not in the URL
string will not be in the parsed object. Examples are shown for the URL
## URL Strings and URL Objects

`'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'`
A URL string is a structured string containing multiple meaningful components.
When parsed, a URL object is returned containing properties for each of these
components.

* `href`: The full URL that was originally parsed. Both the protocol and host are lowercased.
The following details each of the components of a parsed URL. The example
`'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'` is used to
illustrate each.

Example: `'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'`
```
+---------------------------------------------------------------------------+
|                                    href                                   |
+----------++-----------+-----------------+-------------------------+-------+
| protocol ||   auth    |      host       |          path           | hash  |
|          ||           +----------+------+----------+--------------+       |
|          ||           | hostname | port | pathname |    search    |       |
|          ||           |          |      |          +-+------------+       |
|          ||           |          |      |          | |   query    |       |
"  http:   // user:pass @ host.com : 8080   /p/a/t/h  ? query=string  #hash "
|          ||           |          |      |          | |            |       |
+----------++-----------+----------+------+----------+-+------------+-------+
(all spaces in the "" line should be ignored -- they're purely for formatting)
```

* `protocol`: The request protocol, lowercased.
### urlObject.href

Example: `'http:'`
The `href` property is the full URL string that was parsed with both the
`protocol` and `host` components converted to lower-case.

* `slashes`: The protocol requires slashes after the colon.
For example: `'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'`

Example: true or false
### urlObject.protocol

* `host`: The full lowercased host portion of the URL, including port
information.
The `protocol` property identifies the URL's lower-cased protocol scheme.

Example: `'host.com:8080'`
For example: `'http:'`

* `auth`: The authentication information portion of a URL.
### urlObject.slashes

Example: `'user:pass'`
The `slashes` property is a `boolean` with a value of `true` if two ASCII
forward-slash characters (`/`) are required following the colon in the
`protocol`.

* `hostname`: Just the lowercased hostname portion of the host.
### urlObject.host

Example: `'host.com'`
The `host` property is the full lower-cased host portion of the URL, including
the `port` if specified.

* `port`: The port number portion of the host.
For example: `'host.com:8080'`

Example: `'8080'`
### urlObject.auth

* `pathname`: The path section of the URL, that comes after the host and
before the query, including the initial slash if present. No decoding is
performed.
The `auth` property is the username and password portion of the URL, also
referred to as "userinfo". This string subset follows the `protocol` and
double slashes (if present) and precedes the `host` component, delimited by an
ASCII "at sign" (`@`). The format of the string is `{username}[:{password}]`,
with the `[:{password}]` portion being optional.

Example: `'/p/a/t/h'`
For example: `'user:pass'`

* `search`: The 'query string' portion of the URL, including the leading
question mark.
### urlObject.hostname

Example: `'?query=string'`
The `hostname` property is the lower-cased host name portion of the `host`
component *without* the `port` included.

* `path`: Concatenation of `pathname` and `search`. No decoding is performed.
For example: `'host.com'`

Example: `'/p/a/t/h?query=string'`
### urlObject.port

* `query`: Either the 'params' portion of the query string, or a
querystring-parsed object.
The `port` property is the numeric port portion of the `host` component.

Example: `'query=string'` or `{'query':'string'}`
For example: `'8080'`

* `hash`: The 'fragment' portion of the URL including the pound-sign.
### urlObject.pathname

Example: `'#hash'`
The `pathname` property consists of the entire path section of the URL. This
is everything following the `host` (including the `port`) and before the start
of the `query` or `hash` components, delimited by either the ASCII question
mark (`?`) or hash (`#`) characters.

### Escaped Characters
For example: `'/p/a/t/h'`

Spaces (`' '`) and the following characters will be automatically escaped in the
properties of URL objects:
No decoding of the path string is performed.

```
< > " ` \r \n \t { } | \ ^ '
```
### urlObject.search

The `search` property consists of the entire "query string" portion of the
URL, including the leading ASCII question mark (`?`) character.

For example: `'?query=string'`

No decoding of the query string is performed.

### urlObject.path

The `path` property is a concatenation of the `pathname` and `search`
components.

For example: `'/p/a/t/h?query=string'`

No decoding of the `path` is performed.

### urlObject.query

The `query` property is either the "params" portion of the query string
(everything *except* the leading ASCII question mark (`?`)), or an object
returned by the [`querystring`][] module's `parse()` method.

---
For example: `'query=string'` or `{'query': 'string'}`

The following methods are provided by the URL module:
If returned as a string, no decoding of the query string is performed. If
returned as an object, both keys and values are decoded.
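
The difference between the two forms of `query` can be sketched as follows. The
URL and the `first name` key are made up for illustration; only `url.parse()`
and its `parseQueryString` argument come from this document.

```js
const url = require('url');

// Hypothetical input with percent-encoded spaces in both key and value.
const input = 'http://host.com/p?first%20name=Ada%20L';

// parseQueryString omitted (defaults to false): `query` stays an
// undecoded string.
const asString = url.parse(input);
console.log(asString.query);               // 'first%20name=Ada%20L'

// parseQueryString = true: `query` is an object whose keys and values
// have been decoded by the querystring module.
const asObject = url.parse(input, true);
console.log(asObject.query['first name']); // 'Ada L'
```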

## url.format(urlObj)
### urlObject.hash

The `hash` property consists of the "fragment" portion of the URL including
the leading ASCII hash (`#`) character.

For example: `'#hash'`
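
As a quick check of the properties described above, the following sketch parses
the example URL and prints each component. The `components` array is only a
convenience for this illustration.

```js
const url = require('url');

// Parse the example URL used throughout this section.
const parsed = url.parse(
  'http://user:pass@host.com:8080/p/a/t/h?query=string#hash');

// Print each documented property next to its parsed value.
const components = ['href', 'protocol', 'slashes', 'auth', 'host', 'hostname',
                    'port', 'pathname', 'search', 'path', 'query', 'hash'];
for (const name of components) {
  console.log(`${name}: ${JSON.stringify(parsed[name])}`);
}
```

Note that `port` is reported as the string `'8080'`, not as a number.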

## url.format(urlObject)
<!-- YAML
added: v0.1.25
-->

Take a parsed URL object, and return a formatted URL string.

Here's how the formatting process works:

* `href` will be ignored.
* `path` will be ignored.
* `protocol` is treated the same with or without the trailing `:` (colon).
* The protocols `http`, `https`, `ftp`, `gopher`, `file` will be
  postfixed with `://` (colon-slash-slash) as long as `host`/`hostname` are present.
* All other protocols `mailto`, `xmpp`, `aim`, `sftp`, `foo`, etc will
  be postfixed with `:` (colon).
* `slashes` set to `true` if the protocol requires `://` (colon-slash-slash)
  * Only needs to be set for protocols not previously listed as requiring
    slashes, such as `mongodb://localhost:8000/`, or if `host`/`hostname` are absent.
* `auth` will be used if present.
* `hostname` will only be used if `host` is absent.
* `port` will only be used if `host` is absent.
* `host` will be used in place of `hostname` and `port`.
* `pathname` is treated the same with or without the leading `/` (slash).
* `query` (object; see `querystring`) will only be used if `search` is absent.
* `search` will be used in place of `query`.
  * It is treated the same with or without the leading `?` (question mark).
* `hash` is treated the same with or without the leading `#` (pound sign, anchor).

## url.parse(urlStr[, parseQueryString][, slashesDenoteHost])
* `urlObject` {Object} A URL object (either as returned by `url.parse()` or
constructed otherwise).

The `url.format()` method processes the given URL object and returns a formatted
URL string.

The formatting process essentially operates as follows:

* A new empty string `result` is created.
* If `urlObject.protocol` is a string, it is appended as-is to `result`.
* Otherwise, if `urlObject.protocol` is not `undefined` and is not a string, an
  [`Error`][] is thrown.
* For all string values of `urlObject.protocol` that *do not end* with an ASCII
  colon (`:`) character, the literal string `:` will be appended to `result`.
* If either the `urlObject.slashes` property is true, `urlObject.protocol`
  begins with one of `http`, `https`, `ftp`, `gopher`, or `file`, or
  `urlObject.protocol` is `undefined`, the literal string `//` will be appended
  to `result`.
* If the value of the `urlObject.auth` property is truthy, and either
  `urlObject.host` or `urlObject.hostname` are not `undefined`, the value of
  `urlObject.auth` will be coerced into a string and appended to `result`
  followed by the literal string `@`.
* If the `urlObject.host` property is `undefined` then:
  * If the `urlObject.hostname` is a string, it is appended to `result`.
  * Otherwise, if `urlObject.hostname` is not `undefined` and is not a string,
    an [`Error`][] is thrown.
  * If the `urlObject.port` property value is truthy, and `urlObject.hostname`
    is not `undefined`:
    * The literal string `:` is appended to `result`, and
    * The value of `urlObject.port` is coerced to a string and appended to
      `result`.
* Otherwise, if the `urlObject.host` property value is truthy, the value of
  `urlObject.host` is coerced to a string and appended to `result`.
* If the `urlObject.pathname` property is a string that is not an empty string:
  * If the `urlObject.pathname` *does not start* with an ASCII forward slash
    (`/`), then the literal string `/` is appended to `result`.
  * The value of `urlObject.pathname` is appended to `result`.
* Otherwise, if `urlObject.pathname` is not `undefined` and is not a string, an
  [`Error`][] is thrown.
* If the `urlObject.search` property is `undefined` and if the `urlObject.query`
  property is an `Object`, the literal string `?` is appended to `result`
  followed by the output of calling the [`querystring`][] module's `stringify()`
  method passing the value of `urlObject.query`.
* Otherwise, if `urlObject.search` is a string:
  * If the value of `urlObject.search` *does not start* with the ASCII question
    mark (`?`) character, the literal string `?` is appended to `result`.
  * The value of `urlObject.search` is appended to `result`.
* Otherwise, if `urlObject.search` is not `undefined` and is not a string, an
  [`Error`][] is thrown.
* If the `urlObject.hash` property is a string:
  * If the value of `urlObject.hash` *does not start* with the ASCII hash (`#`)
    character, the literal string `#` is appended to `result`.
  * The value of `urlObject.hash` is appended to `result`.
* Otherwise, if the `urlObject.hash` property is not `undefined` and is not a
  string, an [`Error`][] is thrown.
* `result` is returned.
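
A few of the formatting rules can be illustrated with a short sketch. The host
names, paths, and the variable names `a` through `d` are placeholders chosen
for this example; the `mongodb` case mirrors the protocol mentioned in the
earlier wording of this section.

```js
const url = require('url');

// A protocol without a trailing colon gets one, `http` is followed by `//`,
// and a pathname or hash missing its leading delimiter gets one added.
const a = url.format({
  protocol: 'http',
  host: 'example.com',
  pathname: 'a/b',
  query: { x: '1' },
  hash: 'top'
});
console.log(a); // 'http://example.com/a/b?x=1#top'

// With no `host`, `hostname` and `port` are combined instead.
const b = url.format({
  protocol: 'https:',
  hostname: 'example.com',
  port: 8443,
  pathname: '/x'
});
console.log(b); // 'https://example.com:8443/x'

// `slashes: true` forces `//` for a protocol that is not in the built-in list.
const c = url.format({
  protocol: 'mongodb',
  slashes: true,
  host: 'localhost:8000',
  pathname: '/'
});
console.log(c); // 'mongodb://localhost:8000/'

// When `search` is a string it is used and `query` is ignored.
const d = url.format({
  protocol: 'http:',
  host: 'example.com',
  pathname: '/p',
  search: 'a=1',
  query: { b: '2' }
});
console.log(d); // 'http://example.com/p?a=1'
```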


## url.parse(urlString[, parseQueryString[, slashesDenoteHost]])
<!-- YAML
added: v0.1.25
-->

Take a URL string, and return an object.

Pass `true` as the second argument to also parse the query string using the
`querystring` module. If `true` then the `query` property will always be
assigned an object, and the `search` property will always be a (possibly
empty) string. If `false` then the `query` property will not be parsed or
decoded. Defaults to `false`.
* `urlString` {string} The URL string to parse.
* `parseQueryString` {boolean} If `true`, the `query` property will always
be set to an object returned by the [`querystring`][] module's `parse()`
method. If `false`, the `query` property on the returned URL object will be an
unparsed, undecoded string. Defaults to `false`.
* `slashesDenoteHost` {boolean} If `true`, the first token after the literal
string `//` and preceding the next `/` will be interpreted as the `host`.
For instance, given `//foo/bar`, the result would be
`{host: 'foo', pathname: '/bar'}` rather than `{pathname: '//foo/bar'}`.
Defaults to `false`.

Pass `true` as the third argument to treat `//foo/bar` as
`{ host: 'foo', pathname: '/bar' }` rather than
`{ pathname: '//foo/bar' }`. Defaults to `false`.
The `url.parse()` method takes a URL string, parses it, and returns a URL
object.
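
The effect of `slashesDenoteHost` can be seen by parsing the same
protocol-relative string both ways. This is a minimal sketch using the
`//foo/bar` input from the parameter description; the variable names are
illustrative.

```js
const url = require('url');

// Default: the leading `//` is treated as part of the path.
const plain = url.parse('//foo/bar');
console.log(plain.host);        // null
console.log(plain.pathname);    // '//foo/bar'

// slashesDenoteHost = true: the token after `//` becomes the host.
const withHost = url.parse('//foo/bar', false, true);
console.log(withHost.host);     // 'foo'
console.log(withHost.pathname); // '/bar'
```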

## url.resolve(from, to)
<!-- YAML
added: v0.1.25
-->

Take a base URL, and a href URL, and resolve them as a browser would for
an anchor tag. Examples:
* `from` {string} The Base URL being resolved against.
* `to` {string} The HREF URL being resolved.

The `url.resolve()` method resolves a target URL relative to a base URL in a
manner similar to that of a Web browser resolving an anchor tag HREF.

For example:

```js
url.resolve('/one/two/three', 'four') // '/one/two/four'
url.resolve('http://example.com/', '/one') // 'http://example.com/one'
url.resolve('http://example.com/one', '/two') // 'http://example.com/two'
```

## Escaped Characters

URLs are only permitted to contain a certain range of characters. Spaces (`' '`)
and the following characters will be automatically escaped in the
properties of URL objects:

```
< > " ` \r \n \t { } | \ ^ '
```

For example, the ASCII space character (`' '`) is encoded as `%20`. The ASCII
less-than (`<`) character is encoded as `%3C`.
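
A minimal sketch of this escaping, using a made-up URL that contains a space
and angle brackets in its path:

```js
const url = require('url');

// The space, `<`, and `>` characters are escaped while parsing.
const parsed = url.parse('http://example.com/a b/<tag>');
console.log(parsed.pathname); // '/a%20b/%3Ctag%3E'
console.log(parsed.href);     // 'http://example.com/a%20b/%3Ctag%3E'
```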


[`Error`]: errors.html#errors_class_error
[`querystring`]: querystring.html
