Commit
encoding: rudimentary TextDecoder support w/o ICU
Also split up the tests.

Backport-PR-URL: #14786
Backport-Reviewed-By: Anna Henningsen <anna@addaleax.net>

PR-URL: #14489
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
TimothyGu authored and addaleax committed Aug 12, 2017
1 parent a781bb4 commit c5ee34e
Showing 8 changed files with 428 additions and 268 deletions.
7 changes: 7 additions & 0 deletions doc/api/errors.md
@@ -712,6 +712,12 @@ only used in the [WHATWG URL API][] for strict compliance with the specification
native Node.js APIs, `func(undefined)` and `func()` are treated identically, and
the [`ERR_INVALID_ARG_TYPE`][] error code may be used instead.

<a id="ERR_NO_ICU"></a>
### ERR_NO_ICU

Used when an attempt is made to use features that require [ICU][], while
Node.js is not compiled with ICU support.
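
As a hedged illustration of how calling code might react to this error, the sketch below assumes that the `fatal` option of `util.TextDecoder` is the ICU-dependent feature being requested (as the `util.md` changes in this commit describe) and that the thrown error exposes `code === 'ERR_NO_ICU'`. The helper name `createDecoder` is hypothetical and not part of Node.js.

```js
const { TextDecoder } = require('util');

// Hypothetical helper: prefer a fatal decoder, and fall back to a lenient
// one when Node.js reports that it was built without ICU.
function createDecoder(encoding) {
  try {
    return new TextDecoder(encoding, { fatal: true });
  } catch (err) {
    if (err.code !== 'ERR_NO_ICU') throw err; // unrelated failure: rethrow
    return new TextDecoder(encoding);
  }
}

const decoder = createDecoder('utf-8');
console.log(decoder.encoding, decoder.fatal);
```

On a build with ICU the `catch` branch is never taken, so the returned decoder is fatal; on a build without ICU the caller gets a non-fatal decoder instead of an exception.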

<a id="ERR_SOCKET_ALREADY_BOUND"></a>
### ERR_SOCKET_ALREADY_BOUND
Used when an attempt is made to bind a socket that has already been bound.
@@ -795,6 +801,7 @@ are most likely an indication of a bug within Node.js itself.
[`new URLSearchParams(iterable)`]: url.html#url_constructor_new_urlsearchparams_iterable
[`process.on('uncaughtException')`]: process.html#process_event_uncaughtexception
[`process.send()`]: process.html#process_process_send_message_sendhandle_options_callback
[ICU]: intl.html#intl_internationalization_support
[Node.js Error Codes]: #nodejs-error-codes
[V8's stack trace API]: https://github.com/v8/v8/wiki/Stack-Trace-API
[WHATWG URL API]: url.html#url_the_whatwg_url_api
2 changes: 1 addition & 1 deletion doc/api/intl.md
@@ -52,7 +52,7 @@ option:
| [WHATWG URL Parser][] | partial (no IDN support) | full | full | full
| [`require('buffer').transcode()`][] | none (function does not exist) | full | full | full
| [REPL][] | partial (inaccurate line editing) | full | full | full
| [`require('util').TextDecoder`][] | none (class does not exist) | partial/full (depends on OS) | partial (Unicode-only) | full
| [`require('util').TextDecoder`][] | partial (basic encodings support) | partial/full (depends on OS) | partial (Unicode-only) | full

*Note*: The "(not locale-aware)" designation denotes that the function carries
out its operation just like the non-`Locale` version of the function, if one
61 changes: 39 additions & 22 deletions doc/api/util.md
@@ -544,7 +544,7 @@ added: v8.0.0
A Symbol that can be used to declare custom promisified variants of functions,
see [Custom promisified functions][].

### Class: util.TextDecoder
## Class: util.TextDecoder
<!-- YAML
added: v8.3.0
-->
@@ -563,23 +563,33 @@ while (buffer = getNextChunkSomehow()) {
string += decoder.decode(); // end-of-stream
```

#### WHATWG Supported Encodings
### WHATWG Supported Encodings

Per the [WHATWG Encoding Standard][], the encodings supported by the
`TextDecoder` API are outlined in the tables below. For each encoding,
one or more aliases may be used. Support for some encodings is enabled
only when Node.js is using the full ICU data (see [Internationalization][]).
`util.TextDecoder` is `undefined` when ICU is not enabled during build.
one or more aliases may be used.

##### Encodings Supported By Default
Different Node.js build configurations support different sets of encodings.
While a very basic set of encodings is supported even on Node.js builds without
ICU enabled, support for some encodings is provided only when Node.js is built
with ICU and using the full ICU data (see [Internationalization][]).

#### Encodings Supported Without ICU

| Encoding | Aliases |
| ----------- | --------------------------------- |
| `'utf8'` | `'unicode-1-1-utf-8'`, `'utf-8'` |
| `'utf-16be'`| |
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
| `'utf-16le'`| `'utf-16'` |

##### Encodings Requiring Full-ICU
#### Encodings Supported by Default (With ICU)

| Encoding | Aliases |
| ----------- | --------------------------------- |
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
| `'utf-16le'`| `'utf-16'` |
| `'utf-16be'`| |
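
Because the supported set depends on the build, it can be useful to probe for an encoding at run time. The following is a minimal sketch, assuming that an unsupported label makes the `TextDecoder` constructor throw a `RangeError` (as the WHATWG Encoding Standard specifies); `resolveEncoding` is a hypothetical helper name.

```js
const { TextDecoder } = require('util');

// Hypothetical helper: returns the canonical encoding name for a label,
// or null if this build of Node.js cannot decode it.
function resolveEncoding(label) {
  try {
    return new TextDecoder(label).encoding;
  } catch (err) {
    if (err instanceof RangeError) return null; // unsupported label
    throw err;
  }
}

console.log(resolveEncoding('utf8'));             // alias of 'utf-8'
console.log(resolveEncoding('utf-16'));           // alias of 'utf-16le'
console.log(resolveEncoding('utf-16be'));         // result depends on the build
console.log(resolveEncoding('no-such-encoding'));
```

Aliases resolve to the name in the "Encoding" column, so comparing `decoder.encoding` against that canonical name is more reliable than comparing against the label that was passed in.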

#### Encodings Requiring Full ICU Data

| Encoding | Aliases |
| ----------------- | -------------------------------- |
@@ -621,13 +631,14 @@ only when Node.js is using the full ICU data (see [Internationalization][]).
*Note*: The `'iso-8859-16'` encoding listed in the [WHATWG Encoding Standard][]
is not supported.

#### new TextDecoder([encoding[, options]])
### new TextDecoder([encoding[, options]])

* `encoding` {string} Identifies the `encoding` that this `TextDecoder` instance
  supports. Defaults to `'utf-8'`.
* `options` {Object}
  * `fatal` {boolean} `true` if decoding failures are fatal. Defaults to
    `false`.
    `false`. This option is only supported when ICU is enabled (see
    [Internationalization][]).
  * `ignoreBOM` {boolean} When `true`, the `TextDecoder` will include the byte
    order mark in the decoded result. When `false`, the byte order mark will
    be removed from the output. This option is only used when `encoding` is
@@ -636,7 +647,7 @@ is not supported.
Creates a new `TextDecoder` instance. The `encoding` may specify one of the
supported encodings or an alias.
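
The effect of `ignoreBOM` can be sketched as follows. This is an illustrative example rather than part of the commit; the variable names are arbitrary.

```js
const { TextDecoder } = require('util');

// UTF-8 byte order mark (EF BB BF) followed by the ASCII bytes for "hi".
const bytes = new Uint8Array([0xEF, 0xBB, 0xBF, 0x68, 0x69]);

const stripped = new TextDecoder('utf-8').decode(bytes);
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes);

console.log(stripped.length); // 2: the BOM was removed
console.log(kept.length);     // 3: the BOM is kept as U+FEFF
console.log(kept.charCodeAt(0) === 0xFEFF);
```

Despite its name, `ignoreBOM: true` means the decoder does not treat the byte order mark specially, so the mark stays in the output.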

#### textDecoder.decode([input[, options]])
### textDecoder.decode([input[, options]])

* `input` {ArrayBuffer|DataView|TypedArray} An `ArrayBuffer`, `DataView` or
  Typed Array instance containing the encoded data.
@@ -652,49 +663,55 @@ internally and emitted after the next call to `textDecoder.decode()`.
If `textDecoder.fatal` is `true`, decoding errors that occur will result in a
`TypeError` being thrown.
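
The buffering behavior can be sketched with a multi-byte character split across two chunks. The example assumes the `stream` option of the WHATWG `TextDecoder.decode()` API; the chunk boundaries are chosen purely for illustration.

```js
const { TextDecoder } = require('util');

const decoder = new TextDecoder('utf-8');

// The euro sign (U+20AC) is encoded in UTF-8 as three bytes: E2 82 AC.
// The first chunk ends in the middle of that sequence.
const first = decoder.decode(new Uint8Array([0xE2, 0x82]), { stream: true });
const second = decoder.decode(new Uint8Array([0xAC]), { stream: true });
const tail = decoder.decode(); // end-of-stream: flushes anything still buffered

console.log(JSON.stringify(first));
console.log(JSON.stringify(first + second + tail));
```

The incomplete bytes from the first chunk are held back rather than emitted as a replacement character, and the complete character appears once the final byte arrives.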

#### textDecoder.encoding
### textDecoder.encoding

* Value: {string}
* {string}

The encoding supported by the `TextDecoder` instance.

#### textDecoder.fatal
### textDecoder.fatal

* Value: {boolean}
* {boolean}

The value will be `true` if decoding errors result in a `TypeError` being
thrown.
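
A minimal sketch of the difference between a lenient and a fatal decoder, assuming a Node.js build with ICU enabled (the commit notes that `fatal` is only supported there):

```js
const { TextDecoder } = require('util');

const invalid = new Uint8Array([0xFF]); // 0xFF never appears in valid UTF-8

// Default behavior: malformed input is replaced with U+FFFD.
const lenient = new TextDecoder('utf-8');
console.log(lenient.fatal); // false
console.log(lenient.decode(invalid) === '\uFFFD');

// Fatal behavior: malformed input causes decode() to throw.
const strict = new TextDecoder('utf-8', { fatal: true });
let caught = null;
try {
  strict.decode(invalid);
} catch (err) {
  caught = err;
}
console.log(strict.fatal, caught instanceof TypeError);
```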

#### textDecoder.ignoreBOM
### textDecoder.ignoreBOM

* Value: {boolean}
* {boolean}

The value will be `true` if the decoding result will include the byte order
mark.

### Class: util.TextEncoder
## Class: util.TextEncoder
<!-- YAML
added: v8.3.0
-->

> Stability: 1 - Experimental
An implementation of the [WHATWG Encoding Standard][] `TextEncoder` API. All
instances of `TextEncoder` only support `UTF-8` encoding.
instances of `TextEncoder` only support UTF-8 encoding.

```js
const encoder = new TextEncoder();
const uint8array = encoder.encode('this is some data');
```

#### textEncoder.encode([input])
### textEncoder.encode([input])

* `input` {string} The text to encode. Defaults to an empty string.
* Returns: {Uint8Array}

UTF-8 Encodes the `input` string and returns a `Uint8Array` containing the
UTF-8 encodes the `input` string and returns a `Uint8Array` containing the
encoded bytes.
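
For illustration, the following sketch shows the bytes produced for a non-ASCII character and for the default (empty) input. The sample string is an arbitrary choice.

```js
const { TextEncoder } = require('util');

const encoder = new TextEncoder();

// U+00E9 (e with acute accent) takes two bytes in UTF-8: C3 A9.
const encoded = encoder.encode('\u00E9');
console.log(encoded instanceof Uint8Array);
console.log(Array.from(encoded)); // [ 195, 169 ]

// With no argument, the input defaults to an empty string.
console.log(encoder.encode().length); // 0
```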

### textEncoder.encoding

* {string}

The encoding supported by the `TextEncoder` instance. Always set to `'utf-8'`.

## Deprecated APIs

The following APIs have been deprecated and should no longer be used. Existing