From ed3386b886e7ee110f3777c748f15caed65dbfa6 Mon Sep 17 00:00:00 2001 From: Timothy Gu Date: Mon, 26 Jun 2017 13:05:49 +0800 Subject: [PATCH 1/6] doc: add documentation on ICU Refs: https://github.com/nodejs/node/pull/13644#discussion_r121616327 --- doc/api/_toc.md | 1 + doc/api/intl.md | 204 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 205 insertions(+) create mode 100644 doc/api/intl.md diff --git a/doc/api/_toc.md b/doc/api/_toc.md index 4865334ec02b39..1075bc6be39858 100644 --- a/doc/api/_toc.md +++ b/doc/api/_toc.md @@ -26,6 +26,7 @@ * [HTTP](http.html) * [HTTPS](https.html) * [Inspector](inspector.html) +* [Internationalization](intl.html) * [Modules](modules.html) * [Net](net.html) * [OS](os.html) diff --git a/doc/api/intl.md b/doc/api/intl.md new file mode 100644 index 00000000000000..4e423bfd4d2bd7 --- /dev/null +++ b/doc/api/intl.md @@ -0,0 +1,204 @@ +# Internationalization Support + +Node.js has many features that make it easier to write internationalized +programs. Some of them are: + +- Locale-sensitive or Unicode-aware functions in the [ECMAScript Language + Specification][ECMA-262]: + - [`String.prototype.normalize()`][] + - [`String.prototype.toLowerCase()`][] + - [`String.prototype.toUpperCase()`][] +- All functionality described in the [ECMAScript Internationalization API + Specification][ECMA-402] (aka ECMA-402): + - [`Intl`][] object + - Locale-sensitive methods like [`String.prototype.localeCompare()`][] and + [`Date.prototype.toLocaleString()`][] +- The [WHATWG URL parser][]'s [internationalized domain names][] (IDNs) support +- [`require('buffer').transcode()`][] + +Node.js (and its underlying V8 engine) uses [ICU][] to implement these features +in native C/C++ code. However, some of them require a very large ICU data file +in order to support all locales of the world. 
Since most Node.js users will +make use of only a small section in the full ICU data set, we provide several +options for customizing ICU support in a Node.js build. + +## Options for building Node.js + +To control how ICU is used in Node.js, four `configure` options are available +during compilation. Additional details on how to compile Node.js are documented +in [BUILDING.md][]. + +- `--with-intl=none` / `--without-intl` +- `--with-intl=system-icu` +- `--with-intl=small-icu` (default) +- `--with-intl=full-icu` + +An overview of available Node.js and JavaScript features for each `configure` +option: + +| | `none` | `system-icu` | `small-icu` | `full-icu` +|-----------------------------------------|--------------------------------|------------------------------|------------------------|------------ +| [`String.prototype.normalize()`][] | none (function is no-op) | full | full | full +| `String.prototype.to*Case()` | full | full | full | full +| [`Intl`][] | none (object does not exist) | partial/full (depends on OS) | partial (English-only) | full +| [`String.prototype.localeCompare()`][] | partial (not locale-aware) | full | full | full +| `String.prototype.toLocale*Case()` | partial (not locale-aware) | full | full | full +| [`Number.prototype.toLocaleString()`][] | partial (not locale-aware) | partial/full (depends on OS) | partial (English-only) | full +| `Date.prototype.toLocale*String()` | partial (not locale-aware) | partial/full (depends on OS) | partial (English-only) | full +| [WHATWG URL Parser][] | partial (no IDN support) | full | full | full +| [`require('buffer').transcode()`][] | none (function does not exist) | full | full | full + +*Note*: The "(not locale-aware)" designation denotes that the function carries +out its operation just like the non-`Locale` version of the function, if one +exists. For example, under `none` mode, `Date.prototype.toLocaleString()`'s +operation is identical to that of `Date.prototype.toString()`. 
+ ### Disable all internationalization features (`none`) + +If this option is chosen, most internationalization features mentioned above +will be **unavailable** in the resulting `node` binary. + +### Build with a pre-installed ICU (`system-icu`) + +Node.js can link against an ICU build already installed on the system. In fact, +most Linux distributions already come with ICU installed, and this option would +make it possible to reuse the same set of data used by other components in your +OS. + +Functionalities that only require the ICU library itself, such as +[`String.prototype.normalize()`][] and the [WHATWG URL parser][], are fully +supported under `system-icu`. Features that require ICU locale data in +addition, such as [`Intl.DateTimeFormat`][], *may* be fully or partially +supported, depending on the completeness of the ICU data installed on the +system. + +### Embed a limited set of ICU data (`small-icu`) + +This option makes the resulting binary link against the ICU library statically, +and includes a subset of ICU data (typically only the English locale) within +the `node` executable. + +Functionalities that only require the ICU library itself, such as +[`String.prototype.normalize()`][] and the [WHATWG URL parser][], are fully +supported under `small-icu`. Features that require ICU locale data in addition, +such as [`Intl.DateTimeFormat`][], generally only work with the English locale: + +```js +const january = new Date(9e8); +const english = new Intl.DateTimeFormat('en', { month: 'long' }); +const spanish = new Intl.DateTimeFormat('es', { month: 'long' }); + +console.log(english.format(january)); + // Prints "January" +console.log(spanish.format(january)); + // Prints "M01" on small-icu + // Should print "enero" +``` + +This mode provides a good balance between features and binary size, and it is +the default behavior if no `--with-intl` flag is passed. The official binaries +are also built in this mode.
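Because a `small-icu` binary may not carry data for the requested locale, code that must run on such builds can check for locale data before formatting and choose a fallback explicitly. The sketch below is illustrative and not from these commits; `formatMonth` is a hypothetical helper, and it assumes that `Intl.DateTimeFormat.supportedLocalesOf()` reports only locales whose data is actually present in the binary, which has not been verified here for every `small-icu` or `system-icu` build.

```js
'use strict';

// Hypothetical helper: formats the month name in `locale` when the binary
// reports data for it, and falls back to English otherwise.
function formatMonth(date, locale) {
  // supportedLocalesOf() returns the subset of the requested locales
  // for which the implementation has data (possibly an empty array).
  const [supported] = Intl.DateTimeFormat.supportedLocalesOf([locale]);
  const chosen = supported || 'en';
  return new Intl.DateTimeFormat(chosen, { month: 'long' }).format(date);
}

const january = new Date(9e8);
console.log(formatMonth(january, 'es'));
```

The explicit English fallback makes the output predictable on builds without Spanish data, instead of relying on whatever ICU produces for a locale it cannot fully serve.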
+ #### Providing ICU data at runtime + +If you use the `small-icu` option, you can still provide additional locale data +at runtime so that the JS methods would work for all ICU locales. Assuming the +data file is stored at `/some/directory`, you could make ICU be aware of it +through either: + +* The [`NODE_ICU_DATA`][] environmental variable: + + ```shell + env NODE_ICU_DATA=/some/directory node + ``` + +* The [`--icu-data-dir`][] CLI parameter: + + ```shell + node --icu-data-dir=/some/directory + ``` + +(If both are specified, the `--icu-data-dir` CLI parameter takes precedence.) + +ICU is able to automatically find and load a variety of data formats, but the +data must be appropriate for the ICU version, and the file correctly named. +The most common name for the data file is `icudt5X[bl].dat`, where `5X` denotes +the intended ICU version, and `b` or `l` indicates the system's endianness. +Check "[ICU Data][]" article in the ICU User Guide for other supported formats +and more details on ICU data in general. + +The [full-icu][] npm module can greatly simplify ICU data installation by +detecting the ICU version of the running `node` executable and downloading the +appropriate data file. After installing the module through `npm i full-icu`, +the data file will be available at `./node_modules/full-icu`. This path can +then be passed either to `NODE_ICU_DATA` or `--icu-data-dir` as shown above to +enable full `Intl` support. + +### Embed the entire ICU (`full-icu`) + +This option makes the resulting binary link against ICU statically and include +a full set of ICU data. A binary created this way has no further external +dependencies and supports all locales, but might be rather large. See +[BUILDING.md][BUILDING.md#full-icu] on how to compile a binary using this mode.
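The data-file naming convention described above can be derived from the running binary itself, which helps when checking that a downloaded file matches. This is a minimal sketch, not from these commits: `expectedIcuDataFileName` is a hypothetical helper, and it assumes the `icudt<version>[bl].dat` pattern uses the major component of `process.versions.icu` (for example `59` from `59.1`). Other naming schemes described in the ICU User Guide are not covered.

```js
'use strict';
const os = require('os');

// Hypothetical helper: returns the conventional ICU data file name for the
// running binary, or null when Node.js was built without ICU.
function expectedIcuDataFileName() {
  const icuVersion = process.versions.icu; // e.g. '59.1'; undefined without ICU
  if (typeof icuVersion !== 'string') return null;
  // Assumption: the file name embeds only the ICU major version.
  const major = icuVersion.split('.')[0];
  // os.endianness() returns 'LE' or 'BE'; ICU uses 'l' or 'b'.
  const endianSuffix = os.endianness() === 'LE' ? 'l' : 'b';
  return `icudt${major}${endianSuffix}.dat`;
}

console.log(expectedIcuDataFileName());
```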
+ +## Detecting internationalization support + +To verify that ICU is enabled at all (`system-icu`, `small-icu`, or +`full-icu`), simply checking the existence of `Intl` should suffice: + +```js +const hasICU = typeof Intl === 'object'; +``` + +Alternatively, checking for `process.versions.icu`, a property defined only +when ICU is enabled, works too: + +```js +const hasICU = typeof process.versions.icu === 'string'; +``` + +To check for support for a non-English locale (i.e. `full-icu` or +`system-icu`), [`Intl.DateTimeFormat`][] can be a good distinguishing factor: + +```js +const hasFullICU = (() => { + try { + const january = new Date(9e8); + const spanish = new Intl.DateTimeFormat('es', { month: 'long' }); + return spanish.format(january) === 'enero'; + } catch (err) { + return false; + } +})(); +``` + +For more verbose tests for `Intl` support, the following resources may be found +to be helpful: + +- [btest402][]: Generally used to check whether Node.js with `Intl` support is + built correctly. +- [Test262][]: ECMAScript's official conformance test suite includes a section + dedicated to ECMA-402. 
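The separate checks above can be combined into one classification. The sketch below is illustrative and not from these commits; `classifyIntlSupport` and its return labels are hypothetical. Note that a `system-icu` build with incomplete locale data would also be reported as `'english-only'`, so the label describes observed behavior rather than the `configure` option used.

```js
'use strict';

// Hypothetical helper combining the checks described above.
// 'none': ICU disabled; 'english-only': Spanish month names unavailable
// (as with small-icu); 'multi-locale': non-English data is present.
function classifyIntlSupport() {
  if (typeof Intl !== 'object') return 'none';
  try {
    // 1970-01-11T10:00:00Z, which is January in every time zone.
    const january = new Date(9e8);
    const spanish = new Intl.DateTimeFormat('es', { month: 'long' });
    return spanish.format(january) === 'enero' ? 'multi-locale' : 'english-only';
  } catch (err) {
    return 'english-only';
  }
}

console.log(classifyIntlSupport());
```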
+ +[btest402]: https://github.com/srl295/btest402 +[BUILDING.md]: https://github.com/nodejs/node/blob/master/BUILDING.md +[BUILDING.md#full-icu]: https://github.com/nodejs/node/blob/master/BUILDING.md#build-with-full-icu-support-all-locales-supported-by-icu +[`Date.prototype.toLocaleString()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Date/toLocaleString +[ECMA-262]: https://tc39.github.io/ecma262/ +[ECMA-402]: https://tc39.github.io/ecma402/ +[full-icu]: https://www.npmjs.com/package/full-icu +[ICU]: http://icu-project.org/ +[ICU Data]: http://userguide.icu-project.org/icudata +[`--icu-data-dir`]: https://nodejs.org/api/cli.html#cli_icu_data_dir_file +[internationalized domain names]: https://en.wikipedia.org/wiki/Internationalized_domain_name +[`Intl`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Intl +[`Intl.DateTimeFormat`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DateTimeFormat +[`NODE_ICU_DATA`]: https://nodejs.org/api/cli.html#cli_node_icu_data_file +[`Number.prototype.toLocaleString()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/toLocaleString +[`require('buffer').transcode()`]: https://nodejs.org/api/buffer.html#buffer_buffer_transcode_source_fromenc_toenc +[`String.prototype.localeCompare()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare +[`String.prototype.normalize()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/normalize +[`String.prototype.toLowerCase()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/toLowerCase +[`String.prototype.toUpperCase()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/toUpperCase +[Test262]: https://github.com/tc39/test262/tree/master/test/intl402 +[WHATWG URL parser]: 
https://nodejs.org/api/url.html#url_the_whatwg_url_api From a844e1f5bcca0e015e093de21824847cfe92bf0a Mon Sep 17 00:00:00 2001 From: Timothy Gu Date: Tue, 27 Jun 2017 09:41:47 +0800 Subject: [PATCH 2/6] Address jasnell's comments --- doc/api/intl.md | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/doc/api/intl.md b/doc/api/intl.md index 4e423bfd4d2bd7..fd8324200f1f54 100644 --- a/doc/api/intl.md +++ b/doc/api/intl.md @@ -18,9 +18,11 @@ programs. Some of them are: Node.js (and its underlying V8 engine) uses [ICU][] to implement these features in native C/C++ code. However, some of them require a very large ICU data file -in order to support all locales of the world. Since most Node.js users will -make use of only a small section in the full ICU data set, we provide several -options for customizing ICU support in a Node.js build. +in order to support all locales of the world. Because it is expected that most +Node.js users will make use of only a small portion of ICU functionality, only +a subset of the full ICU data set is provided by Node.js by default. Several +options are provided for customizing and expanding the ICU data set either when +building or running Node.js. ## Options for building Node.js @@ -62,7 +64,7 @@ will be **unavailable** in the resulting `node` binary. Node.js can link against an ICU build already installed on the system. In fact, most Linux distributions already come with ICU installed, and this option would -make it possible to reuse the same set of data used by other components in your +make it possible to reuse the same set of data used by other components in the OS. Functionalities that only require the ICU library itself, such as @@ -101,9 +103,9 @@ are also built in this mode. 
#### Providing ICU data at runtime -If you use the `small-icu` option, you can still provide additional locale data +If the `small-icu` option is used, one can still provide additional locale data at runtime so that the JS methods would work for all ICU locales. Assuming the -data file is stored at `/some/directory`, you could make ICU be aware of it +data file is stored at `/some/directory`, it can be made available to ICU through either: * The [`NODE_ICU_DATA`][] environmental variable: @@ -124,7 +126,7 @@ ICU is able to automatically find and load a variety of data formats, but the data must be appropriate for the ICU version, and the file correctly named. The most common name for the data file is `icudt5X[bl].dat`, where `5X` denotes the intended ICU version, and `b` or `l` indicates the system's endianness. -Check "[ICU Data][]" article in the ICU User Guide for other supported formats +Check ["ICU Data"][] article in the ICU User Guide for other supported formats and more details on ICU data in general. The [full-icu][] npm module can greatly simplify ICU data installation by @@ -188,7 +190,7 @@ to be helpful: [ECMA-402]: https://tc39.github.io/ecma402/ [full-icu]: https://www.npmjs.com/package/full-icu [ICU]: http://icu-project.org/ -[ICU Data]: http://userguide.icu-project.org/icudata +["ICU Data"]: http://userguide.icu-project.org/icudata [`--icu-data-dir`]: https://nodejs.org/api/cli.html#cli_icu_data_dir_file [internationalized domain names]: https://en.wikipedia.org/wiki/Internationalized_domain_name [`Intl`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Intl From 1ac14915d87cc1b353a28e1872e5dff56c908931 Mon Sep 17 00:00:00 2001 From: Timothy Gu Date: Tue, 27 Jun 2017 09:43:04 +0800 Subject: [PATCH 3/6] use relative links to api docs Looks like this is staying in node/node. 
--- doc/api/intl.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/api/intl.md b/doc/api/intl.md index fd8324200f1f54..821818efd2f6cf 100644 --- a/doc/api/intl.md +++ b/doc/api/intl.md @@ -191,16 +191,16 @@ to be helpful: [full-icu]: https://www.npmjs.com/package/full-icu [ICU]: http://icu-project.org/ ["ICU Data"]: http://userguide.icu-project.org/icudata -[`--icu-data-dir`]: https://nodejs.org/api/cli.html#cli_icu_data_dir_file +[`--icu-data-dir`]: cli.html#cli_icu_data_dir_file [internationalized domain names]: https://en.wikipedia.org/wiki/Internationalized_domain_name [`Intl`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Intl [`Intl.DateTimeFormat`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DateTimeFormat -[`NODE_ICU_DATA`]: https://nodejs.org/api/cli.html#cli_node_icu_data_file +[`NODE_ICU_DATA`]: cli.html#cli_node_icu_data_file [`Number.prototype.toLocaleString()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/toLocaleString -[`require('buffer').transcode()`]: https://nodejs.org/api/buffer.html#buffer_buffer_transcode_source_fromenc_toenc +[`require('buffer').transcode()`]: buffer.html#buffer_buffer_transcode_source_fromenc_toenc [`String.prototype.localeCompare()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare [`String.prototype.normalize()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/normalize [`String.prototype.toLowerCase()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/toLowerCase [`String.prototype.toUpperCase()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/toUpperCase [Test262]: https://github.com/tc39/test262/tree/master/test/intl402 -[WHATWG URL parser]: https://nodejs.org/api/url.html#url_the_whatwg_url_api +[WHATWG URL parser]: 
url.html#url_the_whatwg_url_api From 379d7b3c4fd466f4882a0aa28aa30ec6ecbfdf99 Mon Sep 17 00:00:00 2001 From: Timothy Gu Date: Tue, 27 Jun 2017 09:43:40 +0800 Subject: [PATCH 4/6] address vsemozhetbyt's comment --- doc/api/all.md | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/api/all.md b/doc/api/all.md index 7122fe96fa0128..24eda32f44d3b5 100644 --- a/doc/api/all.md +++ b/doc/api/all.md @@ -21,6 +21,7 @@ @include http @include https @include inspector +@include intl @include modules @include net @include os From 36e3695d9c1e3a6ac2466c4cfbc65a053c867f95 Mon Sep 17 00:00:00 2001 From: Timothy Gu Date: Tue, 27 Jun 2017 10:24:21 +0800 Subject: [PATCH 5/6] mention repl line editing --- doc/api/intl.md | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/doc/api/intl.md b/doc/api/intl.md index 821818efd2f6cf..41aa0879309b18 100644 --- a/doc/api/intl.md +++ b/doc/api/intl.md @@ -15,6 +15,7 @@ programs. Some of them are: [`Date.prototype.toLocaleString()`][] - The [WHATWG URL parser][]'s [internationalized domain names][] (IDNs) support - [`require('buffer').transcode()`][] +- More accurate [REPL][] line editing Node.js (and its underlying V8 engine) uses [ICU][] to implement these features in native C/C++ code. However, some of them require a very large ICU data file @@ -38,17 +39,18 @@ in [BUILDING.md][]. 
An overview of available Node.js and JavaScript features for each `configure` option: -| | `none` | `system-icu` | `small-icu` | `full-icu` -|-----------------------------------------|--------------------------------|------------------------------|------------------------|------------ -| [`String.prototype.normalize()`][] | none (function is no-op) | full | full | full -| `String.prototype.to*Case()` | full | full | full | full -| [`Intl`][] | none (object does not exist) | partial/full (depends on OS) | partial (English-only) | full -| [`String.prototype.localeCompare()`][] | partial (not locale-aware) | full | full | full -| `String.prototype.toLocale*Case()` | partial (not locale-aware) | full | full | full -| [`Number.prototype.toLocaleString()`][] | partial (not locale-aware) | partial/full (depends on OS) | partial (English-only) | full -| `Date.prototype.toLocale*String()` | partial (not locale-aware) | partial/full (depends on OS) | partial (English-only) | full -| [WHATWG URL Parser][] | partial (no IDN support) | full | full | full -| [`require('buffer').transcode()`][] | none (function does not exist) | full | full | full +| | `none` | `system-icu` | `small-icu` | `full-icu` +|-----------------------------------------|-----------------------------------|------------------------------|------------------------|------------ +| [`String.prototype.normalize()`][] | none (function is no-op) | full | full | full +| `String.prototype.to*Case()` | full | full | full | full +| [`Intl`][] | none (object does not exist) | partial/full (depends on OS) | partial (English-only) | full +| [`String.prototype.localeCompare()`][] | partial (not locale-aware) | full | full | full +| `String.prototype.toLocale*Case()` | partial (not locale-aware) | full | full | full +| [`Number.prototype.toLocaleString()`][] | partial (not locale-aware) | partial/full (depends on OS) | partial (English-only) | full +| `Date.prototype.toLocale*String()` | partial (not locale-aware) | 
partial/full (depends on OS) | partial (English-only) | full +| [WHATWG URL Parser][] | partial (no IDN support) | full | full | full +| [`require('buffer').transcode()`][] | none (function does not exist) | full | full | full +| [REPL][] | partial (inaccurate line editing) | full | full | full *Note*: The "(not locale-aware)" designation denotes that the function carries out its operation just like the non-`Locale` version of the function, if one @@ -197,6 +199,7 @@ to be helpful: [`Intl.DateTimeFormat`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DateTimeFormat [`NODE_ICU_DATA`]: cli.html#cli_node_icu_data_file [`Number.prototype.toLocaleString()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/toLocaleString +[REPL]: repl.html#repl_repl [`require('buffer').transcode()`]: buffer.html#buffer_buffer_transcode_source_fromenc_toenc [`String.prototype.localeCompare()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare [`String.prototype.normalize()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/normalize From 10beb9cbc84e403478caabeead5f2c20692735bb Mon Sep 17 00:00:00 2001 From: Timothy Gu Date: Wed, 5 Jul 2017 13:33:56 +0800 Subject: [PATCH 6/6] Adjust for linting --- doc/api/intl.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/api/intl.md b/doc/api/intl.md index 41aa0879309b18..c0d6b82289264f 100644 --- a/doc/api/intl.md +++ b/doc/api/intl.md @@ -93,10 +93,10 @@ const english = new Intl.DateTimeFormat('en', { month: 'long' }); const spanish = new Intl.DateTimeFormat('es', { month: 'long' }); console.log(english.format(january)); - // Prints "January" +// Prints "January" console.log(spanish.format(january)); - // Prints "M01" on small-icu - // Should print "enero" +// Prints "M01" on small-icu +// Should print "enero" ``` This mode provides a good balance between 
features and binary size, and it is