buffer: introduce latin1 encoding term
When node began using the OneByte API (f150d56) it also switched to
officially supporting ISO-8859-1, though at the time no new encoding
string was introduced.

Introduce the new encoding string 'latin1' to be more explicit. The
previous 'binary' is now documented as an alias to 'latin1'. While many
tests have switched to use 'latin1', there are still plenty that do both
'binary' and 'latin1' checks side-by-side to ensure there is no
regression.

PR-URL: #7111
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
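A minimal sketch of the alias described above, assuming a Node.js release that includes this change (`Buffer.from()` is the same API the `_tls_wrap.js` hunk uses). The sample string is arbitrary; the last line illustrates that characters above U+00FF do not survive a 'latin1' encode.

```js
const { Buffer } = require('buffer');

// Every character here is below U+0100, so 'latin1' stores one byte each.
const text = 'caf\u00e9 \u00bd'; // "café ½"

const viaLatin1 = Buffer.from(text, 'latin1');
const viaBinary = Buffer.from(text, 'binary'); // alias for 'latin1'

console.log(viaLatin1.toString('hex'));             // 636166e920bd
console.log(viaLatin1.equals(viaBinary));           // true
console.log(viaBinary.toString('latin1') === text); // true

// Characters above U+00FF do not fit in one byte and are truncated
// to their low byte: U+20AC (euro sign) becomes 0xac.
console.log(Buffer.from('\u20ac', 'latin1').toString('hex')); // ac
```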
trevnorris authored and addaleax committed Aug 8, 2016
1 parent 75b37a6 commit 28071a1
Showing 37 changed files with 246 additions and 124 deletions.
16 changes: 13 additions & 3 deletions doc/api/buffer.md
@@ -165,12 +165,22 @@ The character encodings currently supported by Node.js include:
this encoding will also correctly accept "URL and Filename Safe Alphabet" as
specified in [RFC 4648, Section 5].

* `'binary'` - A way of encoding the buffer into a one-byte (`latin-1`)
encoded string. The string `'latin-1'` is not supported. Instead, pass
`'binary'` to use `'latin-1'` encoding.
* `'latin1'` - A way of encoding the buffer into a one-byte encoded string
(as defined by the IANA in [RFC1345](https://tools.ietf.org/html/rfc1345),
page 63, to be the Latin-1 supplement block and C0/C1 control codes).

* `'binary'` - (deprecated) A way of encoding the buffer into a one-byte
(`latin1`) encoded string.

* `'hex'` - Encode each byte as two hexadecimal characters.

_Note_: Today's browsers follow the [WHATWG
spec](https://encoding.spec.whatwg.org/) that aliases both `latin1` and
`iso-8859-1` to `win-1252`. Meaning, while doing something like `http.get()`,
if the returned charset is one of those listed in the WHATWG spec it's possible
that the server actually returned `win-1252` encoded data, and using `latin1`
encoding may incorrectly decode the graphical characters.

## Buffers and TypedArray

Buffers are also `Uint8Array` TypedArray instances. However, there are subtle
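The WHATWG note in the `buffer.md` hunk can be made concrete. The sketch below assumes a server labelled its body `iso-8859-1` but actually sent windows-1252 bytes. `WIN1252_PARTIAL` and `decodeWin1252Partial()` are hypothetical and cover only a handful of the bytes 0x80 to 0x9F where the two encodings differ; real code would more likely use a complete decoder such as `TextDecoder` with the `'windows-1252'` label, where the Node.js build supports it.

```js
const { Buffer } = require('buffer');

// Partial windows-1252 table: only a few of the bytes in 0x80-0x9F, the
// range where windows-1252 and ISO-8859-1 (Node's 'latin1') disagree.
const WIN1252_PARTIAL = new Map([
  [0x80, '\u20ac'], // euro sign
  [0x91, '\u2018'], // left single quotation mark
  [0x92, '\u2019'], // right single quotation mark
  [0x93, '\u201c'], // left double quotation mark
  [0x94, '\u201d'], // right double quotation mark
  [0x96, '\u2013'], // en dash
  [0x97, '\u2014'], // em dash
]);

// Hypothetical helper: bytes in the table use their windows-1252 mapping;
// every other byte maps to the code point with the same value, as in latin1.
function decodeWin1252Partial(buf) {
  let out = '';
  for (const byte of buf) {
    out += WIN1252_PARTIAL.get(byte) ?? String.fromCharCode(byte);
  }
  return out;
}

// Bytes for a quoted "Hi" as a windows-1252 server would send them.
const body = Buffer.from([0x93, 0x48, 0x69, 0x94]);

const asLatin1 = body.toString('latin1');
// Code points 0x93 and 0x94 are C1 control codes, not quotation marks.
console.log([...asLatin1].map((c) => c.codePointAt(0).toString(16)));
console.log(decodeWin1252Partial(body)); // “Hi”
```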
68 changes: 34 additions & 34 deletions doc/api/crypto.md
@@ -160,7 +160,7 @@ console.log(encrypted);
### cipher.final([output_encoding])

Returns any remaining enciphered contents. If `output_encoding`
parameter is one of `'binary'`, `'base64'` or `'hex'`, a string is returned.
parameter is one of `'latin1'`, `'base64'` or `'hex'`, a string is returned.
If an `output_encoding` is not provided, a [`Buffer`][] is returned.

Once the `cipher.final()` method has been called, the `Cipher` object can no
@@ -198,13 +198,13 @@ The `cipher.setAutoPadding()` method must be called before [`cipher.final()`][].
### cipher.update(data[, input_encoding][, output_encoding])

Updates the cipher with `data`. If the `input_encoding` argument is given,
it's value must be one of `'utf8'`, `'ascii'`, or `'binary'` and the `data`
it's value must be one of `'utf8'`, `'ascii'`, or `'latin1'` and the `data`
argument is a string using the specified encoding. If the `input_encoding`
argument is not given, `data` must be a [`Buffer`][]. If `data` is a
[`Buffer`][] then `input_encoding` is ignored.

The `output_encoding` specifies the output format of the enciphered
data, and can be `'binary'`, `'base64'` or `'hex'`. If the `output_encoding`
data, and can be `'latin1'`, `'base64'` or `'hex'`. If the `output_encoding`
is specified, a string using the specified encoding is returned. If no
`output_encoding` is provided, a [`Buffer`][] is returned.

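For `cipher.update()` and `cipher.final()`, a `'latin1'` output string carries exactly one character per ciphertext byte. This sketch uses `crypto.createCipheriv()` with a fixed, made-up key and IV (assumptions chosen only so that two runs can be compared) and checks that the string form converts back to the same bytes as the `Buffer` form.

```js
const crypto = require('crypto');
const { Buffer } = require('buffer');

// Fixed key and IV only so the two runs below are comparable;
// never reuse an IV like this in real code.
const key = Buffer.alloc(32, 1); // 256-bit key for aes-256-cbc
const iv = Buffer.alloc(16, 2);  // 128-bit IV

function encrypt(plaintext, outputEncoding) {
  const cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
  // Without an output encoding, update() and final() return Buffers.
  const head = cipher.update(plaintext, 'utf8', outputEncoding);
  const tail = cipher.final(outputEncoding);
  return outputEncoding ? head + tail : Buffer.concat([head, tail]);
}

const asLatin1 = encrypt('hello, latin1', 'latin1');
const asBuffer = encrypt('hello, latin1');

// One character per ciphertext byte, so the string converts back losslessly.
console.log(asLatin1.length, asBuffer.length);                 // 16 16
console.log(Buffer.from(asLatin1, 'latin1').equals(asBuffer)); // true
```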
@@ -277,7 +277,7 @@ console.log(decrypted);
### decipher.final([output_encoding])

Returns any remaining deciphered contents. If `output_encoding`
parameter is one of `'binary'`, `'base64'` or `'hex'`, a string is returned.
parameter is one of `'latin1'`, `'base64'` or `'hex'`, a string is returned.
If an `output_encoding` is not provided, a [`Buffer`][] is returned.

Once the `decipher.final()` method has been called, the `Decipher` object can
@@ -313,13 +313,13 @@ The `decipher.setAutoPadding()` method must be called before
### decipher.update(data[, input_encoding][, output_encoding])

Updates the decipher with `data`. If the `input_encoding` argument is given,
it's value must be one of `'binary'`, `'base64'`, or `'hex'` and the `data`
it's value must be one of `'latin1'`, `'base64'`, or `'hex'` and the `data`
argument is a string using the specified encoding. If the `input_encoding`
argument is not given, `data` must be a [`Buffer`][]. If `data` is a
[`Buffer`][] then `input_encoding` is ignored.

The `output_encoding` specifies the output format of the enciphered
data, and can be `'binary'`, `'ascii'` or `'utf8'`. If the `output_encoding`
data, and can be `'latin1'`, `'ascii'` or `'utf8'`. If the `output_encoding`
is specified, a string using the specified encoding is returned. If no
`output_encoding` is provided, a [`Buffer`][] is returned.

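Going the other way, `decipher.update()` can take the ciphertext as a `'latin1'` string when `input_encoding` says so. A minimal round trip, assuming random key material and an arbitrary non-ASCII plaintext:

```js
const crypto = require('crypto');

const key = crypto.randomBytes(32); // aes-256-cbc key
const iv = crypto.randomBytes(16);

// Encrypt to a 'latin1' string, as a legacy caller might store it.
const cipher = crypto.createCipheriv('aes-256-cbc', key, iv);
const ciphertext =
  cipher.update('gr\u00fc\u00dfe', 'utf8', 'latin1') + cipher.final('latin1');

// input_encoding 'latin1' tells decipher.update() how to turn the string
// back into bytes; output_encoding 'utf8' decodes the recovered plaintext.
const decipher = crypto.createDecipheriv('aes-256-cbc', key, iv);
const plaintext =
  decipher.update(ciphertext, 'latin1', 'utf8') + decipher.final('utf8');

console.log(plaintext); // grüße
```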
@@ -361,7 +361,7 @@ Computes the shared secret using `other_public_key` as the other
party's public key and returns the computed shared secret. The supplied
key is interpreted using the specified `input_encoding`, and secret is
encoded using specified `output_encoding`. Encodings can be
`'binary'`, `'hex'`, or `'base64'`. If the `input_encoding` is not
`'latin1'`, `'hex'`, or `'base64'`. If the `input_encoding` is not
provided, `other_public_key` is expected to be a [`Buffer`][].

If `output_encoding` is given a string is returned; otherwise, a
@@ -371,45 +371,45 @@ If `output_encoding` is given a string is returned; otherwise, a

Generates private and public Diffie-Hellman key values, and returns
the public key in the specified `encoding`. This key should be
transferred to the other party. Encoding can be `'binary'`, `'hex'`,
transferred to the other party. Encoding can be `'latin1'`, `'hex'`,
or `'base64'`. If `encoding` is provided a string is returned; otherwise a
[`Buffer`][] is returned.

### diffieHellman.getGenerator([encoding])

Returns the Diffie-Hellman generator in the specified `encoding`, which can
be `'binary'`, `'hex'`, or `'base64'`. If `encoding` is provided a string is
be `'latin1'`, `'hex'`, or `'base64'`. If `encoding` is provided a string is
returned; otherwise a [`Buffer`][] is returned.

### diffieHellman.getPrime([encoding])

Returns the Diffie-Hellman prime in the specified `encoding`, which can
be `'binary'`, `'hex'`, or `'base64'`. If `encoding` is provided a string is
be `'latin1'`, `'hex'`, or `'base64'`. If `encoding` is provided a string is
returned; otherwise a [`Buffer`][] is returned.

### diffieHellman.getPrivateKey([encoding])

Returns the Diffie-Hellman private key in the specified `encoding`,
which can be `'binary'`, `'hex'`, or `'base64'`. If `encoding` is provided a
which can be `'latin1'`, `'hex'`, or `'base64'`. If `encoding` is provided a
string is returned; otherwise a [`Buffer`][] is returned.

### diffieHellman.getPublicKey([encoding])

Returns the Diffie-Hellman public key in the specified `encoding`, which
can be `'binary'`, `'hex'`, or `'base64'`. If `encoding` is provided a
can be `'latin1'`, `'hex'`, or `'base64'`. If `encoding` is provided a
string is returned; otherwise a [`Buffer`][] is returned.

### diffieHellman.setPrivateKey(private_key[, encoding])

Sets the Diffie-Hellman private key. If the `encoding` argument is provided
and is either `'binary'`, `'hex'`, or `'base64'`, `private_key` is expected
and is either `'latin1'`, `'hex'`, or `'base64'`, `private_key` is expected
to be a string. If no `encoding` is provided, `private_key` is expected
to be a [`Buffer`][].

### diffieHellman.setPublicKey(public_key[, encoding])

Sets the Diffie-Hellman public key. If the `encoding` argument is provided
and is either `'binary'`, `'hex'` or `'base64'`, `public_key` is expected
and is either `'latin1'`, `'hex'` or `'base64'`, `public_key` is expected
to be a string. If no `encoding` is provided, `public_key` is expected
to be a [`Buffer`][].

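The Diffie-Hellman methods above accept `'latin1'` wherever they accept `'hex'` or `'base64'`. A sketch of a two-party exchange, assuming the predefined `'modp14'` group from `crypto.getDiffieHellman()` so that no prime has to be generated:

```js
const crypto = require('crypto');
const { Buffer } = require('buffer');

// Predefined 2048-bit MODP group, so no prime has to be generated.
const alice = crypto.getDiffieHellman('modp14');
const bob = crypto.getDiffieHellman('modp14');

// With an encoding argument, generateKeys() returns the public key as a string.
const alicePub = alice.generateKeys('latin1');
const bobPub = bob.generateKeys('latin1');

// computeSecret(otherPublicKey, inputEncoding, outputEncoding)
const aliceSecret = alice.computeSecret(bobPub, 'latin1', 'hex');
const bobSecret = bob.computeSecret(alicePub, 'latin1', 'hex');

console.log(aliceSecret === bobSecret); // true
// The 'latin1' string and the Buffer form carry the same bytes.
console.log(Buffer.from(alicePub, 'latin1').equals(alice.getPublicKey())); // true
```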
@@ -460,7 +460,7 @@ Computes the shared secret using `other_public_key` as the other
party's public key and returns the computed shared secret. The supplied
key is interpreted using specified `input_encoding`, and the returned secret
is encoded using the specified `output_encoding`. Encodings can be
`'binary'`, `'hex'`, or `'base64'`. If the `input_encoding` is not
`'latin1'`, `'hex'`, or `'base64'`. If the `input_encoding` is not
provided, `other_public_key` is expected to be a [`Buffer`][].

If `output_encoding` is given a string will be returned; otherwise a
@@ -476,14 +476,14 @@ The `format` arguments specifies point encoding and can be `'compressed'`,
`'uncompressed'`, or `'hybrid'`. If `format` is not specified, the point will
be returned in `'uncompressed'` format.

The `encoding` argument can be `'binary'`, `'hex'`, or `'base64'`. If
The `encoding` argument can be `'latin1'`, `'hex'`, or `'base64'`. If
`encoding` is provided a string is returned; otherwise a [`Buffer`][]
is returned.

### ecdh.getPrivateKey([encoding])

Returns the EC Diffie-Hellman private key in the specified `encoding`,
which can be `'binary'`, `'hex'`, or `'base64'`. If `encoding` is provided
which can be `'latin1'`, `'hex'`, or `'base64'`. If `encoding` is provided
a string is returned; otherwise a [`Buffer`][] is returned.

### ecdh.getPublicKey([encoding[, format]])
@@ -495,13 +495,13 @@ The `format` argument specifies point encoding and can be `'compressed'`,
`'uncompressed'`, or `'hybrid'`. If `format` is not specified the point will be
returned in `'uncompressed'` format.

The `encoding` argument can be `'binary'`, `'hex'`, or `'base64'`. If
The `encoding` argument can be `'latin1'`, `'hex'`, or `'base64'`. If
`encoding` is specified, a string is returned; otherwise a [`Buffer`][] is
returned.

### ecdh.setPrivateKey(private_key[, encoding])

Sets the EC Diffie-Hellman private key. The `encoding` can be `'binary'`,
Sets the EC Diffie-Hellman private key. The `encoding` can be `'latin1'`,
`'hex'` or `'base64'`. If `encoding` is provided, `private_key` is expected
to be a string; otherwise `private_key` is expected to be a [`Buffer`][]. If
`private_key` is not valid for the curve specified when the `ECDH` object was
@@ -512,7 +512,7 @@ public point (key) is also generated and set in the ECDH object.

Stability: 0 - Deprecated

Sets the EC Diffie-Hellman public key. Key encoding can be `'binary'`,
Sets the EC Diffie-Hellman public key. Key encoding can be `'latin1'`,
`'hex'` or `'base64'`. If `encoding` is provided `public_key` is expected to
be a string; otherwise a [`Buffer`][] is expected.

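The same encoding argument applies to the ECDH methods. This sketch assumes the `'prime256v1'` curve and uses a throwaway `donor` object only as a stand-in for a private key held somewhere as a `'latin1'` string; it imports that key with `ecdh.setPrivateKey()` and then exchanges a compressed public point.

```js
const crypto = require('crypto');

// Obtain a valid P-256 private key as a 'latin1' string (stand-in for a
// key loaded from legacy storage).
const donor = crypto.createECDH('prime256v1');
donor.generateKeys();
const privLatin1 = donor.getPrivateKey('latin1');

// setPrivateKey() with an encoding expects a string and also derives the
// matching public point.
const alice = crypto.createECDH('prime256v1');
alice.setPrivateKey(privLatin1, 'latin1');

const bob = crypto.createECDH('prime256v1');
bob.generateKeys();

// Compressed point: 1 prefix byte + 32-byte x coordinate.
const alicePub = alice.getPublicKey('latin1', 'compressed');

const bobSecret = bob.computeSecret(alicePub, 'latin1', 'hex');
const aliceSecret = alice.computeSecret(bob.getPublicKey()).toString('hex');

console.log(alicePub.length);           // 33
console.log(aliceSecret === bobSecret); // true
```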
@@ -604,7 +604,7 @@ console.log(hash.digest('hex'));
### hash.digest([encoding])

Calculates the digest of all of the data passed to be hashed (using the
[`hash.update()`][] method). The `encoding` can be `'hex'`, `'binary'` or
[`hash.update()`][] method). The `encoding` can be `'hex'`, `'latin1'` or
`'base64'`. If `encoding` is provided a string will be returned; otherwise
a [`Buffer`][] is returned.

@@ -615,7 +615,7 @@ called. Multiple calls will cause an error to be thrown.

Updates the hash content with the given `data`, the encoding of which
is given in `input_encoding` and can be `'utf8'`, `'ascii'` or
`'binary'`. If `encoding` is not provided, and the `data` is a string, an
`'latin1'`. If `encoding` is not provided, and the `data` is a string, an
encoding of `'utf8'` is enforced. If `data` is a [`Buffer`][] then
`input_encoding` is ignored.

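The choice of `input_encoding` in `hash.update()` decides which bytes are hashed, which only matters for characters outside ASCII. A small illustration, assuming SHA-256 and a one-character input:

```js
const crypto = require('crypto');
const { Buffer } = require('buffer');

const sha256Hex = (data, encoding) =>
  crypto.createHash('sha256').update(data, encoding).digest('hex');

// 'latin1' turns "é" into the single byte 0xe9; 'utf8' gives 0xc3 0xa9.
const latin1Hex = sha256Hex('\u00e9', 'latin1');
const bufferHex = sha256Hex(Buffer.from([0xe9]));
const utf8Hex = sha256Hex('\u00e9', 'utf8');

console.log(latin1Hex === bufferHex); // true
console.log(latin1Hex === utf8Hex);   // false

// digest('latin1') yields one character per digest byte: 32 for SHA-256.
const digestStr = crypto.createHash('sha256').update('abc').digest('latin1');
console.log(digestStr.length); // 32
```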
@@ -678,7 +678,7 @@ console.log(hmac.digest('hex'));
### hmac.digest([encoding])

Calculates the HMAC digest of all of the data passed using [`hmac.update()`][].
The `encoding` can be `'hex'`, `'binary'` or `'base64'`. If `encoding` is
The `encoding` can be `'hex'`, `'latin1'` or `'base64'`. If `encoding` is
provided a string is returned; otherwise a [`Buffer`][] is returned;

The `Hmac` object can not be used again after `hmac.digest()` has been
@@ -688,7 +688,7 @@ called. Multiple calls to `hmac.digest()` will result in an error being thrown.

Updates the `Hmac` content with the given `data`, the encoding of which
is given in `input_encoding` and can be `'utf8'`, `'ascii'` or
`'binary'`. If `encoding` is not provided, and the `data` is a string, an
`'latin1'`. If `encoding` is not provided, and the `data` is a string, an
encoding of `'utf8'` is enforced. If `data` is a [`Buffer`][] then
`input_encoding` is ignored.

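`hmac.digest('latin1')` returns a string with one character per MAC byte, so it can be turned back into a `Buffer` for comparison. A sketch, assuming SHA-256, the key and message from RFC 4231 test case 2, and a hypothetical `verifyLatin1Mac()` helper that compares in constant time with `crypto.timingSafeEqual()`:

```js
const crypto = require('crypto');
const { Buffer } = require('buffer');

const key = 'Jefe';
const message = 'what do ya want for nothing?';

// MAC as a 'latin1' string: one character per byte (32 for SHA-256).
const mac = crypto.createHmac('sha256', key).update(message, 'latin1').digest('latin1');

// Hypothetical helper: recompute the MAC and compare it with the received
// 'latin1' string in constant time.
function verifyLatin1Mac(k, msg, macLatin1) {
  const expected = crypto.createHmac('sha256', k).update(msg, 'latin1').digest();
  const received = Buffer.from(macLatin1, 'latin1');
  // timingSafeEqual() throws on a length mismatch, so check length first.
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

console.log(Buffer.from(mac, 'latin1').toString('hex'));
console.log(verifyLatin1Mac(key, message, mac));       // true
console.log(verifyLatin1Mac(key, message + '!', mac)); // false
```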
@@ -768,7 +768,7 @@ object, it is interpreted as a hash containing two properties:
* `key` : {String} - PEM encoded private key
* `passphrase` : {String} - passphrase for the private key

The `output_format` can specify one of `'binary'`, `'hex'` or `'base64'`. If
The `output_format` can specify one of `'latin1'`, `'hex'` or `'base64'`. If
`output_format` is provided a string is returned; otherwise a [`Buffer`][] is
returned.

@@ -779,7 +779,7 @@ called. Multiple calls to `sign.sign()` will result in an error being thrown.

Updates the `Sign` content with the given `data`, the encoding of which
is given in `input_encoding` and can be `'utf8'`, `'ascii'` or
`'binary'`. If `encoding` is not provided, and the `data` is a string, an
`'latin1'`. If `encoding` is not provided, and the `data` is a string, an
encoding of `'utf8'` is enforced. If `data` is a [`Buffer`][] then
`input_encoding` is ignored.

@@ -831,7 +831,7 @@ console.log(verify.verify(public_key, signature));

Updates the `Verify` content with the given `data`, the encoding of which
is given in `input_encoding` and can be `'utf8'`, `'ascii'` or
`'binary'`. If `encoding` is not provided, and the `data` is a string, an
`'latin1'`. If `encoding` is not provided, and the `data` is a string, an
encoding of `'utf8'` is enforced. If `data` is a [`Buffer`][] then
`input_encoding` is ignored.

@@ -843,7 +843,7 @@ Verifies the provided data using the given `object` and `signature`.
The `object` argument is a string containing a PEM encoded object, which can be
one an RSA public key, a DSA public key, or an X.509 certificate.
The `signature` argument is the previously calculated signature for the data, in
the `signature_format` which can be `'binary'`, `'hex'` or `'base64'`.
the `signature_format` which can be `'latin1'`, `'hex'` or `'base64'`.
If a `signature_format` is specified, the `signature` is expected to be a
string; otherwise `signature` is expected to be a [`Buffer`][].

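`sign.sign()` and `verify.verify()` can exchange the signature as a `'latin1'` string. This sketch assumes an EC key pair from `crypto.generateKeyPairSync()`, which is a newer API than this commit and is used here only to obtain PEM strings quickly:

```js
const crypto = require('crypto');

// PEM-encoded P-256 key pair; generateKeyPairSync() is newer than this commit.
const { privateKey, publicKey } = crypto.generateKeyPairSync('ec', {
  namedCurve: 'prime256v1',
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
});

const data = 'some data to sign';

const signer = crypto.createSign('SHA256');
signer.update(data, 'latin1');
// output_format 'latin1': the DER-encoded signature comes back as a string.
const signature = signer.sign(privateKey, 'latin1');

const verifier = crypto.createVerify('SHA256');
verifier.update(data, 'latin1');
// signature_format 'latin1' tells verify() how to decode the string.
const ok = verifier.verify(publicKey, signature, 'latin1');

console.log(typeof signature, ok); // string true
```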
@@ -869,7 +869,7 @@ or [buffers][`Buffer`]. The default value is `'buffer'`, which makes methods
default to [`Buffer`][] objects.

The `crypto.DEFAULT_ENCODING` mechanism is provided for backwards compatibility
with legacy programs that expect `'binary'` to be the default encoding.
with legacy programs that expect `'latin1'` to be the default encoding.

New applications should expect the default to be `'buffer'`. This property may
become deprecated in a future Node.js release.
@@ -889,7 +889,7 @@ recent OpenSSL releases, `openssl list-cipher-algorithms` will display the
available cipher algorithms.

The `password` is used to derive the cipher key and initialization vector (IV).
The value must be either a `'binary'` encoded string or a [`Buffer`][].
The value must be either a `'latin1'` encoded string or a [`Buffer`][].

The implementation of `crypto.createCipher()` derives keys using the OpenSSL
function [`EVP_BytesToKey`][] with the digest algorithm set to MD5, one
@@ -913,7 +913,7 @@ recent OpenSSL releases, `openssl list-cipher-algorithms` will display the
available cipher algorithms.

The `key` is the raw key used by the `algorithm` and `iv` is an
[initialization vector][]. Both arguments must be `'binary'` encoded strings or
[initialization vector][]. Both arguments must be `'latin1'` encoded strings or
[buffers][`Buffer`].

### crypto.createCredentials(details)
@@ -968,7 +968,7 @@ recent OpenSSL releases, `openssl list-cipher-algorithms` will display the
available cipher algorithms.

The `key` is the raw key used by the `algorithm` and `iv` is an
[initialization vector][]. Both arguments must be `'binary'` encoded strings or
[initialization vector][]. Both arguments must be `'latin1'` encoded strings or
[buffers][`Buffer`].

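When key and IV arrive as `'latin1'` strings, converting them explicitly with `Buffer.from(..., 'latin1')` pins down one byte per character regardless of how a given Node.js version interprets string keys. A sketch with made-up key and IV values chosen so that every character is above 0x7F:

```js
const crypto = require('crypto');
const { Buffer } = require('buffer');

// Made-up key and IV delivered as 'latin1' strings (for example by older
// code); every character is above 0x7F, so UTF-8 would need two bytes each.
const keyLatin1 = Buffer.alloc(16, 0xff).toString('latin1'); // 16 x U+00FF
const ivLatin1 = Buffer.alloc(16, 0xa5).toString('latin1');  // 16 x U+00A5

// Explicit conversion: exactly one byte per character.
const key = Buffer.from(keyLatin1, 'latin1'); // 16 bytes, i.e. AES-128
const iv = Buffer.from(ivLatin1, 'latin1');

const cipher = crypto.createCipheriv('aes-128-cbc', key, iv);
const ct = Buffer.concat([cipher.update('attack at dawn', 'utf8'), cipher.final()]);

const decipher = crypto.createDecipheriv('aes-128-cbc', key, iv);
const pt = decipher.update(ct, undefined, 'utf8') + decipher.final('utf8');

console.log(key.length, Buffer.byteLength(keyLatin1, 'utf8')); // 16 32
console.log(pt); // attack at dawn
```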
### crypto.createDiffieHellman(prime[, prime_encoding][, generator][, generator_encoding])
@@ -979,7 +979,7 @@ optional specific `generator`.
The `generator` argument can be a number, string, or [`Buffer`][]. If
`generator` is not specified, the value `2` is used.

The `prime_encoding` and `generator_encoding` arguments can be `'binary'`,
The `prime_encoding` and `generator_encoding` arguments can be `'latin1'`,
`'hex'`, or `'base64'`.

If `prime_encoding` is specified, `prime` is expected to be a string; otherwise
@@ -1345,7 +1345,7 @@ unified Stream API, and before there were [`Buffer`][] objects for handling
binary data. As such, the many of the `crypto` defined classes have methods not
typically found on other Node.js classes that implement the [streams][stream]
API (e.g. `update()`, `final()`, or `digest()`). Also, many methods accepted
and returned `'binary'` encoded strings by default rather than Buffers. This
and returned `'latin1'` encoded strings by default rather than Buffers. This
default was changed after Node.js v0.8 to use [`Buffer`][] objects by default
instead.

10 changes: 5 additions & 5 deletions lib/_http_outgoing.js
@@ -130,7 +130,7 @@ OutgoingMessage.prototype._send = function(data, encoding, callback) {
data = this._header + data;
} else {
this.output.unshift(this._header);
this.outputEncodings.unshift('binary');
this.outputEncodings.unshift('latin1');
this.outputCallbacks.unshift(null);
this.outputSize += this._header.length;
if (typeof this._onPendingData === 'function')
@@ -453,7 +453,7 @@ OutgoingMessage.prototype.write = function(chunk, encoding, callback) {
if (typeof chunk === 'string' &&
encoding !== 'hex' &&
encoding !== 'base64' &&
encoding !== 'binary') {
encoding !== 'latin1') {
len = Buffer.byteLength(chunk, encoding);
chunk = len.toString(16) + CRLF + chunk + CRLF;
ret = this._send(chunk, encoding, callback);
@@ -468,7 +468,7 @@ OutgoingMessage.prototype.write = function(chunk, encoding, callback) {
this.connection.cork();
process.nextTick(connectionCorkNT, this.connection);
}
this._send(len.toString(16), 'binary', null);
this._send(len.toString(16), 'latin1', null);
this._send(crlf_buf, null, null);
this._send(chunk, encoding, null);
ret = this._send(crlf_buf, null, callback);
@@ -581,10 +581,10 @@ OutgoingMessage.prototype.end = function(data, encoding, callback) {
};

if (this._hasBody && this.chunkedEncoding) {
ret = this._send('0\r\n' + this._trailer + '\r\n', 'binary', finish);
ret = this._send('0\r\n' + this._trailer + '\r\n', 'latin1', finish);
} else {
// Force a flush, HACK.
ret = this._send('', 'binary', finish);
ret = this._send('', 'latin1', finish);
}

if (this.connection && data)
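The `_http_outgoing.js` hunks write the chunk-size line and the terminating `'0\r\n' + trailer + '\r\n'` block as `'latin1'` strings; since that text is plain ASCII, each character becomes one byte. A rough sketch of HTTP/1.1 chunk framing with `Buffer`s, using hypothetical `frameChunk()` and `lastChunk()` helpers that are not Node's internal API:

```js
const { Buffer } = require('buffer');

const CRLF = '\r\n';

// Hypothetical helper (not Node's internal API): frame one HTTP/1.1 chunk
// as "<size in hex>\r\n<data>\r\n". The size is counted in bytes.
function frameChunk(chunk, encoding = 'utf8') {
  const data = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk, encoding);
  return Buffer.concat([
    Buffer.from(data.length.toString(16) + CRLF, 'latin1'), // ASCII only
    data,
    Buffer.from(CRLF, 'latin1'),
  ]);
}

// Hypothetical helper: terminating chunk plus optional trailer lines,
// mirroring '0\r\n' + trailer + '\r\n' from the end() hunk.
function lastChunk(trailer = '') {
  return Buffer.from('0' + CRLF + trailer + CRLF, 'latin1');
}

// "héllo" is 5 characters but 6 bytes in UTF-8, so the size line is "6".
const wire = Buffer.concat([frameChunk('h\u00e9llo'), lastChunk()]);
console.log(wire.length); // 16
```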
2 changes: 1 addition & 1 deletion lib/_tls_wrap.js
@@ -608,7 +608,7 @@ TLSSocket.prototype.setServername = function(name) {

TLSSocket.prototype.setSession = function(session) {
if (typeof session === 'string')
session = Buffer.from(session, 'binary');
session = Buffer.from(session, 'latin1');
this._handle.setSession(session);
};

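`TLSSocket.prototype.setSession()` converts a string session with `Buffer.from(session, 'latin1')`, which is lossless for arbitrary bytes, whereas UTF-8 is not. The sketch assumes a stand-in `Buffer` containing every byte value instead of a real session from `tlsSocket.getSession()`:

```js
const { Buffer } = require('buffer');

// Stand-in for an opaque session blob: every byte value 0x00-0xff once.
const session = Buffer.from(Array.from({ length: 256 }, (_, i) => i));

const asLatin1 = session.toString('latin1');
const asUtf8 = session.toString('utf8');

// latin1 maps each byte to one code point and back, so nothing is lost.
console.log(Buffer.from(asLatin1, 'latin1').equals(session)); // true
// In this sequence bytes 0x80-0xff never form valid UTF-8, so they
// decode to U+FFFD and the original bytes cannot be recovered.
console.log(Buffer.from(asUtf8, 'utf8').equals(session)); // false
```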

1 comment on commit 28071a1

@Fishrock123
Contributor

PR-URL: #8022