utf-8

Encode/decode UTF-8.

As you probably noticed, there are already standard ways to encode and decode UTF-8 strings to and from buffers in the Node.js standard library.

This library can be useful if you need to write at given buffer indexes or to validate UTF-8 encoded buffers.

Otherwise, use the Node.js standard library.
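
For comparison, here is a minimal sketch of the standard-library route mentioned above. It relies only on the `TextEncoder`, `TextDecoder` and `Buffer` globals that Node.js provides; the variable names are illustrative.

```javascript
// Round-trip a string through UTF-8 using only Node.js built-ins.
const text = '1.3$ ~= 1€';

// TextEncoder always produces UTF-8 and returns a Uint8Array.
const encoded = new TextEncoder().encode(text);

// TextDecoder defaults to UTF-8.
const decoded = new TextDecoder().decode(encoded);

// Buffer offers the same round trip through a Node.js-specific API.
const buffer = Buffer.from(text, 'utf8');
const fromBuffer = buffer.toString('utf8');

console.log(Array.from(encoded));
// 49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172
console.log(decoded === text, fromBuffer === text);
// true true
```

These APIs cover whole-string conversion, which is why this library is mostly of interest for the offset-based and validation use cases.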

Usage

npm install utf-8

Encoding

A char:

import * as UTF8 from 'utf-8';

UTF8.setBytesFromCharCode('é'.charCodeAt(0));
// [0xC3, 0xA9]

A string:

UTF8.setBytesFromString('1.3$ ~= 1€');
// [49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172]
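
To see where byte values such as `[0xC3, 0xA9]` come from, the following sketch spells out the UTF-8 bit layout for a single code point. `encodeCodePoint` is a hypothetical helper written for illustration only; it is not this library's implementation, and it assumes its input is a valid code point (it does not reject surrogates or values above 0x10FFFF).

```javascript
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumes a valid code point; surrogates and out-of-range values are not rejected.
function encodeCodePoint(codePoint) {
  if (codePoint < 0x80) {
    // 1 byte: 0xxxxxxx
    return [codePoint];
  }
  if (codePoint < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xc0 | (codePoint >> 6), 0x80 | (codePoint & 0x3f)];
  }
  if (codePoint < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [
      0xe0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3f),
      0x80 | (codePoint & 0x3f),
    ];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3f),
    0x80 | ((codePoint >> 6) & 0x3f),
    0x80 | (codePoint & 0x3f),
  ];
}

encodeCodePoint('é'.codePointAt(0)); // [0xC3, 0xA9]
encodeCodePoint('€'.codePointAt(0)); // [0xE2, 0x82, 0xAC], i.e. [226, 130, 172]
```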

Decoding

A char:

String.fromCharCode(UTF8.getCharCode([0xc3, 0xa9]));
// 'é'

A string:

UTF8.getStringFromBytes([49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172]);
// '1.3$ ~= 1€'

TypedArrays are welcome

As inputs:

const bytes = new Uint8Array([
  0xc3, 0xa9, 49, 46, 51, 36, 32, 126, 61, 32, 49, 226, 130, 172,
]);

// The first char
String.fromCharCode(UTF8.getCharCode(bytes));
// é

// The following string at the offset 2
UTF8.getStringFromBytes(bytes, 2);
// '1.3$ ~= 1€'

As well as outputs:

const bytes = new Uint8Array(14);

// First encoding a char into the target buffer
UTF8.setBytesFromCharCode('é'.charCodeAt(0), bytes);

// Then encoding a string into the same buffer, at offset 2
UTF8.setBytesFromString('1.3$ ~= 1€', bytes, 2);

UTF8 encoding detection

UTF8.isNotUTF8(bytes);
// true | false

This function can prove that the given bytes do not contain UTF-8 text (or contain badly encoded UTF-8). The converse does not hold: a `false` result does not prove that the bytes are UTF-8, especially for short strings, for which false positives are frequent.
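
The asymmetry can be illustrated with the standard library alone. The sketch below uses a `TextDecoder` in `fatal` mode, which throws on malformed input; `provablyNotUtf8` is a hypothetical helper name, and this is not how `isNotUTF8` is implemented.

```javascript
// Hypothetical helper: true when the bytes are provably not valid UTF-8.
function provablyNotUtf8(bytes) {
  try {
    // With { fatal: true }, decode() throws a TypeError on malformed input.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return false;
  } catch (err) {
    return true;
  }
}

// 0xE9 is 'é' in ISO-8859-1. In UTF-8 it announces a 3-byte sequence,
// so following it with ASCII 'a' (0x61) is malformed.
provablyNotUtf8(new Uint8Array([0xe9, 0x61])); // true

// 'Ã©' written in ISO-8859-1 is 0xC3 0xA9, which is also valid UTF-8 for 'é'.
// The check cannot tell which encoding the author intended.
provablyNotUtf8(new Uint8Array([0xc3, 0xa9])); // false
```

The second call shows why a `false` result is only weak evidence: short inputs from other encodings can be valid UTF-8 by coincidence.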

Strict mode

If you try to encode a UTF-8 string into an ArrayBuffer too short to contain the complete string, it will silently fail. To avoid this behavior, use strict mode:

UTF8.setBytesFromString('1.3$ ~= 1€', bytes, 2, null, true);
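
As a point of comparison, a similar "fail loudly" check can be sketched with the standard `TextEncoder.encodeInto()` method, which reports how much of the input it consumed. `writeUtf8Strict` is a hypothetical helper, not part of this library, and its error type and message are assumptions made for the example.

```javascript
// Hypothetical helper: write `text` as UTF-8 into `bytes` starting at `offset`,
// throwing instead of silently truncating when the target is too short.
function writeUtf8Strict(text, bytes, offset = 0) {
  // encodeInto() writes as many whole characters as fit and reports progress.
  const { read, written } = new TextEncoder().encodeInto(
    text,
    bytes.subarray(offset),
  );
  // `read` counts the UTF-16 code units consumed from `text`.
  if (read < text.length) {
    throw new RangeError(
      `Only ${read} of ${text.length} UTF-16 code units fit in the target`,
    );
  }
  return written;
}

const bytes = new Uint8Array(14);
const written = writeUtf8Strict('1.3$ ~= 1€', bytes, 2); // 12

let failed = false;
try {
  // 6 bytes remain after offset 2, but the string needs 12.
  writeUtf8Strict('1.3$ ~= 1€', new Uint8Array(8), 2);
} catch (err) {
  failed = err instanceof RangeError; // true
}
```

Note that in this sketch the target may already hold a partial write when the error is thrown; a caller that needs all-or-nothing behavior would have to measure the encoded length first.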

Thanks

The Debian project, for its free (as in freedom) Russian/Japanese man pages, used for real-world file tests!

Authors

Nicolas Froidure

License

MIT
