This repository has been archived by the owner on Jun 5, 2020. It is now read-only.
Daniel Wirtz edited this page Jun 26, 2014 · 35 revisions

Welcome to the utfx wiki!

FAQ

  • What's wrong with using binary strings?
    There are two considerations when using binary strings. First, in current JS engines each 8-bit value (UTF-8 produces 1 to 4 of them per code point) occupies 16 bits of memory. Second, when the binary string has to be post-processed (e.g. written to a buffer), the memory overhead nearly doubles until the garbage collector has reclaimed the intermediate binary string.

  • What's wrong with using plain arrays?
    Like binary strings, plain arrays are wasteful: they store each value as a JavaScript number, whose internal representation may vary between JS engines. Assuming that a JS number wraps a 32-bit value (as long as it is not a double), this is even worse than using binary strings (please correct me if this is wrong).

  • So, what's the ideal thing to do?
    Just as when writing your own highly use-case-specific encoder and decoder, the ideal thing to do is to process code points or bytes as they are produced, which eliminates practically all memory overhead. With utfx this is achieved by providing sources and/or destinations as functions where appropriate.
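
To illustrate the streaming idea from the last answer, here is a minimal hand-written sketch. It is not utfx's implementation: `encodeCodePoints` is a hypothetical name, and the function does no validation of surrogates or out-of-range values. It pulls code points from a source function one at a time and pushes each UTF-8 byte to a destination function, so no intermediate binary string or byte array is created.

```javascript
// Illustrative sketch only; not utfx's actual implementation.
// src: function returning the next code point, or null when exhausted.
// dst: function called once for every UTF-8 byte produced.
function encodeCodePoints(src, dst) {
    var cp;
    while ((cp = src()) !== null) {
        if (cp < 0x80) {                      // 1 byte
            dst(cp);
        } else if (cp < 0x800) {              // 2 bytes
            dst(0xC0 | (cp >> 6));
            dst(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes
            dst(0xE0 | (cp >> 12));
            dst(0x80 | ((cp >> 6) & 0x3F));
            dst(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes
            dst(0xF0 | (cp >> 18));
            dst(0x80 | ((cp >> 12) & 0x3F));
            dst(0x80 | ((cp >> 6) & 0x3F));
            dst(0x80 | (cp & 0x3F));
        }
    }
}

// Usage: write the bytes straight into a preallocated buffer.
var codepoints = [0x48, 0x20AC, 0x1F600]; // "H", "€", "😀"
var i = 0;
var buffer = new Uint8Array(8);           // 1 + 3 + 4 bytes
var offset = 0;
encodeCodePoints(function() {
    return i < codepoints.length ? codepoints[i++] : null;
}, function(b) {
    buffer[offset++] = b;
});
```

Because the destination writes directly into `buffer`, the only memory used beyond the input is the final output itself.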

Examples

Using array and string arguments

  • Converting a standard JavaScript string to UTF8 code points:

    var string = ...;
    var codepoints = [];
    utfx.UTF16toUTF8(string, codepoints);
  • Decoding an array of UTF8 bytes to UTF8 code points:

    var bytes = [...];
    var codepoints = [];
    utfx.decodeUTF8(bytes, codepoints);
  • Converting and encoding a standard JavaScript string as UTF8 bytes:

    var string = ...;
    var bytes = [];
    utfx.UTF16toUTF8Bytes(string, bytes);
  • Decoding and converting an array of UTF8 bytes to a standard JavaScript string:

    var bytes = [...];
    var string = utfx.UTF8BytesToUTF16(bytes);
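
The conversions above move between three representations of the same text: UTF-16 code units (what a JavaScript string stores), code points, and UTF-8 bytes. As a rough cross-check that does not use utfx at all, the sketch below derives all three for one sample string with the standard `TextEncoder`/`TextDecoder` and `codePointAt`. It assumes an environment that provides these globals, and the sample string is arbitrary.

```javascript
// Cross-check with standard APIs only; utfx is not involved here.
var string = "a\u20AC\uD83D\uDE00"; // "a", "€", "😀"

// UTF-16 code units, as stored in the JavaScript string
var units = [];
for (var i = 0; i < string.length; i++) {
    units.push(string.charCodeAt(i));
}

// Code points: iterating a string yields one item per code point,
// so the surrogate pair D83D DE00 collapses into 0x1F600
var codepoints = Array.from(string, function(ch) {
    return ch.codePointAt(0);
});

// UTF-8 bytes: 1, 3 and 4 bytes for the three code points
var bytes = Array.from(new TextEncoder().encode(string));

// And back from UTF-8 bytes to a standard JavaScript string
var decoded = new TextDecoder().decode(new Uint8Array(bytes));
```

Here `units` has 4 entries, `codepoints` has 3 and `bytes` has 8, which is the 1-to-4-bytes-per-code-point expansion mentioned in the FAQ.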

Using source and destination functions

  • Converting an arbitrary input source of UTF16 characters to an arbitrary output destination of UTF8 code points:

    var string = ..., i = 0;
    utfx.UTF16toUTF8(function() {
        return i < string.length ? string.charCodeAt(i++) : null;
    }, function(cp) {
        ...
    });
  • Encoding an arbitrary input source of UTF8 code points to an arbitrary output destination of UTF8 bytes:

    var codepoints = [...], i = 0;
    utfx.encodeUTF8(function() {
        return i < codepoints.length ? codepoints[i++] : null;
    }, function(b) {
        ...
    });
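
The source and destination functions in these examples follow one simple contract: a source returns the next value, or `null` once it is exhausted, and a destination is called once per output value. Since the same closures tend to be written repeatedly, they can be wrapped in small factories. The helpers below (`stringSource`, `arraySource`, `arrayDestination`) are hypothetical names for illustration and are not part of utfx.

```javascript
// Hypothetical helpers, not part of utfx.

// Source yielding the UTF-16 code units of a string, then null.
function stringSource(s) {
    var i = 0;
    return function() {
        return i < s.length ? s.charCodeAt(i++) : null;
    };
}

// Source yielding the elements of an array, then null.
function arraySource(a) {
    var i = 0;
    return function() {
        return i < a.length ? a[i++] : null;
    };
}

// Destination appending every received value to an existing array.
function arrayDestination(a) {
    return function(v) {
        a.push(v);
    };
}

// Stand-alone demonstration: pump a source into a destination.
var out = [];
var src = stringSource("hi!");
var dst = arrayDestination(out);
var v;
while ((v = src()) !== null) {
    dst(v);
}
// out now holds the UTF-16 code units of "hi!"

// With utfx, the same helpers would be passed in place of the
// inline closures, e.g.:
// utfx.UTF16toUTF8(stringSource(string), arrayDestination(codepoints));
```

A destination does not have to collect into an array; it could just as well write into a preallocated typed array or a stream, which is where the memory savings described in the FAQ come from.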

