Performance issue of toString() function #30
The code I ran the test with is actually not the above, but an older version of it which does not use string concatenation:

```js
/**
 * Reads UTF8 bytes as a string.
 * @param {Uint8Array} buffer Source buffer
 * @param {number} start Source start
 * @param {number} end Source end
 * @returns {string} String read
 */
utf8.read = function utf8_read(buffer, start, end) {
    var len = end - start;
    if (len < 1)
        return "";
    var parts = null,
        chunk = [],
        i = 0, // char offset
        t;     // temporary
    while (start < end) {
        t = buffer[start++];
        if (t < 128) // 1 byte: ASCII
            chunk[i++] = t;
        else if (t > 191 && t < 224) // 2 bytes
            chunk[i++] = (t & 31) << 6 | buffer[start++] & 63;
        else if (t > 239 && t < 365) { // 4 bytes (bytes are < 256, so the upper bound never binds)
            t = ((t & 7) << 18 | (buffer[start++] & 63) << 12 | (buffer[start++] & 63) << 6 | buffer[start++] & 63) - 0x10000;
            chunk[i++] = 0xD800 + (t >> 10);
            chunk[i++] = 0xDC00 + (t & 1023);
        } else // 3 bytes
            chunk[i++] = (t & 15) << 12 | (buffer[start++] & 63) << 6 | buffer[start++] & 63;
        if (i > 8191) { // flush the chunk to stay under apply()'s argument limit
            (parts || (parts = [])).push(String.fromCharCode.apply(String, chunk));
            i = 0;
        }
    }
    if (parts) {
        if (i)
            parts.push(String.fromCharCode.apply(String, chunk.slice(0, i)));
        return parts.join("");
    }
    return String.fromCharCode.apply(String, chunk.slice(0, i));
};
```
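The 8191-code-unit flush above exists because `String.fromCharCode.apply` passes every code unit as a separate function argument, and JS engines cap the argument count. A minimal sketch of the constraint (the array sizes here are illustrative, not the exact engine limits):

```javascript
// Within the limit, apply() works fine: 8192 arguments is safe everywhere.
const small = new Array(8192).fill(65) // 65 is the char code for 'A'
const s = String.fromCharCode.apply(String, small)
console.log(s.length) // 8192
console.log(s[0] === 'A' && s[8191] === 'A') // true

// With a large enough array, apply() throws a RangeError
// ("Maximum call stack size exceeded" in V8) - hence the chunking.
try {
  String.fromCharCode.apply(String, new Array(10_000_000).fill(65))
} catch (err) {
  console.log(err instanceof RangeError)
}
```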
Internally this uses plain-JS TextDecoder. Possibly related: nodejs/node#39879. Another approach may be to create a node Buffer.

Can you please share your benchmark as runnable code (a gist will do) so we can experiment?
I've added some benchmarks here; the results are... eye-opening: #32
Please can you try again with the latest version? If you're still seeing a big discrepancy, can you please share your benchmark code?
Closing as the performance issue has been resolved; please reopen if you are still seeing it. This module is now only a little bit slower than using node Buffers, likely because of the compatibility requirements in #39.

```
% node benchmarks/from-string.js
Uint8Arrays.fromString x 2,011,648 ops/sec ±0.54% (86 runs sampled)
TextEncoder x 1,368,021 ops/sec ±2.22% (81 runs sampled)
Buffer.from x 2,200,960 ops/sec ±4.02% (88 runs sampled)
Fastest is Buffer.from

% node benchmarks/to-string.js
Uint8Arrays.toString x 3,138,022 ops/sec ±0.52% (93 runs sampled)
TextDecoder x 425,713 ops/sec ±3.29% (79 runs sampled)
utf8ReadFromCharCode x 2,065,098 ops/sec ±1.19% (92 runs sampled)
utf8ReadConcat x 1,162,056 ops/sec ±0.68% (92 runs sampled)
Buffer.toString x 3,509,065 ops/sec ±0.79% (87 runs sampled)
Fastest is Buffer.toString
```
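For reference, all of the decode paths being benchmarked must produce identical output; a quick equivalence check (Node; the sample string is mine):

```javascript
// Buffer.toString and TextDecoder must agree on the same UTF-8 bytes,
// including a multi-byte emoji (a 4-byte sequence).
const data = Buffer.from('hello \u{1F680}', 'utf8')

const viaBuffer = data.toString('utf8')
const viaTextDecoder = new TextDecoder().decode(data)

console.log(viaBuffer === 'hello \u{1F680}') // true
console.log(viaBuffer === viaTextDecoder) // true
```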
```js
// https://github.com/feross/buffer/blob/master/index.js#L1035
const MAX_ARGUMENTS_LENGTH = 0x10000

function decodeCodePointsArray (codePoints) {
  const len = codePoints.length
  if (len <= MAX_ARGUMENTS_LENGTH) {
    return String.fromCharCode(...codePoints)
  }
  // decode in chunks to stay under the engine's argument limit
  let res = ''
  let i = 0
  while (i < len) {
    res += String.fromCharCode(...codePoints.slice(i, i += MAX_ARGUMENTS_LENGTH))
  }
  return res
}

// https://github.com/achingbrain/uint8arrays/issues/30#issuecomment-1199120924
// https://github.com/anonrig/undici/blob/494d27782ad2df68e2e61603914aa0a2dce0f5c8/lib/fetch/util.js#L872
function utf8Spread (buffer, start = 0, end = buffer.byteLength) {
  if (end - start < 1) {
    return ''
  }
  let arr = new Array(end - start)
  let j = 0
  for (let i = start; i < end;) {
    const t = buffer[i++]
    if (t <= 0x7F) { // 1 byte: ASCII
      arr[j++] = t
    } else if (t >= 0xC0 && t < 0xE0) { // 2 bytes
      arr[j++] = (t & 0x1F) << 6 | buffer[i++] & 0x3F
    } else if (t >= 0xE0 && t < 0xF0) { // 3 bytes
      arr[j++] = (t & 0xF) << 12 | (buffer[i++] & 0x3F) << 6 | buffer[i++] & 0x3F
    } else if (t >= 0xF0) { // 4 bytes: emit a surrogate pair
      const t2 = ((t & 7) << 18 | (buffer[i++] & 0x3F) << 12 | (buffer[i++] & 0x3F) << 6 | buffer[i++] & 0x3F) - 0x10000
      arr[j++] = 0xD800 + (t2 >> 10)
      arr[j++] = 0xDC00 + (t2 & 0x3FF)
    }
  }
  arr = arr.slice(0, j) // j is the count of code units written, so slice to j, not j + 1
  return decodeCodePointsArray(arr)
}
```
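As a standalone sanity check of this approach (a sketch; both function bodies are repeated here so the snippet runs on its own, with the final slice trimmed to exactly `j` elements), `utf8Spread` round-trips against `TextEncoder`:

```javascript
const MAX_ARGUMENTS_LENGTH = 0x10000

function decodeCodePointsArray (codePoints) {
  const len = codePoints.length
  if (len <= MAX_ARGUMENTS_LENGTH) {
    return String.fromCharCode(...codePoints)
  }
  let res = ''
  let i = 0
  while (i < len) {
    res += String.fromCharCode(...codePoints.slice(i, i += MAX_ARGUMENTS_LENGTH))
  }
  return res
}

function utf8Spread (buffer, start = 0, end = buffer.byteLength) {
  if (end - start < 1) return ''
  let arr = new Array(end - start)
  let j = 0
  for (let i = start; i < end;) {
    const t = buffer[i++]
    if (t <= 0x7F) {
      arr[j++] = t
    } else if (t >= 0xC0 && t < 0xE0) {
      arr[j++] = (t & 0x1F) << 6 | buffer[i++] & 0x3F
    } else if (t >= 0xE0 && t < 0xF0) {
      arr[j++] = (t & 0xF) << 12 | (buffer[i++] & 0x3F) << 6 | buffer[i++] & 0x3F
    } else if (t >= 0xF0) {
      const t2 = ((t & 7) << 18 | (buffer[i++] & 0x3F) << 12 | (buffer[i++] & 0x3F) << 6 | buffer[i++] & 0x3F) - 0x10000
      arr[j++] = 0xD800 + (t2 >> 10)
      arr[j++] = 0xDC00 + (t2 & 0x3FF)
    }
  }
  arr = arr.slice(0, j) // j counts the UTF-16 code units actually written
  return decodeCodePointsArray(arr)
}

// Round-trip 1-, 2-, 3- and 4-byte UTF-8 sequences.
const input = 'aé€\u{1F600}'
const bytes = new TextEncoder().encode(input)
console.log(utf8Spread(bytes) === input) // true
```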
EDIT: I used your benchmark code, which constructed a new TextDecoder every time; that was the bottleneck. Notably, on Node 18+ the async route is stupidly fast on big strings: `await new Response(DATA).text()`
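The `Response` trick relies on Node 18+ shipping the WHATWG fetch globals; a sketch of the pattern (the payload is mine):

```javascript
// Response.text() treats the body bytes as UTF-8 and resolves
// to the decoded string, without constructing a TextDecoder yourself.
const DATA = new TextEncoder().encode('big payload \u{1F600}')

new Response(DATA).text().then((decoded) => {
  console.log(decoded === 'big payload \u{1F600}') // true
})
```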
@achingbrain Maybe you also want to re-use the same TextDecoder in your test case? uint8arrays/benchmarks/to-string.js, line 78 in 4264165:

```diff
- const res = new TextDecoder().decode(DATA)
+ const res = textDecoder.decode(DATA)
```
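The suggested fix hoists decoder construction out of the measured loop; a sketch of the pattern:

```javascript
// Construct the TextDecoder once and reuse it for every decode call;
// per-iteration construction is what skewed the TextDecoder benchmark.
const textDecoder = new TextDecoder()
const DATA = new TextEncoder().encode('decoded many times')

const results = []
for (let i = 0; i < 3; i++) {
  results.push(textDecoder.decode(DATA))
}
console.log(results.every((s) => s === 'decoded many times')) // true
```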
Motivation

This causes a performance issue in protons, see ipfs/protons#51 (comment).

Description

Benchmarks show that toString() takes > 10x the time to convert to a utf8 string, compared to the protobuf reference implementation: https://github.com/protobufjs/protobuf.js/blob/6f0806ddef408647fbaed049dbd8929ad9f5c10c/lib/utf8/index.js#L40-L62