Optimizing hashing performance #136

fabiospampinato · 2023-11-25T18:24:01Z

What is the problem this feature will solve?

Making the hash functions significantly faster.

What is the feature you are proposing to solve the problem?

I'm not sure what the best option for this is, but running the following file:

import crypto from 'node:crypto';

const value = '/Users/fabio/path/to/some/random/file.js';

console.time ( 'crypto' );
for ( let i = 0; i < 100_000; i++ ) {
  crypto.createHash("sha1").update(value).digest("hex");
}
console.timeEnd ( 'crypto' );

I see the following output:

~ ❯ node hashbench.js
crypto: 106.557ms
~ ❯ node hashbench.js
crypto: 108.052ms
~ ❯ node hashbench.js
crypto: 107.642ms
~ ❯ node hashbench.js
crypto: 107.136ms
~ ❯ node hashbench.js
crypto: 107.747ms
~ ❯ bun hashbench.js
[29.98ms] crypto
~ ❯ bun hashbench.js
[33.87ms] crypto
~ ❯ bun hashbench.js
[30.74ms] crypto
~ ❯ bun hashbench.js
[30.78ms] crypto
~ ❯ bun hashbench.js
[30.12ms] crypto

Basically Node's sha1 function seems at least 3x slower than Bun's.

Hashing is at the core of many important things, so I'd argue it's important to hash as fast as possible since that would speed up a cascade of use cases.

I'm not sure what the best solution is here, but if Bun is 3x faster than Node here presumably there's a lot of room for improvement.

What alternatives have you considered?

No response

H4ad · 2023-11-25T21:18:34Z

I spent a few minutes trying to investigate this and here are some numbers.

Bun

[39.36ms] crypto

Node.js 16.20.1

crypto: 85.105ms

Node.js 18.18.1

crypto: 105.906ms

Node.js 20.9.0

crypto: 91.476ms

Node.js 21.1.0

crypto: 90.148ms

About bun, the update function is handled by CryptoDigest.cpp.

Apparently, bun uses openssl 1.x, Node.js 16.x also uses that same version, after 16.x Node.js started using openssl 3.x

Instead of EVP_* functions (code), bun uses SHA1 Functions (code).

I updated the benchmark to:

const crypto = require('node:crypto');

const value = '/Users/fabio/path/to/some/random/file.js';
const hash = crypto.createHash("sha1");

const start = performance.now();
hash.update(value)
console.log(`End: ${performance.now() - start}`);

Bun: 0.0250780000000006ms
Node.js 21.1.0: 0.03838299959897995ms
Node.js 21.1.0 (with %OptimizeFunctionOnNextCall(hash.update)): 0.03241100162267685ms

So, even calling only update is slower than bun, so maybe using SHA1_UPDATE instead of EVP_DigestUpdate can help.

Uzlopak · 2023-11-25T21:26:04Z

But is is even recommendable to use non EVP Calls?

Uzlopak · 2023-11-26T02:14:13Z

Btw. It is crazy how slow it is to sha256 hmac a string. It is something i cant optimize in probot and results in about 6k req/s performance of a probot server.

H4ad · 2023-11-26T13:25:03Z

But is is even recommendable to use non EVP Calls?

I don't have any idea, in the documentation page, they deprecated some overloads of SHA1 functions but they still support it if you change the arguments to pass context, data, length.

So, I think is safe to use.

Btw. It is crazy how slow it is to sha256 hmac a string. It is something i cant optimize in probot and results in about 6k req/s performance of a probot server.

Once I cached all the string and then performed only one update and digest, this improved the performance significantly.

tniessen · 2023-11-28T10:29:28Z

Apparently, bun uses openssl 1.x, Node.js 16.x also uses that same version, after 16.x Node.js started using openssl 3.x

There are various known performance issues related to OpenSSL 3. It's possible that they negatively affect performance of hashing through EVP_* as well, but someone would have to benchmark the C/C++ code itself to be sure.

Instead of EVP_* functions (code), bun uses SHA1 Functions (code).

The EVP_* functions are recommended over low-level functions etc. The latter are gradually being deprecated and using them might break compatibility with OpenSSL providers/engines.

FWIW, I proposed crypto.hash() at some point (see nodejs/node#42233), which would be faster than crypto.Hash for one-shot hashing. It might still be slower than an equivalent implementation using OpenSSL 1.1.1 and low-level OpenSSL APIs.

ronag · 2023-11-28T10:41:46Z

Are we using OpenSSL 3.0 or 3.1?

Uzlopak · 2023-11-28T10:42:35Z

@ronag
A fork of 3.0 with quic patched. No original OpenSSL 3.0

We are basically cut off from direct OpenSSL updates.

ronag · 2023-11-28T10:43:41Z

So basically we are stuck with this until we update OpenSSL.

ronag · 2023-11-28T10:44:01Z

Anyone considered switching to BoringSSL?

Uzlopak · 2023-11-28T10:54:30Z

I guess a PR to switch to BoringSSL would result in a lot of negative feedback. Also BoringSSL does not guarantee stable ABI.

https://boringssl.googlesource.com/boringssl/+/HEAD/PORTING.md#porting-from-openssl-to-boringssl

Note: BoringSSL does not have a stable API or ABI. It must be updated with its consumers. It is not suitable for, say, a system library in a traditional Linux distribution. For instance, Chromium statically links the specific revision of BoringSSL it was built against. Likewise, Android's system-internal copy of BoringSSL is not exposed by the NDK and must not be used by third-party applications.

E.g. @mcollina mentioned that stable ABI is important for him. So I assume for other it is also important.

ronag · 2023-11-28T10:59:31Z

E.g. @mcollina mentioned that stable ABI is important for him. So I assume for other it is also important.

Is this a theoretical issue or a practical one? Do they actually break the ABI that often? I think quic support is better in BoringSSL so at the end of the day it might be easier to use?

targos · 2023-11-28T11:02:23Z

https://boringssl.googlesource.com/boringssl/+/HEAD/README.md

Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.

tniessen · 2023-11-28T14:02:55Z

BoringSSL also lacks various features that users of Node.js might rely on and that'd we'd have to polyfill somehow, e.g., certain algorithms.

GeoffreyBooth · 2023-12-01T04:52:43Z

@nodejs/performance @nodejs/crypto

lemire · 2023-12-01T21:53:15Z

At a glance, this does not look like a hashing performance issue.

lemire · 2023-12-01T23:02:57Z

Try the following... Start with this script... (call it hash.mjs)

import crypto from 'node:crypto';

const value = '/Users/fabio/path/to/some/random/file.js';

console.time ( 'crypto' );
for ( let i = 0; i < 1000_000; i++ ) {
  crypto.createHash("sha1").update(value).digest("hex");
}
console.timeEnd ( 'crypto' );

(I picked the script as-is, but I extended to loop to 1000_000 repetitions.)

The conjecture is that the hashing is slow, and by that, presumably, we mean the update call. Ok. Let us trim the code... create a new script that does not update... (call it bhash.msj)

import crypto from 'node:crypto';

console.time ( 'crypto' );
for ( let i = 0; i < 1000_000; i++ ) {
  crypto.createHash("sha1");
}
console.timeEnd ( 'crypto' );

If hashing done by update is the bottleneck, this second script should be very fast compared to the previous one. Let us see... I use Node.js v20.10.0.

$ node hash.mjs 
crypto: 1.442s
$ node bhash.mjs 
crypto: 1.039s

Yep. It is not hashing. Only 0.4 s are spent on hashing. The bulk of the time is spent on crypto.createHash("sha1").

Let us try with Bun now...

$ bun hash.mjs 
[750.63ms] crypto
$ bun bhash.mjs 
[105.19ms] crypto

So bun spends 0.64 s on hashing... roughly as much as Node.js. The big difference between Node and Bun is that Bun is 10 times faster on the crypto.createHash("sha1") part.

So let us try something else... let us call crypto.createHash once:

import crypto from 'node:crypto';

const value = '/Users/fabio/path/to/some/random/file.js';

console.time ( 'crypto' );
const hasher = crypto.createHash("sha1");
for ( let i = 0; i < 1000_000; i++ ) {
  hasher.update(value);
}
console.timeEnd ( 'crypto' );

I get...

$ node chash.mjs 
crypto: 172.675ms
$ bun chash.mjs 
[170.99ms] crypto

Yep. So we have narrowed it down to crypto.createHash.

So let us profile the bhash.mjs script with Node:

   7.69%  node     libc.so.6             [.] pthread_rwlock_unlock@@GLIBC_2.34
   7.35%  node     libc.so.6             [.] malloc
   5.51%  node     libc.so.6             [.] pthread_rwlock_rdlock@GLIBC_2.2.5
   4.77%  node     node                  [.] Builtins_GetProperty
   4.65%  node     libc.so.6             [.] _int_free
   4.10%  node     libc.so.6             [.] _int_malloc

So we are spending time allocating memory and doing some threading. These are expensive jobs...

And now Bun...

   5.08%  bun          bun                   [.] 0x0000000002e52279
   4.26%  bun          bun                   [.] 0x0000000002e52897
   3.56%  bun          bun                   [.] 0x00000000030aabdf
   3.55%  bun          libc.so.6             [.] __memset_evex_unaligned_erms
   2.20%  bun          libc.so.6             [.] __strncasecmp_l_evex

My conclusion is that though the problem might come from OpenSSL, and hashing... it does not appear to be so. As you can see, just Builtins_GetProperty seems to be a bottleneck.

Maybe we should get the advice of someone who understands deeply the C++ architecture: @joyeecheung

Note: I have not disproved that Bun could have an advantage when hashing. It probably does. I am just saying that on this particular benchmark, the processor I used (Linux+x64), with a small string, hashing is not the bottleneck. Node.js has lost the race before it has even begun.

joyeecheung · 2023-12-02T21:53:37Z

I did some profiling to see where the bottleneck is - it seems #136 (comment) was based on only self-time, but it's still important to look at the call graph.

The rwlock comes from OpenSSL methods e.g. evp_md_init_internal
Builtins_GetProperty comes from Builtins_InstanceOf which comes from the instanceof in argument validations in the JS-land Hash
The creation of BaseObject incurs a significant overhead (partly due to the global handle creation though I think another part is the cleanup queue maintenance, which are inlined and not shown in the profile). I did a prototype in wip: crypto: use cppgc to manage Hash node#51017 to use Oilpan to manage the Hash object instead and createHash() becomes 15-17% faster. Note that V8 is also working on conservative stack scanning. AFAIU this makes the handle management closer to what JSC has so we might get speedup of handle management for free without doing something like this, though it's unclear when they'll ship it https://bugs.chromium.org/p/v8/issues/detail?id=13270

const { createHash } = require('node:crypto');

const array = [];
for ( let i = 0; i < 1000_000; i++ ) {
  array.push(null);
}
const start_time = performance.now();
for ( let i = 0; i < 1000_000; i++ ) {
  array[i] = createHash("sha1");
}
console.log(performance.now() - start_time);
console.log(array.length);

On main the profile of the snippet above looks like this (simplified with only the important bits, number is total time in the process being profiled, the % outside these are mostly setup/teardown/GC stuff, or just samples with a slightly different stack)

--48.93%--node::crypto::Hash::New
  --22.40%--node::BaseObject::BaseObject
  --16.25%--node::crypto::Hash::HashInit
    --15.03%--evp_md_init_internal
  --5.34%--EVP_get_digestbyname
  --3.25%--node::Utf8Value::Utf8Value
  --4.96%--v8::internal::ApiNatives::InstantiateObject
    --3.55%--v8::internal::Heap::CollectGarbage
--7.65%--Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit
  --5.46%--v8::internal::Runtime_AllocateInYoungGeneration
--3.82%--Builtins_InstanceOf

With the Oilpan prototype in nodejs/node#51017 I get a somewhat different profile that looks like this

--40.03%--node::crypto::Hash::New
  --21.29%--node::crypto::Hash::HashInit
    --18.93%--evp_md_init_internal
  --7.04%--EVP_get_digestbyname
  --3.29%--node::SetCppgcReference
  --3.25%--node::Utf8Value::Utf8Value
  --8.54%--v8::internal::ApiNatives::InstantiateObject
    --6.49%--v8::internal::Heap::CollectGarbage
--6.83%--Builtins_InstanceOf
--4.40%--Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit
  --4.23%--v8::internal::Runtime_AllocateInYoungGeneration

With a different allocation scheme it sheds off a chunk of the overhead though there are still a lot to be improved:

$./node_main run.js
1642.1952390000001
1000000

$ out/Release/node run.js
1276.719182
1000000

$ bun run.js
233.12367799999998
1000000

I think another thing that can be done is caching the EVP_MD of known algorithms (not 100% sure if that's safe to do, maybe people from @nodejs/crypto knows for sure). And maybe some JS voodoo can be done to create a fast path to avoid the instanceof overhead in argument validation.

joyeecheung · 2023-12-02T22:31:56Z

Actually regarding the OpenSSL overhead the solution (at least a partial solution) seems surprisingly simple - we are just under-initializing OpenSSL: nodejs/node#51019

joyeecheung · 2023-12-02T23:48:23Z

Actually #136 (comment) was incorrect, I was just looking at the numbers from the Oilpan prototype, initializing it with OPENSSL_INIT_ADD_ALL_DIGESTS won't make a differenct because EVP_get_digestbyname would also do that.

On another note, it may be a good idea to just move away from EVP_get_digestbyname if we are on OpenSSL 3. Locally switching to EVP_MD_fetch (but without proper lifecycle management so it leaks) can speed up createHash() by another 5-7% - though handling the life cycle of that ref-counted EVP_MD in the existing implementation can be somewhat tricky.

                                                                                confidence improvement accuracy (*)   (**)  (***)
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=10 sync='createHash'           ***      5.65 %       ±1.21% ±1.62% ±2.11%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=100 sync='createHash'          ***      6.37 %       ±0.84% ±1.12% ±1.45%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=20 sync='createHash'           ***      5.71 %       ±1.10% ±1.47% ±1.91%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=50 sync='createHash'           ***      7.57 %       ±1.50% ±2.00% ±2.62%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=10 sync='createHash'         ***      6.56 %       ±1.15% ±1.52% ±1.98%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=100 sync='createHash'        ***      5.65 %       ±0.80% ±1.06% ±1.38%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=20 sync='createHash'         ***      6.73 %       ±0.98% ±1.31% ±1.70%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=50 sync='createHash'         ***      6.05 %       ±0.89% ±1.18% ±1.54%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=10 sync='createHash'         ***      6.86 %       ±1.15% ±1.53% ±2.00%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=100 sync='createHash'        ***      7.02 %       ±0.98% ±1.31% ±1.71%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=20 sync='createHash'         ***      7.28 %       ±0.92% ±1.22% ±1.58%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=50 sync='createHash'         ***      6.52 %       ±1.12% ±1.50% ±1.95%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=10 sync='createHash'         ***      5.86 %       ±1.11% ±1.48% ±1.92%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=100 sync='createHash'        ***      5.40 %       ±1.16% ±1.54% ±2.01%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=20 sync='createHash'         ***      6.29 %       ±1.07% ±1.43% ±1.87%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=50 sync='createHash'         ***      5.59 %       ±1.17% ±1.56% ±2.03%

Uzlopak · 2023-12-03T00:36:16Z

Doesnt the OpenSSL docs say something about avoiding the lookup functions like EVP_get_digestbyname? I think I read something about this. Something along the lines that OpenSSL 3 is totally object oriented and thus e.g. you register digests in an Object at runtime. So calling something like EVP_get_digestbyname means basically, that OpenSSL goes through each name and checks if it finds the proper digest.

joyeecheung · 2023-12-03T00:39:32Z

Yes, though I think we'll need a bit of internal refactoring to move on to explicit fetching because so far most of the crypto classes don't think about freeing the EVP_MD (there is also a somewhat weird (undocumented?) signature that allows the algorithm to be another Hash object, I am not sure what should happen in terms of life cycle management in that case)

joyeecheung · 2023-12-03T02:06:35Z

About the InstanceOf - it seems mostly coming from the this instanceof Hash check which is pre-ES6 way of checking construct calls. The post-ES6 way is new.target which can be much faster (in V8 that can just be just checking a register) - with nodejs/node#51026 Builtins_InstanceOf no longer shows up in the profile.

On a side note, I see that there are still a lot of this instanceof scattering around in the code base. Probably can be dusted away too.

joyeecheung · 2023-12-03T20:17:33Z

I had another "surprisingly simple" (hopefully not just being naive) idea of migrating to EVP_MD_fetch and caching the EVP_MD. With nodejs/node#51034 nodejs/node#51026 and nodejs/node#51017 I am getting this from the createHash() microbenchmark pasted above:

$ out/Release/node run.js
790.332561
1000000

$ ./node_main run.js
1634.7148000000002
1000000

At this point the most significant bottleneck is garbage collection (specifically scavenge) from createHash() and this has more to do with how the benchmark is written. If I update the snippet to generate a digest and put it into the array, the other bottleneck is the string decoding part that converts input from a string to a buffer.

With a microbenchmark like this:

'use strict';
const { createHash } = require('node:crypto');
const { readFileSync } = require('node:fs');

const kHashCount = parseInt(process.env.HASH_COUNT) || 10000;
const kInputLength = parseInt(process.env.INPUT_LENGTH) || 100;
const kUpdateCount = parseInt(process.env.UPDATE_COUNT) || 1;

const array = [];
for ( let i = 0; i < kHashCount; i++ ) {
  array.push(null);
}
const input = readFileSync('./test/fixtures/snapshot/typescript.js').slice(0, kInputLength);

const start_time = performance.now();
for ( let i = 0; i < kHashCount; i++ ) {
  const hash = createHash('sha1');
  for (let j = 0; j < kUpdateCount; j++) {
    hash.update(input);
  }
  array[i] = hash.digest('hex');
}
console.log(performance.now() - start_time);
console.log(`${kHashCount} hashes, ${kUpdateCount} updates, input length ${input.length}, ${array.length} results`);

Using a buffer as input, there are only performance differences between my local branch, the main branch, and bun when we create a lot of hashes - the bottleneck is the performance of createHash() as well as GC. When hashing a big chunk of input at one go, or hashing fewer streamed bigger input, they perform similarly.

$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 out/Release/node hash.js
1268.3055590000001
1000000 hashes, 1 updates, input length 100, 1000000 results
$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 ./node_main hash.js
2013.5803560000002
1000000 hashes, 1 updates, input length 100, 1000000 results
$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 bun hash.js
713.9530219999999
1000000 hashes, 1 updates, input length 100, 1000000 results

$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 out/Release/node hash.js
960.9329419999999
1000 hashes, 100 updates, input length 10000, 1000 results
$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 ./node_main hash.js
968.0744119999999
1000 hashes, 100 updates, input length 10000, 1000 results
$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 bun hash.js
970.135489
1000 hashes, 100 updates, input length 10000, 1000 results

$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 out/Release/node hash.js
9.995244000000003
1 hashes, 1 updates, input length 10000000, 1 results
$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 ./node_main hash.js
9.976660999999996
1 hashes, 1 updates, input length 10000000, 1 results
$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 bun hash.js
9.789406
1 hashes, 1 updates, input length 10000000, 1 results

And if we add a .toString() to the input before slicing it, the local branch performs a 1.2-1.6x worse than bun, here the string decoding performance becomes another the bottleneck

$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 out/Release/node hash.js
1306.387378
1000000 hashes, 1 updates, input length 100, 1000000 results
$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 ./node_main hash.js
2217.123724
1000000 hashes, 1 updates, input length 100, 1000000 results
$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 bun hash.js
885.439071
1000000 hashes, 1 updates, input length 100, 1000000 results

$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 out/Release/node hash.js
1819.652489
1000 hashes, 100 updates, input length 10000, 1000 results
$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 ./node_main hash.js
1804.612928
1000 hashes, 100 updates, input length 10000, 1000 results
$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 bun hash.js
1183.056887
1000 hashes, 100 updates, input length 10000, 1000 results

$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 out/Release/node hash.js
24.548640999999996
1 hashes, 1 updates, input length 10000000, 1 results
$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 ./node_main hash.js
24.163148999999997
1 hashes, 1 updates, input length 10000000, 1 results
$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 bun hash.js
20.361431000000003
1 hashes, 1 updates, input length 10000000, 1 results

One way to mitigate the GC performance impact for the first use case could be, introducing a one-shot helper (probably with EVP_Q_digest) for generating digests at one go and not creating intermediate objects. Setting performance aside that helper might be useful enough to be worth it on its own anyway.
We could continue improving the string decoding performance, for example using simdutf + fast API calls, though our hands are tied again for non-flat/two-byte strings. But that's a more generic issue than just crypto. Also this doesn't matter in the case where the user passes buffer as input. This is still somewhat a v8 issue (i.e. string->WriteUtf8() and others arent' quite performant)
In terms of GC performance in the objet-based implementation, I think beyond migrating to Oilpan there isn't much that can be done on Node.js's side, that's probably more of a V8 issue, or it may not be a valid issue in the first place given that allocating tons of objects in a tight loop to really stress the GC out is not a common pattern in the real world.

fabiospampinato · 2023-12-03T20:29:50Z

the other bottleneck is the string decoding part that converts input from a string to a buffer.

As I understand it V8 keeps track of whether a string is encoded in latin1 or not, because if it's using latin1 it can just allocate 1 byte per character rather than 2. In that case, for latin1 strings, if the rope is flattened and whatever else may be needed, shouldn't decoding be essentially free because the slice of memory that the string points to can just be hashed directly? Maybe it's more complicated than that 🤔

joyeecheung · 2023-12-03T20:39:18Z

In that case, for latin1 strings, if the rope is flattened and whatever else may be needed, shouldn't decoding be essentially free because the slice of memory that the string points to can just be hashed directly?

V8 does not hand over the raw bytes to embedders, only a copy. Also it's only a pure copy for ASCII-only strings. For strings that are Latin1 but not ASCII there's still a bit of conversion to split each code point into two bytes.

lemire · 2023-12-05T22:43:44Z

I assume this is talking about buffering in JS land (possibly with a JS array)

What I am saying is that crypto.createHash("sha1") only needs an object that contains "sha1". It does not need to do anything, at least nothing in C++ land right away. I am just alluding to the design space you have.

I do agree with you, btw, that it makes a lot of sense to do a one-shot hashing... But what I am also saying is that crypto.createHash("sha1").update(buffer).digest(...) should not have wildly different performance characteristics.

joyeecheung · 2023-12-05T22:48:12Z

Yeah, there could be some micro-optimization that only starts a object-based implementation upon second update / when the first update contains a big enough input (say >1MB). Otherwise fallback to a one-shot digest implementation. Though the prerequisite would still be having a one-shot digest implementation...

PR-URL: #51026 Refs: nodejs/performance#136 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br> Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>

lemire · 2023-12-05T23:12:38Z

Yeah, there could be some micro-optimization that only starts a object-based implementation upon second update / when the first update contains a big enough input (say >1MB). Otherwise fallback to a one-shot digest implementation. Though the prerequisite would still be having a one-shot digest implementation..

Precisely. ❤️ ❤️ The idea is that if you propose a new function, not everyone will rush to adopt it. But it might be possible to provide similar benefits to people who use the conventional API. 💨

PR-URL: #51026 Refs: nodejs/performance#136 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Michaël Zasso <targos@protonmail.com> Reviewed-By: Vinícius Lourenço Claro Cardoso <contact@viniciusl.com.br> Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>

merceyz · 2023-12-15T21:26:04Z

@joyeecheung If you want a real world example where createHash is called a lot then Yarn might provide that.

Setup:

git clone https://github.com/yarnpkg/berry
cd berry
git checkout db6210f48355d2986e965f90009b22f18d3b6342
yarn build:cli --no-minify

Running the following command will call createHash 75068 times:

rm -f .yarn/install-state.gz
YARN_IGNORE_PATH=1 CI=true node ./packages/yarnpkg-cli/bundles/yarn.js

and the following will, at the time of writing, call it 90340 times:

rm -f yarn.lock .yarn/install-state.gz
YARN_IGNORE_PATH=1 node ./packages/yarnpkg-cli/bundles/yarn.js

about 1853 of those should be hashing streamed files and the rest strings.

On OpenSSL 3, migrate from EVP_get_digestbyname() to EVP_MD_fetch() to get the implementation and use a per-Environment cache for it. The EVP_MDs are freed during Environment cleanup. Drive-by: declare the smart pointer for EVP_MD_CTX as EVPMDCtxPointer instead of EVPMDPointer to avoid confusion with EVP_MD pointers. PR-URL: #51034 Refs: https://www.openssl.org/docs/man3.0/man7/crypto.html#Explicit-fetching Refs: nodejs/performance#136 Reviewed-By: James M Snell <jasnell@gmail.com>

On OpenSSL 3, migrate from EVP_get_digestbyname() to EVP_MD_fetch() to get the implementation and use a per-Environment cache for it. The EVP_MDs are freed during Environment cleanup. Drive-by: declare the smart pointer for EVP_MD_CTX as EVPMDCtxPointer instead of EVPMDPointer to avoid confusion with EVP_MD pointers. PR-URL: nodejs#51034 Refs: https://www.openssl.org/docs/man3.0/man7/crypto.html#Explicit-fetching Refs: nodejs/performance#136 Reviewed-By: James M Snell <jasnell@gmail.com>

On OpenSSL 3, migrate from EVP_get_digestbyname() to EVP_MD_fetch() to get the implementation and use a per-Environment cache for it. The EVP_MDs are freed during Environment cleanup. Drive-by: declare the smart pointer for EVP_MD_CTX as EVPMDCtxPointer instead of EVPMDPointer to avoid confusion with EVP_MD pointers. PR-URL: #51034 Refs: https://www.openssl.org/docs/man3.0/man7/crypto.html#Explicit-fetching Refs: nodejs/performance#136 Reviewed-By: James M Snell <jasnell@gmail.com>

joyeecheung · 2024-02-26T21:31:09Z

I looked into the "buffering input" idea for crypto.createHash and it turns out to be more complicated than I thought - most importantly, errors would be delayed until the second update() or digest() is called (e.g. if you pass an algorithm that the OpenSSL version loaded in to the process doesn't support, you can't get an error immediately, because we don't eagerly cache all the supported algorithms to avoid overhead for the first access, so you'd have to wait until the actual hashing call happens and OpenSSL tells us that the algorithm is not supported). Added to it the fact that Hash objects are also streams, it could be difficult to get the error handling correct, So I'll just go with nodejs/node#51044 for now.