Accept (async) iterables in `crypto.subtle.digest` #390

twiss · 2024-12-20T12:40:38Z

Addresses part of #73 by accepting an iterable or async iterable (such as a ReadableStream) of BufferSources in crypto.subtle.digest, in order to enable incremental/streaming hashing.

(In the future, we could do something similar for crypto.subtle.sign and crypto.subtle.verify fairly easily, but I started with crypto.subtle.digest since it seems most important and I wanted to check whether folks are happy with this direction first.)

Note that the spec text doesn't require incremental/streaming hashing because FIPS-180-4 defines the SHA functions as single-shot functions which take a complete message, even though it is very common for implementations to provide a streaming API.

Note also that the Web IDL isn't super pretty because it doesn't support syntax like

Promise<ArrayBuffer> digest(
  AlgorithmIdentifier algorithm,
  (BufferSource or iterable<BufferSource> or async iterable<BufferSource>) data
);

so instead we do

typedef object IterableOfBufferSources;
typedef object AsyncIterableOfBufferSources;

Promise<ArrayBuffer> digest(
  AlgorithmIdentifier algorithm,
  (BufferSource or IterableOfBufferSources or AsyncIterableOfBufferSources) data
);

and check the types involved in the calling function.

(Since this is a somewhat specific use case, where it's important that we retain a reference to the iterable after the function returns a Promise, rather than letting Web IDL copy the data as e.g. sequence<BufferSource> would, I personally think that's fine and we don't specifically need syntactic sugar for this in Web IDL, which might be difficult to achieve anyway.)

Finally, I've left the language around running in parallel for now, even though it's not completely clear whether we want/need it. However, when accepting async iterables specifically, we need to return a Promise and queue a task to resolve it anyway, so for this specific functionality it probably doesn't matter that much.

AFAICT, due to the loose definition of running in parallel, implementations are already free to do the hashing on the same thread if they wish (e.g. if the overhead of using multiple threads isn't worth it).

Finally, I didn't include ECMAScript in the list of xref'ed specs, because it caused a bunch of ambiguities for TypeError and SyntaxError, increasing the number of required manual citations more than I had to do here.

(Apologies for the long list of review requests, but since this is a fairly significant new feature I wanted to make sure everybody's on board with this.)

Preview | Diff

jasnell · 2025-01-04T02:43:04Z

Thinking about the case for Symbol.iterator a bit.. since a string implements Symbol.iterator this change would mean that we'd need to add an additional type check in the implementation to catch string arguments before performing the Symbol.iterator check on the input. Instead, perhaps we can also update digest to accept a string input that is always interpreted as utf8. So, for instance. await crypto.subtle.digest('sha256', 'hello') would currently throw an error but could be supported along with the change here.

twiss · 2025-01-04T09:48:54Z

Since String's Symbol.iterator produces strings, they would be rejected by the current text which says

If value is not a BufferSource, reject promise ...

without any specific check for strings. Except that an empty string would produce the same result as an empty array or empty BufferSource, I suppose, which is perhaps a bit odd, but not strictly wrong.

Either way, we could add support for strings later in a fully backwards compatible way even with this change, so I'd propose that we keep it separate.

(Also, I think there's something to be said for being explicit, as in await crypto.subtle.digest('SHA-256', new TextEncoder().encode('hello')) or await crypto.subtle.digest('SHA-256', stream.pipeThrough(new TextEncoderStream())), so I kind of think we should discuss this separately before changing it as well.)

annevk · 2025-01-06T13:16:47Z

Was implementer interest ever established?

I think it's better if the Web IDL just used object directly if that's unavoidable. Using it redundantly and indirectly does not actually make things easier to read (and maybe we should make it an error to use something redundantly). You also need to use the relevant ECMAScript operations to handle the passed in object just like Web IDL would do. "conforms to" doesn't mean anything. It's probably worth filing an issue against Web IDL to get advice and possibly get this added to Web IDL directly as doing these kind of conversations yourself is not recommended.

I would also ask @MattiasBuelens or others participating in https://github.com/whatwg/streams for review as there's quite a few subtleties.

twiss · 2025-01-06T14:18:03Z

Was implementer interest ever established?

There is implementer interest from the server-side runtimes, and this specific API was proposed by @jakearchibald when he was at Google, does that count? ;)

More seriously, it was discussed in the WebAppSec WG a couple times but no very concrete signals yet. I figured it might be easier to discuss when there's a concrete proposal on the table. But obviously if you're opposed to or not interested in streaming altogether that's also fair.

I think it's better if the Web IDL just used object directly if that's unavoidable. Using it redundantly and indirectly does not actually make things easier to read (and maybe we should make it an error to use something redundantly).

The reason I did it this way is because if you have tooling that tells you the types/IDL involved, it'll give you a hint about what kind of objects you can pass, as opposed to just object which is fairly meaningless. Maybe, to make it not redundant but still somewhat descriptive, we could do

typedef object MaybeAsyncIterableOfBufferSources;

Promise<ArrayBuffer> digest(
  AlgorithmIdentifier algorithm,
  (BufferSource or MaybeAsyncIterableOfBufferSources) data
);

or even

typedef object BufferSources; // (BufferSource or iterable<BufferSource> or async iterable<BufferSource>)

Promise<ArrayBuffer> digest(
  AlgorithmIdentifier algorithm,
  BufferSources data
);

? (Unless it ends up being added to Web IDL, of course.)

You also need to use the relevant ECMAScript operations to handle the passed in object just like Web IDL would do.

Fair enough.

"conforms to" doesn't mean anything.

The ECMAScript spec uses similar language and offers a definition for it here. I'll link to that.

It's probably worth filing an issue against Web IDL to get advice and possibly get this added to Web IDL directly as doing these kind of conversations yourself is not recommended.

Fair enough. I wasn't sure if it's a pattern we need anywhere else but ofc I'm happy to discuss it.

annevk · 2025-01-06T14:36:58Z

ECMAScript uses it as a statement of fact, not as a way to determine what type of object a thing is.

twiss · 2025-01-06T14:57:40Z

ECMAScript uses it as a statement of fact, not as a way to determine what type of object a thing is.

OK, fair enough. Anyway I don't think this check is strictly necessary since GetIterator also returns a TypeError if the passed object is not an iterable object, so I'll just remove it.

twiss requested review from jakearchibald, davidben, panva, jasnell, annevk, nmahendru and Frosne December 20, 2024 12:40

twiss force-pushed the stream-hashing branch from 252ff08 to d124b53 Compare December 20, 2024 12:43

jasnell approved these changes Dec 28, 2024

View reviewed changes

jasnell mentioned this pull request Jan 4, 2025

Implement iterator/async iterator support for web crypto digest nodejs/node#56463

Open

twiss added 2 commits January 6, 2025 13:14

Accept (async) iterables in crypto.subtle.digest

360fc74

Fix handling of empty iterables

f9b4b88

twiss force-pushed the stream-hashing branch from d124b53 to f9b4b88 Compare January 6, 2025 12:20

ItsMakni approved these changes Jan 6, 2025

View reviewed changes

twiss mentioned this pull request Jan 6, 2025

Accept iterable and async iterable in functions? whatwg/webidl#1461

Open

twiss mentioned this pull request Jan 6, 2025

Add async iterable<T> type to WebIDL whatwg/webidl#1397

Open

5 tasks

twiss added 2 commits January 6, 2025 17:43

Remove superfluous check that data conforms to iterable interface

007a188

Simplify IDL

951e47b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Accept (async) iterables in `crypto.subtle.digest` #390

Accept (async) iterables in `crypto.subtle.digest` #390

twiss commented Dec 20, 2024 •

edited by pr-preview bot

Loading

jasnell commented Jan 4, 2025

twiss commented Jan 4, 2025

annevk commented Jan 6, 2025

twiss commented Jan 6, 2025

annevk commented Jan 6, 2025

twiss commented Jan 6, 2025 •

edited

Loading

Accept (async) iterables in crypto.subtle.digest #390

Are you sure you want to change the base?

Accept (async) iterables in crypto.subtle.digest #390

Conversation

twiss commented Dec 20, 2024 • edited by pr-preview bot Loading

jasnell commented Jan 4, 2025

twiss commented Jan 4, 2025

annevk commented Jan 6, 2025

twiss commented Jan 6, 2025

annevk commented Jan 6, 2025

twiss commented Jan 6, 2025 • edited Loading

Accept (async) iterables in `crypto.subtle.digest` #390

Accept (async) iterables in `crypto.subtle.digest` #390

twiss commented Dec 20, 2024 •

edited by pr-preview bot

Loading

twiss commented Jan 6, 2025 •

edited

Loading