More clearly define how seller trusted signals fetching works #1230
Conversation
and the bundled queryFeatureSupport('*'). This also includes fixes to some bugs in parsing of these URLs (some of the checks were missing).
there was another comment about the same line.
@MattMenke2 Would appreciate your review on this as well, in part to see if it moves things to where specifying V2 would be easier. (Unfortunately bidder stuff is totally different...)
Not quite a full review - want to poke a bit more tomorrow.
spec.bs (Outdated)

:: A [=list=] of [=trusted scoring signals requests=] that hasn't yet been organized to aid
   batching.
: <dfn>request map</dfn>
:: A [=map=] from a tuple of [=script fetcher=], [=URL=], {{unsigned short}} or null, and
Why is script fetcher here? Currently, it looks like there's a 1:1 mapping a batch to fetcher, so we could make it a batcher-wide value. Even if we wanted to merge requests for different fetchers, I assume we wouldn't want it as a key in the map, because then it would prevent merging requests. Am I missing something?
We do want it to prevent merging requests, because if the script fetchers for the batch are not the same, the cross-origin header check gets weird.
I don't have a good answer as to what happens cross-auction, though. Or if we somehow have two component sellers with the same script.
(Similarly, there is a question of how much reuse of bidder script sellers can actually happen; I think it ought to be implementation-defined within some bounds, since for us it depends on process limits!)
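For illustration, a rough C++ sketch of what keying on the fetcher implies. All names and types here (ScriptFetcher, TrustedSignalsRequest, RequestKey, request_map) are hypothetical stand-ins, not the spec's or the implementation's structures; the point is only that two requests are candidates for merging into one fetch only when their whole key tuple, fetcher included, matches.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

struct ScriptFetcher {};          // stand-in for the spec's [=script fetcher=]
struct TrustedSignalsRequest {};  // stand-in for one scoring signals request

// Key for the hypothetical request map: requests can only land in the same
// bucket (and thus be merged) when every component, including the fetcher,
// is the same.
using RequestKey = std::tuple<const ScriptFetcher*,      // identity of the script fetcher
                              std::string,               // trusted scoring signals URL
                              std::optional<uint16_t>>;  // {{unsigned short}} or null

std::map<RequestKey, std::vector<TrustedSignalsRequest>> request_map;
```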
We create a new batcher for each fetcher, though, so the fetcher is always the same, and could be a top-level pointer.
Yeah, but this removes the decision from here?
I mean, it already is there, so I'm wondering why it's here as well? Do you plan to change things to use a global batcher? If we're partitioning by script fetcher anyways, I'm not sure what that would get us?
If we don't share fetches when the script fetcher is different, I'm not sure that gets us anything, though?
We could make the batcher global (doesn't V2 require that for the cache?) and then maybe have the fetcher per auction, or have some optional way for fetchers to be shared (which is what impl does if there are concurrent auctions)?
V2 does indeed use a global batcher. In V2, I also have no plans to partition anything based on the fetcher, just on the seller origin (and main frame origin, and URL, and doubtless a couple other things). Guess perhaps I should have led with that.
Impl-wise, the partitioning by script fetcher roughly corresponds to different script URLs getting different worklet processes, and in the cross-origin case to those waiting for script headers before asking for trusted signals. I guess in spec those two would eventually need to be separate stages of the batching process, with cross-origin things first waiting for the headers before being considered for merging, but that doesn't seem to be necessary yet.
The spec just does a bad job of representing this sort of thing at all; the previous CL you reviewed changed "we fetch the script every single time we call scoreAd" to "we fetch it once per component auction", which is closer but not 100%... and re-arranging lifetime of ScriptFetchers is basically about that.
(One of my TODO entries is doing that for bid scripts, since those still fetch for every generateBid()).
Different script URLs share worklet processes, just not objects within the process.
Anyhow, I'm OK with this as-is, just wanted to dive into it a bit. Should finish reviewing this by EOD.
spec.bs (Outdated)

1. Until this object is no longer needed:
  1. Wait until |batcher|'s [=trusted scoring signals batcher/request queue=] is no longer empty or
     some heuristically chosen amount of time has passed.
"...or some time has passed and it's no longer empty", perhaps? I think we'd rather not think about running other steps with an empty queue.
No, because this may be empty but the "request map" field might not be.
... Though the distinction between request queue and request map only exists because I have no idea on how to express an atomic update to "request map", so the ideal solution would be to only have request map, have things atomically inserted into it, and then the condition can be as you said.
If this is intended to trigger when the queue is empty, then "Wait until |batcher|'s [=trusted scoring signals batcher/request queue=] is no longer empty" doesn't work alone. I had assumed this was an exclusive or. I still think we want to rephrase this.
Well... it basically does two things:
- Wake up when there is new stuff in the queue.
- Wake up when we feel like batching things based on some timer.
In either case we may send requests. Or might not. So I don't really understand what you mean by 'exclusive' here...
So I read this as do one or the other, not having two different triggers in the same implementation. i.e.:
option 1:

    void ThreadFunc() {
      while (running()) {
        WaitForNonEmpty();
        DoStuff();
      }
    }

or

    void ThreadFunc() {
      while (running()) {
        WaitForHeuristic();
        DoStuff();
      }
    }

As opposed to:

    void ThreadFunc() {
      while (running()) {
        MagicallyCombineWaits(&WaitForNonEmpty, &WaitForHeuristic);
        DoStuff();
      }
    }
I think we should make clear it's the latter. The heuristic wait is a mandatory part of the algorithm (at least if the other heuristic can leave the map non-empty).
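To make the "latter" shape concrete, a minimal hedged sketch using plain standard-library primitives and made-up names (BatcherLoop, queue_nonempty, map_nonempty); this is not the actual implementation, just one wait with both triggers, plus a check so that a timed wakeup with nothing pending is harmless.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex mu;
std::condition_variable cv;   // notified whenever a request is enqueued
bool queue_nonempty = false;  // mirrors the request queue (guarded by mu)
bool map_nonempty = false;    // mirrors the request map (guarded by mu)

void BatcherLoop() {
  using namespace std::chrono_literals;
  while (true) {  // "until |batcher| is no longer needed"
    std::unique_lock<std::mutex> lock(mu);
    // Single combined wait: wake as soon as the queue becomes non-empty, OR
    // after a heuristic delay so a non-empty request map still gets batched.
    cv.wait_for(lock, 50ms /* heuristic */, [] { return queue_nonempty; });
    if (!queue_nonempty && !map_nonempty)
      continue;  // timed out with nothing pending; keep waiting
    // DoStuff(): drain the queue into the request map, pick batches, fetch.
  }
}
```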
Thanks, yeah, that would be an issue. Do you perhaps have suggestions on clearer phrasing?
Maybe:
Wait until one of the following is true:
- |batcher|'s [=trusted scoring signals batcher/request queue=] is no longer empty.
- |batcher|'s [=trusted scoring signals batcher/request map=] is non-empty and some heuristically chosen amount of time has passed.
Thanks, this is indeed much better; done, though I used "at least one"
I think this looks good.
Ping, Mike? Or should I bounce this back to Dom?
Very sorry for the delay. Rubber stamp given Matt's review (which I think is sufficient going forward).
@qingxinwu This probably warrants extra attention to the "batch and fetch trusted scoring signals" algorithm; it does some weird stuff. (And I am reminded I was supposed to be reviewing your CL...)
spec.bs (Outdated)

To <dfn>batch and fetch trusted scoring signals</dfn> given a [=trusted scoring signals batcher=]
|batcher|:

1. Until this object is no longer needed:
"this object" means |batcher|? And I'm not sure what "no longer needed" means
It means that the implementer gets to figure out memory management for this thing... (did change to |batcher|)
spec.bs (Outdated)

1. |batcher|'s [=trusted scoring signals batcher/request queue=] [=map/is not empty=].
1. |batcher|'s [=trusted scoring signals batcher/request map=] [=map/is not empty=] and some
   heuristically chosen amount of time has passed.
1. Atomically transfer all entries in |batcher|'s [=trusted scoring signals batcher/
is this just a [=list/clone=], and then [=list/empty=] |batcher|'s [=trusted scoring signals batcher/request queue=]? I'm not sure if the "Atomically" makes much difference here spec-wise?
If it's not atomic, another thread may append the entry in between the two operations, and the entry would get lost.
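A small hedged sketch of the hazard; the names (TakePendingRequests, request_queue) are hypothetical, and a mutex merely stands in for the spec's notion of atomicity. Swapping the whole queue out under one lock means a concurrently appended request can never fall between the clone step and the empty step.

```cpp
#include <deque>
#include <mutex>
#include <utility>

struct Request {};

std::mutex queue_mu;
std::deque<Request> request_queue;  // other threads append under queue_mu

// Non-atomic version (clone, then clear, as two separately locked steps):
// another thread may append between them, and the clear() drops that entry.
//
// Atomic version: one critical section that takes everything at once.
std::deque<Request> TakePendingRequests() {
  std::deque<Request> taken;
  std::lock_guard<std::mutex> lock(queue_mu);
  std::swap(taken, request_queue);  // "clone" + "empty" as a single step
  return taken;
}
```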
Ah I see.
1. Add a first step to this algorithm to assert this is running in parallel (e.g., https://github.com/WICG/turtledove/blob/main/spec.bs#L1635).
2. "Atomically do: 1. [=list/clone=] ... 1. [=list/empty=] ..." may be better, since it uses standard methods, instead of "transfer entries", which does not seem clear spec-wise.
3. Do you know if there's a good way to spec "atomically"? I vaguely remembered that you asked about this in one PR. If not, I may want to tag Dominic about their ideas.
Done. No idea on 3
@domfarolino Is there a way to spec the concept of "atomically"? Not sure if the term itself is enough.
Ah, https://html.spec.whatwg.org/multipage/infrastructure.html#starting-a-new-parallel-queue (and the following example) may also be related.
Well, only if the two execution contexts are actually the "same", in which case they don't need to be distinct in parallel to each other. But they do need to be distinct I suppose, given @morlovich's comment.
Right, just saw Maks's comment (that comment didn't show up yet when I commented 😂 ). So ignore my previous comment then.
Note added. Using an actual queue like the parallel queue infra was considered, but it feels awkward to combine with the "just sleep for a bit, too, that's OK" stuff...
Can you clarify what you mean by "just sleep for a bit, too, that's OK"? And what bearing it would have on your adoption of a parallel queue? (I'm not saying you should use a parallel queue here; I'm just trying to make sure you don't think using one requires arbitrary synchronous sleep steps.)
So I wasn't really speaking of not using a parallel queue --- this does after all --- but rather of modelling the fetch request queue it uses on how the "start a new parallel queue" algorithm Qingxin linked to is written --- which is basically continuously spin-dequeueing. On further thought, that approach can actually work here, too, but I think it's less clear than talking about sleeping, since conceptually what this is saying is that implementations waiting for a few milliseconds to merge requests is an acceptable implementation choice. (As is not waiting!)
spec.bs (Outdated)

   empty [=list=].
1. [=list/Append=] |request| to |batcher|'s [=trusted scoring signals batcher/request map=]
   [|key|].
1. Some number of times, heuristically, select a |key| and a non-empty sublist of |batcher|'s
"some number of times", maybe "[=iteration/While=] |batcher|'s [=trusted scoring signals batcher/request map=] [=map/is not empty=]"?
That wouldn't do what's desired --- essentially an implementation can choose how much to batch; it doesn't have to handle everything it's got pending.
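A hedged sketch of that freedom, with hypothetical names (MaybeBatchSome, IssueBatch) and a deliberately unspecified selection policy: each pass may take just one key and only a slice of its pending requests, leaving the rest in the map for a later, possibly larger, batch.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Request {};

// Keyed however the spec's request map is keyed; a string stands in here.
std::map<std::string, std::vector<Request>> request_map;

void IssueBatch(const std::string& key, std::vector<Request> batch) {
  // ...build one trusted scoring signals URL covering |batch| and fetch it...
}

void MaybeBatchSome(std::size_t batches_this_pass, std::size_t max_batch_size) {
  // Both parameters are heuristic knobs; nothing forces draining the map.
  for (std::size_t i = 0; i < batches_this_pass && !request_map.empty(); ++i) {
    auto it = request_map.begin();  // pick some key; the selection policy is free
    std::string key = it->first;
    auto& pending = it->second;
    std::size_t n = std::min(max_batch_size, pending.size());
    std::vector<Request> batch(pending.begin(), pending.begin() + n);
    pending.erase(pending.begin(), pending.begin() + n);
    if (pending.empty())
      request_map.erase(it);            // drop exhausted keys
    IssueBatch(key, std::move(batch));  // anything left over waits for a later pass
  }
}
```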
then my understanding of the sentence here is:

    for i in range(0, a heuristically chosen number) {
      select a key and a sublist of requestMap[key];
      handle the batch;
    }
where after the loop, the request map does not need to be empty. Is that correct?
I thought after running the loop, we wanted to have all requests in requestMap handled (sure not in a single batch, but in whatever number of batches they chose). Otherwise, I'm looking for a place that ensures all pending requests are fetched in the end.
Yup, since it might want to wait a bit to collect more stuff in request map to get a bigger batch.
Hmm, I guess we technically are not requiring eventual progress here. Not immediately sure how to do so.
Maybe add a note that this does not need to be drained during the for loop and can wait for more requests to batch, but we just need to make sure all requests are batched (requestMap is empty) in the end?
Rephrased the note there somewhat (not pushed yet, though, working on Dominic's suggestion, too).
Hmm, build failure talking about [=fenced frame config/effective sandbox flags=]
Looks to be because of WICG/fenced-frame@00676a5
Almost there. Just tagged Dom for one question, and #1230 (comment) is the last comment I have.
to trusted seller signals fetches.
Looks like another change in the fenced frame spec is breaking our build, looking into it.
LGTM
This explicitly represents a parallel process picking sets of requests to merge, and what rules such merging must follow.
(It does not specify how such subsets are selected, since implementations have considerable freedom in how to do it).