More clearly define how seller trusted signals fetching works #1230
Conversation
and the bundled queryFeatureSupport('*'). This also includes fixes to some bugs in parsing of these URLs (some of the checks were missing).
there was another comment about the same line.
@MattMenke2 Would appreciate your review on this as well, in part to see if it moves things to where specifying V2 would be easier. (Unfortunately bidder stuff is totally different...)
Not quite a full review - want to poke a bit more tomorrow.
spec.bs (Outdated)

:: A [=list=] of [=trusted scoring signals requests=] that hasn't yet been organized to aid
   batching.
: <dfn>request map</dfn>
:: A [=map=] from a tuple of [=script fetcher=], [=URL=], {{unsigned short}} or null, and
Why is script fetcher here? Currently, it looks like there's a 1:1 mapping a batch to fetcher, so we could make it a batcher-wide value. Even if we wanted to merge requests for different fetchers, I assume we wouldn't want it as a key in the map, because then it would prevent merging requests. Am I missing something?
We do want it to prevent merging requests, because if the script fetchers for the batch are not the same, the cross-origin header check gets weird.
I don't have a good answer as to what happens cross-auction, though. Or if we somehow have two component sellers with the same script.
(Similarly, there is a question of how much reuse of bidder script sellers can actually happen; I think it ought to be implementation-defined within some bounds, since for us it depends on process limits!)
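For illustration, a rough C++ sketch of what keying on the fetcher implies. All names and types here (ScriptFetcher, TrustedSignalsRequest, RequestKey, request_map) are hypothetical stand-ins, not the spec's or the implementation's structures; the point is only that two requests are candidates for merging into one fetch only when their whole key tuple, fetcher included, matches.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

struct ScriptFetcher {};          // stand-in for the spec's [=script fetcher=]
struct TrustedSignalsRequest {};  // stand-in for one scoring signals request

// Key for the hypothetical request map: requests can only land in the same
// bucket (and thus be merged) when every component, including the fetcher,
// is the same.
using RequestKey = std::tuple<const ScriptFetcher*,      // identity of the script fetcher
                              std::string,               // trusted scoring signals URL
                              std::optional<uint16_t>>;  // {{unsigned short}} or null

std::map<RequestKey, std::vector<TrustedSignalsRequest>> request_map;
```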
We create a new batcher for each fetcher, though, so the fetcher is always the same, and could be a top-level pointer.
Yeah, but this removes the decision from here?
I mean, it already is there, so I'm wondering why it's here as well? Do you plan to change things to use a global batcher? If we're partitioning by script fetcher anyways, I'm not sure what that would get us?
If we don't share fetches when the script fetcher is different, I'm not sure that gets us anything, though?
We could make the batcher global (doesn't V2 require that for the cache?) and then maybe have the fetcher per auction, or have some optional way for fetchers to be shared (which is what impl does if there are concurrent auctions)?
V2 does indeed use a global batcher. In V2, I also have no plans to partition anything based on the fetcher, just on the seller origin (and main frame origin, and URL, and doubtless a couple other things). Guess perhaps I should have led with that.
Impl-wise, the partitioning by script fetcher roughly corresponds to different script URLs getting different worklet processes, and in the cross-origin case to those waiting for script headers before asking for trusted signals. I guess in spec those two would eventually need to be separate stages of the batching process, with cross-origin things first waiting for the headers before being considered for merging, but that doesn't seem to be necessary yet.
The spec just does a bad job of representing this sort of thing at all; the previous CL you reviewed changed "we fetch the script every single time we call scoreAd" to "we fetch it once per component auction", which is closer but not 100%... and re-arranging lifetime of ScriptFetchers is basically about that.
(One of my TODO entries is doing that for bid scripts, since those still fetch for every generateBid()).
Different script URLs share worklet processes, just not objects within the process.
Anyhow, I'm OK with this as-is, just wanted to dive into it a bit. Should finish reviewing this by EOD.
spec.bs (Outdated)

1. Until this object is no longer needed:
  1. Wait until |batcher|'s [=trusted scoring signals batcher/request queue=] is no longer empty or
     some heuristically chosen amount of time has passed.
"...or some time has passed and it's no longer empty", perhaps? I think we'd rather not think about running other steps with an empty queue.
No, because this may be empty but the "request map" field might not be.
... Though the distinction between request queue and request map only exists because I have no idea on how to express an atomic update to "request map", so the ideal solution would be to only have request map, have things atomically inserted into it, and then the condition can be as you said.
If this is intended to trigger when the queue is empty, then "Wait until |batcher|'s [=trusted scoring signals batcher/request queue=] is no longer empty" doesn't work alone. I had assumed this was an exclusive or. I still think we want to rephrase this.
Well... it basically does two things:
- Wake up when there is new stuff in the queue.
- Wake up when we feel like batching things based on some timer.
In either case we may send requests. Or might not. So I don't really understand what you mean by 'exclusive' here...
So I read this as do one or the other, not having two different triggers in the same implementation. i.e.:
option 1:

    void ThreadFunc() {
      while (running()) {
        WaitForNonEmpty();
        DoStuff();
      }
    }

or

    void ThreadFunc() {
      while (running()) {
        WaitForHeuristic();
        DoStuff();
      }
    }

As opposed to:

    void ThreadFunc() {
      while (running()) {
        MagicallyCombineWaits(&WaitForNonEmpty, &WaitForHeuristic);
        DoStuff();
      }
    }
I think we should make clear it's the latter. The heuristic wait is a mandatory part of the algorithm (at least if the other heuristic can leave the map non-empty).
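To make the "latter" shape concrete, a minimal hedged sketch using plain standard-library primitives and made-up names (BatcherLoop, queue_nonempty, map_nonempty); this is not the actual implementation, just one wait with both triggers, plus a check so that a timed wakeup with nothing pending is harmless.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex mu;
std::condition_variable cv;   // notified whenever a request is enqueued
bool queue_nonempty = false;  // mirrors the request queue (guarded by mu)
bool map_nonempty = false;    // mirrors the request map (guarded by mu)

void BatcherLoop() {
  using namespace std::chrono_literals;
  while (true) {  // "until |batcher| is no longer needed"
    std::unique_lock<std::mutex> lock(mu);
    // Single combined wait: wake as soon as the queue becomes non-empty, OR
    // after a heuristic delay so a non-empty request map still gets batched.
    cv.wait_for(lock, 50ms /* heuristic */, [] { return queue_nonempty; });
    if (!queue_nonempty && !map_nonempty)
      continue;  // timed out with nothing pending; keep waiting
    // DoStuff(): drain the queue into the request map, pick batches, fetch.
  }
}
```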
Thanks, yeah, that would be an issue. Do you perhaps have suggestions on clearer phrasing?
Maybe:
Wait until one of the following is true:
- |batcher|'s [=trusted scoring signals batcher/request queue=] is no longer empty.
- |batcher|'s [=trusted scoring signals batcher/request map=] is non-empty and some heuristically chosen amount of time has passed.
Thanks, this is indeed much better; done, though I used "at least one"
I think this looks good.
Ping, Mike? Or should I bounce this back to Dom?
Very sorry for the delay. Rubber stamp given Matt's review (which I think is sufficient going forward).
@qingxinwu This probably warrants extra attention to the "batch and fetch trusted scoring signals" algorithm; it does some weird stuff. (And I am reminded I was supposed to be reviewing your CL...)
spec.bs (Outdated)

To <dfn>batch and fetch trusted scoring signals</dfn> given a [=trusted scoring signals batcher=]
|batcher|:

1. Until this object is no longer needed:
"this object" means |batcher|? And I'm not sure what "no longer needed" means
It means that the implementer gets to figure out memory management for this thing... (did change to |batcher|)
spec.bs (Outdated)

1. |batcher|'s [=trusted scoring signals batcher/request queue=] [=map/is not empty=].
1. |batcher|'s [=trusted scoring signals batcher/request map=] [=map/is not empty=] and some
   heuristically chosen amount of time has passed.
1. Atomically transfer all entries in |batcher|'s [=trusted scoring signals batcher/
is this just a [=list/clone=], and then [=list/empty=] |batcher|'s [=trusted scoring signals batcher/request queue=]? I'm not sure if the "Atomically" makes much difference here spec-wise?
If it's not atomic, another thread may append the entry in between the two operations, and the entry would get lost.
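A small hedged sketch of the hazard; the names (TakePendingRequests, request_queue) are hypothetical, and a mutex merely stands in for the spec's notion of atomicity. Swapping the whole queue out under one lock means a concurrently appended request can never fall between the clone step and the empty step.

```cpp
#include <deque>
#include <mutex>
#include <utility>

struct Request {};

std::mutex queue_mu;
std::deque<Request> request_queue;  // other threads append under queue_mu

// Non-atomic version (clone, then clear, as two separately locked steps):
// another thread may append between them, and the clear() drops that entry.
//
// Atomic version: one critical section that takes everything at once.
std::deque<Request> TakePendingRequests() {
  std::deque<Request> taken;
  std::lock_guard<std::mutex> lock(queue_mu);
  std::swap(taken, request_queue);  // "clone" + "empty" as a single step
  return taken;
}
```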
Ah I see.
1. Add a first step to this algorithm to assert this is running in parallel (e.g., https://github.com/WICG/turtledove/blob/main/spec.bs#L1635).
2. "Atomically do: 1. [=list/clone=] ... 1. [=list/empty=] ..." may be better, since it uses standard methods, instead of "transfer entries", which does not seem clear spec-wise.
3. Do you know if there's a good way to spec "atomically"? I vaguely remembered that you asked about this in one PR. If not, I may want to tag Dominic about their ideas.
Done. No idea on 3
@domfarolino Is there a way to spec the concept of "atomically"? Not sure if the term itself is enough.
Ah, https://html.spec.whatwg.org/multipage/infrastructure.html#starting-a-new-parallel-queue (and the following example) may also be related.
Well, only if the two execution contexts are actually the "same", in which case they don't need to be distinct in parallel to each other. But they do need to be distinct I suppose, given @morlovich's comment.
Right, just saw Maks's comment (that comment didn't show up yet when I commented 😂 ). So ignore my previous comment then.
Note added. Using an actual queue like the parallel queue infra was considered, but it feels awkward to combine with the "just sleep for a bit, too, that's OK" stuff...
Can you clarify what you mean by "just sleep for a bit, too, that's OK"? And what bearing it would have on your adoption of a parallel queue? (I'm not saying you should use a parallel queue here; I'm just trying to make sure you don't think using one requires arbitrary synchronous sleep steps.)
So I wasn't really speaking of not using a parallel queue --- this does after all --- but rather of modelling the fetch request queue it uses on how the "start a new parallel queue" algorithm Qingxin linked to is written --- which is basically continuously spin-dequeueing. On further thought, that approach can actually work here, too, but I think it's less clear than talking about sleeping, since conceptually what this is saying is that implementations waiting for a few milliseconds to merge requests is an acceptable implementation choice. (As is not waiting!)
spec.bs (Outdated)

   empty [=list=].
1. [=list/Append=] |request| to |batcher|'s [=trusted scoring signals batcher/request map=]
   [|key|].
1. Some number of times, heuristically, select a |key| and a non-empty sublist of |batcher|'s
"some number of times", maybe "[=iteration/While=] |batcher|'s [=trusted scoring signals batcher/request map=] [=map/is not empty=]"?
That wouldn't do what's desired --- essentially an implementation can choose how much to batch; it doesn't have to handle everything it's got pending.
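A hedged sketch of that freedom, with hypothetical names (MaybeBatchSome, IssueBatch) and a deliberately unspecified selection policy: each pass may take just one key and only a slice of its pending requests, leaving the rest in the map for a later, possibly larger, batch.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Request {};

// Keyed however the spec's request map is keyed; a string stands in here.
std::map<std::string, std::vector<Request>> request_map;

void IssueBatch(const std::string& key, std::vector<Request> batch) {
  // ...build one trusted scoring signals URL covering |batch| and fetch it...
}

void MaybeBatchSome(std::size_t batches_this_pass, std::size_t max_batch_size) {
  // Both parameters are heuristic knobs; nothing forces draining the map.
  for (std::size_t i = 0; i < batches_this_pass && !request_map.empty(); ++i) {
    auto it = request_map.begin();  // pick some key; the selection policy is free
    std::string key = it->first;
    auto& pending = it->second;
    std::size_t n = std::min(max_batch_size, pending.size());
    std::vector<Request> batch(pending.begin(), pending.begin() + n);
    pending.erase(pending.begin(), pending.begin() + n);
    if (pending.empty())
      request_map.erase(it);            // drop exhausted keys
    IssueBatch(key, std::move(batch));  // anything left over waits for a later pass
  }
}
```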
then my understanding of the sentence here is:

    for i in range(0, a heuristically chosen number) {
      select a key and a sublist of requestMap[key];
      handle the batch;
    }
where after the loop, the request map does not need to be empty. Is that correct?
I thought after running the loop, we wanted to have all requests in requestMap handled (sure not in a single batch, but in whatever number of batches they chose). Otherwise, I'm looking for a place that ensures all pending requests are fetched in the end.
Yup, since it might want to wait a bit to collect more stuff in request map to get a bigger batch.
Hmm, I guess we technically are not requiring eventual progress here. Not immediately sure how to do so.
Maybe add a note that this does not need to be drained during the for loop and can wait for more requests to batch, but we just need to make sure all requests are batched (requestMap is empty) in the end?
Rephrased the note there somewhat (not pushed yet, though, working on Dominic's suggestion, too).
Hmm, build failure talking about [=fenced frame config/effective sandbox flags=]
Looks to be because of WICG/fenced-frame@00676a5
Almost there. Just tagged Dom for one question, and #1230 (comment) is the last comment I have.
to trusted seller signals fetches.
Looks like another change in the fenced frame spec is breaking our build, looking into it.
LGTM
This explicitly represents a parallel process picking sets of requests to merge, and what rules such merging must follow.
(It does not specify how such subsets are selected, since implementations have considerable freedom in how to do it).