
Consider supporting retry mechanism #40

Open
mingyc opened this issue Sep 30, 2022 · 5 comments
Labels
api Issue with API specs

Comments

@mingyc (Collaborator) commented Sep 30, 2022

Offline discussion suggests that the API should support some retry mechanism; otherwise PendingBeacon is too unreliable.

  1. bool field option {retry: true}: this option lets the browser decide how to retry the failed beacon request.

  2. number field option {retries: 10}: this option lets the user specify how many times the browser should retry the failed request. However, it might be very difficult to implement: by default, beacons are sent out on page discard, and there really isn't much time left for a page to retry that many times, unless we are willing to let the browser queue a failed request for longer (a privacy concern). See the sketch after this list.
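A minimal sketch of the two proposed shapes, assuming the PendingGetBeacon constructor from the explainer; neither the retry nor the retries option exists in any spec:

```js
// Hypothetical option shapes from the two proposals above (not specced).

// Option 1: a bool that lets the browser decide how to retry.
const b1 = new PendingGetBeacon('/analytics', { retry: true });

// Option 2: a number that bounds how many times the browser may retry.
const b2 = new PendingGetBeacon('/analytics', { retries: 10 });
```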

Note to @fergald: Looking at the spec of the Reporting API, it doesn't really support a retry mechanism:

> We don’t specify any retry mechanism here for failed reports. We may want to add one here, or provide some indication that the delivery failed.

@horo-t commented Oct 13, 2023

FYI: Chromium has a retry mechanism for the Reporting API. See this CL and this max_report_attempts flag.

@fergald (Collaborator) commented Oct 13, 2023

Retries are difficult. We are restricted in when we are allowed to start fetches. If there are no instances of the origin alive or just recently closed, it could violate user expectations to retry a network request created by that origin. If background-sync is enabled for the origin, Chrome's privacy team is OK with it, but in other browsers we can only initiate a request while the origin has a live instance or just as one is closing.

It is not possible to retry a fetch initiated from an unload handler, so this is equally (un)reliable.

I think the TL;DR of my argument is that if you show me some existing code that does a delayed fetch, I can show you how to add a fetchLater that will make it more reliable and never less reliable.


For sites that have not already implemented retries using fetch, I think replacing those fetches with fetchLater results in an increase in reliability.

If someone cares about maximizing reliability, presumably they are already

  1. doing everything via a service worker and maybe even storage, or
  2. manually managing timeouts and doing fetches with retry logic, while also paying attention to visibilitychange events.

We can ignore 1, since these people don't need fetchLater.

For 2, with fetchLater available, for every piece of data they want to send after a timeout or visibilitychange event, they would create a fetchLater request that they later cancel. The fetchLater request would only ever actually fetch in a crash.

So for 2, this gives a strict increase in reliability.
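A minimal sketch of that pattern, assuming the fetchLater() shape from the explainer; sendWithRetries() is a hypothetical stand-in for the site's existing timeout/visibilitychange-driven retry logic:

```js
// Register a deferred request as a crash-safety net, deliver the data
// through the app's own retry logic, then cancel the deferred request
// once delivery is confirmed.
const payload = JSON.stringify({ event: 'pageview' }); // illustrative data
const controller = new AbortController();

// Safety net: only actually sent if the page is torn down (e.g. a crash)
// before we cancel it.
fetchLater('/report', {
  method: 'POST',
  body: payload,
  signal: controller.signal,
});

// Normal path: sendWithRetries() is hypothetical; it stands in for the
// existing fetch-with-retry code the site already maintains.
sendWithRetries('/report', payload).then(() => {
  controller.abort(); // delivered, so the deferred request never fires
});
```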


Adding automatic retries would allow those in 2 to remove retry logic. That seems nice but is not a requirement.

@mreinstein commented Jan 8, 2024

> If you show me some existing code that does a delayed fetch, I can show you how to add a fetchLater that will make it more reliable and never less reliable.

That is accurate, but it's also a false dichotomy; delayed fetch vs. fetchLater are not the only two strategies out there. Another option is spamming events UDP-style to maximize delivery reliability. That's the one I've actually seen used, because delayed send (on the web) doesn't really work given all the problems with page lifecycle combined with network failures.

The problem is that network failures do happen, even in non-crash scenarios, especially in mobile contexts. Losing data sent via PendingBeacon (or the new fetchLater API) due to network failures isn't always recoverable.

In my use case, I need to report how long someone stays on a given page. The solution I've inherited is that the page sends a "heartbeat" gif request every X seconds. On the analytics side, I look at the last one delivered and compare that to the first. It is extremely reliable because it works similarly to UDP (we send so many .gif requests that most end up delivered).
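A minimal sketch of that heartbeat pattern; the endpoint name, session-id scheme, and 5-second interval are all illustrative:

```js
// Fire-and-forget heartbeat: each beat is a tiny GET for a tracking gif.
// Losing any single beat is tolerable because dwell time is computed
// server-side as (timestamp of last beat seen) - (timestamp of first).
const sessionId = crypto.randomUUID();
setInterval(() => {
  new Image().src = `/heartbeat.gif?sid=${sessionId}&t=${Date.now()}`;
}, 5000);
```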

Obviously this sucks for performance reasons. I have (not exaggerating) a few billion of these requests per year. I would love to stop doing this, but every attempt to replace it with beacon transport fails:

  • The page lifecycle events are not reliable, even with clever, well-designed polyfills like https://github.com/GoogleChromeLabs/page-lifecycle.
  • The 64 KB per-origin limit means I can't just replace each http .gif request with a beacon, especially when there are several 3rd-party scripts running on a page and sharing that 64 KB quota.
  • I can't build my own retry logic because my integration is a 3rd-party script; I can't declare service workers or put a lot of data into local storage on these domains. I also can't depend on a user returning to a given domain; some people may only visit once and never return.

I've run analytics for gif-based and beacon-based transports side by side, and while it varies from day to day, the beacon message counts always come in lower than their .gif counterparts, by anywhere from 1–10% when summing up billions of messages. It's a big enough data loss that I can't get this solution past the data/management people.

> If there are no instances of the origin alive or just recently closed, it could violate user expectations to retry a network request created by that origin

I can appreciate that sentiment, but in the case of people building telemetry/analytics at large scale, especially as a 3rd-party vendor on a web page, I don't think PendingBeacon or fetchLater can work without the browser making its best effort to ensure these things send, and that includes some kind of retry mechanism for network failures.

@mreinstein commented

> If there are no instances of the origin alive or just recently closed, it could violate user expectations to retry a network request created by that origin

I think it's also important to add that the current constraints of the new fetchLater API already violate this expectation:

[Screenshot of the "What's not supported" section of Chromium's fetch-later experiment doc]

(from https://chromium.googlesource.com/chromium/src/+/main/docs/experiments/fetch-later.md#what_s-not-supported )

@mingyc (Collaborator, Author) commented Jan 19, 2024

> > If there are no instances of the origin alive or just recently closed, it could violate user expectations to retry a network request created by that origin
>
> I think it's also important to add that the current constraints of the new fetchLater API already violate this expectation:

There has been a long discussion around this topic (user expectation), and the current decision (#30 (comment)) is that the fetchLater API should not send out any requests if no other same-origin page is open (in other tabs/iframes, etc.).

The Chromium implementation for the OT (origin trial) is even stricter: the browser flushes out all pending fetchLater requests for a document when it enters BFCache.

Hence, the fact that a request is not observable in DevTools (after the initiator document is closed) should not be a privacy issue.
