Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(auto-update): make bundle rollback more robust #502

Closed
hakant opened this issue Dec 23, 2024 · 30 comments
Closed

feat(auto-update): make bundle rollback more robust #502

hakant opened this issue Dec 23, 2024 · 30 comments

Comments

@hakant
Copy link

hakant commented Dec 23, 2024

Feature Request

Description

In auto-update scenarios, new bundles are often set while the app is running in the background. During this time, the JavaScript execution of the webview is usually deprioritized by the mobile operating system due to backgrounding. This leads to slow or delayed JavaScript execution, which in turn necessitates higher appReadyTimeout configurations (typically 10–20 seconds).

The issue with prolonged appReadyTimeout durations is that, if a bundle contains a bug causing a critical crash, recovery takes too long. In real-world usage, most users won’t wait 10–20 seconds staring at a white screen. Instead, they are likely to kill the app and give up, meaning recovery might never occur.

I propose two solutions to address this problem:

  1. Expose a plugin API that allows developers to programmatically change a bundle's status to an error state. This would enable developers to detect a critical crash, mark the bundle as erroneous, and reset to the last successful bundle without waiting for the appReadyTimeout. Resetting is already possible, but without marking the current bundle as erroneous it doesn't seem very useful as it locks up in a cycle of update -> fail -> reset.
  2. Introduce a configuration property that allows developers to disable setting new bundles while the app is in the background. New bundle activations would then only occur during app launches. Enforcing bundle activation when the app is in the foreground (e.g., after a launch) would make JavaScript execution more predictable and allow for reduced appReadyTimeout durations.

Platform(s)

iOS & Android

Preferred Solution

Both solutions 1 and 2 would be generally useful. If I had to prioritize, solution 1 would likely be sufficient. However, I’m uncertain if updating the bundle while the app (and therefore the webview) is in the background introduces other edge case consequences. For instance, if the webview receives no execution time, it might lead to false rollbacks.

Alternatives

An alternative would be to abandon auto-updates altogether and implement a custom manual update logic. However, this approach would sacrifice the auto-update feature's implicit disaster recovery benefit thanks to its native implementation, which is independent of JavaScript execution in the bundle. Losing this feature would be unfortunate.

Additional Context

Here’s some additional context: we’re aiming to move notifyAppReady to the end of the critical application load path, covering bundle startup and the display of the first screen. This process should take no more than 3–5 seconds. However, when the app is in the background, the mobile operating system may arbitrarily freeze JavaScript execution in the webview depending on device activity. This forces us to increase the timeout, ultimately reducing the effectiveness of recovery mechanisms.

Curious about your thoughts and whether there are any other solutions we might have missed in our analysis. Thank you!

@riderx
Copy link
Collaborator

riderx commented Dec 30, 2024

Hello thanks for the feedback, we just released a new Android version who use WorkerManager.
WorkerManager allows us to keep the app alive the time works happens.
It's currently used for download only but maybe we could extend it to "SET" method this would probably solve your issue?
on your point 1 you mean in JS in the app or in backend on your side with API to capgo ?
2. this option is requested since long but our current implementation doesn't allow us to do it easily

@hakant
Copy link
Author

hakant commented Jan 2, 2025

Hi, thanks for the response!

About WorkerManager in Android. Would that also keep the WebView and its Javascript executions alive? My gut feeling says that it would only make sure the task defined in the WorkerManager would be alive. What do you think? And regardless, iOS would continue to remain as a problem.

In solution 1, I indeed mean JS in the app. Just like we can list bundles, get the current bundle info or reset to a different bundle. It would be useful if we can mark a bundle as faulty (BundleStatus = 'error').

For solution 2, yes it would be very nice to have that possibility. App launch is a perfect moment to switch to a new bundle. Backgrounding is not so much. But I understand that it might be a difficult change for you.

So to sum up, if we can have a JS plugin method where we can change the status of a Bundle to "error", that would at least allow us to run our own recovery logic without waiting for appReadyTimeout.

@riderx
Copy link
Collaborator

riderx commented Jan 6, 2025

To make a bit of context, you are the only one reporting this, so there is a chance it's more a misuse than a real issue :)
Where do you put AppReady?
It should be done as early as possible.
That maybe your main problem, if you call it earlier as expected, there are chances you will have no issue.

We can still have the set error method, but I think in your use case this is not the problem. It will be useful for manual mode.
@WcaleNieWolny could also have a look to see how (2) is hard to achieve

@WcaleNieWolny
Copy link
Contributor

Just without looking at the code, I don't really see why (2) is hard to achieve.

@WcaleNieWolny
Copy link
Contributor

Right, so currently we call installNext while the app is in the background. This should, IN THEORY, make for a smooth transition between two bundles. Calling installNext in the foreground would likely provide an undesirable user experience.

I sort of believe that the fix for the timeout could be as simple as doing

if (!isForeground()) {
  this.checkAppReady();
}

in the _reload() function.

image

I sort of believe that currently, we call checkAppReady twice (once in _reload() and once in appMovedToForeground).
Perhaps if the plugin only called checkAppReady on appMovedToForeground, it would resolve the problem?

Or perhaps I am misreading the code. I am going to check.

@WcaleNieWolny
Copy link
Contributor

It does, in fact, at least on Android, trigger checkAppReady twice, as I suggested above. I will check iOS. Perhaps making this optional would be enough to solve the timeout problem for checkAppReady?

@WcaleNieWolny
Copy link
Contributor

iOS does the same. @riderx, please let me know if you would like to see a flag to make this checkAppReady on _reload optional.

@riderx
Copy link
Collaborator

riderx commented Jan 7, 2025

The problem is realod can be called also by JS and then it need to start the check .
One advantage of doing the check in background is that the all could return to previous without even the user notice something failed

@WcaleNieWolny
Copy link
Contributor

Yeah, that is why I suggested making this a flag. By default, we would keep the current behavior as is, while users who experience this timeout would be able to set the flag that disables checkAppReady on _reload()

@riderx
Copy link
Collaborator

riderx commented Jan 7, 2025

Yea, before that i want to be able to reproduce as we never saw that happening in any of other users or any of our test, the only time i saw timeout was when our method was called after http request or others stuff who take time.
And that is a misuse, so we shouldn’t fix this but document it better

@WcaleNieWolny
Copy link
Contributor

I don't really know how to reproduce it, but after asking Claude a bit, I think we might benefit from the following
image

If we were to put checkAppReady ONLY when we are 100% that JS is active, we could avoid potential timeouts.

@WcaleNieWolny
Copy link
Contributor

The main magic is here: webView.evaluateJavascript("javascript:void(0);", new ValueCallback<String>() { This will only return IF the webview is allowed to run JS code

@riderx
Copy link
Collaborator

riderx commented Jan 7, 2025

@hakant can you provide a reproduction?

@hakant
Copy link
Author

hakant commented Jan 9, 2025

Hey @riderx and @WcaleNieWolny, thanks for thinking along!

Our understanding of the purpose of notifyAppReady is that it's a way for the bundle to report itself as healthy. Every SPA would have some sort of initialization logic and then displaying of the landing page. The amount of code that runs during initialization can depend on the size of the project but this indeed needs to be fast, otherwise not only notifyAppReady but also the users would be waiting too long. That is undesirable in any case.

We want to call notifyAppReady after critical initialization is done and preferably after the first page is successfully rendered. And in the foreground this is done in a matter of a few seconds. When the app is in the background however, this can easily take 15 seconds or more depending on what the device is doing at that moment.

I stumbled upon something the other day, maybe we can use that as an easy reproduction. One of our developers was working on an animation for the landing page and he introduced this piece of code to smoothen the animation:

await new Promise(resolve => requestAnimationFrame(resolve));

It turned out requestAnimationFrame calls are paused in browsers when they're not in the foreground.

requestAnimationFrame() calls are paused in most browsers when running in background tabs or hidden <iframe>s, in order to improve performance and battery life.

So if my notifyAppReady happened to be after this line, all my bundles would timeout in the background, but not in the foreground.

So overall point is, how the JS code executes in foreground vs background can be very different in terms of timing and behaviour. So treating them to the same notifyAppReady criteria can cause issues.

Last but not least, I understand that refreshing the bundle in the background sounds logical as user is not using the app. But quick background foreground operations lead to bad experience because user's journey gets broken. If I could configure turning off bundle updates during backgrounding, I would do that in my project.

But as @WcaleNieWolny mentioned, if we can turn off checking app ready in background that also solves this issue.

@riderx, what are your thoughts on when notifyAppReady should be called? We all agree it should be fast. But should it be the first line of Javascript in the bundle and not cover any critical initialization code ? In that case we would only be testing if we can execute javascript. What do you think?

@WcaleNieWolny
Copy link
Contributor

If I could configure turning off bundle updates during backgrounding, I would do that in my project

You can, please check the setMultiDelay call.

@hakant
Copy link
Author

hakant commented Jan 10, 2025

I was hoping the same and tried that. But this only helps with delaying. When the delay is over, update takes place with the same rules. Later I noticed the docs is clear about that:

Sets a {@link DelayCondition} array containing conditions that the Plugin will use to delay the update. After all conditions are met, the update process will run start again as usual, so update will be installed after a backgrounding or killing the app.

@riderx
Copy link
Collaborator

riderx commented Jan 10, 2025

@hakant yes we totally recommend to be the first line in your JS.
This check is here to say the bundle “load” in the app context, it’s not made to check bug in the JS execution.
That was is for capgo healthy bundle.
And we do it in the background to not break the user journey, if for any reason the bundle do not “load” we revert it before the user see a white screen.
The plugin is not made to ensure there is no bug at runtime.

@hakant
Copy link
Author

hakant commented Jan 13, 2025

@riderx that settles things then for notifyAppReady. I'll change it to make it the first line to execute and see if I can reduce appReadyTimeout by doing so.

In that case the solution 1 that I proposed in the original issue would still be useful for us to handle critical issues in our bundles. Let's say we executed notifyAppReady and everything is OK from Capgo perspective. But soon after, something crashed in my bundle on a certain device and I'm able to detect it. Would be nice if I can revert to the previous bundle and mark the current bundle as unhealthy.

So, a plugin method to set the status of a bundle to 'error' in this case. Does that make sense and doable for you ?

@WcaleNieWolny
Copy link
Contributor

@hakant do you use capgo cloud?

@WcaleNieWolny
Copy link
Contributor

The idea of allowing the plugin to reject an auto update on its own is something we could consider I think. It does, however, require some planing to ensure that this plays well with both manual mode and auto update

@hakant
Copy link
Author

hakant commented Jan 13, 2025

@WcaleNieWolny we're running our own backend and testing this solution at the moment. Auto updates are enabled on the plugin side. A manual bundle reset doesn't play nicely with auto updates as it will continuously find the new version and try to update. Not sure what happens with 'error' bundles in manual mode.

@riderx
Copy link
Collaborator

riderx commented Jan 13, 2025

If the bundle is marked errored it will stop update it

@WcaleNieWolny
Copy link
Contributor

@riderx do you think we should maintain a separate list of 'banned' versions? If, for example, the user bans the 1.0.0 version, auto-update and manual mode will NEVER download the banned version. We could also try to implement it in the existing data structures, but I am for creating a new list

@riderx
Copy link
Collaborator

riderx commented Jan 13, 2025

We have already something with the errored version, i wonder what happen in manual mode if you try to add again a version present it auto remove?

@hakant
Copy link
Author

hakant commented Jan 13, 2025

Imho, manual mode should allow setting a bundle that was earlier marked as 'error'. Folks might be using this in their bundle testing process (we do). We let testers download any bundle they want and test it within an app shell. On prod scenarios devs can list bundles and check their statuses and do what's necessary ?

On the other hand it makes sense that auto updates don't download and set those error bundles anymore.

My 2 cents if that makes sense..

@WcaleNieWolny
Copy link
Contributor

WcaleNieWolny commented Jan 16, 2025

We have already something with the errored version, i wonder what happen in manual mode if you try to add again a version present it auto remove?

Well... I am glad you asked, because I just found a bug.
In manual mode, if the download fails, the bundle will be saved in preferences. (saveBundleInfo). On Android it also gets saved gets added into DownloadWorkerManager.activeVersions. If you try redownloading the same bundle, it will fail in a very interesting way. Here is what the 2nd download call will return:

{
    "id": "O7oUbtTLv7",
    "version": "1.0.0-manual",
    "downloaded": "2025-01-16T05:03:58.182+0000",
    "checksum": "",
    "status": "downloading"
}

Well, in fact, the version isn't 'downloading.' In logcat, I see the log: Version 1.0.0-manual is already downloading. It will just return, forever leaving the bundle in a downloading state.

If you kill the app and try downloading again, the app will retry downloading, and if it fails again, it will save the bundle in DownloadWorkerManager.activeVersions

If you were to take a look at the preferences database, you would find the following:
image

Once a bundle download fails, it will NEVER be removed from preferences. Frankly, I think that the current implementation of list is atrocious. We should have a way to clean up the preferences.
image

Why would we ever limit the list to bundles that are successfully downloaded?

@WcaleNieWolny
Copy link
Contributor

iOS will not care and always try redownload, even in manual mode. However, it also will leak the preferences (UserDefaults.standard). I just checed via print(UserDefaults.standard.dictionaryRepresentation()) and I found multiple key-value pais coresponding to failed bundle info

@WcaleNieWolny
Copy link
Contributor

At some point, we need to clean this system up. We keep adding new features that make the bundle info storage more and more complex. Multiple service-like components of the updater plugin interact with one another in unpredictable ways.

@hakant
Copy link
Author

hakant commented Jan 21, 2025

@riderx @WcaleNieWolny we spoke about a whole bunch of stuff in this issue 😄. Shall I create a new focused issue about the item below ? I was going to suggest closing this one but I see @WcaleNieWolny started linking it to other places. What do you folks prefer ?

Expose a JS plugin method that allows developers to programmatically change a bundle's status to an error state.

@riderx
Copy link
Collaborator

riderx commented Jan 21, 2025

Correct, i close this one, and made one for the feature requested.
We don't have bandwith to do it this week but probably next week

@riderx riderx closed this as completed Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants