Defer initialization and wait for defaultKeyStates #91

kidroca · 2021-07-22T22:35:05Z

Details

Make connections wait for Onyx.init

Some connections start really early before Onyx.init is even called
They should wait as Onyx.init adds some initial values that might be needed
Also waiting helps as every call uses getAllKeys and the keys are retrieved during init - if connections wait, every following connection would instantly retrieve them from cache, otherwise even though they hook to the same storage read they still create a new promise

Read defaults in one go and keep them in cache

Instead of reading default keys, then merging default values to them and then writing them back in storage
We read the default keys in one go, and then write the merged result to cache
We also make sure to never remove the default keys from cache - they don't take much space and are used often

Related Issues

Expensify/App#2667

Automated Tests

Covered by existing tests

Linked PRs

Benchmarks

Android Before

	Run 1	Run 2	Run 3	Run 4	Run 5
Total	124.06sec	118.71sec	118.43sec	116.57sec	114.24sec
Last call ended at	8.26sec	7.83sec	7.25sec	7.13sec	7.89sec

Full data: https://docs.google.com/spreadsheets/d/1lEZ1dZD77D6zUQgB5A6-QTlX3JdVcyARThKh7-dru_4/edit?usp=sharing

Android After

	Run 1	Run 2	Run 3	Run 4	Run 5
Total	92.22sec	92.44sec	92.15sec	89.81sec	91.98sec
Last call ended at	7.77sec	7.41sec	7.03sec	7.71sec	7.55sec

Full data: https://docs.google.com/spreadsheets/d/1USrACRs5grnnBZwL9y2eU-QNLr8y1tWzacMCd9hFSWo/edit?usp=sharing

iOS Before

	Run 1	Run 2	Run 3	Run 4	Run 5
Total	87.91sec	87.41sec	87.21sec	86.29sec	86.18sec
Last call ended at	4.96sec	4.47sec	4.47sec	4.77sec	4.71sec

Full data: https://docs.google.com/spreadsheets/d/1quU0R5xhRWmH3xsatU-uUMOIPqSy7v7rlZobjwtJk38/edit?usp=sharing

iOS After

	Run 1	Run 2	Run 3	Run 4	Run 5
Total	66.36sec	61.48sec	58.39sec	67.08sec	65.94sec
Last call ended at	4.63sec	4.76sec	4.72sec	4.78sec	4.90sec

Full data: https://docs.google.com/spreadsheets/d/1B4k4lidFnIZgngKmjI5EW7MTZ8ORqEUqpdG8taoZtJU/edit?usp=sharing

Benchmark info:

Add global.Onyx = Onyx in Eexpensify.js
Open the dev menu and make sure dev=false
Disable all breakpoints in the debugger
Wait for any syncing (top right corner) to end before starting any tests

Test

Reload the app
Wait processes to settle
Run Onyx.printMetrics({ format: 'csv', raw: true })
Copy the results to file
Import file as new sheet in excel

Run 7 times
Remove the fastest and the slowest runs

All benchmarks were performed against this hash in Expensify/App: Expensify/App@fc965f6

`isCollectionKey(mapping.key)` and `isSafeEvictionKey(mapping.key)` are not working correctly in this case since neither `onyxKeys` or `safeEvictionKeys` are set This also helps other logic running during init, as connections would wait a bit and allow the rest of the app to initialize

The variable `defaultKeyStates` is already preserved in memory for the life of the application, there's no need to keep it in hard storage as well Instead of reading the keys, then merging the defaults and then write them back, the logic becomes - read the keys (in one go) and merge with the defaults in cache, skipping the write This way we've prefilled the cache with some storage data that is immediately needed during init

kidroca

Added some explanations for the changes

lib/Onyx.js

marcaaron

These changes look fine and make sense. But I'm curious to get your interpretation of the data collected. Does this improve performance in some way?

lib/Onyx.js

marcaaron · 2021-07-23T00:32:46Z

lib/Onyx.js

+        addAllSafeEvictionKeysToRecentlyAccessedList(),
+        initializeWithDefaultKeyStates()
+    ])
+        .then(deferredInit.resolve);


It took me some time to wrap my head around this but this is a pretty cool trick we are doing with the deferred task stuff. I'm curious, how much better this would be compared to making Onyx.init() itself a promise and calling it before allowing any connections to happen? Does it take a significant amount of time to run this code?

Does it take a significant amount of time to run this code?

Before the change the initial get calls would start at 2ms past app launch and resolve at ~250ms
After the change they would start at 212ms and resolve at 255ms

So even though we delay things by 200ms they resolve at the same time, the speed up comes from the rest of the calls happening and in the end the last storage call is resolved 0.3-0.7 sec earlier

I'm curious, how much better this would be compared to making Onyx.init() itself a promise and calling it before allowing any connections to happen?

Calling Onyx.init before any other Onyx calls would require many changes - every usage that is not a component based withOnyx can happen before Onyx.init is called, depending on script load order
Even if we decide to do so it would not bring anything more to the table in terms of performance

the speed up comes from the rest of the calls happening

Hmm which calls? Sorry not sure I understand. Could you maybe say it another way?

in the end the last storage call is resolved 0.3-0.7 sec earlier

It looks like iOS experienced almost no changes (and in some cases got worse)? Does that sound right? If so, can you account for the inconsistency there?

Calling Onyx.init before any other Onyx calls would require many changes - every usage that is not a component based withOnyx can happen before Onyx.init is called, depending on script load order

👍

lib/Onyx.js

marcaaron · 2021-07-23T00:38:05Z

lib/createDeferredTask.js

+    const deferred = {};
+    deferred.promise = new Promise((res, rej) => {
+        deferred.resolve = res;
+        deferred.reject = rej;


Let's implement this or remove it?

What do you mean the reject functionality?
It should be up to the caller - if they need to reject they should have the option exposed here

All the deferred tasks provide the reject/resolve/promise interface - this is the standard pattern

It should be up to the caller - if they need to reject they should have the option exposed here

I get what you are saying but I only see one caller. As a general rule, we either implement code or we remove it.

kidroca · 2021-07-23T17:41:54Z

Updated, ready for another review

marcaaron · 2021-07-24T00:39:30Z

Left a couple more comments. Changes look good and looks like there will be some improvement for Android.

Not a blocker, but I am concerned about the results on iOS and curious why there is inconsistency there.

@timszot feel free to give this a look 🙇

kidroca · 2021-07-24T20:18:47Z

Not a blocker, but I am concerned about the results on iOS and curious why there is inconsistency there.

If the reason is "Last call ended at" metrics, I find those to depend on network calls and render as well and that they're not a good indication of performance gains/regression - the time there can vary by up to a second for two consecutive runs

Maybe we can identify a point in loading that does not depend on factors as network and use that as a reference

I'll update with the requested changes on Monday

kidroca · 2021-07-26T11:59:27Z

Ready: Addressed all requested changes

marcaaron · 2021-07-26T17:51:19Z

I find those to depend on network calls and render as well and that they're not a good indication of performance gains/regression - the time there can vary by up to a second for two consecutive runs

Ah so the data is not reliable since network calls can interfere with their accuracy? Is the same true for Android?

Maybe we can identify a point in loading that does not depend on factors as network and use that as a reference

Perhaps we can test Onyx in isolation somehow, by creating a sample app or "test" page in our main app for running diagnostics without making API calls? Anyways, I don't doubt that this PR will be helpful and I'm not against merging it so we can move on to the other changes, but feels like the benchmarks could be improved.

kidroca · 2021-07-26T19:52:29Z

Ah so the data is not reliable since network calls can interfere with their accuracy? Is the same true for Android?

Yes, now when I've inspected the data again, the "Last call ended at" is from Onyx.set and sometimes from Onyx.mergeCollection

Perhaps we can test Onyx in isolation somehow, by creating a sample app or "test" page in our main app for running diagnostics without making API calls?

I've made something similar working on the ticket here: Expensify/App#2667 but it only tests AsyncStorage and it turned out it works very fast in isolation
It might help to have a dedicated test page or app, but IMO the problem in Expensify is due to the many things that happen at the same time, and if we improve the parts involved we should see an improvement in general.
E.g. if we just take the ReportScreen (and it's children) and make a test out of it, it would probably render and work a lot faster, because the rest of the app is not there

My next PR is on Onyx.set and preventing duplicate writes.
This reduces (depending on chat/device) the time spent in writes from 30sec to 10sec, which can be measured by the existing benchmark, besides the 30 to 10 improvement this should give some more freedom to the rest of the processes that happen at that time

As a summary I think it would be best to capture more metrics in Expensify, but only such that give consistent result like the total time spent in Onyx.set this is very repeatable and varies only if you use different chat
Maybe we can hook the benchmark for withOnyx time spent waiting and time spent rendering, this should also provide consistent results given the same chat

marcaaron · 2021-07-27T02:18:02Z

it only tests AsyncStorage and it turned out it works very fast in isolation

So, it sounds like at least some of Onyx's "waiting time" is time it spends waiting for other JS calls to finish?

Maybe we can hook the benchmark for withOnyx time spent waiting and time spent rendering, this should also provide consistent results given the same chat

Could work, but seems like we also would still have that same problem.

marcaaron · 2021-07-27T02:19:48Z

the problem in Expensify is due to the many things that happen at the same time, and if we improve the parts involved we should see an improvement in general

I think I'm starting to see this as well. The app tries to do too many things simultaneously and everything ends up waiting for everything else.

timszot · 2021-07-27T21:28:27Z

@marcaaron looked this over and agree with what's being said in the comments. Not sure what the next steps currently are here though.

marcaaron · 2021-07-27T21:47:16Z

I think we should leave this PR open until there is a way to prove it has value in one form or another.
We generally don't merge code that adds complexity without value. I think this change has value but it feels more like it's solving a problem we don't have yet. Which is something like:

Some connections start really early before Onyx.init is even called
They should wait as Onyx.init adds some initial values that might be needed

That makes sense but there's no open issue for that problem and I don't see the relationship to performance.

@kidroca Can we get started with the next PR please? And focus on something that helps performance?

kidroca · 2021-07-28T08:03:44Z

I think we should leave this PR open until there is a way to prove it has value in one form or another.

What do you mean? Isn't 30sec less time spent in Onyx enough of an evidence?

That makes sense but there's no open issue for that problem and I don't see the relationship to performance.

It's a fix by coincidence
Eviction keys are set during init and are used in the connect logic, if connect happens before init eviction related keys are not set yet so checks like isSafeEvictionKey aren't working correctly

@kidroca Can we get started with the next PR please? And focus on something that helps performance?

How would we prove the value of the remaining PRs when they use the same metric (time spent) - if it's unreliable for this PR surely it would be unreliable for the rest?

marcaaron · 2021-07-28T17:56:26Z

What do you mean? Isn't 30sec less time spent in Onyx enough of an evidence?

I went off of how long it takes the last Onyx call to finish and seems like there is not much difference (and in some cases taking longer e.g. iOS). So, I presumed this PR would help reduce boot time (when Onyx init is called) or help in some other obvious way. But it looks like Onyx finishes working at or around the same time as before. Can you help explain what are we achieving?

It's a fix by coincidence Eviction keys are set during init and are used in the connect logic, if connect happens before init eviction related keys are not set yet so checks like isSafeEvictionKey aren't working correctly

Right, I think the change makes sense for those reasons (meaning we can merge it with an appropriate linked issue). It might be a perfectly reasonable thing to do, but doesn't really match up with the problem we set out to solve and our top focus which is performance.

How would we prove the value of the remaining PRs when they use the same metric (time spent) - if it's unreliable for this PR surely it would be unreliable for the rest?

To clarify, are you saying that you agree that the metrics are flawed and that we can't prove that any future changes we're making here are valuable? Or did you want to work together to create some better metrics?

quinthar · 2021-07-28T18:00:38Z

I like the idea of being guided by @kidroca's overall metric of time spent waiting on Onyx, even though it does not directly correlate to user time. Sorta like how when you use top there's a difference between wall clock and system clock time, "wall clock" is how long the user waits -- but even if the user waits the same amount of time, making "system clock" time go down is helpful.

Granted, "wall clock" (ie, user wait time) is by far the more important, but if we can measurably reduce system time without introducing complexity, then I think we should take those wins as we get them with the belief that efficiency floats all boats in the end.

marcaaron · 2021-07-28T18:29:41Z

Sorta like how when you use top there's a difference between wall clock and system clock time, "wall clock" is how long the user waits -- but even if the user waits the same amount of time, making "system clock" time go down is helpful.

I understand this theory, but it seems more practical to prioritize "user time" over "system time".
Why would we make the improvements that are difficult to observe first?

quinthar · 2021-07-28T18:31:46Z

I understand this theory, but it seems more practical to prioritize "user time" over "system time".
Why would we make the improvements that are difficult to observe first?

I agree we should prioritize user time over system time, but I'm suggesting we want both. However, that assumes there's no real downside to merging this. Is there a downside you are concerned about?

I 100% agree we shouldn't ship anything that has no defined/measurable benefit. But this is a measurable benefit, to system time, just not to user time.

marcaaron · 2021-07-28T18:43:33Z

Is there a downside you are concerned about?

If the bar for merging changes is "has no downsides" then we'd be making a lot of useless changes that have no downsides. But the answer is no, I think the changes are great and valuable for other reasons besides performance that may not be priorities. But I'm confused about the metrics and what they are showing.

I'm mainly trying to:

understand how this will make things go faster
get us to prioritize things that will make things go faster

quinthar · 2021-07-28T18:49:22Z

The bar is "has a measurable benefit AND has no downsides". I think we are just debating on whether or not "reducing system time" is a benefit if it doesn't also reduce user time. I think we obviously would prefer reducing user time, but if we have a change that "only" reduces system time and doesn't have any downsides, I don't see why we wouldn't take that small win.

…

On Wed, Jul 28, 2021 at 11:43 AM Marc Glasser ***@***.***> wrote: Is there a downside you are concerned about? If the bar for merging changes is "has no downsides" then we'd be making a lot of useless changes that have no downsides. But the answer is no, I think the changes are great and valuable for other reasons besides performance that may not be priorities. But I'm confused about the metrics and what they are showing. I'm mainly trying to: - understand how this will make things go faster - get us to prioritize things that will make things go faster — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#91 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAEMNURDAX45RYRXXF2TGB3T2BFV7ANCNFSM5A2475MA> .

kidroca · 2021-07-28T19:12:31Z

To clarify, are you saying that you agree that the metrics are flawed and that we can't prove that any future changes we're making here are valuable?

Only the "Last call ended at" is not very reliable, it helps but only if you know what you're doing.
For the init tests it can vary due to to everything that's going on
But if you use it to benchmark chat A to B switch time it works well

The "time spent" metric is reliable, both for total and individual method calls. So I see an improvement here even though it doesn't seem to improve boot time

Or did you want to work together to create some better metrics?

I'm trying different things to capture better metrics. When I find something that works I'll bring it up.

marcaaron · 2021-07-28T19:38:52Z

I think we are just debating on whether or not "reducing system time" is a benefit if it doesn't also reduce user time

I agree that reducing "system time" is valuable on its own and I think you agree that it it's more valuable to reduce "user time".

if we have a change that "only" reduces system time and doesn't have any downsides, I don't see why we wouldn't take that small win.

The problem I'm experiencing is that I'm not sure how to prioritize things that only reduce system time and don't add complexity.

If this is not the highest value performance improvement we could make right now or its value is not well understood my instinct is to suggest we find something better to work on first. We can take any small wins that have no downsides, but it feels like we are digging in our couch for spare change or something.

kidroca · 2021-07-28T20:06:47Z

The task I'm working on is to optimize AsyncStorage usage. This won't necessarily fix boot time, but helps with everything in general.

The total time spent was decreased by 1/4.
3 more PRs like this and it will be 0 😁

I've made many experiments and managed to speed up reading - with batching, but even then boot time didn't drop significantly, but things like chat switching are becoming faster.
Onyx or at least data access might not be what's causing the slow down. I'm leaning more and more towards withOnyx like the other research discovered. But in the mean time why not take the improvements we have.

The thing is measuring chat switch is hard to do manually. But since app init is loading a chat from storage, benchmarking it provides data that captures improvements in that direction as well

marcaaron · 2021-07-28T20:21:39Z

things like chat switching are becoming faster

Now you are speaking my language! Where is that PR? Let's do it!! 😂

The thing is measuring chat switch is hard to do manually

I agree it's hard to do (and also time consuming if you are doing release builds - which I think is the most reliable way). I've spent the entire week doing benchmarks with the production API and release builds on iOS and Android. So, let me know if I can help test anything.

kidroca · 2021-07-28T20:59:28Z

Have you tried the hash 93c30be with all the changes from this PR: #88

Unfortunately since then "total time" have increased, a lot due to withLocalize, but the hash should give you a glimpse of the end result and chat switch speed

I think all the changes complement each other, but for chat switching the set and cache improvement should be the most valuable

What could speed boot time is persisting the recent access list and loading it with a single get during init (@quinthar had the idea some time ago). I can try it (the changes needed are in one of the following PRs) and report what I've found. But then we'll definitely have to defer init so that we capture the full benefit of pre-filling cache, so we might as well merge the defer logic now

marcaaron · 2021-07-28T21:11:31Z

Merged this since I don't have any reason to block and agree with the deferred Onyx connections stuff regardless of performance.

I think all the changes complement each other, but for chat switching the set and cache improvement should be the most valuable

Sorry if I'm coming off as skeptical here. I just prefer to have a clear idea which single action we are taking has the greatest impact on performance.

we'll definitely have to defer init so that we capture the full benefit of pre-filling cache

That's true. I'm also curious to do an experiment where we do a deferred init then a big multiGet with ALL of the available data and never clear the cache at all. I'd imagine that would reduce virtually all movement across the bridge. Not sure if that would be a good idea or not but I'd expect massive improvements in chat switching since we will effectively spend 0 time crossing the bridge. If Onyx's biggest bottleneck is AsyncStorage - should we try to make a version of Onyx that doesn't ever interact with AsyncStorage and see how the app runs?

kidroca · 2021-07-28T21:41:42Z

Yes we can get all keys and then multi get the values. That's pretty easy to test and should review how much the app can improve with just changes in Onyx

That's pretty much how Redux works with local storage

During init the Redux store is created and hydrated with everything from storage
Then when a change happens it's applied to data in memory and scheduled to be persisted in storage

Now with Expensify there are concerns that data might become too much, so a hybrid approach can be made where we hydrate everything but chat messages...

If Onyx's biggest bottleneck is AsyncStorage - should we try to make a version of Onyx that doesn't ever interact with AsyncStorage and see how the app runs?

I think if we read evening at start and then only write to storage the effect would be the same
The bottleneck IMO is caused by withOnyx and this "get all during init" might prove it if even then boot does not improve

quinthar · 2021-07-28T22:19:55Z

Yes, we are aiming to build a platform that has shit tons of data stored locally -- let's not limit ourselves to what fits in RAM.

…

On Wed, Jul 28, 2021 at 2:41 PM Peter Velkov ***@***.***> wrote: Yes we can get all keys and then multi get the values. That's pretty easy to test and should review how much the app can improve with just changes in Onyx That's pretty much how Redux works with local storage During init the Redux store is created and hydrated with everything from storage Then when a change happens it's applied to data in memory and scheduled to be persisted in storage Now with Expensify there are concerns that data might become too much, so a hybrid approach can be made where we hydrate everything but chat messages... If Onyx's biggest bottleneck is AsyncStorage - should we try to make a version of Onyx that doesn't ever interact with AsyncStorage and see how the app runs? I think if we read evening at start and then only write to storage the effect would be the same The bottleneck IMO is caused by withOnyx and this "get all during init" might prove it if even then boot does not improve — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#91 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAEMNUTGINFG6AJFCMNDI4DT2B2SFANCNFSM5A2475MA> .

marcaaron · 2021-07-28T22:32:54Z

Yes, we are aiming to build a platform that has shit tons of data stored locally -- let's not limit ourselves to what fits in RAM.

Ah sorry, to clarify, the suggestion was to experimentally do this. Then we can see how the app runs without a dependency on AsyncStorage and deduce how much room for improvement there is.

marcaaron · 2021-07-29T18:31:20Z

lib/Onyx.js

 */
 function initializeWithDefaultKeyStates() {
-    _.each(defaultKeyStates, (state, key) => merge(key, state));
+    return AsyncStorage.multiGet(_.keys(defaultKeyStates))


Just caught this. But do the "after" metrics in the description include this time at all?

Yes, because we decorate and track this method here:
https://github.com/kidroca/react-native-onyx/blob/9549e4057abaf2a8f96bad1f255acd1c0b002cd2/lib/Onyx.js#L809
It's tracked under Onyx:defaults and adds time against "total time"

Ah ok cool. Thanks.

kidroca added 5 commits July 23, 2021 01:05

refactor: extract a createDeferredTask helper

333b8db

perf: Don't clean defaults from cache

73cbac4

Wait for both adding all safe eviction keys and initialize with defaults

fa1f370

kidroca requested a review from a team as a code owner July 22, 2021 22:35

MelvinBot requested review from timszot and removed request for a team July 22, 2021 22:35

kidroca commented Jul 22, 2021

View reviewed changes

lib/Onyx.js Show resolved Hide resolved

lib/Onyx.js Outdated Show resolved Hide resolved

lib/Onyx.js Show resolved Hide resolved

lib/Onyx.js Show resolved Hide resolved

marcaaron requested changes Jul 23, 2021

View reviewed changes

kidroca requested a review from marcaaron July 23, 2021 17:41

marcaaron assigned kidroca Jul 24, 2021

kidroca added 2 commits July 26, 2021 14:38

Update deferred init definition

96faca1

refactor: Remove deferred.reject since it's not used

9549e40

kidroca force-pushed the kidroca/perf-default-key-states branch from 7960695 to 9549e40 Compare July 26, 2021 11:44

marcaaron changed the title ~~Defer initialization and wait for defaultKeyStates~~ [HOLD] Defer initialization and wait for defaultKeyStates Jul 27, 2021

marcaaron approved these changes Jul 28, 2021

View reviewed changes

marcaaron changed the title ~~[HOLD] Defer initialization and wait for defaultKeyStates~~ Defer initialization and wait for defaultKeyStates Jul 28, 2021

marcaaron merged commit 61e6888 into Expensify:master Jul 28, 2021

marcaaron reviewed Jul 29, 2021

View reviewed changes

Defer initialization and wait for defaultKeyStates #91

Defer initialization and wait for defaultKeyStates #91

Conversation

kidroca commented Jul 22, 2021 • edited Loading

Details

Make connections wait for Onyx.init

Read defaults in one go and keep them in cache

Related Issues

Automated Tests

Linked PRs

Benchmarks

Android Before

Android After

iOS Before

iOS After

Benchmark info:

Test

kidroca left a comment

Choose a reason for hiding this comment

marcaaron left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kidroca commented Jul 23, 2021

marcaaron commented Jul 24, 2021

kidroca commented Jul 24, 2021

kidroca commented Jul 26, 2021

marcaaron commented Jul 26, 2021

kidroca commented Jul 26, 2021 • edited Loading

marcaaron commented Jul 27, 2021

marcaaron commented Jul 27, 2021

timszot commented Jul 27, 2021

marcaaron commented Jul 27, 2021

kidroca commented Jul 28, 2021

marcaaron commented Jul 28, 2021

quinthar commented Jul 28, 2021

marcaaron commented Jul 28, 2021

quinthar commented Jul 28, 2021

marcaaron commented Jul 28, 2021

quinthar commented Jul 28, 2021 via email

kidroca commented Jul 28, 2021

marcaaron commented Jul 28, 2021

kidroca commented Jul 28, 2021

marcaaron commented Jul 28, 2021

kidroca commented Jul 28, 2021 • edited Loading

marcaaron commented Jul 28, 2021

kidroca commented Jul 28, 2021

quinthar commented Jul 28, 2021 via email

marcaaron commented Jul 28, 2021

Choose a reason for hiding this comment

kidroca Jul 29, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kidroca commented Jul 22, 2021 •

edited

Loading

kidroca commented Jul 26, 2021 •

edited

Loading

kidroca commented Jul 28, 2021 •

edited

Loading

kidroca Jul 29, 2021 •

edited

Loading