Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defer initialization and wait for defaultKeyStates #91

Merged
merged 7 commits into from
Jul 28, 2021

Conversation

kidroca
Copy link
Contributor

@kidroca kidroca commented Jul 22, 2021

@marcaaron

Details

Make connections wait for Onyx.init

Some connections start really early before Onyx.init is even called
They should wait as Onyx.init adds some initial values that might be needed
Also waiting helps as every call uses getAllKeys and the keys are retrieved during init - if connections wait, every following connection would instantly retrieve them from cache, otherwise even though they hook to the same storage read they still create a new promise

Read defaults in one go and keep them in cache

Instead of reading default keys, then merging default values to them and then writing them back in storage
We read the default keys in one go, and then write the merged result to cache
We also make sure to never remove the default keys from cache - they don't take much space and are used often

Related Issues

Expensify/App#2667

Automated Tests

Covered by existing tests

Linked PRs

Benchmarks

Android Before
Run 1 Run 2 Run 3 Run 4 Run 5
Total 124.06sec 118.71sec 118.43sec 116.57sec 114.24sec
Last call ended at 8.26sec 7.83sec 7.25sec 7.13sec 7.89sec
Android After
Run 1 Run 2 Run 3 Run 4 Run 5
Total 92.22sec 92.44sec 92.15sec 89.81sec 91.98sec
Last call ended at 7.77sec 7.41sec 7.03sec 7.71sec 7.55sec
iOS Before
Run 1 Run 2 Run 3 Run 4 Run 5
Total 87.91sec 87.41sec 87.21sec 86.29sec 86.18sec
Last call ended at 4.96sec 4.47sec 4.47sec 4.77sec 4.71sec
iOS After
Run 1 Run 2 Run 3 Run 4 Run 5
Total 66.36sec 61.48sec 58.39sec 67.08sec 65.94sec
Last call ended at 4.63sec 4.76sec 4.72sec 4.78sec 4.90sec

Benchmark info:

Add global.Onyx = Onyx in Eexpensify.js
Open the dev menu and make sure dev=false
Disable all breakpoints in the debugger
Wait for any syncing (top right corner) to end before starting any tests

Test  
  1. Reload the app
  2. Wait processes to settle
  3. Run Onyx.printMetrics({ format: 'csv', raw: true })  
  4. Copy the results to file
  5. Import file as new sheet in excel
     
    Run 7 times
    Remove the fastest and the slowest runs

All benchmarks were performed against this hash in Expensify/App: Expensify/App@fc965f6

kidroca added 5 commits July 23, 2021 01:05
`isCollectionKey(mapping.key)` and `isSafeEvictionKey(mapping.key)` are not working
correctly in this case since neither `onyxKeys` or `safeEvictionKeys` are set

This also helps other logic running during init, as connections would wait a bit
and allow the rest of the app to initialize
The variable `defaultKeyStates` is already preserved in memory
for the life of the application, there's no need to keep it
in hard storage as well

Instead of reading the keys, then merging the defaults and then write
them back, the logic becomes - read the keys (in one go) and
merge with the defaults in cache, skipping the write

This way we've prefilled the cache with some storage data
that is immediately needed during init
@kidroca kidroca requested a review from a team as a code owner July 22, 2021 22:35
@MelvinBot MelvinBot requested review from timszot and removed request for a team July 22, 2021 22:35
Copy link
Contributor Author

@kidroca kidroca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some explanations for the changes

lib/Onyx.js Show resolved Hide resolved
lib/Onyx.js Outdated Show resolved Hide resolved
lib/Onyx.js Show resolved Hide resolved
lib/Onyx.js Show resolved Hide resolved
Copy link
Contributor

@marcaaron marcaaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look fine and make sense. But I'm curious to get your interpretation of the data collected. Does this improve performance in some way?

lib/Onyx.js Outdated Show resolved Hide resolved
lib/Onyx.js Outdated
addAllSafeEvictionKeysToRecentlyAccessedList(),
initializeWithDefaultKeyStates()
])
.then(deferredInit.resolve);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me some time to wrap my head around this but this is a pretty cool trick we are doing with the deferred task stuff. I'm curious, how much better this would be compared to making Onyx.init() itself a promise and calling it before allowing any connections to happen? Does it take a significant amount of time to run this code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it take a significant amount of time to run this code?

Before the change the initial get calls would start at 2ms past app launch and resolve at ~250ms
After the change they would start at 212ms and resolve at 255ms

So even though we delay things by 200ms they resolve at the same time, the speed up comes from the rest of the calls happening and in the end the last storage call is resolved 0.3-0.7 sec earlier

I'm curious, how much better this would be compared to making Onyx.init() itself a promise and calling it before allowing any connections to happen?

Calling Onyx.init before any other Onyx calls would require many changes - every usage that is not a component based withOnyx can happen before Onyx.init is called, depending on script load order
Even if we decide to do so it would not bring anything more to the table in terms of performance

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the speed up comes from the rest of the calls happening

Hmm which calls? Sorry not sure I understand. Could you maybe say it another way?

in the end the last storage call is resolved 0.3-0.7 sec earlier

It looks like iOS experienced almost no changes (and in some cases got worse)? Does that sound right? If so, can you account for the inconsistency there?

Calling Onyx.init before any other Onyx calls would require many changes - every usage that is not a component based withOnyx can happen before Onyx.init is called, depending on script load order

👍

lib/Onyx.js Outdated Show resolved Hide resolved
const deferred = {};
deferred.promise = new Promise((res, rej) => {
deferred.resolve = res;
deferred.reject = rej;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's implement this or remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean the reject functionality?
It should be up to the caller - if they need to reject they should have the option exposed here

All the deferred tasks provide the reject/resolve/promise interface - this is the standard pattern

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be up to the caller - if they need to reject they should have the option exposed here

I get what you are saying but I only see one caller. As a general rule, we either implement code or we remove it.

@kidroca kidroca requested a review from marcaaron July 23, 2021 17:41
@kidroca
Copy link
Contributor Author

kidroca commented Jul 23, 2021

Updated, ready for another review

@marcaaron
Copy link
Contributor

Left a couple more comments. Changes look good and looks like there will be some improvement for Android.

Not a blocker, but I am concerned about the results on iOS and curious why there is inconsistency there.

@timszot feel free to give this a look 🙇

@kidroca
Copy link
Contributor Author

kidroca commented Jul 24, 2021

Not a blocker, but I am concerned about the results on iOS and curious why there is inconsistency there.

If the reason is "Last call ended at" metrics, I find those to depend on network calls and render as well and that they're not a good indication of performance gains/regression - the time there can vary by up to a second for two consecutive runs

Maybe we can identify a point in loading that does not depend on factors as network and use that as a reference

I'll update with the requested changes on Monday

@kidroca kidroca force-pushed the kidroca/perf-default-key-states branch from 7960695 to 9549e40 Compare July 26, 2021 11:44
@kidroca
Copy link
Contributor Author

kidroca commented Jul 26, 2021

Ready: Addressed all requested changes

@marcaaron
Copy link
Contributor

I find those to depend on network calls and render as well and that they're not a good indication of performance gains/regression - the time there can vary by up to a second for two consecutive runs

Ah so the data is not reliable since network calls can interfere with their accuracy? Is the same true for Android?

Maybe we can identify a point in loading that does not depend on factors as network and use that as a reference

Perhaps we can test Onyx in isolation somehow, by creating a sample app or "test" page in our main app for running diagnostics without making API calls? Anyways, I don't doubt that this PR will be helpful and I'm not against merging it so we can move on to the other changes, but feels like the benchmarks could be improved.

@kidroca
Copy link
Contributor Author

kidroca commented Jul 26, 2021

Ah so the data is not reliable since network calls can interfere with their accuracy? Is the same true for Android?

Yes, now when I've inspected the data again, the "Last call ended at" is from Onyx.set and sometimes from Onyx.mergeCollection

Perhaps we can test Onyx in isolation somehow, by creating a sample app or "test" page in our main app for running diagnostics without making API calls?

I've made something similar working on the ticket here: Expensify/App#2667 but it only tests AsyncStorage and it turned out it works very fast in isolation
It might help to have a dedicated test page or app, but IMO the problem in Expensify is due to the many things that happen at the same time, and if we improve the parts involved we should see an improvement in general.
E.g. if we just take the ReportScreen (and it's children) and make a test out of it, it would probably render and work a lot faster, because the rest of the app is not there

My next PR is on Onyx.set and preventing duplicate writes.
This reduces (depending on chat/device) the time spent in writes from 30sec to 10sec, which can be measured by the existing benchmark, besides the 30 to 10 improvement this should give some more freedom to the rest of the processes that happen at that time

As a summary I think it would be best to capture more metrics in Expensify, but only such that give consistent result like the total time spent in Onyx.set this is very repeatable and varies only if you use different chat
Maybe we can hook the benchmark for withOnyx time spent waiting and time spent rendering, this should also provide consistent results given the same chat

@marcaaron
Copy link
Contributor

it only tests AsyncStorage and it turned out it works very fast in isolation

So, it sounds like at least some of Onyx's "waiting time" is time it spends waiting for other JS calls to finish?

Maybe we can hook the benchmark for withOnyx time spent waiting and time spent rendering, this should also provide consistent results given the same chat

Could work, but seems like we also would still have that same problem.

@marcaaron
Copy link
Contributor

the problem in Expensify is due to the many things that happen at the same time, and if we improve the parts involved we should see an improvement in general

I think I'm starting to see this as well. The app tries to do too many things simultaneously and everything ends up waiting for everything else.

@timszot
Copy link
Contributor

timszot commented Jul 27, 2021

@marcaaron looked this over and agree with what's being said in the comments. Not sure what the next steps currently are here though.

@marcaaron
Copy link
Contributor

I think we should leave this PR open until there is a way to prove it has value in one form or another.
We generally don't merge code that adds complexity without value. I think this change has value but it feels more like it's solving a problem we don't have yet. Which is something like:

Some connections start really early before Onyx.init is even called
They should wait as Onyx.init adds some initial values that might be needed

That makes sense but there's no open issue for that problem and I don't see the relationship to performance.

@kidroca Can we get started with the next PR please? And focus on something that helps performance?

@marcaaron marcaaron changed the title Defer initialization and wait for defaultKeyStates [HOLD] Defer initialization and wait for defaultKeyStates Jul 27, 2021
@kidroca
Copy link
Contributor Author

kidroca commented Jul 28, 2021

I think we should leave this PR open until there is a way to prove it has value in one form or another.

What do you mean? Isn't 30sec less time spent in Onyx enough of an evidence?

That makes sense but there's no open issue for that problem and I don't see the relationship to performance.

It's a fix by coincidence
Eviction keys are set during init and are used in the connect logic, if connect happens before init eviction related keys are not set yet so checks like isSafeEvictionKey aren't working correctly

@kidroca Can we get started with the next PR please? And focus on something that helps performance?

How would we prove the value of the remaining PRs when they use the same metric (time spent) - if it's unreliable for this PR surely it would be unreliable for the rest?

@marcaaron
Copy link
Contributor

What do you mean? Isn't 30sec less time spent in Onyx enough of an evidence?

I went off of how long it takes the last Onyx call to finish and seems like there is not much difference (and in some cases taking longer e.g. iOS). So, I presumed this PR would help reduce boot time (when Onyx init is called) or help in some other obvious way. But it looks like Onyx finishes working at or around the same time as before. Can you help explain what are we achieving?

It's a fix by coincidence Eviction keys are set during init and are used in the connect logic, if connect happens before init eviction related keys are not set yet so checks like isSafeEvictionKey aren't working correctly

Right, I think the change makes sense for those reasons (meaning we can merge it with an appropriate linked issue). It might be a perfectly reasonable thing to do, but doesn't really match up with the problem we set out to solve and our top focus which is performance.

How would we prove the value of the remaining PRs when they use the same metric (time spent) - if it's unreliable for this PR surely it would be unreliable for the rest?

To clarify, are you saying that you agree that the metrics are flawed and that we can't prove that any future changes we're making here are valuable? Or did you want to work together to create some better metrics?

@quinthar
Copy link

I like the idea of being guided by @kidroca's overall metric of time spent waiting on Onyx, even though it does not directly correlate to user time. Sorta like how when you use top there's a difference between wall clock and system clock time, "wall clock" is how long the user waits -- but even if the user waits the same amount of time, making "system clock" time go down is helpful.

Granted, "wall clock" (ie, user wait time) is by far the more important, but if we can measurably reduce system time without introducing complexity, then I think we should take those wins as we get them with the belief that efficiency floats all boats in the end.

@marcaaron
Copy link
Contributor

Sorta like how when you use top there's a difference between wall clock and system clock time, "wall clock" is how long the user waits -- but even if the user waits the same amount of time, making "system clock" time go down is helpful.

I understand this theory, but it seems more practical to prioritize "user time" over "system time".
Why would we make the improvements that are difficult to observe first?

@quinthar
Copy link

I understand this theory, but it seems more practical to prioritize "user time" over "system time".
Why would we make the improvements that are difficult to observe first?

I agree we should prioritize user time over system time, but I'm suggesting we want both. However, that assumes there's no real downside to merging this. Is there a downside you are concerned about?

I 100% agree we shouldn't ship anything that has no defined/measurable benefit. But this is a measurable benefit, to system time, just not to user time.

@marcaaron
Copy link
Contributor

Is there a downside you are concerned about?

If the bar for merging changes is "has no downsides" then we'd be making a lot of useless changes that have no downsides. But the answer is no, I think the changes are great and valuable for other reasons besides performance that may not be priorities. But I'm confused about the metrics and what they are showing.

I'm mainly trying to:

  • understand how this will make things go faster
  • get us to prioritize things that will make things go faster

@quinthar
Copy link

quinthar commented Jul 28, 2021 via email

@kidroca
Copy link
Contributor Author

kidroca commented Jul 28, 2021

To clarify, are you saying that you agree that the metrics are flawed and that we can't prove that any future changes we're making here are valuable?

Only the "Last call ended at" is not very reliable, it helps but only if you know what you're doing.
For the init tests it can vary due to to everything that's going on
But if you use it to benchmark chat A to B switch time it works well

The "time spent" metric is reliable, both for total and individual method calls. So I see an improvement here even though it doesn't seem to improve boot time

Or did you want to work together to create some better metrics?

I'm trying different things to capture better metrics. When I find something that works I'll bring it up.

@marcaaron
Copy link
Contributor

I think we are just debating on whether or not "reducing system time" is a benefit if it doesn't also reduce user time

I agree that reducing "system time" is valuable on its own and I think you agree that it it's more valuable to reduce "user time".

if we have a change that "only" reduces system time and doesn't have any downsides, I don't see why we wouldn't take that small win.

The problem I'm experiencing is that I'm not sure how to prioritize things that only reduce system time and don't add complexity.

If this is not the highest value performance improvement we could make right now or its value is not well understood my instinct is to suggest we find something better to work on first. We can take any small wins that have no downsides, but it feels like we are digging in our couch for spare change or something.

@kidroca
Copy link
Contributor Author

kidroca commented Jul 28, 2021

The task I'm working on is to optimize AsyncStorage usage. This won't necessarily fix boot time, but helps with everything in general.

The total time spent was decreased by 1/4.
3 more PRs like this and it will be 0 😁

I've made many experiments and managed to speed up reading - with batching, but even then boot time didn't drop significantly, but things like chat switching are becoming faster.
Onyx or at least data access might not be what's causing the slow down. I'm leaning more and more towards withOnyx like the other research discovered. But in the mean time why not take the improvements we have.

The thing is measuring chat switch is hard to do manually. But since app init is loading a chat from storage, benchmarking it provides data that captures improvements in that direction as well

@marcaaron
Copy link
Contributor

things like chat switching are becoming faster

Now you are speaking my language! Where is that PR? Let's do it!! 😂

The thing is measuring chat switch is hard to do manually

I agree it's hard to do (and also time consuming if you are doing release builds - which I think is the most reliable way). I've spent the entire week doing benchmarks with the production API and release builds on iOS and Android. So, let me know if I can help test anything.

@marcaaron marcaaron changed the title [HOLD] Defer initialization and wait for defaultKeyStates Defer initialization and wait for defaultKeyStates Jul 28, 2021
@marcaaron marcaaron merged commit 61e6888 into Expensify:master Jul 28, 2021
@kidroca
Copy link
Contributor Author

kidroca commented Jul 28, 2021

Have you tried the hash 93c30be with all the changes from this PR: #88

Unfortunately since then "total time" have increased, a lot due to withLocalize, but the hash should give you a glimpse of the end result and chat switch speed

I think all the changes complement each other, but for chat switching the set and cache improvement should be the most valuable

What could speed boot time is persisting the recent access list and loading it with a single get during init (@quinthar had the idea some time ago). I can try it (the changes needed are in one of the following PRs) and report what I've found. But then we'll definitely have to defer init so that we capture the full benefit of pre-filling cache, so we might as well merge the defer logic now

@marcaaron
Copy link
Contributor

Merged this since I don't have any reason to block and agree with the deferred Onyx connections stuff regardless of performance.

I think all the changes complement each other, but for chat switching the set and cache improvement should be the most valuable

Sorry if I'm coming off as skeptical here. I just prefer to have a clear idea which single action we are taking has the greatest impact on performance.

we'll definitely have to defer init so that we capture the full benefit of pre-filling cache

That's true. I'm also curious to do an experiment where we do a deferred init then a big multiGet with ALL of the available data and never clear the cache at all. I'd imagine that would reduce virtually all movement across the bridge. Not sure if that would be a good idea or not but I'd expect massive improvements in chat switching since we will effectively spend 0 time crossing the bridge. If Onyx's biggest bottleneck is AsyncStorage - should we try to make a version of Onyx that doesn't ever interact with AsyncStorage and see how the app runs?

@kidroca
Copy link
Contributor Author

kidroca commented Jul 28, 2021

Yes we can get all keys and then multi get the values. That's pretty easy to test and should review how much the app can improve with just changes in Onyx

That's pretty much how Redux works with local storage

During init the Redux store is created and hydrated with everything from storage
Then when a change happens it's applied to data in memory and scheduled to be persisted in storage

Now with Expensify there are concerns that data might become too much, so a hybrid approach can be made where we hydrate everything but chat messages...

If Onyx's biggest bottleneck is AsyncStorage - should we try to make a version of Onyx that doesn't ever interact with AsyncStorage and see how the app runs?

I think if we read evening at start and then only write to storage the effect would be the same
The bottleneck IMO is caused by withOnyx and this "get all during init" might prove it if even then boot does not improve

@quinthar
Copy link

quinthar commented Jul 28, 2021 via email

@marcaaron
Copy link
Contributor

Yes, we are aiming to build a platform that has shit tons of data stored locally -- let's not limit ourselves to what fits in RAM.

Ah sorry, to clarify, the suggestion was to experimentally do this. Then we can see how the app runs without a dependency on AsyncStorage and deduce how much room for improvement there is.

*/
function initializeWithDefaultKeyStates() {
_.each(defaultKeyStates, (state, key) => merge(key, state));
return AsyncStorage.multiGet(_.keys(defaultKeyStates))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just caught this. But do the "after" metrics in the description include this time at all?

Copy link
Contributor Author

@kidroca kidroca Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because we decorate and track this method here:
https://github.com/kidroca/react-native-onyx/blob/9549e4057abaf2a8f96bad1f255acd1c0b002cd2/lib/Onyx.js#L809
It's tracked under Onyx:defaults and adds time against "total time"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok cool. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants