Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition with clearing Onyx and the SyncQueue #220

Merged
merged 23 commits into from
Jan 12, 2023

Conversation

tgolen
Copy link
Collaborator

@tgolen tgolen commented Jan 3, 2023

The problem is explained in detail in this comment.

Related Issues

Related to Expensify/App#13884

Automated Tests

Mostly handled in existing unit tests. Some new tests being added.

Linked PRs

Will be merged into App with Expensify/App#13886

@yuwenmemon
Copy link
Contributor

@tgolen have you pushed the latest updates to this branch? Was looking into creating the test laid out here: Expensify/App#13884 (comment) but unsure if we're supposed to have any changes in LocalForage or not?

@tgolen
Copy link
Collaborator Author

tgolen commented Jan 5, 2023 via email

@@ -104,4 +104,26 @@ describe('storage/providers/LocalForage', () => {
});
});
});

it('clear', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kidroca - @tgolen asked me to help with implementing a test like what you described here - just want to double-check check I'm capturing this right since I'm a bit new to jest and it's "fake time"...

Copy link
Contributor

@kidroca kidroca Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there.
What you've got right now won't resolve the promise after 1000ms
Also setItemTimedOut should be a function - right now it's just the result of a setTimeout() call - a numeric ID
(Probably the reason for the TypeError: callback is not a function errors)

I think you just need to change the mock like so:

// Given an implementation of setItem that resolves after 1000ms
localForage.setItem = jest.fn(() => new Promise(
  (resolve) =>  setTimeout(resolve, 1000))
);
  1. setItem should return a promise
  2. we make the returned promise resolve after 1000ms - it would only resolve after we advance timers

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah hm... in actuality, we want to measure how many times the promise is resolved then, right? But there's no other way to do that other than having a callback get called when the promise resolves?

So my thought was to do something like this instead:

    it('clear', () => {
        // Use fake timers, so we can manipulate time at our will for this test.
        jest.useFakeTimers();

        // Given a mocked implementation of setItem that calls a callback function after 1000ms
        const callback = jest.fn();
        localforage.setItem = jest.fn(() => new Promise(resolve => setTimeout(resolve, 1000)).then(callback));

        // When we call setItem 5 times, but then call clear after only 1000ms
        for (let i = 0; i < 5; i++) {
            StorageProvider.setItem(`key${i}`, `value${i}`);
        }

        // We should not have called the mocked callback for setItem yet because we have not advanced the timer.
        expect(callback).not.toHaveBeenCalled();

        jest.advanceTimersByTime(1000);
        StorageProvider.clear();
        jest.advanceTimersByTime(4000);

        // The mocked callback for setItem should only have been called once since all other calls were aborted when we called clear()
        expect(callback).toHaveBeenCalledTimes(1);
    });

However I'm seeing that's not quite working as I would expect 🤔

Error: expect(jest.fn()).toHaveBeenCalledTimes(expected)

Expected number of calls: 1
Received number of calls: 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I noticed you had the comment here noting that "fake timers cause promises to hang" - wonder if that might be the same reason for the issue I'm seeing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Btw, just tested this again using real timers and I don't see this problem. So I do have a suspicion this is somehow related to FakeTimers)

Copy link
Contributor

@kidroca kidroca Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah hm... in actuality, we want to measure how many times the promise is resolved then, right?

I guess it might look like that from a certain point of view, but it's not necessary

If we make 5 StorageProvider.setItem calls they should be queued (this is what we test)
This should immediately call localForage.setItem (our mock) 1 time
then wait for the returned promise to resolve in order to move to the next item in the queue
In our test the queue would be emptied so then next 4 calls to localForage.setItem shouldn't happen

Also, I noticed you had the comment here noting that "fake timers cause promises to hang" - wonder if that might be the same reason for the issue I'm seeing?

I'm can't remember what lead me to that conclusion...
Maybe we can try and update jest

This might also need calls to waitForPromisesToResolve

(Btw, just tested this again using real timers and I don't see this problem. So I do have a suspicion this is somehow related to FakeTimers)

I'm not sure how you test this with real timers, but if you don't wait for the timers to expire the test should appear as passing successfully


There's an alternative way to test this without using timers

const task = createDeferredTask();

localforage.setItem = jest.fn()
   . mockReturnValue(Promise.resolve()) // default value
   . mockReturnValueOnce(task.promise) // first call

for (let i = 0; i < 5; i++) {
  StorageProvider.setItem(`key${i}`, `value${i}`);
}

// at this point `localForage.setItem` should be called once, but we control when it resolves 
// and we keep it unresolved 

// this should queue any calls that follow, so we don't expect more than 1 `localForage.setItem` call
StorageProvider.clear();

// we simulate the 1st setItem resolves after we called clear
task.resolve()

return waitForPromisesToResolve().then(() => {
  expect(localForage.setItem).toHaveBeenCalledTimes(1);
})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yes I do see some issues with Jest's fake timers and Promises being reported by others like here and here.

Although I can't quite wrap my head around your alternative way... is there a way to demonstrate the "before" case (i.e. clear() doesn't call abort and thus setItem is called 5 times?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to demonstrate the "before" case (i.e. clear() doesn't call abort and thus setItem is called 5 times?)

Running this test against the pre-PR code should produce exactly that error
expected setItem to be called 1 time but it was called 5 times

I can't quite wrap my head around your alternative way

  1. We create a promise which we programmatically control when to resolve
  2. We configure localforage.setItem to return this promise the first time it's called and to otherwise return resolved promises
  3. We make 5 StorageProvider.setItem calls
  4. This adds 5 items to the queue and starts executing the first localForage.setItem
  5. This simulates the 1st localForage.setItem taking a random time.
  6. The important bit is we call StorageProvider.clear() while the first localForage.setItem isn't completed yet (and the queue is not empty, because the queue would only go to the next item after the 1st completes)
  7. We call StorageProvider.clear() and only then we resolve the first localForage.setItem call
  8. waitForPromisesToResolve() makes jest wait for any promises (even promises returned as the result of a promise) to resolve
    9.If StorageProvider.clear() does not abort the queue, more localForage.setItem would be executed because StorageProvider.setItem works like this
    • StorageProvider.setItem (5th call)
      • Queue -> push -> process ->
        • return promise for 5th call
          • return promise for 4th call
            • return promise for 3th call
              • return promise for nth call

So the promise of the 5th call waits the promise of the 4th call and only then calls localForage.setItem and the promise of the 4th call waits the promise of the 3rd and so on

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, amazing explanation - thanks for taking the time to explain that to me I appreciate it! Okay, let's go with this then since it doesn't require us to update jest in this PR.

Copy link
Contributor

@kidroca kidroca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few notes after examining the changes

lib/SyncQueue.js Outdated
Comment on lines 28 to 30
if (!this.isProcessing || this.queue.length === 0) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unnecessary to make this check

It looks like it might lead to a bug where the queue is not cleared when abort is called
I haven't dived into what causes this.isProcessing to be false, but if it is indeed false and we have some items in the queue - they won't be removed from the queue, because this check would make us exit

Then when the queue resumes processing it will continue with the uncleared queue

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good. I'll go ahead and remove it entirely. I agree with you that it probably doesn't make sense to have it and is pretty useless.

Comment on lines 144 to 145
// When Onyx is cleared, since it uses multiSet() to clear out all the values, the keys will remain with null for all the values
expect(localforageMock.storageMap).toEqual({
test_1: null,
test_2: null,
test_3: null,
});
expect(localforageMock.storageMap).toEqual({});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is no longer correct. Onyx does not use multiSet to clear out all the values, and it seems it's expected that no keys would remain, or at least no keys with null values

const keysToPreserve = _.map(keyValuesToPreserve, keyValueToPreserve => keyValueToPreserve[0]);
return _.filter(allKeys, key => !_.contains(keysToPreserve, key));
})
.then(keysToRemove => _.map(keysToRemove, key => raiseStorageSyncEvent(key)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are all the keys to be removed, why do we raiseStorageSyncEvent for each, before they are even cleared?

Why do we use _.map to iterate a list?
raiseStorageSyncEvent does not return a promise, but even if it did, you'd still need to have something like

.then(keysToRemove => {
  const promises = _.map(keysToRemove, key => raiseStorageSyncEvent(key));
  return Promise.all(promises);
})

I think it would be better to create a CLEAR_ONYX event, similarly to how SYNC_ONYX works

Instead of having this.clear work with keyValuesToPreserve it can work with just the keysToPreserve
This would allow us to raise a CLEAR_ONYX event with keysToPreserve as value
Then before clearing storage

  1. storage would multi get keysToPreserve
  2. then call storage clear
  3. and finally multi set the result of multiget at point 1

This should be simpler and more efficient that iterating all the keys

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use _.map to iterate a list?

You're right, it's not necessary, it's just what we were using before and it only needs to be a loop at a bare minimum.

raiseStorageSyncEvent does not return a promise, but even if it did, you'd still need to have something like

Yes, that's almost exactly what was there before, as you can see in the diff, and it doesn't make any sense considering that raiseStorageSyncEvent doesn't return anything.

I'm not totally following the idea of CLEAR_ONYX, but the rest of what you said gave me some thoughts of a different way I can code this to keep this logic from having to deal with keysToPreserve at all. Let me think on it and I'll send in an update.

Copy link
Collaborator Author

@tgolen tgolen Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we raiseStorageSyncEvent for each, before they are even cleared?

Oh, to answer this... I am not sure. It's what was there before so I was trying to preserve the previous functionality (which I think was all added by yourself 😁 ).

I think it's so that other browser tabs will know that all the keys have been removed. Technically, it could happen either before or after calling clear(). If this logic was removed, then other open tabs wouldn't sign out and they would retain all their existing data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's so that other browser tabs will know that all the keys have been removed. Technically, it could happen either before or after calling clear()

Yeah I was just confused about that part
I thought that because we call it before we clear storage, other tabs might have a chance to re-sync keys from storage and read values before we delete them

So I think it would be safer to first call Storage.clear() and only then raise the storage sync events

I'm not totally following the idea of CLEAR_ONYX

What you've got now seems to be less changes overall, so if it there are no issues it doesn't matter

Comment on lines 84 to 88
clear(keyValuesToPreserve) {
this.setItemQueue.abort();
return localforage.clear()
.then(() => Storage.multiSet(keyValuesToPreserve));
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following my prior comment this can become

    clear(keysToPreserve) {
        this.setItemQueue.abort();
        return this.multiGet(keysToPreserve)
            .then(keyValuesToPreserve =>
                localforage.clear()
                    .then(() => this.multiSet(keyValuesToPreserve))
             )
    },

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I missed this comment earlier, but I pushed a small refactoring which would make this unnecessary. Would you mind taking a fresh look at the current changes to see if it makes more sense? My approach is basically to avoid having the data providers deal with keysToPreserve at all.

@@ -50,26 +50,6 @@ describe('Set data while storage is clearing', () => {
return Onyx.clear();
});

it('should persist the value of Onyx.merge when called between the cache and storage clearing', () => {
Copy link
Collaborator Author

@tgolen tgolen Jan 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neil-marcellini I saw you added these tests, and as far as I can tell, they are no longer necessary because the behavior of clear() has been changed. Can you please take a look at this and make sure that I am not introducing regressions that I'm not aware of?

Basically, the way I read this is that the problem only happened because we were done something like this:

Onyx.clear();
Onyx.set(keyValuesToPreserve);

My PR here changes this behavior so that no one should ever be doing that anymore. It should now become:

Onyx.clear(keysToPreserve);

So that's why I think it's OK to remove these tests since they no longer test a valid use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests make sure that the following flow works for keys with a default key state, like the session.

  1. Sign out the user with Onyx.clear()
  2. Sign in a new user (before Onyx.clear finishes), and merge new data on the session key
  3. Ensure that Onyx.clear doesn't set a default key state on the session key and wipe out the value saved in step 2.

I think it might still be valuable to test calling Onyx.clear() and then Onyx.merge on a key with a default state.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks! My point is that this...

Sign in a new user (before Onyx.clear finishes),

...isn't possible to do. Onyx.clear() takes less that 200ms to complete now on an account with 4000 reports, as well as we are no longer optimistically redirecting to the sign-on page. We are waiting until Onyx.clear() is finished before the redirect happens.

If there is a valid use case of calling clear -> merge, I could see having a test, but I don't know if any time that would ever be a valid case. Do you have a specific one in mind? If that does exist in the code, it should be rewritten like I showed above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Onyx.clear() takes less that 200ms to complete now on an account with 4000 reports,

Ah ok nice. If that's the case for all platforms then I agree that we don't need this test any more 👍.

we are no longer optimistically redirecting to the sign-on page. We are waiting until Onyx.clear() is finished before the redirect happens.

I don't think we are based on the code here. As soon as the session key is cleared we are signed out, and that can happen before Onyx finishes clearing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as the session key is cleared we are signed out, and that can happen before Onyx finishes clearing.

Ah yes, you're technically right. However, the session key is removed at the same time as all other keys now (because it is once again using LocalForage.clear() under the hood rather than using a multiSet() or multiRemove()).

@tgolen tgolen self-assigned this Jan 5, 2023
@tgolen tgolen changed the title Tgolen implement multiremove2 Fix race condition with clearing Onyx and the SyncQueue Jan 5, 2023
@tgolen tgolen marked this pull request as ready for review January 5, 2023 22:16
@tgolen tgolen requested a review from a team as a code owner January 5, 2023 22:16
@melvin-bot melvin-bot bot requested review from yuwenmemon and removed request for a team January 5, 2023 22:16
@tgolen
Copy link
Collaborator Author

tgolen commented Jan 7, 2023 via email

@tgolen tgolen requested a review from kidroca January 9, 2023 19:05
@tgolen
Copy link
Collaborator Author

tgolen commented Jan 9, 2023

OK, I've updated it to switch the order in which the storage event is raised

Copy link
Contributor

@kidroca kidroca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few NAB comments

Comment on lines 37 to 51
this.clear = () => {
let allKeys;

// They keys must be retreived before storage is cleared or else the list of keys would be empty
return Storage.getAllKeys()
.then((keys) => {
allKeys = keys;
})
.then(() => Storage.clear())
.then(() => {
// Now that storage is cleared, the storage sync event can happen so that it is more of an atomic
// action
_.each(allKeys, raiseStorageSyncEvent);
});
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid the closure variable by doing something like

        this.clear = () => {
            return Storage.getAllKeys()
                .then((allKeys) => {
                    return Storage.clear()
                        .then(() => _.each(allKeys, raiseStorageSyncEvent))
                });
        };

Although code has more nesting, it conveys the information you have to explain using comments here

I guess this is opinion based and there's no hard rule on how to style such concepts

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, yeah. I prefer the clarity of my current code than trying to reason about those nested promises. Thanks!

.then(() => Storage.clear())
.then(() => {
// Now that storage is cleared, the storage sync event can happen so that it is more of an atomic
// action
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's only me, but it bothers me that this has to be a 2 line comments just because of 1 word

lib/Onyx.js Outdated
Comment on lines 1053 to 1059
// Get all the values for the keys that need to be preserved. These key/value pairs will be set
// in Onyx after the database is cleared().
const keyValuesToPreserve = _.reduce(keysToPreserve, (finalArray, keyToPreserve) => {
finalArray.push([keyToPreserve, cache.getValue(keyToPreserve)]);
return finalArray;
}, []);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using _.reduce it's simpler to just use _.map

const keyValuesToPreserve = _.map(keysToPreserve, key => [key, cache.getValue(key)]);

We remap every key we want to preserve to a key-value pair, in the end we still end up with the same list of key-value pairs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, makes things much easier to read.

lib/Onyx.js Outdated
// are preserved in storage. This makes sure to always leave storage in a state that contains
// all the default values and any additional values that we want to remain after the database is cleared.
return Storage.clear()
.then(() => Storage.multiSet(defaultKeyValuePairs.concat(keyValuesToPreserve)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use .concat instead of spreading (like we do with props)?

Storage.multiSet([...defaultKeyValuePairs, ...keyValuesToPreserve])

both .concat and spreading result in a new array

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to stay away from using the spread operator on code that needs to be super performant. Babel will transpile to something that's pretty slow. It's probably not really necessary here, but it's the habit I've been in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be true for objects, if some of the keys are destructed and the rest are ...rest, but for Arrays, babel works like this

Code

const arr3 = [...arr1, ...arr2];

Transpiled

var arr3 = [].concat(arr1, arr2);

You can check that out in the live https://babeljs.io/repl though I think babel stopped transpiling the spread operator as it is supported natively by all alive browsers: https://caniuse.com/?search=spread%20operator

I think if we tell webpack/babel to ignore IE11 it would just stop transpiling the spread operator

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that would be great to stop transpiling it altogether. We are not interested in supporting IE11. I didn't know that about arrays. I'll go ahead and switch this to a spread operator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that would be great to stop transpiling it altogether. We are not interested in supporting IE11. I didn't know that about arrays. I'll go ahead and switch this to a spread operator.

Here's a small PR regarding that: #226

Copy link
Contributor

@yuwenmemon yuwenmemon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment about using _.map

@tgolen
Copy link
Collaborator Author

tgolen commented Jan 10, 2023

Updated with _.map, a change to comments, and swapping concat for spread operator 👍

Copy link
Contributor

@yuwenmemon yuwenmemon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally, works like a dream

@tgolen
Copy link
Collaborator Author

tgolen commented Jan 12, 2023

I'm going to go ahead and merge this then. Thanks!

@tgolen tgolen merged commit 4820972 into main Jan 12, 2023
@tgolen tgolen deleted the tgolen-implement-multiremove2 branch January 12, 2023 16:22
@yuwenmemon
Copy link
Contributor

Is there an App PR to update this there?

@tgolen
Copy link
Collaborator Author

tgolen commented Jan 17, 2023

Oops, I linked the wrong PR in the description. The App PR for this is Expensify/App#13886

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants