Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[discussion] Change optimistic/server update ordering #2482

Closed
wants to merge 1 commit into from

Conversation

mattkrick
Copy link

@mattkrick mattkrick commented Jun 27, 2018

tl;dr My app shouldn't break just because someone has slow internet.

Relay's PublishQueue has a race condition built into it.
Let's take an example from a test that was passing, but definitely should not have been:

      const increaseVolumeUpdater = {
        storeUpdater: storeProxy => {
          const amp = storeProxy.get('84872');
          amp.setValue(amp.getValue('volume') + 1, 'volume');
        },
      };

      const setVolumeTo10Updater = storeProxy => {
        const amp = storeProxy.get('84872');
        amp.setValue(10, 'volume');
      };

queue.applyUpdate(increaseVolumeUpdater);
queue.commitUpdate(setVolumeTo10Updater);

As written, one would expect the volume to be increased by 1, and then set to 10.
In the test, it is set to 10 and then increased to 11.
That's pretty awful. I called increase back when the volume was 3, now it's not safe to call. What if the max safe volume is 10? Broken app.

This is because internally, the publish queue runs all committed updates first, then all client updates, then optimistic updates. Here's a list of problems with that:

  • Not linearizable. if 2 mutations go out, the first MUST return before the 2nd or your app can break. see pending optimistic updaters always run after local updaters #2481 for proof.
  • Your server updates don't have access to the current state (any optimistic updates or client updates that were run after it was sent).
  • Your client updates don't have access to optimistic updates, at best causing extra renders, at worst, causing errors. <-- This is how i discovered the error. My app broke when latency increased.
  • Your optimistic updates are run when they shouldn't be, requiring all your safety checks to occur inside the optimistic updater instead of before the mutation is called.

This is the same approach used in operational transformations, databases, and even in the world of redux (https://github.com/mattkrick/redux-optimistic-ui).

fixes #2481

@sibelius sibelius requested review from kassens and JenniferWang and removed request for kassens June 27, 2018 19:51
@mattkrick
Copy link
Author

While this goes through review, I've made it available on npm.

To install:

  1. yarn add @mattkrick/react-relay @mattkrick/relay-runtime
  2. add this to your webpack config's resolve :
alias: {
      'react-relay': '@mattkrick/react-relay',
      'relay-runtime': '@mattkrick/relay-runtime',
    },

I'll remove the packages once this is merged & published.

@@ -148,8 +149,19 @@ class RelayRecordSourceMutator {
}

copyFieldsFromRecord(record: Record, sinkID: DataID): void {
this.copyFields(RelayModernRecord.getDataID(record), sinkID);
Copy link
Author

@mattkrick mattkrick Jul 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calling copyFields is not desirable because as the code is written, the sink gets overwritten with the base, then the mutated sink overwrites itself. This itself is a 2nd bug that is worthy of its own PR, but since this PR depends on fixing the bug, I kept them at 1 PR.

See new test in RelayRecordSourceMutator-test.js.

Copy link
Contributor

@dminkovsky dminkovsky Jul 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may be the same/related issue i'm experiencing here #2489?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like it. those tests should pass in this PR. if they don't let me know.

if (optimisticUpdate) {
const updateIdx = findUpdaterIdx(this._pendingUpdates, optimisticUpdate);
if (updateIdx !== -1) {
this._pendingBackupRebase = true;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that before with arbitrary transform application order, it had to rebase each time a commit happened (from server, source, or local). now, we only rebase if it comes in & is replacing an optimistic update. net win for performance.

@sibelius
Copy link
Contributor

sibelius commented Jul 9, 2018

@kassens does this makes sense?

response,
} = optimisticUpdate;
const selectorStore = store.commitPayload(operation, response);
// TODO: Fix commitPayload so we don't have to run normalize twice
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up logic so no more double normalizing. performance win!

@@ -1334,11 +1348,14 @@ describe('RelayPublishQueue', () => {
);
// Query payload sets name to 'zuck'
queue.commitPayload(createOperationSelector(NameQuery, {id: '4'}), {
me: {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this wasn't even a valid source. createOperationSelector supplied id: 4 but all that was in the store was me.
now you can't notify the store with garbage payloads 😄

@mattkrick
Copy link
Author

mattkrick commented Aug 7, 2018

@kassens any time for a review? Without this PR, it's unsafe to commit a local update while there is a pending server update. I think that's a pretty great feature.

@mattkrick
Copy link
Author

@kassens @sibelius @JenniferWang any plans on merging this?

@mattkrick
Copy link
Author

@josephsavona anything I can do to help get this merged?
general concept: instead of committing server updates, then local updates, then optimistic updates; simply apply updates as they're received. When server updates arrive, they take the place in line of their respective optimistic updates, which eliminates the race condition between server updates & local/optimistic updates.

aside from removing the race condition, it greatly reduces the number of rebases & cuts optimistic normalization calls by 50%. Wins all around!

@josephsavona
Copy link
Contributor

josephsavona commented Feb 1, 2019

@mattkrick Thanks for flagging this. Commit/update ordering is a complex issue and it's important to note that the system is working as designed[1]. There are a lot of tradeoffs involved and there isn't a single objectively "correct" approach. With that in mind there are two main considerations: what to do with the specific changes proposed in this PR and what (if any) changes to make around commit/update ordering in general.

As for this PR, the changes proposed here constitute a non-trivial breaking change that would likely require a careful staged rollout. As such we can't move forward directly with landing the PR: even if we decide that this is the exact right direction, we would have to figure out a way to incrementally implement the change without breaking Relay apps.

As for next steps:

  • I'll raise this with the team and we'll investigate this topic more. We can consider, for example, whether to change the default behavior along the lines proposed here or whether this should be a user-space option.
  • If you're interested, it could be helpful for you or other interested community members to implement the desired behavior in user-space by re-implementing the Environment interface via composition of existing relay-runtime primitives (including possibly composing the default Environment). That would allow users to opt-in to the new behavior without forking Relay, gather feedback about it, etc while the team considers this topic.

[1] caveat that #2489 is related and needs more investigation

EDIT: I just want to be clear that we appreciate you taking the time to investigate and raise this issue. It's definitely worth further investigation!

@mattkrick
Copy link
Author

As a first step, would you be open to accepting a PR that only includes the changes to RelayModernRecord.js and RelayRecordSourceMutator.js? I'd love to reach consensus that those 2 changes fix existing bugs & do not introduce any breaking changes. Plus, if you check out the new test case that those fix you'll see the root cause of #2489.

As for breaking changes, the only one I can think of is that an updater could get called twice if there were an existing pending server update. That'd only be a problem if folks had side effects (eg popping a toast) in their updater functions.

If you can think of any other breaking changes please let me know! I want this to be a smooth transition for all. I know you said that ordering is subjective, but I really hope the example mentioned in the first comment might sway your mind. Lamport's happened-before relation is a staple in just about every distributed system & it'd be great to see it in Relay!

@josephsavona josephsavona changed the title Fix broken linearizability Change optimistic/server update ordering Feb 1, 2019
@josephsavona josephsavona changed the title Change optimistic/server update ordering [RFC] Change optimistic/server update ordering Feb 1, 2019
@josephsavona josephsavona changed the title [RFC] Change optimistic/server update ordering [discussion] Change optimistic/server update ordering Feb 1, 2019
@josephsavona
Copy link
Contributor

As a first step, would you be open to accepting a PR that only includes the changes to RelayModernRecord.js and RelayRecordSourceMutator.js?

Thanks for offering! However there's a simple fix for #2489 so I'm just going to import that PR and apply the fix there for simplicity. One note about the RelayModernRecord changes - it's invalid to update __typename when copying fields, that field is intentionally being skipped.

As for breaking changes, the only one I can think of is that an updater could get called twice if there were an existing pending server update.

By breaking change, I'm referring to the fact the current behavior is by design and existing Relay apps may depend on it for correctness. Although the change may "fix" some use-cases, it could easily break others. As an example, in your first comment the author could just as easily be implementing a UI where there is a base volume in addition to a relative volume adjustment control (i'm thinking of cameras where you can dial exposure up/down relative to the camera's automatic metering). The first applyUpdate could be used to implement the relative adjustment. In that case the app would break under the change proposed here. (There's a separate discussion about if that code would be the best way to implement the intent i'm describing, but the point stands that it's possible someone could write code with that intent)

@mattkrick
Copy link
Author

it's invalid to update __typename when copying fields, that field is intentionally being skipped.

I'm really interested in this one. Consider a subscription that returns a Union type. Receiving 2 subscription payloads that are 2 different concrete types would result in the same auto-generated storageKey. How can relay maintain entity integrity?

The first applyUpdate could be used to implement the relative adjustment. In that case the app would break under the change proposed here.

If you don't mind, I'd like to prove that the the code today offers no guaranteed order. If you agree, then I hope we can agree there's no breaking change 🙂

Consider a counter with the value of 1:

setTimeout(() => commitLocalUpdate(environment, (store) => {
  const counter = store.get('viewer').getValue('counter')
  const nextValue = counter === 1 ? counter * 100 : counter + 1
  store.get('viewer').setValue(nextValue, 'counter')
}), 1000)

commitMutation(environment, {
  updater: (store) => {
    const counter = store.get('viewer').getValue('counter')
    const nextValue = counter === 1 ? counter + 1 : counter * 100
    store.get('viewer').setValue(nextValue, 'counter')
  }
})

If the server responds in less than 1 second, then the final result is 3. If the server response takes more than 1 second, the result is 10000.

Assuming no one is calling commitUpdate directly, the only apps that could break are the ones that are already broken due to built-in race conditions. At least that's my belief. If I'm wrong lemme know!

@josephsavona
Copy link
Contributor

josephsavona commented Feb 1, 2019

If you don't mind, I'd like to prove that the the code today offers no guaranteed order. If you agree, then I hope we can agree there's no breaking change

The current implementation applies updates in the order they are received from the network by design. This reflects the fact that typical applications are not implemented to provide guarantees that the server will reflect the ordering in which actions were first initiated on the client. Relay therefore leaves the ordering up to the user - for example, a network layer could choose from a variety of strategies such as processing operations serially or executing in parallel but resolving them in execution order. Relay isn't dictating the ordering, it's respecting the application's ordering.

The end result for a typical setup - where the network implementation processes requests independently - is that later server responses overwrite earlier responses. This is a reasonable heuristic that assumes more recent responses are more likely to correspond to the current server state than older responses. But again - the choice of heuristic is really in the user-defined network layer.

Therefore this is a breaking change in the sense that it chooses to stop respecting the network ordering and instead change to an imposed, serial ordering. Even for typical network implementations that don't guarantee an ordering, network response times generally follow a distribution as opposed to being purely random, this is absolutely an observable change (if only probabilistically so).

Again, we definitely agree that this topic is worthy of further exploration. But I hope I've clarified that this is a breaking change and that we have to proceed carefully with any changes. As I mentioned earlier, we'd definitely be interested to see exploration around this in user-space: an implementation of the Environment interface that behaves as you've described here, or something that can compose a Network and add serial processing and/or resolution serialization.

@mattkrick
Copy link
Author

mattkrick commented Feb 2, 2019

Relay isn't dictating the ordering, it's respecting the application's ordering.

I think we found a good disagreement! I have a local update that needs to execute after an optimistic update. How can I accomplish this? (with the state being the output of the optimistic update)

@josephsavona
Copy link
Contributor

I think we found a good disagreement

@mattkrick Your language in this discussion (here and in #2481) repeatedly focuses on actively trying to find disagreements and proving that you're right, while not acknowledging the counterpoints raised by others. In case I wasn't clear, we agree that this topic is worthy of more exploration precisely because there are behaviors that are difficult to achieve in Relay today (e.g. linear ordering of updates) because of the tradeoffs that it makes. The only disagreement I see is that we cannot proceed with this PR as-is since it constitutes a breaking change (at least I assume it's a disagreement since you haven't actually responded to my comments around this).

I have a local update that needs to execute after an optimistic update. How can I accomplish this? (with the state being the output of the optimistic update)

Yup, this is one of two examples you've brought up that are difficult to achieve (technically speaking you can use lookup() to read the optimistic state inside your local updater - so it's possible but non-obvious). I'd encourage you to explore how to achieve these behaviors in user-space using the existing primitives - either it can be done and users can opt-in to using an extension, or it can't be done and we should consider changes to make it possible. Again, I believe this can all be achieved by some combination of composing the existing Environment, composing a Network to force linear resolution, etc. The one thing I can imagine is that it may be easiest to achieve this by reusing RelayModernEnvironment but passing a custom PublishQueue instance - in that case, we could consider making PublishQueue an argument (again, it would help to see some exploration here).

@mattkrick
Copy link
Author

hey, sorry if I came across as argumentative, that certainly wasn't my intent.
My goal is seeking to understand. Resolving disagreements helps achieve understanding & finding them is wonderful because it's the first step. If you say Relay respects the application's ordering & I disagree, then I'd like to find out what you know so we can find agreement. If we agree that Relay doesn't respect the application ordering, we can move to the next step of asking if it should & again checking consensus. Being right for the sake of being right doesn't do me much good if it means this PR doesn't land. If I can do better, please let me know.

a user-defined publish queue would be great! Unfortunately, this PR is really 4 parts:

  • custom publish queue
  • custom environment (pass in optimistic response to couple with server response)
  • custom copyFieldsFromRecord (so base record doesn't overwrite sink)
  • custom copyFields (to guarantee entity integrity for union types)

I agree putting this in user space without a fork would be great, but fixing the last 2 items would require rewriting the imports for all of their dependents, which would touch the majority of the package.

@josephsavona
Copy link
Contributor

I guess I just don't see any disagreements :-) I'm just trying to explain that the behaviors you're pointing out are by design -and the reasoning - for your and others context. We agree that it would be interesting to explore alternatives!

a user-defined publish queue would be great! Unfortunately, this PR is really 4 parts:

This type of concrete list is really helpful - I agree that the first two things can hopefully be explored in user-space. If there is anything that prevents you/others from doing so, the next step would be to file an issue about the specific issue so that we can discuss remediations (and then move on to a PR if we agree on the approach).

custom copyFieldsFromRecord (so base record doesn't overwrite sink)

This is orthogonal to the core issue being discussed, but regardless I included the fix in 55ce137.

custom copyFields (to guarantee entity integrity for union types)

You're right that __typename should be updated in this case - I misunderstood when replying about this earlier. We'd be happy to review a PR that addresses just this specific issue.

@mattkrick
Copy link
Author

mattkrick commented Feb 5, 2019

Awesome! Thanks for understanding 🙂 That commit solved part 3 & PR #2637 is up to solve part 4.

I've updated the PR to solve the remaining 2 issues. Let me know if this works.

@mattkrick mattkrick force-pushed the fix-pub-queue-master branch from b0afd37 to 35f4deb Compare March 6, 2019 03:38
@mattkrick
Copy link
Author

mattkrick commented Mar 7, 2019

NOTE: requires #2637 (excluded that in latest rebase to master)

@mattkrick
Copy link
Author

friendly bump, pretty tiny PR for a swappable publish queue

@mattkrick
Copy link
Author

closing in favor of #2791, please see new PR

@mattkrick mattkrick closed this Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pending optimistic updaters always run after local updaters
5 participants