Add idempotency #6748

Merged
merged 60 commits into from
Jul 15, 2020

mtrezza
Member

@mtrezza mtrezza commented Jun 24, 2020

Adds idempotency enforcement for client requests as new middleware for select routes.

Related to issue: #6744

TODO:

  • add middleware to enforce idempotency rules on routes
  • add Parse Server config for idempotency rules
  • add new system class _Idempotency to schema for tracking client requests with unique and TTL index
  • introduce new Parse Error 159: DUPLICATE_REQUEST (only referenced; has to be added to Parse JS SDK via add idempotency error "duplicate request" Parse-SDK-JS#1189)
  • implement for MongoDB storage adapter (introduce TTL index creation)
  • add test cases
  • add in-code docs
  • add GitHub doc
  • add change log
  • add Parse Server docs (experimental feature)

Potential Future Improvements:

  • add Postgres support (feature is enabled only for MongoDB because Postgres does not seem to have a concept of TTL index(?); anyone with Postgres expertise feel free to pick this up)
  • remove fields _created_at, _updated_at from _Idempotency class to optimize performance

Notes:

  • idempotency is enforced only on POST, PUT routes as GET, DELETE are already idempotent
  • TTL of request history is set to 5 minutes by default (in my experience, the shortest time between duplicate requests I have observed was ~50ms, the longest ~10s)
  • options allow enforcing idempotency only for certain paths, so that, for example, classes with heavy write loads, or classes where enforcement is not required, can be excluded

Configuration example:

let api = new ParseServer({
    idempotencyOptions: {
        paths: [".*"], // enforce for all paths
        ttl: 120,      // keep history of requests for 120s
    }
});

Other path examples:

  • functions/.* = enforce for all functions
  • jobs/.* = enforce for all jobs
  • classes/.* = enforce for all classes
  • users = enforce for user creation and update (because user has custom path)
  • installations = enforce for installation creation and update (because installation has custom path)
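
As a rough illustration of how such path patterns might be evaluated, the sketch below treats each entry in `paths` as a regular expression anchored at the start of the request path and only considers POST and PUT requests. The helper name `shouldEnforceIdempotency`, the stripping of the leading slash, and the anchoring behavior are assumptions made for this example; they are not necessarily what this PR implements.

```javascript
// Hypothetical helper: decides whether a request falls under idempotency enforcement.
// Assumptions: each pattern is a regex matched from the start of the path
// (leading slashes stripped), and only POST/PUT are considered.
function shouldEnforceIdempotency(method, path, options) {
  if (method !== 'POST' && method !== 'PUT') {
    return false; // GET and DELETE are treated as already idempotent
  }
  const normalized = path.replace(/^\/+/, '');
  return options.paths.some((pattern) => {
    const anchored = pattern.startsWith('^') ? pattern : '^' + pattern;
    return new RegExp(anchored).test(normalized);
  });
}

const exampleOptions = { paths: ['functions/.*', 'users'], ttl: 120 };
console.log(shouldEnforceIdempotency('POST', '/functions/sendMessage', exampleOptions)); // true
console.log(shouldEnforceIdempotency('POST', '/classes/Chat', exampleOptions));          // false
console.log(shouldEnforceIdempotency('GET', '/functions/sendMessage', exampleOptions));  // false
console.log(shouldEnforceIdempotency('PUT', '/users/abc123', exampleOptions));           // true
```

With `paths: [".*"]`, as in the configuration example, every POST and PUT path would match under these assumptions.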

@mtrezza mtrezza mentioned this pull request Jun 24, 2020
1 task
@mtrezza
Member Author

mtrezza commented Jul 9, 2020

@dplewis Thanks, that worked.

@mtrezza
Member Author

mtrezza commented Jul 10, 2020

This PR is ready for review. It's rather extensive, but the added docs and test cases provide a good entry point for review I think.

dplewis
dplewis previously approved these changes Jul 10, 2020
Member

@dplewis dplewis left a comment

LGTM! Let's see how this goes. @davimacedo You want to give it a look over?

@dplewis
Member

dplewis commented Jul 13, 2020

@mrmarcsmith If you have time can you take a look at the PR? I saw you worked on
https://github.com/panda-clouds/idempotency

@mtrezza
Member Author

mtrezza commented Jul 13, 2020

@dplewis, that's a good point, I should add a quick comparison for future readers:

Parse Server Idempotency (this PR):

  • Cost per request: 1 write; tries to write key to DB with unique index and if key already exists the write fails
  • Accuracy: Unique index in DB guarantees that duplicate keys are rejected
  • Implementation: Out of the box as middleware in Parse Server
  • Server configuration: Configured as Parse Server option, flexible as one can specify functions, jobs and classes by their path in regex
  • Client configuration: SDKs have option to turn on "request deduplication" to send the necessary header (x-parse-request-id) for request deduplication
  • Performance: Identifying duplicate requests early in the routing process prevents unnecessary code execution
  • Duplicate identifier: Uses the Parse client SDK header (x-parse-request-id)

PCIdempotency:

  • Cost per request: 1 read + 1 write; first checks whether key exists in DB, if not then writes key
  • Accuracy: 2-step approach (read key, then write key) can miss duplicate requests that arrive within milliseconds due to race conditions when reading / writing.
  • Implementation: Separate node module, has to be manually added and maintained
  • Server configuration: Needs to be manually added to each Cloud Code function, job or beforeSave trigger for classes
  • Client configuration: A unique ID has to be added manually to the SDK request by the developer
  • Performance: Identifying duplicate requests later in the actual cloud code function / beforeSave trigger causes more code to be executed until request deduplication
  • Duplicate identifier: Can be set to use any parameter to identify a duplicate request, therefore could be used for specific use cases, maybe related to different concepts of "idempotency"
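
The accuracy difference between the two approaches can be sketched with an in-memory stand-in for the database. This is a simulation under stated assumptions, not code from either project: `FakeCollection`, `readThenWrite`, and `singleWrite` are hypothetical names, and the unique index is modeled as a check-and-insert that happens in one uninterrupted step.

```javascript
// In-memory stand-in for a database collection. Every operation yields once
// to the event loop to mimic a network round trip.
const tick = () => new Promise((resolve) => setImmediate(resolve));

class FakeCollection {
  constructor() {
    this.docs = [];
  }
  async exists(key) {
    await tick();
    return this.docs.includes(key);
  }
  // Plain insert without a unique index: duplicates are accepted.
  async insert(key) {
    await tick();
    this.docs.push(key);
  }
  // Insert guarded by a unique index: check and write happen as one step.
  async insertUnique(key) {
    await tick();
    if (this.docs.includes(key)) {
      throw new Error('duplicate key');
    }
    this.docs.push(key);
  }
}

// Two-step approach: read the key, then write it.
async function readThenWrite(collection, key) {
  if (await collection.exists(key)) return false;
  await collection.insert(key);
  return true; // the request would be processed
}

// Single-step approach: rely on the unique index to reject duplicates.
async function singleWrite(collection, key) {
  try {
    await collection.insertUnique(key);
    return true; // the request would be processed
  } catch (e) {
    return false; // duplicate detected
  }
}

async function demo() {
  const a = new FakeCollection();
  const twoStep = await Promise.all([readThenWrite(a, 'req-1'), readThenWrite(a, 'req-1')]);
  const b = new FakeCollection();
  const oneStep = await Promise.all([singleWrite(b, 'req-1'), singleWrite(b, 'req-1')]);
  return { twoStep, oneStep };
}

demo().then((result) => console.log(result));
// twoStep is [ true, true ]: both concurrent duplicates pass the read check
// oneStep is [ true, false ]: the second write is rejected
```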

@mrmarcsmith
Contributor

@mtrezza As @dplewis pointed out, I authored the PCIdempotency module. I was only able to spend a couple of minutes looking at the code, but this looks like a really well thought out implementation. Here is what I really liked:

  1. Your usage of a Mongo unique index to prevent duplication; I definitely think that's the cleanest approach.
  2. The TTL index is a great way to ensure we don't have to have a background job that goes through and cleans up the old idempotency objects
  3. The requestId is passed as a header instead of a parameter to a cloud function which allows for generic usage

I'm planning on taking a closer look tonight but overall, great job man! I will happily deprecate PCIdempotency in favor of this solution in the module README when this is prod ready.
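
The three points above (duplicate rejection, expiry, and a header-based request ID) can be combined in a small Express-style middleware sketch. This is an in-memory approximation under stated assumptions, not the PR's implementation: `createIdempotencyMiddleware` is a hypothetical name, a `Map` stands in for the `_Idempotency` class with its unique and TTL indexes, and the HTTP status 400 and response body shape are assumed for illustration. Only the header name `x-parse-request-id` and error code 159 come from this thread.

```javascript
// Hypothetical factory for an Express-style middleware.
// `now` is injectable so the expiry logic can be exercised without waiting.
function createIdempotencyMiddleware({ ttlSeconds = 300, now = () => Date.now() } = {}) {
  // Request ID -> expiry timestamp in ms. A real TTL index would also purge
  // expired entries; this sketch only ignores them when they are looked up.
  const seen = new Map();

  return function idempotency(req, res, next) {
    const requestId = req.headers['x-parse-request-id'];
    if (!requestId) {
      return next(); // client did not send a request ID
    }
    const current = now();
    const expiry = seen.get(requestId);
    if (expiry !== undefined && expiry > current) {
      // Duplicate within the TTL window: respond with error 159 (DUPLICATE_REQUEST).
      // Status code and body shape are assumptions of this sketch.
      res.status(400).json({ code: 159, error: 'Duplicate request' });
      return;
    }
    seen.set(requestId, current + ttlSeconds * 1000);
    next();
  };
}
```

In an Express app this could be mounted with `app.use(createIdempotencyMiddleware({ ttlSeconds: 120 }))`; with a database-backed store, the lookup and insert would collapse into a single write against a unique index.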

@mtrezza
Member Author

mtrezza commented Jul 14, 2020

@mrmarcsmith Thanks for your work on PCIdempotency; it provided a good starting point when I researched how to design this solution.

@dplewis
Member

dplewis commented Jul 14, 2020

@mrmarcsmith Do you think we should ignore the duplicate request or throw an error? Every request could throw this error in theory. @mtrezza This is my only concern now that I re-reviewed it.

@mtrezza
Member Author

mtrezza commented Jul 14, 2020

Do you think we should ignore the duplicate request or throw an error?

In my research for a design, the consensus was that one needs to send a response. The whole network chain is waiting for an HTTP response, and not sending one may create more complexity down the line, where people need custom solutions according to their architecture. To prevent this complexity, I found two common approaches:

  • Send the same response for a duplicate request as has been sent for the initial request. This requires caching the response, which opens another can of worms. This could be considered the "ideal" approach, but also the most complex one, and I am not sure it is even favorable in every use case of Parse Server. That could be an improvement down the line, but requires further research. Responding with stale data can create other challenges for the developer.
  • Send an error response. This is something the developer can handle specifically on the client side, according to which request failed. See the example of a chat list. As you mentioned, in case every request throws this error because the initial response by Parse Server did not reach the client, the client would have to handle this accordingly.

I think even if a cached response will be implemented, either approach should be optional in Parse Server, so the developer can choose what best fits their use case.
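
On the client side, handling the error response might look like the following sketch. It is an assumption-laden illustration rather than SDK code: `sendWithRetry` and `createFlakyServer` are hypothetical names, `send` stands in for whatever performs the actual request, and the simulated server always loses the first response. Only error code 159 (DUPLICATE_REQUEST) is taken from this PR.

```javascript
const DUPLICATE_REQUEST = 159; // error code introduced by this PR

// Hypothetical client helper: `send(requestId)` performs the request and either
// resolves with a response or rejects with an error that may carry a `code`.
async function sendWithRetry(send, requestId, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await send(requestId); // same ID on every attempt
      return { status: 'ok', response };
    } catch (err) {
      if (err.code === DUPLICATE_REQUEST) {
        // The server already saw this request ID, so an earlier attempt
        // reached it even though its response did not reach the client.
        return { status: 'already-processed' };
      }
      lastError = err; // for example a timeout; try again
    }
  }
  throw lastError;
}

// Simulated server: records the first request, but its response is "lost".
function createFlakyServer() {
  const seen = new Set();
  return async function send(requestId) {
    if (seen.has(requestId)) {
      const err = new Error('Duplicate request');
      err.code = DUPLICATE_REQUEST;
      throw err;
    }
    seen.add(requestId); // the work happens here
    throw new Error('timeout'); // the response never reaches the client
  };
}

sendWithRetry(createFlakyServer(), 'req-42').then((r) => console.log(r.status));
// prints "already-processed"
```

What the client does after `already-processed` (for example, refetching the affected object) depends on the request, which is the point made above about handling the error per use case.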

@mrmarcsmith
Contributor

@dplewis Of those two options, throwing an error makes sense to me, especially if the first response was lost in cyberspace, which might have been the whole reason the request was sent again. @mtrezza is right, though, that the "best" way to do it is to cache the response and send it again, as long as the parameters are the same (mismatched parameters would be an error).

This is the way Stripe implements their idempotency and it makes the most sense to me. @mtrezza We can brainstorm, but I can't think of many "cans of worms" that this would open if we are only using this for network idempotency, especially since the TTL will ensure the data won't be too stale. It would essentially just allow the client to say "I missed that, could you say that again?" without creating two objects, etc.

As I understand the way we use requests, only one response is honored per http request so no matter which response is actually received the app should behave as expected.

@dplewis
Member

dplewis commented Jul 14, 2020

@mtrezza Can you resolve the conflict?

@mrmarcsmith Can you approve it?

@mtrezza
Member Author

mtrezza commented Jul 14, 2020

@mrmarcsmith The challenges with caching responses I see are:

  • Costs. Even though it is possible to enable idempotency only for specific routes, it makes sense to enable it for all routes. That means every response that is received from the DB is written back to the DB (for POST and PUT). That causes an increase in data traffic costs. In addition, it increases write load on the DB, which is often a performance bottleneck in hosted MongoDB clusters and one of the main reasons to upgrade to higher plans, which often come with significant cost jumps. This is one aspect that may discourage developers from activating the feature, so even if we decide to implement this, it should only be an optional alternative to sending an error response.

  • Stale data. Sending back cached data means the caching mechanism would probably also need a parameter to define what level of staleness is too much, because it may not reflect the current state of data accurately anymore. Then it's back to square one - it should probably return an error.

[response caching] is the way stripe implements their idempotency and it makes the most sense to me.

Stripe can adapt their idempotency strategy specifically to their architecture and cost/benefit evaluation. Parse Server can find itself in various scenarios where sending cached data may or may not be desirable, given the implications mentioned above.

the TTL will ensure the data won't be too stale

Generally yes, but this is not the purpose of the TTL index. The TTL index says "how long do you want me to remember a request for deduplication". The question here is "What level of data staleness is acceptable for your use case?" I can't see introducing response caching without including a separate parameter and mechanism to manage data staleness. I have been running this feature for some weeks in a production system of Parse Server, and I observe that duplicate requests can arrive with a delay of several seconds; in one rare case it was almost 1 min. That is why I set the default TTL of the index to 5 min, but data that is 5 min old may be considered unusable.
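
To make the distinction between the two parameters concrete, here is a purely hypothetical sketch of how a future response cache might decide what to do with a repeated request ID. Nothing here exists in this PR: the function `decideForRequest`, the `maxStalenessSeconds` option, and the record shape are all invented for illustration. The only point is that the deduplication window (`ttlSeconds`) and the acceptable age of a replayed response are separate settings.

```javascript
// Hypothetical decision function for an optional response cache.
// `record` is undefined for an unknown request ID, otherwise
// { receivedAt: <ms timestamp>, response: <cached response> }.
function decideForRequest(record, nowMs, { ttlSeconds, maxStalenessSeconds }) {
  if (!record) {
    return { action: 'process' }; // first time this request ID is seen
  }
  const ageMs = nowMs - record.receivedAt;
  if (ageMs >= ttlSeconds * 1000) {
    return { action: 'process' }; // outside the deduplication window
  }
  if (ageMs <= maxStalenessSeconds * 1000) {
    return { action: 'replay', response: record.response }; // fresh enough to resend
  }
  // Still a known duplicate, but the cached response is considered too stale.
  return { action: 'error', code: 159 };
}

const settings = { ttlSeconds: 300, maxStalenessSeconds: 10 };
const cached = { receivedAt: 0, response: { objectId: 'abc' } };
console.log(decideForRequest(cached, 5000, settings).action);   // "replay"
console.log(decideForRequest(cached, 60000, settings).action);  // "error"
console.log(decideForRequest(cached, 300000, settings).action); // "process"
```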

I'm all for adding a cached response enhancement (in a separate PR, once we have some data points on this experimental feature), but I think it should be optional and not replace the current approach of returning an error.

mrmarcsmith
mrmarcsmith previously approved these changes Jul 14, 2020
Contributor

@mrmarcsmith mrmarcsmith left a comment

LGTM, Thank you @mtrezza for your hard work on this PR!

@mtrezza
Member Author

mtrezza commented Jul 14, 2020

@mrmarcsmith Thanks for the discussion, I think it would be great if we could put our heads together at some point in the future to brainstorm how to go about response caching.

@dplewis
Member

dplewis commented Jul 14, 2020

@mtrezza Can you regenerate the cli definitions to remove the conflict? I can merge it in after.

@mtrezza mtrezza dismissed stale reviews from mrmarcsmith and dplewis via bb2409a July 14, 2020 22:37
@mtrezza
Member Author

mtrezza commented Jul 14, 2020

@dplewis ready for merge, the coverage change must be related to the merged master, because this PR is fully covered.

@dplewis dplewis merged commit 3bd5684 into parse-community:master Jul 15, 2020
@cbaker6 cbaker6 mentioned this pull request Dec 28, 2021
5 tasks
@mtrezza mtrezza deleted the add-idempotency branch March 24, 2022 18:34