Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature/multi_tenancy] Add ifSeqNo and ifPrimaryTerm for update concurrency #2605

Conversation

dbwiddis
Copy link
Member

@dbwiddis dbwiddis commented Jul 2, 2024

Description

Tracks SeqNo and PrimaryTerm used for strong concurrency for updates.

  • Adds to update requests
  • Implements tracking of this version in DynamoDB

Still TODO (@arjunkumargiri will do in another PR):

  • DDB: Initialize SeqNo to 0 for new documents, but increment on overwrite
  • DDB: Increment on successful update
  • DDB: Return -2 on delete

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 2, 2024 19:57 — with GitHub Actions Failure
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 2, 2024 19:57 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 2, 2024 19:57 — with GitHub Actions Failure
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 2, 2024 19:57 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 2, 2024 19:57 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 2, 2024 19:57 — with GitHub Actions Failure
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env July 3, 2024 02:51 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env July 3, 2024 02:51 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env July 3, 2024 02:51 — with GitHub Actions Inactive
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 02:51 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 02:51 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 02:51 — with GitHub Actions Failure
@dbwiddis dbwiddis marked this pull request as ready for review July 3, 2024 03:57
@dbwiddis dbwiddis force-pushed the strong-concurrency branch from 87f8223 to 1cf4c43 Compare July 3, 2024 04:19
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env July 3, 2024 04:19 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env July 3, 2024 04:19 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env July 3, 2024 04:19 — with GitHub Actions Inactive
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 04:19 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 04:19 — with GitHub Actions Failure
Signed-off-by: Daniel Widdis <widdis@gmail.com>
@dbwiddis dbwiddis force-pushed the strong-concurrency branch from 1cf4c43 to aa83319 Compare July 3, 2024 20:36
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 20:37 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 20:37 — with GitHub Actions Failure
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 20:37 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 20:37 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 20:37 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 20:37 — with GitHub Actions Failure
Signed-off-by: Daniel Widdis <widdis@gmail.com>
@dbwiddis dbwiddis force-pushed the strong-concurrency branch from aa83319 to 4cc2070 Compare July 3, 2024 21:54
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env July 3, 2024 21:54 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env July 3, 2024 21:54 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env July 3, 2024 21:54 — with GitHub Actions Inactive
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 21:54 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 21:54 — with GitHub Actions Failure
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env July 3, 2024 21:54 — with GitHub Actions Error
Comment on lines +255 to +258
// TODO need to return SEQ_NO here
// If document doesn't exist, increment and return highest seq no ever seen, but we would have to track seqNo here
// If document never existed, return -2 (unassigned) for seq no (probably what we have to do here)
// If document exists, increment and return SEQ_NO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of delete, why does seq no be updated? can we safely delete document as long as seq no matches?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arjunkumargiri This is somewhat of an artifact of how "soft deletes" work in OpenSearch.

When you call the Delete API, it only marks the document as deleted in the metadata but it doesn't actually remove it until the segment is merged. It's actually still available in search results unless you use the refresh parameter.

So it still tracks the seq_no as a change to the document for replication/tracking purposes until the document actually goes away on a merge, at which point I assume that tracking disappears.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we distinguish between 'If the document doesn't exist' and 'If the document never existed'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhrubo-os @arjunkumargiri

How can we distinguish between 'If the document doesn't exist' and 'If the document never existed'?

We can only do so on OpenSearch for some time period following the deletion.

However, this TODO is in the context of other data stores, and I'm not sure we can, or that we want to.

My intent in writing these comments was to document OpenSearch behavior, specifically:

  1. The seq_no is incremented when a document is deleted
  2. The seq_no is maintained in the segment metadata for at least as long as it takes for all merge and indexing and replication operations to safely complete, as well as remaining part of the translog.
  3. At some point, depending on usage of the cluster, the translog is eventually trimmed, and the reference to the document's seq_no is deleted.

From a practical standpoint, the only difference would be seen if someone created a new document with the same document ID as a previously deleted document, within the time period that the old seq_no remains. In that case the value would be a number higher than 0.

I'd assume that a reasonable implementation for a deletion would be to increment and return the seq_no, but then not track it any more, so the next creation would start again at 0. THat would match current behavior except in the rare edge case of deleting and recreating the same doc-id.

@dhrubo-os
Copy link
Collaborator

May be not a part of this PR, but we need to address this use case. @dbwiddis could you please track this action item in your end? Thanks.

@dbwiddis
Copy link
Member Author

dbwiddis commented Jul 8, 2024

May be not a part of this PR, but we need to address this use case. @dbwiddis could you please track this action item in your end? Thanks.

I'm not clear what the action item is? The code currently throws an exception and returns an error to the user; wouldn't it be exactly the same error? Are you referring to per-tenant keys or something else?

@dhrubo-os
Copy link
Collaborator

May be not a part of this PR, but we need to address this use case. @dbwiddis could you please track this action item in your end? Thanks.

I'm not clear what the action item is? The code currently throws an exception and returns an error to the user; wouldn't it be exactly the same error? Are you referring to per-tenant keys or something else?

So currently Optype is Index by default. But for config key creation, we want OpType.CREATE specifically. From current sdkClient interface we can't achieve this.

@dbwiddis
Copy link
Member Author

dbwiddis commented Jul 9, 2024

we want OpType.CREATE specifically. From current sdkClient interface we can't achieve this.

Got it... I'm not sure we want to use "OpType" generaly. "Upsert" is the standard language here. We probably want to implement conditional writes on the DynamoDB version.

Let's align on a name for it... will track separately.

@dbwiddis dbwiddis mentioned this pull request Jul 9, 2024
5 tasks
@dhrubo-os dhrubo-os merged commit a421f49 into opensearch-project:feature/multi_tenancy Jul 9, 2024
7 of 10 checks passed
@dbwiddis
Copy link
Member Author

May be not a part of this PR, but we need to address this use case. @dbwiddis could you please track this action item in your end? Thanks.

So currently Optype is Index by default. But for config key creation, we want OpType.CREATE specifically. From current sdkClient interface we can't achieve this.

@dhrubo-os @arjunkumargiri #2640

@dbwiddis dbwiddis deleted the strong-concurrency branch July 11, 2024 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants