Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in IdealStateGroupCommit #14237

Merged
merged 1 commit into from
Oct 16, 2024

Conversation

xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Oct 15, 2024

This fix is inspired by #14214

Handles the failure scenario gracefully.

Test

Enhance the test to update 2000 times for each of the 20 table IdealStates.
All updates are sent from 100 processors to test the race condition for no loss of the events.

@codecov-commenter
Copy link

codecov-commenter commented Oct 15, 2024

Codecov Report

Attention: Patch coverage is 61.29032% with 12 lines in your changes missing coverage. Please review.

Project coverage is 63.83%. Comparing base (59551e4) to head (4643467).
Report is 1188 commits behind head on master.

Files with missing lines Patch % Lines
...inot/common/utils/helix/IdealStateGroupCommit.java 61.29% 9 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14237      +/-   ##
============================================
+ Coverage     61.75%   63.83%   +2.08%     
- Complexity      207     1535    +1328     
============================================
  Files          2436     2623     +187     
  Lines        133233   144452   +11219     
  Branches      20636    22108    +1472     
============================================
+ Hits          82274    92211    +9937     
- Misses        44911    45432     +521     
- Partials       6048     6809     +761     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.79% <61.29%> (+2.08%) ⬆️
java-21 63.73% <61.29%> (+2.10%) ⬆️
skip-bytebuffers-false 63.83% <61.29%> (+2.08%) ⬆️
skip-bytebuffers-true 63.68% <61.29%> (+35.95%) ⬆️
temurin 63.83% <61.29%> (+2.08%) ⬆️
unittests 63.83% <61.29%> (+2.08%) ⬆️
unittests1 55.52% <0.00%> (+8.63%) ⬆️
unittests2 34.35% <61.29%> (+6.62%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@xiangfu0 xiangfu0 force-pushed the fix-ideal-state-group-commit branch 5 times, most recently from 0772b24 to b7a6b40 Compare October 15, 2024 16:30
IdealState response = updateIdealState(helixManager, mergedResourceName, idealState -> {
IdealState updatedIdealState = first._updater.apply(idealState);
first._updatedIdealState = updatedIdealState;
first._exception = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be made to never be non-null in this line. For example when picking first we might want to skip (or remove) any Entries with exceptions (if this case is even possible)?

Namely I think we should avoid some case where the requesting thread gets an exception but some other thread still picks up the Entry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is retriable exception and non-retriable. E.g. Update IS might be timed out.

You can refer to HelixHelperTest.

Here we rely on the RetryPolicy to fail eventually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah gotcha, yes that makes sense :) This is what I get for reviewing under-caffeinated. Thanks!

throw ex;
}
} catch (Throwable e) {
// If the update failed, we should re-add all entries to the queue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the intent here with this comment? It doesn't seem to re-add these to the queue (and I feel it shouldn't)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, let me fix the comments

IdealState finalUpdatedIdealState = updatedIdealState;
updateIdealState(helixManager, resourceName, anyIdealState -> finalUpdatedIdealState,
retryPolicy, noChangeOk);
throw e;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should only throw if the mergedResourceName matches the one originally requested perhaps?
Otherwise some "bad" znode update would cause totally unrelated failures.

Plus the "owning" thread of the future Entry would have returned a failure but the Entry might eventually still be processed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the exception will directly throw, here we just set exception for all the batch entries. So other threads waiting for these entries won't be blocking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think directly throwing was wrong previously too.

The current thread may be initially processing some other resource -- my suggestion is to just remove this throw e and let the return/raise at the bottom handle this instead if it is appropriate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the resourceName is always the same.

In the catch block, for all the entries inside the processed ArrayList, they are all belonging to the same resourceName.

The first entry is held all the way from the external to internal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm so I mean to say --

The caller's interface is this --

public IdealState commit(HelixManager helixManager, String resourceName, Function<IdealState, IdealState> updater, RetryPolicy retryPolicy, boolean noChangeOk);

Since we hash out of 100 buckets, there exists some chance (though quite low) that we may need to process some other resource before the Entry we are interested in.

So the caller might be interested in resourceName <- A but mergedResourceName <- B. So an exceptioni raised while trying to perform B should not be thrown to A since it is not related to the callers intent.

Copy link
Contributor Author

@xiangfu0 xiangfu0 Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that why the method first get first entry to extract mergedResourceName then in the iterator loop, it will skip entries with other resource name by:

if (!ent._resourceName.equals(mergedResourceName)) {
    continue;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit call may process other resource, but because of the while loop here:
image

It will eventually process the entry it pushes, or the entry got processed by other threads already, then this commit thread could just read the result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so imagine your queue slot looks like [A]

Some call to commit(_, B, _) is made -- [A, B].
Since nothing is running, our thread drives the execution, starting with A.

Now, consider if A fails and raises an exception.
The queue will look like: [B], but our thread which "owns" the Entry would have already failed.
Worse, some other thread will come and execute our commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, changed the logic to only operate on its own resource.

Copy link
Contributor

@mayankshriv mayankshriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add logging and even metrics)on batch commit, if we don't already have it:

  • How many IS updates in the batch?
  • Time for update
  • Retry count

@xiangfu0
Copy link
Contributor Author

It would be good to add logging and even metrics)on batch commit, if we don't already have it:

  • How many IS updates in the batch?
  • Time for update
  • Retry count

Yes, metrics are here: https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/helix/IdealStateGroupCommit.java#L294

@xiangfu0 xiangfu0 force-pushed the fix-ideal-state-group-commit branch from b7a6b40 to f5f9f5c Compare October 15, 2024 17:44
@xiangfu0 xiangfu0 force-pushed the fix-ideal-state-group-commit branch 2 times, most recently from 6b98d1b to d3233ba Compare October 16, 2024 01:03
Entry ent = it.next();
if (!ent._resourceName.equals(mergedResourceName)) {
continue;
IdealState response = updateIdealState(helixManager, resourceName, idealState -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern with handling only the resource for the current thread is that it breaks the fifo behavior of the queue. So I wonder if some requests would be unfairly starved if they get unlucky?

Copy link
Contributor Author

@xiangfu0 xiangfu0 Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically the fifo commiter should handle the corresponding entry as well. From the caller perspective, this call is still sync. But you are also right, if unluck then the external caller might be timeout or drop the requests.

@xiangfu0 xiangfu0 force-pushed the fix-ideal-state-group-commit branch from d3233ba to 4c46620 Compare October 16, 2024 01:49
@xiangfu0 xiangfu0 force-pushed the fix-ideal-state-group-commit branch from 4c46620 to 4643467 Compare October 16, 2024 02:08
@xiangfu0 xiangfu0 merged commit 6061b89 into apache:master Oct 16, 2024
21 checks passed
@xiangfu0 xiangfu0 deleted the fix-ideal-state-group-commit branch October 16, 2024 12:25
@ankitsultana
Copy link
Contributor

@xiangfu0 : this might be causing UTs to fail like in this PR: #14251 (the PR was forked from 76b219b)

I see similar issue with this PR: #14249.

I am unable to figure out the exact root-cause for the test failure because the unit-test logs are only partially loading.. when they loaded I was seeing a ton of IdealStateCommit related logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants