-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition in IdealStateGroupCommit #14237
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14237 +/- ##
============================================
+ Coverage 61.75% 63.83% +2.08%
- Complexity 207 1535 +1328
============================================
Files 2436 2623 +187
Lines 133233 144452 +11219
Branches 20636 22108 +1472
============================================
+ Hits 82274 92211 +9937
- Misses 44911 45432 +521
- Partials 6048 6809 +761
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
0772b24
to
b7a6b40
Compare
IdealState response = updateIdealState(helixManager, mergedResourceName, idealState -> { | ||
IdealState updatedIdealState = first._updater.apply(idealState); | ||
first._updatedIdealState = updatedIdealState; | ||
first._exception = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be made to never be non-null in this line. For example when picking first we might want to skip (or remove) any Entries with exceptions (if this case is even possible)?
Namely I think we should avoid some case where the requesting thread gets an exception but some other thread still picks up the Entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is retriable exception and non-retriable. E.g. Update IS might be timed out.
You can refer to HelixHelperTest
.
Here we rely on the RetryPolicy to fail eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha, yes that makes sense :) This is what I get for reviewing under-caffeinated. Thanks!
throw ex; | ||
} | ||
} catch (Throwable e) { | ||
// If the update failed, we should re-add all entries to the queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of the intent here with this comment? It doesn't seem to re-add these to the queue (and I feel it shouldn't)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, let me fix the comments
IdealState finalUpdatedIdealState = updatedIdealState; | ||
updateIdealState(helixManager, resourceName, anyIdealState -> finalUpdatedIdealState, | ||
retryPolicy, noChangeOk); | ||
throw e; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should only throw if the mergedResourceName matches the one originally requested perhaps?
Otherwise some "bad" znode update would cause totally unrelated failures.
Plus the "owning" thread of the future Entry
would have returned a failure but the Entry
might eventually still be processed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the exception will directly throw, here we just set exception for all the batch entries. So other threads waiting for these entries won't be blocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think directly throwing was wrong previously too.
The current thread may be initially processing some other resource -- my suggestion is to just remove this throw e
and let the return/raise at the bottom handle this instead if it is appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the resourceName is always the same.
In the catch block, for all the entries
inside the processed
ArrayList, they are all belonging to the same resourceName.
The first
entry is held all the way from the external to internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm so I mean to say --
The caller's interface is this --
public IdealState commit(HelixManager helixManager, String resourceName, Function<IdealState, IdealState> updater, RetryPolicy retryPolicy, boolean noChangeOk);
Since we hash out of 100 buckets, there exists some chance (though quite low) that we may need to process some other resource before the Entry we are interested in.
So the caller might be interested in resourceName <- A
but mergedResourceName <- B
. So an exceptioni raised while trying to perform B
should not be thrown to A
since it is not related to the callers intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that why the method first get first
entry to extract mergedResourceName
then in the iterator loop, it will skip entries with other resource name by:
if (!ent._resourceName.equals(mergedResourceName)) {
continue;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so imagine your queue slot looks like [A]
Some call to commit(_, B, _)
is made -- [A, B]
.
Since nothing is running, our thread drives the execution, starting with A
.
Now, consider if A
fails and raises an exception.
The queue will look like: [B]
, but our thread which "owns" the Entry would have already failed.
Worse, some other thread will come and execute our commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense, changed the logic to only operate on its own resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add logging and even metrics)on batch commit, if we don't already have it:
- How many IS updates in the batch?
- Time for update
- Retry count
Yes, metrics are here: https://github.com/apache/pinot/blob/master/pinot-common/src/main/java/org/apache/pinot/common/utils/helix/IdealStateGroupCommit.java#L294 |
b7a6b40
to
f5f9f5c
Compare
6b98d1b
to
d3233ba
Compare
pinot-common/src/main/java/org/apache/pinot/common/utils/helix/IdealStateGroupCommit.java
Outdated
Show resolved
Hide resolved
Entry ent = it.next(); | ||
if (!ent._resourceName.equals(mergedResourceName)) { | ||
continue; | ||
IdealState response = updateIdealState(helixManager, resourceName, idealState -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern with handling only the resource for the current thread is that it breaks the fifo behavior of the queue. So I wonder if some requests would be unfairly starved if they get unlucky?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically the fifo commiter should handle the corresponding entry as well. From the caller perspective, this call is still sync. But you are also right, if unluck then the external caller might be timeout or drop the requests.
d3233ba
to
4c46620
Compare
4c46620
to
4643467
Compare
@xiangfu0 : this might be causing UTs to fail like in this PR: #14251 (the PR was forked from 76b219b) I see similar issue with this PR: #14249. I am unable to figure out the exact root-cause for the test failure because the unit-test logs are only partially loading.. when they loaded I was seeing a ton of IdealStateCommit related logs. |
This fix is inspired by #14214
Handles the failure scenario gracefully.
Test
Enhance the test to update 2000 times for each of the 20 table IdealStates.
All updates are sent from 100 processors to test the race condition for no loss of the events.