HBASE-27487: fix slow meta pathological feedback loop with multigets #4900
Overall this looks good, assuming pre-commit checks out. It looks like we have a checkstyle warning to fix (a long line in the exception string), plus just a few other small things.
Worth noting that the failAll pattern was modeled after how AsyncTable handles this, though we had to use exceptions to control the flow because of all the abstractions in the sync Table.
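As a rough illustration of the failAll idea, here is a minimal sketch in which a deadline check deep in the call path throws, and a single catch at the top marks every action that has not completed as failed instead of retrying. Everything here is hypothetical: `FailAllSketch`, `TimeoutExceeded` (standing in for OperationTimeoutExceededException), `runBatch`, `failAll` and the injected clock are illustrative names, not HBase APIs, and the real flow through AsyncRequestFutureImpl is considerably more involved.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

final class FailAllSketch {
  // Hypothetical stand-in for OperationTimeoutExceededException.
  static final class TimeoutExceeded extends Exception {
    TimeoutExceeded(String message) {
      super(message);
    }
  }

  // Hypothetical per-action outcome.
  enum Outcome { SUCCESS, FAILED_TIMEOUT }

  /**
   * Runs each action in order, checking the deadline first. A timeout deep in
   * the call path unwinds via an exception to one catch block, which fails
   * every action that has not completed instead of retrying.
   */
  static Map<String, Outcome> runBatch(List<String> rows, long deadline, LongSupplier clock) {
    Map<String, Outcome> results = new LinkedHashMap<>();
    try {
      for (String row : rows) {
        runAction(row, deadline, clock);
        results.put(row, Outcome.SUCCESS);
      }
    } catch (TimeoutExceeded e) {
      failAll(rows, results);
    }
    return results;
  }

  // Placeholder for the real per-action work; only the deadline check is modeled.
  private static void runAction(String row, long deadline, LongSupplier clock)
      throws TimeoutExceeded {
    long remaining = deadline - clock.getAsLong();
    if (remaining <= 0) {
      throw new TimeoutExceeded("timeout exceeded before " + row + " by " + (-remaining));
    }
  }

  // Marks every action without a result as failed; completed ones keep their result.
  private static void failAll(List<String> rows, Map<String, Outcome> results) {
    for (String row : rows) {
      results.putIfAbsent(row, Outcome.FAILED_TIMEOUT);
    }
  }
}
```

With a fake clock that advances 10 units per check and a deadline of 25, rows a, b and c succeed and d is marked FAILED_TIMEOUT; with a deadline of 0, every row is marked failed.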
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java
hbase-client/src/main/java/org/apache/hadoop/hbase/client/CancellableRegionServerCallable.java
...e-client/src/main/java/org/apache/hadoop/hbase/client/OperationTimeoutExceededException.java
reformat comments & add additional context to OperationTimeoutExceededException message
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestClientOperationTimeout.java
…tigets (#4900) Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org> Signed-off-by: Duo Zhang <zhangduo@apache.org>
…with multigets (apache#4900)" This reverts commit 7c999c1.
…tigets (apache#4900) Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org> Signed-off-by: Duo Zhang <zhangduo@apache.org> (cherry picked from commit fea54b6) Change-Id: I677331205f56b97c677398452349f347578b79c1
This only affects the Table implementation in 2.x releases.
This change alters the exception thrown, and the failure response, when a multiget exceeds its operation timeout, so that we no longer clear the meta cache and create a feedback loop that is impossible to recover from. We skip the cache clear and simply mark each get as failed.
If meta is overloaded, or you send any sufficiently large batch of actions, resolving the HRegionLocations (which happens sequentially) may take a while. Depending on the operation timeout configured for the client, that duration may already exceed the timeout before the request even reaches CancellableRegionServerCallable.call(). When the timeout is exceeded there, a DoNotRetryIOException is thrown. This is treated as a cache-clearing exception, so any locations that were slowly resolved earlier in the chain are thrown away. With enough concurrency, this can create a feedback loop that is impossible to recover from: each retry has to resolve every location from meta again, which adds load to meta and makes the next timeout even more likely.
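The loop and the fix can be modeled in a few lines. This is a simplified sketch, assuming a client-side location cache that is cleared wholesale on a cache-clearing exception (the real client clears entries more selectively). `MetaCacheSketch`, `DoNotRetryException`, `TimeoutExceededException`, `submit` and `lookupCostMs` are hypothetical names standing in for the HBase classes mentioned above. Each uncached row costs a simulated meta lookup; if those lookups use up the timeout before the call, the call throws. Without the fix the exception clears the cache, so every retry pays for all lookups again; with the fix the timeout exception is excluded from cache clearing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MetaCacheSketch {
  // Hypothetical stand-in for DoNotRetryIOException (treated as cache-clearing).
  static class DoNotRetryException extends Exception {
    DoNotRetryException(String message) {
      super(message);
    }
  }

  // Hypothetical stand-in for OperationTimeoutExceededException.
  static final class TimeoutExceededException extends DoNotRetryException {
    TimeoutExceededException(String message) {
      super(message);
    }
  }

  // Hypothetical client-side location cache: row -> server name.
  final Map<String, String> locationCache = new HashMap<>();
  private final boolean fixApplied;

  MetaCacheSketch(boolean fixApplied) {
    this.fixApplied = fixApplied;
  }

  /** Returns true if the batch reaches the server call within the timeout. */
  boolean submit(List<String> rows, long timeoutMs, long lookupCostMs) {
    long elapsedMs = 0;
    // Locations are resolved one at a time; each cache miss costs a simulated meta lookup.
    for (String row : rows) {
      if (!locationCache.containsKey(row)) {
        elapsedMs += lookupCostMs;
        locationCache.put(row, "server-for-" + row);
      }
    }
    try {
      call(timeoutMs - elapsedMs);
      return true;
    } catch (DoNotRetryException e) {
      if (isCacheClearing(e)) {
        locationCache.clear(); // throws away every location resolved above
      }
      return false; // the batch fails either way; the caller may try again
    }
  }

  // Simplified model of the remaining-time check in CancellableRegionServerCallable.call().
  private void call(long remainingMs) throws DoNotRetryException {
    if (remainingMs <= 0) {
      String msg = "operation timeout exceeded before call; remaining=" + remainingMs + "ms";
      throw fixApplied ? new TimeoutExceededException(msg) : new DoNotRetryException(msg);
    }
  }

  // With the fix, the timeout exception is excluded from cache clearing.
  private static boolean isCacheClearing(DoNotRetryException e) {
    return !(e instanceof TimeoutExceededException);
  }
}
```

With rows a, b and c, a 20 ms timeout and a simulated 10 ms per lookup, the unfixed client fails on every attempt and ends with an empty cache, while the fixed client fails once and then succeeds because the three locations survived the first failure.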
cc: @bbeaudreault