
HBASE-28672 Ensure large batches are not indefinitely blocked by quotas #6003

Merged
merged 4 commits into apache:master on Jul 8, 2024

Conversation

@rmdmattingly (Contributor)

https://issues.apache.org/jira/browse/HBASE-28672

At my day job we are trying to implement default quotas for a variety of access patterns. We began by introducing a default read IO limit per-user, per-machine — this has been very successful in reducing hotspots, even on clusters with thousands of distinct users.

While implementing a default writes/second throttle, I realized that doing so would put us in a precarious situation where large enough batches may never succeed. If your batch size is greater than your TimeLimiter's max throughput, then you will always fail in the quota estimation stage, which sticks the client in a backoff-and-retry loop until it times out. IO estimates, by contrast, are deliberately more optimistic, which can let large requests do targeted oversubscription of an IO quota:

// assume 1 block required for reads. this is probably a low estimate, which is okay
readConsumed = numReads > 0 ? blockSizeBytes : 0;

This is okay because the Limiter's availability will go negative and force a longer backoff on subsequent requests. I believe this is preferable UX compared to a doomed throttling loop.

In this PR I introduce batch estimation that takes the limiter's max throughput into account, and I've added unit tests to validate the intended behavior.
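
To make the failure mode and the fix concrete, here's a rough sketch; the numbers and names are invented for illustration and are not the actual RegionServer quota code:

public final class ClampedBatchEstimateExample {
  public static void main(String[] args) {
    long limiterMaxPerTimeUnit = 10; // e.g. a 10 writes/sec soft limit
    long batchSize = 100;            // a single multi() carrying 100 mutations

    // Before: the estimate is the raw batch size. It can never fit within the limiter,
    // so every retry fails the quota check and the client backs off until it times out.
    boolean couldEverPass = batchSize <= limiterMaxPerTimeUnit; // always false

    // After: clamp the estimate to the most the limiter could ever make available.
    // The request is admitted, availability goes negative, and subsequent requests
    // pay the debt with a longer backoff instead of this batch being doomed.
    long clampedEstimate = Math.min(batchSize, limiterMaxPerTimeUnit);

    System.out.println("raw estimate passes? " + couldEverPass
      + ", clamped estimate: " + clampedEstimate);
  }
}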

@ndimiduk @hgromer @bozzkar

@ndimiduk (Member) left a comment

I like this improvement and the behavior you demonstrate in the tests. It seems like a fair kind of latching mechanism for letting requests trickle through.

Previously, this limiting system acted as a gatekeeper, placing an effective upper bound on a user. This patch removes that upper bound. Should we also consider explicitly introducing a hard cap/upper bound?

@rmdmattingly (Contributor, Author)

That's a good question. Despite the existing behavior, I think the intention was always for this to be more of a soft limit — the TimedQuota proto actually calls the field soft_limit:

message TimedQuota {
  required TimeUnit time_unit = 1;
  optional uint64 soft_limit  = 2;
  optional float share = 3;
  optional QuotaScope scope  = 4 [default = MACHINE];
}

That said, intention is one thing and reality is another. I could also see people missing the hard limits if that's how they're using quotas.

I'd probably prefer to make the addition of a hard limit a separate effort, though, because my guess is that it will actually be quite a bit of work relative to its value. For example, coming up with good enough estimates of IO-heavy request load to earnestly guarantee any sort of "hard limit" is going to be tricky, if not impossible.

@bbeaudreault (Contributor) commented Jun 21, 2024

Ideally, for a multi request, we would execute whatever actions fit within the available quota. That maintains the hard limit but also gives the user a better chance of eventually finishing their request across multiple RPC calls, similar to the handling of MultiActionResultTooLarge.
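
Purely as a sketch of that idea in generic Java (not the actual quota API): execute the prefix of actions whose estimated cost still fits in the currently available quota, and hand the rest back for a follow-up RPC.

import java.util.List;

public final class PartialBatchSketch {
  /** Returns how many leading actions fit within the available quota. */
  static int actionsThatFit(List<Long> estimatedCosts, long availableQuota) {
    int fits = 0;
    long remaining = availableQuota;
    for (long cost : estimatedCosts) {
      if (cost > remaining) {
        break; // everything from here on would be deferred to the next RPC
      }
      remaining -= cost;
      fits++;
    }
    return fits;
  }

  public static void main(String[] args) {
    // Five actions estimated at 4 units each with 10 units available: the first two
    // run now, the remaining three are returned to be retried, much like a
    // MultiActionResultTooLarge response splits a batch across calls.
    System.out.println(actionsThatFit(List.of(4L, 4L, 4L, 4L, 4L), 10L)); // prints 2
  }
}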

@rmdmattingly (Contributor, Author)

I agree with you, and considered the MARTL approach internally but didn't do a good job of documenting this publicly.

My pushback is not that we shouldn't add support for some sort of MultiActionBatchTooLarge exception, but rather that it would be a pretty significant lift and is, imo, not necessarily in the scope of this issue. It's always been possible to oversubscribe IO quotas, and this is advertised as a soft limit. Despite it being a soft limit, there's a bug here which throws certain request shapes into indefinite backoff/retry loops.

It's definitely righteous work to also think about improving the server's respect of the limit via request chunking, but I think that's sort of its own task and, given the difference in changeset required, I don't think a lack of MABTL exceptions should block our fixing of this bug.

@bbeaudreault (Contributor) left a comment

Sounds good. I agree we shouldn't let this create an impossible-to-succeed request. If we wanted to do that, we should throw a DoNotRetryException instead of a throttling exception.

// the next request should be rejected
assertThrows(RpcThrottlingException.class, () -> quota.checkBatchQuota(0, 1));

Thread.sleep(1000);
Inline review comment (Contributor) on the test code quoted above:

I think it's preferable not to sleep. Instead, you can usually influence the timestamp by adding 1s to it or whatever.

The reason not to sleep is that you're more subject to random Jenkins slowdowns or other problems. We do it in a lot of places of course, but iirc the quota stuff already can do some timestamp manipulation?
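
For example, if the code under test reads time through EnvironmentEdgeManager (I haven't verified that this particular path does), the usual pattern looks roughly like this:

import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
import org.apache.hadoop.hbase.util.ManualEnvironmentEdge;

ManualEnvironmentEdge edge = new ManualEnvironmentEdge();
edge.setValue(EnvironmentEdgeManager.currentTime());
EnvironmentEdgeManager.injectEdge(edge);
try {
  // ... exhaust the quota and assert the RpcThrottlingException as above ...
  edge.incValue(1000); // pretend a full second elapsed instead of Thread.sleep(1000)
  // ... re-run the check; the refill logic sees the interval has passed ...
} finally {
  EnvironmentEdgeManager.reset(); // don't leak the injected clock into other tests
}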

@bbeaudreault (Contributor)

Would we have the same problem with write size?

@rmdmattingly (Contributor, Author)

Oh yeah, good point. I bet we would. I can look into a fix there too

@rmdmattingly (Contributor, Author) commented Jun 21, 2024

Actually, there's no realistic problem with write size. At least, no more than read size. Both make very naive lowball estimates based on the number of reads/writes:

protected void updateEstimateConsumeBatchQuota(int numWrites, int numReads) {
  writeConsumed = estimateConsume(OperationType.MUTATE, numWrites, 100);
  if (useResultSizeBytes) {
    readConsumed = estimateConsume(OperationType.GET, numReads, 100);
  } else {
    // assume 1 block required for reads. this is probably a low estimate, which is okay
    readConsumed = numReads > 0 ? blockSizeBytes : 0;
  }
  writeCapacityUnitConsumed = calculateWriteCapacityUnit(writeConsumed);
  readCapacityUnitConsumed = calculateReadCapacityUnit(readConsumed);
}

It's theoretically possible for the estimate to exceed the max IO allowed though, so maybe it's worth putting limits in place for both the read and write size estimates here? What do you guys think?
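
Concretely, I'm imagining something along these lines (the helper and its arguments are hypothetical, just to show the shape of the cap, not the actual patch):

// Never let an estimated consumption exceed what the corresponding limiter
// could ever make available in a single refill interval.
private static long capEstimate(long estimate, long limiterMaxAvailable) {
  // a non-positive max is treated as "unlimited" in this sketch
  return limiterMaxAvailable > 0 ? Math.min(estimate, limiterMaxAvailable) : estimate;
}

// e.g. writeConsumed = capEstimate(writeConsumed, writeLimiterMaxThroughput);
//      readConsumed  = capEstimate(readConsumed, readLimiterMaxThroughput);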

@rmdmattingly (Contributor, Author)

I pushed an addition that ensures we don't block requests indefinitely for read, write, or request size estimates.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 35s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 3m 53s master passed
+1 💚 compile 3m 37s master passed
+1 💚 checkstyle 0m 51s master passed
+1 💚 spotbugs 1m 51s master passed
+1 💚 spotless 0m 53s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 51s the patch passed
+1 💚 compile 2m 56s the patch passed
+1 💚 javac 2m 56s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 34s the patch passed
+1 💚 spotbugs 1m 37s the patch passed
+1 💚 hadoopcheck 5m 14s Patch does not cause any errors with Hadoop 3.3.6.
+1 💚 spotless 0m 40s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
32m 13s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6003/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6003
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 54b12a07b010 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 15b5866
Default Java Eclipse Adoptium-17.0.10+7
Max. process+thread count 87 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6003/4/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 59s master passed
+1 💚 compile 0m 54s master passed
+1 💚 javadoc 0m 29s master passed
+1 💚 shadedjars 5m 11s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 46s the patch passed
+1 💚 compile 0m 54s the patch passed
+1 💚 javac 0m 54s the patch passed
+1 💚 javadoc 0m 28s the patch passed
+1 💚 shadedjars 5m 13s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 220m 30s hbase-server in the patch passed.
244m 29s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6003/4/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6003
Optional Tests javac javadoc unit compile shadedjars
uname Linux 85783adace03 5.4.0-186-generic #206-Ubuntu SMP Fri Apr 26 12:31:10 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 15b5866
Default Java Eclipse Adoptium-17.0.10+7
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6003/4/testReport/
Max. process+thread count 4848 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6003/4/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@ndimiduk merged commit 5082212 into apache:master on Jul 8, 2024
1 check passed
ndimiduk pushed a commit to ndimiduk/hbase that referenced this pull request Jul 8, 2024
…as (apache#6003)

Co-authored-by: Ray Mattingly <rmattingly@hubspot.com>
Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>

ndimiduk pushed a commit that referenced this pull request Jul 10, 2024
…as (#6003)

Co-authored-by: Ray Mattingly <rmattingly@hubspot.com>
Signed-off-by: Bryan Beaudreault <bbeaudreault@apache.org>
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
@ndimiduk deleted the HBASE-28672 branch July 10, 2024 14:35