-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-24779 Report on the WAL edit buffer usage/limit for replication #2193
Conversation
🎊 +1 overall
This message was automatically generated. |
|
||
@Override | ||
public void setWALReaderEditsBufferBytes(long usage) { | ||
//noop. Global limit, tracked globally. Do not need per-source metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wouldn't it still be useful to know if particular sources were eating up the global limit? or do we not have enough information tracked already to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we chuck something into this usage, we have zero insight back to which source put it there. Wellington was working on this buffer tracking in HBASE-24813, but I don't think per-source tracking was "in scope"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we chuck something into this usage, we have zero insight back to which source put it there.
I had the same question. I was wondering if a drill down by source would be helpful in addition to the global usage. Correct me if I'm wrong, looks like ReplicationSource class has access to the MetricsSource object.. we can just update the byte usage for that source? (and the global too at the same time). That way we can also get rid of the special logic to update one metric setWALReaderEditsBufferBytes(). Does that not work for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like ReplicationSource class has access to the MetricsSource object.. we can just update the byte usage for that source? (and the global too at the same time). That way we can also get rid of the special logic to update one metric setWALReaderEditsBufferBytes()
You are correct that we could do that. I wanted to keep this change scoped on "make what we currently have reportable". I am all for doing a per-source tracking in addition to the globally-scoped tracking. I'd rather just keep these two things separate :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. Looks like you pushed the metrics update logic into the source, the patch looks clean now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup! Just made another interface to isolate the new additions which cleans this up a little. I think your suggestion is still good for better, fine-grained tracking. However, since HBASE-20417, hopefully no one else runs into this ;)
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEndpoint.java
Show resolved
Hide resolved
hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEndpoint.java
Outdated
Show resolved
Hide resolved
...in/java/org/apache/hadoop/hbase/replication/regionserver/MetricsReplicationSourceSource.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, just small nits, but good to go without it. There might be some new checkstyle issues, fine to commit it once the build reports ok.
...a/org/apache/hadoop/hbase/replication/regionserver/MetricsReplicationGlobalSourceSource.java
Show resolved
Hide resolved
...c/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to merge imho after fixing the license header
...a/org/apache/hadoop/hbase/replication/regionserver/MetricsReplicationGlobalSourceSource.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 once the qabot is green.
Thanks folks! Just pushed a couple more commits for cleanup on QA and wellington's suggestions. I'll merge when QA is happy. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Interesting. TestWALEntryStream timed out for me too. Will dig in. |
Ah, a mocking issue. Fixing. |
* Fix up metrics source interface hierarchy * Docs for test replication endpoint * Push down metrics updating to mutation of the buffer usage variable (rather than on a timer)
e2c75b5
to
7695a70
Compare
I got 99 problems and all of them are due to mocking. The second UT failure was also due to a mock returning a value of 0 instead of the |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
TestReplicationSource failure appears to be due to HBASE-24817 (#2198)
We end up filtering out an edit because it's empty, but we expect to not filter it because it's not a system table edit. I'll make the same mocking changes to prevent future pain in TestReplicationSource (like I did for TestWALEntryStream) in a follow-on, as well as fix this test. FYI @saintstack Pushing this to make some progress. |
Closes #2193 Signed-off-by: Bharath Vissapragada <bharathv@apache.org> Signed-off-by: Sean Busbey <busbey@apache.org> Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Filed https://issues.apache.org/jira/browse/HBASE-24834 to address the TestReplicationSource failure mentioned above. |
Closes apache#2193 Signed-off-by: Bharath Vissapragada <bharathv@apache.org> Signed-off-by: Sean Busbey <busbey@apache.org> Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org> (cherry picked from commit 303db63) Change-Id: If1b3662792304747d2942dbc52338e2108b1a764
Closes apache#2193 Signed-off-by: Bharath Vissapragada <bharathv@apache.org> Signed-off-by: Sean Busbey <busbey@apache.org> Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org> (cherry picked from commit 303db63)
Closes apache#2193 Signed-off-by: Bharath Vissapragada <bharathv@apache.org> Signed-off-by: Sean Busbey <busbey@apache.org> Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org> (cherry picked from commit 303db63) Change-Id: If1b3662792304747d2942dbc52338e2108b1a764
Closes apache#2193 Signed-off-by: Bharath Vissapragada <bharathv@apache.org> Signed-off-by: Sean Busbey <busbey@apache.org> Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org> (cherry picked from commit 303db63) Change-Id: If1b3662792304747d2942dbc52338e2108b1a764
Closes apache#2193 Signed-off-by: Bharath Vissapragada <bharathv@apache.org> Signed-off-by: Sean Busbey <busbey@apache.org> Signed-off-by: Wellington Chevreuil <wchevreuil@apache.org>
Testing done:
And then to observe:
replication.stats.thread.period.seconds
to15
in hbase-site.xml and tail regionserver logwatch -n 15 "curl -s http://mizar.local:16030/jmx | jq '.beans[] | select(.name == \"Hadoop:service=HBase,name=RegionServer,sub=Replication\") | .\"source.walReaderEditsBufferUsage\"'"
Can see the buffer grow. When the puts run out, the buffer eventually drains.