Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-24813 ReplicationSource should clear buffer usage on Replicatio… #2546

Merged
merged 4 commits into from
Jan 5, 2021

Conversation

wchevreuil
Copy link
Contributor

…nSourceManager upon termination (rebased after HBASE-25117)

…nSourceManager upon termination (rebased after HBASE-25117)
@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@wchevreuil wchevreuil requested a review from infraio October 22, 2020 15:44
Copy link
Contributor

@esteban esteban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of comments.

Comment on lines 360 to 363
} catch (InterruptedException e) {
LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch: {}",
this.source.getPeerId(), this.getName(), e);
Thread.currentThread().interrupt();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be just INFO? Also, I think it might be better tho handle those InterruptedException inside ReplicationSource.terminate().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left as WARN because it aborts the flow without effectively updating the buffer usage, which is the fundamental issue we are trying to solve here.

for (ReplicationSourceShipper worker : workers) {
worker.stopWorker();
if (worker.entryReader != null) {
worker.entryReader.setReaderRunning(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a worker is doing some async work when it is asked to stop and can take time. then I think we should keep the implementation as it was done before, like ask all to stop at once and then wait. because if no. of workers gets large due to backlog and someone changes wait time config to 10s of seconds, then removePeer command/procedure has to wait for a long time (no. of workers * (sleep time + time for clearWalEntryBatch) ) to terminate the replication source.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not following your concern here. I don't see how the extra loop in the same method context just setting two a flag in the shipper and other in the reader can help with the contention scenario described, terminate execution would be stuck in the second for loop anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, let me try to explain again.
I was referring to restore this loop.

for (ReplicationSourceShipper worker : workers) {	
      worker.stopWorker();	
      if(worker.entryReader != null) {	
        worker.entryReader.setReaderRunning(false);	
      }	
    }

As your current flow is stopping the worker in a linear manner:-

  • Stop a worker
  • wait for the worker thread to complete.
  • stop another worker
  • wait for it finishes
  • continue for others......
    So in the worst case, you would have to wait for the number of workers * min(time taken by the worker to finish, timeout)

though by restoring the old loop, you are parallelizing the stopping of the workers.

  • ask all worker threads to finish their work by setting their state.
  • then in the second loop, wait for each worker to finish, while you are waiting for 1 worker, others are also completing their work in parallel.
  • so when you are done with one worker it is possible that all other workers are also done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got you, thanks for explaining in more details. Will address it on next commit.

LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
+ "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
+ "thread to stop.", this.source.getPeerId());
Thread.currentThread().interrupt();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need additional interrupt here when ReplicationSource.terminate() is already interrupted the worker thread prior to clearWALEntryBatch method call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are just interrupting if either shipper or reader thread is still alive. We can't guarantee that the caller will always have stopped these threads, therefore, the extra check here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    1. This method should only be called upon replication source termination.

so what this interrupt will do, how is it handled in the source?

LOG.warn("Interrupting source thread for peer {} without cleaning buffer usage "
            + "because clearWALEntryBatch method timed out whilst waiting reader/shipper "
            + "thread to stop.", this.source.getPeerId());

don't we need to return here as we timed out and not clearing the batch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's not been handled. Changing to simply log the exceptional and return back to source.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if return, then we do not clean the batch, so replication quota will be leaked.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 36s master passed
+1 💚 checkstyle 1m 5s master passed
+1 💚 spotbugs 1m 59s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 31s the patch passed
-0 ⚠️ checkstyle 1m 3s hbase-server: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 19m 28s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 19s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 12s The patch does not generate ASF License warnings.
41m 44s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2546
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux a35191384864 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / f0c430a
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
Max. process+thread count 94 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/console
versions git=2.17.1 maven=3.6.3 spotbugs=3.1.12
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 4s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 36s master passed
+1 💚 compile 0m 55s master passed
+1 💚 shadedjars 6m 33s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 39s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 30s the patch passed
+1 💚 compile 0m 56s the patch passed
+1 💚 javac 0m 56s the patch passed
+1 💚 shadedjars 6m 34s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 37s the patch passed
_ Other Tests _
-1 ❌ unit 146m 15s hbase-server in the patch failed.
172m 54s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #2546
Optional Tests javac javadoc unit shadedjars compile
uname Linux 6ee290d5b71b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / f0c430a
Default Java AdoptOpenJDK-1.8.0_232-b09
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/testReport/
Max. process+thread count 4596 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 10m 25s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 5m 20s master passed
+1 💚 compile 1m 25s master passed
+1 💚 shadedjars 8m 17s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 45s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 50s the patch passed
+1 💚 compile 1m 20s the patch passed
+1 💚 javac 1m 20s the patch passed
+1 💚 shadedjars 7m 34s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 45s the patch passed
_ Other Tests _
-1 ❌ unit 196m 32s hbase-server in the patch failed.
239m 8s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2546
Optional Tests javac javadoc unit shadedjars compile
uname Linux 347e0454744b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / f0c430a
Default Java AdoptOpenJDK-11.0.6+10
unit https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-server.txt
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/testReport/
Max. process+thread count 3643 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/3/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@ankitsinghal ankitsinghal self-requested a review November 11, 2020 02:31
@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 38s master passed
+1 💚 checkstyle 1m 3s master passed
+1 💚 spotbugs 2m 1s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 26s the patch passed
-0 ⚠️ checkstyle 1m 2s hbase-server: The patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 16m 58s Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.
+1 💚 spotbugs 2m 10s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 13s The patch does not generate ASF License warnings.
38m 28s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #2546
Optional Tests dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle
uname Linux 484e01be7b37 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 57d9cae
checkstyle https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/4/artifact/yetus-general-check/output/diff-checkstyle-hbase-server.txt
Max. process+thread count 94 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/4/console
versions git=2.17.1 maven=3.6.3 spotbugs=3.1.12
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 6s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 34s master passed
+1 💚 compile 0m 55s master passed
+1 💚 shadedjars 6m 35s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 39s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 3m 25s the patch passed
+1 💚 compile 0m 58s the patch passed
+1 💚 javac 0m 58s the patch passed
+1 💚 shadedjars 6m 32s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 37s the patch passed
_ Other Tests _
+1 💚 unit 145m 46s hbase-server in the patch passed.
172m 20s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/4/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #2546
Optional Tests javac javadoc unit shadedjars compile
uname Linux 2fb2bc90cca7 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 57d9cae
Default Java AdoptOpenJDK-1.8.0_232-b09
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/4/testReport/
Max. process+thread count 4250 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/4/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 1m 16s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 4m 44s master passed
+1 💚 compile 1m 14s master passed
+1 💚 shadedjars 7m 18s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 41s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 4m 32s the patch passed
+1 💚 compile 1m 12s the patch passed
+1 💚 javac 1m 12s the patch passed
+1 💚 shadedjars 7m 24s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 42s the patch passed
_ Other Tests _
+1 💚 unit 193m 4s hbase-server in the patch passed.
224m 0s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/4/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #2546
Optional Tests javac javadoc unit shadedjars compile
uname Linux b6439481f660 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 57d9cae
Default Java AdoptOpenJDK-11.0.6+10
Test Results https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/4/testReport/
Max. process+thread count 3422 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2546/4/console
versions git=2.17.1 maven=3.6.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

}
} catch (InterruptedException e) {
LOG.warn("{} Interrupted while waiting {} to stop on clearWALEntryBatch. "
+ "Not cleaning buffer usage: {}", this.source.getPeerId(), this.getName(), e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please restore interrupt flag here (Thread.currentThread().interrupt();) and then return.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't do any handling of interrupt at ReplicationSource. Would you still think we need this here?

@wchevreuil wchevreuil merged commit 8584263 into apache:master Jan 5, 2021
wchevreuil added a commit that referenced this pull request Jan 12, 2021
#2546) (#2849)

Signed-off-by: Ankit Singhal <ankit@apache.org>
Signed-off-by: Josh Elser <elserj@apache.org>
wchevreuil added a commit that referenced this pull request Jan 12, 2021
#2546) (#2849)

Signed-off-by: Ankit Singhal <ankit@apache.org>
Signed-off-by: Josh Elser <elserj@apache.org>
wchevreuil added a commit that referenced this pull request Jan 12, 2021
#2546) (#2849)

Signed-off-by: Ankit Singhal <ankit@apache.org>
Signed-off-by: Josh Elser <elserj@apache.org>
(cherry picked from commit fdae12d)
wchevreuil added a commit to wchevreuil/hbase that referenced this pull request May 24, 2021
apache#2546) (apache#2849)

Signed-off-by: Ankit Singhal <ankit@apache.org>
Signed-off-by: Josh Elser <elserj@apache.org>
(cherry picked from commit fdae12d)
(cherry picked from commit 3242c8a)

Change-Id: I8552da6cb7b37271204e255a6ca96a8af544da48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants