tests: Fix MPPTask-Moniter may live longer than TiFlashMetrics #9096

Merged: 2 commits merged into pingcap:master from fix_tsan_2 on May 28, 2024


@JaySon-Huang (Contributor) commented May 28, 2024

What problem does this PR solve?

Issue Number: close #9092, close #9097

Problem Summary:

For #9092

TMTContext starts a thread named "MPPTask-Moniter" that runs checkLongLiveMPPTasks.

    void startMonitorMPPTaskThread(const MPPTaskManagerPtr & manager)
    {
        // The thread is detached: nothing joins or stops it when TiFlash shuts down.
        newThreadManager()->scheduleThenDetach(false, "MPPTask-Moniter", [monitor = manager->getMPPTaskMonitor()] {
            monitorMPPTasks(monitor);
        });
    }

When TiFlash shuts down, this thread is not explicitly stopped, so it may outlive the TiFlashMetrics instance. If TiFlashMetrics is destroyed before checkLongLiveMPPTasks runs, checkLongLiveMPPTasks accesses freed memory, which shows up as a use-after-free data race during shutdown.

For #9097
The race appears to be reported inside backtrace-rs, so there is nothing to fix in the TiFlash code; we just suppress it.

What is changed and how it works?

For #9092
In TMTContext::shutdown, set MPPTaskMonitor->is_shutdown = true, so the thread is expected to stop after TMTContext::shutdown is called and before TiFlashMetrics is released. When monitor->is_shutdown == true, the thread no longer reports metrics to TiFlashMetrics.
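A minimal sketch of the shutdown-flag pattern, under stated assumptions. `MonitorState`, `monitorLoop`, and `shutdownMonitor` are hypothetical names, not TiFlash's actual MPPTaskMonitor API; the `report` callback stands in for checkLongLiveMPPTasks writing to TiFlashMetrics, and the mutex plus condition variable guarding `is_shutdown` are assumptions about how the flag might be checked.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for MPPTaskMonitor: only the shutdown-related state.
struct MonitorState
{
    std::mutex mu;
    std::condition_variable cv;
    bool is_shutdown = false;
};

// Hypothetical monitor loop. `report` stands in for checkLongLiveMPPTasks
// writing to TiFlashMetrics. Returns how many times it reported.
int monitorLoop(const std::shared_ptr<MonitorState> & state,
                const std::function<void()> & report,
                std::chrono::milliseconds interval)
{
    int reports = 0;
    std::unique_lock<std::mutex> lock(state->mu);
    while (true)
    {
        // Sleep for one interval (the lock is released while waiting),
        // but wake up immediately once shutdown is requested.
        state->cv.wait_for(lock, interval, [&] { return state->is_shutdown; });
        if (state->is_shutdown)
            return reports; // never touch metrics after shutdown
        // Report while still holding the lock, so shutdownMonitor() cannot
        // return while a report is in flight.
        report();
        ++reports;
    }
}

// Hypothetical equivalent of the TMTContext::shutdown step: must run before
// the metrics object is destroyed.
void shutdownMonitor(const std::shared_ptr<MonitorState> & state)
{
    {
        std::lock_guard<std::mutex> lock(state->mu);
        state->is_shutdown = true;
    }
    state->cv.notify_all();
}
```

Because the flag is checked and the report is issued under the same lock, no report can start after `shutdownMonitor` returns, even if the thread is detached (it keeps `MonitorState` alive through its own `shared_ptr`). After that point, destroying the metrics object is safe in this sketch.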

For #9097
Add race:StackTrace::toString and race:DB::SyncPointCtl::sync to tsan.suppression. With these entries, the reported races are ignored when running with TSAN_OPTIONS="suppressions=/tests/sanitize/tsan.suppression" ./dbms/gtests_dbms --gtest_filter=...


Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
      cmake .. -GNinja -DCMAKE_BUILD_TYPE=TSAN -DENABLE_TESTS=ON
      ninja gtests_dbms tiflash -j32
      ./dbms/gtests_dbms
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot added the release-note-none and size/M labels May 28, 2024

@xzhangxian1008 (Contributor) left a comment


LGTM

@ti-chi-bot added the needs-1-more-lgtm label May 28, 2024

ti-chi-bot (Contributor) commented May 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Lloyd-Pottiger, xzhangxian1008

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot added the lgtm and approved labels and removed the needs-1-more-lgtm label May 28, 2024

ti-chi-bot (Contributor) commented May 28, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-05-28 07:09:20.493150242 +0000 UTC m=+2760314.250285814: ☑️ agreed by xzhangxian1008.
  • 2024-05-28 08:55:50.739988071 +0000 UTC m=+2766704.497123644: ☑️ agreed by Lloyd-Pottiger.

@ti-chi-bot merged commit f6518f1 into pingcap:master May 28, 2024
5 checks passed
@JaySon-Huang deleted the fix_tsan_2 branch May 28, 2024 09:55
Labels: approved, lgtm, release-note-none (Denotes a PR that doesn't merit a release note.), size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)
Projects: None yet
3 participants