KVStore: Expand the usage of thread-wise alloc/dealloc trace #9003

Merged
merged 31 commits into from
May 10, 2024

Conversation

CalvinNeo
Member

@CalvinNeo CalvinNeo commented Apr 29, 2024

What problem does this PR solve?

Issue Number: close #8835

Problem Summary:

  1. Make it possible to use the thread-wise alloc/dealloc trace in modules other than KVStore
  2. Record alloc and dealloc separately, rather than only the difference (alloc - dealloc)

We put this utility in KVStore/FFI because it is bound to the FFI interface between C++ and Rust. That is also why we call it a joint.
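
A minimal sketch of the second point, assuming a hypothetical `ThreadAllocRecord` and `ThreadAllocRegistry` (neither name comes from this PR). Keeping alloc and dealloc as two independent counters still lets a reader derive the net value, whereas a single net counter cannot recover the two sides:

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical record: two independent counters per thread.
struct ThreadAllocRecord
{
    uint64_t alloc = 0;   // total bytes allocated so far
    uint64_t dealloc = 0; // total bytes freed so far

    // The net value can be derived from the two counters; the reverse is not possible.
    int64_t net() const { return static_cast<int64_t>(alloc) - static_cast<int64_t>(dealloc); }
};

// Hypothetical registry keyed by thread name (an assumption for illustration).
class ThreadAllocRegistry
{
public:
    void recordAlloc(const std::string & thread_name, uint64_t bytes)
    {
        std::lock_guard<std::mutex> lk(mut);
        records[thread_name].alloc += bytes;
    }

    void recordDealloc(const std::string & thread_name, uint64_t bytes)
    {
        std::lock_guard<std::mutex> lk(mut);
        records[thread_name].dealloc += bytes;
    }

    // Returns a zeroed record for unknown threads.
    ThreadAllocRecord get(const std::string & thread_name) const
    {
        std::lock_guard<std::mutex> lk(mut);
        auto it = records.find(thread_name);
        return it == records.end() ? ThreadAllocRecord{} : it->second;
    }

private:
    mutable std::mutex mut;
    std::map<std::string, ThreadAllocRecord> records;
};
```

With this shape, a thread that allocates 100 bytes and frees 40 reports `alloc == 100`, `dealloc == 40`, and `net() == 60`, instead of only the last number.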

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 29, 2024
@CalvinNeo
Member Author

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
a
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Member Author

/pull-integration-test

@CalvinNeo
Member Author

/pull-unit-test

a
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Member Author

/pull-unit-test

CalvinNeo and others added 3 commits April 30, 2024 00:33
a
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
z
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Member Author

CalvinNeo commented Apr 30, 2024

image

@purelind
Collaborator

/retest

a
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
a
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
std::unordered_map<std::string, uint64_t> agg_deallocate;
for (const auto & [k, v] : kvstore_map)
{
auto agg_thread_name = getThreadNameAggPrefix(std::string_view(k.data(), k.size()));
Contributor

Suggested change
-auto agg_thread_name = getThreadNameAggPrefix(std::string_view(k.data(), k.size()));
+auto agg_thread_name = getThreadNameAggPrefix(k);
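
The suggestion above relies on `std::string` converting implicitly to `std::string_view`, so the explicit `(data, size)` construction is redundant when `k` is a `std::string`. A small sketch, assuming a hypothetical `getThreadNameAggPrefixSketch` that strips trailing digits and separators; the real aggregation rule of `getThreadNameAggPrefix` is not shown in this thread, so this stand-in is illustrative only:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for the PR's helper; the stripping rule is an assumption.
// It drops trailing digits, '-' and '_' so that "pool-1" and "pool-2" share the key "pool".
std::string getThreadNameAggPrefixSketch(std::string_view name)
{
    std::size_t end = name.size();
    while (end > 0)
    {
        const char c = name[end - 1];
        const bool strip = std::isdigit(static_cast<unsigned char>(c)) || c == '-' || c == '_';
        if (!strip)
            break;
        --end;
    }
    return std::string(name.substr(0, end));
}
```

Both `getThreadNameAggPrefixSketch(k)` and `getThreadNameAggPrefixSketch(std::string_view(k.data(), k.size()))` produce the same view of `k`; the shorter form simply lets the conversion operator do the work.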

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo CalvinNeo requested a review from JinheLin April 30, 2024 08:17
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
bool is_terminated{false};
mutable std::mutex monitoring_mut;
std::condition_variable monitoring_cv;
std::thread * monitoring_thread{nullptr};
Contributor

Why use a pointer here?

Member Author

No specific reason; it just follows the original code.

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
LOG_INFO(DB::Logger::get(), "Stop collecting thread alloc metrics");
{
std::unique_lock lk(monitoring_mut);
if (monitoring_thread == nullptr)
Contributor

Under what circumstances will this pointer be empty?

Contributor

@JaySon-Huang JaySon-Huang May 10, 2024

When stopThreadAllocInfo is called more than once.
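
The exchange above describes a stop routine that must tolerate repeated calls. A minimal sketch of that pattern, assuming a hypothetical `AllocMonitorSketch` class; the member names echo the quoted snippet, but `std::unique_ptr<std::thread>` is swapped in for the raw pointer and the loop body is a placeholder, not the PR's implementation:

```cpp
#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical monitor; only the null check on the thread pointer mirrors the quoted code.
class AllocMonitorSketch
{
public:
    void start()
    {
        std::lock_guard<std::mutex> lk(monitoring_mut);
        if (monitoring_thread != nullptr || is_terminated)
            return; // already running, or already shut down
        monitoring_thread = std::make_unique<std::thread>([this] { loop(); });
    }

    // Safe to call any number of times: later calls find a null pointer and return.
    void stop()
    {
        std::unique_ptr<std::thread> to_join;
        {
            std::lock_guard<std::mutex> lk(monitoring_mut);
            if (monitoring_thread == nullptr)
                return; // never started, or already stopped
            is_terminated = true;
            to_join = std::move(monitoring_thread); // leaves monitoring_thread null
        }
        monitoring_cv.notify_all();
        to_join->join(); // join outside the lock so loop() can re-acquire it
    }

    bool running() const
    {
        std::lock_guard<std::mutex> lk(monitoring_mut);
        return monitoring_thread != nullptr;
    }

    ~AllocMonitorSketch() { stop(); }

private:
    void loop()
    {
        std::unique_lock<std::mutex> lk(monitoring_mut);
        while (!is_terminated)
        {
            // A real monitor would publish per-thread metrics here.
            monitoring_cv.wait_for(lk, std::chrono::milliseconds(50), [this] { return is_terminated; });
        }
    }

    bool is_terminated{false};
    mutable std::mutex monitoring_mut;
    std::condition_variable monitoring_cv;
    std::unique_ptr<std::thread> monitoring_thread;
};
```

Calling `stop()` explicitly during shutdown and again from the destructor is harmless here, which is the situation the null check guards against.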

tiflash_metrics.setProxyThreadMemory("dealloc_" + k, data.dealloc);
}

void JointThreadInfoJeallocMap::stopThreadAllocInfo()
Contributor

We should call it in ContextShared::shutdown explicitly to make sure the monitoring_thread is stopped before TiFlashMetrics::instance is released

Member Author

Then I shall move joint_memory_allocation_map from Context to ContextShared?

Member Author

It seems TiFlashMetrics::instance() is a singleton, so I think it actually outlives Context.

TiFlashMetrics & TiFlashMetrics::instance()
{
    static TiFlashMetrics inst; // Instantiated on first use.
    return inst;
}

Contributor

Then I shall move joint_memory_allocation_map from Context to ContextShared?

Yes. Context is a session-level instance; long-lived instances should be in ContextShared.
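
The ownership discussion above can be sketched as follows, assuming hypothetical `MetricsSketch` and `SharedContextSketch` classes (stand-ins, not TiFlash types). A function-local static is constructed on first use and destroyed during static destruction at process exit, so a process-level owner that stops its monitor in an explicit `shutdown()` finishes touching the singleton well before then:

```cpp
#include <string>
#include <vector>

// Hypothetical metrics singleton using a function-local static.
class MetricsSketch
{
public:
    static MetricsSketch & instance()
    {
        static MetricsSketch inst; // constructed on first use, destroyed at process exit
        return inst;
    }

    void set(const std::string & key) { events.push_back("set:" + key); }
    const std::vector<std::string> & log() const { return events; }

private:
    MetricsSketch() = default;
    std::vector<std::string> events;
};

// Hypothetical process-level owner: long-lived state lives here,
// not in a per-session object.
class SharedContextSketch
{
public:
    void startMonitor()
    {
        monitor_running = true;
        MetricsSketch::instance().set("monitor_started");
    }

    // Explicit, idempotent shutdown: the last write to the singleton happens here.
    void shutdown()
    {
        if (!monitor_running)
            return;
        monitor_running = false;
        MetricsSketch::instance().set("monitor_stopped");
    }

    bool monitorRunning() const { return monitor_running; }

    ~SharedContextSketch() { shutdown(); }

private:
    bool monitor_running = false;
};
```

The flag stands in for a real background thread; the point is only the ordering, where `shutdown()` runs before the owner and the singleton are torn down.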

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 10, 2024
CalvinNeo and others added 4 commits May 10, 2024 12:29
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
a
Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Member Author

/retest

Signed-off-by: CalvinNeo <calvinneo1995@gmail.com>
@CalvinNeo
Member Author

/test pull-unit-test

Contributor

ti-chi-bot bot commented May 10, 2024

@CalvinNeo: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-integration-test
  • /test pull-unit-test

Use /test all to run all jobs.

In response to this:

/test pull-unit-test


@CalvinNeo
Member Author

/retest

Contributor

@JaySon-Huang JaySon-Huang left a comment

Rest LGTM

dbms/src/Common/TiFlashMetrics.cpp: 2 outdated review threads (resolved)
@@ -32,7 +32,6 @@
 #include <IO/BaseFile/fwd.h>
 #include <IO/Buffer/ReadBufferFromFile.h>
 #include <IO/FileProvider/FileProvider.h>
-#include <Interpreters/Context.h>
Contributor

Why is this line removed?

Member Author

I don't know; I will add it back. To my surprise, however, it compiles.

CalvinNeo and others added 2 commits May 10, 2024 17:55
Co-authored-by: JaySon <tshent@qq.com>
Co-authored-by: JaySon <tshent@qq.com>
@CalvinNeo
Member Author

/retest

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 10, 2024
Contributor

ti-chi-bot bot commented May 10, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,Lloyd-Pottiger]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

ti-chi-bot bot commented May 10, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-05-09 03:37:42.484254096 +0000 UTC m=+1106016.241389662: ☑️ agreed by Lloyd-Pottiger.
  • 2024-05-10 11:16:22.714299086 +0000 UTC m=+1219936.471434658: ☑️ agreed by JaySon-Huang.

@ti-chi-bot ti-chi-bot bot merged commit 22417fd into pingcap:master May 10, 2024
5 checks passed
This was referenced May 13, 2024
Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Support thread-wise memory alloc/dealloc monitor
5 participants