[Tencent Rhino-Bird Open Source Project Practice] Prometheus plugin enhancements (PUSH mode support, etc.) #175

Open: wants to merge 12 commits into base: main
2 changes: 2 additions & 0 deletions .bazelrc
@@ -3,3 +3,5 @@ build --copt=-O2
#build --copt=-g --strip=never
build --jobs 16
#test --cache_test_results=no --test_output=errors

build --define trpc_include_prometheus=true
Review comment (Contributor): Prometheus is disabled by default, so this line can be removed.

7 changes: 4 additions & 3 deletions docs/zh/prometheus_metrics.md
@@ -99,6 +99,7 @@ client:
| aApp | Caller app name |
| aServer | Caller server name |
| aService | Caller service name |
| aIp | Caller IP address |
| pApp | Callee app name |
| pServer | Callee server name |
| pService | Callee service name |
@@ -109,6 +110,7 @@ client:
| frame_ret_code | Framework error code of the call |
| interface_ret_code | Interface error code of the call |


### Callee monitoring reports

To enable callee monitoring, just add the `prometheus` filter to the `server` section of the framework configuration file:
@@ -124,7 +126,6 @@ server:

Statistics:

| Metric name | Metric type | Description |
| ------ | ------ | ------ |
| rpc_server_counter_metric | Counter | Total number of requests received by the server |
@@ -149,7 +150,7 @@ server:
| pConSetId | Set that the callee belongs to |
| frame_ret_code | Framework error code of the call |
| interface_ret_code | Interface error code of the call |


## Attribute monitoring reports

@@ -314,7 +315,7 @@ single_metrics_info.single_attr_info.value = 1;

#### Generic multi-dimensional attribute reporting

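The Prometheus monitoring plugin supports the framework's generic multi-dimensional attribute reporting: construct a `::trpc::TrpcMultiAttrMetricsInfo` and report it through the `::trpc::metrics::MultiAttrReport` interface. **For Prometheus, single-dimensional attribute reporting means reporting data whose statistical labels contain multiple key-value pairs.**
The Prometheus monitoring plugin supports the framework's generic multi-dimensional attribute reporting: construct a `::trpc::TrpcMultiAttrMetricsInfo` and report it through the `::trpc::metrics::MultiAttrReport` interface. **For Prometheus, multi-dimensional attribute reporting means reporting data whose statistical labels contain multiple key-value pairs.**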
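To make "statistical labels that contain multiple key-value pairs" concrete, here is a minimal standalone sketch. It is not the plugin's code and does not use tRPC-Cpp or prometheus-cpp; the `LabeledCounter` class is hypothetical. Each distinct label set is its own time series, rendered in the Prometheus text exposition format. Label values are not escaped.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>

// A label set: the key-value pairs that identify one time series.
using LabelSet = std::map<std::string, std::string>;

// Hypothetical in-memory counter family keyed by label set, for illustration only.
class LabeledCounter {
 public:
  explicit LabeledCounter(std::string name) : name_(std::move(name)) {}

  // Each distinct label set accumulates into its own series.
  void Increment(const LabelSet& labels, double value = 1.0) { series_[labels] += value; }

  // Renders every series as `name{k1="v1",k2="v2"} value`, one per line,
  // ordered by label set. Label values are not escaped in this sketch.
  std::string Render() const {
    std::ostringstream out;
    for (const auto& [labels, value] : series_) {
      out << name_;
      if (!labels.empty()) {
        out << '{';
        const char* sep = "";
        for (const auto& [key, val] : labels) {
          out << sep << key << "=\"" << val << '"';
          sep = ",";
        }
        out << '}';
      }
      out << ' ' << value << '\n';
    }
    return out.str();
  }

 private:
  std::string name_;
  std::map<LabelSet, double> series_;  // one accumulated value per label set
};
```

For example, incrementing `rpc_server_counter_metric` twice with `aApp="app1", pService="Greeter"` and once with `aApp="app2", pService="Greeter"` renders two lines, one per label set, with values 2 and 1.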

Note the following when setting `::trpc::TrpcMultiAttrMetricsInfo` values:

10 changes: 10 additions & 0 deletions examples/features/prometheus/proxy/BUILD
@@ -44,3 +44,13 @@ cc_library(
"@trpc_cpp//trpc/metrics/prometheus:prometheus_metrics_api",
],
)

cc_binary(
Review comment (Contributor): The push file isn't needed; remove it along with the build rules that pull it in.

name = "push",
srcs = ["push.cc"],
deps = [
"@trpc_cpp//trpc/metrics/prometheus:prometheus_metrics_api",
"@trpc_cpp//trpc/log:trpc_log",
],
)
6 changes: 6 additions & 0 deletions examples/features/prometheus/proxy/forward_service.cc
@@ -77,6 +77,12 @@ ::trpc::Status ForwardServiceImpl::Route(::trpc::ServerContextPtr context,
"counter_name", "counter_desc", {{"const_counter_key", "const_counter_value"}});
::prometheus::Counter& counter = counter_family->Add({{"counter_key", "counter_value"}});
counter.Increment(random_num);

if (::trpc::prometheus::PushMetricsInfo()) {
Review comment (Contributor): Why does this still need to be called manually? Can't it take effect just by configuring the YAML file?

TRPC_FMT_INFO("Successfully pushed metrics to Pushgateway");
} else {
TRPC_FMT_ERROR("Failed to push metrics to Pushgateway");
}
#endif

auto client_context = ::trpc::MakeClientContext(context, greeter_proxy_);
22 changes: 22 additions & 0 deletions examples/features/prometheus/proxy/push.cc
@@ -0,0 +1,22 @@
#include <chrono>
Review comment (Contributor): This file has nothing to do with the framework and doesn't need to be added; putting the usage in the documentation is enough.

#include <iostream>
#include <thread>
#include "trpc/metrics/prometheus/prometheus_metrics_api.h"
#include "trpc/log/trpc_log.h"

int main(int argc, char** argv) {
while (true) {
if (::trpc::prometheus::PushMetricsInfo()) {
std::cout << "Successfully pushed metrics to Pushgateway" << std::endl;
} else {
std::cerr << "Failed to push metrics to Pushgateway" << std::endl;
}

std::this_thread::sleep_for(std::chrono::seconds(5));  // Push every 5 seconds
}

return 0;
}
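On the review question of letting the YAML config drive pushing instead of a hand-written loop like the one in push.cc, here is a minimal sketch. It assumes the plugin owns a background thread that calls a push function (for example, a wrapper around `::trpc::prometheus::PushMetricsInfo()`) every `push_interval_seconds`. `PeriodicPusher` is hypothetical and not part of tRPC-Cpp. Unlike a plain `sleep_for` loop, it wakes early on `Stop()`, so shutdown does not wait out a full interval.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper: calls `push` right away and then once per `interval`
// on a background thread until Stop() is called.
// Start() and Stop() are meant to be called from a single owner thread.
class PeriodicPusher {
 public:
  PeriodicPusher(std::chrono::milliseconds interval, std::function<bool()> push)
      : interval_(interval), push_(std::move(push)) {}

  ~PeriodicPusher() { Stop(); }

  void Start() {
    if (worker_.joinable()) return;  // already running
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = false;
    }
    worker_ = std::thread([this] { Run(); });
  }

  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_all();  // wake the worker if it is waiting out an interval
    if (worker_.joinable()) worker_.join();
  }

  int successes() const { return successes_.load(); }
  int failures() const { return failures_.load(); }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mu_);
    while (!stopping_) {
      lock.unlock();
      // Push without holding the lock; Stop() still waits for an in-flight push in join().
      if (push_()) {
        ++successes_;
      } else {
        ++failures_;
      }
      lock.lock();
      // Wait one interval, returning early if Stop() sets stopping_.
      cv_.wait_for(lock, interval_, [this] { return stopping_; });
    }
  }

  std::chrono::milliseconds interval_;
  std::function<bool()> push_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stopping_ = false;  // guarded by mu_
  std::atomic<int> successes_{0};
  std::atomic<int> failures_{0};
  std::thread worker_;
};
```

A plugin could construct this with `std::chrono::seconds(push_interval_seconds)` when `push_mode.enabled` is true and call `PeriodicPusher::Stop()` from its own stop hook (the `Plugin` interface shows a `Stop()`); whether that lifecycle hook is the right place is an assumption.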
5 changes: 5 additions & 0 deletions examples/features/prometheus/proxy/trpc_cpp_fiber.yaml
@@ -44,6 +44,11 @@ plugins:
const_labels:
const_key1: const_value1
const_key2: const_value2
push_mode:
Review comment (Contributor): The example needs to demonstrate both pull mode and push mode, so two configuration files should be provided.

enabled: true
gateway_url: "http://pushgateway:9091"
job_name: "test_job"
push_interval_seconds: 2
log:
default:
- name: default
2 changes: 1 addition & 1 deletion examples/features/prometheus/run.sh
@@ -6,7 +6,7 @@ echo "begin"
sleep 1
./bazel-bin/examples/features/prometheus/proxy/forward_server --config=examples/features/prometheus/proxy/trpc_cpp_fiber.yaml &
sleep 1
./bazel-bin/examples/features/prometheus/client/client_config --config=examples/features/prometheus/client/trpc_cpp_fiber.yaml
./bazel-bin/examples/features/prometheus/client/client --client_config=examples/features/prometheus/client/trpc_cpp_fiber.yaml

killall helloworld_svr
if [ $? -ne 0 ]; then
78 changes: 78 additions & 0 deletions profiling/sysvars.data
@@ -0,0 +1,78 @@
<div id="layer1">
Review comment (Contributor): Remove the files that aren't needed.

Reply: Deleted.

<p class="nonplot-variable">gcc_version: <span id="value-gcc_version">11.4.0</span></p>
<br>
<p class="nonplot-variable">kernel: <span id="value-kernel">Linux 1dab1b7b0173 5.15.0-113-generic #123~20.04.1-Ubuntu SMP Wed Jun 12 17:33:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
</span></p>
<br>
<p class="nonplot-variable">core_nums: <span id="value-core_nums">80</span></p>
<br>
<p class="nonplot-variable">user_name: <span id="value-user_name">unknown</span></p>
<br>
<p class="nonplot-variable">work_directory: <span id="value-work_directory">/home/TRPC/trpc-cpp</span></p>
<br>
<p class="nonplot-variable">command_line: <span id="value-command_line">./bazel-bin/examples/features/prometheus/proxy/forward_server
--config=examples/features/prometheus/proxy/trpc_cpp_fiber.yaml
</span></p>
<br>
<p class="nonplot-variable">running_time: <span id="value-running_time">15.100000(hours)</span></p>
<br>
<p class="variable"><font color='#0000FF'><u>proc_loadavg_1m: <span id="value-proc_loadavg_1m">18.080000</span></u></font></p>
<div class="detail"><div id="proc_loadavg_1m" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_loadavg_5m: <span id="value-proc_loadavg_5m">33.330000</span></u></font></p>
<div class="detail"><div id="proc_loadavg_5m" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_loadavg_15m: <span id="value-proc_loadavg_15m">33.260000</span></u></font></p>
<div class="detail"><div id="proc_loadavg_15m" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_real_time: <span id="value-proc_real_time">1076(secs)</span></u></font></p>
<div class="detail"><div id="proc_real_time" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_sys_time: <span id="value-proc_sys_time">700(secs)</span></u></font></p>
<div class="detail"><div id="proc_sys_time" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_user_time: <span id="value-proc_user_time">376(secs)</span></u></font></p>
<div class="detail"><div id="proc_user_time" class="flot-placeholder"></div></div>
<br>
<p class="nonplot-variable">pgrp: <span id="value-pgrp">1271218</span></p>
<br>
<p class="nonplot-variable">ppid: <span id="value-ppid">1262277</span></p>
<br>
<p class="nonplot-variable">pid: <span id="value-pid">1271218</span></p>
<br>
<p class="variable"><font color='#0000FF'><u>proc_faults_major: <span id="value-proc_faults_major">0</span></u></font></p>
<div class="detail"><div id="proc_faults_major" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_faults_minor_second: <span id="value-proc_faults_minor_second">567953</span></u></font></p>
<div class="detail"><div id="proc_faults_minor_second" class="flot-placeholder"></div></div>
<br>
<p class="nonplot-variable">fd_count: <span id="value-fd_count">21</span></p>
<br>
<p class="variable"><font color='#0000FF'><u>proc_io_read_bytes_second: <span id="value-proc_io_read_bytes_second">8192</span></u></font></p>
<div class="detail"><div id="proc_io_read_bytes_second" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_io_read_second: <span id="value-proc_io_read_second">257383</span></u></font></p>
<div class="detail"><div id="proc_io_read_second" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_io_write_bytes_second: <span id="value-proc_io_write_bytes_second">44539904</span></u></font></p>
<div class="detail"><div id="proc_io_write_bytes_second" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_io_write_second: <span id="value-proc_io_write_second">10908</span></u></font></p>
<div class="detail"><div id="proc_io_write_second" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_mem_drs: <span id="value-proc_mem_drs">171592</span></u></font></p>
<div class="detail"><div id="proc_mem_drs" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_mem_resident: <span id="value-proc_mem_resident">33043</span></u></font></p>
<div class="detail"><div id="proc_mem_resident" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_mem_share: <span id="value-proc_mem_share">3177</span></u></font></p>
<div class="detail"><div id="proc_mem_share" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_mem_trs: <span id="value-proc_mem_trs">3385</span></u></font></p>
<div class="detail"><div id="proc_mem_trs" class="flot-placeholder"></div></div>
<br>
<p class="variable"><font color='#0000FF'><u>proc_mem_size: <span id="value-proc_mem_size">486951</span></u></font></p>
<div class="detail"><div id="proc_mem_size" class="flot-placeholder"></div></div>
<br>
</div>
6 changes: 6 additions & 0 deletions trpc/admin/BUILD
@@ -354,13 +354,19 @@ cc_library(
":admin_handler",
":base_funcs",
"//trpc/util:prometheus",
"//trpc/common/config:trpc_config",
"//trpc/log:trpc_log",
"//trpc/util/http:base64",
"//trpc/util/string:string_helper",
] + select({
"//conditions:default": [],
"//trpc:trpc_include_prometheus": [
"@com_github_jupp0r_prometheus_cpp//pull",
"//trpc/metrics/prometheus:prometheus_metrics",
],
"//trpc:include_metrics_prometheus": [
"@com_github_jupp0r_prometheus_cpp//pull",
"//trpc/metrics/prometheus:prometheus_metrics",
],
}),
)
57 changes: 56 additions & 1 deletion trpc/admin/prometheus_handler.cc
@@ -18,11 +18,66 @@

namespace trpc::admin {

PrometheusHandler::PrometheusHandler() { description_ = "[GET /metrics] get prometheus metrics"; }
PrometheusHandler::PrometheusHandler() {
description_ = "[GET /metrics] get prometheus metrics";
bool ret = TrpcConfig::GetInstance()->GetPluginConfig<PrometheusConfig>(
"metrics", trpc::prometheus::kPrometheusMetricsName, prometheus_conf_);
if (!ret) {
TRPC_LOG_WARN(
"Failed to obtain Prometheus plugin configuration from the framework configuration file. Default configuration "
"will be used.");
}
auto& cfg = prometheus_conf_.auth_cfg;
if (cfg.count("username") && cfg.count("password")) {
username_ = cfg["username"];
password_ = cfg["password"];
has_cfg = true;
} else {
TRPC_LOG_INFO("Prometheus auth config not found");
has_cfg = false;
}
Review comment (Contributor): The constructor does too much. Define an Init function and move this logic into it.

Reply: Done.

}
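Following the suggestions to move this logic out of the constructor and to group the auth settings in one struct, here is a minimal sketch. An optional struct replaces the separate `has_cfg` flag, and the config lookup lives in an `Init()` step rather than the constructor. It assumes `auth_cfg` behaves like a `std::map<std::string, std::string>`, which matches the `count()` and `operator[]` calls above; `BasicAuthConfig`, `LoadBasicAuth`, and `AuthSettings` are hypothetical names.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical grouping of the Basic-auth settings read from `auth_cfg`.
struct BasicAuthConfig {
  std::string username;
  std::string password;
};

// Auth is enabled only when both keys are present, mirroring the constructor
// logic above. Uses find() so the lookup never inserts into the map.
std::optional<BasicAuthConfig> LoadBasicAuth(const std::map<std::string, std::string>& auth_cfg) {
  auto user = auth_cfg.find("username");
  auto pass = auth_cfg.find("password");
  if (user == auth_cfg.end() || pass == auth_cfg.end()) {
    return std::nullopt;
  }
  return BasicAuthConfig{user->second, pass->second};
}

// Hypothetical two-phase setup: the constructor stays trivial and Init()
// does the config work. An empty optional means "auth not configured".
class AuthSettings {
 public:
  void Init(const std::map<std::string, std::string>& auth_cfg) { auth_ = LoadBasicAuth(auth_cfg); }
  bool enabled() const { return auth_.has_value(); }
  const std::optional<BasicAuthConfig>& auth() const { return auth_; }

 private:
  std::optional<BasicAuthConfig> auth_;
};
```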

void PrometheusHandler::CommandHandle(http::HttpRequestPtr req, rapidjson::Value& result,
rapidjson::Document::AllocatorType& alloc) {
static std::unique_ptr<::prometheus::Serializer> serializer = std::make_unique<::prometheus::TextSerializer>();

if (has_cfg) {
Review comment (Contributor): has_cfg is a poor name: the flag only concerns the authentication check. Also consider grouping the authentication-related information into a single struct.

Reply: Done.

std::string token = req->GetHeader("Authorization");
Review comment (Contributor): What is this logic for? It only seems to check whether the username and password match. What happens after the check?

Reply: Metric data is returned only when both the username and the password are correct; otherwise the request is rejected.

Review comment (Contributor, weimch, Oct 15, 2024): Where can the username and password be configured in the Prometheus gateway service? Does the documentation cover this?

Review comment (Contributor): Never mind, I remember now that this is for pull mode. So where are the username and password configured on the Prometheus server?

Reply: I haven't written the documentation yet. To confirm: should the usage of Prometheus authentication be added directly to prometheus_metrics.md?

auto splited = Split(token, ' ');
Review comment (Contributor): Extract the authentication logic into a separate private member function of the class.

Reply: Done.

if (splited.size() != 2) {
result.AddMember("message", "wrong request without authorization", alloc);
TRPC_LOG_INFO("malformed Authorization header");
return;
}
if (splited[0] != "Basic") {
result.AddMember("message", "wrong request without right auth", alloc);
TRPC_LOG_INFO("unsupported Authorization scheme");
return;
}

std::string username_pwd = http::Base64Decode(std::begin(splited[1]), std::end(splited[1]));
auto sp = Split(username_pwd, ':');
if (sp.size() != 2) {
result.AddMember("message", "wrong request without authorization", alloc);
TRPC_LOG_INFO("malformed Basic credentials");
Review comment (Contributor): Use the error log macro TRPC_FMT_ERROR.

Reply: Done.
return;
weimch marked this conversation as resolved.
}

auto username = sp[0];
if (username != username_) {
result.AddMember("message", "wrong request without right username", alloc);
TRPC_LOG_INFO("wrong username: " << username);
return;
}
auto pwd = sp[1];
if (pwd != password_) {
result.AddMember("message", "wrong request without right password", alloc);
TRPC_LOG_INFO("wrong password for username: " << username);
return;
}
}


std::string prometheus_str = serializer->Serialize(trpc::prometheus::Collect());
result.AddMember(rapidjson::StringRef("trpc-html"), rapidjson::Value(prometheus_str, alloc).Move(), alloc);
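Here is a minimal sketch of the auth check pulled into its own function, as suggested in review. It is standalone: `DecodeBase64`, `AuthResult`, and `CheckBasicAuth` are hypothetical names, and it uses its own decoder rather than `http::Base64Decode`. One deliberate difference from the handler: RFC 7617 forbids ':' in the user-id but allows it in the password, so the sketch splits on the first colon instead of requiring exactly two parts. It also never logs credentials. A production check would additionally want a constant-time comparison and, per RFC 7617, a 401 response with a `WWW-Authenticate` header.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Decodes standard-alphabet Base64 (RFC 4648) with '=' padding.
// Returns std::nullopt on malformed input.
std::optional<std::string> DecodeBase64(std::string_view in) {
  auto value_of = [](char c) -> int {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
  };
  if (in.size() % 4 != 0) return std::nullopt;
  std::string out;
  std::uint32_t buffer = 0;
  int bits = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '=') {
      // Padding: at most two '=' and only at the very end.
      if (in.size() - i > 2 || in.substr(i).find_first_not_of('=') != std::string_view::npos) {
        return std::nullopt;
      }
      break;
    }
    int v = value_of(in[i]);
    if (v < 0) return std::nullopt;
    buffer = (buffer << 6) | static_cast<std::uint32_t>(v);
    bits += 6;
    if (bits >= 8) {  // a full byte is available
      bits -= 8;
      out.push_back(static_cast<char>((buffer >> bits) & 0xFFu));
    }
  }
  return out;
}

enum class AuthResult { kOk, kMalformed, kBadCredentials };

// Checks an "Authorization: Basic <base64(user:password)>" header value.
// The scheme is matched case-sensitively, as in the handler above.
AuthResult CheckBasicAuth(std::string_view header, std::string_view username,
                          std::string_view password) {
  constexpr std::string_view kScheme = "Basic ";
  if (header.substr(0, kScheme.size()) != kScheme) return AuthResult::kMalformed;
  std::optional<std::string> decoded = DecodeBase64(header.substr(kScheme.size()));
  if (!decoded) return AuthResult::kMalformed;
  // Split on the first colon only: the password may itself contain ':'.
  std::string_view creds(*decoded);
  std::size_t colon = creds.find(':');
  if (colon == std::string_view::npos) return AuthResult::kMalformed;
  bool ok = creds.substr(0, colon) == username && creds.substr(colon + 1) == password;
  return ok ? AuthResult::kOk : AuthResult::kBadCredentials;
}
```

For instance, `YWRtaW46c2VjcmV0` is the Base64 encoding of `admin:secret`, so `CheckBasicAuth("Basic YWRtaW46c2VjcmV0", "admin", "secret")` yields `AuthResult::kOk`.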
13 changes: 13 additions & 0 deletions trpc/admin/prometheus_handler.h
@@ -16,6 +16,13 @@

#include "trpc/admin/admin_handler.h"
#include "trpc/util/prometheus.h"
#include "trpc/util/http/base64.h"
Review comment (Contributor): Format this with clang-format; header includes must be sorted alphabetically.

Reply: Done.

#include "trpc/util/string/string_helper.h"
#include "trpc/util/time.h"
#include "trpc/log/trpc_log.h"
#include "trpc/metrics/prometheus/prometheus_metrics.h"
#include "trpc/common/config/trpc_config.h"
#include "trpc/metrics/prometheus/prometheus_conf_parser.h"

namespace trpc::admin {

@@ -26,6 +33,12 @@ class PrometheusHandler : public AdminHandlerBase {

void CommandHandle(http::HttpRequestPtr req, rapidjson::Value& result,
rapidjson::Document::AllocatorType& alloc) override;
private:
PrometheusConfig prometheus_conf_;
Review comment (Contributor): I still don't understand why the authentication parameters are placed in the admin service. Could you explain?

Reply: Prometheus pulls data through the admin service. As far as I can tell, this is the only place where the username and password in the HTTP headers are available, so authentication has to happen here.

Review comment (Contributor): Judging by the implementation, only username and password need to be filled in, right? There's no need to keep prometheus_conf_; just populate the username and password fields used in CommandHandle.

Reply: Done.


std::string username_;
std::string password_;
bool has_cfg;
};

} // namespace trpc::admin
2 changes: 1 addition & 1 deletion trpc/common/plugin.h
@@ -71,7 +71,7 @@ class Plugin : public RefCounted<Plugin> {

/// @brief Stop the runtime environment of the plugin
virtual void Stop() noexcept {}

Review comment (Contributor): Remember to format all code files with clang-format (using the style defined by the .clang-format file in the project root).

Review comment (Contributor): There is unnecessary whitespace here.

/// @brief destroy plugin internal resources
virtual void Destroy() noexcept {}

15 changes: 15 additions & 0 deletions trpc/metrics/prometheus/BUILD
@@ -9,6 +9,16 @@ filegroup(
]),
)

cc_library(
name = "prometheus_pusher",
srcs = ["prometheus_pusher.cc"],
hdrs = ["prometheus_pusher.h"],
deps = [
"//trpc/util/log:logging",
"@com_github_jupp0r_prometheus_cpp//push",
],
)

cc_library(
name = "prometheus_conf",
srcs = ["prometheus_conf.cc"],
@@ -73,15 +83,20 @@ cc_library(
":prometheus_conf",
":prometheus_conf_parser",
"//trpc/util:prometheus",
"@com_github_jupp0r_prometheus_cpp//core",

Review comment (Contributor): Unnecessary line break. Format the BUILD file with buildifier.

"//trpc/common/config:trpc_config",
":prometheus_pusher",
"//trpc/metrics",
] + select({
"//conditions:default": [],
"//trpc:trpc_include_prometheus": [
"@com_github_jupp0r_prometheus_cpp//pull",
"@com_github_jupp0r_prometheus_cpp//push",
],
"//trpc:include_metrics_prometheus": [
"@com_github_jupp0r_prometheus_cpp//pull",
"@com_github_jupp0r_prometheus_cpp//push",
],
}),
)
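The new `prometheus_pusher` target depends on prometheus-cpp's `push` library, whose `prometheus::Gateway` constructor takes the host, port, and job name as separate strings, while the example config supplies a single `gateway_url` such as `http://pushgateway:9091`. Here is a minimal sketch of a hypothetical helper (`SplitGatewayUrl`, `GatewayAddress`) that bridges the two. It keeps the scheme in the host string on the assumption that the gateway client joins host and port into a URL; check that against the prometheus-cpp version in use. Paths and IPv6 literals are not handled.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result of splitting `gateway_url` for a host/port/job style client.
struct GatewayAddress {
  std::string host;  // scheme kept, e.g. "http://pushgateway"
  std::string port;  // e.g. "9091"
};

// Splits "scheme://host:port" (or "host:port") into host and port.
// Returns std::nullopt when the host is empty or no numeric port is present.
std::optional<GatewayAddress> SplitGatewayUrl(std::string_view url) {
  while (!url.empty() && url.back() == '/') url.remove_suffix(1);  // tolerate a trailing '/'
  // Skip past "://" so the scheme's ':' is not taken as the port separator.
  std::size_t scheme_end = url.find("://");
  std::size_t authority = (scheme_end == std::string_view::npos) ? 0 : scheme_end + 3;
  std::size_t colon = url.rfind(':');
  if (colon == std::string_view::npos || colon <= authority) return std::nullopt;
  std::string_view port = url.substr(colon + 1);
  if (port.empty() || port.find_first_not_of("0123456789") != std::string_view::npos) {
    return std::nullopt;
  }
  return GatewayAddress{std::string(url.substr(0, colon)), std::string(port)};
}
```

With the example config, `SplitGatewayUrl("http://pushgateway:9091")` yields host `http://pushgateway` and port `9091`; a URL without an explicit port is rejected rather than guessed.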