
[Bug] One-time initialization of the HStore Spring Actuator metrics sink causes missing metrics #2603

Open
1 task done
JackyYangPassion opened this issue Jul 29, 2024 · 1 comment · May be fixed by #2604
Labels
bug Something isn't working raft


@JackyYangPassion
Contributor

Bug Type

logic (logic design issue)

Before submitting

  • I have searched the existing Issues and FAQ and confirmed that there are no similar or duplicate problems.

Environment

  • Server Version: master
  • Backend: HStore

Expected & Actual Behavior

Expected result

The Spring Actuator endpoint returns the correct JRaft monitoring metrics:
curl http://ip:8620/actuator/prometheus

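Bug details

After an HStore node starts, Prometheus scrapes the metrics periodically, and some metrics are missing from the scrape.
The root cause is that initialization runs only once: at that moment some JRaft instrumentation points have not been registered yet, so they are never exported.
The faulty logic is in the following code: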

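import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// StoreMetrics, RocksDBMetrics, JRaftMetrics, ProcfsMetrics and GRpcExMetrics
// are HStore's own classes (their imports are omitted here).
@Configuration
public class MetricsConfig {

    // Adds the common tag hg=store to every meter.
    @Bean
    public MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        return (registry) -> registry.config().commonTags("hg", "store");
    }

    // Applied once, when the MeterRegistry is created. Any JRaft meter that
    // does not exist yet at this moment is never bound (the bug reported here).
    @Bean
    public MeterRegistryCustomizer<MeterRegistry> registerMeters() {
        return (registry) -> {
            StoreMetrics.init(registry);
            RocksDBMetrics.init(registry);
            JRaftMetrics.init(registry);
            ProcfsMetrics.init(registry);
            GRpcExMetrics.init(registry);
        };
    }

}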
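The root cause can be reproduced without Spring or JRaft. The following is a minimal sketch, assuming that JRaft creates its meters lazily; the class OneShotInitSketch, the map raftMeters and the method initOnce are hypothetical stand-ins, not HStore code. A copy taken once at startup never sees a meter created afterwards.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class OneShotInitSketch {

    // Hypothetical stand-in for a JRaft node's metric registry: entries appear
    // lazily, the first time the corresponding code path runs.
    final Map<String, Long> raftMeters = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the meters bound to the Spring MeterRegistry.
    final Set<String> exported = new LinkedHashSet<>();

    // Mirrors the MetricsConfig pattern above: copy what exists now, exactly once.
    void initOnce() {
        exported.addAll(raftMeters.keySet());
    }

    static OneShotInitSketch demo() {
        OneShotInitSketch s = new OneShotInitSketch();
        s.raftMeters.put("append-logs", 0L);       // already present at startup
        s.initOnce();                              // the customizer runs here
        s.raftMeters.put("fsm-snapshot-save", 0L); // created later (assumed: e.g. on the first snapshot)
        return s;
    }

    public static void main(String[] args) {
        System.out.println(demo().exported); // prints [append-logs]
    }
}
```

The meter created after initOnce exists in the source but is absent from the exported set, which matches the symptom described above.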

Missing metrics

append-logs
fsm-apply-tasks
fsm-commit
fsm-leader-stop
fsm-snapshot-load
fsm-snapshot-save
fsm-start-following
fsm-stop-following
handle-append-entries
handle-heartbeat-requests
pre-vote
request-vote
replicate-entries
save-raft-meta
truncate-log-prefix
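To check a running node for this gap, the output of the curl command above can be compared against the list. A minimal sketch, assuming each exported metric name contains the JRaft name with '-' replaced by '_' (hyphens are not legal in Prometheus metric names); the prefixes and suffixes HStore actually uses are not stated in this issue, and MissingMetricCheck and the sample scrape line are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingMetricCheck {

    // The JRaft meter names listed in this issue.
    static final List<String> EXPECTED = List.of(
        "append-logs", "fsm-apply-tasks", "fsm-commit", "fsm-leader-stop",
        "fsm-snapshot-load", "fsm-snapshot-save", "fsm-start-following",
        "fsm-stop-following", "handle-append-entries", "handle-heartbeat-requests",
        "pre-vote", "request-vote", "replicate-entries", "save-raft-meta",
        "truncate-log-prefix");

    // Returns the expected names that do not appear anywhere in a Prometheus
    // text-format scrape. Assumption: '-' becomes '_' in the exported name.
    static List<String> missing(String scrape) {
        List<String> result = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!scrape.contains(name.replace('-', '_'))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String scrape = "jraft_append_logs_count 12\n"; // hypothetical sample line
        System.out.println(missing(scrape).size());      // prints 14
    }
}
```

A plain substring match is deliberately loose; it only tells you whether a name shows up at all, which is enough to spot the metrics that were never registered.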

Vertex/Edge example

No response

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

@JackyYangPassion JackyYangPassion added the bug Something isn't working label Jul 29, 2024
@dosubot dosubot bot added the raft label Jul 29, 2024
@JackyYangPassion JackyYangPassion linked a pull request Jul 29, 2024 that will close this issue

dosubot bot commented Jul 29, 2024

To resolve the missing JRaft metrics in the HStore Spring Actuator metrics sink, the metrics must be registered after JRaft has created them. The provided code already registers metrics, but it does so only once, so that single call may run before some JRaft metrics exist.

Here is a step-by-step approach to ensure metrics are registered properly:

  1. Ensure init is called early: Make sure the JRaftMetrics.init(meterRegistry) method is called early in your application startup process, before any metrics are accessed.

  2. Check for null or empty metrics map: The getRaftGroupMetrics method returns an empty map if the metrics are not available. Ensure that the metrics are available before proceeding with registration.

  3. Force registration if metrics are not available initially: You can add a retry mechanism to attempt registration if the metrics are not available initially.

Here is an updated version of the init method with a retry mechanism:

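public static synchronized void init(MeterRegistry meterRegistry) {
    if (registry == null) {
        registry = meterRegistry;
        registerMeters();
        // Retry mechanism: poll registerNodeMetrics() until groupSet is no longer empty
        int retries = 5;
        while (retries-- > 0 && groupSet.isEmpty()) {
            try {
                Thread.sleep(1000); // Wait for 1 second before retrying
                registerNodeMetrics();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // Restore the interrupt flag
                break;
            }
        }
        // Caveat: this blocks the caller (the startup thread) for up to ~5 s while
        // holding the class lock, and the loop stops as soon as groupSet is non-empty,
        // so meters that JRaft creates after that point are still not registered.
    }
}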

This code calls the registerNodeMetrics method up to five more times, one second apart, giving the metrics time to become available before registration. Adjust the number of retries and the delay to match your application's startup time and metrics availability [1].
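A non-blocking alternative to the retry loop is to re-scan periodically on a background thread and register only names not seen before, so meters that JRaft creates at any later time are still picked up. This is a minimal sketch, not necessarily what #2604 does: PeriodicMeterSync, syncOnce, start, the source supplier and the register callback are hypothetical and would have to be wired to HStore's JRaftMetrics.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class PeriodicMeterSync {

    private final Supplier<Set<String>> source; // hypothetical: current JRaft meter names
    private final Consumer<String> register;    // hypothetical: binds one name to the MeterRegistry
    private final Set<String> registered = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "raft-metrics-sync");
            t.setDaemon(true); // never keeps the JVM alive on shutdown
            return t;
        });

    PeriodicMeterSync(Supplier<Set<String>> source, Consumer<String> register) {
        this.source = source;
        this.register = register;
    }

    // Registers every name not seen before; safe to call repeatedly.
    // An empty source (no raft group yet) simply results in no work.
    void syncOnce() {
        for (String name : source.get()) {
            if (registered.add(name)) {
                register.accept(name);
            }
        }
    }

    // Runs syncOnce in the background instead of sleeping on the startup thread.
    void start(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                syncOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would suppress all later runs, so swallow it.
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

In MetricsConfig, start(...) would be called once from the registerMeters customizer. Because syncOnce skips names it has already registered, a short period such as 10 seconds costs little, and the startup thread is never blocked.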

