[Bug] [Datasophon-service] When the alarm is restored in AlertActor, the state modification logic is abnormal #402

thomasg19930417 · 2023-09-01T02:12:57Z

Search before asking

I had searched in the issues and found no similar issues.

What happened

Shouldn't the state here be updated only when the role is not running, i.e. if (roleInstance.getServiceRoleState() != ServiceRoleState.RUNNING)? The current code checks for the opposite:

    ClusterServiceRoleInstanceEntity roleInstance = roleInstanceService.getOneServiceRole(labels.getServiceRoleName(), hostname, clusterId);
    if (roleInstance.getServiceRoleState() == ServiceRoleState.RUNNING) {
        roleInstance.setServiceRoleState(ServiceRoleState.RUNNING);
        if (nodeHasWarnAlertList) {
            roleInstance.setServiceRoleState(ServiceRoleState.EXISTS_ALARM);
        }
        roleInstanceService.updateById(roleInstance);
    }

What you expected to happen

The state should be updated only when the role instance is not currently RUNNING, so that a restored alarm can move it back to RUNNING (or to EXISTS_ALARM if the node still has warning alerts). That is, the condition should be roleInstance.getServiceRoleState() != ServiceRoleState.RUNNING.

Version

dev

Are you willing to submit PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

datasophon · 2023-09-01T02:51:10Z

I'm not sure what issue you're trying to clarify. Can you elaborate on it?

thomasg19930417 · 2023-09-01T02:56:51Z

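On the dev branch, when a service-down alarm is restored, the state-modification logic appears to be wrong. In the code below, the state should be changed to RUNNING only when the current state is not RUNNING:

    if (roleInstance.getServiceRoleState() == ServiceRoleState.RUNNING) {
        roleInstance.setServiceRoleState(ServiceRoleState.RUNNING);
    }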

thomasg19930417 · 2023-09-01T02:58:39Z

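On the dev branch, when a service-down alarm is restored, the state-modification logic appears to be wrong. In the code below, the state should be changed to RUNNING only when the current state is not RUNNING: if (roleInstance.getServiceRoleState() == ServiceRoleState.RUNNING) { roleInstance.setServiceRoleState(ServiceRoleState.RUNNING); }

As a result, when the service sends a resolved alert, the abnormal state cannot be restored to the normal state. I also compared this with the code from the previous version, and it looks like this part was written incorrectly during the refactoring.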
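To make the difference between the two conditions concrete, here is a minimal, self-contained sketch. It is not the project's code: the enum is a simplified stand-in for ServiceRoleState (the STOP value is an assumption), and the class and method names are hypothetical. It only contrasts the condition quoted above with the one suggested in this issue.

```java
public class AlertRecoverySketch {

    // Simplified stand-in for the project's ServiceRoleState enum; STOP is assumed here.
    enum ServiceRoleState { RUNNING, STOP, EXISTS_ALARM }

    // Condition as quoted in this issue: only a role that is already RUNNING is updated.
    static ServiceRoleState onResolvedAsReported(ServiceRoleState current, boolean nodeHasWarnAlertList) {
        if (current == ServiceRoleState.RUNNING) {
            return nodeHasWarnAlertList ? ServiceRoleState.EXISTS_ALARM : ServiceRoleState.RUNNING;
        }
        return current; // a role in any other state is left untouched
    }

    // Condition suggested in this issue: update only when the role is not RUNNING.
    static ServiceRoleState onResolvedAsSuggested(ServiceRoleState current, boolean nodeHasWarnAlertList) {
        if (current != ServiceRoleState.RUNNING) {
            return nodeHasWarnAlertList ? ServiceRoleState.EXISTS_ALARM : ServiceRoleState.RUNNING;
        }
        return current; // already RUNNING, nothing to restore
    }

    public static void main(String[] args) {
        // A stopped role receives a resolved alert and no warning alerts remain on the node.
        System.out.println(onResolvedAsReported(ServiceRoleState.STOP, false));
        System.out.println(onResolvedAsSuggested(ServiceRoleState.STOP, false));
    }
}
```

With the quoted condition the stopped role stays in STOP after the resolved alert, while with the suggested condition it returns to RUNNING. Whether the eventual fix takes exactly this form is not stated in this thread.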

datasophon · 2023-09-01T03:00:49Z

We tested this and were able to recover from an abnormal state to a normal state. How did the situation you mentioned occur?

thomasg19930417 · 2023-09-01T03:03:44Z

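Starting and stopping the service directly from the web page probably will not reproduce this problem. Stopping the service and then starting it from the backend should reproduce it. (The actual situation was probably that high machine load caused Prometheus scraping to fail; when scraping later returned to normal and the alert-resolved message was sent back, the state could not be reset to normal.)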

datasophon · 2023-09-01T03:21:02Z

According to your description, we have reproduced this problem. Can you help us solve it?

thomasg19930417 · 2023-09-01T03:24:52Z

According to your description, we have reproduced this problem. Can you help us solve it?

I will submit a PR later

thomasg19930417 · 2023-09-01T06:27:42Z

add pr link: #404

thomasg19930417 added the bug Something isn't working label Sep 1, 2023

datasophon assigned thomasg19930417 Sep 1, 2023

datasophon pushed a commit that referenced this issue Sep 1, 2023

Fix issue #402 (#404)

f82d346

* remove redundant initializer
* Fix issues #402

thomasg19930417 closed this as completed Sep 1, 2023
