The instance deregistration API of service center 2.1.0 has no effect, and the heartbeat timeout mechanism does not behave as expected #1415

Open
yhs0092 opened this issue Jul 1, 2023 · 1 comment


yhs0092 commented Jul 1, 2023

Describe the bug
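In Service Center 2.1.0 the instance deregistration API has no effect: after a microservice built with the Java-Chassis framework completes its graceful shutdown, its instance can still be queried from sc. In addition, shortening the microservice's heartbeat interval does not shorten the time sc takes to remove an instance after heartbeat failures, so the instance record still takes about 120 seconds to disappear. (Waiting 120 seconds before removing an instance whose heartbeats keep failing is the behavior under the default configuration.)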

To Reproduce

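  1. Download the linux-amd64 package from the service center 2.1.0 release note and deploy it.
  2. Develop a microservice with the Java-Chassis framework, register it to sc, then stop the microservice to trigger Java-Chassis's graceful shutdown mechanism.
  3. Call the sc API to query the instance and observe when the instance record disappears.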

Expected behavior
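Once Java-Chassis logs that the instance has been deregistered, the instance should no longer be returned by sc. In practice the instance disappears much later than the time of deregistration. (The sc-frontend page shows the same behavior.)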

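  • Symptom 1:

    The Java-Chassis log records the instance deregistration at 17:43:47, but a shell while loop that queries the instance once per second kept returning it until 17:45:44. This indicates that Java-Chassis's call to the instance deregistration API had no effect.

  • Symptom 2:

    Look at the instance record returned by the query: I configured healthCheck.interval = 5 and healthCheck.times = 3, so in theory the instance should be taken offline for consecutive heartbeat failures at most 20 seconds after heartbeats start failing. In fact sc waited 118 seconds before removing the instance, which is close to what the default heartbeat configuration (healthCheck.interval = 30, healthCheck.times = 3) would produce. So it seems the heartbeat settings configured through Java-Chassis did not take effect either?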
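The 20-second and 120-second figures can be reproduced with a small calculation. This is only a sketch: it assumes the instance is evicted once `times + 1` heartbeat intervals have passed without a heartbeat, which is an inference from the numbers quoted in this report, not a statement about how sc implements the lease. The function name `max_offline_delay` is hypothetical.

```shell
# max_offline_delay INTERVAL TIMES
# Prints the assumed worst-case delay (in seconds) before an instance whose
# heartbeats keep failing is removed: INTERVAL * (TIMES + 1).
# The formula is an assumption inferred from the figures in this report.
max_offline_delay() {
  echo $(( $1 * ($2 + 1) ))
}

max_offline_delay 5 3    # values configured in this report: prints 20
max_offline_delay 30 3   # default values: prints 120
```

Under this assumption the observed 118 seconds matches the default configuration far better than the configured one.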

Java-Chassis graceful shutdown log:
image

sc API query records (using the script while [[ true ]]; do date && curl 'http://localhost:30100/v4/default/registry/instances?appId=nuwa-sdk-benchmark&serviceName=edge&global=true&version=0.0.0.0%2B&env=development' && echo '' && sleep 1 ; done):

Sat Jul  1 17:45:43 CST 2023
{"instances":[{"instanceId":"495ecf26ec904f07b4ad67c4ec3af28a","serviceId":"67cc7611960001018d36ef95288fd803ce35a2d7","endpoints":["rest://127.0.0.1:31000"],"hostName":"wuhpnuwa000002","status":"DOWN","healthCheck":{"mode":"push","interval":5,"times":3},"timestamp":"1688204588","modTimestamp":"1688204627","version":"3.0.0.101"}]}
Sat Jul  1 17:45:44 CST 2023
{"instances":[{"instanceId":"495ecf26ec904f07b4ad67c4ec3af28a","serviceId":"67cc7611960001018d36ef95288fd803ce35a2d7","endpoints":["rest://127.0.0.1:31000"],"hostName":"wuhpnuwa000002","status":"DOWN","healthCheck":{"mode":"push","interval":5,"times":3},"timestamp":"1688204588","modTimestamp":"1688204627","version":"3.0.0.101"}]}
Sat Jul  1 17:45:45 CST 2023
{}
Sat Jul  1 17:45:47 CST 2023
{}
Sat Jul  1 17:45:48 CST 2023
{}
Sat Jul  1 17:45:49 CST 2023
{}
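The delay can be read off this output mechanically. The sketch below assumes the polling output alternates between a `date` line and a response line, as above; the helper names `first_empty_time`, `to_seconds` and `seconds_between` are hypothetical. As a cross-check, the `modTimestamp` value 1688204627 in the records corresponds to 17:43:47 CST, the same second at which Java-Chassis logged the deregistration.

```shell
# first_empty_time: reads the polling output on stdin and prints the date line
# that precedes the first response containing no "instanceId".
first_empty_time() {
  awk '
    index($0, "{") == 1 {
      if (index($0, "\"instanceId\"") == 0) { print prev; exit }
      next
    }
    { prev = $0 }
  '
}

# to_seconds HH:MM:SS: prints seconds since midnight.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# seconds_between START END: prints END - START in seconds (same day only).
seconds_between() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Deregistration logged at 17:43:47, first empty response at 17:45:45.
seconds_between 17:43:47 17:45:45   # prints 118
```

Saving the loop output to a file and piping it through `first_empty_time` gives the second argument for `seconds_between` without reading the log by eye.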

Platform And Runtime (please complete the following information):

Service Center 2.1.0, with the microservice developed on Java-Chassis 1.3.11.

SC version:

$ curl 'http://localhost:30100/v4/default/registry/version'
{"version":"2.1.0","buildTag":"20220314220818.2.1.0.9ecab25a","goVersion":"go1.15.1","os":"linux","arch":"amd64","apiVersion":"4.0.0"

Additional context

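I also tried keeping the Java-Chassis microservice running and calling the sc instance deregistration API separately. Java-Chassis then reported heartbeat failures because the instance did not exist, yet querying the instance still returned it.
The logs are as follows: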

2023-07-01 17:43:03.396 [Service Center Task [1]] WARN  - [ServiceRegistryClientImpl.java:heartbeat:652] - [] - Bad Request
2023-07-01 17:43:03.396 [Service Center Task [1]] ERROR - [MicroserviceInstanceHeartbeatTask.java:heartbeat:79] - [] - Update heartbeat to service center failed, microservice instance=67cc7611960001018d36ef95288fd803ce35a2d7/495ecf26ec904f07b4ad67c4ec3af28a does not exist
2023-07-01 17:43:03.397 [Service Center Task [1]] INFO  - [ServiceCenterTask.java:onMicroserviceInstanceHeartbeatTask:76] - [] - read MicroserviceInstanceHeartbeatTask status is READY
2023-07-01 17:43:03.397 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:onMicroserviceInstanceHeartbeatTask:61] - [] - read MicroserviceInstanceHeartbeatTask status is READY
2023-07-01 17:43:08.396 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:doRegister:78] - [] - running microservice register task.
2023-07-01 17:43:08.396 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:doRegister:86] - [] - Microservice exists in service center, no need to register. id=[67cc7611960001018d36ef95288fd803ce35a2d7] appId=[nuwa-sdk-benchmark], name=[edge], version=[3.0.0.101], env=[development]
2023-07-01 17:43:08.397 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:checkSchemaIdSet:149] - [] - SchemaIds are equals to service center. serviceId=[67cc7611960001018d36ef95288fd803ce35a2d7], appId=[nuwa-sdk-benchmark], name=[edge], version=[3.0.0.101], env=[development], schemaIds=[metricsEndpoint, healthEndpoint]
2023-07-01 17:43:08.397 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:registerSchema:194] - [] - schemaId [metricsEndpoint] exists [true], summary exists [true]
2023-07-01 17:43:08.398 [Service Center Task [1]] INFO  - [MicroserviceRegisterTask.java:registerSchema:194] - [] - schemaId [healthEndpoint] exists [true], summary exists [true]
2023-07-01 17:43:08.398 [Service Center Task [1]] INFO  - [ServiceCenterTask.java:onRegisterTask:64] - [] - read MicroserviceRegisterTask status is FINISHED
2023-07-01 17:43:08.398 [Service Center Task [1]] INFO  - [MicroserviceInstanceRegisterTask.java:doRegister:59] - [] - running microservice instance register task.
2023-07-01 17:43:08.406 [Service Center Task [1]] INFO  - [MicroserviceInstanceRegisterTask.java:doRegister:81] - [] - Register microservice instance success. microserviceId=67cc7611960001018d36ef95288fd803ce35a2d7 instanceId=495ecf26ec904f07b4ad67c4ec3af28a endpoints=[rest://127.0.0.1:31000] lease 20s
2023-07-01 17:43:08.407 [Service Center Task [1]] INFO  - [ServiceCenterTask.java:onRegisterTask:64] - [] - read MicroserviceInstanceRegisterTask status is FINISHED
2023-07-01 17:43:08.407 [Service Center Task [1]] INFO  - [MicroserviceInstanceStatusSyncTask.java:onMicroserviceRegisterTask:40] - [] - start synchronizing instance status

The heartbeat API reports a failure, yet the find API defined in v4.yaml can still return the instance:
https://github.com/apache/servicecomb-service-center/blob/master/docs/openapi/v4.yaml#L1165

It looks as if the data behind sc's APIs is internally inconsistent.
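A sketch of the manual check described above, for anyone trying to reproduce it. The instance and heartbeat paths are assumptions based on my reading of the v4 API and should be verified against v4.yaml. The variables `SC_BASE` and `SC_PROJECT` and all function names are hypothetical. The block only defines helpers and sends nothing until one of them is called.

```shell
# Assumed defaults, matching the address used elsewhere in this report.
SC_BASE="${SC_BASE:-http://localhost:30100}"
SC_PROJECT="${SC_PROJECT:-default}"

# instance_url SERVICE_ID INSTANCE_ID
# Prints the assumed v4 URL of one instance (check the path against v4.yaml).
instance_url() {
  echo "$SC_BASE/v4/$SC_PROJECT/registry/microservices/$1/instances/$2"
}

# deregister_instance SERVICE_ID INSTANCE_ID
# Sends DELETE to the instance URL and prints the HTTP status code.
deregister_instance() {
  curl -s -o /dev/null -w '%{http_code}\n' -X DELETE "$(instance_url "$1" "$2")"
}

# send_heartbeat SERVICE_ID INSTANCE_ID
# Sends PUT to the assumed heartbeat URL and prints the HTTP status code.
send_heartbeat() {
  curl -s -o /dev/null -w '%{http_code}\n' -X PUT "$(instance_url "$1" "$2")/heartbeat"
}
```

Calling `deregister_instance`, then `send_heartbeat`, then the find query from the polling loop with the same IDs would show whether the heartbeat and find paths disagree about the instance.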


colin-si commented Jan 2, 2024

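@yhs0092 Does this problem still exist? Do you have the ServiceCenter logs? We need them to confirm the state of SC, in particular whether the defer instance mechanism was triggered: when a large number of instances go offline within 2 seconds, SC delays taking them offline.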
