feat: endpoint choose by health check #102

tzssangglass · 2020-12-06T14:43:33Z

fix #101
fix #55

tzssangglass · 2020-12-06T14:48:38Z

@membphis @nic-chen please take a look when you have time

moonming · 2020-12-08T00:36:49Z

Ping @nic-chen @membphis

nic-chen · 2020-12-08T04:11:07Z

@tzssangglass CI failed, and you need to resolve the conflicts first. Thanks.

tzssangglass · 2020-12-08T06:16:15Z

> @tzssangglass CI failed, and you need to resolve the conflicts first. Thanks.

This PR is a work in progress, so its title cannot pass the Semantic check.

I'd like to involve you early to check whether this implementation is the right idea.

tzssangglass · 2020-12-08T06:16:53Z

Got the conflicting files; I will resolve the conflicts.

membphis

Nice PR, the current approach is simpler than before ^_^

We need more test cases to confirm it works correctly.

tzssangglass · 2020-12-09T17:49:28Z

Note: the chosen endpoint is now saved into `self`. The changes are significant, so please pay attention.

tzssangglass · 2020-12-10T10:17:10Z

> Note: the chosen endpoint is now saved into `self`. The changes are significant, so please pay attention.

I reverted this because of conflicting implementations; please ignore it.

tzssangglass · 2020-12-13T10:53:10Z

Do we need to support the V2 protocol?

membphis · 2020-12-13T12:56:12Z

> Do we need to support the V2 protocol?

We can create a new issue for this feature.

tzssangglass · 2020-12-13T16:55:27Z

Note:

The failure count is shared within a worker and stored in a `lua_shared_dict` under a key tagged with the worker id. An endpoint is restored according to each client's own `max_fails` and `fail_timeout` (see test case no. 8).

Why store it in a `lua_shared_dict`? Because the `init` and `init_ttl` parameters of the `incr` function are well suited to counting the errors that occur within a given time window on a continuous timeline.
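The windowed counting described in this note can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `new_fake_dict` is a hypothetical in-memory stand-in that imitates only the `incr(key, value, init, init_ttl)` behaviour of a real `ngx.shared.DICT`, so the sketch runs under plain Lua, and `report_failure` plus the `>=` comparison against `max_fails` are assumptions.

```lua
-- Hypothetical stand-in for an ngx.shared.DICT. `clock.now` is a fake
-- time source (in seconds) so the expiry can be driven by hand.
local function new_fake_dict(clock)
    local store = {}
    local dict = {}

    function dict:incr(key, value, init, init_ttl)
        local item = store[key]
        if not item or (item.expires and clock.now >= item.expires) then
            if init == nil then
                return nil, "not found"
            end
            -- The TTL is applied only when the key is (re)created, so the
            -- window starts at the first failure and is not extended later.
            local expires = nil
            if init_ttl and init_ttl > 0 then
                expires = clock.now + init_ttl
            end
            item = { value = init, expires = expires }
            store[key] = item
        end
        item.value = item.value + value
        return item.value
    end

    return dict
end

-- Assumed helper: counts one failure and reports whether the endpoint
-- has reached max_fails within the current fail_timeout window.
local function report_failure(dict, key, max_fails, fail_timeout)
    local fails, err = dict:incr(key, 1, 0, fail_timeout)
    if not fails then
        return nil, err
    end
    return fails >= max_fails
end

local clock = { now = 0 }
local dict = new_fake_dict(clock)
local key = "0-127.0.0.1:2379" -- worker id .. "-" .. host, as in the PR

local first = report_failure(dict, key, 2, 10)  -- 1 failure in the window
clock.now = 5
local second = report_failure(dict, key, 2, 10) -- 2 failures in the window
clock.now = 11
local third = report_failure(dict, key, 2, 10)  -- window expired, count restarts
print(first, second, third)
```

Inside OpenResty the stand-in would be replaced by the dictionary declared through `lua_shared_dict`, whose `incr` method accepts the same `init` and `init_ttl` arguments.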

spacewander · 2020-12-14T00:53:20Z

lib/resty/etcd/v3.lua

@@ -29,8 +32,57 @@ local mt = { __index = _M }

 -- define local refresh function variable
 local refresh_jwt_token
+local fails


Why use a global variable here?

Oh, I made a mistake. I thought this was a module variable. I want to define a worker-level variable; is lua-resty-lrucache my only option?

Yes, it is a module variable. But `fails` is used like a local variable: each time `report_fault` runs, a value is assigned to it.

spacewander · 2020-12-14T01:03:00Z

lib/resty/etcd/v3.lua

+            end
+        end
+        utils.log_error("has no health etcd endpoint")
+        return nil


As we don't check whether `choose_endpoint` returns nil, this change will cause an error to be thrown. Throwing errors is a bad way to do control flow. Although APISIX captures the error, other users may not.

We should return `nil, err` here and check it outside of `choose_endpoint`.
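The suggestion above could look roughly like the sketch below. It is a simplified illustration, not the PR's implementation: the selection logic (first endpoint with `health_status == 1`), the `build_url` caller, and the endpoint table layout are assumptions made for the example.

```lua
-- Simplified, hypothetical picker: returns the first healthy endpoint,
-- or nil plus an error message when none is available.
local function choose_endpoint(endpoints)
    for _, endpoint in ipairs(endpoints) do
        if endpoint.health_status == 1 then
            return endpoint
        end
    end
    return nil, "has no healthy etcd endpoint"
end

-- Hypothetical caller: checks the result and propagates the error
-- instead of indexing a nil endpoint, which would throw.
local function build_url(endpoints, path)
    local endpoint, err = choose_endpoint(endpoints)
    if not endpoint then
        return nil, err
    end
    return endpoint.http_host .. path
end

local endpoints = {
    { http_host = "http://127.0.0.1:2379", health_status = 0 },
    { http_host = "http://127.0.0.1:2380", health_status = 0 },
}

local url, err = build_url(endpoints, "/v3/kv/range")
if not url then
    print("request skipped: " .. err)
end
```

With this shape, a caller that does not wrap requests in `pcall` still receives an ordinary `nil, err` pair when every endpoint is unhealthy.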

spacewander · 2020-12-14T01:10:23Z

cluster_health_check.md

+
+- `shm_name`: the declarative `lua_shared_dict` is used to store the health status of endpoints.
+- `fail_timeout`: sets the time during which a number of failed attempts must happen for the endpoint to be marked unavailable, and also the time for which the endpoint is marked unavailable(default is 10 seconds).
+- `max_fails`: sets the number of failed attempts that must occur during the `fail_timeout` period for the endpoint to be marked unavailable (default is 1 attempt).


It would be better to document that failures are counted per worker and per endpoint. The counter is independent between different etcd clients.

spacewander · 2020-12-14T01:20:47Z

lib/resty/etcd/v3.lua

+
+local function report_fault(self, endpoint)
+    utils.log_info("report an endpoint failure: ", endpoint.http_host)
+    local key = worker_id() .. "-" .. endpoint.http_host


It would be better to add an obvious prefix to the key.
BTW, when the code is running in the privileged agent, `worker_id()` will be nil.
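Both points can be addressed in a small key builder, sketched below under stated assumptions: the prefix `etcd-health-check-`, the `privileged` label, and the function name `fault_key` are all hypothetical and were not part of the PR.

```lua
-- Hypothetical prefix that makes these keys easy to spot in the shared dict.
local KEY_PREFIX = "etcd-health-check-"

-- worker_id may be nil (for example in the privileged agent). Concatenating
-- nil would raise an error, so it is mapped to a fixed label first.
local function fault_key(worker_id, http_host)
    local id = worker_id ~= nil and tostring(worker_id) or "privileged"
    return KEY_PREFIX .. id .. "-" .. http_host
end

print(fault_key(0, "127.0.0.1:2379"))
print(fault_key(nil, "127.0.0.1:2379"))
```

In OpenResty the first argument would come from `ngx.worker.id()`.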

membphis · 2020-12-14T03:38:40Z

> stored in a lua_shared_dict tagged with the worker id

@tzssangglass I think we can store the status without the worker id.
Then the status can be shared between the different worker processes.

spacewander · 2020-12-14T06:44:48Z

lib/resty/etcd/v3.lua

+        end
+
+        utils.log_info("restore an endpoint to health: ", endpoint.http_host)
+        endpoint.health_status = 1


Better to log at warn level when we change `health_status`. It could be helpful when we need to do accounting.

spacewander · 2020-12-14T06:46:13Z

> > stored in a lua_shared_dict tagged with the worker id
>
> @tzssangglass I think we can store the status without the worker id.
> Then the status can be shared between the different worker processes.

There is a side effect when we share the counter: the actual number of tries per worker is divided by the number of workers. As the number of workers increases, the retry chance decreases.

tokers · 2020-12-15T12:17:14Z

cluster_health_check.md

+}
+```
+
+when use `require "resty.etcd" .new` to create a connection, you can override the default configuration like


when using `require("resty.etcd").new`

tzssangglass · 2020-12-30T16:00:32Z

Continuing this work on a new branch.

[WIP] feature: endpoint choose by health check

1529774

tzssangglass added 3 commits December 6, 2020 23:45

fix error

c2cd9e3

add some test cases

1d67b9e

1. optimization code

5562167

2. add test cases

nic-chen reviewed Dec 8, 2020

lib/resty/etcd.lua

lib/resty/etcd/v3.lua (outdated)

lib/resty/etcd/v3.lua (outdated)

nic-chen requested review from tokers, membphis and spacewander December 8, 2020 04:27

tzssangglass added 4 commits December 8, 2020 22:27

Merge branch 'master' of https://github.com/api7/lua-resty-etcd into …

bfa664f

…IssueNo101 Conflicts: lib/resty/etcd/v3.lua

1. resolve conflicts first

8a63a57

2. add test cases 3. add doc 4. solve code review

solve CI error

a209590

update doc

bc141bb

membphis requested changes Dec 9, 2020

cluster_health_check.md (outdated)

cluster_health_check.md

lib/resty/etcd/v3.lua (outdated)

lib/resty/etcd/v3.lua (outdated)

tzssangglass added 3 commits December 9, 2020 11:26

solve code review

1837fab

keep etcd.lua code style

ef6ce28

optimized code style

a1ed237

membphis requested changes Dec 9, 2020

tokers reviewed Dec 9, 2020

tzssangglass added 3 commits December 9, 2020 22:35

save

8c5f8e2

solve code style

7c0fdc0

save

818057f

revert

7b2abb0

tzssangglass force-pushed the IssueNo101 branch from e67dc53 to 7b2abb0 Compare December 10, 2020 07:05

add test case

7790f96

tokers reviewed Dec 10, 2020

cluster_health_check.md (outdated)

fails count shared in worker, store in lua_shared_dict

7a65f69

unhealthy endpoint trigger by different etcd client configurations

tzssangglass changed the title ~~[WIP] feature: endpoint choose by health check~~ feature: endpoint choose by health check Dec 13, 2020

tzssangglass changed the title ~~feature: endpoint choose by health check~~ feat: endpoint choose by health check Dec 13, 2020

remove disable_duration, followed by nginx passive health checks de…

a44c264

…sign

spacewander suggested changes Dec 14, 2020

spacewander reviewed Dec 14, 2020

tokers reviewed Dec 15, 2020

tzssangglass closed this Dec 30, 2020

tzssangglass deleted the IssueNo101 branch January 30, 2021 09:33
