-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: endpoint choose by health check #102
Conversation
@tzssangglass CI failed. and you need to resolve conflicts first. thanks. |
this PR is working in process, the PR's title can not pass the Semantic check. I'd like to involve you in ahead to check if this implementation is the right idea. |
got the Conflicting files, will resolve conflicts. |
…IssueNo101 Conflicts: lib/resty/etcd/v3.lua
2. add test cases 3. add doc 4. solve code review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice PR, the current way is simpler than before ^_^
need more test cases to confirm it can work fine
note: save choosed endpoint into self, the changes are significant, please pay attention |
e67dc53
to
7b2abb0
Compare
revert this, conflicting implementations, ignore. |
unhealthy endpoint trigger by different etcd client configurations
need to support the V2 protocol? |
we can create a new issue about this feature |
|
@@ -29,8 +32,57 @@ local mt = { __index = _M } | |||
|
|||
-- define local refresh function variable | |||
local refresh_jwt_token | |||
local fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use a global variable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh oh, I made a mistake, I thought this was a module variable, I want to define a worker level variable, can I only use lua-resty-lrucache
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a module variable. But the fails
is used like a local variable. Each time in report_fault
, a value is assigned to it.
end | ||
end | ||
utils.log_error("has no health etcd endpoint") | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we don't check if choose_endpoint
returns nil
, this change will cause an error to throw. It is bad to use throwing error as a control flow. Although APISIX captures the error, other users may not do this.
We should return nil, err
here and check it outside of choose_endpoint
.
|
||
- `shm_name`: the declarative `lua_shared_dict` is used to store the health status of endpoints. | ||
- `fail_timeout`: sets the time during which a number of failed attempts must happen for the endpoint to be marked unavailable, and also the time for which the endpoint is marked unavailable(default is 10 seconds). | ||
- `max_fails`: sets the number of failed attempts that must occur during the `fail_timeout` period for the endpoint to be marked unavailable (default is 1 attempt). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to document how we count the failure in per worker + per endpoint
level. The counter is independent between different etcd clients.
|
||
local function report_fault(self, endpoint) | ||
utils.log_info("report an endpoint failure: ", endpoint.http_host) | ||
local key = worker_id() .. "-" .. endpoint.http_host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to add an obvious prefix to the key.
BTW, when the code is running in privileged agent, worker_id()
will be nil.
I think we can store the status without worker id. |
end | ||
|
||
utils.log_info("restore an endpoint to health: ", endpoint.http_host) | ||
endpoint.health_status = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use warn log when we change the health_status. It could be helpful when we need to do accounting.
There is a side effect when we share the counter. The actually try time will be divided by the number of workers. If the workers' number increases, the retry change decreases. |
} | ||
``` | ||
|
||
when use `require "resty.etcd" .new` to create a connection, you can override the default configuration like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when use require("resty.eycd").new
work on a new branch |
fix #101
fix #55