feat: endpoint choose by health check #102

Closed
wants to merge 18 commits into from
84 changes: 84 additions & 0 deletions cluster_health_check.md
@@ -0,0 +1,84 @@
Etcd Cluster Health Check
========

Synopsis
========

```nginx
http {
# required: declares a shared memory zone to store the endpoints' health status
lua_shared_dict healthcheck_shm 1m;

server {
location = /healthcheck {
content_by_lua_block {
local etcd, err = require "resty.etcd" .new({
protocol = "v3",
http_host = {
"http://127.0.0.1:12379",
"http://127.0.0.1:22379",
"http://127.0.0.1:32379",
},
user = 'root',
password = 'abc123',

-- the health check feature is optional and can be enabled with the following configuration
health_check = {
shm_name = 'healthcheck_shm',
fail_timeout = 1,
max_fails = 1,
}
})
}
}
}
}
```

Description
========

This feature implements a passive health check mechanism: when a connection, read, or write to an endpoint fails, the failure is recorded against that endpoint.

If `max_fails` consecutive failures occur within a `fail_timeout` window, the endpoint is marked as unhealthy and will not be chosen for connections for the next `fail_timeout` seconds.

The health check mechanism switches endpoints only when the previously chosen endpoint is marked as unhealthy.
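
The rules above can be sketched in plain Lua. This is only an illustrative sketch, not the library's implementation: `new_checker`, `report_failure`, `report_success`, `is_healthy`, and `choose` are hypothetical names, an ordinary Lua table stands in for the `lua_shared_dict`, and the clock is injected so the example runs outside OpenResty.

```lua
-- Hypothetical passive health checker; `store` stands in for the shared dict.
local function new_checker(endpoints, opts, now)
    local self = {
        endpoints = endpoints,
        fail_timeout = opts.fail_timeout or 10,
        max_fails = opts.max_fails or 1,
        now = now or os.time,
        store = {},   -- endpoint -> { fails, first_fail, unhealthy_until }
        current = 1,  -- index of the endpoint currently in use
    }

    -- Record one connect/read/write failure for an endpoint.
    function self.report_failure(endpoint)
        local t = self.now()
        local rec = self.store[endpoint]
        -- Start a new counting window if none exists or the old one expired,
        -- keeping any unhealthy mark that is still in effect.
        if not rec or t - rec.first_fail >= self.fail_timeout then
            rec = {
                fails = 0,
                first_fail = t,
                unhealthy_until = rec and rec.unhealthy_until or 0,
            }
            self.store[endpoint] = rec
        end
        rec.fails = rec.fails + 1
        if rec.fails >= self.max_fails then
            rec.unhealthy_until = t + self.fail_timeout
        end
    end

    -- A success breaks the run of consecutive failures.
    function self.report_success(endpoint)
        local rec = self.store[endpoint]
        if rec then
            rec.fails = 0
        end
    end

    function self.is_healthy(endpoint)
        local rec = self.store[endpoint]
        return not rec or self.now() >= rec.unhealthy_until
    end

    -- Keep the current endpoint while it is healthy; otherwise move on
    -- to the next healthy endpoint in order.
    function self.choose()
        local n = #self.endpoints
        for i = 0, n - 1 do
            local idx = (self.current - 1 + i) % n + 1
            local ep = self.endpoints[idx]
            if self.is_healthy(ep) then
                self.current = idx
                return ep
            end
        end
        return nil, "no healthy endpoint"
    end

    return self
end

-- Usage with a fake clock: two failures within the window mark the first
-- endpoint unhealthy, so the next choice moves to the second endpoint.
local clock = 0
local hc = new_checker({
    "http://127.0.0.1:12379",
    "http://127.0.0.1:22379",
    "http://127.0.0.1:32379",
}, { fail_timeout = 3, max_fails = 2 }, function() return clock end)

hc.report_failure("http://127.0.0.1:12379")
hc.report_failure("http://127.0.0.1:12379")
local chosen = hc.choose()
```

In this sketch the checker does not switch back once the first endpoint recovers, because the endpoint it switched to is still healthy. That mirrors the rule that a switch happens only when the endpoint in use is marked unhealthy.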

Config
========

The default configuration is as follows:

```lua
health_check = {
shm_name = "healthcheck_shm",
fail_timeout = 10,
max_fails = 1,
}
```

When you use `require "resty.etcd" .new` to create a connection, you can override the default configuration as follows:
Review comment (Contributor): "when using `require("resty.etcd").new`"


```lua
local etcd, err = require "resty.etcd" .new({
protocol = "v3",
http_host = {
"http://127.0.0.1:12379",
"http://127.0.0.1:22379",
"http://127.0.0.1:32379",
},
user = 'root',
password = 'abc123',
health_check = {
shm_name = "etcd_cluster_health_check",
fail_timeout = 3,
max_fails = 2,
},
})
```

Options that are not overridden keep their default values.

- `shm_name`: the name of the declared `lua_shared_dict` used to store the health status of endpoints.
- `fail_timeout`: sets the time window (in seconds) during which `max_fails` failed attempts must occur for the endpoint to be marked unavailable, and also how long the endpoint then stays marked unavailable (default is 10 seconds).
- `max_fails`: sets the number of failed attempts that must occur during the `fail_timeout` period for the endpoint to be marked unavailable (default is 1 attempt).
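
The fallback behaviour can be illustrated with a small, self-contained sketch. `with_defaults` is a hypothetical helper, not part of the library. The default values are taken from the `or 10` and `or 1` fallbacks in the `lib/resty/etcd.lua` change in this PR, which sets no fallback for `shm_name`.

```lua
-- Defaults mirrored from the fallbacks in lib/resty/etcd.lua (this PR).
local DEFAULTS = { fail_timeout = 10, max_fails = 1 }

-- Hypothetical helper: returns a new table in which every option the
-- caller did not set is filled in from DEFAULTS.
local function with_defaults(health_check)
    local merged = {}
    for k, v in pairs(DEFAULTS) do
        merged[k] = v
    end
    for k, v in pairs(health_check or {}) do
        merged[k] = v
    end
    return merged
end

-- Only max_fails is overridden here; fail_timeout keeps its default.
local merged = with_defaults({
    shm_name = "etcd_cluster_health_check",
    max_fails = 2,
})
```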
Review comment (Contributor): It would be better to document how failures are counted at the per-worker, per-endpoint level. The counter is independent between different etcd clients.

12 changes: 12 additions & 0 deletions lib/resty/etcd.lua
@@ -4,6 +4,7 @@ local utils = require("resty.etcd.utils")
local typeof = require("typeof")
local require = require
local pcall = pcall
local ngx_shared = ngx.shared
local prefix_v3 = {
["3.5."] = "/v3",
["3.4."] = "/v3",
@@ -46,6 +47,17 @@ function _M.new(opts)
return nil, 'opts must be table'
end

-- a non-nil health_check enables the etcd cluster health check
if opts.health_check then
local shared_dict = ngx_shared[opts.health_check.shm_name]
if not shared_dict then
return nil, "failed to get ngx.shared dict: " .. tostring(opts.health_check.shm_name)
end

opts.health_check.fail_timeout = opts.health_check.fail_timeout or 10
opts.health_check.max_fails = opts.health_check.max_fails or 1
end

opts.timeout = opts.timeout or 5 -- 5 sec
opts.http_host = opts.http_host or "http://127.0.0.1:2379"
opts.ttl = opts.ttl or -1
8 changes: 8 additions & 0 deletions lib/resty/etcd/utils.lua
@@ -84,6 +84,7 @@ end
local ngx_log = ngx.log
local ngx_ERR = ngx.ERR
local ngx_INFO = ngx.INFO
local ngx_WARN = ngx.WARN
local function log_error(...)
return ngx_log(ngx_ERR, ...)
end
@@ -95,6 +96,13 @@ local function log_info( ... )
end
_M.log_info = log_info


local function log_warn( ... )
return ngx_log(ngx_WARN, ...)
end
_M.log_warn = log_warn


local function verify_key(key)
if not key or #key == 0 then
return false, "key should not be empty"