
feat: support healthcheck when connect to etcd cluster #96

Closed
wants to merge 31 commits

Conversation

tzssangglass
Contributor

@tzssangglass tzssangglass commented Nov 13, 2020

fix: #55

@tzssangglass tzssangglass changed the title [WIP] support healthcheck when connect to etcd cluster [WIP] feat: support healthcheck when connect to etcd cluster Nov 13, 2020
@tzssangglass tzssangglass changed the title [WIP] feat: support healthcheck when connect to etcd cluster [WIP] feat:support healthcheck when connect to etcd cluster Nov 13, 2020
@tzssangglass tzssangglass changed the title [WIP] feat:support healthcheck when connect to etcd cluster [WIP] feat: support healthcheck when connect to etcd cluster Nov 13, 2020
@tzssangglass
Contributor Author

tzssangglass commented Nov 15, 2020

ping @membphis @spacewander
How should we write test cases for etcd cluster runtime errors, such as network partitioning or an etcd node becoming unavailable?
I don't have a clear idea.
I tried the following:

=== TEST 2: connect timeout
--- http_config eval: $::HttpConfig
--- config

    resolver 8.8.8.8;
    resolver_timeout 1s;

    location /t {
        content_by_lua_block {
            local etcd, err = require "resty.etcd" .new({
                protocol = "v3",
                http_host = {
                    "http://127.0.0.1:12379",
                    "http://127.0.0.1:22379",
                    "http://127.0.0.1:32379",
                },
                user = 'root',
                password = 'abc123',
                cluster_healthcheck = {
                    shm_name = 'test_shm',
                }
            })

            check_res(etcd, err)
            local network_isolation_cmd = "iptables -A INPUT -i lo -s 127.0.0.1 -d 127.0.0.1 -p tcp --dport 12379 -j DROP"
            os.execute(network_isolation_cmd)

            local res, err = etcd:set("/test", { a='abc'})
            check_res(res, err)

            local network_recovery_cmd = "iptables -D INPUT -i lo -s 127.0.0.1 -d 127.0.0.1 -p tcp --dport 12379 -j DROP"
            os.execute(network_recovery_cmd)

            ngx.say(err)
        }
    }
--- request
GET /t
--- no_error_log
[error]
--- response_body
timeout

I'm not sure it's appropriate to write test cases this way.
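
For reference, the cluster_healthcheck.shm_name option above points at an nginx shared memory zone, so the $::HttpConfig used by the test also has to declare a matching lua_shared_dict (lua-resty-healthcheck keeps the per-node health status there). A minimal sketch of that declaration; the 1m size is only illustrative:

    # shared dict backing cluster_healthcheck.shm_name = 'test_shm'
    lua_shared_dict test_shm 1m;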

@spacewander
Contributor

I don't have any idea better than yours.

@tzssangglass tzssangglass changed the title [WIP] feat: support healthcheck when connect to etcd cluster feat: support healthcheck when connect to etcd cluster Nov 24, 2020
@tzssangglass
Contributor Author

ping @membphis, it's ok now.

@tzssangglass
Contributor Author

I tried the following two test cases, which worked in my local environment, but always failed in CI.

=== TEST 6: mock tcp connect timeout and recovery, report the node as unhealthy and then healthy
--- http_config eval: $::HttpConfig
--- config
    location /t {
        content_by_lua_block {
            -- drop packets to the first etcd node to simulate a connect timeout
            local network_isolation_cmd = "export PATH=$PATH:/sbin && iptables -A INPUT -p tcp --dport 12379 -j DROP"
            io.popen(network_isolation_cmd)
            ngx.sleep(1)

            local etcd, err = require "resty.etcd" .new({
                protocol = "v3",
                api_prefix = "/v3",
                http_host = {
                    "http://127.0.0.1:12379",
                    "http://127.0.0.1:22379",
                    "http://127.0.0.1:32379",
                },
                user = 'root',
                password = 'abc123',
                cluster_healthcheck = {
                    shm_name = 'test_shm',
                },
            })

            local res, err = etcd:set("/healthcheck", "yes")

            -- remove the DROP rule to restore connectivity to the node
            local network_recovery_cmd = "export PATH=$PATH:/sbin && iptables -D INPUT -p tcp --dport 12379 -j DROP"
            io.popen(network_recovery_cmd)
            ngx.sleep(1)
        }
    }
--- request
GET /t
--- ignore_response
--- error_log eval
[qr/unhealthy TCP increment.*127.0.0.1:12379/,
qr/healthy SUCCESS increment.*127.0.0.1:12379/]
--- timeout: 10



=== TEST 7: mock network partition and recovery, report the node as unhealthy and then healthy
--- http_config eval: $::HttpConfig
--- config
    location /t {
        content_by_lua_block {
            -- drop the peer ports of the other two members to simulate a network partition
            io.popen("export PATH=$PATH:/sbin && iptables -A INPUT -p tcp --dport 22380 -j DROP")
            io.popen("export PATH=$PATH:/sbin && iptables -A INPUT -p tcp --dport 32380 -j DROP")
            ngx.sleep(3)

            local etcd, err = require "resty.etcd" .new({
                protocol = "v3",
                api_prefix = "/v3",
                http_host = {
                    "http://127.0.0.1:12379",
                    "http://127.0.0.1:22379",
                    "http://127.0.0.1:32379",
                },
                user = 'root',
                password = 'abc123',
                cluster_healthcheck = {
                    shm_name = 'test_shm',
                },
            })

            local res, err = etcd:set("/network/partition", "test")

            -- remove the DROP rules to heal the partition
            io.popen("export PATH=$PATH:/sbin && iptables -D INPUT -p tcp --dport 22380 -j DROP")
            io.popen("export PATH=$PATH:/sbin && iptables -D INPUT -p tcp --dport 32380 -j DROP")
            ngx.sleep(5)
        }
    }
--- request
GET /t
--- timeout: 20
--- ignore_response
--- error_log eval
[qr/unhealthy TCP increment.*127.0.0.1:12379/,
qr/healthy SUCCESS increment.*127.0.0.1:12379/]

@tzssangglass
Contributor Author

@membphis @spacewander @nic-chen pls review

@spacewander
Contributor

We are busy making a new release. We'll take care of this in a few days.

@tzssangglass
Contributor Author

We are busy making a new release. We'll take care of this in a few days.

Got it.

lib/resty/etcd/cluster/healthcheck.lua (outdated review thread, resolved)
lib/resty/etcd/cluster/healthcheck.lua (outdated review thread, resolved)
lib/resty/etcd/cluster/healthcheck.lua (outdated review thread, resolved)
lib/resty/etcd/cluster/healthcheck.lua (outdated review thread, resolved)
lib/resty/etcd.lua (outdated review thread, resolved)
@tzssangglass
Contributor Author

Resolved the above comments.

lib/resty/etcd.lua (outdated review thread, resolved)
lib/resty/etcd/cluster/healthcheck.lua (outdated review thread, resolved)
@tzssangglass
Contributor Author

Resolved. I've changed the code so that errors are returned as err instead of being printed to the error log, please take note.
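
A minimal sketch of that pattern, with the error returned to the caller instead of only being logged; report_failure and the self.checker field are illustrative names rather than the PR's actual code:

    local function report_failure(self, host, port)
        local ok, err = self.checker:report_tcp_failure(host, port)
        if not ok then
            -- previously this was logged with ngx.log(ngx.ERR, err);
            -- now the caller decides how to handle the error
            return nil, err
        end
        return true
    end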

lib/resty/etcd/cluster/healthcheck.lua (outdated review thread, resolved)
cluster_healthcheck.md (outdated review thread, resolved)
lib/resty/etcd/cluster/healthcheck.lua (outdated review thread, resolved)
@tzssangglass
Contributor Author

@membphis @spacewander pls review again

end

local err
checker, err = healthcheck.new({
Contributor


wow, I think that is the wrong way.

Different etcd instances may create different checker objects; the checker object should belong to its etcd instance.

So we cannot use a shared checker across different etcd instances.
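
A minimal sketch of the per-instance approach suggested here, with the checker stored on the etcd object itself rather than in a module-level local shared by every instance; the function and field names are illustrative, not the PR's final code:

    local healthcheck = require("resty.healthcheck")

    local function create_checker(self, conf)
        local checker, err = healthcheck.new({
            name = "etcd-cluster-healthcheck",  -- illustrative checker name
            shm_name = conf.shm_name,
        })
        if not checker then
            return nil, err
        end
        -- each resty.etcd instance owns its own checker
        self.checker = checker
        return true
    end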

@tzssangglass tzssangglass deleted the IssueNo55 branch January 30, 2021 09:34