fix: mostly_healthy is healthy #10639

Sn0rt · 2023-12-13T04:20:56Z

Description

Building a mostly_healthy test case with test-nginx is too difficult for me, so this PR does not include a test case.

I will add a test case for this metric in the next PR.

Checklist

I have explained the need for this PR and the problem it solves
I have explained the changes or the new features added to this PR
I have added tests corresponding to this change
I have updated the documentation to reflect this change
I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

luoluoyuyu · 2023-12-13T07:20:24Z

Hi @Sn0rt
The APISIX documentation describes the state transitions for health checks, which have four states in total, but the prometheus plugin does not take the mostly_healthy and mostly_unhealthy states into account. I think this could be solved by adding mostly_healthy and mostly_unhealthy states to the Prometheus metric, rather than treating mostly_healthy as a healthy state.
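One way to picture this suggestion is a "one sample per state" encoding, where each of the four states gets its own label and only the current state reports 1. The sketch below is hypothetical: the `state` label and the `state_samples` helper are assumptions for illustration, not APISIX's actual metric or code.

```lua
-- Hypothetical sketch of the reviewer's proposal: expose every health-check
-- state as its own labelled sample instead of collapsing them into 0/1.
-- The `state` label and this helper are assumptions, not APISIX code.
local STATES = { "healthy", "mostly_healthy", "mostly_unhealthy", "unhealthy" }

-- Returns one sample per state: 1 for the node's current state, 0 otherwise.
local function state_samples(current)
    local samples = {}
    for _, state in ipairs(STATES) do
        samples[#samples + 1] = {
            state = state,
            value = (state == current) and 1 or 0,
        }
    end
    return samples
end

-- Print what such samples might look like for a mostly_healthy node.
for _, s in ipairs(state_samples("mostly_healthy")) do
    print(string.format('apisix_upstream_status{state="%s"} %d', s.state, s.value))
end
```

This keeps the mostly_healthy information visible to an admin, at the cost of four series per node instead of one.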

Sn0rt · 2023-12-13T07:34:33Z


mostly_healthy is treated as healthy because, when the upstream selects a node, both mostly_healthy and healthy count as healthy. The screenshot of the core logic referenced here is from lua-resty-healthcheck, which is why the PR is written this way.

local function fetch_health_nodes(upstream, checker)
    local nodes = upstream.nodes
    if not checker then
        local new_nodes = core.table.new(0, #nodes)
        for _, node in ipairs(nodes) do
            new_nodes = transform_node(new_nodes, node)
        end
        return new_nodes
    end

    local host = upstream.checks and upstream.checks.active and upstream.checks.active.host
    local port = upstream.checks and upstream.checks.active and upstream.checks.active.port
    local up_nodes = core.table.new(0, #nodes)
    for _, node in ipairs(nodes) do
        -- per this thread, ok is true for both healthy and mostly_healthy targets
        local ok, err = checker:get_target_status(node.host, port or node.port, host)
        if ok then
            up_nodes = transform_node(up_nodes, node)
        elseif err then
            core.log.warn("failed to get health check target status, addr: ",
                node.host, ":", port or node.port, ", host: ", host, ", err: ", err)
        end
    end

    -- if no node passed the check, fall back to using every node
    if core.table.nkeys(up_nodes) == 0 then
        core.log.warn("all upstream nodes is unhealthy, use default")
        for _, node in ipairs(nodes) do
            up_nodes = transform_node(up_nodes, node)
        end
    end

    return up_nodes
end
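To see the selection and fallback behaviour of `fetch_health_nodes` in isolation, here is a simplified, stand-alone sketch. It is not APISIX code. Assumptions: nodes are flattened into a `"host:port" -> weight` map (a stand-in for `transform_node`), and a stub checker whose `get_target_status` returns true for healthy and mostly_healthy targets, as described in this thread. `stub_checker` and `pick_up_nodes` are hypothetical names.

```lua
-- Hypothetical stub: get_target_status returns true for "healthy" and
-- "mostly_healthy" targets, mirroring the rule discussed in this thread.
local function stub_checker(states)
    return {
        get_target_status = function(_, host, port)
            local st = states[host .. ":" .. port]
            if st == nil then
                return nil, "target not found"
            end
            return st == "healthy" or st == "mostly_healthy"
        end,
    }
end

-- Simplified version of the loop above: keep nodes the checker reports as up,
-- stored as "host:port" -> weight (an assumption in place of transform_node).
local function pick_up_nodes(nodes, checker)
    local up_nodes, count = {}, 0
    for _, node in ipairs(nodes) do
        local ok = checker:get_target_status(node.host, node.port)
        if ok then
            up_nodes[node.host .. ":" .. node.port] = node.weight
            count = count + 1
        end
    end
    -- Same fallback as fetch_health_nodes: if nothing is up, use every node.
    if count == 0 then
        for _, node in ipairs(nodes) do
            up_nodes[node.host .. ":" .. node.port] = node.weight
        end
    end
    return up_nodes
end

local nodes = {
    { host = "127.0.0.1", port = 8765, weight = 1 },
    { host = "127.0.0.1", port = 8766, weight = 1 },
}
local checker = stub_checker({
    ["127.0.0.1:8765"] = "mostly_healthy",
    ["127.0.0.1:8766"] = "unhealthy",
})
-- Only 127.0.0.1:8765 is kept: mostly_healthy counts as up.
local up = pick_up_nodes(nodes, checker)
```

Because traffic still flows to a mostly_healthy node, reporting it as 0 in a binary status metric would disagree with what the balancer actually does.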

luoluoyuyu · 2023-12-13T07:48:42Z

@Sn0rt Thanks for the answer. The corresponding code is here: https://github.com/api7/lua-resty-healthcheck/blob/master/lib/resty/healthcheck.lua#L684
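As described in this thread, the linked check counts both healthy and mostly_healthy targets as up. A minimal sketch of the 0/1 status value that follows from that rule is below; `is_up` and `status_metric_value` are hypothetical names, not the exporter's real functions.

```lua
-- Hypothetical helpers illustrating the rule behind this PR:
-- a node is "up" if its health-check state is healthy or mostly_healthy.
local function is_up(status)
    return status == "healthy" or status == "mostly_healthy"
end

-- Value a binary status gauge would report for a node in the given state.
local function status_metric_value(status)
    return is_up(status) and 1 or 0
end

for _, st in ipairs({ "healthy", "mostly_healthy", "mostly_unhealthy", "unhealthy" }) do
    print(st, status_metric_value(st))
end
```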

apisix/upstream.lua

Signed-off-by: Sn0rt <wangguohao.2009@gmail.com>

shreemaan-abhishek · 2023-12-18T14:04:12Z

I agree with @luoluoyuyu's comment on #10639. Even if mostly_healthy logically means healthy, the name serves a purpose for the admin: it tells them that some nodes are unhealthy.

Sn0rt · 2023-12-18T14:15:28Z


This is a bug fix, and I want to avoid adding things that have no clear requirement behind them. Users can already tell that a node is down from metrics like the following:

apisix_upstream_status{name="/apisix/routes/1",ip="127.0.0.1",port="8765"} 1
apisix_upstream_status{name="/apisix/routes/1",ip="127.0.0.1",port="8766"} 0

Sn0rt marked this pull request as draft December 13, 2023 05:37

Sn0rt marked this pull request as ready for review December 13, 2023 07:50

luoluoyuyu approved these changes Dec 13, 2023


Sn0rt marked this pull request as draft December 14, 2023 03:26

Sn0rt marked this pull request as ready for review December 14, 2023 04:14

soulbird reviewed Dec 15, 2023


apisix/upstream.lua (outdated, resolved)

fix: mostly_healthy is healthy

cb880bb

Signed-off-by: Sn0rt <wangguohao.2009@gmail.com>

Sn0rt requested review from soulbird and luoluoyuyu December 18, 2023 12:37

AlinsRan approved these changes Dec 19, 2023


monkeyDluffy6017 approved these changes Dec 21, 2023


monkeyDluffy6017 merged commit b9d2dbf into apache:master Dec 21, 2023
44 checks passed
