feat: support dns parse in stream subsystem #7733

Closed
NJLOVER opened this issue Aug 18, 2022 · 4 comments · Fixed by #8500
Labels: enhancement (New feature or request)

NJLOVER commented Aug 18, 2022

Current Behavior

I added a stream_route whose upstream host is gatorcloud-pg.postgres-system.svc.cluster.local (I first tried gatorcloud-pg.postgres-system, which did not work either):
{
"server_port": 9102,
"upstream": {
"type": "roundrobin",
"nodes": {
"gatorcloud-pg.postgres-system.svc.cluster.local:5432": 1
}
}
}

The error log is:

2022/08/18 07:46:16 [error] 45#45: *13383 stream [lua] resolver.lua:47: parse_domain(): failed to parse domain: gatorcloud-pg.postgres-system.svc.cluster.local, error: failed to query the DNS server: dns client error: 101 empty record received while prereading client data, client: 10.233.102.128, server: 0.0.0.0:9103
2022/08/18 07:46:16 [error] 45#45: *13383 stream [lua] upstream.lua:79: parse_domain_for_nodes(): dns resolver domain: gatorcloud-pg.postgres-system.svc.cluster.local error: failed to query the DNS server: dns client error: 101 empty record received while prereading client data, client: 10.233.102.128, server: 0.0.0.0:9103
2022/08/18 07:46:16 [error] 45#45: *13383 stream [lua] init.lua:942: stream_preread_phase(): failed to set upstream: no valid upstream node while prereading client data, client: 10.233.102.128, server: 0.0.0.0:9103

The same domain resolves correctly when used in a regular (http) route:

{
"uri": "/test",
"methods": ["GET","POST","PUT","DELETE"],
"priority": 10000112,
"upstream": {
"type": "roundrobin",
"scheme": "http",
"nodes": {
"gatorcloud-pg.postgres-system.svc.cluster.local:5432": 1
}
}
}

2022/08/18 07:53:31 [error] 46#46: *67040 upstream prematurely closed connection while reading response header from upstream, client: 10.233.102.128, server: _, request: "GET /test HTTP/1.1", upstream: "http://10.233.19.207:5432/test", host: "172.30.3.230"

This log shows that "gatorcloud-pg.postgres-system.svc.cluster.local:5432" resolves to the layer-4 address 10.233.19.207:5432, so DNS resolution works for a regular route.

Expected Behavior

DNS resolution in a stream_route should succeed, as it does for a regular route.

Error Logs

No response

Steps to Reproduce

Create a stream_route whose upstream node is a hostname; DNS resolution then fails.

Environment

apache/apisix:2.14.1-alpine

NJLOVER commented Aug 18, 2022

This is in a Kubernetes environment; the host is a Service name.

@tzssangglass (Member)

Please post text, not images.

tzssangglass self-assigned this Aug 18, 2022

tzssangglass commented Aug 19, 2022

The conclusion first: the lua-resty-dns-client library we currently use does not support running in the stream subsystem.

https://github.com/Kong/lua-resty-dns-client/blob/master/src/resty/dns/client.lua#L829-L860

  local supported_semaphore_wait_phases = {
    rewrite = true,
    access = true,
    content = true,
    timer = true,
    ssl_cert = true,
    ssl_session_fetch = true,
  }


  local ngx_phase = get_phase()


  if not supported_semaphore_wait_phases[ngx_phase] then
    -- phase not supported by `semaphore:wait`
    -- return existing query (item)
    --
    -- this will avoid:
    -- "dns lookup pool exceeded retries" (second try and subsequent retries)
    -- "API disabled in the context of init_worker_by_lua" (first try)
    return item, nil, try_list
  end


  -- block and wait for the async query to complete
  local ok, err = item.semaphore:wait(poolMaxWait)
  if ok and item.result then
    -- we were released, and have a query result from the
    -- other thread, so all is well, return it
    --[[
    log(DEBUG, PREFIX, "Query sync result: ", key, " ", fquery(item),
           " result: ", json({ result = item.result, err = item.err}))
    --]]
    return item.result, item.err, try_list
  end

When called in the http subsystem, ngx_phase is access, which is in supported_semaphore_wait_phases; when called in the stream subsystem, ngx_phase is preread, which is not.
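The branch selection described above can be modeled in plain Lua, outside OpenResty. This is only a minimal sketch: branch_for_phase is a hypothetical helper, and the phase name is passed in as a string instead of being read from get_phase().

```lua
-- Standalone model of the phase gate quoted above (no ngx APIs needed).
local supported_semaphore_wait_phases = {
  rewrite = true,
  access = true,
  content = true,
  timer = true,
  ssl_cert = true,
  ssl_session_fetch = true,
}

-- Hypothetical helper: reports which branch the client code would take.
--   "wait"                -> block on the semaphore until the query finishes
--   "return_pending_item" -> hand back the in-flight query item as-is
local function branch_for_phase(phase)
  if not supported_semaphore_wait_phases[phase] then
    return "return_pending_item"
  end
  return "wait"
end

print(branch_for_phase("access"))   -- http subsystem    -> wait
print(branch_for_phase("preread"))  -- stream subsystem  -> return_pending_item
```

Because preread is missing from the table, the stream subsystem never waits for the query result.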

As a result, the code takes a different branch at if not supported_semaphore_wait_phases[ngx_phase] then, and the final DNS result differs.

In the stream subsystem, the lookup function returns records such as:

2022/08/19 23:17:31 [warn] 918740#918740: *163 stream [lua] client.lua:1247: resolve(): records : {
  key = "mix.sd.test.local:1",
  qname = "mix.sd.test.local",
  r_opts = {
    additional_section = true,
    qtype = 1
  },
  semaphore = {
    sem = cdata<struct ngx_stream_lua_sema_s *>: 0x7fc9373260b8,
    <metatable> = {
      __index = {
        count = <function 1>,
        new = <function 2>,
        post = <function 3>,
        version = "0.1.23",
        wait = <function 4>
      }
    }
  },
  try_list = { -- ["(short)mix.sd.test.local:(na) - cache-miss","mix.sd.test.local:1 - cache-miss/scheduled"]
     {
      msg = { -- cache-miss
         "cache-miss",
        <metatable> = <1>{
          __tostring = <function 5>
        }
      },
      qname = "(short)mix.sd.test.local"
    }, {
      msg = { -- cache-miss/scheduled
         "cache-miss", "scheduled",
        <metatable> = <table 1>
      },
      qname = "mix.sd.test.local",
      qtype = 1
    },
    ["(short)mix.sd.test.local:nil"] = 1,
    ["mix.sd.test.local:1"] = 2,
    <metatable> = {
      __tostring = <function 6>
    }
  }
} while prereading client data, client: 127.0.0.1, server: 0.0.0.0:9100

In the http subsystem, it returns records such as:

2022/08/19 23:17:08 [warn] 918434#918434: *115 [lua] client.lua:1247: resolve(): records : { {
    address = "127.0.0.1",
    class = 1,
    name = "mix.sd.test.local",
    section = 1,
    ttl = 3600,
    type = 1
  },
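The two dumps differ in shape: the http result is an array of answer records, while the stream result is the pending query item itself. A caller could tell them apart roughly as follows. This is a sketch under assumptions: looks_resolved is a hypothetical helper that exists in neither lua-resty-dns-client nor APISIX, it only considers A-style records carrying an address field, and the sample tables are abbreviated from the logs above.

```lua
-- Hypothetical helper: true only when the value looks like an array of
-- resolved records (first element is a table with an `address` field).
local function looks_resolved(records)
  if type(records) ~= "table" then
    return false
  end
  local first = records[1]
  return type(first) == "table" and first.address ~= nil
end

-- Abbreviated from the http subsystem log: an array of answer records.
local http_result = {
  { address = "127.0.0.1", class = 1, name = "mix.sd.test.local",
    section = 1, ttl = 3600, type = 1 },
}

-- Abbreviated from the stream subsystem log: a pending query item
-- (a map with key/qname/r_opts and no array part).
local stream_result = {
  key = "mix.sd.test.local:1",
  qname = "mix.sd.test.local",
  r_opts = { additional_section = true, qtype = 1 },
}

print(looks_resolved(http_result))    -- true
print(looks_resolved(stream_result))  -- false
```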

tzssangglass changed the title from "bug: use stream_route with dns error" to "feat: support dns parse in stream subsystem" on Aug 19, 2022

tzssangglass commented Aug 19, 2022

In my test, I patched supported_semaphore_wait_phases like this:

  local supported_semaphore_wait_phases = {
    rewrite = true,
    access = true,
    content = true,
    timer = true,
    ssl_cert = true,
    ssl_session_fetch = true,
    preread = true,
  }

With preread added, DNS resolution in the stream subsystem works well.

I have asked upstream about this approach: Kong/lua-resty-dns-client#144
