3.7.1 upgrade to 3.9.0 migrations dns resolution issues (Postgresql 14) #14233
Comments
@dgresh1 just checking, is there any difference if you run it with:
@bungle would I put this in our primary Kong deployment file? We also have a k8s_job spec with containers for kong-bootstrap, kong-migrations-up, and kong-migrations-finish.
Also, am I specifying on, or is there another value that needs to be set for kong migrations?
I tried the new DNS client and am still getting the same error message as before.
@bungle we are also seeing this issue with 3.8.0.
The root cause of these issues is probably the same as the one I detail in my git issue. Something funky is going on with how Kong does DNS lookups.
Hi, we are having (maybe) the exact same issue. At some point, Kong just stops resolving the Postgres host. What is funny is that even when Kong goes into CrashLoop and starts again, it still doesn't work. The migrations still reply with:
failed to get create response: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: read unix @->/var/run/tw.runc.sock: use of closed network connection"
Error: [PostgreSQL error] failed to retrieve PostgreSQL server_version_num: timeout
It only starts to work again when we completely delete the pod and the deployment recreates it for us.
@lordgreg I have not noticed that problem specifically yet, but it does seem similar or related to the issues I have found. It feels like Kong isn't respecting DNS timeout settings. Also, DNS sometimes fails in 0-1 ms in the logs, which makes me think that, as a client, it is trying to reuse stale sockets or something else it shouldn't. Very strange behavior. I would have thought Kong's functional test suites would have caught something like this, but it may go deeper than that. Have you tried adding my DNS tuning and setting the attempts to 3? It seems to help somewhat with our setup right now.
Is there an existing issue for this?
Kong version ($ kong version)
kong 3.9.0
Current Behavior
When I use GitHub Actions to upgrade Kong, there are error messages related to:
kong migrations list
kong migrations up
kong migrations finish
Kong is running in hybrid mode and I am trying to upgrade our control plane.
This is not a bootstrap, as we are upgrading to 3.9.0.
If I exec into the pod running Kong and run the migrations commands, I get:
Error: [PostgreSQL error] failed to retrieve PostgreSQL server_version_num: [cosocket] DNS resolution failed: DNS server error: failed to receive reply from UDP server 10.0.0.10:53: timeout, took 276 ms. Tried: [["psql-hcp-apim-dmz-cp-dev-centralus.postgres.database.azure.com:A","DNS server error: failed to receive reply from UDP server 10.0.0.10:53: timeout, took 276 ms"]]
If I do an nslookup from the pod, the name does resolve:
nslookup psql-hcp-apim-dmz-cp-dev-centralus.postgres.database.azure.com
;; Got recursion not available from 10.0.0.10
;; Got recursion not available from 10.0.0.10
;; Got recursion not available from 10.0.0.10
;; Got recursion not available from 10.0.0.10
Server: 10.0.0.10
Address: 10.0.0.10#53
Non-authoritative answer:
psql-hcp-apim-dmz-cp-dev-centralus.postgres.database.azure.com canonical name = psql-hcp-apim-dmz-cp-dev-centralus.privatelink.postgres.database.azure.com.
Name: psql-hcp-apim-dmz-cp-dev-centralus.privatelink.postgres.database.azure.com
Address: 10.15.34.69
So at the pod level the name does resolve, but when running migrations it doesn't.
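Since the failures are intermittent, a single successful nslookup does not rule out dropped UDP replies. The following is a minimal diagnostic sketch, not part of Kong: it sends repeated A-record queries over UDP straight to the nameserver and records which ones time out. The function names `build_query` and `probe`, the attempt count, and the 2-second timeout are all assumptions chosen for illustration.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # QNAME ends with a zero byte; QTYPE=1 (A), QCLASS=1 (IN).
    return header + labels + b"\x00" + struct.pack(">HH", 1, 1)


def probe(server, name, attempts=20, timeout=2.0):
    """Send repeated UDP queries; return a list of (got_reply, elapsed_ms).

    A fresh socket is used for every attempt. got_reply only means that
    some datagram came back; the answer itself is not validated.
    """
    results = []
    for i in range(attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            start = time.monotonic()
            try:
                sock.sendto(build_query(name, query_id=i + 1), (server, 53))
                sock.recvfrom(512)
                got_reply = True
            except OSError:  # includes socket timeouts
                got_reply = False
            results.append((got_reply, (time.monotonic() - start) * 1000))
    return results
```

Run from inside the pod, `probe("10.0.0.10", "psql-hcp-apim-dmz-cp-dev-centralus.postgres.database.azure.com")` would show whether a share of plain UDP queries to the cluster DNS go unanswered. If they do, the problem likely sits below Kong; if none do, Kong's own resolver behavior is the more likely suspect.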
My /etc/resolv.conf file:
$ cat /etc/resolv.conf
search dmz-kong.svc.cluster.local svc.cluster.local cluster.local 13jqinnqegaetjxzt0guttm2sb.gx.internal.cloudapp.net
nameserver 10.0.0.10
options ndots:5
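The `ndots:5` option matters here because the Postgres hostname contains only four dots. As a rough sketch of the rules described in resolv.conf(5), the snippet below lists the names a glibc-style stub resolver would try and in what order. The helper names `parse_resolv_conf` and `candidate_names` are made up for illustration, and Kong's own DNS client may not follow exactly the same order (the error above only lists the bare name as tried).

```python
RESOLV_CONF = """\
search dmz-kong.svc.cluster.local svc.cluster.local cluster.local 13jqinnqegaetjxzt0guttm2sb.gx.internal.cloudapp.net
nameserver 10.0.0.10
options ndots:5
"""

HOST = "psql-hcp-apim-dmz-cp-dev-centralus.postgres.database.azure.com"


def parse_resolv_conf(text):
    """Parse the nameserver, search and ndots settings of a resolv.conf."""
    conf = {"nameservers": [], "search": [], "ndots": 1}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith(("#", ";")):
            continue
        if parts[0] == "nameserver" and len(parts) > 1:
            conf["nameservers"].append(parts[1])
        elif parts[0] == "search":
            conf["search"] = parts[1:]
        elif parts[0] == "options":
            for opt in parts[1:]:
                if opt.startswith("ndots:"):
                    conf["ndots"] = int(opt.split(":", 1)[1])
    return conf


def candidate_names(name, conf):
    """Return the names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        return [name]  # a trailing dot marks the name as fully qualified
    searched = [f"{name}.{domain}" for domain in conf["search"]]
    if name.count(".") >= conf["ndots"]:
        return [name] + searched  # enough dots: try the name as-is first
    return searched + [name]      # too few dots: walk the search list first


for candidate in candidate_names(HOST, parse_resolv_conf(RESOLV_CONF)):
    print(candidate)
```

Under these rules the real hostname is only tried after all four search-domain variants, so every lookup can cost several extra queries against 10.0.0.10. Appending a trailing dot to the hostname, where the client accepts it, skips the search list entirely.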
One time I did get the following output from kong migrations list:
$ kong migrations list
Executed migrations:
core: 000_base, 003_100_to_110, 004_110_to_120, 005_120_to_130, 006_130_to_140, 007_140_to_150, 008_150_to_200, 009_200_to_210, 010_210_to_211, 011_212_to_213, 012_213_to_220, 013_220_to_230, 014_230_to_270, 015_270_to_280, 016_280_to_300, 017_300_to_310, 018_310_to_320, 019_320_to_330, 020_330_to_340, 021_340_to_350, 022_350_to_360, 023_360_to_370, 024_380_to_390
acl: 000_base_acl, 002_130_to_140, 003_200_to_210, 004_212_to_213
acme: 000_base_acme, 001_280_to_300, 002_320_to_330, 003_350_to_360
ai-proxy: 001_360_to_370
basic-auth: 000_base_basic_auth, 002_130_to_140, 003_200_to_210
bot-detection: 001_200_to_210
hmac-auth: 000_base_hmac_auth, 002_130_to_140, 003_200_to_210
http-log: 001_280_to_300
ip-restriction: 001_200_to_210
jwt: 000_base_jwt, 002_130_to_140, 003_200_to_210
key-auth: 000_base_key_auth, 002_130_to_140, 003_200_to_210, 004_320_to_330
oauth2: 000_base_oauth2, 003_130_to_140, 004_200_to_210, 005_210_to_211, 006_320_to_330, 007_320_to_330
opentelemetry: 001_331_to_332
post-function: 001_280_to_300
pre-function: 001_280_to_300
rate-limiting: 000_base_rate_limiting, 003_10_to_112, 004_200_to_210, 005_320_to_330, 006_350_to_360
response-ratelimiting: 000_base_response_rate_limiting, 001_350_to_360
session: 000_base_session, 001_add_ttl_index, 002_320_to_330
I then ran it again and got the DNS resolution error again.
When I ran kong migrations --v list, I received additional info:
/usr/local/share/lua/5.1/kong/cmd/migrations.lua:101: [PostgreSQL error] failed to retrieve PostgreSQL server_version_num: [cosocket] DNS resolution failed: DNS server error: failed to receive reply from UDP server 10.0.0.10:53: timeout, took 409 ms. Tried: [["psql-hcp-apim-dmz-cp-dev-centralus.postgres.database.azure.com:A","DNS server error: failed to receive reply from UDP server 10.0.0.10:53: timeout, took 409 ms"]]
Expected Behavior
I expected the migrations to work.
Steps To Reproduce
I exec into the Kong pod, run kong migrations list, and get the errors above.
Anything else?
No response