Kong proxy pod crash loopback every day #9836

Closed
1 task done
sogos opened this issue Nov 28, 2022 · 20 comments
Labels
area/kubernetes Issues where Kong is running on top of Kubernetes bug

Comments

@sogos

sogos commented Nov 28, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

3.0.1.0

Current Behavior

Kong runs on Kubernetes (kong-gateway:3.0.1.0), installed with Helm.
Every night we shut down our dev/staging cluster to reduce cost, and every morning the system restarts automatically.
For the past few days, Kong has failed to start with the following logs:

2022/11/25 09:58:40 [warn] 1#0: the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /kong_prefix/nginx.conf:6
nginx: [warn] the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /kong_prefix/nginx.conf:6
2022/11/25 09:58:40 [warn] 1#0: [kong] [C]:-1 [Penlight 1.13.1] the contents of module 'pl.xml' has been deprecated, please use a more specialized library instead (deprecated after 1.11.0, scheduled for removal in 2.0.0)
nginx: [warn] [kong] [C]:-1 [Penlight 1.13.1] the contents of module 'pl.xml' has been deprecated, please use a more specialized library instead (deprecated after 1.11.0, scheduled for removal in 2.0.0)
2022/11/25 09:58:40 [emerg] 1#0: bind() to unix:/kong_prefix/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/kong_prefix/worker_events.sock failed (98: Address already in use)
2022/11/25 09:58:40 [notice] 1#0: try again to bind() after 500ms
2022/11/25 09:58:40 [emerg] 1#0: bind() to unix:/kong_prefix/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/kong_prefix/worker_events.sock failed (98: Address already in use)
2022/11/25 09:58:40 [notice] 1#0: try again to bind() after 500ms
2022/11/25 09:58:40 [emerg] 1#0: bind() to unix:/kong_prefix/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/kong_prefix/worker_events.sock failed (98: Address already in use)
2022/11/25 09:58:40 [notice] 1#0: try again to bind() after 500ms
2022/11/25 09:58:40 [emerg] 1#0: bind() to unix:/kong_prefix/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/kong_prefix/worker_events.sock failed (98: Address already in use)
2022/11/25 09:58:40 [notice] 1#0: try again to bind() after 500ms
2022/11/25 09:58:40 [emerg] 1#0: bind() to unix:/kong_prefix/worker_events.sock failed (98: Address already in use)
nginx: [emerg] bind() to unix:/kong_prefix/worker_events.sock failed (98: Address already in use)
2022/11/25 09:58:40 [notice] 1#0: try again to bind() after 500ms
2022/11/25 09:58:40 [emerg] 1#0: still could not bind()
nginx: [emerg] still could not bind()

Our last changes:

  • Use AWS Lambda plugin for a service

Expected Behavior

When the Kubernetes cluster restarts in the morning, Kong should start properly and be ready to take requests.

Steps To Reproduce

No response

Anything else?

No response

@chronolaw chronolaw added area/kubernetes Issues where Kong is running on top of Kubernetes bug labels Nov 28, 2022
@chronolaw
Contributor

Thanks for your report.

We have noticed this issue. It seems that Kong cannot create the UNIX socket kong_prefix/worker_events.sock for lua-resty-events.

@ugun111

ugun111 commented Nov 29, 2022

I have the same problem when Docker restarts.
How can I solve it?

@ugun111

ugun111 commented Nov 29, 2022

2022-11-29 15:30:27 nginx: [warn] the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /usr/local/kong/nginx.conf:6
2022-11-29 15:30:27 2022/11/29 07:30:27 [emerg] 1#0: bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)
2022-11-29 15:30:27 nginx: [emerg] bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)
2022-11-29 15:30:27 2022/11/29 07:30:27 [notice] 1#0: try again to bind() after 500ms
2022-11-29 15:30:28 2022/11/29 07:30:27 [emerg] 1#0: bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)
2022-11-29 15:30:28 nginx: [emerg] bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)
2022-11-29 15:30:28 2022/11/29 07:30:27 [notice] 1#0: try again to bind() after 500ms
2022-11-29 15:30:28 2022/11/29 07:30:27 [emerg] 1#0: bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)
2022-11-29 15:30:28 nginx: [emerg] bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)
2022-11-29 15:30:28 2022/11/29 07:30:27 [notice] 1#0: try again to bind() after 500ms
2022-11-29 15:30:29 2022/11/29 07:30:27 [emerg] 1#0: bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)
2022-11-29 15:30:29 nginx: [emerg] bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)
2022-11-29 15:30:29 2022/11/29 07:30:27 [notice] 1#0: try again to bind() after 500ms
2022-11-29 15:30:29 2022/11/29 07:30:27 [emerg] 1#0: bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)
2022-11-29 15:30:29 nginx: [emerg] bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address in use)
2022-11-29 15:30:29 2022/11/29 07:30:27 [notice] 1#0: try again to bind() after 500ms
2022-11-29 15:30:30 2022/11/29 07:30:27 [emerg] 1#0: still could not bind()
2022-11-29 15:30:30 nginx: [emerg] still could not bind()

@fffonion
Contributor

The fix is in #9254, starting with 3.0.0. Could you try the new version, @sogos?

@chronolaw chronolaw added the pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... label Nov 29, 2022
@ugun111

ugun111 commented Nov 30, 2022

Kong version ($ kong version)
3.0.1-ubuntu

@sogos
Author

sogos commented Nov 30, 2022

Hi, I checked the actual version of our Docker image:
kong/kong-gateway:3.0.1.0

kong@4433576f03c7:/$ kong version
Kong Enterprise 3.0.1.0

I was wrong when I opened the ticket.

@chronolaw chronolaw removed the pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... label Dec 1, 2022
@chronolaw
Contributor

Kong 3.0 uses the lua-resty-events library to propagate events, which needs a UNIX socket to send and receive messages.

The UNIX socket file is "unix:" .. prefix .. "/worker_events.sock", where prefix is the value of the prefix setting.

From the error logs above, I think the cause may be wrong permissions on the path, so that Kong cannot access the socket file.

Could you share the details of your docker run parameters?
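
The path described above can be inspected from a shell inside the container. This is only a minimal sketch, assuming a POSIX shell and assuming the prefix comes from the KONG_PREFIX environment variable with /usr/local/kong as the fallback; `kong_socket_path` and `kong_socket_status` are hypothetical helper names, not part of Kong:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with Kong.

# Print the socket path Kong 3.0 is expected to bind for a given prefix.
# Uses the first argument, then KONG_PREFIX, then /usr/local/kong.
kong_socket_path() {
    prefix="${1:-${KONG_PREFIX:-/usr/local/kong}}"
    # Strip one trailing slash so "/kong_prefix/" and "/kong_prefix" agree.
    prefix="${prefix%/}"
    printf '%s/worker_events.sock\n' "$prefix"
}

# Report whether something already exists at that path.
kong_socket_status() {
    sock="$(kong_socket_path "$1")"
    if [ -S "$sock" ]; then
        echo "socket present: $sock"
    elif [ -e "$sock" ]; then
        echo "not a socket: $sock"
    else
        echo "no socket: $sock"
    fi
}
```

Running `kong_socket_status` while Kong is stopped should help tell a leftover socket file apart from a permissions problem: if the socket is reported as present with no Kong process running, the file was left behind by an earlier run.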

@ugun111

ugun111 commented Dec 23, 2022

    #######################################
    # Kong: The API Gateway
    #######################################

    networks:
      kong-net:

    kong:
      image: kong:3.1.1-ubuntu
      restart: on-failure
      user: kong
      networks:
        - kong-net
      environment:
        KONG_DATABASE: postgres
        KONG_PG_HOST: 127.0.0.1
        KONG_PG_DATABASE: kong
        KONG_PG_USER: kong
        KONG_PG_PASSWORD: ${KONG_PG_PASSWORD:-kong}
        KONG_CASSANDRA_CONTACT_POINTS: kong-database
        KONG_PROXY_LISTEN: 0.0.0.0:8000
        KONG_PROXY_LISTEN_SSL: 0.0.0.0:8443
        KONG_ADMIN_LISTEN: 0.0.0.0:8001
        KONG_DNS_RESOLVER: 127.0.0.1:8600
        KONG_DNS_ORDER: SRV,LAST,A,CNAME
      depends_on:
        - kong-database
      healthcheck:
        test: ["CMD", "kong", "health"]
        interval: 10s
        timeout: 10s
        retries: 10
      ports:
        - "8000:8000"
        - "8443:8443"
        - "8001:8001"
        - "8444:8444"

@chronolaw
Contributor

I do not see a prefix env variable in this Docker config, so the default value, /usr/local/kong/, should apply.

When Kong works well, you can use docker exec -it to enter the container and see the socket file worker_events.sock.

Please check the prefix path to make sure Kong can access it.

@ugun111

ugun111 commented Dec 23, 2022

$ ls
bin boot dev docker-entrypoint.sh etc home lib lib32 lib64 libx32 media mnt opt proc root run sbin srv sys tmp usr var
$ cd /usr/local/kong
$ ls
COPYRIGHT client_body_temp include logs nginx-kong.conf pids scgi_temp worker_events.sock
bin fastcgi_temp lib nginx-kong-stream.conf nginx.conf proxy_temp uwsgi_temp
$

@chronolaw
Contributor

chronolaw commented Dec 23, 2022

Could you check the permissions of these files with ls -l? They should look like:

drwxr-xr-x    2 kong     nogroup       4096 Dec 23 08:43 logs
-rw-r--r--    1 kong     nogroup       2600 Dec 23 08:43 nginx-kong-stream.conf
-rw-r--r--    1 kong     nogroup      11473 Dec 23 08:43 nginx-kong.conf
-rw-r--r--    1 kong     nogroup        312 Dec 23 08:43 nginx.conf
drwxr-xr-x    2 kong     nogroup       4096 Dec 23 08:43 pids
drwx------    2 kong     nogroup       4096 Dec 23 08:43 proxy_temp
drwx------    2 kong     nogroup       4096 Dec 23 08:43 scgi_temp
drwx------    2 kong     nogroup       4096 Dec 23 08:43 uwsgi_temp
srw-rw-rw-    1 kong     nogroup          0 Dec 23 08:43 worker_events.sock

@ugun111

ugun111 commented Dec 26, 2022

$ ls -l
total 196
-rw-rw-r-- 1 kong root 136758 Dec 9 23:09 COPYRIGHT
drwxrwxr-x 2 kong root 4096 Dec 12 20:42 bin
drwx------ 2 kong kong 4096 Dec 19 02:12 client_body_temp
drwx------ 2 kong kong 4096 Dec 19 02:12 fastcgi_temp
drwxrwxr-x 7 kong root 4096 Dec 12 20:42 include
drwxrwxr-x 4 kong root 4096 Dec 12 20:42 lib
drwxr-xr-x 2 kong kong 4096 Dec 23 08:37 logs
-rw-r--r-- 1 kong kong 2603 Dec 23 08:37 nginx-kong-stream.conf
-rw-r--r-- 1 kong kong 11530 Dec 23 08:37 nginx-kong.conf
-rw-r--r-- 1 kong kong 315 Dec 23 08:37 nginx.conf
drwxr-xr-x 2 kong kong 4096 Dec 23 08:37 pids
drwx------ 2 kong kong 4096 Dec 19 02:12 proxy_temp
drwx------ 2 kong kong 4096 Dec 19 02:12 scgi_temp
drwx------ 2 kong kong 4096 Dec 19 02:12 uwsgi_temp
srw-rw-rw- 1 kong kong 0 Dec 23 08:37 worker_events.sock
$

@chronolaw
Contributor

Hi ugun111, from the data you posted everything seems to work well, and I also cannot reproduce the error.

@ugun111

ugun111 commented Dec 26, 2022

It works well, but if you restart the computer without stopping Docker first, it shows:

1#0: bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address already in use)
2022-12-26 13:32:43 nginx: [emerg] bind() to unix:/usr/local/kong/worker_events.sock failed (98: Address already in use)
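
To confirm that a crash loop is this particular failure, the logs can be scanned for the bind error. The following is a rough sketch, assuming the log lines look like the ones posted in this thread; `stale_socket_paths` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read Kong/nginx log text on stdin and print each
# distinct worker_events.sock path that failed to bind with errno 98.
# The exit status is non-zero when no such line is found.
stale_socket_paths() {
    sed -n 's/.*bind() to unix:\([^ ]*worker_events\.sock\) failed (98: .*/\1/p' |
        sort -u |
        grep .
}
```

The container or pod logs could be piped into it, for example `docker logs <container> 2>&1 | stale_socket_paths`, where the container name is whatever your setup uses.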

@agm650

agm650 commented Jan 4, 2023

Hello,
I can observe the same issue as @ugun111.
I'm also using Kong 3.0.1, deployed with chart version 2.13.1.

It seems that when the pod is killed "abruptly", socket files are left in the emptyDir.
Since the emptyDir is not defined with "medium: Memory", the socket files are still present after the restart, leading to the problem described in this issue.

IMO, a way to solve it would be to set the kong_prefix emptyDir to use medium Memory. That way it would use tmpfs, and the socket files would not persist after a hard stop.

I've locally edited the DaemonSet I'm using, and it seems to be working.
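
Whether the prefix directory is actually backed by tmpfs can be checked from inside the container. This is a rough sketch, assuming a Linux container where /proc/mounts is readable; `fstype_of` and `prefix_is_tmpfs` are hypothetical helper names, and the lookup is a plain longest-prefix match that does not resolve symlinks or escaped characters in mount points:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Kong or the chart.

# Print the filesystem type backing a directory, using a mounts table in
# /proc/mounts format (device, mount point, fstype, ...).
# The second argument is optional and defaults to /proc/mounts.
fstype_of() {
    dir="$1"
    mounts="${2:-/proc/mounts}"
    awk -v dir="$dir" '
        {
            mp = $2
            # A mount point matches if it is "/", equals dir, or is a parent of dir.
            if (mp == "/" || dir == mp || index(dir, mp "/") == 1) {
                # Keep the longest match; on ties the later entry wins.
                if (length(mp) >= best) { best = length(mp); fs = $3 }
            }
        }
        END { if (fs != "") print fs }
    ' "$mounts"
}

# Succeed only when the directory sits on tmpfs.
prefix_is_tmpfs() {
    [ "$(fstype_of "$1" "$2")" = "tmpfs" ]
}
```

For example, `prefix_is_tmpfs /kong_prefix && echo "memory-backed"` would show whether the change to the volume took effect.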

@ugun111

ugun111 commented Jan 9, 2023

@agm650, can you share your docker-compose.yml and the volume name of the kong_prefix emptyDir?

@agm650

agm650 commented Jan 9, 2023

I'm using the Helm chart to install Kong, so I do not have a docker-compose.yml file. If you are using docker-compose directly, you might be able to mount the kong_prefix directory as tmpfs, either with the "--tmpfs" flag or with the "--mount" flag and type=tmpfs.
At least that's what I understand from this link.

@chronolaw
Contributor

We have a PR that tries to fix it: Kong/docker-kong#620

@guanlan guanlan closed this as completed Feb 10, 2023
@samk64

samk64 commented Feb 25, 2023

For anyone still experiencing this issue while waiting for Kong/docker-kong#620 to land in a release, here is a reference workaround using the Helm chart (the kong image manifest shows the latest kong image, 3.1.1, at commit Kong/docker-kong@5f914be, which is older than the merged fix):

One way is to define an initContainer, similar to how stale PIDs are cleared in the kong helm chart:

    release_values:
      deployment:
        initContainers:
          - name: clear-stale-sockets
            image: "kong:3.1.1"
            command:
            - "rm"
            - "-vrf"
            - "/kong_prefix/worker_events.sock"
            volumeMounts:
              - name: "kong-kong-prefix-dir"
                mountPath: "/kong_prefix/"

If your $KONG_PREFIX is different, just change the prefix dir accordingly. The kong-kong-prefix-dir volume is referencing the default volume created here. The initContainer will run before the proxy container starts up and clear the stale socket file if it's there.

The other way is what @agm650 suggested. I had a few issues getting it to work at first because the default prefix dir volume defined in the kong helm chart can't be modified, so I had to define a new, alternate volume with medium: "Memory" and then override the prefix dir environment variable to use the new volume:

    release_values:
      deployment:
        userDefinedVolumes:
          - name: "kong-kong-prefix-dir-tempfs"
            emptyDir:
              medium: "Memory"
              sizeLimit: 256Mi
        userDefinedVolumeMounts:
          - name: "kong-kong-prefix-dir-tempfs"
            mountPath: "/kong_prefix_tempfs/"
      env:
        prefix: "/kong_prefix_tempfs/"
        
      # optional. otherwise, it will just use the default value in the helm chart's values.yaml
      image:
        repository: "kong"
        tag: "3.1.1"

@Revanh13

If anyone still encounters this problem in Docker, here is a command to delete the problematic file. Run it inside the Kong container at startup, when this error occurs:

rm /usr/local/kong/worker_events.sock

If you set the "KONG_PREFIX" env at Kong startup, replace "/usr/local/kong" (the default) with your own value.
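
A slightly more defensive version of the same cleanup can be sketched as a shell function. It assumes a POSIX shell, assumes the prefix comes from KONG_PREFIX with /usr/local/kong as the fallback, and must only run while no Kong process is using the socket (for example, before the container's normal start command); `clear_stale_socket` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: remove a leftover worker_events.sock under the given
# prefix, but only if the path really is a socket.
# Only call this while Kong is NOT running, otherwise a live socket is removed.
clear_stale_socket() {
    prefix="${1:-${KONG_PREFIX:-/usr/local/kong}}"
    sock="${prefix%/}/worker_events.sock"
    if [ -S "$sock" ]; then
        rm -f -- "$sock" || return 1
        echo "removed $sock"
    elif [ -e "$sock" ]; then
        # Something else lives at that path; leave it alone.
        echo "refusing to remove non-socket $sock" >&2
        return 1
    fi
    # Nothing to do when the file is absent.
    return 0
}
```

Checking with `-S` before deleting avoids removing an unrelated file if the prefix is set incorrectly, and the function is a no-op when the socket is absent, so it can run on every start.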
