[Bug]: Kubernetes build status fetch error on server restart #562

Closed
shiipou opened this issue Mar 22, 2024 · 3 comments · Fixed by #564
@shiipou
Contributor

shiipou commented Mar 22, 2024

What happened?

When the server restarts, it tries to retrieve the status of each pending build in order to complete it. But if a build no longer exists on Kubernetes because it has already finished (and the server never received the notification), the server throws an error at startup for each such build.

What browsers are you seeing the problem on?

No response

Version

1.5.3

Relevant log output

https://lenra-br.sentry.io/issues/4573629917/

Uncaught exit - {:noproc, {GenServer, :stop, [{:global, {Lenra.Kubernetes.Status, "591"}}, :normal, :infinity]}}
  File "lib/gen_server.ex", line 977, in GenServer.stop/3
  File "lib/lenra_web/controllers/runner_controller.ex", line 24, in LenraWeb.RunnerController.update_build/2
  File "lib/lenra_web/controllers/runner_controller.ex", line 1, in LenraWeb.RunnerController.action/2
  File "lib/lenra_web/controllers/runner_controller.ex", line 1, in LenraWeb.RunnerController.phoenix_controller_pipeline/2
  File "lib/phoenix/router.ex", line 354, in Phoenix.Router.__call__/2
  File "lib/lenra_web/endpoint.ex", line 1, in LenraWeb.Endpoint.plug_builder_call/2
  File "lib/lenra_web/endpoint.ex", line 1, in LenraWeb.Endpoint."call (overridable 3)"/2
  File "lib/lenra_web/endpoint.ex", line 1, in LenraWeb.Endpoint.call/2
@shiipou shiipou added the bug Something isn't working label Mar 22, 2024
@taorepoara
Member

This error is strange, since a Kubernetes job should not be removed directly.

We could define a duration after which the job is marked as failed at server start, using the same duration as the build timeout.

@taorepoara
Member

Check whether the deployment timeout covers this issue.

@jonas-martinez
Collaborator

jonas-martinez commented Apr 4, 2024

I don't think this is a problem with Kubernetes. If you go to line 24 of lib/lenra_web/controllers/runner_controller.ex, as shown in the error above, you will see that this function is called by Kubernetes through the /runner/build/:id API endpoint and that the server properly updates the build status in its database.

This means that only the Kubernetes.Status GenServer crashed or stopped prematurely, so we just need to ignore the error raised on the server when it tries to stop this GenServer while it is not running.

Please see my PR to fix this issue, and don't hesitate to read the PR description for more information.

#564
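
For readers following along, the idea in the last comment (ignore the exit raised when stopping a GenServer that is not running) can be sketched roughly as below. This is not the code from #564. The `SafeStop` module, its `stop/3` function, and the `:not_running` return value are hypothetical names chosen for illustration. The only behaviour taken from the report is that `GenServer.stop/3` exits with `{:noproc, ...}` when the target process does not exist, as shown in the log above.

```elixir
defmodule SafeStop do
  # Hypothetical helper, not part of the Lenra codebase.
  # Stops a GenServer, treating "process is not running" as a non-error.
  def stop(server, reason \\ :normal, timeout \\ :infinity) do
    # Returns :ok when the process was running and stopped cleanly.
    GenServer.stop(server, reason, timeout)
  catch
    # GenServer.stop/3 exits with {:noproc, _} when no process is
    # registered under `server`; swallow only that specific exit.
    :exit, {:noproc, _} -> :not_running
  end
end

# Usage with a global name shaped like the one in the log
# (the "591" build id is taken from the stack trace above):
SafeStop.stop({:global, {Lenra.Kubernetes.Status, "591"}})
```

Catching the exit is likely preferable to checking `GenServer.whereis/1` first, since the process could terminate between the check and the call to stop. Any other exit reason still propagates, so unrelated failures are not hidden.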
