Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not from forked to trigger full docker CI of Fix/6448/s6 notification polling check #6475

Merged
merged 8 commits into from
Jul 11, 2024

Conversation

unkcpz
Copy link
Member

@unkcpz unkcpz commented Jun 13, 2024

Fixes #6448

see #6469, this is to trigger the push to ghcr.io

@danielhollas
Copy link
Collaborator

@unkcpz very strange that the docker.yml workflow was not triggered here.But in any case I've just triggered it via workflow dispatch.

https://github.com/aiidateam/aiida-core/actions/runs/9500519783

@unkcpz
Copy link
Member Author

unkcpz commented Jun 13, 2024

Thanks @danielhollas

@sphuber can you test run with the built image

docker run --rm -it ghcr.io/aiidateam/aiida-core-with-services:fix-6448-s6-notification-polling-check bash

@sphuber
Copy link
Contributor

sphuber commented Jun 14, 2024

Thanks @danielhollas

@sphuber can you test run with the built image

docker run --rm -it ghcr.io/aiidateam/aiida-core-with-services:fix-6448-s6-notification-polling-check bash

I get the same problem:

s6-rc: info: service postgresql-init successfully started
s6-rc: info: service postgresql: starting
waiting for server to start.... done
server started
s6-rc: info: service postgresql successfully started
Error:
RabbitMQ on node rabbit@localhost is not running or has not fully booted yet (check with is_booting)
s6-rc: info: service postgresql-prepare successfully started
Error:
RabbitMQ on node rabbit@localhost is not running or has not fully booted yet (check with is_booting)
 completed with 0 plugins.

it hangs there. So maybe it is a problem with my setup? I am running wsl2 which may have its peculiarities. Not sure.

@danielhollas
Copy link
Collaborator

Hmm, I am not sure why the Docker Images workflow from docker.yml is not getting triggered on this branch. I've created a new PR #6479 and it is working as expected there.

@sphuber sphuber force-pushed the fix/6448/s6-notification-polling-check branch from 85c5e4a to 86dd0e8 Compare July 11, 2024 09:09
@unkcpz
Copy link
Member Author

unkcpz commented Jul 11, 2024

@sphuber sorry, I didn't notice you update in here already. Please feel free go ahead with this and close the other one.

@sphuber
Copy link
Contributor

sphuber commented Jul 11, 2024

It may not have been fixed after all. I triggered the workflow through dispatch manually and there the tests failed: https://github.com/aiidateam/aiida-core/actions/runs/9888940447/job/27314039090
It seems that the script is still telling RMQ is up and running when it is not quite fully booted up, so verdi presto is still run too fast.

@sphuber sphuber force-pushed the fix/6448/s6-notification-polling-check branch from a61b7db to 92cc589 Compare July 11, 2024 10:57
@sphuber
Copy link
Contributor

sphuber commented Jul 11, 2024

Found (at least part of) the problem. Really unhelpfully, is_running can return 0 even if it fails and the server is not yet running

$ sudo rabbitmq-diagnostics is_running && echo $?
Asking node rabbit@invader for its status ...
RabbitMQ on node rabbit@invader is not running or has not fully booted yet (check with is_booting)
0

@sphuber
Copy link
Contributor

sphuber commented Jul 11, 2024

Ok, I think this combination of first running rabbitmq-diagnostics ping to make sure the server is even started, followed by rabbitmq-diagnostics check_running to make sure it if fully up and running, seems to do the trick. I can now build and run this locally, and the build on GHA also worked.

@sphuber sphuber merged commit 9579378 into main Jul 11, 2024
26 checks passed
@sphuber sphuber deleted the fix/6448/s6-notification-polling-check branch July 11, 2024 13:22
mikibonacci pushed a commit to mikibonacci/aiida-core that referenced this pull request Sep 3, 2024
For the `aiida-core-with-services` image where the services are part of
the image, we cannot rely on the health check for the services provided
by docker-build as is used for the `aiida-core-base` case. Instead, a
simple sleep was added to the `aiida-prepare.sh` script that sets up 
the profile, to make sure the services are up before the profile is
created.

This solution is neither elegant nor robust. Here the sleep approach is
replaced by `s6-notifyoncheck`. This hook allows blocking the startup
from continuing until a script returns a 0 exit code. The script in
question first calls `rabbitmq-diagnostics ping` to make sure the
RabbitMQ server is even up, followed by a call to
`rabbitmq-diagnostics check_running`. If the latter returns 0, it means
RabbitMQ is up and running and the script returns 0 as well, which will
trigger `s6-notifyoncheck` to continue with the rest of the startup.

Note that `rabbitmq-diagnostics is_running` could not be used as that
command sometimes returns 0 even if the service is not ready at all.

Co-authored-by: Sebastiaan Huber <mail@sphuber.net>
agoscinski pushed a commit to agoscinski/aiida-core that referenced this pull request Nov 5, 2024
For the `aiida-core-with-services` image where the services are part of
the image, we cannot rely on the health check for the services provided
by docker-build as is used for the `aiida-core-base` case. Instead, a
simple sleep was added to the `aiida-prepare.sh` script that sets up
the profile, to make sure the services are up before the profile is
created.

This solution is neither elegant nor robust. Here the sleep approach is
replaced by `s6-notifyoncheck`. This hook allows blocking the startup
from continuing until a script returns a 0 exit code. The script in
question first calls `rabbitmq-diagnostics ping` to make sure the
RabbitMQ server is even up, followed by a call to
`rabbitmq-diagnostics check_running`. If the latter returns 0, it means
RabbitMQ is up and running and the script returns 0 as well, which will
trigger `s6-notifyoncheck` to continue with the rest of the startup.

Note that `rabbitmq-diagnostics is_running` could not be used as that
command sometimes returns 0 even if the service is not ready at all.

Co-authored-by: Sebastiaan Huber <mail@sphuber.net>

Cherry-pick: 9579378
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace the sleep in aiida-prepare.sh startup script of Docker images with more clever solution
3 participants