Fix restart on fast machines #25300

Draft
wants to merge 7 commits into
base: master
from

Conversation

dmatej
Contributor

@dmatej dmatej commented Dec 28, 2024

Fixes #25295 and #25292
There were several issues; see the individual commits. The main problem was that startup could be faster than shutdown, so the two could collide on ports and files. The most problematic was the debug port, which stays open from JVM startup until the very end.

On my new machine it was reproducible in some 80% of executions.

Solution for #25292:

  • Instead of busy-spinning on process handles, we now wait for the process to end.
  • Instead of busy-spinning on the remote port, we now open a connection and wait until it is disconnected.
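The two waits above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `RestartWaits` and both method names are hypothetical, and treating a refused or reset connection as "disconnected" is an assumption made for the sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.time.Duration;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; names are illustrative, not taken from the GlassFish sources.
final class RestartWaits {

    /**
     * Blocks until the process exits or the timeout elapses, without polling isAlive().
     * Returns true if the process terminated, false on timeout.
     */
    static boolean waitForExit(ProcessHandle handle, Duration timeout) throws InterruptedException {
        try {
            // onExit() completes when the process terminates.
            handle.onExit().get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Waiting for process " + handle.pid() + " failed", e);
        }
    }

    /**
     * Opens a single connection and blocks in read() until the peer closes it.
     * Returns true if the peer is gone, false if it was still connected after
     * timeoutMillis without sending anything (the timeout applies to each read).
     */
    static boolean waitForDisconnect(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            try {
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            } catch (ConnectException e) {
                return true; // assumption: nobody listening means the old server is already gone
            }
            socket.setSoTimeout(timeoutMillis);
            InputStream in = socket.getInputStream();
            try {
                while (in.read() != -1) {
                    // discard anything the server still sends; -1 signals end of stream
                }
                return true;
            } catch (SocketTimeoutException e) {
                return false; // still connected
            } catch (IOException e) {
                return true; // assumption: a reset also means the peer went away
            }
        }
    }
}
```

Because only one connection is opened per wait, this approach does not burn through local ephemeral ports the way a connect-and-retry loop can.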

#25295 additionally required:

  • Move the startup of the new process into a shutdown hook
  • The startup hook waits for the other GlassFish shutdown hooks to finish (they are detected by name)
  • Logging is explicitly stopped
  • For extreme cases I added extra logging for the dying and the starting process; it can be enabled by setting an environment variable: export AS_RESTART_LOGFILES=true
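As a rough illustration of a hook that waits for other hooks by name, the sketch below joins every live thread whose name starts with a given prefix before running the restart action. It is an assumption-laden sketch, not the PR's implementation: `HookOrdering`, `joinThreadsByPrefix`, `registerRestartHook`, the hook name, and the 10-second per-thread limit are all hypothetical.

```java
// Hypothetical sketch; the real hook names and waiting logic live in the PR's commits.
final class HookOrdering {

    /**
     * Joins every live thread, other than the caller, whose name starts with the prefix.
     * The timeout applies to each thread separately. Returns the number of threads joined.
     */
    static int joinThreadsByPrefix(String prefix, long timeoutMillisPerThread) throws InterruptedException {
        Thread self = Thread.currentThread();
        int joined = 0;
        // Only threads that have already been started are visible here. The JVM starts
        // shutdown hooks concurrently, so a hook started after this snapshot is missed;
        // a production version would have to handle that case.
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (thread != self && thread.isAlive() && thread.getName().startsWith(prefix)) {
                thread.join(timeoutMillisPerThread);
                joined++;
            }
        }
        return joined;
    }

    /** Registers a hook that runs the restart action after the other named hooks finish. */
    static void registerRestartHook(String prefix, Runnable startNewProcess) {
        Thread hook = new Thread(() -> {
            try {
                joinThreadsByPrefix(prefix, 10_000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            startNewProcess.run();
        }, "Restart-Hook"); // hypothetical name
        Runtime.getRuntime().addShutdownHook(hook);
    }
}
```

Naming every shutdown hook is what makes this kind of detection possible, since the JVM offers no public API to list registered hooks.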

- The start succeeded too early and on fast machines collided with shutdown.
- Shutdown Hook is really the last thing in the JVM capable of doing it.
- All shutdown hooks have names now

Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
…cases

- When the current (old) JVM had debugging enabled, the new one sometimes
  failed to start. It is not possible to wait from the inside.
- Stop the kernel after adding the last shutdown hook; shutdown hooks run
  in parallel, but we have to ensure that ours will be executed after all
  other non-daemon hooks finish.
- Set export AS_RESTART_LOGFILES=true to get "old" and "new" files in the
  server's log directory. This is a trivial workaround, because the standard
  logging system might get into a conflict with the new GF instance too.
- The "super debug" is not helpful as it affects timing

Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
- its only usage was for the domain restart which was reimplemented

Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
@dmatej dmatej added the bug Something isn't working label Dec 28, 2024
@dmatej dmatej added this to the 7.0.21 milestone Dec 28, 2024
@dmatej dmatej requested review from avpinchuk and a team December 28, 2024 16:15
@dmatej dmatej self-assigned this Dec 28, 2024
- backup of the server.log cannot be done if the server is dead
- Using System.Logger instead of JUL

Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
- Original code caused local port exhaustion
- Original code used busy spinning instead of signals

Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
@dmatej dmatej force-pushed the fix-restart-on-fast-machines branch 9 times, most recently from a9f1f7e to 2f42fab Compare December 28, 2024 23:58
Signed-off-by: David Matějček <david.matejcek@omnifish.ee>
@dmatej dmatej force-pushed the fix-restart-on-fast-machines branch from 2f42fab to 0f23409 Compare December 29, 2024 09:02
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

The asadmin restart-domain may cause race conditions
1 participant