Move reconnect logic to main thread #51

Merged
merged 5 commits into from
Dec 12, 2022

Conversation

@jordeu (Member) commented Dec 7, 2022

Description

There have been reports of Tower Agent stalling after running for a long time: it stops reconnecting and no longer sends the heartbeat. It has been impossible to reproduce this scenario.

To mitigate this problem, this PR moves all the reconnection logic to the main thread.

@pditommaso (Collaborator) commented

Looking at the code, I think System.exit should only be used for a non-recoverable error, e.g. a missing access token or an expected work directory that cannot be created.

Instead, it looks to me like there is a system exit even for transient errors, e.g. a timeout.

Even better, errors should be classified as recoverable or non-recoverable, each with a corresponding exception. Recoverable errors should be handled in the infinite while loop, while non-recoverable ones should be handled in the main method, producing a system exit, e.g.:

    public static void main(String[] args) {
        try {
            PicocliRunner.run(Agent.class, args);
        }
        catch (Throwable e) {
            // Only non-recoverable errors should propagate this far
            log.error("Some message " + e.getMessage(), e);
            System.exit(1);
        }
    }
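
The recoverable/non-recoverable split suggested above could be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the names RecoverableException, UnrecoverableException, Connection and runLoop are all hypothetical, and a real agent loop would normally keep running rather than return after the first successful attempt.

```java
public class RetryLoopSketch {

    /** Hypothetical: a transient failure (e.g. a timeout) that is worth retrying. */
    static class RecoverableException extends RuntimeException {
        RecoverableException(String message) { super(message); }
    }

    /** Hypothetical: a fatal failure (e.g. a missing access token) that should stop the process. */
    static class UnrecoverableException extends RuntimeException {
        UnrecoverableException(String message) { super(message); }
    }

    /** Hypothetical stand-in for a single connect-and-serve attempt. */
    interface Connection {
        void connectAndServe();
    }

    /**
     * Retries on recoverable errors and lets every other exception propagate
     * to the caller. Returns the number of attempts made before a clean return.
     */
    static int runLoop(Connection connection, long retryDelayMillis) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                connection.connectAndServe();
                return attempts;
            } catch (RecoverableException e) {
                // Transient problem: log it, wait, and go around the loop again
                System.err.println("Transient error, retrying: " + e.getMessage());
                try {
                    Thread.sleep(retryDelayMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and treat interruption as fatal
                    Thread.currentThread().interrupt();
                    throw new UnrecoverableException("interrupted while waiting to retry");
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated connection that times out twice before succeeding
        Connection flaky = () -> {
            if (++calls[0] < 3) throw new RecoverableException("timeout");
        };
        try {
            int attempts = runLoop(flaky, 10);
            System.out.println("Connected after " + attempts + " attempts");
        } catch (Throwable e) {
            // Only non-recoverable errors reach this point
            System.err.println("Fatal error: " + e.getMessage());
            System.exit(1);
        }
    }
}
```

With this shape, System.exit appears in exactly one place, and a timeout never terminates the process; it only triggers another iteration of the loop.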

@jordeu jordeu merged commit 519d35d into master Dec 12, 2022