communication failure when invoking qstat on remote host updated job state #4

Open
coleslaw481 opened this issue Mar 7, 2016 · 0 comments
@coleslaw481

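Panfish running on gordon moved a job to the completed state when qstat returned an error. The code should have left the job in its current state. In the log below, the qstat call at 16:42:01 fails with a communication error, yet QstatJobWatcher still reports a state update on 1 job.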

Tue Nov 26 16:40:01 2013 (1385512801) INFO [main:243] Exit Code: 0
Tue Nov 26 16:41:01 2013 (1385512861) INFO [Panfish::QstatJobWatcher:188] State updated on 0 job(s) on gordon_shadow.q
Tue Nov 26 16:41:01 2013 (1385512861) INFO [main:243] Exit Code: 0
Unable to communicate with gordon-fe2.local(10.5.1.2)
Unable to communicate with gordon-fe2.local(10.5.1.2)
Communication failure.
qstat: cannot connect to server gordon-fe2.local (errno=15096) Unable to get connection to socket
Tue Nov 26 16:42:01 2013 (1385512921) ERROR [Panfish::PBSJobStateHashFactory:60] Unable to run /opt/torque/bin/qstat :
Tue Nov 26 16:42:01 2013 (1385512921) INFO [Panfish::QstatJobWatcher:188] State updated on 1 job(s) on gordon_shadow.q
Tue Nov 26 16:42:01 2013 (1385512921) INFO [main:243] Exit Code: 0
Tue Nov 26 16:43:01 2013 (1385512981) INFO [main:243] Exit Code: 0
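
The expected behavior can be sketched as follows. This is only an illustration in Python, not Panfish's actual code: `QstatError`, `update_job_states`, and the `query_states` callback are hypothetical names, and the rule that a job missing from a successful query counts as "done" is an assumption about how the watcher decides a job has finished. The point is that a failed query carries no information, so it should change nothing.

```python
from typing import Callable, Dict


class QstatError(Exception):
    """Hypothetical error raised when the scheduler query itself fails."""


def update_job_states(
    jobs: Dict[str, str],
    query_states: Callable[[], Dict[str, str]],
) -> int:
    """Refresh job states in place and return how many jobs changed."""
    try:
        reported = query_states()
    except QstatError:
        # The query failed, so nothing is known about any job.
        # Leave every job in its current state.
        return 0

    updated = 0
    for job_id, current in jobs.items():
        # Assumption: a job absent from a *successful* query has finished.
        new_state = reported.get(job_id, "done")
        if new_state != current:
            jobs[job_id] = new_state
            updated += 1
    return updated


def failing_query() -> Dict[str, str]:
    # Stands in for a qstat call that cannot reach the server.
    raise QstatError("cannot connect to server")


jobs = {"1234": "running"}
changed = update_job_states(jobs, failing_query)
print(changed, jobs)
```

With this structure, a communication failure and an empty but successful qstat result are handled on separate paths, so only the latter can move a job to a finished state.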
