All ranks log an error when r_lite is missing, even when it is not to run on that broker/node #1439

Closed
trws opened this issue Apr 9, 2018 · 6 comments · Fixed by flux-framework/flux-sched#321

@trws
Member

trws commented Apr 9, 2018

This is for a single-node job:

2018-04-09T04:21:22.093173Z job.info[0]: No lwj.0.11.1509.R_lite: No such file or directory
2018-04-09T04:21:22.095083Z job.info[1]: No lwj.0.11.1509.R_lite: No such file or directory
2018-04-09T04:21:22.094698Z job.info[2]: No lwj.0.11.1509.R_lite: No such file or directory
...
2018-04-09T04:21:22.097958Z job.info[905]: No lwj.0.11.1509.R_lite: No such file or directory
@trws
Member Author

trws commented Apr 9, 2018

I should note, normally I wouldn't call this a bug, but it's a lot of printing to have thousands of those lines for every single job. Even for info level, it's substantial.

@garlick
Member

garlick commented Apr 9, 2018

I think this was intended to log the fallback to the old rank.N.cores method, but it probably wasn't intended to do that on nodes that aren't part of the allocation.

Is anything using the old method at this point or have we fully moved over to R_lite? Maybe not only the log message but the fallback can go away now?

@garlick
Member

garlick commented Apr 9, 2018

Well duh, I guess at least @trws's WIP branch is using it! I should wake up before I post.

@grondo
Contributor

grondo commented Apr 9, 2018 via email

@trws
Member Author

trws commented Apr 9, 2018

Agreed, the sched end isn't all done with that yet if I understand correctly. There may be a branch that does, but I'm not using it yet.

@dongahn
Member

dongahn commented Apr 9, 2018

Yeah for the record, the sched R_lite generation work isn't finished.

grondo added a commit to grondo/flux-core that referenced this issue Apr 9, 2018
Reduce the amount of redundant logging from the job module. For errors
that will be the same for every rank (e.g. missing or unparseable R_lite),
log only from rank 0. Change the log level for "no rank.N dir for this
rank" to debug, since it is an expected condition for a job that does
not target rank N.

This should greatly reduce the amount of chatter from the job module to the
flux log.

Fixes flux-framework#1439
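
For readers following along, the policy in the commit message above can be sketched roughly as follows. This is a standalone Python simulation, not the job module's actual code: the function name `resource_log_events`, the dict standing in for the KVS, and the wording of the debug message are all assumptions made for illustration. Only the "No ...R_lite: No such file or directory" text comes from the log in this issue.

```python
import logging


def resource_log_events(rank, kvs, job_path):
    """Return the (level, message) pairs one rank would log for a job.

    Hypothetical helper: `kvs` is a plain dict standing in for the KVS,
    and `job_path` is a key prefix such as "lwj.0.11.1509".
    """
    events = []
    r_lite_key = f"{job_path}.R_lite"
    if r_lite_key not in kvs:
        # A missing R_lite looks identical from every rank, so only
        # rank 0 reports it.
        if rank == 0:
            events.append(
                (logging.INFO, f"No {r_lite_key}: No such file or directory")
            )
        # Fall back to the old rank.N.cores layout.
        cores_key = f"{job_path}.rank.{rank}.cores"
        if cores_key not in kvs:
            # Expected for any rank the job does not target, so this is
            # demoted to debug (message text is an assumption).
            events.append(
                (logging.DEBUG, f"No {job_path}.rank.{rank} dir for this rank")
            )
    return events


# Single-node job on rank 0 in a 906-broker instance, as in the report.
kvs = {"lwj.0.11.1509.rank.0.cores": 4}
all_events = [
    event
    for rank in range(906)
    for event in resource_log_events(rank, kvs, "lwj.0.11.1509")
]
info_count = sum(1 for level, _ in all_events if level == logging.INFO)
debug_count = sum(1 for level, _ in all_events if level == logging.DEBUG)
print(info_count, debug_count)
```

Under these assumptions the info-level log gets one line per job instead of one per rank, and the per-rank "not targeted" messages only show up when debug logging is enabled.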
@grondo grondo self-assigned this Apr 9, 2018
grondo added a commit to grondo/flux-core that referenced this issue Apr 11, 2018

4 participants