Fix machine discovery on compute nodes #511
If `E3SMU_MACHINE` has been set (e.g. by sourcing an E3SM-Unified load script), use that as the `machine`. This makes it possible to detect the machine on compute nodes if you are using E3SM-Unified.
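The lookup order described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `discover_machine`, its parameters, and the hostname-prefix fallback table are hypothetical. Only the `E3SMU_MACHINE` variable name comes from the description.

```python
import os
import socket


def discover_machine(environ=None, hostname=None, hostname_prefixes=None):
    """Return a machine name, preferring E3SMU_MACHINE when it is set.

    All parameters are hypothetical hooks added to make the sketch
    easy to exercise; they are not part of any real API.
    """
    environ = os.environ if environ is None else environ

    # 1. An E3SM-Unified load script is assumed to have exported this
    #    variable, so it is available on compute nodes as well.
    machine = environ.get("E3SMU_MACHINE")
    if machine:
        return machine

    # 2. Hypothetical fallback: match the hostname against known prefixes.
    #    This typically only works on login nodes.
    hostname = socket.gethostname() if hostname is None else hostname
    for prefix, name in (hostname_prefixes or {}).items():
        if hostname.startswith(prefix):
            return name

    # 3. Nothing matched: the caller must ask the user for the machine.
    return None
```

Checking the environment variable first means a sourced E3SM-Unified load script takes effect regardless of how generic the node's hostname is.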
@forsyth2, let me know if you'd like me to make a separate PR for the docs changes.
From reading the code changes, this looks good to me.
Partially addresses #406
Why only partially? This seems like a full resolution.
This only fixes the problem for someone using E3SM-Unified, not for someone using a development environment.
@xylar e3sm_diags has the same challenge: when you use a development environment and run on a compute node, it won't auto-detect the machine or use machine-specific info. I'm wondering if it is possible to update
@chengzhuzhang, I'm sorry but I don't know of any solution to this. The compute nodes have super generic names and there is not a consistent way to detect what machine you are on using things like environment variables or files. If you have a suggestion, I'm up for it. But otherwise we simply need users to identify with a config option or command-line flag what machine they are on when they launch on compute nodes.
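For illustration, the explicit fallback mentioned above (the user naming the machine when launching on a compute node) might look like the sketch below. The `--machine` flag and the `resolve_machine` helper are hypothetical names chosen for this example, not taken from the project's actual command-line interface.

```python
import argparse
import os


def resolve_machine(argv, environ=None):
    """Pick the machine from an explicit flag, then from E3SMU_MACHINE.

    The --machine flag is a hypothetical option used for illustration.
    """
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--machine", default=None)
    # parse_known_args ignores any other options the real tool might accept
    args, _unknown = parser.parse_known_args(argv)

    # An explicit user choice always wins.
    if args.machine:
        return args.machine

    # Otherwise fall back to the E3SM-Unified environment variable.
    machine = environ.get("E3SMU_MACHINE")
    if machine:
        return machine

    raise ValueError(
        "Could not determine the machine; pass --machine explicitly "
        "when running on a compute node."
    )
```

Raising an error rather than guessing keeps a run from silently picking up the wrong machine-specific settings.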
I was thinking for the machines that we support, if we can find some pattern of compute nodes names and filter them in
@chengzhuzhang, on Chrysalis, it might work since the login nodes are named something like But on Anvil, they are named: Compy is the same story: On Perlmutter, even the login nodes are too generic so we have to fall back to an environment variable, and that probably works on compute nodes, though I can't say I've checked. Frontier was similar with the added problem that there was no definitive environment variable to use so I had to fall back on an environment variable related to modules. Again, that likely works on compute nodes. For other supported machines like Chicoma, Andes and ACME1, I don't know what the deal is since I don't use them that much.
@chengzhuzhang can you open this issue on https://github.com/E3SM-Project/mache/issues? This is not the right place to discuss this further.
This merge uses an E3SM-Unified environment variable, `E3SMU_MACHINE`, to determine the machine if this environment variable has been set (e.g. by sourcing an E3SM-Unified load script). This makes it possible to detect the machine on compute nodes if you are using E3SM-Unified.
This merge also cleans up a few issues with the docs that I noticed.
Partially addresses #406