maestro study not launching? #254
Comments
@doutriaux1 -- What version of Maestro are you using? |
maestro 1.1.7dev1 |
@FrankD412 I have a fuller version of the same yaml, with more jobs to run before this one, and it works great. The one difference is that this version no longer has any slurm jobs. C. |
If you pulled recently, there was a shift in where expansion happens. Expansion no longer happens before Maestro exits, instead happening in the conductor. How many nodes does your graph have? Also, what does the Maestro generated directory look like? |
ll -rt generate_hohlraum_20200429-080926
total 60K
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 logs
-rw------- 1 cdoutrix aml_cs 3.4K Apr 29 08:09 maestro_custom_generator.py
drwx--S--- 3 cdoutrix aml_cs 4.0K Apr 29 08:09 meta
-rw------- 1 cdoutrix aml_cs 1.8K Apr 29 08:09 maestro_bug.yaml
-rw------- 1 cdoutrix aml_cs 25K Apr 29 08:09 generate_hohlraum.pkl
-rw------- 1 cdoutrix aml_cs 22 Apr 29 08:09 generate_hohlraum.txt
drwx--S--- 14 cdoutrix aml_cs 8.0K Apr 29 08:10 directory_permissions
drwx--S--- 3 cdoutrix aml_cs 4.0K Apr 29 08:10 kosh |
My git log says: git log -n1
commit 7022c4370cae8070e4632a423b78298782f3cabb
Author: Francesco Di Natale <dinatale3@llnl.gov>
Date: Tue Apr 14 11:28:38 2020 -0700
Bugfix for logging that didn't appear in submodules (#247)
* Improved logging setup.
* Transition to a LoggerUtil class.
* Addition of docstring to LoggerUtility + cleanup. |
Ok, you're using a more recent version which means my previous comments apply. I did notice that you were missing some keys if you're intending on scheduling steps; you'll need extra keys in your steps. Based off of your parameter generator, I think this is what you're after:
Are there any exceptions at the end of the log file in |
@FrankD412 keys should be generated by the custom generator, but you're right, something is gone. The log says: ed?: False
2020-04-29 08:10:24,970 - maestrowf.interfaces.script.localscriptadapter:submit:153 - WARNING - Execution returned an error: /usr/WS1/aml_cs/ALE/LAGER/data-generation/Hohlraum/generate_hohlraum_20200429-080926/directory_permissions/domain_288.laserPowerMult_1.0.minimalAle_1.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_2e-06/directory_permissions_domain_288.laserPowerMult_1.0.minimalAle_1.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_2e-06.slurm.sh: line 7: run_PATH: command not found
chgrp: cannot access '/HWH_domain_288.laserPowerMult_1.0.minimalAle_1.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_2e-06': No such file or directory |
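A note on the two messages in the log above: `run_PATH: command not found` followed by a path that starts at `/HWH_...` is what a POSIX shell produces when a `$(run_PATH)` token reaches it unreplaced, because the shell treats `$(...)` as command substitution. The sketch below reproduces this with a made-up one-line script. The script line and the `HWH_example` name are hypothetical, and the assumption that Maestro left the token in place because no variable had that exact name is a reading of the log, not something the thread confirms.

```python
import subprocess

# Hypothetical script line: the $(run_PATH) token was never replaced,
# so the shell receives it verbatim.
script = 'echo "$(run_PATH)/HWH_example"'

# The shell interprets $(run_PATH) as "run the command run_PATH and
# substitute its output". No such command exists, so the substitution
# yields an empty string and an error goes to stderr.
result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)

print(repr(result.stdout))    # '/HWH_example\n'
print(result.stderr.strip())  # wording varies by shell, but it names run_PATH
```

The empty substitution is why the `chgrp` target in the log begins at the filesystem root.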
Actually, the directories are all generated, but only one kosh directory is generated: ll -rt generate_hohlraum_20200429-080926/kosh
total 4.0K
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_18.laserPowerMult_1.0.minimalAle_0.NODES_2.NODES_XENA_1.PROC_72.PROCS_XENA_18.meshResolution_0.15.foamDensity_2e-06
(kosh) [cdoutrix@rztopaz188:Hohlraum]$ ll -rt generate_hohlraum_20200429-080926/directory_permissions
total 48K
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 domain_18.laserPowerMult_1.0.minimalAle_0.NODES_2.NODES_XENA_1.PROC_72.PROCS_XENA_18.meshResolution_0.15.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 domain_18.laserPowerMult_1.0.minimalAle_0.NODES_2.NODES_XENA_1.PROC_72.PROCS_XENA_18.meshResolution_0.15.foamDensity_0.35
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 domain_72.laserPowerMult_1.0.minimalAle_0.NODES_8.NODES_XENA_1.PROC_288.PROCS_XENA_36.meshResolution_0.25.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 domain_72.laserPowerMult_1.0.minimalAle_0.NODES_8.NODES_XENA_1.PROC_288.PROCS_XENA_36.meshResolution_0.25.foamDensity_0.35
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:09 domain_288.laserPowerMult_1.0.minimalAle_0.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_288.laserPowerMult_1.0.minimalAle_0.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_0.35
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_18.laserPowerMult_1.0.minimalAle_1.NODES_2.NODES_XENA_1.PROC_72.PROCS_XENA_18.meshResolution_0.15.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_18.laserPowerMult_1.0.minimalAle_1.NODES_2.NODES_XENA_1.PROC_72.PROCS_XENA_18.meshResolution_0.15.foamDensity_0.35
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_72.laserPowerMult_1.0.minimalAle_1.NODES_8.NODES_XENA_1.PROC_288.PROCS_XENA_36.meshResolution_0.25.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_72.laserPowerMult_1.0.minimalAle_1.NODES_8.NODES_XENA_1.PROC_288.PROCS_XENA_36.meshResolution_0.25.foamDensity_0.35
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_288.laserPowerMult_1.0.minimalAle_1.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_2e-06
drwx--S--- 2 cdoutrix aml_cs 4.0K Apr 29 08:10 domain_288.laserPowerMult_1.0.minimalAle_1.NODES_32.NODES_XENA_2.PROC_1152.PROCS_XENA_72.meshResolution_0.5.foamDensity_0.35 |
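To see exactly which workspaces are missing under `kosh`, one option is to enumerate the expected names and compare them with what is on disk. The sketch below rebuilds the 12 names from the values visible in the `directory_permissions` listing above (3 resolution groups x 2 `minimalAle` values x 2 `foamDensity` values). The grouping of values into tuples and the helper names `workspace_name` and `missing` are assumptions made for illustration; this is not the custom generator from the issue.

```python
from itertools import product

# Values read off the directory listing; the grouping is an assumption.
# (domain, NODES, NODES_XENA, PROC, PROCS_XENA, meshResolution)
resolutions = [
    (18, 2, 1, 72, 18, 0.15),
    (72, 8, 1, 288, 36, 0.25),
    (288, 32, 2, 1152, 72, 0.5),
]
minimal_ale = [0, 1]
foam_density = ["2e-06", "0.35"]  # kept as strings to match the names on disk


def workspace_name(res, ale, rho):
    """Build one directory name in the pattern seen in the listing."""
    domain, nodes, nodes_xena, proc, procs_xena, mesh = res
    return (
        f"domain_{domain}.laserPowerMult_1.0.minimalAle_{ale}."
        f"NODES_{nodes}.NODES_XENA_{nodes_xena}.PROC_{proc}."
        f"PROCS_XENA_{procs_xena}.meshResolution_{mesh}.foamDensity_{rho}"
    )


expected = [
    workspace_name(res, ale, rho)
    for ale, res, rho in product(minimal_ale, resolutions, foam_density)
]


def missing(expected_names, found_names):
    """Return the expected names that are absent from found_names."""
    return sorted(set(expected_names) - set(found_names))


print(len(expected))  # 12
```

Passing `os.listdir(...)` of the `kosh` directory as `found_names` would list the 11 combinations that never got a workspace.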
Forget the error in the log; I have a typo: |
@doutriaux1 -- You defined the variable as In your |
the log seems to indicate it's expanding ok: ==================================================
Expanding step 'kosh'
==================================================
-------- Used Parameters --------
{'LASPOWERMULT', 'NODES', 'RESOLUTION', 'DOMAIN', 'NODES_XENA', 'PROCS_XENA', 'PROC', 'RHO_FOAM2', 'MINIMALALE'}
---------------------------------
2020-04-29 08:09:28,064 - maestrowf.datastructures.core.study:_stage:616 - INFO -
**********************************
Combo [laserPowerMult_1.0.minimalAle_0.meshResolution_0.15.domain_18.foamDensity_2e-06.PROC_72.NODES_2.PROCS_XENA_18.NODES_XENA_1]
**********************************
2020-04-29 08:09:28,064 - maestrowf.datastructures.core.study:_stage:645 - INFO - Searching for workspaces...
cmd = echo 72.2.18.0.15.2e-06.1.0.0.18.1
export RUNDIR=HWH_$( basename $(WORKSPACE))
python /usr/workspace/aml_cs/ALE/LAGER/data-generation/Hohlraum/add_to_kosh.py --store=/usr/workspace/aml_cs/kosh/kosh_store.sql --root /usr/workspace/aml_cs/ALE/Hohlraum -n $RUNDIR
2020-04-29 08:09:28,064 - maestrowf.datastructures.core.study:_stage:676 - INFO - New cmd = echo 72.2.18.0.15.2e-06.1.0.0.18.1
export RUNDIR=HWH_$( basename $(WORKSPACE))
python /usr/workspace/aml_cs/ALE/LAGER/data-generation/Hohlraum/add_to_kosh.py --store=/usr/workspace/aml_cs/kosh/kosh_store.sql --root /usr/workspace/aml_cs/ALE/Hohlraum -n $RUNDIR
2020-04-29 08:09:28,064 - maestrowf.datastructures.core.study:_stage:616 - INFO -
**********************************
Combo [laserPowerMult_1.0.minimalAle_0.meshResolution_0.15.domain_18.foamDensity_0.35.PROC_72.NODES_2.PROCS_XENA_18.NODES_XENA_1]
**********************************
2020-04-29 08:09:28,064 - maestrowf.datastructures.core.study:_stage:645 - INFO - Searching for workspaces...
cmd = echo 72.2.18.0.15.0.35.1.0.0.18.1
export RUNDIR=HWH_$( basename $(WORKSPACE))
python /usr/workspace/aml_cs/ALE/LAGER/data-generation/Hohlraum/add_to_kosh.py --store=/usr/workspace/aml_cs/kosh/kosh_store.sql --root /usr/workspace |
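In the log above, the parameter tokens in the `echo` line have already been replaced, while `$(WORKSPACE)` is still present in both `cmd` and `New cmd`. The sketch below illustrates that kind of `$(NAME)` token substitution, where names without a known value are left untouched. It is a simplified illustration under that assumption, not Maestro's actual implementation; the `substitute` helper and the sample values are hypothetical.

```python
import re

# Matches $(NAME) where NAME is a plain identifier. A shell construct such
# as "$( basename ...)" does not match because of the leading space.
TOKEN = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def substitute(cmd, values):
    """Replace $(NAME) tokens that appear in values; leave the rest as-is."""
    def repl(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return TOKEN.sub(repl, cmd)


cmd = "echo $(PROC).$(NODES)\nexport RUNDIR=HWH_$( basename $(WORKSPACE))"

# First pass: only the parameters are known, so $(WORKSPACE) survives.
staged = substitute(cmd, {"PROC": 72, "NODES": 2})
print(staged)

# Later pass: the workspace path is known (hypothetical path).
final = substitute(staged, {"WORKSPACE": "/tmp/study/kosh/combo_1"})
print(final)
```

Under this model, a token whose name never matches any known value reaches the shell unchanged.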
@FrankD412 changing to RUN_PATH does not seem to make a difference |
@FrankD412 even if it fails, the status should at least indicate what has run/initialized/failed etc. No? |
@doutriaux1 -- It should. I'll have to sit down with the sample or schedule a meeting with you to dive deeper. There isn't anything blatant that I'm seeing that's wrong here, let me mess with it on my end and I'll see what I can find. |
Ok thanks. It's really odd since the full one (with slurm jobs) works fine. |
@FrankD412 if that "helps", things get worse with the repo's head: maestro run -p maestro_custom_generator.py maestro_bug.yaml
[2020-04-29 08:46:30: INFO] INFO Logging Level -- Enabled
[2020-04-29 08:46:30: WARNING] WARNING Logging Level -- Enabled
[2020-04-29 08:46:30: CRITICAL] CRITICAL Logging Level -- Enabled
[2020-04-29 08:46:30: INFO] Loading specification -- path = maestro_bug.yaml
[2020-04-29 08:46:30: ERROR] ('variables',)
Traceback (most recent call last):
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 112, in load_specification
specification = cls.load_specification_from_stream(data)
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 155, in load_specification_from_stream
specification.verify()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 162, in verify
self.verify_environment()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 280, in verify_environment
keys_seen = self._verify_variables()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 200, in _verify_variables
for key, value in self.environment["variables"].items():
KeyError: 'variables'
Traceback (most recent call last):
File "/g/g19/cdoutrix/miniconda3/envs/kosh/bin/maestro", line 11, in <module>
load_entry_point('maestrowf==1.1.7.dev1', 'console_scripts', 'maestro')()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/maestro.py", line 424, in main
rc = args.func(args)
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/maestro.py", line 130, in run_study
spec = YAMLSpecification.load_specification(args.specification)
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 116, in load_specification
raise e
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 112, in load_specification
specification = cls.load_specification_from_stream(data)
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 155, in load_specification_from_stream
specification.verify()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 162, in verify
self.verify_environment()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 280, in verify_environment
keys_seen = self._verify_variables()
File "/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.8/site-packages/maestrowf-1.1.7.dev1-py3.8.egg/maestrowf/datastructures/yamlspecification.py", line 200, in _verify_variables
for key, value in self.environment["variables"].items():
KeyError: 'variables'
(kosh) [cdoutrix@rztopaz188:Hohlraum]$ |
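The traceback above ends in `self.environment["variables"].items()`, which raises `KeyError` whenever the parsed environment block has no `variables` key. The sketch below reproduces that failure on a plain dictionary and shows one defensive alternative using `dict.get`. The `environment` contents and both function names are hypothetical, and the tolerant version is only an illustration of the pattern, not the fix that was applied in Maestro.

```python
# Hypothetical parsed env block that has no "variables" key.
environment = {"labels": {"RUN_PATH": "some/path"}}


def verify_variables_strict(env):
    """Mirrors the failing access: assumes "variables" is always present."""
    keys_seen = set()
    for key, value in env["variables"].items():
        keys_seen.add(key)
    return keys_seen


def verify_variables_tolerant(env):
    """Treats a missing "variables" block as empty instead of raising."""
    keys_seen = set()
    for key, value in env.get("variables", {}).items():
        keys_seen.add(key)
    return keys_seen


try:
    verify_variables_strict(environment)
    strict_error = None
except KeyError as exc:
    strict_error = exc  # same error as at the end of the traceback

print(repr(strict_error))
print(verify_variables_tolerant(environment))
```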
Oh, that looks like it may be a bug with the last release. Can you file that in a separate issue? That's a new feature that was added about a week ago. I do have a curiosity question related: It looks like you're putting statically defined items in |
@FrankD412 not really, I probably just copy/pasted from another example. And maybe because these do not "vary". |
Got it -- was just curious if there was a use case for it that I should be supporting. Thanks for the info. |
I just created the new issue, didn't realize I could just create it off the comment. Just an FYI not to worry about making a new one. |
@doutriaux1 -- @ben-bay just fixed the |
Here is the yaml I'm trying to run. It expands correctly, and when I type "y" to launch the study everything looks correct. But it appears nothing is launched, and maestro status does not show anything.
Launch output:
status (nothing):
custom generator: