[Submitit] Restructure the plugin configuration #682
```diff
@@ -34,13 +34,16 @@ def manipulate_search_path(self, search_path: ConfigSearchPath) -> None:
         )


-class SubmititLauncher(Launcher):
+class LocalSubmititLauncher(Launcher):
+
+    _EXECUTOR = "local"

     def __init__(self, **params: Any) -> None:
         self.params = OmegaConf.structured(
-            LocalParams if params["executor"].value == "local" else SlurmParams
+            LocalParams if self._EXECUTOR == "local" else SlurmParams
         )
```
Review comment:

I suggest adding a base class to express that the slurm launcher is not REALLY a local launcher:

```python
# your current local:
class BaseSubmititLauncher(Launcher):
    def _get_params_class(self) -> Type[BaseParams]:
        raise NotImplementedError
    ...  # the rest is the same


class LocalSubmititLauncher(BaseSubmititLauncher):
    def _get_params_class(self) -> Type[BaseParams]:
        return LocalParams


class SlurmSubmititLauncher(BaseSubmititLauncher):
    def _get_params_class(self) -> Type[BaseParams]:
        return SlurmParams
```

You can go back to a DictConfig with something like this (and now you don't need the update loop):

```python
self.params = OmegaConf.structured(self._get_params_class()(**params))
```
```diff
         for key, val in params.items():
-            OmegaConf.update(self.params, key, val if key != "executor" else val.value)
+            OmegaConf.update(self.params, key, val)
```
Review comment:

I am surprised that this worked in the previous version, when you used the actual object and not a DictConfig. Regardless, you can get rid of this loop with the example I gave you above.
```diff

         self.config: Optional[DictConfig] = None
         self.config_loader: Optional[ConfigLoader] = None
         self.task_function: Optional[TaskFunction] = None
```
```diff
@@ -100,37 +103,36 @@ def launch(

         num_jobs = len(job_overrides)
         assert num_jobs > 0
-        exec_name = self.params.executor.value
         params = self.params

         # build executor
-        init_renamer = dict(executor="cluster", submitit_folder="folder")
+        init_renamer = dict(submitit_folder="folder")
         init_params = {name: params[k] for k, name in init_renamer.items()}
         init_params = {
             k: v.value if isinstance(v, Enum) else v for k, v in init_params.items()
         }
         specific_init_keys = {"max_num_timeout"}
         init_params.update(
             **{
-                f"{exec_name}_{x}": y
+                f"{self._EXECUTOR}_{x}": y
                 for x, y in params.items()
                 if x in specific_init_keys
             }
         )
         init_keys = specific_init_keys | set(init_renamer)  # used config keys
-        executor = submitit.AutoExecutor(**init_params)
+        executor = submitit.AutoExecutor(cluster=self._EXECUTOR, **init_params)
```
Review comment:

This line is confusing to me: is the auto executor expecting the strings "local" or "slurm" as options for `cluster`?
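For context on the question: submitit's `AutoExecutor` selects its backend through the `cluster` argument, which accepts backend names such as "slurm" and "local" (passing `None` lets it auto-detect). A rough usage sketch follows; the folder name and parameter values are made up, and backend-specific options are assumed to be passed with a backend prefix as described in submitit's docs:

```python
import submitit


def add(a: int, b: int) -> int:
    return a + b


if __name__ == "__main__":
    # cluster="local" runs the job in a local subprocess; cluster="slurm"
    # would submit it to Slurm instead.
    executor = submitit.AutoExecutor(folder="submitit_logs", cluster="local")

    # Generic parameters (like timeout_min) apply to any backend. Backend-specific
    # options are passed with a prefix (e.g. slurm_partition) and are only applied
    # when that backend is the one selected.
    executor.update_parameters(timeout_min=5)

    job = executor.submit(add, 2, 3)
    print(job.result())  # 5
```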
```diff

         # specify resources/parameters
         baseparams = set(dataclasses.asdict(BaseParams()).keys())
         params = {
-            x if x in baseparams else f"{exec_name}_{x}": y
+            x if x in baseparams else f"{self._EXECUTOR}_{x}": y
             for x, y in params.items()
             if x not in init_keys
         }
         executor.update_parameters(**params)

         log.info(
-            f"Submitit '{exec_name}' sweep output dir : "
+            f"Submitit '{self._EXECUTOR}' sweep output dir : "
             f"{self.config.hydra.sweep.dir}"
         )
         sweep_dir = Path(str(self.config.hydra.sweep.dir))
```
```diff
@@ -157,3 +159,8 @@ def launch(

         jobs = executor.map_array(self, *zip(*params))
         return [j.results()[0] for j in jobs]
+
+
+class SlurmSubmititLauncher(LocalSubmititLauncher):
+
+    _EXECUTOR = "slurm"
```
Review comment:

Let's talk about this for a moment.

With Hydra, each job gets a working directory inside the sweep dir. If we could keep all the files for the job in that directory it would be really good (users will have an easier time debugging issues for a particular job). This may take some submitit changes, but it is something to consider for the future.

Right now, if a job is failing, the user needs to track down its logs in the submitit folder. If we managed to use the working directory here it would be better. I suspect this is currently not possible without changes to submitit.

An alternative solution to this problem is to symlink the stdout and stderr from the submitit folder into the job working directory, but this can also be tricky.

This is not a feature request for this PR, but something to think about to improve the user experience.
Reply:

Indeed (for now this is the same behavior as before, but I agree it is not convenient).

It would actually be possible if we were not using arrays, but that's not a good option. With arrays, on the other hand, we can't specify the exact folder of each run individually; it must be deduced from the parameters of the array. Hacking something together is possible, but it would require a lot of (mostly unhelpful) changes in submitit and would not be robust anyway.

The other option would be for Hydra to use the folder provided by submitit (using the job_id instead of 0, 1, 2, etc.). But that's probably not something you'd want either, right?
Reply:

The user can customize the sweep subdirectory through Hydra, so yeah, I can't really use the submitit-generated dir.
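The symlink alternative floated earlier in the thread could look roughly like the sketch below; the folder layout, the job-id naming, and the glob pattern for submitit's log files are assumptions for illustration, not the plugin's actual behavior:

```python
from pathlib import Path


def link_logs_into_workdir(submitit_folder: Path, job_id: str, workdir: Path) -> None:
    """Symlink a job's submitit log files into its Hydra working directory.

    The glob pattern below is a guess at how the log files are named;
    adjust it to whatever submitit actually writes for your version.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    for log_file in submitit_folder.glob(f"{job_id}*"):
        target = workdir / log_file.name
        if not target.exists():
            target.symlink_to(log_file.resolve())
```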