Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Submitit] Restructure the plugin configuration #682

Merged
merged 18 commits into from
Jun 22, 2020
Original file line number Diff line number Diff line change
@@ -1,24 +1,17 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from dataclasses import dataclass
from enum import Enum
from typing import Optional

from hydra.core.config_store import ConfigStore
from hydra.types import ObjectConf


class ExecutorName(Enum):
local = "local"
slurm = "slurm"


@dataclass
class BaseParams:
"""Configuration shared by all executors
"""

executor: ExecutorName = ExecutorName.slurm # can we get rid of this?
submitit_folder: str = "${hydra.sweep.dir}/.${hydra.launcher.params.executor.value}"
submitit_folder: str = "${hydra.sweep.dir}/.submitit/%j"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's talk about this for a moment:
With Hydra, each job is getting a working directory inside the sweep dir.

If we can keep all the files for the job in that directory it would be really good (users will have an easier time debugging issues for a particular job).
This may take some submitit changes, but is something to consider for the future.

Right now, if a job is failing the user will need to:

  1. Look at the log file in the job output directory.
  2. Look in the submitit output folder, figure out which files belong to the job that caused the issue somehow and check the output/error files.

If we managed to use the working directory here it would be better.
I suspect currently this is not possible without changes to submitit.

An alternative solution to this problem is to symlink the stdout and stderr from the submitit folder into the job working directory. But this can also be tricky.

This is not a feature request for this PR, but something to think about to improve the user experience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can keep all the files for the job in that directory it would be really good

Indeed (for now this is the same behavior as before, but I agree it is not convenient)

I suspect currently this is not possible without changes to submitit.

It would actually be possible if not using arrays, but that's not a good option. With arrays on the other hand we can't specify the exact folder of each run individually, it must be deduced from the parameters of the array. Hacking something is somehow possible but would require a lot of (mostly unhelpful) changes in submitit and would not be robust anyway.

The other option would be that hydra uses the folder provided by submitit (using the job_id instead of 0, 1, 2 etc). But that's probably not something you'd want either, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user can customize the sweep subdirectory through Hydra.
So yeah, I can't really use the submitit generated dir.


# maximum time for the job in minutes
timeout_min: int = 60
omry marked this conversation as resolved.
Show resolved Hide resolved
Expand Down Expand Up @@ -67,18 +60,18 @@ class SlurmParams(BaseParams):

@dataclass
class LocalParams(BaseParams):
executor: ExecutorName = ExecutorName.local # can we get rid of this?
pass


@dataclass
class SlurmConf(ObjectConf):
cls: str = "hydra_plugins.hydra_submitit_launcher.submitit_launcher.SubmititLauncher"
cls: str = "hydra_plugins.hydra_submitit_launcher.submitit_launcher.SlurmSubmititLauncher"
params: SlurmParams = SlurmParams()


@dataclass
class LocalConf(ObjectConf):
cls: str = "hydra_plugins.hydra_submitit_launcher.submitit_launcher.SubmititLauncher"
cls: str = "hydra_plugins.hydra_submitit_launcher.submitit_launcher.LocalSubmititLauncher"
params: LocalParams = LocalParams()


Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -34,13 +34,16 @@ def manipulate_search_path(self, search_path: ConfigSearchPath) -> None:
)


class SubmititLauncher(Launcher):
class LocalSubmititLauncher(Launcher):

_EXECUTOR = "local"

def __init__(self, **params: Any) -> None:
self.params = OmegaConf.structured(
LocalParams if params["executor"].value == "local" else SlurmParams
LocalParams if self._EXECUTOR == "local" else SlurmParams
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a base class to express that the slurm launcher is not REALLY a local launcher.

# your current local:
class BaseSubmititLauncher(Launcher):
  def _get_params_class() -> Type[BaseParams]:
     raise NotImplementedError

  ... # the rest is the same.

class LocalSubmititLauncher(BaseSubmititLauncher)
  def _get_params_class() -> Type[BaseParams]:
     return LocalParams 

class SlurmSubmititLauncher(BaseSubmititLauncher)
  def _get_params_class() -> Type[BaseParams]:
     return SlurmParams

You can go back to a DictConfig with something like this (and now you don't need the update loop)

self.params = OmegaConf.structured(self._get_params_class(**params))

for key, val in params.items():
OmegaConf.update(self.params, key, val if key != "executor" else val.value)
OmegaConf.update(self.params, key, val)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised that worked in the previous version when you use the actual object and not a DictConfig.

regardless, you can get rid of this loop with the example I gave you above.

self.config: Optional[DictConfig] = None
self.config_loader: Optional[ConfigLoader] = None
self.task_function: Optional[TaskFunction] = None
Expand Down Expand Up @@ -100,37 +103,36 @@ def launch(

num_jobs = len(job_overrides)
assert num_jobs > 0
exec_name = self.params.executor.value
params = self.params

# build executor
init_renamer = dict(executor="cluster", submitit_folder="folder")
init_renamer = dict(submitit_folder="folder")
init_params = {name: params[k] for k, name in init_renamer.items()}
init_params = {
k: v.value if isinstance(v, Enum) else v for k, v in init_params.items()
}
specific_init_keys = {"max_num_timeout"}
init_params.update(
**{
f"{exec_name}_{x}": y
f"{self._EXECUTOR}_{x}": y
for x, y in params.items()
if x in specific_init_keys
}
)
init_keys = specific_init_keys | set(init_renamer) # used config keys
executor = submitit.AutoExecutor(**init_params)
executor = submitit.AutoExecutor(cluster=self._EXECUTOR, **init_params)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is confusing to me:

The auto executor is expecting the strings "local" or "slurm" as options for cluster?


# specify resources/parameters
baseparams = set(dataclasses.asdict(BaseParams()).keys())
params = {
x if x in baseparams else f"{exec_name}_{x}": y
x if x in baseparams else f"{self._EXECUTOR}_{x}": y
for x, y in params.items()
if x not in init_keys
}
executor.update_parameters(**params)

log.info(
f"Submitit '{exec_name}' sweep output dir : "
f"Submitit '{self._EXECUTOR}' sweep output dir : "
f"{self.config.hydra.sweep.dir}"
)
sweep_dir = Path(str(self.config.hydra.sweep.dir))
omry marked this conversation as resolved.
Show resolved Hide resolved
Expand All @@ -157,3 +159,8 @@ def launch(

jobs = executor.map_array(self, *zip(*params))
return [j.results()[0] for j in jobs]


class SlurmSubmititLauncher(LocalSubmititLauncher):

_EXECUTOR = "slurm"
Original file line number Diff line number Diff line change
Expand Up @@ -8,14 +8,17 @@
)
from hydra.test_utils.test_utils import chdir_plugin_root

from hydra_plugins.hydra_submitit_launcher.submitit_launcher import SubmititLauncher
from hydra_plugins.hydra_submitit_launcher import submitit_launcher

chdir_plugin_root()


def test_discovery():
# Tests that this plugin can be discovered via the plugins subsystem when looking for Launchers
assert SubmititLauncher.__name__ in [
assert submitit_launcher.LocalSubmititLauncher.__name__ in [
x.__name__ for x in Plugins.instance().discover(Launcher)
]
assert submitit_launcher.SlurmSubmititLauncher.__name__ in [
x.__name__ for x in Plugins.instance().discover(Launcher)
]

Expand Down