Hi,
I have run into an issue when running Ray Tune experiments together with ACME distributed SAC training and the ASHA scheduler. The idea behind the ASHA scheduler is that it terminates underperforming trials early in order to find good hyperparameters faster. When the single-process experiment is used, no hanging processes are left behind after Ray terminates a trial. Here is an example of how the job is started:
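Roughly like this (a minimal sketch rather than my exact code; `make_sac_experiment_config` is a placeholder for my own helper, and the exact ACME APIs may differ between versions):

```python
import ray
from ray import tune

from acme.jax import experiments  # assumed import path; may differ per ACME version


def single_process_trainable(config):
    # Build the ACME experiment config from the sampled hyperparameters.
    # `make_sac_experiment_config` is a placeholder for my own helper.
    experiment_config = make_sac_experiment_config(
        learning_rate=config["learning_rate"],
    )
    # Everything runs inside the current trial process, so when Ray stops the
    # trial, the whole experiment dies with it -- no leftover processes.
    # (Metric reporting back to Tune is omitted here for brevity.)
    experiments.run_experiment(experiment=experiment_config)


tune.run(
    single_process_trainable,
    config={"learning_rate": tune.loguniform(1e-4, 1e-2)},
)
```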
In contrast, when the job is started in a multi-processing way, the termination of the trial by Ray does not affect the processes that Launchpad has spawned. What happens is that the ACME training job keeps running instead of being terminated. Here is an example of how the job function and the Ray Tune config are defined:
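Again a simplified sketch rather than my exact setup; `make_sac_experiment_config`, the `episode_return` metric name, and the resource numbers are placeholders:

```python
import launchpad as lp
import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler

from acme.jax import experiments  # assumed import path; may differ per ACME version


def distributed_trainable(config):
    # Placeholder helper that builds the experiment from sampled hyperparameters.
    experiment_config = make_sac_experiment_config(
        learning_rate=config["learning_rate"],
    )
    program = experiments.make_distributed_experiment(
        experiment=experiment_config,
        num_actors=4,
    )
    # Launchpad spawns separate worker processes here; these are the processes
    # that keep running after Ray terminates the trial.
    lp.launch(program, launch_type=lp.LaunchType.LOCAL_MULTI_PROCESSING)


tune.run(
    distributed_trainable,
    config={"learning_rate": tune.loguniform(1e-4, 1e-2)},
    scheduler=ASHAScheduler(metric="episode_return", mode="max"),
    resources_per_trial={"cpu": 8},
)
```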
My question: is there a way to somehow forward the termination signal from Ray (when it terminates its trial) to all the node processes?
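For illustration, this is roughly the kind of cleanup I am hoping is possible (purely hypothetical; it assumes Ray delivers SIGTERM to the trial process when stopping it and that the Launchpad workers share the trial's process group, neither of which I have verified):

```python
import os
import signal


def _terminate_children(signum, frame):
    # Restore the default handler so the kill below does not re-enter this one.
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    # Forward the signal to the whole process group (assuming the Launchpad
    # workers ended up in the same group as the trial process).
    os.killpg(os.getpgid(os.getpid()), signal.SIGTERM)


# Installed inside the trainable before launching the Launchpad program.
signal.signal(signal.SIGTERM, _terminate_children)
```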
Thank you in advance.