[Tune] let categorical values return indices that get resolved in a separate step #31927

Merged
merged 34 commits into from
Feb 8, 2023

Conversation

gjoliver
Member

Signed-off-by: Jun Gong jungong@anyscale.com

Why are these changes needed?

This is so we can replace the reference table after trial restoration.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [*] Unit tests
    • Release tests
    • This PR is not tested :(

@gjoliver gjoliver force-pushed the resolve_by_references branch 2 times, most recently from a3f95ef to 073905c Compare January 25, 2023 17:22
Contributor

@krfricke krfricke left a comment

Hm, this is currently specific to the basic variant generator (which might be ok), but I have a slightly different approach to discuss:

Currently:

  • Categorical samples indices instead of values
  • Categorical resolves values in a separate API call

Why?

  • Because we actually pre-generate all samples in the _VariantIterator

Problems:

  • This is not easily generalizable to other searchers
  • It's not very easy to overwrite the configs for existing trials, as the Trial.config objects will still have the actual object in them and not the index.

Instead, maybe:

  • Scan parameter space for any Categoricals. Replace Categorical.categories with a same-shape list of _Placeholder<path, index> objects
  • _Placeholder objects could include the string representation of the object (e.g. for the trial table)
  • We can choose to only replace objects of types we care about (e.g. Datasets, object refs), and not for primitives
  • Create a map of placeholders to "real" categoricals, e.g. placeholder_to_obj: Dict[_Placeholder, Any]
  • Remember which keys had categoricals, e.g. categorical_keys: Set[Tuple[str, ...]]
  • After sampling, replace every sampled placeholder object with the respective object from placeholder_to_obj
  • On restore, we only need to update placeholder_to_obj

Benefits:

  • This won't need any adjustment to the sampling of Categoricals, and it can be wrapped around any searcher.suggest() call. So it generalizes to every searcher.
  • Also we can just use the placeholders to update Trial.config objects post restore

For functions, we could either build the replacement map on the fly or just not support it for restoration. I think not supporting it is actually ok.

What do you think?
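The proposal above can be sketched roughly as follows. This is a minimal illustration, not the code that was merged: `Placeholder`, `inject_placeholders`, and `resolve` are hypothetical names, a plain Python list stands in for a Categorical domain, and "types we care about" is approximated as "anything that is not a primitive".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

Path = Tuple[str, ...]


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for one category at a given config path."""

    path: Path
    index: int
    # String form of the real object (e.g. for the trial table).
    # Excluded from equality/hash so a reloaded object maps to the same key.
    label: str = field(default="", compare=False)


def _is_primitive(value: Any) -> bool:
    return isinstance(value, (int, float, str, bool, type(None)))


def inject_placeholders(space: Dict[str, Any], prefix: Path = ()) -> Dict[Placeholder, Any]:
    """Replace non-primitive categories with placeholders, in place.

    Assumption: a list value plays the role of a Categorical domain.
    Returns the placeholder -> real object map.
    """
    placeholder_to_obj: Dict[Placeholder, Any] = {}
    for key, value in space.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            placeholder_to_obj.update(inject_placeholders(value, path))
        elif isinstance(value, list):
            for i, category in enumerate(value):
                if not _is_primitive(category):
                    placeholder = Placeholder(path, i, repr(category))
                    placeholder_to_obj[placeholder] = category
                    value[i] = placeholder
    return placeholder_to_obj


def resolve(config: Dict[str, Any], placeholder_to_obj: Dict[Placeholder, Any]) -> Dict[str, Any]:
    """Return a copy of a sampled config with placeholders swapped for real objects."""
    resolved: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            resolved[key] = resolve(value, placeholder_to_obj)
        elif isinstance(value, Placeholder):
            resolved[key] = placeholder_to_obj[value]
        else:
            resolved[key] = value
    return resolved
```

Under these assumptions, a searcher only ever sees placeholders, and restoring a run amounts to rebuilding `placeholder_to_obj` from the new objects and calling `resolve` again on the stored configs.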

@gjoliver gjoliver force-pushed the resolve_by_references branch from 073905c to af13931 Compare January 29, 2023 22:55
Contributor

@justinvyu justinvyu left a comment

Looks good to me, only a few suggestions.

python/ray/tune/execution/trial_runner.py
@@ -20,6 +20,7 @@
from ray.util import get_node_ip_address
from ray.tune import TuneError
from ray.tune.callback import CallbackList, Callback
from ray.tune.search.placeholder import resolve_placeholders
Contributor

Should we make this a Tune internal package? Users cannot interact with this at all.

python/ray/tune/search/placeholder.py
python/ray/tune/execution/trial_runner.py

Args:
spec: The spec to replace references in.
replacements: A dict from path to replaced objects.
Contributor

Nit: Would it be nicer to create this on the fly and return it so we don't need to pass an empty one in?

Member Author

I think we should keep it a "global container" that gets passed everywhere and collects useful bits.
Creating this on the fly and returning it would mean that we'd be merging a bunch of things in our code, no?
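The two styles being weighed here can be contrasted with a small sketch. Both functions are hypothetical and only illustrate the trade-off: the criterion "record every non-primitive value" is an assumption made to keep the example short, not the PR's actual rule.

```python
from typing import Any, Dict, Tuple

Path = Tuple[str, ...]
_PRIMITIVES = (int, float, str, bool, type(None))


def collect_into(spec: Dict[str, Any], replacements: Dict[Path, Any], prefix: Path = ()) -> None:
    """Style A: the caller owns one dict and every recursive call writes into it."""
    for key, value in spec.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            collect_into(value, replacements, path)
        elif not isinstance(value, _PRIMITIVES):
            replacements[path] = value


def collect_and_return(spec: Dict[str, Any], prefix: Path = ()) -> Dict[Path, Any]:
    """Style B: each call builds its own dict and parents merge the children's results."""
    found: Dict[Path, Any] = {}
    for key, value in spec.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            found.update(collect_and_return(value, path))  # the merge step
        elif not isinstance(value, _PRIMITIVES):
            found[path] = value
    return found
```

Style A needs the caller to pass in an empty dict but never merges; style B has a cleaner signature at the cost of one `update` per nested level.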

Contributor

@krfricke krfricke left a comment

Awesome, even better than I imagined! Couple of nits

@gjoliver
Member Author

Addressed all the comments. PTAL.

Contributor

@justinvyu justinvyu left a comment

Looks good to me!

Contributor

@krfricke krfricke left a comment

Looks good to me. One last question I have is regarding tune.run_experiments. Do I see it correctly that in that case we'll just not use any resolution and use the old way (because placeholder_resolvers is unset)?

Another quick question about the spec vs. trial.config: I'd prefer to continue using trial.config if possible.

resolve_placeholders(trial.config, self._replaced_ref_map)
if self._placeholder_resolvers:
    # Construct the full experiment spec for resolution.
    spec = self._spec or {}
Contributor

Why are we updating the whole spec and not just the config?

Contributor

Or I guess, which other elements from the spec do we need? Seems like we're not using the rest of the spec here. If that's the case can we just go back to resolving trial.config - we're overwriting the spec["config"] with it anyways.

Member Author

I don't want to use spec either, man. But a Function taking spec is our public API ... the following is in our documentation:

"beta": tune.sample_from(lambda spec: spec.config.alpha * np.random.normal()),

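To see why the resolver has to hand over a spec-shaped object rather than the bare config, here is a small sketch. It does not use Tune itself: `SimpleNamespace` is only a stand-in for whatever spec type Tune passes, and `resolve_dependent` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


def resolve_dependent(
    config: Dict[str, Any],
    samplers: Dict[str, Callable[[Any], Any]],
) -> Dict[str, Any]:
    """Evaluate spec-taking callables against an already-resolved config.

    The user's lambda reads `spec.config.<name>`, so the config has to be
    wrapped in an object exposing a `config` attribute with attribute access.
    """
    spec = SimpleNamespace(config=SimpleNamespace(**config))
    resolved = dict(config)
    for key, sampler in samplers.items():
        resolved[key] = sampler(spec)
    return resolved


# Mirrors the documented pattern, with a deterministic factor instead of noise.
samplers = {"beta": lambda spec: spec.config.alpha * 2}
result = resolve_dependent({"alpha": 3}, samplers)
```

Passing only `trial.config` (a plain dict) to such a lambda would fail on `spec.config`, which is the compatibility constraint described in the comment above.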
Contributor

@krfricke krfricke left a comment

We're almost there. I think a few tests are failing right now

@@ -417,6 +398,41 @@ def __init__(
self._state_json = None
self._state_valid = False

def create_placement_group_factory(self):
Contributor

I'm wondering if we should call this implicitly when Trial.placement_group_factory is accessed. But it's also ok to keep this for now.

Member Author

good idea. added a TODO for now.
if there are tests failing because of this, I will turn it into a getter.

Contributor

I think we should do this in any case (turn into getter and create PGF on first call). Otherwise it increases the complexity of the Trial class
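The "getter that creates the PGF on first call" idea could look roughly like this. The class below is a toy stand-in, not the real Trial: `_make_pgf` is a hypothetical callable that builds the placement group factory.

```python
from typing import Any, Callable, Optional


class Trial:
    """Toy stand-in for the real Trial class; only the lazy getter is shown."""

    def __init__(self, make_pgf: Callable[[], Any]):
        self._make_pgf = make_pgf  # hypothetical builder for the factory
        self._placement_group_factory: Optional[Any] = None

    @property
    def placement_group_factory(self) -> Any:
        # Build the factory on first access, then reuse the cached object.
        if self._placement_group_factory is None:
            self._placement_group_factory = self._make_pgf()
        return self._placement_group_factory
```

With this shape, callers never need to remember to call an explicit `create_placement_group_factory()` before reading the attribute.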

self.config = config or {}
# Save a copy of the original unresolved config so that we can swap
# out and update any reference config values after restoration.
self.__unresolved_config = self.config
Contributor

OOC, why double underscore?

Member Author

Very private. Nobody should touch or have access to this variable outside of Trial. Under the hood, Python mangles the name, so from outside the class it is only reachable as _Trial__unresolved_config.
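A short demonstration of what the double underscore does. The class is a cut-down stand-in for Trial, and the `unresolved` accessor is added only for the example.

```python
class Trial:
    """Cut-down stand-in for Trial, just enough to show name mangling."""

    def __init__(self, config=None):
        self.config = config or {}
        # Inside the class body this is stored as `_Trial__unresolved_config`.
        self.__unresolved_config = self.config

    def unresolved(self):
        # Works: the name is mangled the same way inside the class.
        return self.__unresolved_config


trial = Trial({"a": 1})
attribute_names = sorted(vars(trial))
```

Mangling is a convention enforced by renaming, not real access control: the attribute is hidden from casual access and from subclasses that reuse the name, but it can still be read through the mangled name.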

Comment on lines 325 to 326
"test": tune.grid_search([1, 2, 3]),
"test2": tune.grid_search([1, 2, 3]),
Contributor

Nit for the test, can we use different parameter ranges for the different parameters? E.g.

Suggested change
- "test": tune.grid_search([1, 2, 3]),
- "test2": tune.grid_search([1, 2, 3]),
+ "test": tune.grid_search([1, 2, 3]),
+ "test2": tune.grid_search([4, 5, 6, 7]),

and overwrite with different values as well. Basically to make sure that we don't conflate parameter overwrites.

Member Author

done

@justinvyu
Contributor

I think here and here can be removed. It was added before to update trial resources that were updated on restore. But now, create_placement_group_factory gets called on add trial which will do the same thing.

@gjoliver
Member Author

gjoliver commented Feb 1, 2023

I think here and here can be removed. It was added before to update trial resources that were updated on restore. But now, create_placement_group_factory gets called on add trial which will do the same thing.

ok, let me remove these.

Contributor

@krfricke krfricke left a comment

This looks good to me.

I think you just have to update the tune/BUILD file and fix one param space error in TunerInternal.

Feel free to merge when CI passes.

Thanks!

@gjoliver gjoliver force-pushed the resolve_by_references branch from 3504276 to 584a88f Compare February 1, 2023 11:52
Contributor

@krfricke krfricke left a comment

Love it. Thank you so much!

Contributor

@justinvyu justinvyu left a comment

Looks good. A few suggestions:

python/ray/tune/impl/placeholder.py
Comment on lines +206 to +203
elif key < len(config):
return _get_placeholder(
config[key], prefix=prefix + (path[0],), path=path[1:]
)
Contributor

Nit: Can we move this regular tuple case up to the first condition?

Something like:

if is_placeholder(config):
    return prefix, config

if list, dict, or tuple:
    recurse
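A runnable reading of the ordering suggested above, checking for a placeholder first and then recursing into containers. This is a sketch only: `Placeholder`, `is_placeholder`, and `get_placeholder` are hypothetical stand-ins, and the `(None, None)` return for a missing path is an assumption.

```python
from typing import Any, Optional, Tuple


class Placeholder:
    """Hypothetical marker type standing in for the PR's placeholder classes."""


def is_placeholder(obj: Any) -> bool:
    return isinstance(obj, Placeholder)


def get_placeholder(
    config: Any, prefix: Tuple, path: Tuple
) -> Tuple[Optional[Tuple], Optional[Placeholder]]:
    """Walk `path` through `config`; return (path_so_far, placeholder) at the first hit."""
    # 1. Placeholder check comes first, regardless of how much path is left.
    if is_placeholder(config):
        return prefix, config
    if not path:
        return None, None

    # 2. One recursion step shared by dicts, lists, and tuples.
    key = path[0]
    if isinstance(config, dict):
        found = key in config
    elif isinstance(config, (list, tuple)):
        found = isinstance(key, int) and 0 <= key < len(config)
    else:
        found = False

    if found:
        return get_placeholder(config[key], prefix + (key,), path[1:])
    return None, None
```

Putting the placeholder check first means the container branches no longer need their own special cases for "the value at this key is itself a placeholder".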

Member Author

done

# Represents an unchosen value. Just skip.
continue

for resolver in resolvers:
Contributor

Can we just make resolvers a hash map, where key is resolver.hash? Currently we have linear search with respect to the search space size, which can be huge.

Member Author

I think the list of options shouldn't be long man. this is probably fine, and looks simpler.
thanks.
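For reference, the two lookup strategies discussed here differ as sketched below. `Resolver`, its `hash` attribute, and both functions are hypothetical; the real resolver objects in the PR may carry different fields.

```python
from typing import Any, Dict, List, Optional


class Resolver:
    """Hypothetical resolver: `hash` identifies a placeholder, `value` is the real object."""

    def __init__(self, hash_id: str, value: Any):
        self.hash = hash_id
        self.value = value


def resolve_linear(resolvers: List[Resolver], wanted: str) -> Optional[Any]:
    """Scan the list: simple, and fine while the number of options stays small."""
    for resolver in resolvers:
        if resolver.hash == wanted:
            return resolver.value
    return None


def resolve_indexed(index: Dict[str, Resolver], wanted: str) -> Optional[Any]:
    """Look up by hash: constant time per lookup, at the cost of building the index."""
    resolver = index.get(wanted)
    return resolver.value if resolver is not None else None


resolvers = [Resolver("h1", "dataset-a"), Resolver("h2", "dataset-b")]
index = {resolver.hash: resolver for resolver in resolvers}
```

Both return the same results; the difference only matters when a single parameter has a very large number of categories.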

@@ -526,21 +542,21 @@ def get_sampler(self):
def sample(
self,
domain: Domain,
spec: Optional[Union[List[Dict], Dict]] = None,
config: Optional[Union[List[Dict], Dict]] = None,
Contributor

Is this type correct? Should just be Dict?

Member Author

it does work for list of dicts too though. doesn't have to be a single dict.

Jun Gong and others added 9 commits February 6, 2023 15:50
Signed-off-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Jun Gong and others added 15 commits February 6, 2023 15:50
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
fix
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
@gjoliver gjoliver force-pushed the resolve_by_references branch from d5bc52e to 247d5f0 Compare February 6, 2023 23:50
ci
Signed-off-by: Jun Gong <jungong@anyscale.com>
@gjoliver gjoliver merged commit befad81 into ray-project:master Feb 8, 2023
krfricke pushed a commit that referenced this pull request Feb 14, 2023
This PR adds a `Tuner.restore(param_space=...)` argument. This allows object refs to be updated if used in the original run.

This is a follow-up to #31927

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
@gjoliver gjoliver deleted the resolve_by_references branch February 17, 2023 06:50
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…ay-project#31927)

Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…t#32317)

This PR adds a `Tuner.restore(param_space=...)` argument. This allows object refs to be updated if used in the original run.

This is a follow-up to ray-project#31927

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…t#32317)

This PR adds a `Tuner.restore(param_space=...)` argument. This allows object refs to be updated if used in the original run.

This is a follow-up to ray-project#31927

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: elliottower <elliot@elliottower.com>