-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tune] let categorical values return indices that get resolved in a separate step #31927
Conversation
a3f95ef
to
073905c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this is currently specific to the basic variant generator (which might be ok), but I have a slightly different approach to discuss:
Currently:
- Categorical samples indices instead of values
- Categorical resolves values in a separate API call
Why?
- Because we actually pre-generate all samples in the
_VariantIterator
Problems:
- This is not easily generalizable to other searchers
- It's not very easy to overwrite the configs for existing trials, as the Trial.config objects will still have the actual object in them and not the index.
Instead, maybe:
- Scan parameter space for any Categoricals. Replace Categorical.categories with a same-shape list of
_Placeholder<path, index>
objects - _Placeholder objects could include the string representation of the object (e.g. for the trial table)
- We can choose to only replace objects of types we care about (e.g. Datasets, object refs), and not for primitives
- Create a map of placeholders to "real" categoricals, e.g.
placeholder_to_obj: Dict[_Placeholder, Any]
- Remember which keys had categoricals, e.g.
categorical_keys: Set[Tuple[str, ...]
- After sampling, replace every sampled placeholder object with the respective object from
placeholder_to_obj
- On restore, we only need to update
placeholder_to_obj
Benefits:
- This won't need any adjustment to the sampling of Categoricals, and it can be wrapped around any searcher.suggest() call. So it generealizes to every searcher.
- Also we can just use the placeholders to update Trial.config objects post restore
For functions, we could either build the replacement map on the fly or just not support it for restoration. I think not supporting it is actually ok.
What do you think?
073905c
to
af13931
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, only a few suggestions.
@@ -20,6 +20,7 @@ | |||
from ray.util import get_node_ip_address | |||
from ray.tune import TuneError | |||
from ray.tune.callback import CallbackList, Callback | |||
from ray.tune.search.placeholder import resolve_placeholders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make this a Tune internal package? Users cannot interact with this at all.
|
||
Args: | ||
spec: The spec to replace references in. | ||
replacements: A dict from path to replaced objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Would it be nicer to create this on the fly and return it so we don't need to pass an empty one in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep it a "global container" that gets passed everywhere and collecting useful bits.
creating this on the fly and return it would mean that we will be merging a bunch of things in our code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, even better than I imagined! Couple of nits
Addressed all the comments. PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, One last question I have is regarding to tune.run_experiments
. Do I see it correctly that in that case we'll just not use any resolution and use the old way (because placeholder_resolvers is unset)?
Another quick question about the spec vs. trial.config - I'd prefer continue using trial.config if possible
resolve_placeholders(trial.config, self._replaced_ref_map) | ||
if self._placeholder_resolvers: | ||
# Construct the full experiment spec for resolution. | ||
spec = self._spec or {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we updating the whole spec and not just the config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or I guess, which other elements from the spec do we need? Seems like we're not using the rest of the spec here. If that's the case can we just go back to resolving trial.config
- we're overwriting the spec["config"]
with it anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to use spec either man. but Function taking spec is our public API ... the following is in our documentation.
"beta": tune.sample_from(lambda spec: spec.config.alpha * np.random.normal()),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're almost there. I think a few tests are failing right now
@@ -417,6 +398,41 @@ def __init__( | |||
self._state_json = None | |||
self._state_valid = False | |||
|
|||
def create_placement_group_factory(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should call this implicitly when Trial.placement_group_factory
. But also ok to keep this for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea. added a TODO for now.
if there are tests failing because of this, I will turn it into a getter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do this in any case (turn into getter and create PGF on first call). Otherwise it increases the complexity of the Trial class
self.config = config or {} | ||
# Save a copy of the original unresolved config so that we can swap | ||
# out and update any reference config values after restoration. | ||
self.__unresolved_config = self.config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOC, why double underscore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very private. nobody should touch or have access to this variable outside of Trial. under the hood, python replace this variable with a name classname__variablename__.
"test": tune.grid_search([1, 2, 3]), | ||
"test2": tune.grid_search([1, 2, 3]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit for the test, can we use different parameter ranges for the different parameters? E.g.
"test": tune.grid_search([1, 2, 3]), | |
"test2": tune.grid_search([1, 2, 3]), | |
"test": tune.grid_search([1, 2, 3]), | |
"test2": tune.grid_search([4, 5, 6, 7]), |
and overwrite with different values as well. Basically to make sure that we we don't conflate parameter overwrites
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
I think you just have to update the tune/BUILD
file ad fix one param space error in TunerInternal.
Feel free to merge when CI passes.
Thanks!
3504276
to
584a88f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it. Thank you so much!
@@ -417,6 +398,41 @@ def __init__( | |||
self._state_json = None | |||
self._state_valid = False | |||
|
|||
def create_placement_group_factory(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do this in any case (turn into getter and create PGF on first call). Otherwise it increases the complexity of the Trial class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. A few suggestions:
elif key < len(config): | ||
return _get_placeholder( | ||
config[key], prefix=prefix + (path[0],), path=path[1:] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can we move this regular tuple case up to the first condition?
Something like:
if is_placeholder(config):
return prefix, config
if list, dict, or tuple:
recurse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
# Represents an unchosen value. Just skip. | ||
continue | ||
|
||
for resolver in resolvers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just make resolvers
a hash map, where key is resolver.hash
? Currently we have linear search with respect to the search space size, which can be huge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the list of options shouldn't be long man. this is probably fine, and looks simpler.
thanks.
@@ -526,21 +542,21 @@ def get_sampler(self): | |||
def sample( | |||
self, | |||
domain: Domain, | |||
spec: Optional[Union[List[Dict], Dict]] = None, | |||
config: Optional[Union[List[Dict], Dict]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this type correct? Should just be Dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does work for list of dicts too though. doesn't have to be a single dict.
Signed-off-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com> Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com> Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com> Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Signed-off-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com> Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com> Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
d5bc52e
to
247d5f0
Compare
This PR adds a `Tuner.restore(param_space=...)` argument. This allows object refs to be updated if used in the original run. This is a follow-up to #31927 Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
…ay-project#31927) Signed-off-by: Jun Gong <gongjunoliver@hotmail.com> Co-authored-by: Justin Yu <justinvyu@anyscale.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com> Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
…t#32317) This PR adds a `Tuner.restore(param_space=...)` argument. This allows object refs to be updated if used in the original run. This is a follow-up to ray-project#31927 Signed-off-by: Justin Yu <justinvyu@berkeley.edu> Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
…t#32317) This PR adds a `Tuner.restore(param_space=...)` argument. This allows object refs to be updated if used in the original run. This is a follow-up to ray-project#31927 Signed-off-by: Justin Yu <justinvyu@berkeley.edu> Signed-off-by: elliottower <elliot@elliottower.com>
Signed-off-by: Jun Gong jungong@anyscale.com
Why are these changes needed?
This is so we can replace the reference table after trial restoration.
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.