[Tune] let categorical values return indices that get resolved in a separate step #31927

Merged

gjoliver merged 34 commits into ray-project:master from gjoliver:resolve_by_references

Feb 8, 2023

Member

gjoliver commented Jan 25, 2023

Signed-off-by: Jun Gong jungong@anyscale.com

Why are these changes needed?

This is so we can replace the reference table after trial restoration.

Related issue number

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- [*] Unit tests
- Release tests
- This PR is not tested :(

gjoliver requested review from krfricke and justinvyu

January 25, 2023 17:13

gjoliver assigned justinvyu and krfricke

gjoliver force-pushed the resolve_by_references branch 2 times, most recently from a3f95ef to 073905c Compare

January 25, 2023 17:22

justinvyu reviewed

View reviewed changes

python/ray/tune/search/basic_variant.py Outdated Show resolved Hide resolved

python/ray/tune/search/basic_variant.py Outdated Show resolved Hide resolved

python/ray/tune/search/basic_variant.py Outdated Show resolved Hide resolved

python/ray/tune/search/sample.py Outdated Show resolved Hide resolved

python/ray/tune/tests/test_basic_variant.py Outdated Show resolved Hide resolved

krfricke reviewed

View reviewed changes

Contributor

krfricke left a comment

Hm, this is currently specific to the basic variant generator (which might be ok), but I have a slightly different approach to discuss:

Currently:

Categorical samples indices instead of values
Categorical resolves values in a separate API call

Why?

Because we actually pre-generate all samples in the _VariantIterator

Problems:

This is not easily generalizable to other searchers
It's not very easy to overwrite the configs for existing trials, as the Trial.config objects will still have the actual object in them and not the index.

Instead, maybe:

Scan parameter space for any Categoricals. Replace Categorical.categories with a same-shape list of _Placeholder<path, index> objects
_Placeholder objects could include the string representation of the object (e.g. for the trial table)
We can choose to only replace objects of types we care about (e.g. Datasets, object refs), and not for primitives
Create a map of placeholders to "real" categoricals, e.g. placeholder_to_obj: Dict[_Placeholder, Any]
Remember which keys had categoricals, e.g. categorical_keys: Set[Tuple[str, ...]]
After sampling, replace every sampled placeholder object with the respective object from placeholder_to_obj
On restore, we only need to update placeholder_to_obj

Benefits:

This won't need any adjustment to the sampling of Categoricals, and it can be wrapped around any searcher.suggest() call, so it generalizes to every searcher.
Also we can just use the placeholders to update Trial.config objects post restore

For functions, we could either build the replacement map on the fly or just not support it for restoration. I think not supporting it is actually ok.

What do you think?
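The placeholder proposal above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code merged in this PR: `Placeholder`, `Categorical`, `inject_placeholders`, `sample`, and `resolve` are hypothetical names, and the toy `sample` function stands in for any searcher's `suggest()` call.

```python
import random
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for one option of a categorical parameter."""

    path: Tuple[str, ...]
    index: int
    label: str  # string form of the real object, e.g. for a trial table


class Categorical:
    """Toy categorical domain (not Ray Tune's class)."""

    def __init__(self, categories):
        self.categories = list(categories)


def inject_placeholders(space: Dict[str, Any], placeholder_to_obj: Dict[Placeholder, Any], path=()):
    """Swap each categorical option for a placeholder and record the real object."""
    for key, value in space.items():
        key_path = path + (key,)
        if isinstance(value, dict):
            inject_placeholders(value, placeholder_to_obj, key_path)
        elif isinstance(value, Categorical):
            placeholders = []
            for index, obj in enumerate(value.categories):
                placeholder = Placeholder(key_path, index, repr(obj))
                placeholder_to_obj[placeholder] = obj
                placeholders.append(placeholder)
            value.categories = placeholders


def sample(space, rng):
    """Stand-in for any searcher: it only ever sees placeholders."""
    config = {}
    for key, value in space.items():
        if isinstance(value, dict):
            config[key] = sample(value, rng)
        elif isinstance(value, Categorical):
            config[key] = rng.choice(value.categories)
        else:
            config[key] = value
    return config


def resolve(config, placeholder_to_obj):
    """Return a copy of config with every placeholder replaced by its object."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, dict):
            resolved[key] = resolve(value, placeholder_to_obj)
        elif isinstance(value, Placeholder):
            resolved[key] = placeholder_to_obj[value]
        else:
            resolved[key] = value
    return resolved


space = {"lr": 0.1, "train": {"dataset": Categorical([["old_a"], ["old_b"]])}}
placeholder_to_obj = {}
inject_placeholders(space, placeholder_to_obj)

unresolved = sample(space, random.Random(0))
first_run = resolve(unresolved, placeholder_to_obj)

# On restore, only the map changes; the sampled (unresolved) config is reused.
for placeholder in placeholder_to_obj:
    placeholder_to_obj[placeholder] = [f"new_{placeholder.index}"]
restored = resolve(unresolved, placeholder_to_obj)
```

Because the sampler never sees the real objects, nothing about sampling has to change, and a restore only needs a fresh `placeholder_to_obj` map.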

python/ray/tune/impl/tuner_internal.py Outdated Show resolved Hide resolved

python/ray/tune/search/sample.py Outdated Show resolved Hide resolved

python/ray/tune/search/sample.py Outdated Show resolved Hide resolved

python/ray/tune/search/sample.py Outdated Show resolved Hide resolved

gjoliver force-pushed the resolve_by_references branch from 073905c to af13931 Compare

January 29, 2023 22:55

justinvyu reviewed

View reviewed changes

Contributor

justinvyu left a comment

Looks good to me, only a few suggestions.

python/ray/tune/execution/trial_runner.py Outdated Show resolved Hide resolved

python/ray/tune/execution/trial_runner.py Outdated

    
    @@ -20,6 +20,7 @@
    from ray.util import get_node_ip_address
    from ray.tune import TuneError
    from ray.tune.callback import CallbackList, Callback
    from ray.tune.search.placeholder import resolve_placeholders

Contributor

justinvyu Jan 30, 2023

Should we make this a Tune internal package? Users cannot interact with this at all.

python/ray/tune/search/placeholder.py Outdated Show resolved Hide resolved

python/ray/tune/execution/trial_runner.py Outdated Show resolved Hide resolved

python/ray/tune/search/placeholder.py Outdated

    
    Args:
        spec: The spec to replace references in.
        replacements: A dict from path to replaced objects.

Contributor

justinvyu Jan 30, 2023

Nit: Would it be nicer to create this on the fly and return it so we don't need to pass an empty one in?

Member Author

gjoliver Jan 30, 2023

I think we should keep it as a "global container" that gets passed everywhere and collects useful bits.
Creating it on the fly and returning it would mean merging a bunch of these containers throughout our code, wouldn't it?

python/ray/tune/search/placeholder.py Outdated Show resolved Hide resolved

python/ray/tune/tests/test_trial_runner_3.py Outdated Show resolved Hide resolved

python/ray/tune/search/placeholder.py Outdated Show resolved Hide resolved

python/ray/tune/tests/test_trial_runner_3.py Show resolved Hide resolved

python/ray/tune/search/placeholder.py Outdated Show resolved Hide resolved

krfricke reviewed

View reviewed changes

Contributor

krfricke left a comment

Awesome, even better than I imagined! Couple of nits

python/ray/tune/search/placeholder.py Outdated Show resolved Hide resolved

python/ray/tune/search/placeholder.py Outdated Show resolved Hide resolved

python/ray/tune/execution/trial_runner.py Show resolved Hide resolved

python/ray/tune/search/placeholder.py Outdated Show resolved Hide resolved

python/ray/tune/search/placeholder.py Outdated Show resolved Hide resolved

Member Author

gjoliver commented Jan 31, 2023

Addressed all the comments. PTAL.

justinvyu approved these changes

View reviewed changes

Contributor

justinvyu left a comment

Looks good to me!

krfricke reviewed

View reviewed changes

Contributor

krfricke left a comment

Looks good to me. One last question I have is regarding tune.run_experiments. Do I see it correctly that in that case we'll just not use any resolution and use the old way (because placeholder_resolvers is unset)?

Another quick question about the spec vs. trial.config: I'd prefer to continue using trial.config if possible.

python/ray/tune/execution/trial_runner.py Outdated

    
        resolve_placeholders(trial.config, self._replaced_ref_map)
    if self._placeholder_resolvers:
        # Construct the full experiment spec for resolution.
        spec = self._spec or {}

Contributor

krfricke Jan 31, 2023

Why are we updating the whole spec and not just the config?

Contributor

krfricke Jan 31, 2023

Or I guess, which other elements from the spec do we need? Seems like we're not using the rest of the spec here. If that's the case can we just go back to resolving trial.config - we're overwriting the spec["config"] with it anyways.

Member Author

gjoliver Jan 31, 2023

I don't want to use spec either, man, but a Function that takes spec is part of our public API. The following is in our documentation:

"beta": tune.sample_from(lambda spec: spec.config.alpha * np.random.normal()),
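The documented lambda reads `spec.config.alpha`, so whatever calls it has to pass an object with a `config` attribute, not the bare config dict. A minimal sketch of that calling convention, without Ray, might look like this; `resolve_functions` is a hypothetical helper and `SimpleNamespace` only imitates the attribute access that the real spec object provides.

```python
from types import SimpleNamespace


def resolve_functions(config):
    """Resolve callables in insertion order, handing each one a spec-like object.

    The spec exposes the values resolved so far under spec.config, which is
    what a function such as `lambda spec: spec.config.alpha * ...` expects.
    """
    resolved = {}
    for key, value in config.items():
        if callable(value):
            spec = SimpleNamespace(config=SimpleNamespace(**resolved))
            resolved[key] = value(spec)
        else:
            resolved[key] = value
    return resolved


config = {
    "alpha": 2.0,
    # Same shape as the documented example, with a fixed factor instead of noise.
    "beta": lambda spec: spec.config.alpha * 3,
}
result = resolve_functions(config)
```

Passing only the config dict would break such user functions, which is the reason given here for resolving against the full spec.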

krfricke reviewed

View reviewed changes

Contributor

krfricke left a comment

We're almost there. I think a few tests are failing right now

python/ray/tune/experiment/trial.py

    
    @@ -417,6 +398,41 @@ def __init__(
            self._state_json = None
            self._state_valid = False

        def create_placement_group_factory(self):

Contributor

krfricke Feb 1, 2023

I'm wondering if we should call this implicitly when Trial.placement_group_factory is accessed. But it's also OK to keep this for now.

Member Author

gjoliver Feb 1, 2023

good idea. added a TODO for now.
if there are tests failing because of this, I will turn it into a getter.

Contributor

krfricke Feb 6, 2023

I think we should do this in any case (turn into getter and create PGF on first call). Otherwise it increases the complexity of the Trial class
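The getter suggested here could be sketched as a lazily evaluated property. This is only an illustration of the pattern under assumed names: `ToyTrial`, its `resources` argument, and the dict it builds are hypothetical, and whether the merged code caches the factory is not shown in this thread.

```python
class ToyTrial:
    """Toy trial that builds its placement group factory on first access."""

    def __init__(self, resources):
        self._resources = resources
        self._placement_group_factory = None
        self.factory_builds = 0  # counts how often the factory was constructed

    @property
    def placement_group_factory(self):
        # Build on first access so callers never need an explicit create step.
        if self._placement_group_factory is None:
            self.factory_builds += 1
            self._placement_group_factory = {"bundles": [dict(self._resources)]}
        return self._placement_group_factory


trial = ToyTrial({"CPU": 2})
first = trial.placement_group_factory
second = trial.placement_group_factory
```

With this shape, callers read `trial.placement_group_factory` and the separate `create_placement_group_factory()` call disappears from the public surface of the class.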

python/ray/tune/experiment/trial.py

    
    self.config = config or {}
    # Save a copy of the original unresolved config so that we can swap
    # out and update any reference config values after restoration.
    self.__unresolved_config = self.config

Contributor

krfricke Feb 1, 2023

OOC, why double underscore?

Member Author

gjoliver Feb 1, 2023

Very private: nobody outside of Trial should touch or have access to this variable. Under the hood, Python mangles the name to _ClassName__variablename (here, _Trial__unresolved_config).
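Python's name mangling for double-underscore attributes can be checked directly. The `Trial` class below is a stripped-down stand-in for illustration, not Ray's class.

```python
class Trial:
    def __init__(self, config):
        self.config = config or {}
        # Inside the class body, this name is rewritten by the compiler
        # to _Trial__unresolved_config.
        self.__unresolved_config = self.config


trial = Trial({"a": 1})

# The instance dict holds the mangled name, not the literal one.
mangled_names = [name for name in vars(trial) if "unresolved" in name]
literal_name_exists = hasattr(trial, "__unresolved_config")
via_mangled_name = trial._Trial__unresolved_config
```

Mangling discourages outside access but does not prevent it: the attribute stays reachable through the mangled name.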

python/ray/tune/tests/test_tune_restore.py Outdated

Comment on lines 325 to 326

    
                              "test": tune.grid_search([1, 2, 3]),

                              "test2": tune.grid_search([1, 2, 3]),

Contributor

krfricke Feb 1, 2023

Nit for the test, can we use different parameter ranges for the different parameters? E.g.

Suggested change

    - "test": tune.grid_search([1, 2, 3]),
    - "test2": tune.grid_search([1, 2, 3]),
    + "test": tune.grid_search([1, 2, 3]),
    + "test2": tune.grid_search([4, 5, 6, 7]),

and overwrite with different values as well. Basically, to make sure that we don't conflate parameter overwrites.

Member Author

gjoliver Feb 1, 2023

done
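A small sketch of why distinct ranges make the test stronger: if a bug swapped the two parameters, identical ranges would hide it, while distinct ranges expose it. The helpers `expand_grid`, `swapped`, and `consistent` are hypothetical and do not use Ray.

```python
from itertools import product


def expand_grid(space):
    """Expand a dict of value lists into every combination."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


def swapped(trial):
    """Simulate a bug that conflates the two parameters."""
    return {"test": trial["test2"], "test2": trial["test"]}


def consistent(trials, space):
    """Check that every trial value comes from its own parameter's range."""
    return all(trial[key] in space[key] for trial in trials for key in space)


identical = {"test": [1, 2, 3], "test2": [1, 2, 3]}
distinct = {"test": [1, 2, 3], "test2": [4, 5, 6, 7]}

# With identical ranges, the swapped trials still look valid.
bug_hidden = consistent([swapped(t) for t in expand_grid(identical)], identical)
# With distinct ranges, the same bug fails the consistency check.
bug_caught = not consistent([swapped(t) for t in expand_grid(distinct)], distinct)
```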

Contributor

justinvyu commented Feb 1, 2023

I think here and here can be removed. It was added before to update trial resources that were updated on restore. But now, create_placement_group_factory gets called on add trial which will do the same thing.

Member Author

gjoliver commented Feb 1, 2023

> I think here and here can be removed. It was added before to update trial resources that were updated on restore. But now, create_placement_group_factory gets called on add trial which will do the same thing.

ok, let me remove these.

krfricke approved these changes

View reviewed changes

Contributor

krfricke left a comment

This looks good to me.

I think you just have to update the tune/BUILD file and fix one param space error in TunerInternal.

Feel free to merge when CI passes.

Thanks!

gjoliver force-pushed the resolve_by_references branch from 3504276 to 584a88f Compare

February 1, 2023 11:52

krfricke approved these changes

View reviewed changes

Contributor

krfricke left a comment

Love it. Thank you so much!

python/ray/tune/tests/test_ray_trial_executor.py Outdated Show resolved Hide resolved


justinvyu approved these changes

View reviewed changes

Contributor

justinvyu left a comment

Looks good. A few suggestions:

python/ray/tune/impl/placeholder.py Outdated Show resolved Hide resolved

python/ray/tune/impl/placeholder.py

Comment on lines +206 to +203

    
    elif key < len(config):
        return _get_placeholder(
            config[key], prefix=prefix + (path[0],), path=path[1:]
        )

Contributor

justinvyu Feb 6, 2023

Nit: Can we move this regular tuple case up to the first condition?

Something like:

if is_placeholder(config):
    return prefix, config

if list, dict, or tuple:
    recurse

Member Author

gjoliver Feb 6, 2023

done
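The restructuring suggested in this thread (check for a placeholder first, then recurse uniformly into containers) could look roughly like the sketch below. It is an illustration under assumptions, not the merged `_get_placeholder`: `Placeholder` and `find_placeholder` are hypothetical names.

```python
class Placeholder:
    """Hypothetical marker object standing in for a real config value."""

    def __init__(self, name):
        self.name = name


def find_placeholder(config, path, prefix=()):
    """Walk `path` into `config`; return (prefix, placeholder) at the first hit.

    Returns None if the path leaves the structure before reaching a placeholder.
    """
    # Placeholder check comes first, as suggested in the review.
    if isinstance(config, Placeholder):
        return prefix, config
    if not path:
        return None

    key = path[0]
    if isinstance(config, dict):
        if key not in config:
            return None
    elif isinstance(config, (list, tuple)):
        if not isinstance(key, int) or not 0 <= key < len(config):
            return None
    else:
        return None
    # Dicts, lists, and tuples all recurse the same way.
    return find_placeholder(config[key], path[1:], prefix + (key,))


ph = Placeholder("dataset")
cfg = {"a": [1, {"b": ph}]}

exact = find_placeholder(cfg, ("a", 1, "b"))
longer = find_placeholder(cfg, ("a", 1, "b", "c"))  # stops at the placeholder
missing = find_placeholder(cfg, ("a", 5))
```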

python/ray/tune/impl/placeholder.py

    
            # Represents an unchosen value. Just skip.
            continue
        for resolver in resolvers:

Contributor

justinvyu Feb 6, 2023

Can we just make resolvers a hash map, where key is resolver.hash? Currently we have linear search with respect to the search space size, which can be huge.

Member Author

gjoliver Feb 6, 2023

I think the list of options shouldn't be long man. this is probably fine, and looks simpler.
thanks.
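For comparison, the two lookup strategies discussed here can be sketched side by side. `Resolver`, its `hash` attribute, and the helper functions are hypothetical stand-ins; the thread only establishes that resolvers expose something like `resolver.hash`.

```python
class Resolver:
    """Toy resolver identified by a hash string."""

    def __init__(self, hash_, value):
        self.hash = hash_
        self.value = value


def resolve_linear(resolvers, wanted_hash):
    """Scan the list: O(n) per lookup, but simple and fine for short lists."""
    for resolver in resolvers:
        if resolver.hash == wanted_hash:
            return resolver.value
    return None


def build_index(resolvers):
    """Build a dict once so that each later lookup is O(1) on average."""
    return {resolver.hash: resolver for resolver in resolvers}


def resolve_indexed(index, wanted_hash):
    resolver = index.get(wanted_hash)
    return resolver.value if resolver is not None else None


resolvers = [Resolver(f"h{i}", i * 10) for i in range(5)]
index = build_index(resolvers)

linear_hit = resolve_linear(resolvers, "h3")
indexed_hit = resolve_indexed(index, "h3")
indexed_miss = resolve_indexed(index, "nope")
```

Both return the same result; the dict only pays off when the number of resolvers grows large, which is the trade-off the two reviewers weigh here.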

python/ray/tune/search/sample.py

    
    @@ -526,21 +542,21 @@ def get_sampler(self):
        def sample(
            self,
            domain: Domain,
            spec: Optional[Union[List[Dict], Dict]] = None,
            config: Optional[Union[List[Dict], Dict]] = None,

Contributor

justinvyu Feb 6, 2023

Is this type correct? Should just be Dict?

Member Author

gjoliver Feb 6, 2023

it does work for list of dicts too though. doesn't have to be a single dict.

Jun Gong and others added 9 commits

February 6, 2023 15:50


          [Tune] Replace reference values in a config dict with placeholders.

10e6e68

Signed-off-by: Jun Gong <jungong@anyscale.com>


          Update python/ray/tune/search/placeholder.py

8352fdf

Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>


          Update python/ray/tune/tests/test_trial_runner_3.py

3ae088a

Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>


          Update python/ray/tune/search/placeholder.py

d08d245

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>


          Update python/ray/tune/search/placeholder.py

00dc4b1

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>


          Update python/ray/tune/execution/trial_runner.py

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>


          address review comments

4c7c068

Signed-off-by: Jun Gong <jungong@anyscale.com>


          move to python/ray/tune/impl/

7123c59

Signed-off-by: Jun Gong <jungong@anyscale.com>


          lint

df03fe9

Signed-off-by: Jun Gong <jungong@anyscale.com>

Jun Gong and others added 15 commits

February 6, 2023 15:50


          fixes

5469d1a

Signed-off-by: Jun Gong <jungong@anyscale.com>


          only replace obj refs

2f59ec4

Signed-off-by: Jun Gong <jungong@anyscale.com>

fix

2d32989

Signed-off-by: Jun Gong <jungong@anyscale.com>


          allow disable placehold replacement

43a216d

Signed-off-by: Jun Gong <jungong@anyscale.com>


          do not cache placement_group_factor

0ef9368

Signed-off-by: Jun Gong <jungong@anyscale.com>


          fix placement_group_creation

9752ba6

Signed-off-by: Jun Gong <jungong@anyscale.com>


          get rid of local_mode

0a9dd68

Signed-off-by: Jun Gong <jungong@anyscale.com>


          handle simple nested search spaces, and fix everything

cc4eefd

Signed-off-by: Jun Gong <jungong@anyscale.com>


          lint

f580276

Signed-off-by: Jun Gong <jungong@anyscale.com>


          really handle nested or complex Categorical options

cad449d


          lint

eab008e

Signed-off-by: Jun Gong <jungong@anyscale.com>


          minor fix

0dfba7e

Signed-off-by: Jun Gong <jungong@anyscale.com>


          Update python/ray/tune/tests/test_ray_trial_executor.py

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>


          Update python/ray/tune/impl/placeholder.py

373fff8

Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>


          minor update

247d5f0

gjoliver force-pushed the resolve_by_references branch from d5bc52e to 247d5f0 Compare

February 6, 2023 23:50

ci

52935d2

Signed-off-by: Jun Gong <jungong@anyscale.com>

gjoliver merged commit befad81 into ray-project:master

justinvyu mentioned this pull request

[Tune] Allow re-specifying param space in Tuner.restore #32317

Merged

7 tasks

krfricke pushed a commit that referenced this pull request


          [Tune] Allow re-specifying param space in Tuner.restore (#32317)

bc6b40a

This PR adds a `Tuner.restore(param_space=...)` argument. This allows object refs to be updated if used in the original run.

This is a follow-up to #31927

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>

gjoliver deleted the resolve_by_references branch

February 17, 2023 06:50

edoakes pushed a commit to edoakes/ray that referenced this pull request


          [Tune] Replace reference values in a config dict with placeholders (r…

73026d0

…ay-project#31927)

Signed-off-by: Jun Gong <gongjunoliver@hotmail.com>
Co-authored-by: Justin Yu <justinvyu@anyscale.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>

edoakes pushed a commit to edoakes/ray that referenced this pull request


          [Tune] Allow re-specifying param space in Tuner.restore (ray-projec…

cd4f620

…t#32317)

This PR adds a `Tuner.restore(param_space=...)` argument. This allows object refs to be updated if used in the original run.

This is a follow-up to ray-project#31927

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>

elliottower pushed a commit to elliottower/ray that referenced this pull request


          [Tune] Allow re-specifying param space in Tuner.restore (ray-projec…

a85322c

…t#32317)

This PR adds a `Tuner.restore(param_space=...)` argument. This allows object refs to be updated if used in the original run.

This is a follow-up to ray-project#31927

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
Signed-off-by: elliottower <elliot@elliottower.com>
