Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement #1389 or similar functionality #1939

Open
somedadaism opened this issue Dec 27, 2021 · 25 comments
Open

Implement #1389 or similar functionality #1939

somedadaism opened this issue Dec 27, 2021 · 25 comments
Labels
Milestone

Comments

@somedadaism
Copy link

somedadaism commented Dec 27, 2021

🚀 Feature Request

I would like to ask if it is possible to implement the functionality asked for in #1389

Motivation

I believe the feature is extremely useful for ML applications. To be able to have list composition in the defaults list the way we already have dict composition would allow to be less redundant in the configuration by reusing config files more.

Pitch

I know that #1389 had been closed as invalid but it seems that in omegaconf 2.1 the necessary prerequisites maybe have been implemented since the issue had been closed.

OmegaConf 2.1 will support more powerful custom resolvers. It's possible that they will enable generating as_list automatically from dict. (cc @odelalleau).

Not in my current implementation because resolvers systematically wrap their output in a ValueNode. However I suspect that if we used _node_wrap() instead then it should be possible. Not going to change it now though -- I'd rather keep it for later.

I think this should be possible now, with oc.dict.keys, see https://omegaconf.readthedocs.io/en/2.1_branch/custom_resolvers.html#oc-dict-keys-value

This could maybe be used to allow for list composition in the defaults list in the way originally asked for instead of using os.dict.values in a hacky fashion, like proposed e.g. by Max Ehrlich here:
https://stackoverflow.com/questions/64802586/how-to-gather-config-files-in-a-list-with-hydra-fb

Are you willing to open a pull request?

Maybe if I can get sufficient guidance I would try.

@somedadaism somedadaism added the enhancement Enhanvement request label Dec 27, 2021
@Jasha10
Copy link
Collaborator

Jasha10 commented Dec 29, 2021

Hi @somedadaism,
Thank you for your interest in Hydra and for taking the time to open this issue!

This could maybe be used to allow for list composition in the defaults list in the way originally asked for...

The way originally asked for proposes that, with the following config, the db group would produce a ListConfig object containing each of the mysql and postgresql input configs:

defaults:
  - base_config
  - db: 
    - mysql
    - postgresql

This will not be possible because there is already a different behavior implemented for nested lists within a defaults list (such as the nested list - db: [mysql, postgresql]): the mysql and postgresql configs are merged, and the result of the merge is placed in the db package. See the Hydra docs on Selecting multiple configs from a Config Group.

Admittedly I have not yet had a chance to look closely at later discussion in comments on the linked issue #1389.

@somedadaism somedadaism changed the title Implement #1389 Implement #1389 or similar functionality Dec 30, 2021
@somedadaism
Copy link
Author

somedadaism commented Dec 30, 2021

Hi @Jasha10 thanks for your quick reply.
I would like to clarify the functionality I search for. Please consider first the following (simplified) situation:

.
├── hydra_config
│   ├── callbacks
│   │   ├── my_callback_1.yaml
│   │   └── my_callback_2.yaml
│   └── config.yaml
└── main.py

With:

# main.py
import hydra

@hydra.main(config_path="hydra_config", config_name="config_new.yaml")
def main(config):
    print(config)

if __name__ == "__main__":
    main()
# config.yaml
defaults:
  - callbacks:
    - my_callback_1
    - my_callback_2

What currently works is:

# my_callback_1.yaml
my_callback_1:
  my_param1: 1
# my_callback_2.yaml
my_callback_2:
  my_param2: 2

resulting in:

{'callbacks': {'my_callback_1': {'my_param1': 1}, 'my_callback_2': {'my_param2': 2}}}

What I want to create as output is:

{'callbacks':  [{'my_param1': 1}, {'my_param2': 2}]}

Now I understand that:

# my_callback_1.yaml
my_param1: 1
# my_callback_2.yaml
my_param2: 2

Can not work as it (correctly!) results in:

{'callbacks': {'my_param1': 1, 'my_param2': 2}}

which is what I believe you also pointed out.
What is not understandable to me is that for an example:

# my_callback_1.yaml
- my_param1: 1
# my_callback_2.yaml
- my_param2: 2

will result in:

{'callbacks': [{'my_param2': 2}]}

and not in

{'callbacks':  [{'my_param1': 1}, {'my_param2': 2}]}

I believe the ability to merge configs into lists is very important to avoid repeating oneself for ML applications of hydra, but there seems to be no way to merge them.
(Imagine using the same 5 callbacks all the time in a list but being forced to type them out repeatedly for different experimental model runs - if you change one thing you have to change 5 config files if you are lucky enough to remember them all.).

@omry
Copy link
Collaborator

omry commented Dec 30, 2021

Take a look at oc.dict.values, it may be what you are looking for.
Basically, you can let Hydra compose a dict, and have a key that dynamically transform it to a list.

@somedadaism
Copy link
Author

somedadaism commented Jan 1, 2022

Hello @omry thank you for creating hydra and for taking a look. :)
I am very aware of oc.dict.values actually the answer provided by you was also proposed on StackOverflow and was linked by me in the original question.
Sadly it is specifically what I don't want to do.
My problems with this approach are the following:

  1. It's not straight forward to read - it just feels hacky.
  2. The approach forces us to create a dictionary in first place before obtaining a list. This means the dictionary will also be part of the configuration. If now I just want to use hydra.utils.instantiate on my entire config it will create all objects created vie _target_ twice, once for the dict and then for the list. For some objects, e.g. some of my code for W&B this is a problem.
    I know multiple ways to hack around this but I would rather invest time to get a cleaner solution in hydra itself.
  3. I feel the current defaults feel inconsistent. When selecting multiple configs we replace the respective line in the defaults with the content of the referenced file and then merge if necessary. If this would apply for configs with lists too in the situation:
# my_callback_1.yaml
- my_param1: 1
# my_callback_2.yaml
- my_param2: 2

Then:

# config.yaml
defaults:
  - callbacks:
    - my_callback_1
    - my_callback_2

Would be text substituted to:

callbacks:
  - my_param1: 1
  - my_param2: 2

Just as in the situation:

# my_callback_1.yaml
my_callback_1:
  my_param1: 1
# my_callback_2.yaml
my_callback_2:
  my_param2: 2

The defaults list in:

# config.yaml
defaults:
  - callbacks:
    - my_callback_1
    - my_callback_2

are text substituted to:

callbacks:
  my_callback_1:
    my_param1: 1
  my_callback_2:
    my_param2: 2

I know what I ask for is not how it works but I believe it would be the more natural and consistent way.
I have discussed on this with multiple of my coworkers and the ability to merge lists in the defaults list in a more natural way is missed by us very much. Is there really no less hacky way than oc.dict.values?
How about introducing a new keyword like we already have override, e.g.:

# my_callback_1.yaml
my_param1: 1
# my_callback_2.yaml
my_param2: 2

Then:

# config.yaml
defaults:
  - list_merge callbacks:  # list_merge or similar could be a keyword like override
    - my_callback_1
    - my_callback_2

Would be text substituted to:

callbacks:
  - my_param1: 1
  - my_param2: 2

@somedadaism
Copy link
Author

@omry @Jasha10 Any updates on this?

@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 5, 2022

The current behavior is probably due to the implementation of OmegaConf.merge:

>>> from omegaconf import OmegaConf
>>> OmegaConf.merge([1,2], [3,4])
[3, 4]
>>> OmegaConf.merge({"a": [1,2], "b": [3,4]}, {"b": [5,6], "c": [7,8]})
{'a': [1, 2], 'b': [5, 6], 'c': [7, 8]}

mappings on the left are updated by mappings on the right; sequences on the left are overwritten by sequences on the right.

I agree with you that, in the specific case of "Selecting multiple configs from a Config Group" in the defaults list, it would be more useful to concatenate ListConfigs instead of overwriting them (as you have outlined above). The proposed change would technically be breaking, but I doubt anyone is relying on the old behavior.

This being said, I'm not sure that special-casing the "selecting multiple configs from a config group" behavior is the most general solution to the use-case you've highlighted. It still feels like a hack that my_callback_1.yaml has to be written with a leading dash:

# my_callback_1.yaml
- my_param1: 1

Painfully, we are prevented from using a defaults list in my_callback_1.yaml if the file is written as a yaml sequence (as above).

You are trying to achieve the following result:
{'callbacks': [{'my_param1': 1}, {'my_param2': 2}]}.
Inspired by #1547, I propose the following:

# config.yaml
defaults:
  - append callbacks: my_callback_1
  - append callbacks: my_callback_2

Here an "append" keyword would signify that the callbacks package should be a ListConfig, and that the selected my_callback_{1,2} config should be appended to that list. These configs my_callback_1.yaml and my_callback_2.yaml could then be written as yaml mappings, possibly having defaults lists of their own:

# my_callback_1.yaml
defaults: ...  # possible defaults list
my_param1: 1  # no leading dash

Edit: I should mention that I don't personally have the bandwidth to work on implementing such a feature for the next Hydra release.

@somedadaism
Copy link
Author

I think the proposed syntax is very good.
I would take a look at contributing but only have limited time.
Where could I get hints on how to achieve this?

@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 11, 2022

Unfortunately, I fear that the implementation of this feature might not be straight-forward; it will require adding a new keyword to the ConfigRepository.Keywords class and modifying the logic in defaults_list.py.

@Jasha10
Copy link
Collaborator

Jasha10 commented Jan 11, 2022

Cross-link to #1547 which is related.

@Jasha10 Jasha10 added this to the Hydra 1.3.0 milestone Jan 13, 2022
@tangbinh
Copy link

tangbinh commented Feb 9, 2022

@Jasha10 @jieru-hu I agree with @somedadaism that this feature is very much needed for ML applications. We often pass a list of callbacks to a trainer, and being able to compose such a list via the command line is necessary. Given the huge number of combinations one can derive from a list (e.g. for 5 callbacks, there are 32 combinations), it's impossible to have a YAML for each combination.

@Jasha10 If it's not easy to implement this feature, is there an acceptable workaround in the meantime? For example, I would like to specify a default list of callbacks in a primary config and couldn't figure out a way to do it yet.

# cb1.yaml
_target_: torch.nn.Identity
arg1: 1
# cb2.yaml
_target_: torch.nn.Identity
arg2: 2
# config.yaml
defaults:
  - callbacks:
    # what to put here?

The desired output looks like

callbacks:
- _target_: torch.nn.Identity
  arg1: 1
- _target_: torch.nn.Identity
  arg2: 2

@Jasha10
Copy link
Collaborator

Jasha10 commented Feb 10, 2022

@tangbinh, see Omry's comment above.

Basically, you should use the defaults list to compose a dictionary whose values are the callbacks you want. You then can use the oc.dict.values resolver to get a list of callbacks.

$ tree
├── callbacks
│   ├── cb1.yaml
│   └── cb2.yaml
├── config.yaml
└── my_app.py
# config.yaml
defaults:
  - callbacks@_callback_dict.cb1: cb1
  - callbacks@_callback_dict.cb2: cb2
  - _self_

callbacks: ${oc.dict.values:_callback_dict}
# my_app.py
import hydra
from omegaconf import OmegaConf

@hydra.main(config_path=".", config_name="config")
def app(cfg):
    OmegaConf.resolve(cfg)
    del cfg._callback_dict
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    app()
$ python my_app.py
callbacks:
- _target_: torch.nn.Identity
  arg1: 1
- _target_: torch.nn.Identity
  arg2: 2

@tangbinh
Copy link

@Jasha10 Thank you for working out the example. It's certainly helpful, although I think we all agree that invoking @_callback_dict and ${oc.dict.values:_callback_dict} reduces readability significantly. Can you also work out how I can append another callback to the list via the command line?

If possible, would we be able to prioritize this list composition feature a bit? I believe ML users would run into this issue quite often.

@Jasha10
Copy link
Collaborator

Jasha10 commented Feb 12, 2022

If possible, would we be able to prioritize this list composition feature a bit?

Thanks for the nudge :)
I agree completely that improved support for list composition should be a high-priority feature. For the record, I suspect that improvements here will require a significant amount of effort.

Can you also work out how I can append another callback to the list via the command line?

Here's the trick:

$ python my_app.py +callbacks@_callback_dict.cb3=cb3
callbacks:
- _target_: torch.nn.Identity
  arg1: 1
- _target_: torch.nn.Identity
  arg1: 2
- _target_: torch.nn.Identity
  arg1: 3

The plus symbol is for appending to the defaults list.
You can also remove items from the defaults list using the tilde symbol:

$ python my_app.py '~callbacks@_callback_dict.cb1'
callbacks:
- _target_: torch.nn.Identity
  arg1: 2

@Yevgnen
Copy link

Yevgnen commented Feb 17, 2022

@Jasha10

# config.yaml
defaults:
  - callbacks@_callback_dict.cb1: cb1
  - callbacks@_callback_dict.cb2: cb2
  - _self_

callbacks: ${oc.dict.values:_callback_dict}

Hi, in this example, is there a way to avoid writing callbacks@_callback_dict.cb{x}: cb{x} for every {x} so that we can include all items under callback/ at once?

@Jasha10
Copy link
Collaborator

Jasha10 commented Feb 17, 2022

Hi @Yevgnen,
No, there is currently no way to do that.

@jzazo
Copy link

jzazo commented Feb 17, 2022

Thanks for bringing this discussion back, it has been quite useful. I like the append approach for concatenating items into a list, and I think it would work perfectly for ml needs.

I want to point out that if this new functionality is added, it may also be natural to add extend for lists, rather than merge, possibly in omegaconf and hydra? Not sure how this would turn up, just wanted to ask if it would make sense. Something like

defaults:
  - extend callbacks: callbacks_list1
  - extend callbacks: callbacks_list2

Not sure this is great, I personally feel the append mechanism would be more useful, but this may generalize some use cases.

@ssnl
Copy link

ssnl commented May 23, 2022

The workaround here doesn't work nicely with structured config. Unless the config also have a _callback_dict entry, OmegaConf will complain in merging. :(

facebook-github-bot pushed a commit to facebookresearch/FLSim that referenced this issue Jul 6, 2022
Summary:
Currently, flsim cannot handle a list of configs
Eg: a json config like the following is not supported.
```
{
  "trainer": {
    "clients": [
      {"_base_": "base_client", "optimizer": {"lr": 0.1}},
      {"_base_": "base_client", "optimizer": {"lr": 0.2}},
    ]
  }
```

The one hiccup in supporting this is hydra still doesn't support overriding/appending lists when values are configs. (see facebookresearch/hydra#1939 (comment))

In order to overcome the above, we treat the list as a dictionary with the key as list index.

The above config in yaml format will look as follows:
```
trainer:
  ...
  clients:
    '0':
      _target_: flsim.clients.base_client.Client
      _recursive_: false
      epochs: 1
      optimizer:
        _target_: ???
        _recursive_: false
        lr: 0.1
        momentum: 0.0
        weight_decay: 0.0
      lr_scheduler:
        _target_: ???
        _recursive_: false
        base_lr: 0.001
      max_clip_norm_normalized: null
      only_federated_params: true
      random_seed: null
      shuffle_batch_order: false
      store_models_and_optimizers: false
      track_multiple_selection: false
    '1':
      _target_: flsim.clients.base_client.Client
      _recursive_: false
      epochs: 1
      optimizer:
        _target_: ???
        _recursive_: false
        lr: 0.1
        momentum: 0.0
        weight_decay: 0.0
      lr_scheduler:
        _target_: ???
        _recursive_: false
        base_lr: 0.001
      max_clip_norm_normalized: null
      only_federated_params: true
      random_seed: null
      shuffle_batch_order: false
      store_models_and_optimizers: false
      track_multiple_selection: false
  ...
```

Reviewed By: Anonymani

Differential Revision: D37638999

fbshipit-source-id: 5444da742742d4cc976875a6dc055a0c71c186e4
@TesnimHadhri
Copy link

The workaround here doesn't work nicely with structured config. Unless the config also have a _callback_dict entry, OmegaConf will complain in merging. :(

Do you mind explaining the workaround you used with structured config?

@ssnl
Copy link

ssnl commented Jul 28, 2022

@TesnimHadhri I eventually did not use it, but I remember having a _callback_dict field with dictionary type helped. but YMMV

@TesnimHadhri
Copy link

@TesnimHadhri I eventually did not use it, but I remember having a _callback_dict field with dictionary type helped. but YMMV

I see, thanks for the quick answer!

@libokj
Copy link

libokj commented Mar 28, 2023

@Jasha10 Hi, is it possible for you to provide a brief update on the status of the implementation of this feature? Should we look into it as you commented earlier?

Unfortunately, I fear that the implementation of this feature might not be straight-forward; it will require adding a new keyword to the ConfigRepository.Keywords class and modifying the logic in defaults_list.py.

@anton-bushuiev
Copy link

Note for the ones still seeking for a better solution: Lightning-Hydra-Template implements another workaround which is however still based on a dictionary with redundant keys. The difference between the solution suggested by @Jasha10 above is that the dict is parsed internally by a custom instantiator rather than relying on oc.dict.values which increases readability significantly but loses succinctness.

@libokj
Copy link

libokj commented Aug 30, 2023

Note for the ones still seeking for a better solution: Lightning-Hydra-Template implements another workaround which is however still based on a dictionary with redundant keys. The difference between the solution suggested by @Jasha10 above is that the dict is parsed internally by a custom instantiator rather than relying on oc.dict.values which increases readability significantly but loses succinctness.

That's the workaround I am resorting to as well. Still hope this feature can be implemented.

@jkarns275
Copy link

Bumping this because I've ran into this on two completely unrelated projects now :(

@alex-hh
Copy link

alex-hh commented Aug 26, 2024

I also would find this extremely useful: anywhere a functional style is used it is useful to specify an ordered list of items.

As well as callbacks, pipelines of preprocessing transformations seem like a natural use case.

Do the current workarounds preserve the order of the composed items? This is critical in such cases but I'm unclear whether the fact that dictionaries are involved in intermediate steps means that the order is not preserved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests