
[Discussion] Better model config management #660

Closed
erogol opened this issue Feb 19, 2021 · 7 comments
@erogol
Contributor

erogol commented Feb 19, 2021

Hi All!!

I guess one of the biggest issues in TTS is the way we handle the configs for models and training. Putting example config files under the config folder is hard to maintain and makes it complicated for people to start using TTS.

So I want to discuss here some of the better alternatives and ask for the wisdom of the crowd 🧑‍🤝‍🧑.

Here are a couple of constraints we need to consider, off the top of my head.

  • Configs should not be Python specific; they should be in a generic form that can be serialized and loaded by other systems and programming languages. So if someone wants to export a model and use it in an embedded system, the config file should not be a problem.
  • Configs should allow easy experimentation, collaboration, and reproduction.
  • Each model should explain its config fields. Right now I do this in config.json by violating the JSON format with comments. It is not optimal ☹️.

If you have an idea please share it below and let's discuss it.

Edit:

I should also add one more constraint.

  • We should solve this with no dependencies if possible.
@nmstoker
Contributor

nmstoker commented Feb 19, 2021

Hello!

I think this is a great topic to discuss.

All your points above resonate with me. I agree it makes sense to be language agnostic, and I don't have any particular format preferences, although I appreciate that the comments breaking the strict JSON format are something of a bodge.

I'm open minded on the specific format so long as it's easily human readable/editable in a basic text editor (as a minimum).
Two formats that spring to mind are:

YAML (am I right in thinking that there's some handling for this already?)

HJSON - https://hjson.github.io/ - I haven't used anything with this, but it seems to deal with the comment issue and has a few other accommodations to make editing easier; it's supported in a few other languages and has some editor integrations for formatting.
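To give a quick taste of what HJSON buys us, here's a minimal sketch, assuming the `hjson` package from PyPI (not currently a TTS dependency); the keys are just illustrative:

```python
import hjson

text = """
{
  # comments are legal in HJSON
  sample_rate: 22050
  // and keys don't need quotes
  num_mels: 80
}
"""

config = hjson.loads(text)  # parses into a dict-like object
print(config["sample_rate"])  # 22050
```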

Something I'd raised before is the idea of config being "composable" - by which I mean that you can have two or more layers of config that get combined into the final config.

This initially seems more complicated, but I suspect it would ease a number of common scenarios (see the sketch after this list):

  1. It would mean that the core config settings were common and could be read from a default location.

  2. Variations on models then only need to define their new config values, as they can inherit the common core config

  3. When there are widely needed config changes, it's simpler if these are in the core config. This gets us away from the situation where you take an existing config that worked and find that the updated repo complains about a recently added field that your existing config doesn't have.
    I've seen this trip up new users a few times, so whilst it's not necessarily a major worry for more experienced users, it seems helpful in general.

  4. As a user experimenting, you would have your base config, and then each variant that you want to test has a very simple, obvious set of changes. E.g. if you're just tweaking three settings, then each variant config sets only those and the rest come from the base. You can get this with a diff, but it's just that little bit easier to work with.

  5. Composing would also mean that personal file locations are less likely to end up in the repo; you can set your own once locally and they'll persist.
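To make the layering concrete, here's a minimal hand-rolled sketch, assuming PyYAML; the file contents are inlined here and the keys are hypothetical:

```python
import yaml  # PyYAML

# The common core config, read from a default location in practice.
base_yaml = """
audio:
  sample_rate: 22050
  num_mels: 80
training:
  batch_size: 32
"""

# A variant only declares what it changes; the rest is inherited.
variant_yaml = """
training:
  batch_size: 16
"""

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; leaves in `override` win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

config = deep_merge(yaml.safe_load(base_yaml), yaml.safe_load(variant_yaml))
assert config["training"]["batch_size"] == 16   # overridden by the variant
assert config["audio"]["sample_rate"] == 22050  # inherited from the base
```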

I know there are some modules that enable this, but as you raised before, it's important we don't add excessive dependencies. That said, there's a nice-looking module called Hydra which seems very much aligned with the needs here:

https://medium.com/pytorch/hydra-a-fresh-look-at-configuration-for-machine-learning-projects-50583186b710

https://hydra.cc/
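For reference, a minimal Hydra entry point looks roughly like this; a sketch, assuming configs live in a conf/ directory next to the script, and the group names are hypothetical:

```python
import hydra
from omegaconf import DictConfig, OmegaConf

# conf/config.yaml would declare the composition, e.g.:
#   defaults:
#     - model: tacotron2
#     - dataset: ljspeech
@hydra.main(config_path="conf", config_name="config")
def train(cfg: DictConfig) -> None:
    # cfg is the composed config, including command-line overrides
    # such as `python train.py model=glow_tts training.batch_size=16`
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    train()
```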

One last point from me (for now!):

Should we think about what level of config makes most sense?
Currently we've got config files for TTS models and vocoders, which is sensible as they're distinct things, but they also have a fair bit of crossover.
In addition we've got settings and command-line parameters we pass to the TTS CLI and server.py, so maybe something that works well with those is worth some thought too?

Anyway, that's plenty to be thinking about, so I'll stop to see what others suggest.

@gerazov
Contributor

gerazov commented Feb 19, 2021

I third this 👍

Things are a bit messy at the moment, with everything being in one file and having one file for each model type with a lot of overlap in between. In fact, adding a new parameter now is quite tedious, as one needs to update all the config JSONs in the project (18 in total for #649 😅). Plus, JSON doesn't support comments, so they end up messing up some editors; e.g. in Vim you need a special plugin that won't treat the comments as errors, and even GitHub doesn't like them 🙂

As far as formats go, I think YAML is a sensible choice. It's Python-like, but not Python specific (as per @erogol's point 1), and is also well established. 👍

Having modularity is very important here, as @nmstoker elaborates. We should for sure keep model configs model specific and have more general ones for common parameters.

The most important step here is to at least have a separate database config. This would allow users to easily apply the default model settings to a different database without having to recreate a database-specific model config. Other candidates (from the present config) could be: TENSORBOARD and LOGGING, AUDIO PARAMETERS (although these are database specific, so they might go with the database config), TRAINING and VALIDATION ...

Going one step further, it would be awesome to have hierarchical overloading of the defaults with your custom values, e.g. a default sample_rate of 22050 that is then overloaded by a sample_rate: 16000 field in your database config.
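That overloading could look something like this with OmegaConf (the library underneath Hydra); a sketch, with hypothetical keys:

```python
from omegaconf import OmegaConf

# default config, e.g. shipped with the model
defaults = OmegaConf.create({"audio": {"sample_rate": 22050}})

# database config, overloading only what differs
database = OmegaConf.create({"audio": {"sample_rate": 16000}})

# later arguments to merge() win over earlier ones
config = OmegaConf.merge(defaults, database)
print(config.audio.sample_rate)  # 16000
```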

Hydra is a nice catch, @nmstoker. I think it's definitely worth considering for this project 👍

@thorstenMueller
Contributor

Important topic with lots of great ideas by @erogol, @nmstoker, and @gerazov.

Some thoughts of mine on it:

  • Should we keep configs backward compatible for people who trained models with the current config syntax?
  • Would a wiki describing the different config values be an option? I'm thinking of dependencies between different config values, e.g.: if you change key1, think about adjusting key4 and key8. This might grow into a knowledge base of good config values for a specific dataset.
  • It would be nice if we could easily compare different config experiments made on one dataset.

@nmstoker wrote: "by which I mean that you can have two or more layers of config that get combined into the final config."

I also like the idea of overloading. So "audio params" (e.g.) could be used for both Tacotron2 and vocoder training, as they should obviously match. I like a "single version of truth".

@reuben
Contributor

reuben commented Feb 20, 2021

For CI on DeepSpeech we use JSON-e: https://github.com/taskcluster/json-e

It's a system for rendering data structures that allows you to write composable data structures (in our case defined with YAML) with somewhat advanced functionality (variable substitution, basic loops, basic string and date operations, etc.) and then render them out as a single object (either as a JSON file or directly as an in-memory data structure).

It allows you to have a fully declarative config system that is simple to read and write, as it's all just YAML, but it still lets you abstract repetitive details and avoid copy-paste hell for better maintainability. It's also language agnostic.

@reuben
Contributor

reuben commented Feb 20, 2021

It also allows you to combine multiple contexts when rendering a JSON file. For example, you can have a base "full model config" that requires three different contexts to be rendered: a TTS config, a vocoder config, and a dataset config. Then you could have independent configs for all the TTS models, all the vocoders, and all the datasets, and mix and match at will by just rendering with the appropriate contexts.
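A hedged sketch of that mix-and-match rendering, assuming the json-e Python package (import name `jsone`); the template and context keys are made up:

```python
import jsone  # the `json-e` package on PyPI

# One shared template; placeholders are resolved from the contexts below.
template = {
    "model": "${tts.model}",
    "vocoder": "${vocoder.name}",
    "audio": {"sample_rate": {"$eval": "dataset.sample_rate"}},
}

# Independent contexts that can be mixed and matched per experiment.
tts_ctx = {"tts": {"model": "tacotron2"}}
vocoder_ctx = {"vocoder": {"name": "melgan"}}
dataset_ctx = {"dataset": {"sample_rate": 22050}}

# Render everything to a single, fully resolved, JSON-serializable dict.
config = jsone.render(template, {**tts_ctx, **vocoder_ctx, **dataset_ctx})
print(config)
# {'model': 'tacotron2', 'vocoder': 'melgan', 'audio': {'sample_rate': 22050}}
```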

@erogol
Contributor Author

erogol commented Feb 23, 2021

So far the suggestions are (correct me if I missed anything):

  • Decomposition of the config file into Audio, Model, Dataset, etc., and even decomposition of the model config into layers.
  • Documenting config values.
  • Setting up good default values for config fields where possible.
  • For possible syntax: JSON-e, YAML.

After considering all this, maybe we can create Python config classes with reasonable default values (that would replace our value checking under generic_utils.py). So the Python class serves the value checking, the default values, and the config decomposition.

These Python classes can also have different loaders supporting JSON, YAML, or something else.

But if we do this, we make things even more Python specific, so for other languages we'd need to replicate those classes. I think this is the only problem that remains with this approach.
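A minimal sketch of what those classes might look like with stdlib dataclasses; the field names and defaults are illustrative, not the actual TTS config:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class AudioConfig:
    """Audio parameters shared by TTS models and vocoders."""
    sample_rate: int = 22050
    num_mels: int = 80

@dataclass
class TacotronConfig:
    """Model config composed from smaller pieces, with value checks."""
    audio: AudioConfig = field(default_factory=AudioConfig)
    batch_size: int = 32
    r: int = 2  # decoder reduction factor

    def __post_init__(self):
        # value checking moves from generic_utils.py into the class itself
        assert self.batch_size > 0, "batch_size must be positive"

    def save_json(self, path: str) -> None:
        # serialize to plain JSON so non-Python systems can still load it
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)

config = TacotronConfig()
config.save_json("config.json")
```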

@stephenmelsom

My team is running into a similar issue in our TTS repo. The bottom line is that we wanted comments. YAML and HJSON are great (we're probably going with HJSON to avoid any extra work), but also take a look at TOML. It's well supported and a little easier on the eyes IMO.
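For comparison, a sketch of TOML's comment support, assuming Python 3.11+'s stdlib tomllib (read-only, so no extra dependency); the keys are illustrative:

```python
import tomllib  # stdlib since Python 3.11; read-only

doc = """
# comments are part of the TOML spec
[audio]
sample_rate = 22050

[training]
batch_size = 32  # inline comments work too
"""

config = tomllib.loads(doc)
print(config["audio"]["sample_rate"])  # 22050
```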
