This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
add hyper-parameter optimization api #2532
Comments
Hi @schlichtanders Thank you for the detailed description - an interesting perspective on hyperparameters and good ideas! We are discussing experimentation scenarios in DVC, and it looks like DVC needs special support for some cases. A recent discussion example is #2379. I'd love to discuss this from the point of view of the hyperparameter tuning case and hyperparameter optimization packages. Could you please clarify a few things:
The major question I have: why do we need two abstractions, branches AND subfolders? Additional questions:
I made some progress and created a small example, but I currently have no time to complete it. Nevertheless, here is the link: the idea is simple: after defining two helper functions, a hyperparameter search is just a small wrapper script which calls another .dvc file two helpers
I hope to find time in November/December to finish this and answer all your questions.
I had two thoughts related to a potential API for hyperparameters, on how to choose whether to store the resulting models or not ("treat it as cache" and "treat it as optimal decision"). I posted them in another thread: #2379 (comment). If the API allowed such flexibility, the exact decision could easily be delegated to other libraries. Unfortunately, I don't have anything more concrete than this wish/feature request yet.
Dear DVC folk,
Motivation
You mention it yourselves in your documentation: fully versioned hyperparameter optimization comes to mind when using DVC.
Little Research
I did some quick research, and it soon becomes apparent that this needs a DVC-specific implementation.
All the existing hyperparameter optimizers like python's hyperopt
Suggestions how to integrate to DVC
It seems to me the following is needed for hyperparameter optimization to be a natural addition to DVC:
each triggered hyperoptimization orchestration should have its own git branch subfolder
each single hyperoptimization run should have its own subbranch under that subfolder
a file-based hyper-parameter API, probably based on json
dvc metrics already work.
dvc repro
it would be unbelievably awesome not to reinvent the wheel entirely, but to provide wrappers around existing hyperoptimization packages like hyperopt or smac or others
the basic idea is simple: instead of running a concrete algorithm with the specific framework, you run a wrapper which runs dvc repro myalgorithm.dvc on a previously specified routine myalgorithm.dvc
wrapping existing optimization frameworks has several advantages
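To make the wrapper idea above more concrete, here is a minimal sketch of what such a wrapper could look like. It is not an existing DVC API. The file names params.json and metrics.json, the function names run_trial and random_search, and the plain random search are all assumptions made for illustration; a real integration would delegate the sampling to a package such as hyperopt or smac, and would also handle the branch/subfolder bookkeeping per run, which is left out here.

```python
import json
import random
import subprocess
from pathlib import Path


def run_trial(params, workdir, command,
              params_file="params.json", metrics_file="metrics.json",
              runner=subprocess.run):
    """Write one hyperparameter set to a JSON file, run the pipeline
    command, and read the resulting metrics back from a JSON file.

    The file names are hypothetical; the pipeline is assumed to read
    params_file and to write metrics_file itself.
    """
    workdir = Path(workdir)
    (workdir / params_file).write_text(json.dumps(params, indent=2))
    # check=True makes a failing pipeline run raise CalledProcessError.
    runner(command, cwd=workdir, check=True)
    return json.loads((workdir / metrics_file).read_text())


def random_search(space, n_trials, workdir, command,
                  metric="loss", seed=0, runner=subprocess.run):
    """Stand-in optimizer: sample each hyperparameter uniformly from its
    list of choices and keep the trial with the lowest metric value."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: rng.choice(choices) for name, choices in space.items()}
        metrics = run_trial(params, workdir, command, runner=runner)
        if best is None or metrics[metric] < best[1][metric]:
            best = (params, metrics)
    return best  # (best_params, best_metrics)
```

Under these assumptions, a search over a learning rate could be started with random_search({"lr": [0.01, 0.1, 1.0]}, 10, ".", ["dvc", "repro", "myalgorithm.dvc"]). The runner argument exists only so that the command execution can be swapped out, for example when trying the wrapper without DVC installed.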
Of course more details will pop up while actually implementing this, e.g. how to integrate hyperoptimization with .dvc pipeline files as neatly as possible (for instance we may want to commit both the single run.dvc as well as a hyperopt.dvc to the same repository -- these need to interact seamlessly together)
What do you think about this suggested approach?