-
-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic scheduling policies and a scheduler #1506
base: master
Are you sure you want to change the base?
Conversation
Note that this PR gives us all the scheduling policies available in PyTorch + more. |
One question is how we handle the docs. I am guessing the same as NNlib functions? |
Can I request punting on this? I also am not clear about what benefits we get putting it into the repo as opposed to adding it as a separate package. Further, I'd be inclined to work towards a scheduler interface in optimisers.jl instead. We also have been speaking about moving chunks to their own independent repos. Happy to link from the website |
Sure, we can punt, but I feel like this is a core feature that our framework lacks. I'd rather just have the discussion. Certainly, it may make more sense in Optimisers.jl. But, I think like the current optimizers, this switch can happen quite seamlessly when Flux is ready to transition to Optimisers.jl. This PR is a total of 18 lines of code, and it adheres to the current optimizers interface (no additional interfaces). So moving it over is about as hard as moving Of course, if you are referring to the schedules themselves, this is just reexporting them. I think that similar to optimizers, this is something that we want. Alternatively, we can just have docs and point users to the package like we do for MLDatasets.jl. |
We'd not really have much trouble defining our own with the current interface, and I'd certainly want to have schedulers, but we'd move over to optimisers.jl soon ish so we can start playing with it, and I'd rather work on schedulers there. Linking from the website and docs seems like the way to go to me. Thanks for understanding |
There's two pieces here: the schedules and the scheduler. In the case of the former, we cannot define all kinds of schedules with our interface without going to great lengths to circumvent the limitations of the interface. If #1481 lands soon-ish, then I agree that this PR is better suited for Optimisers.jl. But I have my doubts about the mutability discussion being resolved soon. We don't have to merge this PR immediately by any means. I'm happy to wait and see if we can resolve #1481 soon-ish (and would be quite happy to be proven wrong 😉). |
I have serious concerns about the transition to Optimizers.jl, the whole design may need more discussion, and if we want to do serious benchmarks and add a proper deprecation path, It's quite unlikely #1481 is going to land soon. I encourage people to complete the v0.12 milestone instead of rushing on that |
Any remaining breakages in that pr are not optimisers related I don't think. Deprecations can be handled. |
#1481 is still missing FluxML/Optimisers.jl#9 (comment). Plus, there are important concerns in FluxML/Optimisers.jl#12 |
I don't like what you call the average case, the fact that a scheduler is an optimizer, is awkward and not needed. scheduler and opt should be separate objects (although scheduler can contain a reference to opt if needed). We can add a keyword arg to train! to pass the scheduler. |
The struct seems very dirty as a deprecation path. Hadn't seen FluxML/Optimisers.jl#12 earlier. |
The average case views the scheduler as an optimizer composition (higher-order optimizer). This is inline with the view that optimizer rules are gradient transformations, and optimizer compositions are rule transformations. Maybe it makes more sense with my original name "scheduled optimizer." But if we want to punt on this syntax, then I am fine with that. As you mention, it only matters for |
Other concerns include names for everything and keyword argument constructors. I think I need to remove a lot of Unicode to match Flux practices. Also whether we want a |
@darsnack are you planning on adding refs in the docs and removing the dependency or do we want to do it in a separate PR? |
If we want to push out docs with a link reference, then I would suggest another PR. My intent isn't to close this one, as I still believe it should be reviewed, augmented to address issues, and merged. I was going to submit another PR with the doc reference, but I was planning on doing more than a link (i.e. I think there should be a small snippet in the docs as a usage example). As for this PR, now that I have removed Like I explained before, scheduling policies and the scheduler (or mechanism to actually set the parameter in the optimizer) are separate. Scheduling policies should be thought of as equivalent to Now, I understand that the code in ParameterSchedulers.jl needs to be properly reviewed and discussed, which is why I am happy to take this slow. But I find it a little frustrating that every time I bring this up, it gets shut down. I've expressed I'm open to all forms of changes to the code in ParameterSchedulers.jl and transferring ownership. What I've tried to do is put in my time and effort to present usable code with examples and documentation to have a constructive conversation around the design. I think it's only fair that we allow that conversation to happen. In terms of the dependency issue, ParameterSchedulers.jl should be a negligible addition. It has no sub-dependencies since it only extends interfaces in Base, and it is a pretty small code base in total. In fact, it could easily fit into a |
Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>
Maybe the module should live within Flux main module instead of being a submodule of Optimise. I'm ok with adding a lean external dependence. As a start, Flux should document only the foundations of the interface. These are the possible interfaces I can think off: interface 1using Flux.Optimise
using Flux.Schedulers
opt = ADAM()
scheduler = Schedulers.Exp(...)
for epoch in 1:epochs
....
opt.eta = Schedulers.next!(scheduler)
end interface 2opt = ADAM()
scheduler = Schedulers.Exp(...)
for epoch in 1:epochs
....
opt.eta = scheduler[epoch]
end interface 3opt = ADAM()
scheduler = Schedulers.Exp(...,)
for (epoch, eta) in zip(1:epochs, scheduler)
....
opt.eta = eta
end interface 4opt = ADAM()
scheduler = Schedulers.Exp(opt, :eta, ...)
for (epoch, eta) in zip(1:epochs, scheduler)
....
Scheduler.next!(scheduler)
end interface 5opt = ADAM()
scheduler = Schedulers.Exp(opt, :eta, ...)
train!(..., scheduler=scheduler) interface 6the scheduler is a optimizer. Integration in `update! DiscussionI would document no more than interface 1,2,3 here (and actually only 1 is quite good already). Interface 1 is the very basic interface and for me also the more convenient and the one I'm likely going to use. |
I'm uncomfortable with depending on a package much less re-export it if it's not vetted Flux code since that becomes first class API. It makes it very hard to change in the future. As you said, adding this as a dep doesn't benefit it's usability, so it's giving me further pause around the same. I am aware of the limitations of the current api and share your frustrations and want to address them (with hopefully your learnings and guidance) consistently in Optimisers.jl. In the meantime, to have users be directed to the right place, I offered to have the refs in the right places. I appreciate and encourage discussions around this topic and don't think I've stopped that here? I do think your involvement has particularly helped stream discussion in the right places too |
Yes, sorry if my comments came off as antagonistic; it's just that this topic has come up several times, and we only now seem to be discussing it fully (thus, my frustrations). But it seems like we are moving in similar directions here, so thanks for that.
I agree with this; my impression was that something wasn't even being considered, but maybe that's my misunderstanding. I think the best approach will be add the reference and documentation of how to use it. Let it exist in that form for some release cycles then we can evaluate how well it worked in practice with Flux code.
Absolutely, but just to be clear, I don't think that the optimizers and scheduling policies should be too tightly coupled. It makes more sense to me to use an interface (like 4, 5, 6 above or something similar but different) to apply scheduling policies to any optimizer in a generic way. I also still think the policies themselves should be separate from Optimisers.jl for the reasons I stated above. Discussion on interfacesSo interfaces 1, 2, and 3 are all already concurrently supported. For 1, it would be Interfaces 4, 5, and 6 are all various versions of the The cleanest version of 4, 5, and 6 requires Optimisers.jl. This would look like |
I think adding the refs is a good idea. I'm still going to hold off on adding the dependency for the reasons I stated earlier. I have long wanted a Callbacks.jl and Schedulers.jl too for completeness. |
1511: Add ParameterSchedulers.jl to docs r=darsnack a=darsnack As per the discussion in #1506, this adds a section to the docs that briefly describes scheduling with ParameterSchedulers.jl. Only concerns before merging are the naming conventions in ParameterSchedulers.jl. If I could get some feedback on those, then I can submit a minor release before officially merging this into the Flux docs. ### PR Checklist - [ ] ~~Tests are added~~ - [ ] ~~Entry in NEWS.md~~ - [x] Documentation, if applicable - [ ] ~~Final review from `@dhairyagandhi96` (for API changes).~~ Co-authored-by: Kyle Daruwalla <daruwalla@wisc.edu>
I have a version of ParameterSchedulers.jl without the type hierarchy at darsnack/rm-optim. A couple of noted changes:
All in all, assuming we punt on Interfaces 4, 5, 6, I would say this can be merged (needing doc updates and proper review first of course). I don't mind if we don't, this is really just to say that it's here if we want it at some point. |
Shouldn't we wait for darsnack/rm-optim to be merged into master and a new ParameterSchedulers release tagged? |
Yeah definitely. Will do that after cleaning up the docs. |
1513: Update for latest ParameterSchedulers.jl release r=darsnack a=darsnack Due to some finagling with #1506 and work on Optimisers.jl, I made some changes to the names/API of ParameterSchedulers.jl. This just updates the docs to reflect those names. ### PR Checklist - [ ] ~~Tests are added~~ - [ ] ~~Entry in NEWS.md~~ - [x] Documentation, if applicable - [ ] ~~Final review from `@dhairyagandhi96` (for API changes).~~ Co-authored-by: Kyle Daruwalla <daruwalla@wisc.edu>
This change adds basic scheduling policies via ParameterSchedulers.jl. You can find a full list of the available schedules here. Of course, I am happy to transfer ownership of ParameterSchedulers.jl to FluxML if that's desired. Also happy to go over the ParameterSchedulers.jl design and make changes/adjust the names and API.
The ideal scheduling interface would require #1481. In lieu of that, this PR adds
Scheduler(schedule, opt)
as the API with the caveat that only the LR can be scheduled. Once #1481 is merged, then this can be extended to schedule any parameter inopt
without needing a breaking API change. (If we were to allow any parameter at this stage, then we would need a breaking API change).Any schedule is just a simple iterator (i.e. implements
Base.iterate
andBase.getindex
). That's pretty much the full interface. A summary of how this works in Flux:A couple of notes:
ScheduleIterator
, so please suggest something better if you can think of itopt.eta = schedule[t]
oropt.eta = next!(schedule)
can be wrapped into something likestep!(opt, schedule)
orset!(opt, schedule)
(like @CarloLucibello suggested on Slack)PR Checklist
@dhairyagandhi96
(for API changes).