
A Multi-phase, Scheduled Finetuning Callback #10197

Closed
speediedan opened this issue Oct 27, 2021 · 2 comments
Labels
feature Is an improvement or enhancement


speediedan commented Oct 27, 2021

🚀 Feature

A callback that enables multi-phase, scheduled finetuning of foundational models.

    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import FinetuningScheduler
    trainer = Trainer(callbacks=[FinetuningScheduler()])

(Animation: loss curves under an explicit finetuning schedule — fts_explicit_loss_anim)

Motivation

Gradual unfreezing/thawing can help maximize foundational model knowledge retention while allowing (typically upper layers of) the model to optimally adapt to new tasks during transfer learning [1, 2, 3].

When tuning pre-trained large language (aka foundational) models for downstream tasks over the last couple of years, I've personally observed the benefits of this technique in multiple project contexts and have been using and refining code to expedite applying this pattern. Given that this approach to finetuning continues to be widely used and that multi-phase finetuning has been requested by others in the PL community, I thought it could be immensely useful to provide a callback (extending BaseFinetuning) for this purpose. I've created said callback (named FinetuningScheduler), have been using it from my PL fork to great effect for the last few months, and am hoping others may find it similarly useful.

  1. Howard, J., & Ruder, S. (2018). Fine-tuned Language Models for Text Classification. arXiv preprint arXiv:1801.06146.
  2. Chronopoulou, A., Baziotis, C., & Potamianos, A. (2019). An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models. arXiv preprint arXiv:1902.10547.
  3. Peters, M. E., Ruder, S., & Smith, N. A. (2019). To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks. arXiv preprint arXiv:1903.05987.

Pitch

Though approaches to leveraging foundational models for downstream tasks are continually evolving (e.g. prompt/prefix-tuning, etc.), finetuning with gradual unfreezing continues to be widely used, and multi-phase finetuning has been requested by the PL community (#2006) to boot. I think multi-phase finetuning is a natural extension of the BaseFinetuning functionality that PL provides and comports nicely with its aspiration to decouple the science from the engineering.

The PR I'm submitting includes a fully functional, tested, and documented beta version of the FinetuningScheduler (fts) callback as well as a new example (in ./pl_examples/basic_examples/fts) demonstrating a few use cases as applied to a SuperGLUE benchmark task using the LightningCLI. Given the nature of this callback, I thought a LightningCLI-based example was better suited than a notebook-based one.

Rather than reiterate the documentation here in detail, I think the best way to get a sense of the potential utility of this callback is to review the documentation I've provided in the PR and execute the new example. At a high level, though, this callback essentially implements gradual unfreezing of foundational models via either explicit or implicit finetuning schedules. Explicit finetuning mode unfreezes/thaws layers based upon user-defined layer groupings; schedule definition is facilitated via a method that dumps a default finetuning schedule, which the user can adjust as desired and subsequently pass to the callback. Implicit finetuning mode generates the default schedule and proceeds to finetune according to it. A rough usage sketch of the two modes follows.
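
To make the two modes concrete, here is a minimal sketch of how the callback might be driven. The ft_schedule argument name and the YAML layout shown in the comments are illustrative assumptions, not the final API; the PR documentation describes the actual interface.

    # Rough sketch only; argument names (e.g. ft_schedule) are illustrative
    # assumptions rather than the final API.
    from pytorch_lightning import Trainer
    from pytorch_lightning.callbacks import FinetuningScheduler

    # Implicit mode: with no schedule supplied, the callback generates a default
    # schedule and finetunes according to it.
    trainer = Trainer(callbacks=[FinetuningScheduler()])

    # Explicit mode: dump the default schedule, regroup/edit the phases by hand,
    # then point the callback at the edited file. Conceptually, the schedule maps
    # each phase to the parameters thawed in that phase, e.g.:
    #
    #   0:
    #     params:
    #       - model.classifier.bias
    #       - model.classifier.weight
    #   1:
    #     params:
    #       - model.pooler.dense.bias
    #       - model.pooler.dense.weight
    #
    trainer = Trainer(callbacks=[FinetuningScheduler(ft_schedule="my_ft_schedule.yaml")])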

Alternatives

Since this pattern is so commonly used, I think it makes sense to have it available in the PL framework as a callback rather than have it implemented in each user's LightningModule or as a community example. I'd note that I did consider modifying ModelCheckpoint to accommodate this callback, but ultimately decided that, given the extensive usage of that callback, it would be more prudent to extend ModelCheckpoint with an FTSCheckpoint subclass, at least while FinetuningScheduler is in beta (a minimal sketch of that relationship follows).
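
For illustration only, a minimal sketch of that subclassing approach, assuming a hypothetical attribute for tracking schedule state; the class in the PR is authoritative.

    # Hypothetical sketch: extends ModelCheckpoint rather than modifying it, so the
    # widely used base callback stays untouched while FinetuningScheduler is in beta.
    from pytorch_lightning.callbacks import ModelCheckpoint

    class FTSCheckpoint(ModelCheckpoint):
        """ModelCheckpoint plus finetuning-schedule bookkeeping (names illustrative)."""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # illustrative: track the current finetuning phase so a resumed run can
            # restore its position in the schedule alongside the usual checkpoint state
            self.current_ft_depth = 0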

Additional context

Feel free to look at the Tensorboard experiment demo I've linked to in the documentation. While I've made other minor contributions to PyTorch Lightning, this is my first feature contribution, so please bear with me if there are any shortcomings wrt my contribution. Thank you so much to everyone in the PL community for contributing to this awesome framework! I've found it immensely useful and plan to continue using it (and evangelizing about it) in the future.

cc @Borda


stale bot commented Dec 1, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!

stale bot added the won't fix label on Dec 1, 2021
Borda added this to the 1.7 milestone on Dec 1, 2021
stale bot removed the won't fix label on Dec 1, 2021
speediedan (Contributor, Author) commented:

Unless there's an objection, I think we can close this now that the requested functionality is available via the Finetuning Scheduler extension (used in this Lightning tutorial).

On a related note, any sense of when the _notebooks submodule will be updated? It looks like it was last updated in mid-January, so I'm wondering if there will be an update tied to the 1.7 release. Nice work on 1.7 so far btw! ⚡ 🚀 🎉
