Support for freezing pretrained vision model layers with regex #3981
Conversation
This reverts commit 5df7362.
Unit Test Results: 6 files ±0, 6 suites ±0, 18m 36s ⏱️ +4m 15s. Results for commit 4e62b92. ± Comparison against base commit 4b07ce4. This pull request skips 2 tests.
♻️ This comment has been updated with latest results.
ludwig/cli.py
Outdated
```python
from ludwig.utils import pretrained_summary

pretrained_summary.cli_summarize_pretrained(sys.argv[2:])
```
It might be good to add some example runs and outputs to the docs.
@ethanreidel To second @skanjila -- would it be possible to show an example of running this command, how it differs from the existing one, and an example of its output? Thank you very much.
Certainly.
Qq: when you say you'd like an example, do you mean an example in the Ludwig docs, or how would you prefer it?
@ethanreidel One option is to create an example in the `examples/` top-level directory in Ludwig.
ludwig/utils/pretrained_summary.py
Outdated
```python
from ludwig.contrib import add_contrib_callback_args
from ludwig.globals import LUDWIG_VERSION
from ludwig.utils.print_utils import print_ludwig
```
Wait, are we really supporting all of these models? I thought we were just going to go out the door with a couple of models to start.
For this specific feature (simple regex freezing), as long as you have access to the string representation of the layers and the actual model architecture, you can freeze any layers you'd like. It wasn't any extra work to support all torchvision models beyond adding them to this list. I don't like the look of this long model array, though.
@ethanreidel While it looks like torchvision will be supported, what about text/LLMs? (This is related to my previous comment in the Trainers section.) Thanks!
> For this specific feature (simple regex freezing), as long as you have access to the string representation of layers + actual model architecture, you can freeze any layers that you'd like. It wasn't any extra work adding support for all torchvision models besides adding to this list. I however don't like the look of this long model array though
@ethanreidel Sorry, could you please point me to this "long model array"? Which line in your code has it? Thanks!
For your first question, Alex: as long as access to the model layers and their `requires_grad` parameters is available, this feature should in theory work on LLMs/text. I'm not too familiar with LLM architecture and will have to do some quick checks, but I'm 99% sure it is an easy addition. Second question: in a previous commit, I had a pretty hacky solution where users had another command-line option (under `pretrained_summary`) that would list all available model names. Those names were stored in a Python list, which had a few issues, namely having to expand it regularly and many lines of unnecessary code. Saad made a good point and said to remove it entirely (it was not needed), so it's no longer there.
Just checked, and sure enough you can apply the same regex freezing technique to an LLM.
> Just checked and sure enough you can apply the same regex freezing technique to an LLM
@ethanreidel That's awesome. Maybe we can then use one of my earlier comments and only add this parameter to the `ECDTrainerConfig` and `FineTuneTrainerConfig` for now.
As part of the examples you have, it would be good to create 2 example Python files:
- To show how to use it with a computer vision model
- To show how to use it with an LLM base model
What do you think?
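To make the idea concrete, here is a minimal stand-alone sketch of the regex-freezing technique under discussion. This is not the PR's actual implementation: the `_Param` class is a stand-in for `torch.nn.Parameter` so the sketch runs without torch, and the layer names are hypothetical ones resembling a torchvision ResNet.

```python
import re


class _Param:
    """Stand-in for torch.nn.Parameter (assumption: the real code uses torch)."""

    def __init__(self):
        self.requires_grad = True


def freeze_layers_regex(pattern_str, named_parameters):
    """Freeze every parameter whose name matches the regex; return the frozen names."""
    pattern = re.compile(pattern_str)
    frozen = set()
    for name, p in named_parameters:
        if pattern.search(name):
            p.requires_grad = False
            frozen.add(name)
    return frozen


# Hypothetical layer names resembling a torchvision ResNet:
params = {n: _Param() for n in ["conv1.weight", "layer1.0.conv1.weight", "fc.weight"]}
frozen = freeze_layers_regex(r"^(conv1|layer1)", params.items())
# conv1.* and layer1.* are frozen; fc.weight stays trainable
```

Because the technique only relies on `named_parameters()` and `requires_grad`, the same sketch applies unchanged to an LLM's parameter names (e.g. attention blocks) as to a vision model's.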
ludwig/utils/trainer_utils.py
Outdated
```python
def freeze_layers_regex(config: "BaseTrainerConfig", model: ECD) -> None:
    """Freezes layers based on provided regular expression."""
```
Let's add comments documenting the inputs/outputs as well.
@ethanreidel I think that if you put `from __future__ import annotations` as the very first line in the module, you would not need to quote the types. Would you like to give it a try and see if it works? Thanks!
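A quick illustration of this suggestion: with the future import, annotations are stored as strings and never evaluated at definition time, so unquoted forward references work even for names (like `BaseTrainerConfig` and `ECD` below) that are deliberately left undefined here.

```python
from __future__ import annotations


# BaseTrainerConfig and ECD are not defined in this module, yet the
# unquoted annotations are fine because they are never evaluated:
def freeze_layers_regex(config: BaseTrainerConfig, model: ECD) -> None:
    """Freezes layers based on provided regular expression."""


# The annotations are kept as plain strings:
print(freeze_layers_regex.__annotations__["config"])  # prints BaseTrainerConfig
```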
I tested the annotations import, and it worked, but the git pre-commit hook was forcing changes (e.g., converting all uppercase `Dict`s to lowercase `dict`s) that I didn't like.
@ethanreidel I think that makes sense.
Are you able to expand the docstring itself for this function? Also, if it also supports LLM, can we make `model` a union of `ECD` and `LLM`?
ludwig/schema/trainer.py
Outdated
```python
layers_to_freeze_regex: str = schema_utils.String(
    default=None,
    allow_none=True,
    description=(
        "Freeze specific layers based on provided regex. Freezing specific layers can improve a "
        "pretrained model's performance in a number of ways. At a basic level, freezing early layers can "
        "prevent overfitting by retaining more general features (beneficial for small datasets). Also can "
        "reduce computational resource use and lower overall training time due to less gradient calculations. "
    ),
)
```
Instead of putting this in the base trainer config, what if we put it in the `ECDTrainerConfig`? For now, that will be good enough to ensure that this only works for ECD-supported models and is not a valid argument/parameter for GBMs and LLMs. If it also works for LLMs, then we can also duplicate it in `FineTuneTrainerConfig`, which is used by LLMs. It also means we don't have to modify any other trainers for now.
ludwig/utils/pretrained_summary.py
Outdated
```python
model = encoder_class()

for name, _ in model.named_parameters():
    print(name)
```
We generally don't like to use prints in Ludwig code. Can we use `logger.info()` instead?

```python
import logging

logger = logging.getLogger(__name__)
logger.info("message")
```
ludwig/utils/trainer_utils.py
Outdated
```python
    pattern = re.compile(config.layers_to_freeze_regex)
except re.error:
    logger.error(f"Invalid regex input: {config.layers_to_freeze_regex}")
    exit()
```
Instead of `exit()`, let's raise a `RuntimeError` with the same message.
In fact, here's a thought: we can move this check earlier in the code path, to config validation time. Specifically, you can create a `__post_init__()` hook for `ECDTrainerConfig` and `FineTuneTrainerConfig` that tries `re.compile()` and, if it fails, throws a `ConfigValidationError` with the error message. That way, we don't have to wait for all of preprocessing etc. to be done before catching this error.
Here's an example explaining the same idea in a different part of the Ludwig codepath: https://github.com/ludwig-ai/ludwig/blob/master/ludwig/schema/llms/peft.py#L443
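A rough sketch of the validation-time check being suggested. This uses a plain dataclass and a stand-in `ConfigValidationError` rather than Ludwig's actual schema classes, both of which are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


class ConfigValidationError(ValueError):
    """Stand-in for Ludwig's ConfigValidationError."""


@dataclass
class TrainerConfigSketch:
    layers_to_freeze_regex: Optional[str] = None

    def __post_init__(self):
        # Fail fast at config-validation time instead of after preprocessing.
        if self.layers_to_freeze_regex is not None:
            try:
                re.compile(self.layers_to_freeze_regex)
            except re.error as e:
                raise ConfigValidationError(
                    f"Invalid regex input: {self.layers_to_freeze_regex}"
                ) from e


TrainerConfigSketch(layers_to_freeze_regex=r"^layer1")  # valid regex: constructs fine
# TrainerConfigSketch(layers_to_freeze_regex="(")       # raises ConfigValidationError
```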
ludwig/utils/trainer_utils.py
Outdated
```python
matched = False
for name, p in model.named_parameters():
    if re.search(pattern, str(name)):
        p.requires_grad = False
        matched = True
if not matched:
    logger.error(f"No regex match for {config.layers_to_freeze_regex}! Check layer names and regex syntax.")
```
Two thoughts here:
- Instead of `logger.error`, perhaps we can do a `logger.warning()`? I think it's okay if there are no matches, but we want to surface that as a warning so users can notice it, as opposed to calling it an error (which it sort of is).
- One thing that could be very useful here is to also create a set of all the layers where the regex search actually returns true and `requires_grad` gets set to false, and then log that full list! Observability can be super helpful.
Overall, really nice work and clean implementation @ethanreidel! I left a few suggestions that might help simplify edge cases as well as considerations to add more observability into which layers are frozen.
I would also recommend installing
@ethanreidel Let us know when this is ready for re-review!
Hi @arnavgarg1! Thanks for checking in. The PR is good for re-review. 👍
LGTM! @ethanreidel -- thank you for this amazing and very useful contribution!
Allows the user to input a regular expression in the YAML config which freezes specific layers of a pretrained model. Adds a new CLI option, `pretrained_summary`, to let users access string representations of model layers for freezing via regex. Currently all pretrained torchvision models are accessible.
```yaml
trainer:
  layers_to_freeze_regex: (regex here)
```

```shell
ludwig pretrained_summary -m (model name here)
```
(I am aware that the `collect_summary` CLI command is similar; however, it only accepts a preexisting directory, so I thought creating a separate command that strictly outputs layer names was appropriate for this feature.)
Closes #3733
Future plans: expand this capability to implement gradual unfreezing.
Test: `pytest tests/ludwig/modules/test_regex_freezing.py`