
FIX for ConvNd layers using the groups argument. #2403

Open
wants to merge 6 commits into main

Conversation

@gslama12 (Contributor)

As discussed in #2153.

Code example

@BenjaminBossan (Member)


Thanks for the PR. The change seems to be much smaller than I initially thought, which is great. Before we proceed, could we do the following:

Let's add a test case for this. First, let's create an entry like this one:

("Conv2d 1 LoRA", "Conv2d", LoraConfig, {"target_modules": ["conv2d"]}),

Then we need to define a model with a conv layer that uses groups. Something similar to this with groups=5 should work:

```python
class ModelConv2D(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv2d = nn.Conv2d(5, 10, 3, groups=5)  # grouped conv, as discussed
        self.relu = nn.ReLU()
        self.flat = nn.Flatten()
        self.lin0 = nn.Linear(10, 2)
        self.sm = nn.LogSoftmax(dim=-1)

    def forward(self, X):
        X = X.float().reshape(-1, 5, 3, 3)
        X = self.conv2d(X)
        X = self.relu(X)
        X = self.flat(X)
        X = self.lin0(X)
        X = self.sm(X)
        return X
```

Then make sure that the model is used when its model ID is passed, by adding an entry similar to this one:

```python
if model_id == "Conv2d":
    return ModelConv2D().to(torch_dtype)
```
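For context on why `groups` needs special handling: a grouped convolution stores its weight with a reduced input-channel dimension, so any LoRA delta merged into it must match that shape. A quick shape check in plain PyTorch (independent of PEFT's code):

```python
import torch.nn as nn

# With groups=g, Conv2d stores its weight as (out_channels, in_channels // g, kH, kW),
# so a delta weight merged into it must match the reduced shape, not the full one.
conv_plain = nn.Conv2d(5, 10, 3)              # weight: (10, 5, 3, 3)
conv_grouped = nn.Conv2d(5, 10, 3, groups=5)  # weight: (10, 1, 3, 3)
print(tuple(conv_plain.weight.shape), tuple(conv_grouped.weight.shape))
```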

LMK if anything is unclear.

Moreover, don't forget to run `make style` for the linter.

gslama12 and others added 3 commits February 27, 2025 12:02
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
@gslama12 (Contributor, Author)

Thanks for the fast reply. I implemented the tests as you described: I added one for LoRA and one for DoRA. LMK if there is something missing.

@BenjaminBossan (Member)

Thanks for adding the tests. Unfortunately, a lot of them are failing for me locally. Do they pass for you? E.g. this one:

```
pytest tests/test_custom_models.py -k test_forward_output_finite_021_Conv2d_Groups_LoRA
```

@gslama12 (Contributor, Author)

gslama12 commented Mar 1, 2025

I had a bug in the test model; I fixed it now, and the test case you mentioned should work. Let's see if anything else fails.

@gslama12 (Contributor, Author)

gslama12 commented Mar 3, 2025

It seems like there is an issue with the merging test cases. I will try to look into it; if you have any suggestions, LMK.

@BenjaminBossan (Member)

Thanks for the updates but some tests are still failing on CI (ignore those caused by timeouts, that's a HF Hub issue). Checking the values, it doesn't look like it's just a matter of precision/tolerance but that there's something else going on. Do these tests pass locally for you?

@gslama12 (Contributor, Author)

gslama12 commented Mar 3, 2025

No, they don't. Something in the merging procedure seems to be off. I will try to look into it.

@gslama12 (Contributor, Author)

gslama12 commented Mar 3, 2025

I tried to recreate the test test_merge_layers_021_Conv2d_Groups_LoRA, which is one of the failing test cases in the pipeline. Maybe you can check out this gist, which runs fine on my machine. I wonder whether something going on within the testing pipeline causes the assertion error. 🤔

@BenjaminBossan (Member)

You don't need to create a standalone script to reproduce the error, just run pytest like so:

```
pytest tests/test_custom_models.py -k test_merge_layers_021_Conv2d_Groups_LoRA
```

With this, I can reproduce the error locally. Dropping into the debugger, I see:

```
(Pdb) logits
tensor([-0.0053, -5.2435], grad_fn=<SelectBackward0>)
(Pdb) logits_merged
tensor([-7.0452e-04, -7.2583e+00])
```
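The invariant that test asserts is that merging is output-preserving: since convolution is linear in its weights, folding a delta into the base weight must not change the output. A minimal sketch of that invariant for a grouped conv in plain PyTorch, assuming a simple additive delta weight rather than PEFT's actual LoRA parametrization:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 6, 8, 8)
w = torch.randn(8, 3, 3, 3)        # groups=2 conv weight: (out, in // groups, kH, kW)
delta = 0.1 * torch.randn_like(w)  # stand-in for a LoRA delta weight

# Unmerged: base output plus the delta's contribution, computed separately.
y_unmerged = F.conv2d(x, w, groups=2) + F.conv2d(x, delta, groups=2)
# Merged: the delta folded into the base weight.
y_merged = F.conv2d(x, w + delta, groups=2)
assert torch.allclose(y_unmerged, y_merged, atol=1e-5)
```

If the merge code reshapes the delta as if the conv were ungrouped, this equality breaks, which would show up exactly as a `logits` vs. `logits_merged` mismatch like the one above.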

To run all the groups-related tests, call this:

```
pytest tests/test_custom_models.py -k groups -v
```

@gslama12 (Contributor, Author)

gslama12 commented Mar 3, 2025

Yes, I am aware of that. I also get the same assertion error when running the test, so my first thought was that I messed up the merging with my changes. But when I create a similar scenario and run it as a local file outside the pipeline, the assertion passes.

So I'm currently trying to figure out what the difference is.

@BenjaminBossan (Member)

Ah sorry, I misunderstood you.

Yes, your script passes, but there are a few differences. Please pass `init_lora_weights=False` to `LoraConfig`, or else LoRA will just be a no-op. Furthermore, I had to pass a non-zero input, e.g. `dummy_input = torch.arange(90).reshape(9, 10)` as in the test. Now the first assert fails.
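To illustrate why a non-zero input matters: with an all-zeros input, a conv layer's output reduces to its bias, so even badly merged weights produce identical outputs and the bug stays invisible. A small demonstration in plain PyTorch (not the actual test code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(5, 10, 3, groups=5)
x_zero = torch.zeros(1, 5, 3, 3)
x_nonzero = torch.arange(45, dtype=torch.float32).reshape(1, 5, 3, 3)

y_zero_before = conv(x_zero)
y_nonzero_before = conv(x_nonzero)
with torch.no_grad():
    conv.weight += torch.randn_like(conv.weight)  # simulate a botched merge

# Zero input: output is just the bias, so the corrupted weights go unnoticed.
assert torch.allclose(y_zero_before, conv(x_zero))
# Non-zero input: the weight corruption is now visible in the output.
assert not torch.allclose(y_nonzero_before, conv(x_nonzero))
```

The same reasoning applies to the LoRA weights themselves: with the default `init_lora_weights=True`, the adapter's initial delta is zero, so merging it cannot change the output either way.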

@gslama12 (Contributor, Author)

gslama12 commented Mar 3, 2025

Ah, OK, thanks for the help. I will try to debug this.
