ValueError: Wrong shape for input_ids (shape torch.Size([18])) or attention_mask (shape torch.Size([18])) #10
Is it possible to use custom models with simalign? (I'm mostly interested in alignment from English to English, not other languages.)
Maybe this is the issue?
Hi @youssefavx, thanks for pointing this issue out. Custom models should work with simalign (just pass the path to the model when instantiating the aligner).
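For anyone landing here, a minimal sketch of what that instantiation might look like, following the style of the README example; the model name `bert-base-uncased` and the sentences are placeholders, not from this thread:

```python
from simalign import SentenceAligner

# Pass a Hugging Face model name or a local path as the model argument
# (assumption: the constructor forwards it to transformers' from_pretrained).
aligner = SentenceAligner(model="bert-base-uncased", token_type="bpe",
                          matching_methods="mai")

# English-to-English alignment, as asked about above
src = ["The", "quick", "brown", "fox"]
trg = ["A", "fast", "brown", "fox"]
alignments = aligner.get_word_aligns(src, trg)
for method, pairs in alignments.items():
    print(method, pairs)
```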
Hey @pdufter, awesome! I will experiment with custom models (I assume I could just use the model name like with the transformers library, or do I have to find an actual path to those?). I'm running transformers version 3.1.0.
At the moment we have only tested on transformers==2.3.0.
Unfortunately, I can't really downgrade, because there's new functionality in the newer transformers that is essential for me. Do you know if there's a way to run both versions of a package at the same time in the same application? If not, I guess I'll try to debug this one and report back.
I do not know whether you can run two versions at the same time. But in any case we plan to make simalign usable with newer transformers versions and to add new features soon, if that helps. In the meantime, if you find the issue, any pull request is obviously highly appreciated.
@pdufter Will do if I solve it!
Okay, I think I fixed this (or at least found the problem), but my fix breaks simalign for earlier versions of transformers. I really don't think compatibility with earlier versions is impossible; it's more likely down to my ignorance. I should note that I have:
Perhaps you could add an if statement to the code along the lines of "if the version is earlier". A much better check would obviously be one that detects nesting (the tensor's dimensionality), since we don't know what Hugging Face will change in their packages at any point in time; it's probably also less tedious to set up. So here's the problem. In this function:
When I print the "inputs" variable (after updating transformers to 3.1.0):
The tensor you get is different, which I assume is why we get this error. Whereas when I downgrade transformers and print inputs again:
So all I had to do was add another dimension (array?) to it. Keep in mind I have no clue how to do this appropriately, nor do I have any clue what I'm doing in general; I searched online and came across this solution. So, in this function, here's the edit I made:
In this function:
So you may have better ideas as to the implications of this edit and how to implement it better.
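As an illustration of the dimension mismatch being described (not the actual patch from this thread), here is a sketch that reproduces the error and the add-a-dimension fix; the model name and sentence are placeholders:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

ids = tokenizer.encode("a short test sentence")  # plain Python list of token ids
inputs = torch.tensor(ids)                       # shape [n]: 1-D, no batch axis
# model(inputs) raises here on newer transformers:
#   ValueError: Wrong shape for input_ids (shape torch.Size([n])) ...

inputs = inputs.unsqueeze(0)                     # shape [1, n]: batch axis added
outputs = model(inputs)                          # runs
```

A shape-based guard such as `inputs.unsqueeze(0) if inputs.dim() == 1 else inputs` would work across transformers versions without any version sniffing, in the spirit of the suggestion above.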
And testing to make sure that the in_ids are the same before and after the resize:
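A check of that kind might look like the following (hypothetical, assuming the resize is the added batch dimension):

```python
import torch

ids = torch.tensor([101, 2023, 2003, 1037, 3231, 102])  # placeholder token ids
resized = ids.unsqueeze(0)                               # [6] -> [1, 6]
assert resized.shape == (1, ids.shape[0])                # only the batch axis changed
assert torch.equal(resized.squeeze(0), ids)              # the ids themselves are identical
```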
@youssefavx
@masoudjs Thank you for making such a useful and essential tool.
I had the same issue, but it was resolved by wrapping my data in a torch DataLoader. I am not sure why that solved the problem, but solve it, it did.
Modules in torch accept inputs in the form [batch_size, ...]; therefore, performing .unsqueeze(0) on the input/attention_mask tensors would help. That is exactly what torch.utils.data.DataLoader does for you.
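A small sketch of both routes to the batch dimension, with placeholder values:

```python
import torch
from torch.utils.data import DataLoader

ids = torch.arange(18)                # pretend these are 18 token ids, shape [18]
print(ids.unsqueeze(0).shape)         # torch.Size([1, 18]) -- manual batch axis

loader = DataLoader([ids], batch_size=1)
batch = next(iter(loader))            # default collate stacks into a batch
print(batch.shape)                    # torch.Size([1, 18]) -- DataLoader batches for you
```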
After running the example code provided, I get this error:
I wonder if this is due to my recent update of transformers. If so, that's going to be difficult for me to solve, because the newest version of transformers has a fill-mask feature that was not available in previous versions and that I'm going to need in conjunction with simalign's invaluable functionality.
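Presumably the feature in question is the transformers fill-mask pipeline; a minimal sketch, with the model name as a placeholder:

```python
from transformers import pipeline

# The fill-mask pipeline predicts candidate tokens for masked positions.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("Paris is the [MASK] of France."))
```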
Hopefully, this is unrelated: I did cancel the download and then restart it (it seemed to restart from a fresh file, though I could be wrong).