-
Notifications
You must be signed in to change notification settings - Fork 27.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify BERT/BERT-descendants to be TorchScript-able (not just traceable) #5067
Comments
Hi! This is interesting. Could you resume what are the changes that would be needed in order to have our models scriptable? |
Sure, mostly my changes fall into these categories: 1. Class members can only be basic types, None, nn.Modules, or list or tuple thereof
2. Inputs are assumed to be Tensors
3. TorchScript can't figure out that an Optional is not None
4. Variable types are not allowed to change depending on conditionals
5. TorchScript can't handle the expand (*) operator on lists
6. You can't use nn.Modules as local variables (take variable number of args)
7. TorchScript doesn't know about nn.ModuleList's enumerate
Most of these are pretty small changes and do not affect the logic. #4 and #1c can be tricky, and #5 might be an issue with recent changes made here: #4874 |
Hi @sbrody18, Thanks for opening this issue and taking the time to dive into our TorchScript support. Regarding A scriptable model would allow for variable-length input, offering big speedup gains and simplification: Do you have some numbers to compare against the current transformers library? We ran some TorchScript tests and the differences where not that huge at that time, may be this has changed since? I (and probably others) would be very interested in knowing more on this aspect. Regarding the list of changes you suggested: I'm currently not really in favour of such changes as they are almost changing all the way the library is designed and would have an impact on all the models. Some of them might be further discussed if there are real performance benefits. |
Hi @mfuntowicz, Assuming you are trying to run in C++ (which is the reason to use TorchScript), the current solution, using I'm guessing the tests you ran were focused specifically on the technical behavior of the models on a fixed input set and didn't take into account the max-length issue. Also, this is only an issue if you need to use TorchScript in order to run in C++. Re. the change to design, my intention is to keep the model changes to a minimum (e.g., adding type hints and asserts does not change the design at all) and make sure they are fully backwards compatible. There would still be some changes required, but I don't think they are drastic. As I said in the original post, I have a PR where I did a lot of the work, and I'd be happy to work with someone to figure out how to get it to a state where it can be merged. |
@sbrody18 do you mind sharing your fork ? |
Yes, I can do so, but it may have to wait a week or two - things are busy at the moment. |
I am very interested in this work as well. Our team would like to be able to use TorchScript so we can train without depending on Python. If there's any way I can be of help, I would gladly offer some time here! |
Sorry for the delay. I hope to have a reasonable PR later this week. |
My change is available at https://github.com/sbrody18/transformers/tree/scripting Note that it is based off of a commit from earlier this month: I listed the principles I used in #5067 (comment). The original components tended to return different sized tuples, depending on arguments, which is problematic for TorchScript. When a component BertX required an interface change to be scriptable, I made a BertScriptableX version with the modifications, and had the BertX component inherit from it and just modify the output so it is compatible with the original API. I made scriptable versions of BertModel and all the BertFor<Task> classes, except BertForMaskedLM (some complexities there were too much work for a proof of concept). Note that my change disables the gradient_checkpoint path in the encoder. I think this can be resolved, but I didn't have the time to work on it. |
Thanks for all the work. Looking at this and our recent changes in the model API (in particular the return_dict argument) I think we probably won't be able to have the models be fully compatible with TorchScript. What is possible however would be to have a second version of the models that don't have the option of return_dict (we can also remove output_hiddens/output_attentions if it makes life easier) and would be fully scriptable. Since you already started with some components in a different class, I think we should have two models (let's say Then I'm not sure what's easiest:
I think we should focus on having a proof of concept on one model before moving forward with others. |
That makes sense to me. It will probably result in some amount of code duplication, and we'd need to make sure we keep the named parameters in sync, but probably easier to maintain. |
Not necessarily a separate file, I guess it depends on the amount of code to rewrite. I think we can worry about this in a second stage, once we have a good poc. |
@sgugger Please see POC implementation in PR above. |
@kevinstephano - see discussion and conclussions here |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Hi, I just tried
@sbrody18 how did you export the model? I guess the workaround would be to remove try blocks, but apparently it did work for you as it is. |
@fteufel you can see #6846 for a stand-alone implementation that worked at a previous version of the transformers library. Maybe that's good enough for your purposes? |
@sbrody18 It seems have not been merged to official transformers ? My transformers Version: 4.21.3, and it can not use |
🚀 Feature request
Modify BERT models (src/transformers/modeling_bert.py) to conform to TorchScript requirements, so they can be
jit.script()
-ed, not justjit.trace()
-ed (as is currently the only supported option)Note: I have a working version implementing this, which I would like to contribute.
See below.
Motivation
A scriptable model would allow for variable-length input, offering big speedup gains and simplification (no need to create different models for different input lengths).
In addition, it would avoid other potential pitfalls with tracing (e.g., code paths that are input dependent and not covered by the tracing example input).
Related issues:
#2417
#1204
possibly also
#1477
#902
Your contribution
I have a working PR that modifies all the models in src/transformers/modeling_bert.py and makes them TorchScript-able. I have not tested it on other models that use BERT components (e.g., albert), but it should be possible to expand the capability to those, as well.
However, it would require some significant work to make it ready for submission: besides formatting, documentation, testing etc., my current version changes the method signatures, and I would need to avoid that to maintain backward-compatibility.
Before putting in that work, I'd like to make sure that such a PR is something you'd be interested in and would be willing to merge in, assuming it meets the requirements.
The text was updated successfully, but these errors were encountered: