ONNX compatible models #2640
This would be very useful. Do you have any idea how large a piece of work this is (my gut feel is very)? I can see if we can help with some of the work - I'll be honest, this wouldn't be my own speciality.
The first part is almost finished: #2643 is ready for review. That one was surprisingly straightforward: first think of how to refactor one model, then apply the same to all other models (as it is mainly the same). The hardest part is deciding what kind of refactoring to apply; there it might already be helpful to just discuss/brainstorm how to do it. I have some thoughts on the open tasks.
Is this code refactoring only for making Flair models compatible with ONNX?
As long as you are not using the flair embeddings with flair version < 0.11, you can apply dynamic quantisation on all flair models that run on CPU. However, you cannot store them due to the way embeddings are stored.
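For anyone who wants to try this, a minimal sketch of dynamic quantisation on a loaded tagger; the model name is just an example, and this assumes flair >= 0.11 with CPU inference:

```python
# Minimal sketch, assuming flair >= 0.11 and CPU inference; the model
# name is just an example, any CPU-capable tagger should behave alike.
import torch
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("flair/ner-english-fast")  # example model

# Replace Linear/LSTM weights with int8 versions applied at runtime
quantized = torch.quantization.quantize_dynamic(
    tagger, {torch.nn.Linear, torch.nn.LSTM}, dtype=torch.qint8
)

sentence = Sentence("George Washington went to Washington.")
quantized.predict(sentence)
print(sentence.to_tagged_string())
```

As noted above, the quantised model can be used for prediction, but storing it won't work.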
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
ping to revive issue as it isn't dead
@helpmefindaname Were you able to finish this? I did an export of the German Flair model to a single ONNX file ~2 years ago, but need the English version too. Did you make any further progress on this matter?
I created a script for ONNX export for the de-pos model; it runs just fine on the onnxruntime on .NET. I will test if I can also get an export for Core ML to work. In case anyone needs it, you can find it here: https://github.com/edoust/flair/commits/master
I created single-file ONNX models. Basically, it first computes the forward and backward embeddings, then selects the right embedding tensors from the total embeddings. The example values above are given for the two short sentences. @alanakbik @helpmefindaname
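For anyone wanting to consume such a single-file export from Python, a rough sketch with onnxruntime; the file name and the input name/shape here are assumptions, so inspect the actual exported graph (e.g. with Netron) for the real ones:

```python
# Sketch only: running a single-file ONNX export with onnxruntime.
# The file name and input name/shape below are assumptions, not taken
# from the actual export script linked above.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("de-pos.onnx")  # hypothetical file name
print([(i.name, i.shape) for i in session.get_inputs()])  # inspect inputs

outputs = session.run(
    None,  # None = fetch all declared outputs
    {"char_ids": np.zeros((2, 32), dtype=np.int64)},  # assumed input
)
print(outputs[0].shape)  # e.g. tag scores per token
```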
Hi @edoust, I think it will take a long time to finish this. So far, the models can be exported without embeddings, and the transformer embeddings themselves can be exported. The way I want to integrate the ONNX export should be that you can use. For the use case where you want to export to another language (and therefore have to recreate the code to handle inputs and outputs anyway), I would say that your script looks quite solid.
Thanks for the reply, you are right. Regarding the ONNX export, I think it would be great to have the possibility to create single-file ONNX exports from various Flair models (combining embeddings and tagging model); otherwise it is a very high effort to include such a Flair model in any app. Having such an option would make the integration into (native) non-Python apps/services much easier.
Hi all, I'm interested in this as well, to speed up Flair inference. Do you have any measurements of the performance of some models? I'd be interested in GPU vs. CPU vanilla vs. CPU ONNX/TorchScript.
Hi @jonashaag
I can't find any numbers there. Can you please point me to them?
Sorry, wrong PR; I meant this one: #2739
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Is your feature/enhancement request related to a problem? Please describe.
ONNX support is a frequently requested feature; several issues mention it (#2625, #2451, #2317, #1936, #1423, #999),
so I think there is a big desire in the community to support it.
I suppose the usual ONNX compatibility would also make the models compatible with torch.jit (#2528) or AWS Neuron (#2443).
ONNX provides large enhancements in terms of production readiness: it creates a static computational graph which can be quantized and optimized for specific hardware, see https://onnxruntime.ai/docs/performance/tune-performance.html (which claims up to 17x speed-ups).
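To make that quantization step concrete, a small sketch using onnxruntime's dynamic-quantization helper on an already-exported graph (the file paths are placeholders):

```python
# Sketch: post-export dynamic quantisation of an ONNX graph via
# onnxruntime's quantization tooling. File paths are placeholders.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    "tagger.onnx",        # placeholder: an already-exported Flair model
    "tagger.quant.onnx",  # output: int8-weight version of the graph
    weight_type=QuantType.QInt8,
)
```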
Describe the solution you'd like
I'd suggest an iterative progression, as multiple architecture changes are required:

- Refactor the `forward`/`forward_pass` methods such that all models have a method `_prepare_tensors`, which converts all DataPoints to tensors, and a `forward`, which takes in tensors and outputs tensors (e.g. for the SequenceTagger, the forward has the signature `def forward(self, sentence_tensor: torch.Tensor, lengths: torch.LongTensor)` and returns a single tensor `scores`; see the sketch after this list). This change allows conversion to ONNX models, however the logic (like decoding CRF scores, filling up sentence results, extracting tensors) won't be implemented. Also, embeddings won't be part of the ONNX model.
- Apply the same `forward`/`_prepare_tensors` architecture to embeddings, such that those could be converted too. This would allow converting embeddings to ONNX models, but again without logic.
Notice that this would be 4 different PRs, and probably all of them would be very large and should be tested a lot before moving to the next PR.
I would offer to do the first one and then see how much effort this is / how much time I have for this.