ONNX compatible models #2640

Closed
helpmefindaname opened this issue Feb 20, 2022 · 16 comments
Labels
wontfix This will not be worked on

Comments

@helpmefindaname
Collaborator

helpmefindaname commented Feb 20, 2022


Is your feature/enhancement request related to a problem? Please describe.
ONNX support is a frequently requested feature; several issues mention it (#2625, #2451, #2317, #1936, #1423, #999),
so I think there is strong demand for it in the community.
I suppose that ONNX compatibility would usually also make the models compatible with torch.jit (#2528) or AWS Neuron (#2443).

ONNX brings large improvements in production readiness: it creates a static computational graph that can be quantized and optimized for specific hardware, see https://onnxruntime.ai/docs/performance/tune-performance.html (it claims to be 17x faster).

Describe the solution you'd like
I'd suggest iterative progression as multiple architecture changes are required:

  1. Split the forward/forward_pass methods so that all models have a method _prepare_tensors, which converts all DataPoints to tensors, and a forward, which takes in tensors and outputs tensors (e.g. for the SequenceTagger, forward has the signature def forward(self, sentence_tensor: torch.Tensor, lengths: torch.LongTensor) and returns a single tensor scores).
    This change allows conversion to ONNX models; however, the logic (like decoding CRF scores, filling up sentence results, extracting tensors) won't be part of the converted model. Embeddings won't be part of the ONNX model either.
  2. create the same forward/_prepare_tensors architecture for embeddings, such that those could be converted too.
    This would allow converting embeddings to ONNX models, but again without logic.
  3. Change the architecture so that, for both embeddings and models, the logic part (creating inputs, adding outputs to data points) and the PyTorch part are split, such that the PyTorch model part can be replaced by a converted ONNX model.
  4. Create an end-to-end model wrapper, so that both the embeddings and the model can be converted to a single ONNX model and used as such.
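
The split proposed in step 1 can be sketched roughly as follows. This is a minimal illustration and not the code from any actual PR: `TinyTagger`, its layer sizes, and `export_to_onnx` are hypothetical names, and sentences are assumed to arrive as already embedded tensors of shape (tokens, embedding_dim). Only `forward` works purely on tensors, so only that part would end up in an exported graph.

```python
from typing import List, Tuple

import torch
from torch import nn


class TinyTagger(nn.Module):
    """Toy stand-in for a tagger (hypothetical, not Flair's SequenceTagger)."""

    def __init__(self, embedding_dim: int = 8, hidden_size: int = 16, num_tags: int = 5) -> None:
        super().__init__()
        self.hidden = nn.Linear(embedding_dim, hidden_size)
        self.output = nn.Linear(hidden_size, num_tags)

    def _prepare_tensors(self, sentences: List[torch.Tensor]) -> Tuple[torch.Tensor, torch.Tensor]:
        # "Logic" side: turn variable-length data points into padded tensors.
        lengths = torch.tensor([s.shape[0] for s in sentences], dtype=torch.long)
        sentence_tensor = nn.utils.rnn.pad_sequence(sentences, batch_first=True)
        return sentence_tensor, lengths

    def forward(self, sentence_tensor: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # "Tensor" side: tensors in, a single scores tensor out.
        scores = self.output(torch.relu(self.hidden(sentence_tensor)))
        positions = torch.arange(sentence_tensor.shape[1], device=lengths.device)
        mask = positions.unsqueeze(0) < lengths.unsqueeze(1)  # (batch, tokens)
        # Zero the scores at padded positions.
        return scores * mask.unsqueeze(-1).to(scores.dtype)


def export_to_onnx(model: nn.Module, inputs: Tuple[torch.Tensor, torch.Tensor], path: str) -> None:
    # Hypothetical helper; not called here because it needs the onnx package.
    torch.onnx.export(
        model,
        inputs,
        path,
        input_names=["sentence_tensor", "lengths"],
        output_names=["scores"],
        dynamic_axes={
            "sentence_tensor": {0: "batch", 1: "tokens"},
            "lengths": {0: "batch"},
            "scores": {0: "batch", 1: "tokens"},
        },
    )


model = TinyTagger().eval()
# Two "sentences" with 3 and 1 tokens, already embedded with random values.
sentence_tensor, lengths = model._prepare_tensors([torch.randn(3, 8), torch.randn(1, 8)])
with torch.no_grad():
    scores = model(sentence_tensor, lengths)
```

Padding and length bookkeeping stay in `_prepare_tensors`, so anyone consuming the exported file outside of Python would still have to reimplement that part, which is the limitation step 1 already mentions.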

Note that this would be 4 different PRs; probably all of them would be very large and should be tested a lot before moving on to the next PR.
I would offer to do the first one and then see how much effort this is and how much time I have for it.

@kieron-guinamard-privitar

This would be very useful. Do you have any idea how large a piece of work this is (my gut feel is very)?

I can see if we can help with some of the work - I'll be honest, this wouldn't be my own speciality.

@helpmefindaname
Collaborator Author

The first part is almost finished: #2643 is ready for review.

That one was surprisingly straightforward: first think of how to refactor one model, then apply the same to all other models too (as it is mainly the same).
Only the lemmatization model (encoder-decoder architecture) has increased complexity.

The hardest part is deciding what kind of refactoring to apply; there it might already be helpful to just discuss/brainstorm how to do it.

I have some thoughts on the open tasks:

  1. The TransformersEmbeddings will likely be a bigger piece, maybe the flair (pool) embeddings too. One would convert the lengths and indices to LongTensors to ensure everything is convertible.
    Also, I think it would make sense to change the architecture so that Sentence stores the full embedding vector for the whole sequence instead of tokens storing the individual embeddings. That way, the forward method of the embeddings could return the already padded sequences, and embeddings.embed could return the raw tensors.
    We could make _prepare_tensors return a dictionary {embedding_name: tensor}, so stacked embeddings have an easy way to handle them separately per embedding.
  2. I struggle a lot with this one. The new architecture should be designed so that you don't need to load the PyTorch weights if you use the ONNX model, and vice versa. This could be done by splitting the class into 2 classes (logic vs. model); however, it should also stay easy to implement new models, and splitting them up might make it too complicated.
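
One way to picture the "logic vs. model" split from the second point is a logic class that only holds a callable backend. The sketch below is a framework-free illustration under assumptions: `TaggerLogic`, `toy_backend`, the toy features, and the label set are all hypothetical, and a real backend would wrap either a PyTorch module or an ONNX runtime session behind the same call signature.

```python
from typing import Callable, List, Tuple

Features = List[List[float]]  # one feature row per token
Scores = List[List[float]]    # one score row per token
Backend = Callable[[Features], Scores]


class TaggerLogic:
    """Input creation and output decoding, independent of the numeric backend."""

    def __init__(self, backend: Backend, labels: List[str]) -> None:
        self.backend = backend
        self.labels = labels

    def prepare(self, tokens: List[str]) -> Features:
        # Toy features: token length and whether the token is capitalised.
        return [[float(len(t)), float(t[:1].isupper())] for t in tokens]

    def predict(self, tokens: List[str]) -> List[Tuple[str, str]]:
        scores = self.backend(self.prepare(tokens))
        result = []
        for token, row in zip(tokens, scores):
            best = max(range(len(row)), key=row.__getitem__)  # argmax over labels
            result.append((token, self.labels[best]))
        return result


def toy_backend(features: Features) -> Scores:
    # Stand-in for the PyTorch or ONNX part: scores for the labels ["O", "NAME"].
    return [[1.0 - capitalised, capitalised] for _, capitalised in features]


logic = TaggerLogic(toy_backend, labels=["O", "NAME"])
predictions = logic.predict(["hello", "Berlin"])
print(predictions)
```

With this shape, swapping the backend does not touch the logic class, but every new model needs two pieces instead of one, which is the complexity concern raised above.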

@aytugkaya

Is this code refactoring only for making Flair models compatible with ONNX?
Or is it possible to quantize the Flair models without using ONNX, before the code is refactored?

@helpmefindaname
Collaborator Author

As long as you are not using the flair embeddings with flair version < 0.11, you can apply dynamic quantisation on all flair models that run on cpu. However, you cannot store them due to the way embeddings are stored.
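
As a rough sketch of what dynamic quantisation looks like in PyTorch, the snippet below quantises the `nn.Linear` layers of a toy model. The toy model is an assumption standing in for a loaded Flair model, and which layer types are worth quantising for a given Flair model is not specified in this thread.

```python
import torch
from torch import nn

# Toy CPU model standing in for a loaded Flair model (assumption).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()

# Replace the listed layer types with dynamically quantised versions:
# weights are stored as int8, activations are quantised on the fly.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    out = quantized(torch.randn(2, 16))

print(type(quantized[0]), out.shape)
```

The quantised model is used in place of the original for inference on CPU; as noted above, saving it afterwards is the part that does not work for Flair models.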

@stale

stale bot commented Jul 30, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jul 30, 2022
@helpmefindaname
Collaborator Author

Ping to revive this issue, as it isn't dead.

@alanakbik alanakbik removed the wontfix This will not be worked on label Aug 10, 2022
alanakbik added a commit that referenced this issue Aug 18, 2022
@edoust

edoust commented Sep 11, 2022

@helpmefindaname Were you able to finish this? I did an export of the german flair model to a single ONNX ~2 years ago but need the english version too, did you make any further progress in this matter?

@edoust

edoust commented Sep 11, 2022

I created a script for ONNX export for the de-pos model, it is running just fine on the onnxruntime on .NET. Will test if I can also get an export for Core ML to work, in case anyone needs it you can find it here: https://github.com/edoust/flair/commits/master

@edoust

edoust commented Sep 13, 2022

I created single file ONNX models from de-pos, flair/upos-multi, flair/upos-multi-fast that work with variable batch and sentence sizes.

Basically, it first computes the forward and backward embeddings and selects the right embedding tensors from the total embeddings using forwardIndices and backwardIndices. Then it concatenates the selected tensors and "stripes" them into the final sentence_tensor using the striping indices.

| Input | Shape | Example Shape | Example | Description |
|---|---|---|---|---|
| forward | characters x sentences | (9,2) | [mapped with char table] | The mapped character input sequence for the forward language model |
| forwardIndices | total_tokens | (4) | [6,14,5,17] | The indices of the embeddings to take from the full embedding tensor |
| backward | | (9,2) | [mapped with char table] | |
| backwardIndices | | (4) | [14,6,5,17] | |
| striping | total_embeddings | (8) | [0,4,1,5,2,6,3,7] | Used to generate the sentence tensor from the concatenated forward and backward embedding tensors, e.g. |
| characterLengths | sentences | (2) | [9,4] | Required for keeping dynamic shapes right |
| lengths | sentences | (2) | [2,1] | Required for keeping dynamic shapes right |

The above example values are given for the two short sentences Pla Gon and Xu.
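
To make the roles of the index inputs concrete, here is a small pure-Python sketch with nested lists standing in for tensors. The embedding values are made up, the helper names `select` and `striping_indices` are hypothetical, and it assumes that two consecutive striped rows are joined into one token embedding; only the index values are taken from the example above.

```python
from typing import List

Row = List[float]


def select(rows: List[Row], indices: List[int]) -> List[Row]:
    # Pick rows by index, like indexing a tensor along its first axis.
    return [rows[i] for i in indices]


def striping_indices(total_tokens: int) -> List[int]:
    # Interleave the forward half (0..n-1) with the backward half (n..2n-1).
    return [i for t in range(total_tokens) for i in (t, t + total_tokens)]


# Made-up "full embedding tensors": 18 rows (9 characters x 2 sentences), dim 2.
forward_all = [[float(i), float(i) + 0.5] for i in range(18)]
backward_all = [[-float(i), -float(i) - 0.5] for i in range(18)]

forward_indices = [6, 14, 5, 17]
backward_indices = [14, 6, 5, 17]

# 4 forward rows followed by 4 backward rows.
selected = select(forward_all, forward_indices) + select(backward_all, backward_indices)

striping = striping_indices(len(forward_indices))
striped = select(selected, striping)

# Join each forward/backward pair into one token embedding of dim 4.
token_embeddings = [striped[2 * t] + striped[2 * t + 1] for t in range(len(forward_indices))]
print(striping)
print(token_embeddings[0])
```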

@alanakbik @helpmefindaname
Does this make sense to you, or is there an easier/better way to achieve a single ONNX model export that includes the embeddings? Did I miss anything? Any feedback would be appreciated

This is the visual model representation:
de-pos-onnx

@helpmefindaname
Collaborator Author

Hi @edoust ,
sorry for the late answer

I think it will take a long time to finish this. So far, the models can be exported without embeddings, and the transformer embeddings themselves can be exported. The way I want to integrate the ONNX export is that you can use torch.onnx.export and then use the exported model within the flair library. This requires quite a few architectural changes, and I am currently not sure how best to handle them.

For the use case that you want to export it to another language (and therefore anyways have to recreate the code to handle inputs and outputs), I would say that your script looks quite solid.
The only thing there is that I wonder if the striping is really necessary? Shouldn't it be possible to concatenate the embeddings on the embedding dimension at line master...edoust:flair:master#diff-2cdd6b2846dd6d89526228ebe147fc75f9b0aa7c999593a4ee32db2ae142adfdR74 ?
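
The equivalence behind this question can be checked on toy tensors. The sketch below is an assumption about how the striped rows are folded into token embeddings (pairwise, via a reshape), not a copy of the linked script; the tensor values and variable names are made up.

```python
import torch

num_tokens, dim = 4, 3

# Made-up selected embeddings, one row per token for each direction.
forward_selected = torch.arange(num_tokens * dim, dtype=torch.float32).view(num_tokens, dim)
backward_selected = -forward_selected - 1.0

# Route A: concatenate along the token axis, stripe, then fold row pairs.
stacked = torch.cat([forward_selected, backward_selected], dim=0)  # (8, dim)
striping = torch.tensor([0, 4, 1, 5, 2, 6, 3, 7])
striped = stacked[striping].reshape(num_tokens, 2 * dim)

# Route B: concatenate directly along the embedding dimension.
direct = torch.cat([forward_selected, backward_selected], dim=-1)  # (4, 2 * dim)

print(torch.equal(striped, direct))
```

Under that assumption both routes give the same sentence tensor, so the extra striping input would not be needed.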

@edoust

edoust commented Sep 16, 2022

Hi @helpmefindaname

Thanks for the reply. You are right, striping is not necessary, thanks for that :)

Regarding the ONNX export, I think it would be great to have the possibility to create single file ONNX model exports from various Flair models (combining embeddings and tagging model), otherwise it is a very high effort to include such a Flair model in any app. Having such an option would make the integration into (native) non-python apps/services much easier

@jonashaag

Hi all, I'm interested in this as well, to speed up Flair inference. Do you have any measurements of performance of some models? I'd be interested in GPU vs. CPU vanilla vs. CPU ONNX/TorchScript.

@helpmefindaname
Collaborator Author

Hi @jonashaag
I did some evaluation for the TransformerEmbeddings in this PR.
Note that the timings depend heavily on your hardware. A cheap CPU will be way slower than a strong CPU, and the same goes for GPUs. In the end, you have to evaluate it yourself on your own hardware.

@jonashaag

I can’t find any numbers there. Can you please point me to them?

@helpmefindaname
Collaborator Author

sorry, wrong PR, I meant this one: #2739

@stale

stale bot commented Mar 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Mar 18, 2023
@stale stale bot closed this as completed Apr 2, 2023