Runtime Optimization #86

aditya-y47 · 2023-09-25T04:10:15Z

Hey, first up, thank you for building and open-sourcing such a great piece of work. I have been using INSTRUCTOR for some time now and I absolutely love it.

I'm planning to generate embeddings for a large corpus of texts (millions of documents), and I intend to run the embedding-generation job as an async, message-queue (MQ) based execution. Based on my initial estimates, the runtime is on the higher side, so I was hoping some methods could be used to speed up embedding generation. Some of them include:

- Inference on TensorRT
- Compiling the underlying PyTorch model
  - I see that you folks use a sentence-transformers-like implementation, so I am unsure how torch.compile would work with it
- Kernel fusion / custom kernels, etc.

Are there any generally recommended guidelines that would help me achieve these? Is anyone here working on such optimizations?
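To make the setup in the question concrete, here is a minimal sketch of a queue-based embedding job, assuming the corpus is split into fixed-size batches that a few async workers consume. It uses asyncio.Queue as a stand-in for a real message broker, and fake_embed, BATCH_SIZE and NUM_WORKERS are hypothetical placeholders. In practice fake_embed would wrap the real model call (for INSTRUCTOR, model.encode on a list of [instruction, text] pairs).

```python
import asyncio
from typing import Dict, List, Optional, Tuple

BATCH_SIZE = 4   # hypothetical; tune to your GPU memory
NUM_WORKERS = 2  # hypothetical number of concurrent consumers

Message = Optional[Tuple[int, List[str]]]


def fake_embed(batch: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for the real model call, e.g. INSTRUCTOR's
    # model.encode([[instruction, text], ...]). Returns a toy 2-d vector per text.
    return [[float(len(text)), float(text.count(" "))] for text in batch]


async def producer(texts: List[str], queue: "asyncio.Queue[Message]") -> None:
    # Split the corpus into fixed-size batches and publish each one as a message,
    # tagged with its start offset so order can be restored later.
    for start in range(0, len(texts), BATCH_SIZE):
        await queue.put((start, texts[start:start + BATCH_SIZE]))
    # One sentinel per worker signals that no more work is coming.
    for _ in range(NUM_WORKERS):
        await queue.put(None)


async def worker(queue: "asyncio.Queue[Message]", results: Dict[int, List[List[float]]]) -> None:
    while True:
        message = await queue.get()
        if message is None:
            break
        start, batch = message
        # Run the blocking model call in a thread so the event loop stays responsive.
        results[start] = await asyncio.to_thread(fake_embed, batch)


async def embed_corpus(texts: List[str]) -> List[List[float]]:
    queue: "asyncio.Queue[Message]" = asyncio.Queue(maxsize=NUM_WORKERS * 2)
    results: Dict[int, List[List[float]]] = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(NUM_WORKERS)]
    await producer(texts, queue)
    await asyncio.gather(*workers)
    # Reassemble the embeddings in the original corpus order.
    return [vec for start in sorted(results) for vec in results[start]]


if __name__ == "__main__":
    corpus = [f"document number {i}" for i in range(10)]
    embeddings = asyncio.run(embed_corpus(corpus))
    print(len(embeddings))
```

Running the blocking model call via asyncio.to_thread lets a worker hand off one batch while the loop keeps pulling messages. On a single GPU, extra workers likely help more with overlapping I/O than with running inference in parallel, so the batch size is probably the more important knob.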

hongjin-su · 2023-12-19T09:46:40Z

Yes, INSTRUCTOR's model architecture is very similar to that of sentence-transformers. Therefore, optimizations that apply to sentence-transformers models may also apply to INSTRUCTOR models.

There has been some recent work on model quantization that you may use as a reference:
https://www.sbert.net/examples/training/distillation/README.html#quantization
https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/distillation/model_quantization.py
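As a rough illustration of what int8 quantization does, here is a minimal stdlib-only sketch of symmetric per-tensor quantization: each tensor is stored as 8-bit integers plus one floating-point scale, and a dot product is computed with integer multiply-accumulate and rescaled once at the end. This is written to show the arithmetic and is not the code from the linked script; the function names are hypothetical. In PyTorch, dynamic int8 quantization of a model's Linear layers (for CPU inference) is exposed as torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8).

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    # Symmetric per-tensor quantization: map [-max|v|, max|v|] onto [-127, 127].
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp defensively in case floating-point division lands just past 127.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    # Recover approximate floats; per-element error is at most scale / 2.
    return [v * scale for v in q]


def int8_dot(q_a: List[int], scale_a: float, q_b: List[int], scale_b: float) -> float:
    # Integer multiply-accumulate, rescaled once at the end. Real int8 kernels
    # get their speedup from doing this accumulation in hardware.
    acc = sum(x * y for x, y in zip(q_a, q_b))
    return acc * scale_a * scale_b


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]
    q, scale = quantize_int8(weights)
    print(q, scale)
    print(int8_dot(q, scale, q, scale))
```

In dynamic quantization the weights are quantized ahead of time, while the activation scale is computed per input at runtime, which is roughly what quantizing both operands of int8_dot mimics here.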

Hope this helps!
