Commit
* Optimizing `bert_cos_score_idf`:
  1) Pad BERT embeddings on GPU instead of CPU. Padding on CPU is the bottleneck in computing the greedy matching, so padding on GPU speeds up the matching by ~3x for me. Moving tensors to GPU then becomes the bottleneck, but moving the pre-padding tensors takes ~2x less time, presumably because the padding values no longer have to be copied. Overall I get a ~6x speedup on the sequences I'm evaluating.
  2) Use `torch.no_grad()` when computing the greedy matching to save memory. This let me double the batch size for greedy matching. I'm not sure whether the larger batch size will cause OOMs for others, so it may be worth someone else checking or trying it out (or just removing the batch-size increase).

* Removing the batch size increase: I occasionally hit OOMs with the larger batch size for greedy matching only, so I removed it.
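The two changes above can be sketched roughly as follows. This is a minimal illustration, not the actual `bert_cos_score_idf` code: `pad_and_batch` is a hypothetical helper, the embeddings are random toy tensors, and the similarity computation stands in for the real greedy matching.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_and_batch(embeddings, device):
    # Move the small, unpadded per-sentence tensors to the device first,
    # then pad the whole batch there. Moving pre-padding tensors and
    # padding on GPU is what gives the speedup described above.
    embeddings = [e.to(device) for e in embeddings]
    return pad_sequence(embeddings, batch_first=True)  # (B, max_len, dim)

device = "cuda" if torch.cuda.is_available() else "cpu"
seqs = [torch.randn(n, 8) for n in (3, 5, 2)]  # toy "BERT embeddings"

with torch.no_grad():  # matching needs no gradients; saves activation memory
    batch = pad_and_batch(seqs, device)
    # Stand-in for the greedy matching: pairwise similarity scores.
    sim = torch.bmm(batch, batch.transpose(1, 2))
```

With `batch_first=True`, `pad_sequence` produces a `(3, 5, 8)` batch here, padded to the longest sequence; under `torch.no_grad()`, no autograd graph is built for `sim`.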