Enable Sentence Transformer Inference with Intel Gaudi2 GPU Supported ('hpu') - Follow up for #2557 #2630
Conversation
@tomaarsen Hi Tom, could you take some time to review and merge this PR? Thanks so much!

@ZhengHongming888 Yes, I will review it today. Perhaps I can also merge it today, but I don't think I can bring out a release with it today. Is that okay?

@tomaarsen That's already very good, haha. :-) Thanks so much for your great support!

@tomaarsen Everything you updated looks good to me! Thanks for your review!

@tomaarsen If everything is OK, could you merge it today? :-) Thanks!
@ZhengHongming888 I won't merge it today, I'm afraid. I've realised that I think the best solution is sadly to create three …
@tomaarsen But if I remove the request for a padding argument in the init part and instead do the hpu-specific code separately, could my PR be merged tomorrow? Thanks.
@ZhengHongming888 I think so, yes. Feel free to make those changes.
@tomaarsen I removed the request to add padding as an init argument, and all checks pass. :-) It should be OK now. Please take a look, and if anything else is needed, feel free to let me know. Thanks!
```python
device = get_device_name()
logger.info("Use pytorch device_name: {}".format(device))

if device == "hpu":
```
Can we be sure that `optimum-habana` is installed here?
Good suggestion. I will add the check. Thanks.
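For illustration, one possible shape for that check, as a minimal sketch assuming an importlib-based probe (the exact code the PR adds is not shown in this excerpt):

```python
import importlib.util

if device == "hpu":
    # Fail fast with a clear message if optimum-habana is missing; the
    # short-circuit avoids probing the submodule when optimum itself is absent.
    if importlib.util.find_spec("optimum") is None or importlib.util.find_spec("optimum.habana") is None:
        raise ImportError(
            'Using device="hpu" requires optimum-habana; install it with "pip install optimum[habana]".'
        )
```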
```python
if self.device.type == "hpu":
    hpu_graph_out = self.forward(features)
    out_features = copy.deepcopy(hpu_graph_out)
else:
    out_features = self.forward(features)
```
Suggested change:

```diff
- if self.device.type == "hpu":
-     hpu_graph_out = self.forward(features)
-     out_features = copy.deepcopy(hpu_graph_out)
- else:
-     out_features = self.forward(features)
+ out_features = self.forward(features)
+ if self.device.type == "hpu":
+     out_features = copy.deepcopy(out_features)
```
I would prefer this; could you verify that it works? Admittedly, I'm not sure why you have to copy these.
In order to support PR #2573 (test_encode_truncate() under tests/test_sentence_transformer.py), we need this code. The reason is that in the original implementation, `out_features = self.forward(features)` returns the very same object as `features`: the two share a single block of memory, and `out_features` merely gains the additional key ("sentence_embedding"). When dimension truncation is applied afterwards, hpu graph mode raises an error, because graph mode keeps a reference to the original output.
I think what you changed here is good for simplicity.
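To make the shared-memory behaviour concrete, here is a toy sketch (not the library's actual code) of the in-place pattern described above:

```python
import torch

def forward(features):
    # Mimic the behaviour described above: the returned dict IS the input
    # dict, mutated in place with one extra key.
    features["sentence_embedding"] = torch.zeros(1, 4)
    return features

features = {"input_ids": torch.tensor([[1, 2, 3]])}
out_features = forward(features)
assert out_features is features  # one shared object, no copy made
# hpu graph mode keeps a reference to this original output, so mutating it
# later (e.g. during dimension truncation) corrupts what the graph expects;
# the deepcopy on the hpu path breaks that aliasing.
```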
@tomaarsen Thanks so much for your suggestions! They all look good to me, and I have added the `optimum-habana` check as well as the deepcopy after the forward output for hpu. All checks passed. Thanks for spending some of your holiday time on this. :-) Sorry about that.
I have made one final modification to the code in cc8b7e9, because `kwargs` is now always empty. I think this is ready now. Thanks for implementing my suggestions.
@tomaarsen Yes, you are right about `kwargs`. Thanks for your great support on the PRs, and thanks for your time! :-)
This PR is one of the tasks enabling Intel Gaudi2 GPU support for Sentence Transformers inference/training.
It is the follow-up PR to #2557 and makes the following changes for the hpu device:
We introduce a new padding algorithm that aligns the maximum batch length with the requirements of hpu graph mode. It adds a small amount of extra padding on top of the tokenizer's original output, computed as:

```python
additional_pad_len = 2 ** math.ceil(math.log2(curr_tokenize_len[1])) - curr_tokenize_len[1]
```

The maximum batch length is thereby aligned to a small set of values [2, 4, 8, 16, 32, 64, 128, 256, ...], which greatly reduces the tokenizer/forward/cpu-conversion time under hpu graph mode.
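As a sketch, the padding step could look like the following (`pad_to_power_of_two` and its arguments are illustrative names, not the PR's exact code):

```python
import math

import torch
import torch.nn.functional as F

def pad_to_power_of_two(input_ids, attention_mask, pad_token_id=0):
    # Pad the sequence dimension up to the next power of two, so that hpu
    # graph mode only ever compiles a small, fixed set of input shapes.
    curr_len = input_ids.shape[1]
    additional_pad_len = 2 ** math.ceil(math.log2(curr_len)) - curr_len
    if additional_pad_len == 0:
        return input_ids, attention_mask
    input_ids = F.pad(input_ids, (0, additional_pad_len), value=pad_token_id)
    attention_mask = F.pad(attention_mask, (0, additional_pad_len), value=0)
    return input_ids, attention_mask

# Example: a batch whose longest sequence is 13 tokens gets padded to 16.
ids, mask = pad_to_power_of_two(torch.ones(4, 13, dtype=torch.long),
                                torch.ones(4, 13, dtype=torch.long))
print(ids.shape)  # torch.Size([4, 16])
```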
We tested performance on all 37 models listed on the pretrained models page and compared a CUDA A100 against the hpu device. Using evaluation code over 100,000 sentences with batch sizes of [32, 128, 512], the new padding delivers strong performance on the hpu device.
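For reference, a minimal sketch of such an evaluation loop (the model name and sentences here are placeholders; the PR's actual benchmark script is not shown):

```python
import time

from sentence_transformers import SentenceTransformer

# Placeholder model; the PR benchmarked all 37 pretrained models.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="hpu")
sentences = ["An example sentence to embed."] * 100_000

for batch_size in [32, 128, 512]:
    start = time.perf_counter()
    model.encode(sentences, batch_size=batch_size)
    print(f"batch_size={batch_size}: {time.perf_counter() - start:.1f}s")
```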
The PR also revises two earlier changes so that they work on hpu:

- [feat] Add truncation support #2573: testing that PR on the hpu device failed, so we propose a revised version that supports it on hpu. Its test case (test_encode_truncate under tests/test_sentence_transformer.py) now passes successfully.
- [clip] Prevent warning with `padding` when tokenizing for CLIP #2599: testing that PR on the hpu device failed, so we propose a revised version that supports it on hpu. Its test case (test_simple_encode under tests/test_image_embeddings.py) now passes successfully.
Any questions/comments are welcome!
Thanks.