
Enable Sentence Transformer Inference with Intel Gaudi2 GPU Supported ( 'hpu' ) - Follow up for #2557 #2630

Merged

Conversation

@ZhengHongming888 (Contributor) commented May 6, 2024

This PR is part of the work to enable Intel Gaudi2 GPU ('hpu') support for Sentence Transformers inference/training.

This is the follow-up PR to #2557. It introduces the following considerations/modifications for the hpu device:

  1. A new padding algorithm is added for the hpu device after the tokenizer (still padding=True) to give better performance.

Here we introduce a new algorithm that adds extra padding so the batch meets the maximum-batch-length requirement of hpu graph mode. The algorithm adds a small amount of padding on top of the original features output by the tokenizer, based on the following calculation:

additional_pad_len = 2 ** math.ceil(math.log2(curr_tokenize_len[1])) - curr_tokenize_len[1]

The maximum batch length is thus aligned to a small set of values such as [2, 4, 8, 16, 32, 64, 128, 256, ...], which greatly reduces the tokenizer/forward/cpu_conversion time under hpu graph mode. A sketch of this padding step is shown after the performance summary below.

We tested the performance of all 37 models listed on the pretrained models page and compared the cuda A100 device against the hpu device. Based on the evaluation code over 100,000 sentences with batch sizes of [32, 128, 512], the new padding for the hpu device shows great performance.
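As a rough illustration (not the exact code from this PR), the padding step could look like the sketch below; the feature dict layout and the pad_to_power_of_two / pad_token_id names are assumptions made for the example:

import math

import torch


def pad_to_power_of_two(features: dict, pad_token_id: int = 0) -> dict:
    """Pad tokenized features so the sequence length becomes the next power of two.

    `features` is assumed to hold "input_ids" and "attention_mask" tensors of shape
    (batch_size, seq_len), as produced by a Hugging Face tokenizer with padding=True.
    Keeping seq_len on a small set of power-of-two values limits the number of distinct
    shapes, which helps hpu graph mode avoid recompilation.
    """
    batch_size, seq_len = features["input_ids"].shape
    additional_pad_len = 2 ** math.ceil(math.log2(seq_len)) - seq_len
    if additional_pad_len == 0:
        return features

    pad_ids = torch.full((batch_size, additional_pad_len), pad_token_id, dtype=features["input_ids"].dtype)
    pad_mask = torch.zeros((batch_size, additional_pad_len), dtype=features["attention_mask"].dtype)

    # Other keys (e.g. token_type_ids) would need the same treatment in a full implementation.
    features["input_ids"] = torch.cat([features["input_ids"], pad_ids], dim=1)
    features["attention_mask"] = torch.cat([features["attention_mask"], pad_mask], dim=1)
    return features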

  2. hpu support for PR [feat] Add truncation support #2573

Testing that PR on the hpu device initially failed, so we proposed a revised version to support it on the hpu device. The test case for that PR (test_encode_truncate under tests/test_sentence_transformer.py) now also passes successfully.

  3. hpu support for PR [clip] Prevent warning with padding when tokenizing for CLIP #2599

Testing that PR on the hpu device also initially failed, so we proposed a revised version to support it on the hpu device. The test case for that PR (test_simple_encode under tests/test_image_embeddings.py) now also passes successfully.

Any questions/comments are welcome!

Thanks.

@ZhengHongming888 (Contributor, Author)

@tomaarsen Hi Tom, could you take the time to review and merge this PR? Thanks so much!

@tomaarsen (Collaborator)

@ZhengHongming888 Yes, I will review it today. Perhaps I can also merge it today, but I don't think I can bring out a release with it today. Is that okay?

@ZhengHongming888 (Contributor, Author)

@tomaarsen That's already very good! haha. :-) Thanks so much for your great support!

@ZhengHongming888 (Contributor, Author)

@tomaarsen Everything you updated looks good to me! Thanks for your review!

@ZhengHongming888 (Contributor, Author)

@tomaarsen If everything is OK, could you merge it today? :-) Thanks!

@tomaarsen (Collaborator)

@ZhengHongming888 I won't merge it today, I'm afraid. I've realised that the best solution is sadly to create three __init__ arguments: model_kwargs, tokenizer_kwargs, and config_kwargs. I won't merge this PR as-is because it adds the padding parameter, which I'd like to avoid. Most likely I'll add the kwargs parameters in #2578 and discuss it with @satyamk7054, as he might have some good insight as well.
There are also a lot of public holidays this time of year, so I'm slightly less available too.

  • Tom Aarsen

@ZhengHongming888 (Contributor, Author)

@tomaarsen But if I remove the request for the padding argument in the init part and keep the hpu-specific code separate, could my PR be merged tomorrow? Thanks.

@tomaarsen (Collaborator)

@ZhengHongming888 I think so, yes. Feel free to make those changes.

  • Tom Aarsen

@ZhengHongming888 (Contributor, Author)

@tomaarsen I removed the request to add padding as an argument in init, and all checks have passed. :-) It should be OK now, so please check, and if anything else is needed, feel free to let me know. Thanks!

device = get_device_name()
logger.info("Use pytorch device_name: {}".format(device))

if device == "hpu":
@tomaarsen (Collaborator)

Can we be sure that optimum-habana is installed here?

@ZhengHongming888 (Contributor, Author)

Good suggestion. I will add the check. Thanks.
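(For context, a minimal sketch of such a check, assuming it only needs to verify that the optimum-habana package is importable and that `device` comes from get_device_name() as in the excerpt above; the exact message and placement in the merged code may differ.)

import importlib.util

if device == "hpu":
    # Assumed check: ensure optimum-habana is available before taking hpu-specific code paths.
    if importlib.util.find_spec("optimum") is None or importlib.util.find_spec("optimum.habana") is None:
        raise ImportError(
            "Using the 'hpu' device requires the 'optimum-habana' package. "
            "Install it with: pip install optimum-habana"
        )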

Comment on lines 415 to 419
if self.device.type == "hpu":
    hpu_graph_out = self.forward(features)
    out_features = copy.deepcopy(hpu_graph_out)
else:
    out_features = self.forward(features)
@tomaarsen (Collaborator)

Suggested change:

-if self.device.type == "hpu":
-    hpu_graph_out = self.forward(features)
-    out_features = copy.deepcopy(hpu_graph_out)
-else:
-    out_features = self.forward(features)
+out_features = self.forward(features)
+if self.device.type == "hpu":
+    out_features = copy.deepcopy(out_features)

I would prefer this, could you verify if that works? Admittedly, I'm not sure why you have to copy these.

@ZhengHongming888 (Contributor, Author)

In order to support PR #2573 (test_encode_truncate under tests/test_sentence_transformer.py), we need this code. The reason is that in the original implementation, out_features = self.forward(features) leaves features and out_features sharing a single memory; out_features just adds an additional key ("sentence_embedding"). When the dimension truncation is applied later, there will be an error in hpu's graph mode, because graph mode keeps the original output as a reference.

I think what you changed here is good for simplicity.
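(To illustrate the aliasing issue described above: a simplified sketch, not the actual module code; the dict keys and tensor shapes are assumptions.)

import copy

import torch

features = {"token_embeddings": torch.randn(2, 8, 384)}

# A forward pass that mutates and returns its input dict leaves both names
# pointing at the same object, i.e. out_features aliases features.
out_features = features
out_features["sentence_embedding"] = out_features["token_embeddings"].mean(dim=1)
assert out_features is features

# hpu graph mode keeps a reference to the original output, so mutating it later
# (e.g. truncating "sentence_embedding" to a smaller dimension) can conflict with
# what the captured graph expects. A deepcopy breaks the aliasing first:
out_features = copy.deepcopy(features)
out_features["sentence_embedding"] = out_features["sentence_embedding"][..., :256]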

@ZhengHongming888 (Contributor, Author)

@tomaarsen Thanks so much for your suggestions! They all look good to me, and I have added the 'optimum-habana' check as well as the deepcopy after the forward output for hpu. All checks passed. Thanks for spending some of your holiday time on this. :-) Sorry about that.

@tomaarsen (Collaborator)

I have made one final modification to the code in cc8b7e9, because kwargs is now always empty. I think this is ready now.

Thanks for implementing my suggestions.

  • Tom Aarsen

@tomaarsen merged commit de3b298 into UKPLab:master on May 13, 2024
9 checks passed
@ZhengHongming888 (Contributor, Author)

@tomaarsen Yes, you are right about kwargs. Thanks for your great support on the PRs! Also thanks for your time! :-)
