Enable Sentence Transformer Inference with Intel Gaudi2 GPU Supported ('hpu') - Follow up for #2557 #2630
Conversation
@tomaarsen Hi Tom, could you take some time to review and merge this PR? Thanks so much!

@ZhengHongming888 Yes, I will review it today. Perhaps I can also merge it today, but I don't think I can bring out a release with it today. Is that okay?

@tomaarsen That's already very good, haha. :-) Thanks so much for your great support!

@tomaarsen Everything you updated looks good to me! Thanks for your review!

@tomaarsen If everything is OK, could you merge it today? :-) Thanks!
@ZhengHongming888 I won't merge it today, I'm afraid. I've realised that I think the best solution is sadly to create three …
@tomaarsen But if I remove the request for a padding argument in the init part and instead do the hpu-specific code separately, could my PR be merged tomorrow? Thanks.
@ZhengHongming888 I think so, yes. Feel free to make those changes.
@tomaarsen I removed the request to add padding as an init argument, and all checks pass. :-) It should be OK now. Please take a look, and if anything else is needed, feel free to let me know. Thanks!
```python
device = get_device_name()
logger.info("Use pytorch device_name: {}".format(device))

if device == "hpu":
```
Can we be sure that `optimum-habana` is installed here?
Good suggestion. I will add the check. Thanks.
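For illustration, one possible shape for that check, as a minimal sketch assuming an importlib-based probe (the exact code the PR adds is not shown in this excerpt):

```python
import importlib.util

if device == "hpu":
    # Fail fast with a clear message if optimum-habana is missing; the
    # short-circuit avoids probing the submodule when optimum itself is absent.
    if importlib.util.find_spec("optimum") is None or importlib.util.find_spec("optimum.habana") is None:
        raise ImportError(
            'Using device="hpu" requires optimum-habana; install it with "pip install optimum[habana]".'
        )
```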
```python
if self.device.type == "hpu":
    hpu_graph_out = self.forward(features)
    out_features = copy.deepcopy(hpu_graph_out)
else:
    out_features = self.forward(features)
```
Suggested change:

```diff
- if self.device.type == "hpu":
-     hpu_graph_out = self.forward(features)
-     out_features = copy.deepcopy(hpu_graph_out)
- else:
-     out_features = self.forward(features)
+ out_features = self.forward(features)
+ if self.device.type == "hpu":
+     out_features = copy.deepcopy(out_features)
```
I would prefer this; could you verify that it works? Admittedly, I'm not sure why you have to copy these.
In order to support PR #2573 (test_encode_truncate() under tests/test_sentence_transformer.py), we need this code. The reason is that in the original implementation, `out_features = self.forward(features)` returns the very same object as `features`: the two share a single block of memory, and `out_features` merely gains the additional key ("sentence_embedding"). When dimension truncation is applied afterwards, hpu graph mode raises an error, because graph mode keeps a reference to the original output.
I think what you changed here is good for simplicity.
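To make the shared-memory behaviour concrete, here is a toy sketch (not the library's actual code) of the in-place pattern described above:

```python
import torch

def forward(features):
    # Mimic the behaviour described above: the returned dict IS the input
    # dict, mutated in place with one extra key.
    features["sentence_embedding"] = torch.zeros(1, 4)
    return features

features = {"input_ids": torch.tensor([[1, 2, 3]])}
out_features = forward(features)
assert out_features is features  # one shared object, no copy made
# hpu graph mode keeps a reference to this original output, so mutating it
# later (e.g. during dimension truncation) corrupts what the graph expects;
# the deepcopy on the hpu path breaks that aliasing.
```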
@tomaarsen Thanks so much for your suggestions! They all look good to me, and I have added the `optimum-habana` check as well as the deepcopy after the forward output for hpu. All checks passed. Thanks for spending some of your holiday time on this. :-) Sorry about that.
I have made one final modification to the code in cc8b7e9, because `kwargs` is now always empty. I think this is ready now. Thanks for implementing my suggestions.
@tomaarsen Yes, you are right about `kwargs`. Thanks for your great support on the PRs, and thanks for your time! :-)
This PR is one of the tasks enabling Intel Gaudi2 GPU support for Sentence Transformers inference/training.
It is the follow-up PR to #2557 and makes the following changes for the hpu device:
We introduce a new padding algorithm that aligns the maximum batch length with the requirements of hpu graph mode. It adds a small amount of extra padding on top of the tokenizer's original output, computed as:

```python
additional_pad_len = 2 ** math.ceil(math.log2(curr_tokenize_len[1])) - curr_tokenize_len[1]
```

The maximum batch length is thereby aligned to a small set of values [2, 4, 8, 16, 32, 64, 128, 256, ...], which greatly reduces the tokenizer/forward/cpu-conversion time under hpu graph mode.
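As a sketch, the padding step could look like the following (`pad_to_power_of_two` and its arguments are illustrative names, not the PR's exact code):

```python
import math

import torch
import torch.nn.functional as F

def pad_to_power_of_two(input_ids, attention_mask, pad_token_id=0):
    # Pad the sequence dimension up to the next power of two, so that hpu
    # graph mode only ever compiles a small, fixed set of input shapes.
    curr_len = input_ids.shape[1]
    additional_pad_len = 2 ** math.ceil(math.log2(curr_len)) - curr_len
    if additional_pad_len == 0:
        return input_ids, attention_mask
    input_ids = F.pad(input_ids, (0, additional_pad_len), value=pad_token_id)
    attention_mask = F.pad(attention_mask, (0, additional_pad_len), value=0)
    return input_ids, attention_mask

# Example: a batch whose longest sequence is 13 tokens gets padded to 16.
ids, mask = pad_to_power_of_two(torch.ones(4, 13, dtype=torch.long),
                                torch.ones(4, 13, dtype=torch.long))
print(ids.shape)  # torch.Size([4, 16])
```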
We tested performance on all 37 models listed on the pretrained models page and compared a CUDA A100 against the hpu device. Using evaluation code over 100,000 sentences with batch sizes of [32, 128, 512], the new padding delivers strong performance on the hpu device.
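For reference, a minimal sketch of such an evaluation loop (the model name and sentences here are placeholders; the PR's actual benchmark script is not shown):

```python
import time

from sentence_transformers import SentenceTransformer

# Placeholder model; the PR benchmarked all 37 pretrained models.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="hpu")
sentences = ["An example sentence to embed."] * 100_000

for batch_size in [32, 128, 512]:
    start = time.perf_counter()
    model.encode(sentences, batch_size=batch_size)
    print(f"batch_size={batch_size}: {time.perf_counter() - start:.1f}s")
```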
The PR also revises two earlier changes so that they work on hpu:

- [feat] Add truncation support #2573: testing that PR on the hpu device failed, so we propose a revised version that supports it on hpu. Its test case (test_encode_truncate under tests/test_sentence_transformer.py) now passes successfully.
- [clip] Prevent warning with `padding` when tokenizing for CLIP #2599: testing that PR on the hpu device failed, so we propose a revised version that supports it on hpu. Its test case (test_simple_encode under tests/test_image_embeddings.py) now passes successfully.
Any questions/comments are welcome!
Thanks.