Enable Sentence Transformer Inference with Intel Gaudi2 GPU Supported ( 'hpu' ) #2557

Merged: 13 commits, Apr 8, 2024


ZhengHongming888
Contributor

This PR is part of the work to enable Intel's Gaudi2 GPU for Sentence Transformers inference and training.

This is the first PR, and it includes the following items:

  1. Add the 'hpu' device name to get_device_name() in sentence_transformers/util.py.
  2. Add a padding strategy argument to tokenize() for the Gaudi2 device in sentence_transformers/SentenceTransformer.py and sentence_transformers/models/Transformer.py. The original padding strategy is kept unchanged for cuda/cpu.
  3. Enable graph mode on the Gaudi2 device, for better performance, in sentence_transformers/SentenceTransformer.py.
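
To illustrate why item 2 matters for graph mode, here is a minimal, library-free sketch of the two padding styles. `pad_batch` and its parameters are hypothetical names used only for illustration and are not the code in this PR. In Hugging Face tokenizers, the two styles roughly correspond to `padding=True` (pad to the longest sequence in the batch) and `padding="max_length"` (pad to a fixed length); which of these the PR selects for 'hpu' is an assumption here.

```python
from typing import List, Optional


def pad_batch(
    batch: List[List[int]],
    pad_id: int = 0,
    fixed_length: Optional[int] = None,
) -> List[List[int]]:
    """Pad token-id lists to a common length (hypothetical helper).

    fixed_length=None  -> pad to the longest sequence in this batch,
                          so the shape changes from batch to batch.
    fixed_length=N     -> truncate/pad every sequence to exactly N,
                          so every batch has the same shape.
    """
    target = fixed_length if fixed_length is not None else max(len(seq) for seq in batch)
    padded = []
    for seq in batch:
        clipped = seq[:target]  # truncate if longer than the target
        padded.append(clipped + [pad_id] * (target - len(clipped)))
    return padded


batch = [[5, 6, 7], [8]]
print(pad_batch(batch))                   # [[5, 6, 7], [8, 0, 0]]
print(pad_batch(batch, fixed_length=4))   # [[5, 6, 7, 0], [8, 0, 0, 0]]
```

Fixed shapes generally let a captured graph be reused across batches, at the cost of some wasted computation on padding tokens.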

None of the inference examples need to be modified: they seamlessly pick the default device, which is 'cuda' on a CUDA system, 'hpu' on a Gaudi2 system, and 'cpu' when neither of the two is available.

Questions and comments are welcome!
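
The device priority described above can be sketched as follows. This is a simplified illustration, not the actual body of get_device_name(): `pick_device` and `habana_installed` are hypothetical names, the availability flags are passed in as plain booleans so the example runs without any accelerator, and the `habana_frameworks` package name is an assumption about how the Gaudi software stack is detected.

```python
import importlib.util


def habana_installed() -> bool:
    """Return True if the (assumed) Gaudi package can be imported."""
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("habana_frameworks") is not None


def pick_device(cuda_available: bool, hpu_available: bool) -> str:
    """Choose a device name: 'cuda' first, then 'hpu', else 'cpu'."""
    if cuda_available:
        return "cuda"
    if hpu_available:
        return "hpu"
    return "cpu"


print(pick_device(cuda_available=True, hpu_available=False))   # cuda
print(pick_device(cuda_available=False, hpu_available=True))   # hpu
print(pick_device(cuda_available=False, hpu_available=False))  # cpu
```

In real code the first flag would typically come from `torch.cuda.is_available()`, and the second from a check such as `habana_installed()` combined with a runtime availability query.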

Thanks.

@tomaarsen
Collaborator

Hello!

Thanks for the PR. I've taken a few minutes to fix some of the things we talked about during our meeting. Feel free to look at the individual commits to get a feeling for the changes.
I haven't yet had time to test this on an HPU device. Perhaps you can give it a try yourself after my updates?

  • Tom Aarsen

@ZhengHongming888
Contributor Author

Thanks, Tom, for the modifications. They are all very good, especially the "padding" argument! :-)

I have also tested all test cases under sentence-transformers/tests, and they all pass with your commits on my machine with the 'hpu' device.

Besides your commits, I also made a small change: I moved the initialization of HPU graph mode for the hpu device from init() into encode(), where it runs only once. Enabling HPU for training will differ from inference and will be done later on the training side. I also made a small change to tests/test_compute_embeddings.py to account for the new padding argument.
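
The "initialize only once inside encode()" pattern can be sketched like this. It is a minimal illustration under stated assumptions: `LazyGraphEncoder`, `wrap_fn`, and the placeholder encoding are all hypothetical, and the real Gaudi graph-wrapping call is stubbed out by whatever callable is passed in.

```python
from typing import Callable, List


class LazyGraphEncoder:
    """Hypothetical model wrapper that enables graph mode lazily."""

    def __init__(self, device: str, wrap_fn: Callable[["LazyGraphEncoder"], None]):
        self.device = device
        self._wrap_fn = wrap_fn      # stand-in for the real graph-mode call
        self._graph_ready = False    # guards the one-time initialization

    def encode(self, sentences: List[str]) -> List[int]:
        # Enable graph mode on the first call only, and only on 'hpu'.
        if self.device == "hpu" and not self._graph_ready:
            self._wrap_fn(self)
            self._graph_ready = True
        # Placeholder "embedding": the length of each sentence.
        return [len(s) for s in sentences]


calls = []
encoder = LazyGraphEncoder("hpu", lambda model: calls.append(model))
encoder.encode(["first batch"])
encoder.encode(["second batch"])
print(len(calls))  # 1
```

Deferring the setup to encode() keeps construction cheap and leaves room for training to use a different setup path.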

So right now all test cases pass on my side. Please help check.

Thanks.
Hongming

@tomaarsen
Collaborator

I think this is looking good now, thanks for these changes!

  • Tom Aarsen

@tomaarsen tomaarsen merged commit 1cee15c into UKPLab:master Apr 8, 2024
9 checks passed
tomaarsen added a commit that referenced this pull request May 13, 2024
… ( 'hpu' ) - Follow up for #2557 (#2630)

* revision for padding argument and truncate dim test

* add new padding for hpu graph mode

* ruff format

* Return dict encoding rather than BatchEncoding for CLIPModel

* Remove unused import

* remove padding argument

* modify the graph enable position

* ruff format

* add check for optimum install

* ruff format

* Simplify tokenization

---------

Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>