fix: allocate tmp based on sgmv kernel if available #2345

Merged on Aug 12, 2024 (2 commits)

@drbh drbh commented Jul 31, 2024

This PR improves the get_tmp_tensors function so that it allocates temp tensors based on whether the sgmv kernel is available.

This fixes an issue where temp tensors were allocated via the non-kernel path but later checked by a kernel assert.
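To illustrate the idea, here is a minimal sketch of availability-based allocation. It is not the code from this PR: `get_tmp_buffers`, `kernel_side_check`, `TmpBuffer`, and both size constants are hypothetical stand-ins, and plain Python objects are used in place of real tensors so the example runs without a GPU.

```python
from dataclasses import dataclass
from typing import Tuple

# Placeholder sizes chosen only for illustration (assumptions, not values from the PR).
FIXED_KERNEL_TMP_BYTES = 1024 * 1024
BYTES_PER_SEGMENT = 256


@dataclass
class TmpBuffer:
    nbytes: int
    path: str  # which allocation path produced this buffer


def get_tmp_buffers(nsegments: int, kernel_available: bool) -> Tuple[TmpBuffer, TmpBuffer]:
    """Choose the temp-buffer layout from kernel availability (hypothetical helper)."""
    if kernel_available:
        # Kernel path: allocate what the kernel-side check will later expect.
        return (
            TmpBuffer(FIXED_KERNEL_TMP_BYTES, "kernel"),
            TmpBuffer(FIXED_KERNEL_TMP_BYTES, "kernel"),
        )
    # Fallback path: one buffer sized by segment count, shared by both slots.
    shared = TmpBuffer(nsegments * BYTES_PER_SEGMENT, "fallback")
    return shared, shared


def kernel_side_check(buf: TmpBuffer) -> None:
    """Stand-in for a kernel assert: reject buffers that are too small."""
    assert buf.nbytes >= FIXED_KERNEL_TMP_BYTES, "tmp buffer too small for kernel"


# With the kernel available, the allocation satisfies the kernel-side check.
shrink_tmp, expand_tmp = get_tmp_buffers(nsegments=4, kernel_available=True)
kernel_side_check(shrink_tmp)
kernel_side_check(expand_tmp)
```

The point of the sketch is that the allocation branch and the later check agree on the same condition. If the fallback buffer were handed to `kernel_side_check` here, the assert would fail, which mirrors the kind of mismatch described above.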

server started with

text-generation-launcher \
--model-id meta-llama/Meta-Llama-3-8B-Instruct \
--lora-adapters DavidLanz/Llama3_tw_8B_btc_qlora

req/rep without adapter

curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
  "inputs": "What are three words to describe you?",
  "parameters": {
    "max_new_tokens": 20
  }
}'
# {"generated_text":" (e.g. funny, outgoing, creative)\nI would say that three words to describe me are"}

req/rep with adapter

curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
  "inputs": "What are three words to describe you?",
  "parameters": {
    "max_new_tokens": 20,
    "adapter_id": "DavidLanz/Llama3_tw_8B_btc_qlora"
  }
}'
# {"generated_text":" A. Adventurous, B. Creative, C. Curious\nWhat are three words to describe"}


@Narsil Narsil left a comment

LGTM

@Narsil Narsil merged commit 4c3f8a7 into main Aug 12, 2024
9 checks passed
@Narsil Narsil deleted the adjust-lora-tmp-tensor-allocation branch August 12, 2024 15:24
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request Sep 26, 2024
* fix: allocate tmp based on sgmv kernel if available

* fix: re add copy build artifacts step for punica kernels