speed is very slow #28

convert samples to features, is very slow
Comments
Running on a GPU, I find that dumping the extracted features takes up most of the time, so you may want to optimize that part yourself.
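For illustration, a minimal sketch of one such optimization (names here are illustrative, not the example script's actual code): collect the per-example feature tensors and serialize them in one binary dump, instead of encoding a JSON object per example, which is where the time goes.

```python
import torch

def dump_features(all_features, output_file="features.pt"):
    # One stack + one binary write replaces thousands of small JSON writes.
    stacked = torch.stack(all_features)  # assumes equal-length feature tensors
    torch.save(stacked, output_file)

# Usage: dump_features([torch.randn(768) for _ in range(1000)])
```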
Hi, these examples are provided as a starting point for writing your own training scripts using the package modules. I don't plan to update them any further.
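As an illustration of one way to speed up the conversion step in your own script, here is a minimal sketch that parallelizes a per-example conversion with the standard library; `encode_example` is a hypothetical stand-in for the real tokenization logic, not the package API.

```python
from multiprocessing import Pool

def encode_example(text, max_seq_length=128):
    # Stand-in for real tokenization: split, truncate, pad with empty strings.
    tokens = text.split()[:max_seq_length]
    return tokens + [""] * (max_seq_length - len(tokens))

def convert_all(examples, workers=8):
    # Each worker converts a slice of the examples independently.
    with Pool(workers) as pool:
        return pool.map(encode_example, examples)

if __name__ == "__main__":
    features = convert_all(["hello world", "speed is very slow"] * 1000)
```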
stevezheng23 added a commit to stevezheng23/transformers that referenced this issue on Mar 24, 2020
* update kd-quac runner to support ensemble evaluation
* update kd-quac runner to support ensemble evaluation (cont.)
* fix kd issues in kd-quac runner
* update codalab submission pipeline to support single model & ensemble
* update codalab submission pipeline to support single model & ensemble (cont.)
* update codalab submission pipeline to support single model & ensemble (cont.)
* update codalab submission pipeline to support single model & ensemble (cont.) (huggingface#27)
* update codalab submission pipeline to support single model & ensemble (cont.)
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue on Jun 1, 2023
ocavue pushed a commit to ocavue/transformers that referenced this issue on Sep 13, 2023
jonb377 pushed a commit to jonb377/hf-transformers that referenced this issue on Nov 3, 2023
Summary: This pull request tries to shard every matmul in LLaMA; below is the sharding strategy:
1. up_proj (batch, length, intermediate): mesh (data, None, model)
2. gate_proj (batch, length, intermediate): mesh (data, None, model)
3. down_proj (batch, length, hidden): mesh (data, None, model)
4. query_states (batch, length, hidden): mesh (data, None, model)
5. key_states (batch, length, hidden / attention_heads * key_value_heads): mesh (data, None, model)
6. value_states (batch, length, hidden / attention_heads * key_value_heads): mesh (data, None, model)
7. attn_weights (batch, num_attention_heads, length, length): mesh (data, model, None, None)
8. attn_output (batch, length, hidden): mesh (data, None, model)
9. hidden_states (batch, length, hidden): mesh (data, None, model)

Test Plan: Tested on v4-8
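For reference, a minimal sketch of how such annotations could look, assuming the PyTorch/XLA SPMD API (module paths vary across torch_xla releases); tensor names and shapes are illustrative, not the PR's actual code.

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs  # torch_xla.experimental.xla_sharding in older releases

xr.use_spmd()
num_devices = xr.global_runtime_device_count()
device_ids = np.array(range(num_devices))
# 2D ('data', 'model') mesh, e.g. 4 chips -> a (2, 2) mesh on a v4-8 slice.
mesh = xs.Mesh(device_ids, (num_devices // 2, 2), ('data', 'model'))

# (batch, length, hidden) activations: shard batch over 'data',
# hidden over 'model', replicate the length dimension.
hidden_states = torch.randn(8, 1024, 4096).to(xm.xla_device())
xs.mark_sharding(hidden_states, mesh, ('data', None, 'model'))

# (batch, heads, length, length) attention weights: shard the head
# dimension over 'model' instead.
attn_weights = torch.randn(8, 32, 1024, 1024).to(xm.xla_device())
xs.mark_sharding(attn_weights, mesh, ('data', 'model', None, None))
```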
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue on Nov 14, 2024
* always build triton, cuda, exllama kernels; remove unmaintained windows/rocm
* remove docker-amd and zh readme, which contains personal notes and is unrelated to gptq
* cleanup
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue on Nov 14, 2024