speed is very slow #28

Closed
susht3 opened this issue Nov 17, 2018 · 2 comments

Comments

@susht3 commented Nov 17, 2018

Converting samples to features is very slow.

@zhhongzhi

Running on a GPU, I find that dumping the extracted features takes up most of the time, so you may want to optimize that step yourself.
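
One common way to cut that cost is to cache the converted features to disk after the first run, so the slow conversion/dump only happens once. Below is a minimal sketch, not code from the example script; `convert_examples_to_features`, `examples`, and `tokenizer` in the usage comment are hypothetical stand-ins for whatever the script actually uses.

```python
# A sketch (not part of the original example script): cache converted features
# so the slow conversion/dump step only runs on the first invocation.
import os
import torch

def load_or_convert(convert_fn, cache_path):
    """Return the converted features, caching them to disk after the first run."""
    if os.path.exists(cache_path):
        # Later runs skip conversion entirely and just deserialize the cache.
        return torch.load(cache_path)
    features = convert_fn()
    # One binary torch.save of the whole list is usually much faster than
    # writing each example's features out individually.
    torch.save(features, cache_path)
    return features

# Hypothetical usage with the example script's own conversion helper:
# features = load_or_convert(
#     lambda: convert_examples_to_features(examples, tokenizer, max_seq_length=128),
#     cache_path="cached_train_features.pt",
# )
```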

@thomwolf
Member

Hi, these examples are provided as a starting point for writing your own training scripts using the package modules. I don't plan to update them any further.

stevezheng23 added a commit to stevezheng23/transformers that referenced this issue Mar 24, 2020
* update kd-quac runner to support ensemble evaluation

* update kd-quac runner to support ensemble evaluation (cont.)

* fix kd issues in kd-quac runner

* update codalab submission pipeline to support single model & ensemble

* update codalab submission pipeline to support single model & ensemble (cont.)

* update codalab submission pipeline to support single model & ensemble (cont.)

* update codalab submission pipeline to support single model & ensemble (cont.) (huggingface#27)

* update codalab submission pipeline to support single model & ensemble (cont.)
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue Jun 1, 2023
ocavue pushed a commit to ocavue/transformers that referenced this issue Sep 13, 2023
jonb377 pushed a commit to jonb377/hf-transformers that referenced this issue Nov 3, 2023
Summary:
This pull request tries to shard every matmul in LLaMA; the sharding strategy is below:

1. up_proj (batch, length, intermediate): mesh (data, None, model)
2. gate_proj (batch, length, intermediate): mesh (data, None, model)
3. down_proj (batch, length, hidden): mesh (data, None, model)
4. query_states (batch, length, hidden): mesh (data, None, model)
5. key_states (batch, length, hidden / attention_heads * key_value_heads): mesh (data, None, model)
6. value_states (batch, length, hidden / attention_heads * key_value_heads): mesh (data, None, model)
7. attn_weights (batch, num_attention_heads, length, length): mesh (data, model, None, None)
8. attn_output (batch, length, hidden): mesh (data, None, model)
9. hidden_states (batch, length, hidden): mesh (data, None, model)

Test Plan:
Tested on v4-8
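
For context on how a partition spec like (data, None, model) is expressed in practice, here is a minimal, hedged sketch using torch_xla's SPMD annotation API (the module path has moved between releases: `torch_xla.distributed.spmd` in recent versions, `torch_xla.experimental.xla_sharding` in older ones). This illustrates the annotation style only, not the code from this commit:

```python
# Sketch of annotating a (batch, length, hidden) activation with a
# (data, None, model) spec via torch_xla SPMD; APIs vary by torch_xla version.
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs  # older releases: torch_xla.experimental.xla_sharding

xr.use_spmd()

num_devices = xr.global_runtime_device_count()
device_ids = np.arange(num_devices)
# 2D (data, model) mesh; here half the devices on each axis (assumes an even count).
mesh = xs.Mesh(device_ids, (num_devices // 2, 2), ("data", "model"))

# A (batch, length, hidden) activation, as in items 4 and 8-9 above.
hidden_states = torch.randn(8, 128, 4096).to(xm.xla_device())
# batch is sharded over the data axis, hidden over the model axis,
# and the length dimension stays replicated (None).
xs.mark_sharding(hidden_states, mesh, ("data", None, "model"))
```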
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue Nov 14, 2024
* always build triton, cuda, exllama kernels. remove unmaintained windows/rocm

* Remove docker-amd and zh readme, which contain personal notes and are unrelated to gptq

* cleanup
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue Nov 14, 2024