speed is very slow #28

convert samples to features, is very slow
Comments
Running on a GPU, I find that dumping the extracted features takes up most of the time, so you may want to optimize that part yourself.
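For illustration, a minimal sketch of one such optimization (names here are illustrative, not the example script's actual code): collect the per-example feature tensors and serialize them in one binary dump, instead of encoding a JSON object per example, which is where the time goes.

```python
import torch

def dump_features(all_features, output_file="features.pt"):
    # One stack + one binary write replaces thousands of small JSON writes.
    stacked = torch.stack(all_features)  # assumes equal-length feature tensors
    torch.save(stacked, output_file)

# Usage: dump_features([torch.randn(768) for _ in range(1000)])
```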
Hi, these examples are provided as a starting point for writing your own training scripts using the package modules. I don't plan to update them any further.
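As an illustration of one way to speed up the conversion step in your own script, here is a minimal sketch that parallelizes a per-example conversion with the standard library; `encode_example` is a hypothetical stand-in for the real tokenization logic, not the package API.

```python
from multiprocessing import Pool

def encode_example(text, max_seq_length=128):
    # Stand-in for real tokenization: split, truncate, pad with empty strings.
    tokens = text.split()[:max_seq_length]
    return tokens + [""] * (max_seq_length - len(tokens))

def convert_all(examples, workers=8):
    # Each worker converts a slice of the examples independently.
    with Pool(workers) as pool:
        return pool.map(encode_example, examples)

if __name__ == "__main__":
    features = convert_all(["hello world", "speed is very slow"] * 1000)
```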
stevezheng23 added a commit to stevezheng23/transformers that referenced this issue on Mar 24, 2020
* update kd-quac runner to support ensemble evaluation
* update kd-quac runner to support ensemble evaluation (cont.)
* fix kd issues in kd-quac runner
* update codalab submission pipeline to support single model & ensemble
* update codalab submission pipeline to support single model & ensemble (cont.)
* update codalab submission pipeline to support single model & ensemble (cont.)
* update codalab submission pipeline to support single model & ensemble (cont.) (huggingface#27)
* update codalab submission pipeline to support single model & ensemble (cont.)
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this issue on Jun 1, 2023
ocavue pushed a commit to ocavue/transformers that referenced this issue on Sep 13, 2023
jonb377 pushed a commit to jonb377/hf-transformers that referenced this issue on Nov 3, 2023
Summary: This pull request tries to shard every matmul in LLaMA; below is the sharding strategy:
1. up_proj (batch, length, intermediate): mesh (data, None, model)
2. gate_proj (batch, length, intermediate): mesh (data, None, model)
3. down_proj (batch, length, hidden): mesh (data, None, model)
4. query_states (batch, length, hidden): mesh (data, None, model)
5. key_states (batch, length, hidden / attention_heads * key_value_heads): mesh (data, None, model)
6. value_states (batch, length, hidden / attention_heads * key_value_heads): mesh (data, None, model)
7. attn_weights (batch, num_attention_heads, length, length): mesh (data, model, None, None)
8. attn_output (batch, length, hidden): mesh (data, None, model)
9. hidden_states (batch, length, hidden): mesh (data, None, model)

Test Plan: Tested on v4-8
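For reference, a minimal sketch of how such annotations could look, assuming the PyTorch/XLA SPMD API (module paths vary across torch_xla releases); tensor names and shapes are illustrative, not the PR's actual code.

```python
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs  # torch_xla.experimental.xla_sharding in older releases

xr.use_spmd()
num_devices = xr.global_runtime_device_count()
device_ids = np.array(range(num_devices))
# 2D ('data', 'model') mesh, e.g. 4 chips -> a (2, 2) mesh on a v4-8 slice.
mesh = xs.Mesh(device_ids, (num_devices // 2, 2), ('data', 'model'))

# (batch, length, hidden) activations: shard batch over 'data',
# hidden over 'model', replicate the length dimension.
hidden_states = torch.randn(8, 1024, 4096).to(xm.xla_device())
xs.mark_sharding(hidden_states, mesh, ('data', None, 'model'))

# (batch, heads, length, length) attention weights: shard the head
# dimension over 'model' instead.
attn_weights = torch.randn(8, 32, 1024, 1024).to(xm.xla_device())
xs.mark_sharding(attn_weights, mesh, ('data', 'model', None, None))
```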
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue on Nov 14, 2024
* always build triton, cuda, exllama kernels; remove unmaintained windows/rocm
* remove docker-amd and zh readme, which contains personal notes and is unrelated to gptq
* cleanup
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue on Nov 14, 2024