
Multiple GPUs #47

Open

qspang opened this issue Jan 20, 2024 · 9 comments
@qspang

qspang commented Jan 20, 2024

When I tested llama2-70b on a single A800 GPU, I ran out of GPU memory. How should I write the command if I want to test on two A800 GPUs? I tried this command:

[screenshot of the command]

But an error was reported:

[screenshot of the error]

@Viol2000
Collaborator

Quick fix: in https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/eval_mtbench.py#L511, set DIST_WORKERS = 0. I will do a refactor later to fix this thoroughly.
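Roughly, the change looks like this (a sketch: only the `DIST_WORKERS = 0` assignment is the actual quick fix; the comment is a paraphrase of what it does, not the file's exact code):

```python
# applications/eval_mtbench.py, around the linked line:
# setting DIST_WORKERS to 0 disables lookahead's own distributed workers,
# so the 70B model can instead be sharded across GPUs by the model loader.
DIST_WORKERS = 0
```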

@qspang
Author

qspang commented Jan 20, 2024

> Quick fix: in https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/eval_mtbench.py#L511, set DIST_WORKERS = 0. I will do a refactor later to fix this thoroughly.

It is now running normally. Thanks to the author for the timely reply!

@qspang
Author

qspang commented Jan 20, 2024

BTW, do you have code for testing accuracy like the Medusa project? https://github.com/FasterDecoding/Medusa/blob/v1.0-prerelease/medusa/eval/heads_accuracy.py

@Viol2000
Collaborator

I did not implement such a function, but it should not be too hard to compute accuracy by dividing the number of accepted tokens by the number of speculated tokens at each step.
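A minimal sketch of that computation (the variable names are hypothetical — you would record these counts in your own generation loop):

```python
# Hypothetical per-step counts recorded during generation:
accepted_per_step = [3, 1, 4, 2]     # speculated tokens accepted at each step
speculated_per_step = [7, 7, 7, 7]   # speculated tokens proposed at each step

# Per-step acceptance and the overall acceptance rate.
per_step = [a / s for a, s in zip(accepted_per_step, speculated_per_step)]
overall = sum(accepted_per_step) / sum(speculated_per_step)
print(f"per-step acceptance: {per_step}")
print(f"overall acceptance rate: {overall:.3f}")
```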

@qspang
Author

qspang commented Jan 20, 2024

It runs normally when USE_LADE is set to 1. But when I tested on multiple GPUs without LADE, it ran out of memory. Is there something wrong somewhere? The following is my execution command:

[screenshot of the command]

The error is reported as follows:

[screenshot of the error]

@Viol2000
Collaborator

When USE_LADE is set to 0, please set --use-pp=1 to use Hugging Face's pipeline parallelism, or use DeepSpeed for tensor parallelism as in the script I provided: https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/run_mtbench.sh#L33
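For reference, here is a minimal sketch of what the pipeline-parallel path amounts to, assuming --use-pp=1 boils down to loading the model with device_map="auto" (an assumption about the script, not a quote of its code):

```python
# A sketch: let accelerate shard the model's layers across all visible GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-chat-hf"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # split layers across both A800s (pipeline style)
    torch_dtype="auto",  # keep the checkpoint's native dtype (fp16/bf16)
)
```

Run with both GPUs visible, e.g. CUDA_VISIBLE_DEVICES=0,1.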

@qspang
Author

qspang commented Jan 20, 2024

Fixed it! Thank you for your patient reply! If I use a smaller model such as llama2-7b-chat and set USE_LADE to 0, what will the impact be if --use-pp is also set to 0?

@Viol2000
Collaborator

@qspang
Author

qspang commented Jan 20, 2024

OK, got it!
