
Multiple GPUs #47

Open

qspang opened this issue Jan 20, 2024 · 9 comments
@qspang

qspang commented Jan 20, 2024

When I tested llama2-70b on a single A800 GPU, I ran out of GPU memory. How should I write the command if I want to test on two A800 GPUs? I tried this command:

[screenshot of the command]

But an error was reported:

[screenshot of the error]

@Viol2000
Collaborator

Quick fix: in https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/eval_mtbench.py#L511, set DIST_WORKERS = 0. I will do a refactor later to fix this thoroughly.
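Roughly, the change looks like this (a sketch: only the `DIST_WORKERS = 0` assignment is the actual quick fix; the comment is a paraphrase of what it does, not the file's exact code):

```python
# applications/eval_mtbench.py, around the linked line:
# setting DIST_WORKERS to 0 disables lookahead's own distributed workers,
# so the 70B model can instead be sharded across GPUs by the model loader.
DIST_WORKERS = 0
```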

@qspang
Author

qspang commented Jan 20, 2024

> Quick fix: in https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/eval_mtbench.py#L511, set DIST_WORKERS = 0. I will do a refactor later to fix this thoroughly.

It is now running normally. Thanks to the author for the timely reply!

@qspang
Author

qspang commented Jan 20, 2024

BTW, do you have code for testing accuracy like the Medusa project? https://github.com/FasterDecoding/Medusa/blob/v1.0-prerelease/medusa/eval/heads_accuracy.py

@Viol2000
Collaborator

I did not implement such a function, but it should not be too hard to compute accuracy by dividing the number of accepted tokens by the number of speculated tokens at each step.
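A minimal sketch of that computation (the variable names are hypothetical — you would record these counts in your own generation loop):

```python
# Hypothetical per-step counts recorded during generation:
accepted_per_step = [3, 1, 4, 2]     # speculated tokens accepted at each step
speculated_per_step = [7, 7, 7, 7]   # speculated tokens proposed at each step

# Per-step acceptance and the overall acceptance rate.
per_step = [a / s for a, s in zip(accepted_per_step, speculated_per_step)]
overall = sum(accepted_per_step) / sum(speculated_per_step)
print(f"per-step acceptance: {per_step}")
print(f"overall acceptance rate: {overall:.3f}")
```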

@qspang
Author

qspang commented Jan 20, 2024

It runs normally when USE_LADE is set to 1. But when I tested on multiple GPUs without LADE, it ran out of memory. Is there something wrong somewhere? The following is my execution command:

[screenshot of the command]

The error is reported as follows:

[screenshot of the error]

@Viol2000
Collaborator

When USE_LADE is set to 0, please set --use-pp=1 to use Hugging Face's pipeline parallelism, or use DeepSpeed for tensor parallelism as in the script I provided: https://github.com/hao-ai-lab/LookaheadDecoding/blob/main/applications/run_mtbench.sh#L33
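For reference, here is a minimal sketch of what the pipeline-parallel path amounts to, assuming --use-pp=1 boils down to loading the model with device_map="auto" (an assumption about the script, not a quote of its code):

```python
# A sketch: let accelerate shard the model's layers across all visible GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-chat-hf"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # split layers across both A800s (pipeline style)
    torch_dtype="auto",  # keep the checkpoint's native dtype (fp16/bf16)
)
```

Run with both GPUs visible, e.g. CUDA_VISIBLE_DEVICES=0,1.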

@qspang
Author

qspang commented Jan 20, 2024

Fixed it! Thank you for your patient reply! If I use a smaller model such as llama2-7b-chat and set USE_LADE to 0, what will the impact be if --use-pp is also set to 0?

@Viol2000
Collaborator

@qspang
Author

qspang commented Jan 20, 2024

OK, got it!
