
Questions about Hardware requirement #12

Open

saki-37 opened this issue Mar 18, 2024 · 2 comments
saki-37 commented Mar 18, 2024

Excuse me, but when running inference on a single RTX 4090 with `python cli_demo_sat.py --from_pretrained cogcom-base-17b --local_tokenizer tokenizer --english --quant 4`, it fails with CUDA out of memory. Does it need more GPUs, or do I need to add some arguments? Thank you!
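
For what it's worth, here is a minimal diagnostic sketch of how one might check where the memory goes. This is my own snippet, not part of `cli_demo_sat.py` (as far as I know the demo does not print these figures itself); the model-loading step is left as a placeholder:

```python
import torch

# Hypothetical diagnostic snippet, not part of the repo: wrap the model
# loading step to see how much of the RTX 4090's ~24 GiB the quantized
# weights consume before generation even starts.

device = torch.device("cuda:0")
total_gib = torch.cuda.get_device_properties(device).total_memory / 1024**3
torch.cuda.reset_peak_memory_stats(device)

# ... load the model here (e.g. the repo's from_pretrained call) ...

allocated_gib = torch.cuda.memory_allocated(device) / 1024**3
peak_gib = torch.cuda.max_memory_allocated(device) / 1024**3
print(f"GPU total:     {total_gib:.1f} GiB")
print(f"allocated now: {allocated_gib:.1f} GiB")
print(f"peak so far:   {peak_gib:.1f} GiB")
```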

qijimrc (Collaborator) commented Mar 20, 2024

> Excuse me, but when running inference on a single RTX 4090 with `python cli_demo_sat.py --from_pretrained cogcom-base-17b --local_tokenizer tokenizer --english --quant 4`, it fails with CUDA out of memory. Does it need more GPUs, or do I need to add some arguments? Thank you!

Hi, thanks for your interest! I am currently investigating this quantization problem.

zilunzhang commented Apr 1, 2024

Same here. Is there an approximate estimate of the VRAM usage? (<20 GB, or ~24 GB?)
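
For a rough sense of scale, here is a back-of-envelope estimate of my own, not a measured or official figure: 17B parameters at 4 bits is only about 8 GiB of weights, but any modules kept in fp16 (the split below is hypothetical), plus activations, the KV cache, and the CUDA context, add on top of that, which may be how a 24 GB card can still run out:

```python
# Back-of-envelope weight-memory estimate (an assumption, not a measured
# figure): 17B parameters at 4 bits/parameter, plus parts that often stay
# in fp16 during quantized inference; runtime overhead (activations, KV
# cache, CUDA context) comes on top of both.

params = 17e9
weights_4bit_gib = params * 0.5 / 1024**3        # 4 bits = 0.5 bytes/param
print(f"4-bit weights alone: ~{weights_4bit_gib:.1f} GiB")  # ~7.9 GiB

# If, say, 2B of the 17B parameters remain in fp16 (hypothetical split):
mixed_gib = (15e9 * 0.5 + 2e9 * 2) / 1024**3
print(f"mixed 4-bit/fp16:    ~{mixed_gib:.1f} GiB")         # ~10.7 GiB
```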
