Valid model.pt for ckpt_path -- Is it an open-source model? #100
But I am not sure what should be given for ckpt_path, since I do not have a model.pt. Where do I get this? Is it an open-source model available on Hugging Face, etc.? Please let me know. Currently, it is failing with this error. Thanks
You can find the projector (model.pt) ckpt on the README page.
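A hedged sketch of how the downloaded checkpoint is typically wired in (the local directory below is a placeholder, and the download location is whatever the README links to):

```bash
# Sketch only: download the projector checkpoint (model.pt) linked in the
# README, then point the ++ckpt_path override in the decode script at it.
ckpt_dir=/path/to/downloaded/checkpoint   # placeholder path
# inside decode_wavlm_large_linear_vicuna_7b.sh:
#   ++ckpt_path=$ckpt_dir/model.pt
```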
Thanks @ddlBoJack, will try that.
Thanks @ddlBoJack, this is solved. But I am facing another issue. I am running this on a 4-GPU machine. I got a CUDA OOM when using only a single GPU (via CUDA_VISIBLE_DEVICES="0"). After this, I tried using all 4 GPUs by setting CUDA_VISIBLE_DEVICES="0,1,2,3", but I still get the same CUDA OOM. I monitored the nvidia-smi output in parallel and found that the script uses only a single GPU, in spite of setting CUDA_VISIBLE_DEVICES="0,1,2,3". Am I missing anything here? Please suggest. Thanks.
Stack trace:
Error executing job with overrides: ['++model_config.llm_name=vicuna-7b-v1.5', '++model_config.llm_path=lmsys/vicuna-7b-v1.5', '++model_config.llm_dim=4096', '++model_config.encoder_name=wavlm', '++model_config.normalize=true', '++dataset_config.normalize=true', '++model_config.encoder_projector_ds_rate=5', '++model_config.encoder_path=/mnt/efs/manju/if/repos/prompt/slam/models/WavLM-Large.pt', '++model_config.encoder_dim=1024', '++model_config.encoder_projector=linear', '++dataset_config.dataset=speech_dataset', '++dataset_config.val_data_path=/mnt/efs/manju/if/repos/prompt/slam/data/librispeech_slam_test-clean_bidisha.jsonl', '++dataset_config.input_type=raw', '++dataset_config.inference_mode=true', '++train_config.model_name=asr', '++train_config.freeze_encoder=true', '++train_config.freeze_llm=true', '++train_config.batching_strategy=custom', '++train_config.num_epochs=1', '++train_config.val_batch_size=1', '++train_config.num_workers_dataloader=2', '++train_config.output_dir=/mnt/efs/manju/if/repos/prompt/slam/output/vicuna-7b-v1.5-librispeech-linear-steplrwarmupkeep1e-4-wavlm-large-20240426', '++decode_log=/mnt/efs/manju/if/repos/prompt/slam/output/vicuna-7b-v1.5-librispeech-linear-steplrwarmupkeep1e-4-wavlm-large-20240426/asr_epoch_1_step_1000/decode_librispeech_test_clean_beam4', '++ckpt_path=/mnt/efs/manju/if/repos/prompt/slam/output/vicuna-7b-v1.5-librispeech-linear-steplrwarmupkeep1e-4-wavlm-large-20240426/asr_epoch_1_step_1000/model.pt']
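As a side note, here is a generic check (not part of the SLAM-LLM scripts) that shows whether all four GPUs are visible to PyTorch; even when they are, a single-process decode run will still place the whole model on one device unless the code explicitly shards it:

```bash
# Generic diagnostic: report how many CUDA devices PyTorch can see
# under the current CUDA_VISIBLE_DEVICES setting.
CUDA_VISIBLE_DEVICES="0,1,2,3" python -c "import torch; print(torch.cuda.device_count())"
# Printing 4 here only means the devices are visible; it does not make a
# single-GPU decoding script distribute its memory across them.
```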
Does this support multi-GPU? If not, how can I resolve this issue? Thanks @ddlBoJack @LauraGPT @chenxie95
Currently we use a single GPU for decoding. We plan to support multi-GPU decoding, and the script is on the way.
Ok, thanks. Does this mean I should increase the GPU memory and run it there? Thanks
What GPU do you use? If GPU memory is limited, you can set the batch size to 1 and use half-precision to run the inference.
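A minimal sketch of how those two suggestions map onto the Hydra-style overrides already used by the decode script; the use_fp16 flag name is taken from a later comment in this thread and is not confirmed here, so check the script/config for the exact option:

```bash
# Sketch only, not the script's exact contents: the decode script builds a
# python command with Hydra-style ++ overrides (see the stack trace above).
# $decode_cmd stands in for that python invocation.
$decode_cmd \
    ++train_config.val_batch_size=1 \
    ++train_config.use_fp16=true   # assumed flag name for half-precision
```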
We use an NVIDIA A10G GPU with 24 GB of GPU memory. Please guide me on how to set these. Currently, the batch size is set as "++train_config.val_batch_size=1".
Hi @ddlBoJack, please also suggest if there is a better way to make this work on a GPU with 24 GB memory. Thanks
Hi @ddlBoJack @byrTony-Frankzyq |
The minimum GPU we have used is an A40 with 48 GB memory.
It would be great if this were documented as a general requirement; it would make getting started much easier! 🙂
Ok, thank you @ddlBoJack. When can we expect multi-GPU script availability? Any tentative ETA?
Any update on this? @ddlBoJack @byrTony-Frankzyq Thanks |
Hi, we can hardly provide an ETA, since all our contributors work on the project part-time. However, we will try our best to fix existing bugs and deliver the features requested by users.
@uni-manjunath-ke I am using a GPU with the same specs. Could you let me know how you ran it?
Hi, I built the slam_llm Docker image as described in the README file and ran this script inside Docker. It worked. Hope this is useful.
@uni-manjunath-ke thank you for the reply! What did you do differently after you got OOM? Did you get OOM when you didn't use Docker? I encountered the OOM issue even with 'train_config.use_fp16=true'.
I used a single 40 GB GPU with Docker. With this, I didn't get OOM.
I used 4 GPUs with 46 GB each. I still get an OOM error every time; how should I fix this?
System Info
I am trying to run this: bash decode_wavlm_large_linear_vicuna_7b.sh
But I am not sure what should be given for ckpt_path, since I do not have a model.pt. Where do I get this? Is it an open-source model available on Hugging Face, etc.? Please let me know. Currently, it is failing with this error. Thanks @byrTony-Frankzyq
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/efs/manju/if/repos/prompt/slam/output/vicuna-7b-v1.5-librispeech-linear-steplrwarmupkeep1e-4-wavlm-large-20240426/asr_epoch_1_step_1000/model.pt'
Thanks
Information
🐛 Describe the bug
bash decode_wavlm_large_linear_vicuna_7b.sh
Error logs
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/efs/manju/if/repos/prompt/slam/output/vicuna-7b-v1.5-librispeech-linear-steplrwarmupkeep1e-4-wavlm-large-20240426/asr_epoch_1_step_1000/model.pt'
Expected behavior
It is expected to produce the decoding output.