Replies: 1 comment
-
I believe that should be possible, yes.
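For a rough sense of why the 70B teacher needs sharding, here is a back-of-the-envelope sketch of the weight memory alone. It assumes bf16 weights (2 bytes per parameter) and perfectly even sharding, and it ignores activations, gradients, and optimizer state, so real usage will be higher. The function names are just illustrative helpers, not part of any library.

```python
# Weights-only memory estimate. Assumptions (not from the thread):
# bf16 weights at 2 bytes per parameter, 1 GB = 10**9 bytes,
# and idealized even sharding across GPUs.
BYTES_PER_PARAM_BF16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return num_params * bytes_per_param / 1e9


def sharded_weight_memory_gb(num_params: float, num_gpus: int,
                             bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Per-GPU weight memory if the weights are split evenly across num_gpus."""
    return weight_memory_gb(num_params, bytes_per_param) / num_gpus


# Teacher and student sizes mentioned in the question.
teacher_gb = weight_memory_gb(70e9)
student_gb = {name: weight_memory_gb(n)
              for name, n in [("500M", 0.5e9), ("1.5B", 1.5e9), ("3B", 3e9)]}

# Hypothetical 8-GPU node, chosen only for illustration.
teacher_per_gpu_gb = sharded_weight_memory_gb(70e9, num_gpus=8)

if __name__ == "__main__":
    print(f"teacher weights: {teacher_gb:.1f} GB")
    print(f"teacher weights per GPU (8-way shard): {teacher_per_gpu_gb:.1f} GB")
    for name, gb in student_gb.items():
        print(f"student {name} weights: {gb:.1f} GB")
```

Under these assumptions the teacher's weights alone exceed the memory of a single GPU, while the students are small; LoRA on the student mainly reduces gradient and optimizer-state memory rather than the teacher's footprint.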
-
Hi
I'm looking to distill a 70B model into 500M, 1.5B, and 3B student models for testing. However, this won't fit on a single GPU. Is it possible to use FSDP in combination with GKD, and can the student model be trained with LoRA?
Thanks