
Allow converting HF Falcon models with only one shard in memory at a time #1

Merged


KerfuffleV2

Some changes to the conversion script to allow loading just one part of the multi-part model into memory at a time. From what I have heard, Transformers shards don't split tensors so this should be safe.

Without this change, I wasn't able to convert the real 40B Falcon model in 64GB RAM. I was able to verify that the mini-Shakespeare model gets converted correctly and I get reasonable output from the quantized version.
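The approach depends on Transformers shards never splitting a tensor across files. A sharded Hugging Face checkpoint ships an index file (e.g. `pytorch_model.bin.index.json`) whose `weight_map` maps each tensor name to a single shard file, and that is what makes per-shard conversion possible. Below is a minimal Python sketch of the idea under those assumptions. It is not the PR's actual code: `group_by_shard`, `convert_one_shard_at_a_time`, `load_shard`, and `write_tensor` are hypothetical names, and loading and writing are passed in as callables.

```python
import gc
from collections import defaultdict
from typing import Callable, Dict, List


def group_by_shard(index: dict) -> Dict[str, List[str]]:
    """Group tensor names by the shard file that stores them.

    `index` is the parsed *.bin.index.json; its "weight_map" maps
    each tensor name to a single shard file name.
    """
    by_shard: Dict[str, List[str]] = defaultdict(list)
    for name, shard_file in index["weight_map"].items():
        by_shard[shard_file].append(name)
    # Sort by file name so shards are processed in a stable order.
    return dict(sorted(by_shard.items()))


def convert_one_shard_at_a_time(
    by_shard: Dict[str, List[str]],
    load_shard: Callable[[str], dict],            # hypothetical loader
    write_tensor: Callable[[str, object], None],  # hypothetical writer
) -> None:
    for shard_file, names in by_shard.items():
        # Only this shard's tensors are held in memory here.
        state = load_shard(shard_file)
        for name in names:
            write_tensor(name, state[name])
        # Drop the reference before the next load; otherwise the old
        # shard would stay alive while the next one is being loaded.
        del state
        gc.collect()
```

In a real converter, `load_shard` might be something like `lambda f: torch.load(model_dir / f, map_location="cpu")`, so peak memory is roughly one shard plus whatever the writer buffers, instead of the whole model.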


cmp-nct commented Jun 14, 2023

Just as a side note, I could convert the 40B model on 64GB RAM; you just need plenty of fast swap and a bit of patience.
It swaps heavily and randomly for a while, then stabilizes (on Windows, at least), and then it finishes.

In any case, great work splitting it up :)

KerfuffleV2 (Author)

> you just need plenty of fast swap and a bit patience.

It's a shame I possess neither of those things!

But yeah, assuming you're willing to throw enough time and swap at the problem there's no memory issue that's insurmountable. :)

KerfuffleV2 force-pushed the feat-improve-falcon-convert-hf branch from bc4dadb to ac64e94 on June 14, 2023 21:22
KerfuffleV2 changed the title from "Allow converting Falcon models one part at a time" to "Allow converting HF Falcon models with only one shard in memory at a time" on Jun 14, 2023
jploski merged commit cc8ac10 into jploski:falcon40b on Jun 14, 2023