
add python/pytorch version compat notes #44

Merged
ggerganov merged 1 commit into ggml-org:master on Mar 12, 2023

Conversation

wizzard0 (Contributor)

see #32 (comments)

@ggerganov ggerganov merged commit b9bd1d0 into ggml-org:master Mar 12, 2023
44670 pushed a commit to 44670/llama.cpp that referenced this pull request on Aug 2, 2023
* RAM usage reduction and calculations
  - Removed the -b batch limit (1024); tested up to -b 8192
  - Fixed an integer overflow in the ggml matmul (occurred at around nbatch 3000)
  - Added a dynamic calculation for batched scratch memory consumption
  - Overall, reduced RAM buffer sizes by orders of magnitude for normal settings
  - RAM usage scales quadratically with context size * batch size (see the sketch after this commit message)
  - Using a small batch size (or the default of 1) results in a very small memory footprint even with thousands of tokens processed
  - Tested with prompts up to 13,000 tokens and an 8k batch
  - Needs more tests on various platforms

* removed debug

* minor

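To make the quadratic-scaling claim concrete, here is a minimal back-of-envelope sketch of an attention-score scratch buffer whose size grows with the product of batch size and context length. The constants (head count, float width) and the helper name `estimate_kq_scratch_bytes` are illustrative assumptions, not the formula or code used in the referenced commit.

```python
# Hypothetical back-of-envelope estimate of the attention-score (KQ) scratch
# buffer, illustrating why RAM grows with n_ctx * n_batch. The head count,
# float width, and function name are assumptions for illustration only;
# they are not taken from the referenced commit.

def estimate_kq_scratch_bytes(n_ctx: int, n_batch: int, n_head: int = 32,
                              bytes_per_float: int = 4) -> int:
    # One n_batch x n_ctx score matrix per attention head, in 32-bit floats.
    return n_head * n_batch * n_ctx * bytes_per_float


if __name__ == "__main__":
    for n_batch in (1, 512, 8192):
        mib = estimate_kq_scratch_bytes(n_ctx=13000, n_batch=n_batch) / (1024 ** 2)
        print(f"n_batch={n_batch:5d}: ~{mib:,.0f} MiB")
    # n_batch=1 keeps the footprint tiny even at a 13k-token context,
    # while large batches dominate RAM use, which is why sizing the
    # scratch buffer dynamically (instead of for the worst case) helps.
```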