GPT4ALL Python Lib vs Installer #2630
vaibhav-college started this conversation in General
Greetings all,
I noticed something very weird when using GPT4All on my laptop (RTX 3050 Ti Laptop GPU, 6 GB VRAM, 16 GB RAM). I was building my end-semester project: a PDF-reading application that uses an LLM to summarize offline or online data. I am a newbie when it comes to LLMs, but I am familiar with the fundamentals of AI/ML/DL. I apologize in advance if there is already a similar topic; I did my due research but could not find one in the discussions.
I was using the Llama 3 8B Instruct model (Meta-Llama-3-8B-Instruct.Q4_0.gguf), and with the Python library running on the CPU I barely got 0.08-0.8 tokens per second. When I switched to CUDA I saw some improvement, but not a whole lot.
My average execution time was 1-2 minutes per response. With CUDA I could cut it down to somewhere between 40 seconds and 1:30 minutes, though it would sometimes spike to 4 minutes seemingly at random (probably due to more complex prompts, I'm not sure).
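For reference, my Python setup looked roughly like the sketch below (simplified; the model path, context length, and prompt are placeholders, and the exact `device` string accepted may differ between gpt4all versions):

```python
from gpt4all import GPT4All

# Rough sketch of my setup with the gpt4all Python bindings.
# Assumption: a recent version whose constructor accepts `device` and `n_ctx`;
# on some versions the device string may need to be "gpu" instead of "cuda".
model = GPT4All(
    "Meta-Llama-3-8B-Instruct.Q4_0.gguf",
    device="cuda",   # was "cpu" in my first runs
    n_ctx=2048,      # placeholder context length; I also tried smaller values
)

with model.chat_session():
    reply = model.generate("Summarize the following text: ...", max_tokens=512)
print(reply)
```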
Recently I reinstalled the GPT4All desktop application. Loved the UI. I got 3.8-4.4 TPS (tokens per second) even with a GPU-intensive game running in the background. After switching the app to the GPU and closing the game, I got the sweet 16 TPS I have been chasing since January 2024.
In the Python version I had tried every optimization I could think of: cutting down the context length, max tokens, and so on, plus hyperparameter tuning and a proper CUDA installation. I still don't know what was going wrong in my Python script, and I eventually abandoned GPT4All in favor of the Llama library itself.
My main question is: how is the GPT4All application able to run the model like it's on steroids? Is it possible for me to achieve the same optimization in my Python script?
I basically used the same hyperparameters as the installer, just tuned a bit; they were close to the default values shared in the image.
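Concretely, the generation call I ended up with looked roughly like this. The parameter names follow the gpt4all Python bindings; the values only approximate what the desktop app showed me and are not verified defaults for any particular release:

```python
from gpt4all import GPT4All

# Sketch only: values mirror the app's settings screen as I remember them,
# not the library's documented defaults.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf", device="cuda")

response = model.generate(
    "Summarize the following text: ...",  # placeholder prompt
    max_tokens=4096,       # "Max Length" in the app's settings
    temp=0.7,              # temperature
    top_k=40,
    top_p=0.4,
    repeat_penalty=1.18,
    n_batch=128,           # prompt batch size
)
print(response)
```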