Ingestion of documents with Ollama is incredibly slow #1691
I have the exact same issue with the ollama embedding mode pre-configured in the file. I ingested my documents at a reasonable (much faster) speed with the huggingface embedding mode.
Interesting. The Ollama embedding model is way bigger than the default huggingface one, which may be the main cause; the dimensionality of its vectors is double that of the default model.
I can confirm a performance degradation on 0.4.0 when running with this:

Embedding model changes:
Thanks @dbzoo, but I think it might be more than just that. During the 60+ minutes it spent ingesting, resource utilisation was very modest: at least one of the resources above should have been very high (on average) while it was processing that small PDF, before I decided to cancel it. cc: @imartinez
@iotnxt maybe Ollama's support for embedding models is not fully optimized yet. Could be the case. Go back to Huggingface embeddings for intensive use cases. About the feature request, feel free to contribute it through a PR! Being transparent, the roadmap is full of functional improvements, and the progress bar would never be prioritized; it is perfect for a contribution though.
For me it's very slow too, and I keep getting the error below after a certain amount of time:
There's an issue with ollama + nomic-embed-text. Fixed but not yet released. Using ollama 0.1.29 fixed the issue for me. |
+1. It takes ~2 s to generate embeddings for a 4-word phrase.
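For anyone who wants to reproduce that measurement outside private-gpt, here is a minimal sketch that times a single request against Ollama's embeddings endpoint. The host, port, and model name are assumptions based on Ollama's defaults; adjust them to your setup:

```python
# Minimal timing sketch against a local Ollama server (assumes the
# /api/embeddings endpoint on the default port and a pulled
# nomic-embed-text model).
import time
import requests

URL = "http://127.0.0.1:11434/api/embeddings"  # default Ollama port
payload = {"model": "nomic-embed-text", "prompt": "a four word phrase"}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=30)
resp.raise_for_status()
elapsed = time.perf_counter() - start

embedding = resp.json()["embedding"]
print(f"{elapsed:.3f}s for a {len(embedding)}-dim embedding")
```

If this prints something close to 2 s per call, the bottleneck is the embedding request itself rather than private-gpt's ingestion pipeline.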
Ollama v0.1.30 has recently been released. Is this issue still reproducible in that version? |
I seem to have the same or a very similar problem with the "ollama" default settings, running ollama v0.1.32. The console shows parsing nodes at ~1000 it/s and generating embeddings at ~2 s/it. The strange thing is that private-gpt/ollama seem to use hardly any of the available resources: CPU < 4%, memory < 50%, GPU < 4% processing (1.5/12 GB GPU memory), disk < 1%, etc., on an Intel i7-13700K, 32 GB RAM, RTX 4070. Example output from the console log:
Still excruciatingly slow, with it barely hitting the GPU. Embeddings run at ~8 it/s on a 3080. It -does- use the GPU, I confirmed that much. If I double the number of workers, it halves the it/s, so there is zero recourse there. This is on ollama 0.1.33-rc6, so that patch would have been applied.
Can confirm, using:

Any fixes, @imartinez?
I noticed the same when using the HTTP API and the Python interface. The server says it took <50 ms (CPU), so I'm guessing the problem is with detecting that the response is complete. Setting my request timeout to 100 ms makes each request take exactly 100 ms. If I use fetch() in Node.js, the response takes <30 ms. I've never used private-gpt, but I'm guessing it's the same problem.

EDIT: The Python request is fast if I use http://127.0.0.1 rather than http://localhost
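A plausible explanation (an assumption, not confirmed in this thread) is that "localhost" resolves to the IPv6 address ::1 first, and if the server only listens on IPv4, each request waits for the IPv6 attempt to fail before retrying. A minimal sketch to compare the two, assuming a local Ollama server on the default port with nomic-embed-text pulled:

```python
# Compare request latency via "localhost" vs "127.0.0.1" to test the
# hypothesis that name resolution, not embedding computation, is slow.
import time
import requests

payload = {"model": "nomic-embed-text", "prompt": "hello"}

for host in ("localhost", "127.0.0.1"):
    url = f"http://{host}:11434/api/embeddings"
    start = time.perf_counter()
    requests.post(url, json=payload, timeout=30).raise_for_status()
    print(f"{host}: {time.perf_counter() - start:.3f}s")
```

A large gap between the two hosts would point at connection setup rather than the model itself.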
Hi, you can get a progress bar in the console by editing ui.py. Instead of this line (345):

put this one:

By using tqdm you'll be able to see something like this in the console:

Ingesting files: 40%|████ | 2/5 [00:38<00:49, 16.44s/it]14:10:07.319 [INFO ] private_gpt.server.ingest.ingest_service - Ingesting

Don't forget to import the library: from tqdm import tqdm

I'll probably integrate it into the UI in the future. I have some other features that may be interesting to @imartinez. Cheers
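Since the original code lines were lost from the comment above, here is an illustrative sketch of the pattern being described: wrapping the iterable that drives ingestion in tqdm. The `files` list and `ingest_file` function are hypothetical stand-ins, not the actual ui.py code:

```python
# Sketch only: `files` and `ingest_file` stand in for the real
# ingestion code around line 345 of ui.py.
import time
from tqdm import tqdm

def ingest_file(path: str) -> None:
    """Placeholder for the real (slow) embedding/ingestion call."""
    time.sleep(0.5)

files = ["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"]

# Wrapping the iterable in tqdm prints a live progress bar such as:
# Ingesting files: 40%|████      | 2/5 [00:38<00:49, 16.44s/it]
for path in tqdm(files, desc="Ingesting files"):
    ingest_file(path)
```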
Hmmm, it seems to run until it halts the ollama service. Additionally, it's lazy-loading, so the upload process doesn't begin until I supply a prompt:
I upgraded to the latest version of privateGPT and the ingestion speed is much slower than in previous versions, to the point of being unusable.

I use the recommended Ollama option. After more than an hour, the document is still not finished. I have a 3090 and an 18-core CPU, and I am using the very small Mistral. I am ingesting a 105 KB PDF file, 37 pages of text.

Later I switched to the less recommended 'llms-llama-cpp' option in privateGPT, and the problem was solved. But is there any way to get fast ingestion with Ollama?