
High gpu consumption #18

Open
leesean5150 opened this issue Oct 17, 2024 · 4 comments
@leesean5150

Hi,

I've been trying to integrate PostgresML into a FastAPI backend using Korvus, but I have been facing issues with GPU resources. The GPU I am using is a mobile NVIDIA GeForce RTX 2060 with 6 GB of VRAM.

On initialisation of the backend, I create a collection and pipeline using the class definitions that Korvus provides (they are in separate functions, but I'll place them in order of execution):

vector_collection = Collection("file")

vector_pipeline = Pipeline(
    "splitter",
    {
        "file_contents": {
            "splitter": {"model": "recursive_character"},
            "semantic_search": {
                "model": "mixedbread-ai/mxbai-embed-large-v1",
            },
        },
    },
)

await vector_collection.add_pipeline(vector_pipeline)

So every time I start up my Docker Compose stack (which includes a postgresml service and a Python FastAPI service), my GPU usage maxes out at about 93%, and then an OutOfMemory exception is thrown in the backend. After restarting the postgresml service, the GPU usage drops back down, and if I then restart the FastAPI service, it starts up fine. Occasionally the GPU usage maxes out again and I need to restart the postgresml service once more, but after that, the endpoints that upsert documents and run vector searches work perfectly fine. Considering all of this, I am wondering if the issue could be due to garbage collection, since resources that could be freed are not being freed up. Are there any workarounds for this, or is my implementation incorrect?

@SilasMarvin
Collaborator

SilasMarvin commented Oct 17, 2024

There could be a few things going on here.

mixedbread-ai/mxbai-embed-large-v1 is a relatively large embedding model. I would suggest using a smaller one like intfloat/e5-small-v2; this will help a ton with GPU usage.

FastAPI spawns multiple processes, and each process creates a separate connection to Postgres. Each of these separate connections loads a new instance of your embedding model, so switching to the smaller model will help a ton when FastAPI spins up multiple processes to handle connections. I believe you can also limit the number of processes FastAPI spawns, but I don't use it, so I can't confirm how to do it.
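For instance, something like the following sketch (assuming you serve the app with uvicorn; workers is uvicorn's option, not something FastAPI itself exposes, and "main:app" is a placeholder for your actual module path):

# Sketch: run the FastAPI app under uvicorn with a single worker process,
# so only one Postgres connection (and one embedding-model copy) is created.
WORKERS = 1  # a single process avoids loading N copies of the embedding model

if __name__ == "__main__":
    import uvicorn  # assumed installed alongside FastAPI
    # "main:app" is a placeholder for your actual module:app path.
    uvicorn.run("main:app", host="0.0.0.0", port=8000, workers=WORKERS)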

Depending on the size and number of documents you are upserting, it may be worth limiting the batch size Korvus uses to process them:

await vector_collection.upsert_documents(YOUR_DOCUMENTS, {"batch_size": 10})

The default batch_size is 100, so lowering it should significantly reduce GPU usage.
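If lowering batch_size alone isn't enough, you can also split the document list yourself before upserting (a sketch of the same idea client-side; chunked is a hypothetical helper, not part of Korvus):

# Sketch: split a document list into fixed-size chunks client-side, so each
# upsert_documents call only hands Korvus a small slice at a time.
def chunked(items, size):
    # Return successive slices of `items` of length `size` (the last may be shorter).
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical usage inside an async handler:
# for chunk in chunked(documents, 10):
#     await vector_collection.upsert_documents(chunk, {"batch_size": 10})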

@leesean5150
Author

leesean5150 commented Oct 18, 2024

Thank you so much for your response, really appreciate it.

Noted on the model size as well as limiting the batch size for processing; I will certainly try that out.

However, this issue is not restricted to my testing with the FastAPI server; it also occurs when using a Jupyter notebook. The GPU issues persist when I build the postgresml image from the postgresml GitHub master branch, where it immediately reaches 93% of GPU memory before needing a restart to work with Korvus. This is also the case when running the postgresml container by itself.

I have also tried 2 other images, ghcr.io/postgresml/postgresml:2.9.3 and ghcr.io/postgresml/postgresml:2.7.12, which don't seem to have the GPU issues, but there is an error adding the pipeline to the collection even when using code from the documentation:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[2], line 21
      8 # Initialize our Pipeline
      9 # Our Pipeline will split and embed the `text` key of documents we upsert
     10 pipeline = Pipeline(
     11     "v1",
     12     {
   (...)
     19     },
     20 )
---> 21 await collection.add_pipeline(pipeline)

Exception: error communicating with database: unexpected end of file

That being said, I assume the postgresml master-branch Docker image is the most up to date, which is why it works with Korvus. I was therefore wondering if there is a way to trigger the drop in GPU usage that postgresml shows after a restart, without actually restarting the service, because the amount of GPU memory freed up is quite large.

@leesean5150
Author

Ahh okay, I think the high consumption of GPU resources was due to the dashboard. If I don't run the dashboard, postgresml consumes the same amount of GPU resources as after I restart the service in my previous configuration, and my FastAPI endpoints work fine without GPU memory issues. Thank you so much for your help and the suggestions above!

@leesean5150
Author

leesean5150 commented Oct 23, 2024

Hello again,

Similar context to what was mentioned above, but this time I want to try implementing GPU cache clearing (SELECT pgml.clear_gpu_cache();) in the backend directly. However, I couldn't find any documentation on the Korvus side for running the SQL command. Moreover, when running the command directly in the postgresml container, the GPU usage does not drop at all, so I was wondering if there is a way to run the command from Korvus/Python, or any other workaround for this.
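Something like this sketch is what I had in mind, bypassing Korvus with a plain Postgres connection (psycopg v3 and the KORVUS_DATABASE_URL variable name are assumptions on my part, and I can't confirm the cache-clearing call actually frees GPU memory):

import os

# The SQL I want to run against the postgresml database.
CLEAR_CACHE_SQL = "SELECT pgml.clear_gpu_cache();"

def clear_gpu_cache(database_url=None):
    # psycopg (v3) is assumed installed; imported lazily so this module
    # still imports without the driver present.
    import psycopg
    # Assumed env var pointing at the same postgresml instance Korvus uses.
    url = database_url or os.environ["KORVUS_DATABASE_URL"]
    with psycopg.connect(url, autocommit=True) as conn:
        conn.execute(CLEAR_CACHE_SQL)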

Thank you in advance!
