Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python Postgres - Should I manually close connections ? #1869

Closed
gaspardc-met opened this issue May 17, 2024 · 6 comments · Fixed by #1919
Closed

Python Postgres - Should I manually close connections ? #1869

gaspardc-met opened this issue May 17, 2024 · 6 comments · Fixed by #1919
Labels
Type: question Usage question

Comments

@gaspardc-met
Copy link

gaspardc-met commented May 17, 2024

What would you like help with?

Hello,

I have recently started using adbc_driver_postgresql and adbc-driver-manager along with pandas read_sql to query data from a Postgres database.
With SQLalchemy, everyone mentions the Connection Pool that should help with my case, but no trace of a Connection Pool in the ADBC docs.
When my python webapp starts, several different database requests are made, most to the same table.

I do not know what are the best practices or the most efficient way to handle ADBC connections.

  • Should I use a context manager and close the connection after one use everytime ?
    This is the recommended use in the Pandas documentation (above 2.2.0)
    However, the dbapi.connect is long, always longer than the actual read_sql (see attached image for a profiling example), and is executed everytime.
    image

    from adbc_driver_postgresql import dbapi
    
    with dbapi.connect(uri) as adbc_conn:
        return read_sql(query, adbc_conn)
  • Should I just leave the connection open ?

  • Should I cache the connection resource ?

I'm sure that like any engineering problem there are tradeoffs to all of these, but I do not have comparison points here apart from execution time

thanks !

Environment:

  • python 3.11
  • pandas == 2.2.2
  • adbc_driver_postgresql==0.11.0
  • adbc-driver-manager==0.11.0
@gaspardc-met gaspardc-met added the Type: question Usage question label May 17, 2024
@lidavidm
Copy link
Member

You should reuse connections.

ADBC does not implement connection pooling. Use a dedicated connection pooling library on top of ADBC connections.

@gaspardc-met
Copy link
Author

Hello,

Thanks for your reply. I couldn't find resources on this. Should I use psycopg for the connection pooling ?
Since ADBC does not implement connection pooling, maybe this could be documented somewhere as an example in the docs if it's the expected practice ?

@lidavidm
Copy link
Member

I don't think in general DBAPI drivers are expected to implement connection pooling, so it's not really a consideration in the first place.

I'm not sure psycopg's pool will work with anything except psycopg. SQLAlchemy and others provide connection pools, but which one you want to use will depend on your application.

@lidavidm lidavidm added this to the ADBC Libraries 13 milestone May 25, 2024
@lidavidm
Copy link
Member

I think what we can do here is demo how to use a connection pool (like SQLAlchemy) with the ADBC drivers in the documentation.

lidavidm added a commit to lidavidm/arrow-adbc that referenced this issue Jun 14, 2024
@lidavidm
Copy link
Member

There's now an example here: #1919

lidavidm added a commit to lidavidm/arrow-adbc that referenced this issue Jun 14, 2024
@gaspardc-met
Copy link
Author

Very interesting, thank you
I would not have thought of doing this, so the doc is clearly beneficial 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: question Usage question
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants