[CT-858] [Enhancement] Connection is always closed after each query #5489
Thanks for opening here as well @joshuataylor! I think you're right. Today, during a run, dbt opens a new connection for each node, then closes each connection as it completes. Risks to be wary of here:
So the goal here would be to reuse / recycle more connections while dbt is running, while still guaranteeing that at the end of a run, dbt always closes all its connections. (In an ideal case, we'd also handle authentication in a single thread, and be able to reuse that auth across multiple threads — but that feels like it might be out of scope for this effort.) At its very simplest, the idea would be:
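At its simplest, that idea could be sketched as a per-thread pool that hands back the same connection for each node on a thread and only closes everything once at the end of the run. This is a minimal sketch with hypothetical names, not dbt's actual adapter API:

```python
import threading

class ConnectionPool:
    """Hypothetical sketch: reuse one connection per thread across nodes,
    and guarantee everything is closed when the run ends."""

    def __init__(self, open_connection):
        self._open = open_connection   # callable that opens a new connection
        self._lock = threading.Lock()
        self._by_thread = {}           # thread id -> open connection

    def acquire(self):
        tid = threading.get_ident()
        with self._lock:
            conn = self._by_thread.get(tid)
            if conn is None:           # open lazily, once per thread
                conn = self._open()
                self._by_thread[tid] = conn
            return conn

    def release(self, conn):
        # Instead of closing (dbt's current behavior), keep the
        # connection around for the next node on this thread.
        pass

    def close_all(self):
        # Called once, at the very end of a run.
        with self._lock:
            for conn in self._by_thread.values():
                conn.close()
            self._by_thread.clear()
```

With this shape, a run on N threads opens at most N connections instead of one per node, while `close_all()` preserves the "always close everything at the end" guarantee.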
I think this is likely to be a big lift, requiring a deep dive into some internals of dbt's adapter and execution classes that we haven't touched in some time. I'm not sure when we'll be able to prioritize it. I agree that it feels important.

Relevant code

The context managers. The default behavior is to release a connection as soon as it's done being used: dbt-core/core/dbt/adapters/base/impl.py Lines 209 to 226 in e03d35a
Called once for each node that compiles/runs: dbt-core/core/dbt/task/base.py Line 312 in e03d35a
"Releasing" a connection, which actually means closing it: dbt-core/core/dbt/adapters/base/impl.py Lines 182 to 189 in e03d35a
dbt-core/core/dbt/adapters/base/connections.py Lines 182 to 195 in e03d35a
For a connection with dbt-core/core/dbt/adapters/base/connections.py Lines 123 to 160 in e03d35a
Cleanup that happens at the very end of runnable tasks: dbt-core/core/dbt/task/runnable.py Lines 438 to 439 in e03d35a
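The pattern those snippets describe — acquire a connection for the duration of one node, then "release" (which today means close) it on the way out — can be sketched as a context manager. Names here are hypothetical, not dbt's real classes:

```python
from contextlib import contextmanager

@contextmanager
def connection_named(adapter, name):
    # Acquire a connection for the duration of one node's compile/run.
    conn = adapter.acquire_connection(name)
    try:
        yield conn
    finally:
        # In dbt today, releasing ultimately closes the connection;
        # the proposal above is to recycle it here instead.
        adapter.release_connection(conn)
```

Because release happens in the `finally` block once per node, changing what "release" means (close vs. recycle) is the narrow seam where connection reuse would slot in.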
I'll have a dig through and see if I can find an elegant solution that hopefully (🤞) won't impact connections such as Spark. As an alternative, in dbt-snowflake we could check whether the connection is closed and the token is still valid, then reuse the connection, since it should still be set on the handle. If we could solve this at the dbt-snowflake level in the interim, that would be a big speed win. We could store the token on the connection once logged in. This would involve updating the connection contract, though; maybe adding a metadata key (or similar) that connectors could use?
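The interim idea — authenticate once, cache the session token, and reuse it when opening new connections until it expires — could look something like this. This is a sketch under stated assumptions: the `authenticate` callable and the TTL are placeholders, not dbt-snowflake's actual connection contract:

```python
import threading
import time

class TokenCache:
    """Hypothetical sketch: cache an auth token and reuse it across
    connection opens, re-authenticating only after it expires."""

    def __init__(self, authenticate, ttl_seconds=3600):
        self._authenticate = authenticate  # returns a fresh token string
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        with self._lock:
            if self._token is None or time.monotonic() >= self._expires_at:
                # The slow network round trip happens only here.
                self._token = self._authenticate()
                self._expires_at = time.monotonic() + self._ttl
            return self._token
```

The lock matters because dbt runs nodes on multiple threads: without it, several threads could race through an expired check and each pay for a fresh login.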
@joshuataylor Ah, so rather than reusing connections, just reusing the result of authentication? I don't have a good sense of whether there are any potential security risks with taking that approach. If it works, though, and substantially speeds up the process of opening new connections, that would be a clear improvement over our current behavior. I'll reopen the issue.
Yes, for now if we can just reuse the token between requests, that should be fine. We still need to make an HTTP request to Snowflake anyway, but using a keep-alive connection would be faster since we wouldn't have to handshake again. We can leave that for later.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
We are running into this issue with dbt on Redshift, and it makes building models that are separated out for modularity annoying, since each model needs to open a connection to Redshift when it runs. If I have 5 separate models that each need a connection, this adds a significant time cost compared to a single model that needs only one. Are there any plans to prioritize this?
Could dbt-labs/dbt-snowflake#428 be used? It has been working out great. I also have a local dev version that does a few tricks to cache etc., but my hacks should only be used in development :).
Hi @jtcohen6, could you please help me confirm this? If I use `dbt run` to execute a model, and that model refers to other objects, does it still consume only one connection even when it retrieves data from those other objects?
Adding a "We're still interested" note to this item. |
Is there an existing issue for this?
Current Behavior
Every time a query is executed, the connection is then closed. This occurs regardless of thread count (tested from 1 up to 16 threads).
When using dbt-snowflake, this forces a re-login every time you issue a query. With hundreds of models this causes a massive slowdown, since authentication to Snowflake is slow, especially when you are far from the server (Perth, AU -> US East 1 is ~250 ms, for example), so having to reconnect for every query is unpleasant.
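As a back-of-the-envelope illustration of that cost — the round-trip count per login and the project size below are purely illustrative assumptions:

```python
# Rough cost of re-authenticating per model at 250 ms round-trip latency
# (Perth, AU -> US East 1). The 3 round trips per login and the
# 300-model project size are illustrative assumptions.
round_trip_s = 0.250
round_trips_per_login = 3
models = 300

overhead_s = round_trip_s * round_trips_per_login * models
print(f"~{overhead_s:.0f} s of pure reconnection overhead")  # ~225 s
```

Even under these conservative assumptions, reconnection overhead alone adds minutes to a run that connection reuse would eliminate.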
I have logged an issue on the Snowflake side here: dbt-labs/dbt-snowflake#201, but I believe this belongs at the dbt-core level.
Expected Behavior
A single connection is made. It would be even better if a single login request for Snowflake were made in one thread and then reused for all threads. I think that would also fix MFA? But that is out of scope.
Steps To Reproduce
You can see it in the logs as well:
Relevant log output
10:24:59.898232 [debug] [ThreadPool]: Opening a new connection, currently in state closed
Environment
What database are you using dbt with?
snowflake
Additional Context
No response