Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2538] [Feature] Reuse connections on more adapters (specifically postgres) #83

Closed
2 tasks done
ckdake opened this issue May 5, 2023 · 15 comments
Closed
2 tasks done
Labels
enhancement New feature or request Stale

Comments

@ckdake
Copy link

ckdake commented May 5, 2023

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

During a dbt run, dbt repeatedly closes and opens connections to the database in use. We are using IAM authentication for postgres, so this means credentials are only valid for 15 minutes from their creation, so our dbt runs are failing ~15 minutes in.

This behavior was changed resolving dbt-labs/dbt-core#2645 . That issue impacts Snowflake, but the fix was made in a global way that impacts all databases.

A great fix here would be to apply this "force close" only to Snowflake.

Expected/Previous Behavior

When dbt starts up, each thread should open a connection to postgres using the credentials provided. The thread should maintain that connection to postgres until it catches an exception, or the thread terminates.

Steps To Reproduce

  1. set up dbt with a postgres database
  2. run dbt with --debug
  3. Note the repeated "Opening a new connection, currently in state closed" messages.

To really break things, use a RDS database with IAM credentials, and watch that reconnect cause a failure due to expired credentials after ~15 minutes

Relevant log output

No response

Environment

- OS:
- Python:3.11.1
- dbt (working version): (prior to https://github.com/dbt-labs/dbt-core/issues/2645 being merged. old!)
- dbt (regression version): 1.4.5

Which database adapter are you using with dbt?

postgres

Additional Context

I attempted to talk through this in slack here: https://getdbt.slack.com/archives/CMZ2Q9MA9/p1683290792489689

@ckdake ckdake added bug Something isn't working regression triage labels May 5, 2023
@github-actions github-actions bot changed the title [Regression] unnecessarily closes/opens connections to backends, specifically postgres [CT-2538] [Regression] unnecessarily closes/opens connections to backends, specifically postgres May 5, 2023
@ckdake
Copy link
Author

ckdake commented May 5, 2023

This is very similar to dbt-labs/dbt-core#5489

@dbeatty10
Copy link
Contributor

Thanks for opening this @ckdake !

Indeed, dbt-labs/dbt-core#5489 does look similar.

Did you get a chance to read the following?

I haven't done a deep-dive here to understand all the key details, but I wonder if dbt-labs/dbt-snowflake#428 addresses the issue for Snowflake enough for us to undo the relevant portions of dbt-labs/dbt-core#2645.

@dbeatty10 dbeatty10 changed the title [CT-2538] [Regression] unnecessarily closes/opens connections to backends, specifically postgres [CT-2538] [Bug] unnecessarily closes/opens connections to backends, specifically postgres May 5, 2023
@ckdake
Copy link
Author

ckdake commented May 5, 2023

I'm doing more of a deep dive here than planned, and learning a lot :)

It looks like dbt-labs/dbt-core#2645 was done to fix a particular thing for a particular snowflake use case. This was done in core and applies to all databases.

Later, https://github.com/dbt-labs/dbt-snowflake/pull/428/files was done to allow not doing this behavior, specifically for snowflake, and done in a way that only applies to snowflake.

There is a lot of good discussion in dbt-labs/dbt-core#5489 and the solution there for a default out-of-the-box configuration is probably the right one, but a "quick fix" here for me for postgres could consist of either:

  1. moving the "should i close()" configurable logic out of the snowflake adapter up to the top level
  2. moving the "close every time" logic out of the top level and into the snowflake adapter, since it was added for snowflake.

from the discussion in dbt-labs/dbt-core#5489 it looks like some other adapters may want to default to "close ever time", so 1^ may be a quick easy win that lets dbt users configure things in a way that works for them, without changing the current default behavior?

@ckdake
Copy link
Author

ckdake commented May 5, 2023

If you'd consider a patch for "move the 'should i close?' logic from the snowflake adapter to the top level", I'm happy to try out creating the patch.

@dbeatty10
Copy link
Contributor

Thanks for outlining those two options forward @ckdake 🤩 !

And thank you for offering to work an implementation! I'd like to get a consultation with either @jtcohen6 or @Fleid before making a decision on which direction to go.

Needs refinement

@jtcohen6 or @Fleid would love to get your preferences between these two paths forward:

  1. Promote the Snowflake-specific reuse_connections connection profiles configuration to be available to all adapters
  2. Reduce the "close every time" solution in Fix: close connections after use dbt-core#2650 to only apply to dbt-snowflake rather than all adapters

It seems that option 1 would be the least risky (since it would default to False) while empowering users to make a choice that works for their use case.

But it would be nice if option 2 allowed all users to automatically receive a sensible default rather than needing to opt-in to the behavior that would probably be best for them.

Summary of background context

"should i close()" logic (dbt-snowflake >= 1.4)

dbt-labs/dbt-snowflake#428 added a reuse_connections option to connection profiles for dbt-snowflake only.

It is a boolean flag indicating whether to reuse idle connections to help reduce total connections opened. Default is False.

During node execution (such as model and test), dbt opens connections against a Snowflake warehouse. Setting this configuration to True reduces execution time by verifying credentials only once for each thread.

"close every time" logic (dbt-core >= 0.17.2)

The purpose of closing open connections is so that the Snowflake thread can exit.

After dbt-labs/dbt-core#2650, the release argument to adapter.execute_macro no longer has any effect. Instead, always close the connection. close() calls _rollback() if there is an open transaction.

@jtcohen6 jtcohen6 added enhancement New feature or request and removed bug Something isn't working labels May 7, 2023
@jtcohen6 jtcohen6 changed the title [CT-2538] [Bug] unnecessarily closes/opens connections to backends, specifically postgres [CT-2538] [Feature] Reuse connections on more adapters (specifically postgres) May 7, 2023
@Fleid
Copy link

Fleid commented May 9, 2023

I think that short term, we're better going with option 1. We preserve the current behavior, add an option to switch, and minimize implementation risks.

I'm leaning that way because we are thinking of a deeper re-evaluation of how we manage connections during a run. I'd rather keep the breaking changes for when that time comes.

@dbeatty10
Copy link
Contributor

Sounds great @Fleid 👍

That would be awesome if you try out a PR for this one @ckdake:

  • move the "should i close()" configurable logic out of the snowflake adapter up to the top level

@ckdake
Copy link
Author

ckdake commented May 10, 2023

@dbeatty10 PR up, but after some hefty timeboxing here I'm struggling a bit figuring out the right way to test this: I haven't been successful in triggering a 'passing' test for the unchanged code that confirms a connection is closed after "doing something"

@dbeatty10
Copy link
Contributor

Awesome @ckdake !

Figuring out the testing approach will be a good thing to cover during the code review (which will be assigned to someone other than me). In the meantime, I added a comment in of at least one thing you'll want to do to be ready for review: dbt-labs/dbt-core#7587 (comment)

Copy link

github-actions bot commented Nov 7, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 7, 2023
@ckdake
Copy link
Author

ckdake commented Nov 7, 2023

My PR to fix this was closed to to the company I was working at shutting down, but the fix in there should still work, and it still needs eyes from someone at DBT on the right testing approach. I'd love to see this merged, and my current dbt stack is probably going to need this in 6-12 months.

@github-actions github-actions bot removed the Stale label Nov 8, 2023
@JackWolverson
Copy link

Is there any news on the use new connection that was in the PR that is on this thread ? we are using DBT 1.6 and see Re-using an available connection from the pool (formerly model.A, now model.B) in the logs after every model and is causing some marts to fail ( we arent sure why yet, but if we run just the marts by themselves they pass ) so having the ability to force a new connection each time would be really useful

@dbeatty10
Copy link
Contributor

Is there any news on the use new connection that was in the PR that is on this thread ?

dbt-labs/dbt-core#7587 was closed before we were able to review it. If someone revived the contents of that PR, we'd still need it to contain working test cases.

We're not sure when we'd be able to prioritize it working on this (either reviewing a PR or working on one ourselves). But we still agree that it feels important.

@mikealfare mikealfare transferred this issue from dbt-labs/dbt-core Feb 13, 2024
@Fleid Fleid unassigned Fleid and jtcohen6 Feb 14, 2024
Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 12, 2024
Copy link

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants