-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] dbt-postgres transaction management: extraneous BEGIN
when listing schemas
#9434
Comments
Thanks for reaching out @Michael-cd30 ! I can see why you'd want to avoid extraneous warning messages in your PG logs. However, unless it it causing errors while building data sets in dbt this is unlikely to be a priority for us. Accordingly, I'm going to label this as "help wanted" for a motivated community member to solve. |
@Michael-cd30 How many threads do you have your dbt project configured for? If you configured dbt for 24 threads but there's a limit on the db user to 1 thread, I could see this scenario happening. |
Hi @alison985 The used profile (targetdev in screenshot) is configured to run 4 threads. |
@Michael-cd30 Technically a database user account could be running queries outside of dbt simultaneously. Also, technically, in some places you can limit total concurrent threads on the database regardless of user. However, I assume you already accounted for those. I also agree 1 model creation, with no dependencies or tests, having 24 warning messages is strange. It would be great to see I would have expected to start seeing an exponential back-off timestamps for something that's trying to get a thread. Other things that stand out to me as differences from my normal work(and therefore may be related) are:
Any chance that model has sub-queries or CTEs in it? Or a loop or macro it's calling? Though honestly, at this point I'd personally go to the database server logs instead of the dbt logs. Figure out what is happening there and then figure out what dbt must be doing to cause that message. |
@alison985 - We first saw the problem on our servers : a dedicated PostgreSQL server + a Docker server from which Apache Airflow runs DBT. Both are Linux. In order to write this issue, I've tested in a simpler context, directly from my Windows workstation. In this simple case, the PostgreSQL service runs locally, so I'm sure that no other command is running in parallel. The query uses CTE as recommended, but remains very simple and loop-free. The dbt-postgres connector is maintained by dbt-labs and the code is in the same repository as the core, which probably explains why the version is identical, see : https://github.com/dbt-labs/dbt-core/tree/main/plugins/postgres/dbt/adapters/postgres I've also looked at PostgreSQL logs activating Here's an extract:
and so on... I haven't tested but I'm quite sure the more "schema" my dbt-project will have, the more warning I'll receive. Regards, |
We have a similar problem in greenplum during dbt execution. |
BEGIN
when listing schemas
Thanks for that info @Michael-cd30 and @manticoreroko ! We're unable to be prioritize this ourselves right now, but we'd be willing to give a look at a PR submitted by a community member. Here's a couple places to start looking:
Theoretically, the code is already supposed to check if there's already a known transaction that is open or not, but it must not be working somehow. I'd suggest setting |
@Michael-cd30 were you able to find a fix for this? |
UPDATE: Turns out the below is misleading. I just tried again from the command line with threads = 1, and it failed. So the problem is that the behaviour is random. There appears to in fact be no dependence or logic based on threads and container vs command line. Why does dbt apparently sometimes (and in the case below, almost always) throw transactions at the DB too frequently? I am having a similar, puzzling experience. Running with dbt=1.8.8 I am ultimately needing dbt to run within a docker container. The container runs everything fine except snapshot on my largest new set of data. That is, on smaller new data ingest sets, snapshot runs fine in the same container. Everything else within the container -- all other seeds and runs -- work fine. This is the case irrespective of whether I have threads set to 4 or 1 (which I tried on the advice in the thread above). The failure on the client side is
After two weeks of badgering my postgresql host, I've finally got information from the logs corresponding to this failure:
From the command line ( |
Is this a new bug in dbt-core?
Current Behavior
"dbt run" on postgres tries to open transactions while another is already in progress.
As a result, PostgreSQL displays warnings.
A run on a single model raises 24 warnings.
Expected Behavior
"dbt run" should not try to run neested transactions on PostgreSQL.
Steps To Reproduce
Relevant log output
Environment
Which database adapter are you using with dbt?
postgres
Additional Context
No response
The text was updated successfully, but these errors were encountered: