
[ADAP-506] [Regression] post release 1.5.0 - models are failing with Runtime error - "Read operation timed out" #427

Closed
Kamalakar4Pelago opened this issue May 3, 2023 · 19 comments · Fixed by #433
Labels: bug, regression

Comments

@Kamalakar4Pelago

Kamalakar4Pelago commented May 3, 2023

Is this a new bug in dbt-redshift?

  • I believe this is a new bug in dbt-redshift
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

  • Airflow DAG tasks running dbt Core models against Redshift are failing with the runtime error "read operation timed out".
    As a workaround, we deployed a hotfix to pin version 1.4.6, the prior working version in our case.

Expected Behavior

All models that ran successfully on version 1.4.6 should also run on 1.5.0.

Steps To Reproduce

  1. A daily Airflow DAG runs our dbt models using the ECS Operator. The models suddenly started failing from 29 Apr 2023 onwards, and importantly, no change was introduced on our side.
  2. The dbt run task fails with "read operation timed out", as shown in the attached logs.

Relevant log output

[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:21.609000] 16:51:21  Running with dbt=1.5.0
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:21.791000] 16:51:21  Unable to do partial parsing because saved manifest not found. Starting full parse.
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:23.277000] 16:51:23  Found 31 models, 14 tests, 0 snapshots, 0 analyses, 345 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics, 0 groups
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:23.280000] 16:51:23  
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:23.760000] 16:51:23  Concurrency: 4 threads (target='core')
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:23.760000] 16:51:23  
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:23.778000] 16:51:23  1 of 8 START sql table model core.mv_bookings .................................. [RUN]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:23.780000] 16:51:23  2 of 8 START sql incremental model core.t1_bookings_all ........................ [RUN]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:23.781000] 16:51:23  3 of 8 START sql incremental model core.t1_payments ............................ [RUN]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:23.792000] 16:51:23  4 of 8 START sql table model core.t1_product_options ........................... [RUN]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:43.399000] 16:51:43  4 of 8 OK created sql table model core.t1_product_options ...................... [SUCCESS in 19.60s]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:43.402000] 16:51:43  5 of 8 START sql table model core.t1_product_units ............................. [RUN]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:56.400000] 16:51:56  1 of 8 ERROR creating sql table model core.mv_bookings ......................... [ERROR in 32.62s]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:56.402000] 16:51:56  6 of 8 START sql table model core.t1_products_all .............................. [RUN]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:56.889000] 16:51:56  2 of 8 ERROR creating sql incremental model core.t1_bookings_all ............... [ERROR in 33.10s]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:56.891000] 16:51:56  7 of 8 START sql incremental model core.t1_promos .............................. [RUN]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:57.254000] 16:51:57  5 of 8 OK created sql table model core.t1_product_units ........................ [SUCCESS in 13.85s]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:51:57.256000] 16:51:57  8 of 8 START sql incremental model core.t1_row_counts .......................... [RUN]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:52:10.836000] 16:52:10  3 of 8 OK created sql incremental model core.t1_payments ....................... [SUCCESS in 47.04s]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:52:25.167000] 16:52:25  7 of 8 OK created sql incremental model core.t1_promos ......................... [SUCCESS in 28.27s]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:52:27.653000] 16:52:27  6 of 8 ERROR creating sql table model core.t1_products_all ..................... [ERROR in 31.25s]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:52:38.781000] 16:52:38  8 of 8 OK created sql incremental model core.t1_row_counts ..................... [SUCCESS in 41.52s]
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:52:38.860000] 16:52:38  
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:52:38.860000] 16:52:38  Finished running 4 table models, 4 incremental models in 0 hours 1 minutes and 15.58 seconds (75.58s).
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:52:38.875000] 16:52:38  
[2023-04-30 16:53:05,159] INFO - [2023-04-30T16:52:38.876000] 16:52:38  Completed with 3 errors and 0 warnings:
[2023-04-30 16:53:05,160] INFO - [2023-04-30T16:52:38.876000] 16:52:38  
[2023-04-30 16:53:05,160] INFO - [2023-04-30T16:52:38.877000] 16:52:38  Runtime Error in model mv_bookings (models/core/mv_bookings.sql)
[2023-04-30 16:53:05,160] INFO - [2023-04-30T16:52:38.877000] 16:52:38    The read operation timed out
[2023-04-30 16:53:05,160] INFO - [2023-04-30T16:52:38.878000] 16:52:38  
[2023-04-30 16:53:05,160] INFO - [2023-04-30T16:52:38.879000] 16:52:38  Runtime Error in model t1_bookings_all (models/core/t1_bookings_all.sql)
[2023-04-30 16:53:05,160] INFO - [2023-04-30T16:52:38.879000] 16:52:38    The read operation timed out
[2023-04-30 16:53:05,160] INFO - [2023-04-30T16:52:38.880000] 16:52:38  
[2023-04-30 16:53:05,160] INFO - [2023-04-30T16:52:38.880000] 16:52:38  Runtime Error in model t1_products_all (models/core/t1_products_all.sql)
[2023-04-30 16:53:05,160] INFO - [2023-04-30T16:52:38.881000] 16:52:38    The read operation timed out

Environment

- OS: Ubuntu 18.04
- Python: 3.8
- dbt-core: 1.5.0
- dbt-redshift: 1.5.0

Additional Context

No response

@Kamalakar4Pelago Kamalakar4Pelago added the bug and triage labels May 3, 2023
@github-actions github-actions bot changed the title [Bug] post release 1.5.0 - models are failing with Runtime error - "Read operation timed out" [ADAP-506] [Bug] post release 1.5.0 - models are failing with Runtime error - "Read operation timed out" May 3, 2023
@mp24-git

mp24-git commented May 3, 2023

We are seeing the same issue when running our dbt code in Docker against Redshift Serverless. Going back to 1.4.6 fixed the issue for us.

Docker Image: public.ecr.aws/docker/library/python:3.10-slim-bullseye
Python 3.10
dbt-core 1.5.0
dbt-redshift 1.5.0

@fabientra

We had the same issue and had to roll back to a previous version.

@dbeatty10 dbeatty10 changed the title [ADAP-506] [Bug] post release 1.5.0 - models are failing with Runtime error - "Read operation timed out" [ADAP-506] [Regression] post release 1.5.0 - models are failing with Runtime error - "Read operation timed out" May 3, 2023
@dbeatty10
Contributor

Thanks for alerting us about this @Kamalakar4Pelago, @mp24-git, and @fabientra

Are all of you using Redshift Serverless? Are any of you using cross-database references or RA3 instances? Trying to hone in on what is common between all of you so that we can try reproducing the issue on our end.

@mp24-git

mp24-git commented May 3, 2023

We are using Redshift Serverless without cross-database references. The issue happens mostly with large tables that we read from an external schema (ultimately Parquet files on S3). Smaller tables read from the same schema are fine.

I see that big tables fail after ~31 seconds, but smaller tables seem to be fine:

1 of 35 ERROR creating sql table model fdc_staging.bkpf ........................ [ERROR in 31.24s]
2 of 35 OK created sql table model fdc_staging.bkpf_changes .................... [SUCCESS in 24.06s]
3 of 35 ERROR creating sql table model fdc_staging.bseg ........................ [ERROR in 31.69s]
4 of 35 ERROR creating sql table model fdc_staging.bseg_changes ................ [ERROR in 32.10s]

@njrs92

njrs92 commented May 3, 2023

We are getting the same issue but are using a plain RA3 cluster. There are some uses of cross-database references, but it fails on any table. The workaround was to bump connect_timeout to something obscenely high, which gives the tables time to run.

@dbeatty10
Contributor

Background context:

  • We switched from using psycopg2 in 1.4 to redshift_connector in 1.5

I'm wondering if the connection timeout is somehow coming into play? If so, that might explain the ~30 second threshold.

Within your profiles.yml, could you try setting the following to see if the errors start taking ~40 seconds?

connect_timeout: 40

@njrs92

njrs92 commented May 3, 2023

Yes, we found that connect_timeout is directly linked to the time it takes to fail. We had it at 20 seconds, then tested with 60, 600, and finally 6000.

@Kamalakar4Pelago
Author

> Thanks for alerting us about this @Kamalakar4Pelago, @mp24-git, and @fabientra
>
> Are all of you using Redshift Serverless? Are any of you using cross-database references or RA3 instances? Trying to hone in on what is common between all of you so that we can try reproducing the issue on our end.

We are on a Redshift dc2.large cluster.

@fabientra

We are using a plain RA3 cluster. The read timeout occurred on a query using Spectrum, I think. We will try to increase the connect_timeout.

@dbeatty10
Contributor

dbeatty10 commented May 5, 2023

Potentially related?

Edit: I'm guessing not related.

@dbeatty10 dbeatty10 removed the triage label May 5, 2023
@stisti

stisti commented May 6, 2023

The issue still exists with release 1.5.1. We are using a small cluster of ra3.xlplus nodes (not serverless).

Very short queries that read only a few hundred rows might work. Anything larger times out.

@elongl

elongl commented May 8, 2023

Experiencing the same issue.

@dbeatty10
Contributor

Thanks for confirming @stisti and @elongl.

The Redshift team is looking at this to determine a path forward.

Workaround

In the meantime, the workaround is to add an absurdly high timeout within your profiles.yml:

      connect_timeout: 999999
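
For anyone unsure where that setting lives: connect_timeout sits alongside the other connection fields of your Redshift target in profiles.yml. A minimal sketch, assuming a profile named my_redshift_project with a target named core (the profile name, host, and credentials below are illustrative, not taken from this thread):

      my_redshift_project:
        target: core
        outputs:
          core:
            type: redshift
            host: my-cluster.abc123.us-east-1.redshift.amazonaws.com  # illustrative host
            port: 5439
            user: dbt_user                                            # illustrative user
            password: "{{ env_var('REDSHIFT_PASSWORD') }}"            # illustrative; read from an environment variable
            dbname: analytics
            schema: core
            threads: 4
            connect_timeout: 999999  # absurdly high value to work around the 1.5.0 read-timeout regression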

@Fleid Fleid self-assigned this May 9, 2023
@jialerchew

Confirming that we're facing the same issue. We upgraded from 1.4 to 1.5 for redshift_connector.

@RohitKJha30

I faced the same issue after upgrading to version 1.5.0. Updating connect_timeout to 999999 in profiles.yml as suggested by @dbeatty10 fixed the issue.

@shaunavalon

We are also facing this issue after updating our ECS instances of dbt to 1.5.0. We are using a Redshift cluster with dc2.large nodes.

@mgoers-beekeeper

Having the same issue here running our jobs deployed via dbt Cloud. Had to roll back the upgrade to dbt 1.5 in dbt Cloud.

@davidwitk

Same issue here when trying to upgrade to v1.5 in dbt Cloud. I see that it seems to have been fixed with dbt-redshift 1.5.2.

@dbeatty10 How are the adapter versions updated in dbt Cloud? Is there a way to select the version of the Redshift adapter in dbt Cloud?

@dataders
Contributor

@chodera sorry to hear that you're still seeing this issue. You're right that you shouldn't be seeing it any longer.

Is there a way to select the version of the Redshift adapter in dbt Cloud?

You're also highlighting a great UX improvement opportunity for dbt Cloud. Coincidentally, just yesterday I opened an internal feature request ticket to have the adapter version displayed; everyone agrees it's a no-brainer. As for selecting which adapter version you're using, I think a core value proposition of dbt Cloud is that you shouldn't have to worry about which patch version you're on; it should just work.

For the time being, can you please reach out to dbt Cloud support? If you are able to determine that you are indeed using dbt-redshift v1.5.2, definitely come back here and let us know. Cheers!
