Snowflake queueing causes keypair auth'd connections to fail #14

Closed
1 task done
drewbanin opened this issue May 22, 2020 · 7 comments
Labels
type:enhancement New feature or request

Comments

@drewbanin
Contributor

Describe the feature

Snowflake keypair connections have 10 seconds to execute before the embedded JWT token expires. If many concurrent queries are opened against Snowflake when using keypair auth, Snowflake can fail to open connections with this error:

Failed to connect to DB: db.snowflakecomputing.com:443. JWT token is invalid.

I don't think there's much we can do on dbt's side to prevent this, but we can consider the following:

  • catch this error and retry opening the connection after some timeout
  • provide better tooling for users to understand how queueing affects dbt invocations
    • This could be as simple as a handy Snowflake query you can run to analyze concurrently running dbt queries over time
    • It would be awesome to include queued time in the logs or run_results.json file if Snowflake makes it available to us
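The first bullet above (catch the error and retry) could be sketched roughly as follows. This is an illustrative sketch only, not dbt's implementation: `is_jwt_error`, `connect_with_retry`, and the backoff parameters are all hypothetical names, and the only assumption taken from the issue is that the failure surfaces as an error message containing "JWT token is invalid".

```python
import time


def is_jwt_error(exc):
    # Hypothetical check, based on the error text reported in this issue.
    return "JWT token is invalid" in str(exc)


def connect_with_retry(open_connection, retries=3, backoff=1.0):
    """Call open_connection(), retrying on JWT-expiry errors with
    exponential backoff. Sketch only; not dbt's actual connection logic."""
    for attempt in range(retries + 1):
        try:
            return open_connection()
        except Exception as exc:
            # Give up if we're out of retries or it's some other error.
            if attempt == retries or not is_jwt_error(exc):
                raise
            time.sleep(backoff * (2 ** attempt))
```

A real implementation would likely catch the connector's specific exception class rather than bare `Exception`, and cap the total retry budget.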

Steps To Reproduce

  1. Configure a Snowflake connection with keypair auth
  2. Create a sample project with 64 models in it
  3. Run dbt run --threads 64
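The steps above can be scripted; the model contents here are arbitrary placeholders, and the script assumes you run it from the root of an existing dbt project:

```shell
# Create 64 trivial models (placeholder SQL, any valid select works)
mkdir -p models
for i in $(seq 1 64); do
  echo "select ${i} as id" > "models/model_${i}.sql"
done
# Then, with a keypair-auth'd Snowflake profile configured:
# dbt run --threads 64
```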

Expected behavior

TBD. We can address this issue with:

  • Better docs and tooling
  • Retries
  • More information in dbt log output

System information

Which database are you using dbt with?

  • snowflake

The output of dbt --version:
Any

The operating system you're using:
Any

The output of python --version:
Any

@cgkirici commented Dec 1, 2020

Hi Drew,

I'm getting this error even when running 1 model with 1 thread, and even when using dbt debug to test the connection, which makes me think the issue might not only be a case of running many concurrent queries. Or am I missing something?

Also, the Snowflake documentation (link) says this could be because the "Client machine's time is not synced with NTP server, and caused JWT Token to become invalid due to a token TTL timeout". Could that be a potential reason?

@drewbanin
Contributor Author

hey @cgEarth - is it possible that there are a lot of queries running against the warehouse that are not executed by dbt? I think the operative thing to investigate is whether the Snowflake database is so overloaded that any query generated by dbt would block for longer than the ~10 second window during which tokens are still valid.

I actually haven't heard anything about this one since this issue was opened ~6 months ago. I do think that time desynchronization could be a likely culprit as well, or there could be some other variable we're missing here.

@tnightengale
@drewbanin How would you feel about adding a retry option for the Snowflake connection more generally? I see that the BigQuery connection recently got retries on specific errors (https://github.com/fishtown-analytics/dbt/pull/2694/files) :)

Our need is that sometimes Snowflake Shares (shared database objects) from external orgs are dropped/missing momentarily, and unfortunately it coincides with our runs. If you agree that broadening retry support would be helpful for anyone experiencing transient errors, I'd be happy to submit a PR.

@jtcohen6
Contributor

@tnightengale I'm open to that! There's a more recent issue (dbt-labs/dbt-core#3303) discussing potential for retry on transient errors. I think that might be a good place to talk more about the errors associated with missing Snowflake shares; our ability to identify them as a transient error; and developing retry logic more generically.

@JoshuaHuntley
Contributor

@jtcohen6 Curious if there has been any additional information on this issue. My client is seeing these JWT issues pop up and our investigation has yielded no leads.

Snowflake load does not seem to predict the JWT failure issue. We've seen it happen at times when warehouse load is low with no queues, while at other times, when warehouse load has been high with longer queues, the issue has not happened.

We've tried clusters, separating different jobs out onto different warehouses, and playing with the number of threads. There seems to be no rhyme or reason to it.

I'm also on dbt Slack if a more active conversation might be beneficial.

@jtcohen6
Contributor

jtcohen6 commented Oct 4, 2021

@JoshuaHuntley No additional information from my end as far as getting to the bottom of this one. Ultimately, we're at the whims of Snowflake's internal queueing and the information available from snowflake-connector-python.

There's definitely interest in adding more retry logic for transient/timeout errors across several adapters. That seems like our only point of intervention for handling this bug as well. I haven't thought out the exact implementation details, but I imagine it might look like reimplementing the base execute method within the dbt-snowflake connections module to catch this error and resubmit the query.
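A rough sketch of what that reimplemented execute could look like. The class and method names here are illustrative, not dbt-snowflake's actual internals; the only detail taken from the thread is matching on the "JWT token is invalid" error text.

```python
class RetryingCursorWrapper:
    """Illustrative wrapper around a DB-API-style cursor that resubmits a
    query when Snowflake rejects it with an expired-JWT error. A sketch of
    the idea only, not dbt-snowflake's real connection manager."""

    def __init__(self, cursor, max_retries=1):
        self.cursor = cursor
        self.max_retries = max_retries

    def execute(self, sql, bindings=None):
        attempts = 0
        while True:
            try:
                return self.cursor.execute(sql, bindings)
            except Exception as exc:
                attempts += 1
                # Re-raise anything that isn't the JWT error, or once
                # the retry budget is exhausted.
                if attempts > self.max_retries or "JWT token is invalid" not in str(exc):
                    raise
```

In practice the resubmission would probably also need to reopen the connection (so a fresh JWT is minted) rather than reuse the same cursor, but the catch-and-resubmit shape would be similar.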

@jtcohen6 jtcohen6 transferred this issue from dbt-labs/dbt-core Oct 12, 2021
@jtcohen6 jtcohen6 added the type:enhancement New feature or request label Oct 12, 2021
@jtcohen6
Contributor

Resolved by #6
