[destination-snowflake] executes excessive metadata queries #37311
Comments
Thanks for reporting this, @barakavra. I added the suggestion to the team backlog for further discussion.
Thanks @marcosmarxm. Just to stress this: changing this behaviour could save Snowflake users around 30 Snowflake credits per day (roughly $30K per year).
Thanks for the info, @barakavra. @airbytehq/destinations, can you take a look at this issue?
Hello @marcosmarxm, do you have any news regarding this issue?
Since this issue was filed, we did reduce the number of 'metadata queries' to one-per-stream per sync. This can probably be reduced even further to a single query for all streams in the sync.
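To illustrate the "single query for all streams" idea, here is a minimal sketch and not the connector's actual code. It assumes the column metadata is read from `information_schema.columns` and that the driver accepts `%s` placeholders; the helper names `build_columns_query` and `group_by_stream` are hypothetical.

```python
from collections import defaultdict


def build_columns_query(streams):
    """Build one metadata query covering every (schema, table) pair.

    Returns (sql, params). The %s placeholder style is an assumption;
    adapt it to whatever the driver in use expects.
    """
    if not streams:
        raise ValueError("at least one stream is required")
    # One "(table_schema = %s AND table_name = %s)" clause per stream.
    clause = " OR ".join(
        ["(table_schema = %s AND table_name = %s)"] * len(streams)
    )
    sql = (
        "SELECT table_schema, table_name, column_name, data_type "
        "FROM information_schema.columns "
        f"WHERE {clause} "
        "ORDER BY table_schema, table_name, ordinal_position"
    )
    # Flatten the pairs so the params line up with the placeholders.
    params = [value for pair in streams for value in pair]
    return sql, params


def group_by_stream(rows):
    """Group result rows into {(schema, table): {column: data_type}}."""
    grouped = defaultdict(dict)
    for schema, table, column, data_type in rows:
        grouped[(schema, table)][column] = data_type
    return dict(grouped)
```

With N streams in a sync, this issues one statement instead of N, and the grouped result can be handed to each stream without further metadata lookups.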
Grooming notes:
|
@evantahler, kindly look into the cost of metadata-only queries. They consume Snowflake's "cloud services credits", which at a high query volume can cost more than the warehouse itself. Attached are our approximate daily Snowflake costs. For example, our JDBC Kafka Connect connector ran a select from dual command before each flush; removing that command saved us about 30% of the costs.
Hi @evantahler @marcosmarxm - Is there any timeline on when this might be resolved? We're running into the same problem when using Snowflake as the destination. The queries below ran for over 7 hours the other day:
We're trying to finalize Airbyte as our ELT solution, and a resolution for this would be great to see! Thank you!
Some additional notes from @stephane-airbyte:
|
The code in question can be found here: Lines 498 to 517 in 224db75
|
Closing, as it should be fixed by #45422.
Connector Name
destination-snowflake
Connector Version
3.7.0
What step the error happened?
None
Relevant information
I was monitoring the Snowflake queries issued by the Airbyte platform, and there is a significant performance improvement that could be made in the Snowflake destination code. Three queries are executed many times during a sync, although their results could be cached instead. All of them relate to the metadata of the destination table. For example, instead of executing the following query 409860 times a week (<NUM_OF_STREAMS> * <NUM_OF_CONNECTIONS> * <NUM_OF_REFRESHES> = 409860), it could run <NUM_OF_REFRESHES> * <NUM_OF_CONNECTIONS> times if the result is cached at the beginning of each refresh, per connector.
Since it is only a small dataset, caching it is worth the cost, and wrapping the load in exception handling would guarantee that the load still runs as planned even if the schema changes during the refresh.
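The caching idea above could look roughly like the following. This is a minimal sketch under stated assumptions, not the destination's real implementation: `MetadataCache` and `load_with_cache` are hypothetical names, `fetch` stands in for whatever metadata query the connector runs, and any exception raised by the load is treated as a possible sign of a stale schema.

```python
class MetadataCache:
    """Caches table metadata for the duration of one sync (hypothetical helper)."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable: stream name -> metadata
        self._entries = {}
        self.fetch_count = 0     # exposed only to make the saving visible

    def get(self, stream):
        # Query the warehouse only on the first request for a stream.
        if stream not in self._entries:
            self._entries[stream] = self._fetch(stream)
            self.fetch_count += 1
        return self._entries[stream]

    def invalidate(self, stream):
        # Drop a cached entry so the next get() fetches it again.
        self._entries.pop(stream, None)


def load_with_cache(cache, stream, load):
    """Run load(metadata); on failure, refresh the metadata once and retry."""
    try:
        return load(cache.get(stream))
    except Exception:
        # Assumption: the failure may mean the schema changed mid-sync,
        # so discard the cached entry and retry with fresh metadata.
        cache.invalidate(stream)
        return load(cache.get(stream))
```

In this sketch the metadata is fetched once per stream per sync and again only after a failed load, so a schema change during the refresh costs one extra query rather than breaking the load. A second failure is not caught and propagates to the caller.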
Relevant log output
No response
Contribute