Snowflake Connector - Reduce computing resources used for metadata queries #43452
base: master
Conversation
@@ -44,4 +46,6 @@ integrationTestJava {
dependencies {
    implementation 'net.snowflake:snowflake-jdbc:3.14.1'
    implementation 'org.apache.commons:commons-text:1.10.0'
    implementation 'org.json:json:20210307'
We don't need this. We already have a Jackson dependency in our dependency chain.
return JdbcDatabase.Companion.toUnsafeStream<T>(
var connection = dataSource.connection

try {
connection.use. Also, keep connection a val instead of a var
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I have changed connection to a val. The code won't be changed to connection.use as we discussed, since the connection needs to remain open when the result set is returned.
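The lifetime issue behind this exchange can be sketched generically (a made-up stand-in class, not the connector's code): `use` closes the resource as soon as its block returns, which breaks any lazily-consumed sequence handed back to the caller.

```kotlin
// Hypothetical stand-in for a JDBC connection, to illustrate the lifetime issue.
class FakeConnection : AutoCloseable {
    var closed = false
        private set
    override fun close() { closed = true }

    // Rows are produced lazily, so the connection must still be open
    // at the moment the caller iterates.
    fun rows(): Sequence<Int> = sequence {
        for (i in 1..3) {
            check(!closed) { "connection used after close" }
            yield(i)
        }
    }
}

fun main() {
    val conn = FakeConnection()
    // Wrong: `use` closes the connection before the sequence is consumed.
    val lazyRows = conn.use { it.rows() }
    val failed = runCatching { lazyRows.toList() }.isFailure
    println(failed) // true: the rows cannot be read once the connection is closed
}
```

This is why the connection here is closed by the stream's cleanup path rather than wrapped in `use`.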
@@ -5,7 +5,7 @@ data:
connectorSubtype: database
connectorType: destination
definitionId: 424892c4-daac-4491-b35d-c6688ba547ba
dockerImageTag: 3.11.9
Why the change? If it's unneeded, I'd rather keep this outside of the current PR.
@@ -293,4 +293,8 @@ object SnowflakeDatabaseUtils {
        AirbyteProtocolType.UNKNOWN -> "VARIANT"
    }
}

fun fromIsNullableSnowflakeString(isNullable: String?): Boolean {
    return "true".equals(isNullable, ignoreCase = true)
remove and use String.toBoolean()
Removed the function and used String.toBoolean
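For reference, Kotlin's built-in conversion does the same case-insensitive comparison the removed helper did, and since Kotlin 1.4 there is also a nullable-receiver overload that returns false for null:

```kotlin
fun main() {
    println("TRUE".toBoolean())     // true: the comparison ignores case
    println("yes".toBoolean())      // false: only the literal "true" matches
    val isNullable: String? = null
    println(isNullable.toBoolean()) // false: nullable receiver handles null safely
}
```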
rowCount

val tableRowCountsFromShowQuery = LinkedHashMap<String, LinkedHashMap<String, Int>>()
var showColumnsResult: List<JsonNode> = listOf()
remove this and set the val inside your try block
val tableName = result["name"].asText()
val rowCount = result["rows"].asText()

tableRowCountsFromShowQuery
The indentation is super confusing here (probably enforced by our format command). Any way to change that, or is our formatter going to bark at you?
Also, you can simplify with map.computeIfAbsent(tableSchema) { LinkedHashMap() }
I think we can also use the map's withDefault, which would simplify this further
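Both suggestions above can be sketched as follows (schema and table names are made up for illustration):

```kotlin
fun main() {
    val counts = LinkedHashMap<String, LinkedHashMap<String, Int>>()

    // computeIfAbsent creates the inner map on first access, replacing an
    // explicit containsKey/put dance.
    counts.computeIfAbsent("PUBLIC") { LinkedHashMap() }["USERS"] = 42
    counts.computeIfAbsent("PUBLIC") { LinkedHashMap() }["ORDERS"] = 7
    println(counts) // {PUBLIC={USERS=42, ORDERS=7}}

    // Alternative: withDefault supplies a fallback for reads via getValue,
    // though it does not insert the default into the map.
    val withFallback = counts.withDefault { LinkedHashMap() }
    println(withFallback.getValue("MISSING").size) // 0
}
```

`computeIfAbsent` mutates the map on first access, while `withDefault` only affects reads through `getValue`, so the two are not interchangeable when the inner map is later mutated.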
)
    .isNotEmpty()

showSchemaResult = database.queryJsons(
val here. There's no need for a var. I'd expect there's no need to close the stream either
WHERE schema_name = ?
  AND catalog_name = ?;

SHOW SCHEMAS LIKE '%s' IN DATABASE "%s";
use kotlin templates
ORDER BY ordinal_position;

""".trimIndent(),
SHOW COLUMNS IN TABLE "%s"."%s"."%s";
use kotlin templates
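The %s placeholders above can be replaced with Kotlin string templates; a minimal sketch with illustrative variable names (not the connector's actual ones):

```kotlin
fun main() {
    val database = "MY_DB"
    val schema = "MY_SCHEMA"
    val table = "USERS"

    // String.format style with positional %s placeholders:
    val formatted =
        String.format("""SHOW COLUMNS IN TABLE "%s"."%s"."%s";""", database, schema, table)

    // Kotlin template style: variables are interpolated in place,
    // which avoids argument-order mistakes.
    val templated = """SHOW COLUMNS IN TABLE "$database"."$schema"."$table";"""

    println(formatted == templated) // true: both produce the same query text
}
```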
showColumnsResult = database.queryJsons(
    showColumnsQuery
)
val columnsFromShowQuery = showColumnsResult
    .stream()
remove the call to stream() and you don't need to close it
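The suggestion can be sketched generically: Kotlin collection operations work directly on the List and need no explicit closing, unlike a Java stream pipeline (the sample data below is made up):

```kotlin
import java.util.stream.Collectors

fun main() {
    val showColumnsResult = listOf("ID", "NAME", "id", "name")

    // Java-stream style (what the review asks to remove):
    val viaStream = showColumnsResult.stream()
        .map { it.uppercase() }
        .distinct()
        .collect(Collectors.toList())

    // Idiomatic Kotlin: operates directly on the List, nothing to close.
    val viaKotlin = showColumnsResult.map { it.uppercase() }.distinct()

    println(viaStream == viaKotlin) // true: same result either way
    println(viaKotlin)              // [ID, NAME]
}
```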
row["DATA_TYPE"].asText(),
row["column_name"].asText(),
//row["data_type"].asText(),
JSONObject(row["data_type"].asText()).getString("type"),
is there a difference introduced by the move to SHOW COLUMNS (similar to the call above)?
Yes, there are differences between the output of the information_schema query and the SHOW query.
row["COLUMN_NAME"].asText(),
row["DATA_TYPE"].asText(),
row["column_name"].asText(),
changeDataTypeFromShowQuery(ObjectMapper().readTree(row["data_type"].asText()).path("type").asText()),
Why the double conversion with .asText() and then readTree? row["data_type"] should already return a JsonNode, so cast it to ObjectNode if it is a struct. If the returned value is a serialized JSON string, then use Jsons.<relevantMethod> to construct the ObjectNode. Avoid re-init'ing an ObjectMapper.
yeah, this seems odd
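The "avoid re-init'ing an ObjectMapper" point can be sketched with plain Jackson (the function and sample value are illustrative, not the connector's code): ObjectMapper is safe to share once configured, so one instance serves all rows instead of constructing a new mapper per row.

```kotlin
import com.fasterxml.jackson.databind.ObjectMapper

// A single shared mapper; ObjectMapper is thread-safe after configuration,
// so there is no need to construct one per row.
private val MAPPER = ObjectMapper()

// Hypothetical helper: extracts the "type" field from a serialized data_type value.
fun dataTypeOf(serializedDataType: String): String =
    MAPPER.readTree(serializedDataType).path("type").asText()

fun main() {
    // Illustrative value only; real SHOW COLUMNS output has more fields.
    println(dataTypeOf("""{"type":"TEXT","length":16777216}""")) // TEXT
}
```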
}
return existingTablesFromShowQuery
} catch (e: SnowflakeSQLException) {
if(e.message != null && e.message!!.contains("does not exist")) {
I'm not sure we should catch the exception outside of the loop that iterates over streams.
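The restructuring this comment asks for can be sketched like so (the table names, fetch function, and exception type are hypothetical stand-ins): catching per iteration lets a single missing table be skipped without aborting the whole loop.

```kotlin
// Hypothetical sketch: per-item error handling instead of one catch around the loop.
fun collectExisting(tables: List<String>, fetch: (String) -> Int): Map<String, Int> {
    val found = LinkedHashMap<String, Int>()
    for (table in tables) {
        try {
            found[table] = fetch(table)
        } catch (e: IllegalStateException) {
            // Stand-in for SnowflakeSQLException: skip only the table that
            // "does not exist" and keep processing the rest.
            if (e.message?.contains("does not exist") != true) throw e
        }
    }
    return found
}

fun main() {
    val result = collectExisting(listOf("A", "B", "C")) { t ->
        if (t == "B") error("Table B does not exist") else t.length
    }
    println(result) // {A=1, C=1}: table B was skipped, the rest survived
}
```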
import java.nio.file.Path
import java.nio.file.Paths
this should break format
What
The Snowflake destination connector uses metadata queries to get the details of schemas, tables, and columns. Customers have highlighted that these queries add up to multiple hours of compute time per day on the Snowflake cloud. Reducing the computing resources used for metadata queries by improving how the connector fetches metadata would therefore be helpful.
For more detailed context about the issue, please see the GitHub ticket:
[destination-snowflake] executes excessive metadata queries #37311
How
To reduce the computing resources used for Snowflake metadata queries, this PR replaces the information_schema queries with SHOW queries.
After reviewing this improvement, additional optimizations may be implemented to determine the minimum number of queries needed. This PR description will be updated after the improvements to the Snowflake metadata queries are reviewed.
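The substitution can be sketched as follows (database and schema names are illustrative, and the query text is a simplified example rather than the PR's exact queries). Snowflake serves SHOW commands from its metadata layer without requiring a running warehouse, whereas information_schema queries execute on a warehouse and consume compute credits, which is where the savings come from:

```kotlin
fun main() {
    val database = "MY_DB"   // illustrative names, not from the PR
    val schema = "MY_SCHEMA"

    // information_schema query: runs on a warehouse, consuming compute credits.
    val infoSchemaQuery = """
        SELECT table_name, row_count
        FROM "$database".information_schema.tables
        WHERE table_schema = '$schema';
    """.trimIndent()

    // SHOW query: served from Snowflake's metadata layer and does not
    // require a running warehouse.
    val showQuery = """SHOW TABLES IN SCHEMA "$database"."$schema";"""

    check(infoSchemaQuery.isNotEmpty())
    println(showQuery)
}
```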
Review guide
Note: This PR is still in draft mode and is not ready for a formal review yet. It is an initial PR created to provide a preview of the type of changes being made.
User Impact
Users would see fewer information_schema queries being executed and a corresponding reduction in the computing resources consumed by metadata queries.
Can this PR be safely reverted and rolled back?