Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination databricks: switch to oss jdbc driver #44033

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

edgao
Copy link
Contributor

@edgao edgao commented Aug 14, 2024

closes https://github.com/airbytehq/airbyte-internal-issues/issues/9120

also it seems like we don't need the databricks sdk at all?

The new driver has a slightly different interface (you can't directly supply a URL, it forces you to supply individual fields/properties). I tried to port over our existing stuff, but removed the transportMode=http and EnableArrow=0 things to see if they're still needed.

Databricks documentation doesn't even describe how to do oauth, it only says how to do PAT (https://docs.gcp.databricks.com/en/integrations/jdbc/oss.html#authenticate-the-driver). I copied our old stuff to the new interfaces naively, but it doesn't work

  • DatabricksSQLException: Communication link failure. Failed to connect to server. :https://dbc-6aebf761-f8d6.cloud.databricks.com:443accessToken must be defined
  • DatabricksSQLException: Communication link failure. Failed to connect to server. :https://dbc-6aebf761-f8d6.cloud.databricks.com:443Cannot invoke "com.databricks.sdk.core.oauth.OpenIDConnectEndpoints.getTokenEndpoint()" because "jsonResponse" is null).

notable changes in the oss driver:

  • timestamps with timezone now have a timezone directly from the driver
  • timestamps without timezone have .000 precision
  • Inline byte limit exceeded. Statements executed with disposition=INLINE can have a result size of at most 26214400 bytes. Please execute the statement with disposition=EXTERNAL_LINKS if you want to download the full result

which means:

  • the destinationhandler can parse directly to an Instant, instead of needing to go through LocalDateTime
  • tons of changes in the expected records

Copy link

vercel bot commented Aug 14, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Sep 4, 2024 4:56pm

@edgao edgao force-pushed the edgao/databricks_jdbc branch from 4fcf030 to 98e94bf Compare August 14, 2024 16:54
@edgao edgao changed the base branch from master to edgao/databricks_password_auth August 14, 2024 16:54
Copy link
Contributor Author

edgao commented Aug 14, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @edgao and the rest of your teammates on Graphite Graphite

@edgao edgao force-pushed the edgao/databricks_password_auth branch from 8c959c8 to 86e4091 Compare August 14, 2024 17:02
@edgao edgao force-pushed the edgao/databricks_jdbc branch from 98e94bf to 0993074 Compare August 14, 2024 17:02
@edgao edgao force-pushed the edgao/databricks_password_auth branch from 86e4091 to 75ee096 Compare August 14, 2024 17:33
@edgao edgao force-pushed the edgao/databricks_jdbc branch from 0993074 to 53895b0 Compare August 14, 2024 17:33
@edgao edgao force-pushed the edgao/databricks_password_auth branch from 75ee096 to b74867f Compare August 14, 2024 17:59
@edgao edgao force-pushed the edgao/databricks_jdbc branch from 53895b0 to ee2e5aa Compare August 14, 2024 17:59
Base automatically changed from edgao/databricks_password_auth to master August 14, 2024 18:11
@edgao edgao force-pushed the edgao/databricks_jdbc branch 2 times, most recently from 2af315e to d4753f6 Compare August 14, 2024 21:07
@@ -44,18 +44,23 @@ object DatabricksConnectorClientsFactory {
// EnableArrow=0 flag is undocumented and disables ArrowBuf when reading data
// Destinations only reads data for metadata or for comparison of actual data in tests. so
// we don't need it to be optimized.
val jdbcUrl =
"jdbc:databricks://${config.hostname}:${config.port}/${config.database};transportMode=http;httpPath=${config.httpPath};EnableArrow=0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gisripa do you remember what the /${config.database} thing is supposed to do? It sounds like we're passing in the database (i.e. unity catalog name?)

but the oss driver's docs sound like this is actually supposed to be the schema, which sounds super weird. I.e. their url format is jdbc:databricks://<server-hostname>:<port>/<schema>;

(I'm seeing some weird test failures with ConcurrentModificationException and am very confused about how that could be happening, no idea if this is related)

@edgao edgao force-pushed the edgao/databricks_jdbc branch from d4753f6 to 6ad7fd6 Compare August 21, 2024 15:10
@edgao edgao changed the base branch from master to edgao/mixed_case_sqlgen_test August 21, 2024 15:10
@edgao edgao force-pushed the edgao/mixed_case_sqlgen_test branch 2 times, most recently from 5555212 to 6ae90f0 Compare August 21, 2024 15:17
@edgao edgao force-pushed the edgao/databricks_jdbc branch from 6ad7fd6 to 8ae325e Compare August 21, 2024 15:17
@edgao edgao force-pushed the edgao/mixed_case_sqlgen_test branch from 6ae90f0 to 7fc4b42 Compare August 21, 2024 16:45
@edgao edgao force-pushed the edgao/databricks_jdbc branch from 8ae325e to 82e9adf Compare August 21, 2024 16:46
@edgao edgao force-pushed the edgao/mixed_case_sqlgen_test branch 2 times, most recently from 8508c4d to 3edff7b Compare August 22, 2024 15:49
Base automatically changed from edgao/mixed_case_sqlgen_test to master August 22, 2024 15:59
@edgao edgao force-pushed the edgao/databricks_jdbc branch from 82e9adf to b9168c4 Compare September 3, 2024 21:53
@edgao edgao force-pushed the edgao/databricks_jdbc branch from b9168c4 to 0aefd8f Compare September 4, 2024 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants