Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination Databricks: Implement refreshes #40692

Merged
merged 2 commits into from
Aug 1, 2024
Merged

Conversation

edgao
Copy link
Contributor

@edgao edgao commented Jul 2, 2024

closes https://github.com/airbytehq/airbyte-internal-issues/issues/8534. closes https://github.com/airbytehq/airbyte-internal-issues/issues/8835. closes https://github.com/airbytehq/airbyte-internal-issues/issues/8857#issuecomment-2259169235. Structurally identical to all the other refreshes PRs (e.g. #38713).

❗ this PR also unpins the current version, i.e. we will release the connector publicly here.

As a refresher:

  • pass the raw table suffix around to the relevant places
  • DestinationHandler needs to fetch an InitialStatus for the temp raw table in addition to the real raw table
    • I did some minor code clarification in this file
  • StorageOperation needs to implement some new methods
    • See the new integration test class for how we verify those new methods
    • some code shuffling in the StreamOperation to enable that integration test
  • some new expectedrecords.jsonl files (CDK bump pulled in some new test cases)
  • fixes a bug in check (wasn't deleting the test table). Added a test case for this.
    • (this only really matters for us - we had an older test table without generation_id, which caused check to fail. Real users wouldn't run into this... but it feels like best practice to clean up ourselves)
  • fixes a bug in StreamOperation, where we would write an empty file, which then caused an error in the COPY command

I'll do a prerelease image + set up a sync in the perf test workspace, but in the meantime, this should be ready for review.

Copy link

vercel bot commented Jul 2, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 31, 2024 4:29pm

@edgao edgao force-pushed the edgao/databricks_generation_id branch from 5ea3ddb to 4aa4151 Compare July 2, 2024 23:25
@edgao edgao force-pushed the edgao/databricks_refreshes branch from 1a8a7e1 to fb8cd1f Compare July 2, 2024 23:25
@edgao edgao force-pushed the edgao/databricks_generation_id branch from 4aa4151 to d7589bf Compare July 3, 2024 19:58
@edgao edgao force-pushed the edgao/databricks_refreshes branch from fb8cd1f to cd225fd Compare July 3, 2024 19:58
@edgao edgao force-pushed the edgao/databricks_generation_id branch from d7589bf to ee4d4d4 Compare July 3, 2024 21:08
@edgao edgao force-pushed the edgao/databricks_refreshes branch from cd225fd to e525669 Compare July 3, 2024 21:08
@edgao edgao force-pushed the edgao/databricks_generation_id branch from ee4d4d4 to 0c533f1 Compare July 3, 2024 21:10
@edgao edgao force-pushed the edgao/databricks_refreshes branch from e525669 to 3a1a034 Compare July 3, 2024 21:10
@edgao edgao force-pushed the edgao/databricks_generation_id branch from 0c533f1 to 100d7ad Compare July 3, 2024 21:13
@edgao edgao force-pushed the edgao/databricks_refreshes branch from 3a1a034 to 3b19eae Compare July 3, 2024 21:13
@edgao edgao force-pushed the edgao/databricks_generation_id branch from 100d7ad to 8d8804d Compare July 3, 2024 21:51
@edgao edgao force-pushed the edgao/databricks_refreshes branch from 3b19eae to 949b034 Compare July 3, 2024 21:51
@edgao edgao force-pushed the edgao/databricks_generation_id branch from 8d8804d to a08885a Compare July 5, 2024 22:54
@edgao edgao force-pushed the edgao/databricks_refreshes branch from 949b034 to 5c437fe Compare July 5, 2024 22:54
@edgao edgao force-pushed the edgao/databricks_generation_id branch from a08885a to 859842c Compare July 9, 2024 16:58
@edgao edgao force-pushed the edgao/databricks_refreshes branch from 5c437fe to b20cade Compare July 9, 2024 16:58
@edgao edgao force-pushed the edgao/databricks_generation_id branch from 859842c to 76745ef Compare July 9, 2024 17:11
@edgao edgao force-pushed the edgao/databricks_refreshes branch from b20cade to fc1c851 Compare July 9, 2024 17:11
@edgao edgao force-pushed the edgao/databricks_generation_id branch 3 times, most recently from 6dfd22e to fb7d6ae Compare July 9, 2024 21:21
@edgao edgao force-pushed the edgao/databricks_refreshes branch from fc1c851 to 7f388a3 Compare July 9, 2024 21:24
private fun prepareStagingTable(streamId: StreamId, destinationSyncMode: DestinationSyncMode) {
val rawSchema = streamId.rawNamespace
// TODO: Optimize by running SHOW SCHEMAS; rather than CREATE SCHEMA if not exists
destinationHandler.execute(sqlGenerator.createSchema(rawSchema))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the create schema call now happens via destinationhandler.createNamespaces, so remove it from here

Copy link
Contributor

@johnny-schmidt johnny-schmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

  • the overridden methods do what they seem like they should
  • the statements made about databricks in comments, etc are correct
  • the net new behavior makes sense with my understanding of what supporting refreshes means
  • the tests look thorough and correct but I didn't have time to go over them all in detail

@edgao edgao force-pushed the edgao/databricks_refreshes branch from c9a620d to 6fccc25 Compare July 31, 2024 16:19
@edgao edgao merged commit ae4d65e into master Aug 1, 2024
60 of 61 checks passed
@edgao edgao deleted the edgao/databricks_refreshes branch August 1, 2024 15:45
BenGalewsky pushed a commit to BenGalewsky/airbyte that referenced this pull request Aug 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/destination/databricks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants