
🎉 Destination Databricks: Support Azure storage (#15140) #15329

Merged: 12 commits merged into airbytehq:master on Oct 15, 2022

Conversation

@abaerptc (Contributor) commented Aug 4, 2022

What

Adds Azure storage as an option for the Databricks destination.
This PR resolves #15140.

How

Adds Azure storage as an alternative to S3.

Recommended reading order

  1. *.json - spec and sample configs
  2. Databricks*Config.java - abstracted cloud storage, with two implementations (S3 and Azure)
  3. Databricks*Destination.java - switches between S3 and Azure implementations depending on config (see the first sketch after this list)
  4. Databricks*StreamCopierFactory.java - abstracted, with two implementations (S3 and Azure)
  5. Databricks*StreamCopier.java - abstracted, with existing S3 implementation refactored and Azure implementation added
  6. DatabricksSqlOperations.java - minor change to allow execution of multiple statements (see the second sketch after this list)
  7. Dockerfile - bumped version
  8. build.gradle - new deps
  9. AzureBlobStorageConnectionChecker.java - adding new constructor to allow reuse in Databricks connector
  10. *Test.java
  11. *.md - docs
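To make the abstraction easier to picture, here is a minimal, self-contained sketch of the pattern items 2 and 3 describe: a single storage-config abstraction with S3 and Azure implementations, selected from the destination config. All class names and config field names below are illustrative assumptions for this write-up, not the code in this PR.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class StorageConfigSketch {

  /** Common view of a staging location, regardless of the cloud behind it. */
  interface CloudStorageConfig {
    /** Location string that Databricks would load staged files from. */
    String getStagingLocation();
  }

  record S3StorageConfig(String bucket) implements CloudStorageConfig {
    @Override
    public String getStagingLocation() {
      return "s3://" + bucket + "/airbyte-staging";
    }
  }

  record AzureBlobStorageConfig(String accountName, String containerName) implements CloudStorageConfig {
    @Override
    public String getStagingLocation() {
      return "wasbs://" + containerName + "@" + accountName + ".blob.core.windows.net/airbyte-staging";
    }
  }

  /** Picks an implementation based on a hypothetical data_source_type field in the config. */
  static CloudStorageConfig fromConfig(JsonNode dataSource) {
    return switch (dataSource.get("data_source_type").asText()) {
      case "S3_STORAGE" -> new S3StorageConfig(dataSource.get("s3_bucket_name").asText());
      case "AZURE_BLOB_STORAGE" -> new AzureBlobStorageConfig(
          dataSource.get("azure_blob_storage_account_name").asText(),
          dataSource.get("azure_blob_storage_container_name").asText());
      default -> throw new IllegalArgumentException("Unsupported data source type");
    };
  }

  public static void main(String[] args) throws Exception {
    JsonNode azure = new ObjectMapper().readTree("""
        {"data_source_type": "AZURE_BLOB_STORAGE",
         "azure_blob_storage_account_name": "myaccount",
         "azure_blob_storage_container_name": "mycontainer"}
        """);
    System.out.println(fromConfig(azure).getStagingLocation());
  }
}
```

And the second sketch, for item 6: one way a SqlOperations helper can be made to allow execution of multiple statements is to split the script on semicolons and execute each statement separately over JDBC. Again, this is an illustration of the idea, not the actual DatabricksSqlOperations change.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public final class MultiStatementSketch {

  private MultiStatementSketch() {}

  /** Executes every semicolon-separated statement in {@code sqlScript} on the given connection. */
  public static void executeStatements(Connection connection, String sqlScript) throws SQLException {
    for (String sql : sqlScript.split(";")) {
      String trimmed = sql.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blanks left by trailing semicolons
      }
      try (Statement statement = connection.createStatement()) {
        statement.execute(trimmed);
      }
    }
  }
}
```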

🚨 User Impact 🚨

No breaking changes.

Pre-merge Checklist

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally, e.g. a screenshot or copy-paste of unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow the instructions in the README. For Java connectors, run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md, including the changelog section. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

Tests

Unit: [screenshot of local unit test run attached: Screen Shot 2022-08-04 at 3 17 46 PM]
Integration

None.

Acceptance: [screenshot of local acceptance test run attached: Screen Shot 2022-08-04 at 3 18 46 PM]

Cannot run S3 acceptance test locally.

@abaerptc requested a review from a team as a code owner on August 4, 2022 at 19:24
@CLAassistant commented Aug 4, 2022

CLA assistant check
All committers have signed the CLA.

@github-actions bot added the area/documentation label on Aug 4, 2022
@sajarin added the internal label and removed the bounty label on Aug 5, 2022
@abaerptc (Contributor, Author) commented Aug 9, 2022

The Azure integration test passes, but I don't have access to Databricks on AWS, and it's proving difficult to configure Azure Databricks to write to S3, so I'm not able to fully run the S3 integration tests. I have confirmed that the data gets written to S3 fine, but still need to confirm that the tables can be created from the S3 location in Databricks. Could someone from Airbyte please trigger the integration tests to run from this PR?

@marcosmarxm (Member) commented Aug 9, 2022

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/2827034508
❌ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/2827034508
🐛 https://gradle.com/s/77uzxvmvh373e

Build Failed

Test summary info:

Could not find result summary

@abaerptc (Contributor, Author) commented Aug 9, 2022

I pushed a new commit that should fix the error seen in the previous run of the S3 acceptance test.

The Azure acceptance test will require the addition of a new secrets file by an Airbyte member (see sample_secrets/azure_config.json).

@marcosmarxm (Member) commented:

/test connector=connectors/destination-databricks

I added azure_config to the Airbyte secret manager.

@tuliren (Contributor) commented Aug 10, 2022

@abaerptc, thank you for the PR! Could you add me as a collaborator on your fork? That way I can push changes to this branch if needed while I work on merging it.

@marcosmarxm, there is another Databricks PR pending merge. Would you mind if I do the coordination and merge this one later?

@marcosmarxm (Member) commented Aug 10, 2022

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/2832361238
❌ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/2832361238
🐛 https://gradle.com/s/xtf6vwhphaki2

Build Failed

Test summary info:

Could not find result summary

@marcosmarxm (Member) commented:

> @marcosmarxm, there is another Databricks PR pending merge. Would you mind if I do the coordination and merge this one later?

Sure @tuliren, I just want to confirm that tests are running properly here.

@marcosmarxm (Member) commented Aug 10, 2022

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/2832664101
❌ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/2832664101
🐛 https://gradle.com/s/nazigo4vbe2pe

Build Failed

Test summary info:

Could not find result summary

@abaerptc (Contributor, Author) commented:

@tuliren You're added as a collaborator.

@marcosmarxm (Member) commented:

@abaerptc now the integration tests are running with your changes!

@abaerptc (Contributor, Author) commented:

> @abaerptc now the integration tests are running with your changes!

@marcosmarxm I see a few failures with messages like "Invalid configuration value detected for fs.azure.account.key". I think you need to give your Databricks instance access to your Azure account via the Spark configuration property fs.azure.account.key.
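For reference, granting that access is usually done with a cluster Spark config entry of the following form (assuming blob storage accessed over wasbs; the placeholder values here are not from this PR and are obviously not real credentials):

```
fs.azure.account.key.<storage-account-name>.blob.core.windows.net <storage-account-access-key>
```

Databricks also lets you keep the key in a secret scope and reference it from the Spark config instead of pasting the raw key in.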

@TamGB commented Sep 5, 2022

Awesome PR @abaerptc !

Just a side note: the AWS version supports destination pattern matching (see airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksS3StorageConfig.java). Is this something you would consider adding? (Of course, I am happy to help!)

@abaerptc (Contributor, Author) commented Sep 6, 2022

> Awesome PR @abaerptc !
>
> Just a side note: the AWS version supports destination pattern matching (see airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksS3StorageConfig.java). Is this something you would consider adding? (Of course, I am happy to help!)

@TamGB I don't object to having pattern matching added. However, I don't have the availability to work on it. Please go ahead and implement it if you'd find it valuable.

@pkudinov commented:

Is this abstracted enough to add support for Databricks on Google Cloud (i.e. add GCS as a target bucket alongside AWS S3 and Azure Blob)?

@abaerptc (Contributor, Author) commented:

> Is this abstracted enough to add support for Databricks on Google Cloud (i.e. add GCS as a target bucket alongside AWS S3 and Azure Blob)?

You should be able to add another implementation of the data storage method pretty easily based on what I've done here, yes.
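As a rough illustration (the names here are hypothetical and not part of this PR), the GCS piece would largely be another small storage-config implementation that produces a gs:// staging location, plus a matching stream copier that writes the staged files to GCS:

```java
/** Hypothetical GCS counterpart to the S3 and Azure storage configs in this PR. */
public record GcsStorageConfigSketch(String bucketName, String pathPrefix) {

  /** gs:// location that Databricks on Google Cloud could load staged files from. */
  public String getStagingLocation() {
    return "gs://" + bucketName + "/" + pathPrefix;
  }
}
```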

@abaerptc (Contributor, Author) commented:

@marcosmarxm @tuliren Any update on getting this merged in?

@natalyjazzviolin (Contributor) commented:

@abaerptc looks like there are conflicts, could you resolve them please?

@tuliren there is a lot of community interest in this PR, would you be able to move it up in your queue for review/merging? Thank you in advance :)

@abaerptc (Contributor, Author) commented:

@natalyjazzviolin Conflicts are resolved.

@marcosmarxm (Member) commented:

So sorry @abaerptc for the long delay in replying. @tuliren is actively working on other improvements to the Databricks destination and on getting your contribution in as well. He will get back to you soon!

@tuliren (Contributor) commented Oct 14, 2022

Sorry about the delay.

To test the Azure integration, we would need a separate Databricks cluster, which I have not been able to set up so far. Given that this has been delayed for a long time, I will do some local testing and mute the Azure-related integration test for now (a small sketch of how a test can be muted is at the end of this comment).

I aim to merge it today.
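For anyone following along, muting a single test is typically just an annotation like the sketch below (assuming JUnit 5; the class and method names are placeholders, not the connector's actual test):

```java
import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;

class AzureDatabricksIntegrationTestSketch {

  @Disabled("Requires an Azure Databricks cluster; re-enable once CI credentials are available.")
  @Test
  void stagesAndLoadsDataViaAzureBlobStorage() {
    // The real integration test body would go here.
  }
}
```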

@tuliren (Contributor) commented Oct 15, 2022

/test connector=connectors/destination-databricks

🕑 connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/3254956793
✅ connectors/destination-databricks https://github.com/airbytehq/airbyte/actions/runs/3254956793
No Python unittests run

Build Passed

Test summary info:

All Passed

@tuliren (Contributor) left a review comment:

🎉

@tuliren (Contributor) commented Oct 15, 2022

/publish connector=connectors/destination-databricks run-tests=false

🕑 Publishing the following connectors:
connectors/destination-databricks
https://github.com/airbytehq/airbyte/actions/runs/3255000004


| Connector | Did it publish? | Were definitions generated? |
| --- | --- | --- |
| connectors/destination-databricks | | |

If you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@tuliren merged commit 1b134d8 into airbytehq:master on Oct 15, 2022
@tuliren (Contributor) commented Oct 15, 2022

@abaerptc, thank you very much for your contribution. Sorry about the long delay.

Please don't forget to remove me from your repo.

jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request on Oct 31, 2022:

🎉 Destination Databricks: Support Azure storage (airbytehq#15140) (airbytehq#15329)

* 🎉 Destination Databricks: Support Azure storage (airbytehq#15140)

* Update docs and version.

* Revert unintentional change to S3 config parsing.

* Revert unnecessary additional table property.

* change spec.json

* Ignore azure integration test

* Add issue link

* auto-bump connector version

Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Liren Tu <tuliren@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>