🎉 Destination Databricks: Support Azure storage (#15140) #15329
Conversation
The Azure integration test passes, but I don't have access to Databricks on AWS, and it's proving difficult to configure Azure Databricks to write to S3, so I'm not able to fully run the S3 integration tests. I have confirmed that the data gets written to S3 fine, but still need to confirm that the tables can be created from the S3 location in Databricks. Could someone from Airbyte please trigger the integration tests to run from this PR?
/test connector=connectors/destination-databricks
Build Failed.
I pushed a new commit that should fix the error seen in the previous run of the S3 acceptance test. The Azure acceptance test will require the addition of a new secrets file by an Airbyte member (see sample_secrets/azure_config.json).
I added azure_config to Airbyte secret manager.
@abaerptc, thank you for the PR! Could you add me as a collaborator to your fork? That way I can push changes to this branch if needed while I work on merging it. @marcosmarxm, there is another PR pending merging for Databricks. Would you mind if I do the coordination and merge it later?
/test connector=connectors/destination-databricks
Build Failed.
Sure @tuliren, I'd just like to confirm tests are running properly here.
/test connector=connectors/destination-databricks
Build Failed.
@tuliren You're added as a collaborator. |
@abaerptc now the integration tests are running with your changes! |
@marcosmarxm I see a few failures with messages like "Invalid configuration value detected for fs.azure.account.key". I think you need to give your Databricks instance access to your Azure account via the `fs.azure.account.key` Spark configuration property.
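For anyone hitting the same failure: access is typically granted in the Databricks cluster's Spark config using the Hadoop account-key property. A minimal sketch, where the storage account name and key are placeholders you would replace with your own values:

```
fs.azure.account.key.<storage-account-name>.blob.core.windows.net <storage-account-access-key>
```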
Awesome PR @abaerptc! Just a side note: the AWS version supports destination pattern matching (see airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksS3StorageConfig.java). Is this something you would consider adding? (Of course, I am happy to help!)
@TamGB I don't object to having pattern matching added. However, I don't have the availability to work on it. Please go ahead and implement it if you'd find it valuable. |
Is this abstracted enough to add support for Databricks on Google Cloud (i.e., add GCS as a target bucket alongside AWS S3 and Azure Blob)?
You should be able to add another implementation of the data storage method pretty easily based on what I've done here, yes. |
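To make that concrete, a GCS variant could follow the same pattern as the S3 and Azure config classes in this PR. This is only a hypothetical sketch: the base class name, method name, and field names are assumptions, not the connector's actual API.

```java
import com.fasterxml.jackson.databind.JsonNode;

// Hypothetical sketch only: everything except the naming pattern of
// DatabricksS3StorageConfig referenced in this PR is assumed, not taken
// from the actual connector code.
public class DatabricksGcsStorageConfig extends DatabricksStorageConfig {

  private final String bucketName;

  public DatabricksGcsStorageConfig(final JsonNode config) {
    this.bucketName = config.get("gcs_bucket_name").asText();
  }

  @Override
  public String getStorageLocation(final String prefix) {
    // Databricks on Google Cloud can read staged data via the gs:// scheme.
    return String.format("gs://%s/%s", bucketName, prefix);
  }
}
```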
@marcosmarxm @tuliren Any update on getting this merged in? |
@natalyjazzviolin Conflicts are resolved. |
Sorry about the delay. To test the Azure integration, we would need a separate Databricks cluster, which I have not been able to set up so far. Given that this has been delayed for a long time, I will do some local testing and mute the Azure-related integration test for now. I aim to merge it today.
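For the record, muting a JUnit 5 test is a one-line annotation. A sketch along these lines, where the class and test names are assumptions based on the connector's naming pattern, not the actual acceptance test code:

```java
import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;

// Hypothetical sketch: names are assumed, not copied from the connector.
@Disabled("Requires a dedicated Azure Databricks cluster; see issue #15140")
class DatabricksAzureBlobStorageDestinationAcceptanceTest {

  @Test
  void testSync() {
    // The real acceptance test logic lives in a shared base class.
  }
}
```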
/test connector=connectors/destination-databricks
Build Passed.
🎉
/publish connector=connectors/destination-databricks run-tests=false
If you have connectors that successfully published but failed definition generation, follow step 4 here.
@abaerptc, thank you very much for your contribution. Sorry about the long delay. Please don't forget to remove me from your repo. |
🎉 Destination Databricks: Support Azure storage (airbytehq#15140) (airbytehq#15329)

* 🎉 Destination Databricks: Support Azure storage (airbytehq#15140)
* Update docs and version.
* Revert unintentional change to S3 config parsing.
* Revert unnecessary additional table property.
* change spec.json
* Ignore azure integration test
* Add issue link
* auto-bump connector version

Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Liren Tu <tuliren@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
What
Adds Azure storage as an option for the Databricks destination.
This PR resolves #15140.
How
Adds Azure storage as an alternative to S3.
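For illustration, a destination config with Azure as the staging location might look roughly like the following. The field names here are assumptions based on typical Airbyte connector specs, not the actual spec.json, and all values are placeholders:

```json
{
  "databricks_server_hostname": "abc-12345678-wxyz.cloud.databricks.com",
  "databricks_http_path": "sql/protocolv1/o/1234567890/0000-000000-abcd1234",
  "databricks_personal_access_token": "dapi...",
  "data_source": {
    "data_source_type": "Azure_Blob_Storage",
    "azure_blob_storage_account_name": "mystorageaccount",
    "azure_blob_storage_container_name": "airbyte-staging",
    "azure_blob_storage_sas_token": "?sv=..."
  }
}
```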
Recommended reading order
- `*.json` - spec and sample configs
- `Databricks*Config.java` - abstracted cloud storage, with two implementations (S3 and Azure)
- `Databricks*Destination.java` - switches between the S3 and Azure implementations depending on config
- `Databricks*StreamCopierFactory.java` - abstracted, with two implementations (S3 and Azure)
- `Databricks*StreamCopier.java` - abstracted, with the existing S3 implementation refactored and an Azure implementation added
- `DatabricksSqlOperations.java` - minor change to allow execution of multiple statements (see the sketch after this list)
- `Dockerfile` - bumped version
- `build.gradle` - new deps
- `AzureBlobStorageConnectionChecker.java` - new constructor to allow reuse in the Databricks connector
- `*Test.java`
- `*.md` - docs
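A sketch of the multiple-statement change mentioned above for `DatabricksSqlOperations.java`, assuming the driver only accepts one statement per execute call. The method name and signature are assumptions for illustration, not the actual connector code:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch: split a multi-statement script on semicolons and
// run each piece separately, since the driver rejects combined scripts.
void executeMultiple(final Connection connection, final String script) throws SQLException {
  for (final String statement : script.split(";")) {
    if (!statement.isBlank()) {
      try (Statement stmt = connection.createStatement()) {
        stmt.execute(statement);
      }
    }
  }
}
```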
🚨 User Impact 🚨
No breaking changes.
Pre-merge Checklist
Updating a connector

Community member or Airbyter

- Secrets in the connector's spec are annotated with `airbyte_secret`
- Unit & integration tests added and passing (`./gradlew :airbyte-integrations:connectors:<name>:integrationTest`)
- Documentation updated:
  - Connector's `README.md`
  - Connector's `bootstrap.md`. See description and examples
  - `docs/integrations/<source or destination>/<name>.md` including changelog. See changelog example

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

- `/test connector=connectors/<name>` command is passing
- New connector version released on Dockerhub by running the `/publish` command described here

Tests
Unit
Integration
None.
Acceptance
Cannot run S3 acceptance test locally.