Spark failed to start: Driver unresponsive using vanilla Databricks cluster #3582
Comments
Which Azure region are you using? Can you check the Azure firewall logs for any traffic being denied from the subnet containing Databricks? Also, is there anything of note in the cluster logs? I'll try to test it out this end, but it might be a couple of days.
I'm using region West Europe. I'll dive into the logs. Thank you.
The only error in the cluster logs is the one mentioned.
OK, if you can, check the firewall logs. My guess is that a file is being downloaded but is blocked by the firewall - maybe a new FQDN dependency. It's a couple of months since I've personally tested this. If nothing obvious shows up, it might be worth opening a support ticket for Databricks via the Azure portal; they might be able to provide some guidance.
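For reference, a minimal sketch of how the denied-traffic check could be run against the firewall's Log Analytics workspace using the azure-monitor-query SDK. The workspace ID is a placeholder, and the KQL assumes the firewall logs to the legacy AzureDiagnostics table; if resource-specific firewall tables are enabled, the query needs adjusting:

```python
# Sketch: list recent denied traffic from the Azure Firewall's Log Analytics
# workspace. WORKSPACE_ID is a placeholder, and the query assumes the legacy
# AzureDiagnostics table; resource-specific tables need a different query.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

query = """
AzureDiagnostics
| where Category in ("AzureFirewallApplicationRule", "AzureFirewallNetworkRule")
| where msg_s contains "Deny"
| project TimeGenerated, msg_s
| order by TimeGenerated desc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(days=1))

# Print each denied-traffic row; look for FQDNs related to Databricks.
for table in response.tables:
    for row in table.rows:
        print(list(row))
```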
I'll check, but progress is slow - I keep running from one error into another.
Just spinning up a cluster. Will let you know how I get on.
Great, thanks.
In the firewall logs I can see:
Will check the rule collection versus the published rules. We recently added a note here - https://microsoft.github.io/AzureTRE/unreleased/tre-templates/workspace-services/databricks/ : "This service uses a JSON file to store the various network endpoints required by Databricks to function. If you hit networking related issues when deploying or using Databricks, please ensure this file https://github.com/microsoft/AzureTRE/blob/main/templates/workspace_services/databricks/terraform/databricks-udr.json contains the appropriate settings for the region you are using. The required settings for each region can be extracted from this document: https://learn.microsoft.com/azure/databricks/resources/supported-regions."
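As a rough illustration (not from the thread), a small script that prints the entries the bundled file holds for a given region so they can be compared by eye against the supported-regions document. The layout assumed here - a top-level key per region mapping to endpoint values - is a guess, so adapt it to the actual schema of databricks-udr.json:

```python
# Sketch: dump the region-specific entries from the bundled endpoint file so
# they can be compared against the published Databricks endpoints.
# NOTE: the assumption that the JSON is keyed by region is hypothetical;
# check the real structure of databricks-udr.json before relying on this.
import json

REGION = "westeurope"  # region used in this issue
UDR_FILE = "templates/workspace_services/databricks/terraform/databricks-udr.json"

with open(UDR_FILE) as fh:
    udr = json.load(fh)

entries = udr.get(REGION, {})
if not entries:
    print(f"No entries found for region '{REGION}' - check the file structure.")
for name, value in sorted(entries.items()):
    print(f"{name}: {value}")
```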
Comparing against the JSON file I see some differences, which might be recent changes or errors made when the JSON file was created:
metastore:
artifact:
Added them, now left with
Will look to PR a fix.
@tljjvogten If you can test the changes in the PR, it would be appreciated. Works for me. The easiest way is to create a copy of the Databricks workspace service from the PR branch into a
Thanks for reporting this.
@marrobi we applied the fix and managed to get a Databricks cluster up and running. Thanks for your quick response!
Description
In my Azure TRE deployment I am trying to start a cluster in Databricks. I've added all the Databricks services and they are working fine. However, I get the error shown below. Am I missing something?
Spark failed to start: Driver unresponsive. Possible reasons: library conflicts, incorrect metastore configuration, and init script misconfiguration.
Steps
The steps I have tried are: