feat: add Databricks Serverless support #3001

Merged: eakmanrq merged 3 commits into main from eakmanrq/add_databricks_serverless_support on Aug 14, 2024

Conversation

eakmanrq
Contributor

@eakmanrq eakmanrq commented Aug 12, 2024

Adds support for using Serverless compute from a notebook. I tested by running plan/apply against a Serverless environment.

Two key changes:

  • Serverless does not support global temp views but does support session temp views. A global temp view is scoped to the Spark application, while a session temp view is scoped to a single Spark session. I don't believe we share any state across sessions, so switching to session-scoped views should be fine.
  • Serverless throws its own exception when SparkContext is accessed but not available, so this exception is now handled too (we already had logic in place to support Databricks Connect).
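The second change can be pictured with a small sketch. It assumes nothing about the real exception types: `ConnectContextError` and `ServerlessContextError` are hypothetical stand-ins, the session classes are stubs, and `get_spark_context_or_none` is an illustrative helper rather than code from this PR.

```python
from typing import Any, Optional


class ConnectContextError(Exception):
    """Hypothetical stand-in for the error already handled for Databricks Connect."""


class ServerlessContextError(Exception):
    """Hypothetical stand-in for the error Serverless raises."""


# Every error type that means "this runtime has no SparkContext".
CONTEXT_UNAVAILABLE_ERRORS = (ConnectContextError, ServerlessContextError)


def get_spark_context_or_none(spark: Any) -> Optional[Any]:
    """Return spark.sparkContext, or None when the runtime does not expose it."""
    try:
        return spark.sparkContext
    except CONTEXT_UNAVAILABLE_ERRORS:
        return None


class StubClassicSession:
    """Stub session whose SparkContext is available."""

    sparkContext = "classic-context"


class StubServerlessSession:
    """Stub session that raises when SparkContext is accessed."""

    @property
    def sparkContext(self) -> Any:
        raise ServerlessContextError("SparkContext is not available")
```

Callers can then branch on `None` instead of wrapping every SparkContext access in its own try/except.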

Note: This PR is built on the current constraint that the Python SQL Connector cannot query Serverless. If that changes, having Serverless use session temp views would become a problem for anyone mixing the SQL Connector with Databricks Connect. The global temp view constraint may be lifted by then; if not, the hybrid mode that uses both would likely need to be disabled on Serverless.

@eakmanrq eakmanrq force-pushed the eakmanrq/add_databricks_serverless_support branch 2 times, most recently from d1679a1 to da254fe Compare August 12, 2024 22:01
@eakmanrq
Contributor Author

Found out that when using the Databricks SQL Connector together with Databricks Connect, you need global temp views, since the two share temp objects across sessions (but still within the same application). Therefore I now use global temp views in all cases except Databricks Serverless.

Also added serverless support to databricks-connect.
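The rule described in this comment (global temp views everywhere except Serverless) could be sketched roughly as follows. `createOrReplaceTempView` and `createOrReplaceGlobalTempView` are real PySpark DataFrame methods, and `global_temp` is the database name used in the diff; `register_temp_view` and `RecordingDataFrame` are hypothetical names introduced for illustration, and the stub stands in for a real DataFrame so the sketch runs without Spark.

```python
from typing import Any, List, Tuple

GLOBAL_TEMP_DB = "global_temp"


def register_temp_view(df: Any, view_name: str, use_serverless: bool) -> str:
    """Register df as a temp view and return the name to query it by."""
    if use_serverless:
        # Session-scoped view: visible only inside the current Spark session.
        df.createOrReplaceTempView(view_name)
        return view_name
    # Application-scoped view: shared across sessions via the global temp database.
    df.createOrReplaceGlobalTempView(view_name)
    return f"{GLOBAL_TEMP_DB}.{view_name}"


class RecordingDataFrame:
    """Stub DataFrame that records which view method was called."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def createOrReplaceTempView(self, name: str) -> None:
        self.calls.append(("session", name))

    def createOrReplaceGlobalTempView(self, name: str) -> None:
        self.calls.append(("global", name))
```

Returning the qualified name keeps the caller from having to know which scope was chosen.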

@@ -260,8 +264,14 @@ def _df_to_source_queries(

        def query_factory() -> Query:
            temp_table = self._get_temp_table(target_table or "spark", table_only=True)
            df.createOrReplaceGlobalTempView(temp_table.sql(dialect=self.dialect))  # type: ignore
            temp_table.set("db", "global_temp")
            if self.use_serverless:
Member

This looks Databricks-specific. Should we just override query_factory in the corresponding adapter?

Member

E.g.:

        def query_factory() -> Query:
            temp_table = self._get_temp_table(target_table or "spark", table_only=True)
            if self.use_serverless:
                ...
            else:
                super()....

Contributor Author

Hmm, I think I will just override it completely, since _use_spark_session is also Databricks-specific.

Change: 91cb2a1
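The two options discussed in this thread (special-casing Serverless and deferring to the base class, versus overriding completely) might look like the sketch below. All class and method names here are hypothetical and do not reflect the project's actual adapter API; the point is only the structural difference between the two overrides.

```python
class BaseAdapter:
    """Hypothetical generic Spark adapter: always uses global temp views."""

    def temp_view_reference(self, name: str) -> str:
        return f"global_temp.{name}"


class DelegatingAdapter(BaseAdapter):
    """Reviewer's suggestion: handle Serverless, otherwise defer to the base class."""

    def __init__(self, use_serverless: bool) -> None:
        self.use_serverless = use_serverless

    def temp_view_reference(self, name: str) -> str:
        if self.use_serverless:
            return name
        return super().temp_view_reference(name)


class FullOverrideAdapter(BaseAdapter):
    """Author's choice: own the whole method so all platform-specific logic sits together."""

    def __init__(self, use_serverless: bool) -> None:
        self.use_serverless = use_serverless

    def temp_view_reference(self, name: str) -> str:
        # No call to super(): every branch is decided in this adapter.
        return name if self.use_serverless else f"global_temp.{name}"
```

Both variants return the same references; the full override trades a little duplication for keeping the base class free of platform-specific conditions.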

@eakmanrq eakmanrq force-pushed the eakmanrq/add_databricks_serverless_support branch 3 times, most recently from 3cd5b9b to 52e8a8e Compare August 14, 2024 03:12
@eakmanrq eakmanrq force-pushed the eakmanrq/add_databricks_serverless_support branch from 52e8a8e to 4417423 Compare August 14, 2024 03:25
@eakmanrq eakmanrq merged commit d2e22f7 into main Aug 14, 2024
21 checks passed
@eakmanrq eakmanrq deleted the eakmanrq/add_databricks_serverless_support branch August 14, 2024 18:57