
Test for Koalas DataFrames #5928

Merged: dbeatty10 merged 1 commit into main from dbeatty/test-koalas-df on Sep 26, 2022

dbeatty10 (Contributor) commented Sep 24, 2022

resolves #5927

Continuing work merged in #5906

Description

Adds a test case for Koalas DataFrames (the predecessor to pandas-on-Spark DataFrames, also known as pandas API on Spark DataFrames).
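
As a rough illustration of what such a test exercises, the sketch below stores a Koalas-based Python model as a string, the same way the fixtures in the diff do, and checks that it parses. The constant name `KOALAS_MODEL_SKETCH`, the sample data, and the `materialized="table"` config are assumptions for illustration, not the contents of the merged `KOALAS_MODEL` fixture.

```python
import ast

# Hypothetical fixture, NOT the merged KOALAS_MODEL: a dbt Python model that
# returns a Koalas DataFrame. The data and config values are made up.
KOALAS_MODEL_SKETCH = '''
import databricks.koalas as ks


def model(dbt, session):
    dbt.config(materialized="table")

    # Build a small Koalas DataFrame; dbt persists whatever the model returns.
    df = ks.DataFrame(
        {"city": ["Santiago", "Bogota"], "country": ["Chile", "Colombia"]}
    )

    return df
'''

# Only parse the source here, so databricks.koalas does not need to be
# installed locally; actually running the model requires a cluster with Koalas.
tree = ast.parse(KOALAS_MODEL_SKETCH)
function_names = [
    node.name for node in tree.body if isinstance(node, ast.FunctionDef)
]
print(function_names)  # prints ['model']
```

Parsing the string only confirms that the fixture is valid Python with a `model` function; whether the DataFrame is saved correctly still has to be verified against a real adapter.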

Group of related pull requests

Checklist

@@ -57,6 +57,25 @@ def model(dbt, session):
    return df
"""

KOALAS_MODEL = """
import databricks.koalas as ks
Contributor


@dbeatty10 It seems like this is only for Databricks. I would put this one in a separate test in dbt-spark, maybe somewhere here? A new test class should be good enough.

Contributor Author


I think that Koalas DataFrames can also be used on BigQuery even though the package is published by Databricks.

I did manual testing of Koalas on BigQuery here:
dbt-labs/dbt-bigquery#321

ueshin

Koalas is a separate package from Spark and should be installed in the target cluster to pass this test.
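
One way to check this before running the test is to look for the package without importing it. The following is a minimal sketch, not something from this PR: the helper names `module_available` and `pick_pandas_on_spark_api` are hypothetical, and falling back to `pyspark.pandas` (the successor API shipped with Spark 3.2+) is an assumption about what a caller might want.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first; a missing parent raises
        # ModuleNotFoundError, which is a subclass of ImportError.
        return False


def pick_pandas_on_spark_api(candidates=("databricks.koalas", "pyspark.pandas")):
    """Return the first candidate module that is installed, or None."""
    for name in candidates:
        if module_available(name):
            return name
    return None


# Reports which API (if any) is installed where this code runs.
print(pick_pandas_on_spark_api())
```

Note that this only inspects the environment where the code runs, so it would need to execute on the target cluster to say anything about that cluster.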

Contributor Author


Thank you for this info @ueshin 👍

Do you have any instructions we could follow for installing databricks.koalas on the cluster we use for our CI for dbt-spark (via CircleCI)?

I have this PR open for dbt-spark which you will automatically inherit for dbt-databricks.

ueshin

When the target cluster runs DBR (Databricks Runtime) >=7.1,<11.0, Koalas is pre-installed and should be available; otherwise, you need to install it through the cluster UI.
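
The version range above can be expressed as a small check. This is a sketch under assumptions: the function name `koalas_preinstalled` is hypothetical, and the version string is assumed to start with `major.minor` (anything after that, such as a Scala suffix, is ignored).

```python
def koalas_preinstalled(dbr_version: str) -> bool:
    """Return True if `dbr_version` falls in the >=7.1,<11.0 range quoted above."""
    # Compare only "major.minor"; a string without a minor part raises ValueError.
    major_text, minor_text = dbr_version.split(".")[:2]
    version = (int(major_text), int(minor_text))
    return (7, 1) <= version < (11, 0)


print(koalas_preinstalled("10.4.x-scala2.12"))  # prints True
print(koalas_preinstalled("11.0"))  # prints False
```

Comparing tuples of integers avoids the pitfall of string comparison, where "10.4" would sort before "7.1".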

[Screenshot attached: Screen Shot 2022-09-26 at 13 35 42]

ChenyuLInx self-requested a review on September 26, 2022 17:50
dbeatty10 merged commit e58edaa into main on Sep 26, 2022
dbeatty10 deleted the dbeatty/test-koalas-df branch on September 26, 2022 18:41
Labels: cla:yes, Skip Changelog (skips the GHA check for a changelog file)
Linked issue: [CT-1238] [Feature] Test for saving Koalas DataFrame in Python models
3 participants