
refactor submission method and add command API as default #442

Merged

merged 4 commits into main on Aug 30, 2022

Conversation

@ChenyuLInx (Contributor) commented Aug 29, 2022

resolves #424 #419

Description

Use the Command API as the default Python model submission method. Users can still create a notebook by adding submission_method: 'notebook' to the model's config.
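For illustration, opting a single model back into notebook-based submission might look like the sketch below. This is a config fragment, not a runnable model: the model name and body are hypothetical, and only the submission_method key comes from the PR description.

```python
# my_python_model.py — a hypothetical dbt Python model opting out of
# the new Command API default and back into notebook submission.
def model(dbt, session):
    dbt.config(submission_method="notebook")  # default is now the Command API
    df = dbt.ref("upstream_model")
    return df
```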

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have run changie new to create a changelog entry

@cla-bot added the cla:yes label Aug 29, 2022
@github-actions (Contributor)
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-spark contributing guide.

@ChenyuLInx (Contributor Author)

@ueshin Can you also review this? I somehow can't add you as a reviewer.

@ChenyuLInx ChenyuLInx requested review from jtcohen6 and stu-k August 30, 2022 14:47
DEFAULT_POLLING_INTERVAL = 3


class BasePythonJobHelper:
Contributor:

To avoid getting these massive .py files, how do folks feel about putting the classes in separate files?

Contributor Author:

You mean the base class in one place, and the ones inheriting it in separate files? I don't really feel like a 300-line file is massive. Is there any advantage to breaking it down into three 100-line files?

Contributor:

Oh, definitely not massive as of now, but if the logic in these classes grows, this could get large quite quickly.

Contributor Author:

Makes sense! But the logic here will likely stay the same, and we are probably looking at refactoring it into a multi-adapter format in the longer term, or starting to adopt dbt-databricks for Databricks-specific submission. So I am inclined to leave it as is for now to avoid over-optimizing.

json={
"path": path,
"content": b64_encoded_content,
"language": "PYTHON",
Contributor:

Not sure it matters, but it looks like 'language' is uppercase here and lowercase elsewhere; maybe put this in a static variable?

Contributor Author:

I put the rest into a static variable, but this one needs to stay uppercase: it goes to a different API, which actually requires uppercase.
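The distinction raised in this thread can be sketched as follows. build_import_payload is a hypothetical helper, and the extra payload fields are assumptions modeled on the Databricks Workspace Import API, which takes the language value in uppercase (unlike the execution-context API used elsewhere in the file, which takes lowercase names):

```python
import base64

# Hypothetical sketch of the payload built above. The Workspace Import
# API expects `language` as "PYTHON" (uppercase), so this one value
# cannot share the lowercase static variable used for the other calls.
def build_import_payload(path: str, source_code: str) -> dict:
    b64_encoded_content = base64.b64encode(source_code.encode("utf-8")).decode("utf-8")
    return {
        "path": path,
        "content": b64_encoded_content,
        "language": "PYTHON",  # must stay uppercase for this endpoint
        "overwrite": True,
        "format": "SOURCE",
    }
```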

self.polling_interval = DEFAULT_POLLING_INTERVAL

def get_timeout(self):
timeout = self.parsed_model["config"].get("timeout", 60 * 60 * 24)
Contributor:

maybe move the timeout into a DEFAULT_TIMEOUT var?
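The suggestion could look like the sketch below. The 24-hour default matches the literal 60 * 60 * 24 in the diff; the constructor shape and the validation check are assumptions for illustration.

```python
DEFAULT_TIMEOUT = 60 * 60 * 24  # 24 hours, in seconds

class BasePythonJobHelper:
    def __init__(self, parsed_model: dict) -> None:
        self.parsed_model = parsed_model

    def get_timeout(self) -> int:
        # Fall back to the module-level default when the model config
        # does not set an explicit timeout.
        timeout = self.parsed_model["config"].get("timeout", DEFAULT_TIMEOUT)
        if timeout <= 0:
            raise ValueError("Timeout must be a positive integer")
        return timeout
```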

context.destroy(context_id)


python_submission_helpers = {
Contributor:

nit: upper case as this is a "global" var?
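The dict in question is a registry mapping submission_method strings to helper classes. A minimal sketch of that dispatch pattern: BasePythonJobHelper appears in the diff, but the subclass names, the method keys, and the get_helper lookup are assumptions for illustration.

```python
class BasePythonJobHelper:
    """Shared submission/polling logic for Python models."""

class JobClusterPythonJobHelper(BasePythonJobHelper):
    """Hypothetical helper that submits the model as a one-off job."""

class CommandApiPythonJobHelper(BasePythonJobHelper):
    """Hypothetical helper that submits through the Command API."""

# The module-level "global" registry the reviewer suggests upper-casing.
python_submission_helpers = {
    "job_cluster": JobClusterPythonJobHelper,
    "commands": CommandApiPythonJobHelper,
}

def get_helper(parsed_model: dict) -> type:
    # Default to the Command API when the model config does not
    # specify a submission_method.
    method = parsed_model.get("config", {}).get("submission_method", "commands")
    return python_submission_helpers[method]
```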

@ChenyuLInx (Contributor Author)

The incremental test failures in the integration tests are being fixed in #445.

self.polling_interval = DEFAULT_POLLING_INTERVAL

def get_timeout(self):
timeout = self.parsed_model["config"].get("timeout", DEFAULT_TIMEOUT)
Contributor:

Will we no longer use the timeout passed to submit_python_job?

Contributor Author (@ChenyuLInx, Aug 30, 2022):

I think this is probably the cleaner way to set a timeout in the end, since it pulls it from config. It's really just a placeholder now. Any thoughts or suggestions?

And we don't really want users to call submit_python_job directly; see dbt-labs/dbt-core#5596.

Contributor Author:

I am going to just merge this; feel free to open an issue if we feel another way to handle the timeout is better!
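How the polling interval and the configured timeout interact can be sketched as follows. DEFAULT_POLLING_INTERVAL = 3 matches the diff; check_status, the status strings, and the loop shape are assumptions for illustration.

```python
import time

DEFAULT_POLLING_INTERVAL = 3  # seconds between status checks

def poll_until_done(check_status, timeout, interval=DEFAULT_POLLING_INTERVAL):
    # Poll a remote command until it reaches a terminal state or the
    # configured timeout elapses.
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = check_status()
        if status in ("Finished", "Error", "Cancelled"):
            return status
        time.sleep(interval)
    raise TimeoutError("Command did not finish within the configured timeout")
```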

@ChenyuLInx (Contributor Author)

Merging, since it only fails on one test, which is being fixed in #445.

@ChenyuLInx ChenyuLInx merged commit cef098f into main Aug 30, 2022
@ChenyuLInx ChenyuLInx deleted the enhancement/update-submission-method branch August 30, 2022 23:49
@ueshin (Contributor) commented Aug 31, 2022

Sorry, now I'm wondering why run_dbt(["run"]) in the test TestChangingSchemaSpark doesn't fail?

Never mind, maybe I'm missing something in dbt-databricks.


Successfully merging this pull request may close these issues.

[CT-1021] Avoid creating notebook as the default way of running python model