Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add notebooks to default package suffixes #663

Closed
wants to merge 1 commit into from

Conversation

ryan-williams
Copy link
Contributor

(factored out of #612)

Notebooks should be included in code packages by default, I think?

We've put this in our scripts that invoke Metaflow, to get the desired behavior:

os.environ["METAFLOW_DEFAULT_PACKAGE_SUFFIXES"] = ".py,.R,.RDS,.ipynb"

@ryan-williams ryan-williams mentioned this pull request Aug 24, 2021
@savingoyal
Copy link
Collaborator

@ryan-williams
Copy link
Contributor Author

@ryan-williams:

maybe it’s borne of a misunderstanding on my end about the way notebooks are typically used in Metaflow.
We have a step that uses papermill to execute some notebooks on data artifacts (computed by previous steps in the flow); the notebooks in that case are “source code” analogous to .py files that would be included in the source package sent to Batch.

We are able to get them included by augmenting the METAFLOW_DEFAULT_PACKAGE_SUFFIXES env var, but the tutorials’ dealings with notebooks seemed similar to this, so I thought it might be necessary there as well, and useful to add by default. Let me know if that’s not the case.

@savingoyal:

The tutorials use notebooks as a scratch space to access the state of the workflow - outside of the workflow execution. That's the reason why we don't package .ipynb files. Also, these files can be rather voluminous (given that they may contain execution state). Unless there is a different pressing concern for including notebooks by default, we should err on the side of caution and tightly limit what gets packaged by default.

@ryan-williams:

gotcha, I see that now. I assumed they just mirrored the .py files as an alternate way to run the flows. My mistake, I’ll close that PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants