
Feature request: Auto-execute if locally dependent file is changed #87

Open
mpacer opened this issue Dec 22, 2016 · 4 comments
mpacer commented Dec 22, 2016

So I'm guessing this is a pretty much impossible feature request, but I was running into a tiny bug while responding to jupyter/nbconvert#500 and testing the resulting docs.

I changed the template_structure.html file and ran make clean but nothing was appearing in the resulting built docs.

I noticed pretty early on that this html was being embedded within another page, which I figured shouldn't be a problem if it was being "included" in some way.

Eventually I realised that it was a .ipynb file that was displaying it with

from IPython.display import HTML, display
# read the HTML file and render it inline in the notebook output
with open('template_structure.html') as f:
    display(HTML(f.read()))

And in looking into nbsphinx's execution logic, I found it was not re-executing the notebook (even though this dependency had changed) because the notebook already had outputs, so the auto-execute was never triggered.

It would be nice if nbsphinx could detect that a dependency exists and then reëxecute the notebook when that is needed, not only when there are no outputs.

How the notebook/nbconvert could make this more feasible

This would probably be easier if we stored a list of file-system dependencies required by the notebook (perhaps saving them in the metadata when executing the command that created the dependency). That amount of inspection is probably overkill for the average use case, but it might be possible to have an nbconvert preprocessor that does something like this and returns the notebook in place with the relevant metadata.
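As a rough illustration of the metadata idea, here is a minimal sketch (all names hypothetical, not actual nbconvert API) that scans code cells for open() calls and records the file names in the notebook metadata; a real preprocessor would need to track actual file accesses, not just this simple pattern:

```python
import re

# Hypothetical helper: find file names passed to open() in code cells
# and store them under a "dependencies" key in the notebook metadata.
OPEN_RE = re.compile(r"""open\(\s*['"]([^'"]+)['"]""")

def record_dependencies(nb):
    """Store a sorted list of file names referenced via open() in nb['metadata']."""
    deps = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            deps.update(OPEN_RE.findall(cell.get("source", "")))
    nb.setdefault("metadata", {})["dependencies"] = sorted(deps)
    return nb

# Plain dicts stand in for a parsed .ipynb file here.
nb = {
    "cells": [
        {"cell_type": "code",
         "source": "with open('template_structure.html') as f:\n"
                   "    display(HTML(f.read()))"},
        {"cell_type": "markdown", "source": "Some text"},
    ],
    "metadata": {},
}
record_dependencies(nb)
print(nb["metadata"]["dependencies"])  # ['template_structure.html']
```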

Aside on partial execution

Since we're talking about pipe dream features, if you were able to do this, one thing that might make this even more cute and efficient would be to only execute up to the latest cell whose dependencies have changed. I.e., you pass an integer via a flag to an nbconvert --execute command and it builds up to and including that cell (nbconvert --execute --upto 5 would build cells 0 through 5). I don't think anything like this exists in the execute preprocessor today, but this doesn't seem too unreasonable a feature to implement on the nbconvert side.
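A minimal sketch of what such an --upto option could do, using plain dicts and a stub executor rather than the real ExecutePreprocessor (which has no such option today; execute_upto and run_cell are made-up names):

```python
# Hypothetical partial execution: run code cells 0..upto (inclusive,
# counting only code cells) and leave the remaining cells untouched.
def execute_upto(nb, upto, run_cell):
    code_index = 0
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        if code_index <= upto:
            cell["outputs"] = [run_cell(cell["source"])]
        code_index += 1
    return nb

nb = {"cells": [{"cell_type": "code", "source": f"cell {i}", "outputs": []}
                for i in range(8)]}
# A stub executor: a real one would send the source to a kernel.
execute_upto(nb, 5, run_cell=lambda src: f"ran {src}")
executed = [c for c in nb["cells"] if c["outputs"]]
print(len(executed))  # 6 -> cells 0 through 5 were run
```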

mgeier (Member) commented Dec 22, 2016

I'm sure there are a million things that are more urgent, but this is definitely an interesting proposal.

BTW, I love your use of the diaeresis, I haven't seen that before!

I think we should separate two things more clearly:

  1. Whether Sphinx re-parses a source file (in our case a notebook) even if it was parsed before
  2. Whether nbsphinx auto-executes a notebook during the Sphinx parsing process

I think we could definitely work on the first, I'm not so sure about the second.

Also, note that Sphinx already stores state between invocations. Should we use this state or create our own? Or both?

Another question is whether this state should only be valid on a single machine, or if it should be shared between different users and machines.
I think only the former is feasible.

The problem with storing the dependencies in the notebook metadata is that when a cell is deleted, it might leave obsolete dependency information.
Having the dependency information for each cell would probably be a little better, but you could still change the code in an input cell without executing the cell, again leaving obsolete dependency information, this time in the cell metadata.

Apart from that, I think this would make the auto-execute heuristics less clear to the user. Maybe people would be surprised if their (probably long-running) notebook is executed only because some file on their hard disk had changed.
I'm not totally against this, but I would probably try to have two different "auto" options: "execute if no outputs are there" and "execute even if there are outputs, but only if some dependencies have changed".
But OTOH, this seems a bit much.

Also, as I just realized while writing this comment, in your motivating case you wouldn't need to use "auto"; you could just use the "always" setting and the notebook would still only be executed when something changes (since Sphinx wouldn't parse it again anyway).
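For reference, the "auto"/"always" choice mentioned here is, as I understand it, nbsphinx's nbsphinx_execute setting in conf.py (assuming your nbsphinx version supports it):

```python
# In Sphinx's conf.py: nbsphinx's execution policy.
# 'auto' (default) runs only notebooks that have no stored outputs,
# 'always' runs every notebook Sphinx decides to (re-)parse,
# 'never' runs none.
nbsphinx_execute = 'always'
```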

I don't know if a separate preprocessor would make sense, but this could be built into the preprocess() method of the ExecutePreprocessor. I guess it should be optional, because it will have to do some extra work, watching file accesses and whatnot.

Specifically, the preprocessor could put any dependency information (probably a list of file names) into the resources dictionary.
From there, it would be easy to pass the dependency files to Sphinx.
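A sketch of that handoff, with plain classes and dicts standing in for the actual nbconvert Preprocessor machinery (DependencyRecorder is a made-up name, not real nbconvert code):

```python
# Hypothetical preprocessor: copy dependency file names from the
# notebook metadata into the resources dict, where a Sphinx
# extension could pick them up after conversion.
class DependencyRecorder:
    """Stand-in for an nbconvert Preprocessor subclass."""
    def preprocess(self, nb, resources):
        deps = nb.get("metadata", {}).get("dependencies", [])
        resources.setdefault("dependencies", []).extend(deps)
        return nb, resources

nb = {"metadata": {"dependencies": ["template_structure.html"]}}
resources = {}
DependencyRecorder().preprocess(nb, resources)
print(resources["dependencies"])  # ['template_structure.html']
```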

With this:

  • You would still have to execute the notebooks at least once, but any subsequent Sphinx calls would only execute a notebook if its dependencies change (or if the notebook itself changes).
  • Still, only "empty" notebooks would be auto-executed.
  • Notebooks with the "execute always" setting would also take advantage of those dependencies (this would probably be relevant for your motivating example).

The list of dependency files might be quite large. I guess e.g. plotting something in matplotlib will read many source and data files. But since this list is only stored temporarily by Sphinx, the number of files shouldn't be a problem.

mgeier (Member) commented Dec 22, 2016

re partial execution: I think that's going too far. But anyway, such a thing should not be discussed here but rather at nbconvert or in a larger Jupyter context.

Initially, I was thinking about an "execute" metadata entry per cell, but then I simply went with the ExecutePreprocessor. I still think it's better this way, and I don't think anyone would actually use the per-cell variant, since it would be quite confusing.

mgeier (Member) commented Jan 19, 2017

@mpacer Any more thoughts? Comments?

mpacer (Author) commented Jan 24, 2017

Haven't thought about it a lot since the New Year, but I'll read over this again sometime in the next few days and try to think about steps forward.
