I would like to document my thoughts on caching, partly inspired by @chrisjsewell's exploration of different approaches. I hope these are useful.
Requirements
I consider the cache limited in scope to the task of building a book/site out of a collection of input files containing code to be executed and possibly other scripts.
Rebuilding the complete cache may take a few minutes, but is unlikely to take much longer.
I expect that the execution will use the notebook abstraction, i.e. the input to the execution is a sequence of notebooks, with each notebook containing a kernel name and a sequence of cells to be executed.
The notebooks must adhere to the following contract:
They should rely on assets in a controlled location (e.g. same folder as the source files).
Their execution results should be the same regardless of the order in which the notebooks are executed.
The notebooks may write additional files in a different specified location.
The caching logic should not try to determine whether changes to external dependencies (scripts/installed libraries) have invalidated the outputs of a notebook, because that is too complex to implement.
The end users shouldn't need to learn how to operate the cache, beyond "wipe it clean".
Minimal implementation
Create a folder for the cache within the sphinx build directory.
Whenever the build process encounters a notebook, it hashes (kernel_name, code_cells), creates a subfolder named after that hash, links the notebook execution context from that folder, executes the notebook, and writes the executed notebook to that folder.
Invalidation is either sphinx clean or deleting the cache folder.
When collecting the execution artifacts, sphinx copies all files from these folders.
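The hashing and folder-creation step could be sketched roughly as follows. This is only an illustration under assumptions, not a proposed API: the function names, the choice of SHA-256 over a JSON encoding, and the `executed.ipynb` marker file are all hypothetical, and the actual execution of the notebook is left out.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical name of the executed notebook stored in each cache subfolder.
EXECUTED_NAME = "executed.ipynb"


def notebook_key(kernel_name, code_cells):
    """Return a stable hex digest of (kernel_name, code_cells).

    Only the kernel name and the code cell sources are hashed, so edits to
    markdown cells or changes in external dependencies do not change the key.
    """
    payload = json.dumps([kernel_name, list(code_cells)], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_folder(cache_root, kernel_name, code_cells):
    """Return (folder, hit) for a notebook.

    `folder` is the per-notebook subfolder inside the cache, created if needed.
    `hit` is True only if an executed notebook is already stored there, so a
    folder left behind by an interrupted run is not mistaken for a valid entry.
    """
    folder = Path(cache_root) / notebook_key(kernel_name, code_cells)
    folder.mkdir(parents=True, exist_ok=True)
    hit = (folder / EXECUTED_NAME).is_file()
    return folder, hit
```

On a miss, the build would execute the notebook with `folder` as its working location and write the result to `folder / EXECUTED_NAME`; on a hit it would reuse the files already there. Wiping the cache is then just deleting `cache_root`.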
Just to clarify, the cache has nothing to do with sphinx. Sphinx may use it, but it should be able to be used independently.
Indeed, keeping the cache folder within sphinx build folder is how I imagine sphinx could use the cache.
You mean re-running all the notebooks? Well, Jupinx takes a few hours to rebuild all theirs, so I think that's a bit optimistic.
Fair enough. I have a course that takes about an hour to build sequentially, indeed.
Additions/observations based on the above:
Isolating the outputs of each notebook into a folder is right now missing from Proposal for git based cache #6. Without that we cannot tell if a non-notebook artifact should be used or not.
I am wondering whether these expectations about what exactly a notebook produces leave the cache useful beyond book building. They seem rather specific.
I agree that the option of manual invalidation seems useful for very long courses. How about making the cache location configurable? Computationally cheap projects could store everything in the sphinx build folder and benefit from easier cleanup, while the more expensive ones would need more fine-grained cache control and would keep the cache outside. At the same time, if there's a good CLI external to sphinx that doesn't advertise fine-grained cache manipulation too much, an external cache folder doesn't hurt either.
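To make the configurable-location idea concrete, here is a minimal sketch. The function name, the `configured` argument and the default folder name `notebook_cache` are assumptions made up for illustration; they are not existing sphinx options.

```python
from pathlib import Path


def resolve_cache_dir(build_dir, configured=None):
    """Pick the cache folder.

    If the project configures an explicit location, use it (expensive projects
    can keep the cache outside the build folder so that cleaning the build does
    not wipe it). Otherwise fall back to a folder inside the build directory,
    which gets removed together with the rest of the build output.
    """
    if configured:
        return Path(configured).expanduser()
    # Hypothetical default folder name inside the build directory.
    return Path(build_dir) / "notebook_cache"
```

With this, a cheap project gets `_build/notebook_cache` without configuring anything, and a long-running course can point the cache at a location that survives a clean build.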