Cache requirements and a minimal implementation #7

akhmerov · 2020-02-22T14:52:06Z

I would like to document my thoughts on caching, partly inspired by @chrisjsewell's exploration of different approaches. I hope they are useful.

Requirements

I consider the cache limited in scope to the task of building a book/site out of a collection of input files containing code to be executed and possibly other scripts.
Rebuilding the complete cache may take a few minutes, but is unlikely to take much longer.
I expect that the execution will use the notebook abstraction, i.e. the input to the execution is a sequence of notebooks, with each notebook containing a kernel name and a sequence of cells to be executed.
The notebooks must adhere to the following contract:
- They should rely on assets in a controlled location (e.g. same folder as the source files).
- Their execution result should be the same regardless of the order in which the notebooks are executed.
- The notebooks may write additional files in a different specified location.
The caching logic should not try to determine whether external dependencies (scripts, installed libraries) have invalidated a notebook's outputs, because that is too complex to implement.
End users shouldn't need to learn how to operate the cache, beyond "wipe it clean".
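
To make the notebook abstraction above concrete, here is a minimal sketch of the input an executor would receive: a kernel name plus the code cell sources, extracted from a notebook in the nbformat v4 JSON layout. The names `NotebookInput` and `from_nbformat_dict`, and the fallback kernel `python3`, are my own assumptions for illustration, not part of any existing API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NotebookInput:
    """Hypothetical container: everything the executor needs from a notebook."""
    kernel_name: str
    code_cells: Tuple[str, ...]


def from_nbformat_dict(nb: dict, default_kernel: str = "python3") -> NotebookInput:
    """Extract the kernel name and code cell sources from a v4 notebook dict.

    Markdown cells and stored outputs are ignored on purpose: only the
    kernel and the code determine what gets executed.
    The default kernel name is an assumption for notebooks without a kernelspec.
    """
    kernel_name = (
        nb.get("metadata", {}).get("kernelspec", {}).get("name", default_kernel)
    )
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat allows the source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cells.append(source)
    return NotebookInput(kernel_name, tuple(cells))
```

Keeping the structure immutable means it can serve directly as the input to a hash or as a dictionary key.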

Minimal implementation

Create a folder for the cache within the Sphinx build directory.
Whenever the build process encounters a notebook, it hashes (kernel_name, code_cells), creates a subfolder named after that hash, links the notebook execution context from that folder, executes the notebook, and writes the executed notebook to that folder.
Invalidation means either running a Sphinx clean or deleting the cache folder.
When collecting the execution artifacts, Sphinx copies all files from these folders.
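
The steps above can be sketched roughly as follows. This is only an illustration under my own assumptions: the function names, the `executed.ipynb` file name, and the `execute` callback are hypothetical, linking the execution context is left out, and skipping execution when the subfolder already holds a result is what I take "cache" to imply here.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Sequence


def notebook_hash(kernel_name: str, code_cells: Sequence[str]) -> str:
    """Hash (kernel_name, code_cells); JSON keeps the cell boundaries unambiguous."""
    payload = json.dumps([kernel_name, list(code_cells)], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_execute(
    cache_root,
    kernel_name: str,
    code_cells: Sequence[str],
    execute: Callable[[str, Sequence[str], Path], str],
) -> Path:
    """Return the cache subfolder for a notebook, executing it only on a miss.

    `execute` is a hypothetical callback: it runs the cells with the given
    folder as its working location and returns the executed notebook as text.
    """
    folder = Path(cache_root) / notebook_hash(kernel_name, code_cells)
    result = folder / "executed.ipynb"  # assumed file name
    if result.exists():
        return folder  # cache hit: nothing to run
    folder.mkdir(parents=True, exist_ok=True)
    result.write_text(execute(kernel_name, code_cells, folder), encoding="utf-8")
    return folder
```

Invalidation then stays as simple as described: deleting `cache_root` forces every notebook to be executed again on the next build.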

chrisjsewell · 2020-02-22T17:10:53Z

Hmm, I agree with ~most of these points.

> Rebuilding the complete cache may take a few minutes, but is unlikely to take much longer.

You mean re-running all the notebooks? Well, Jupinx takes a few hours to rebuild all theirs, so I think that's a bit optimistic.

> Create a folder for the cache within the Sphinx build directory.

Just to clarify, the cache has nothing to do with sphinx. Sphinx may use it, but it should be able to be used independently.

akhmerov · 2020-02-22T19:45:10Z

> Just to clarify, the cache has nothing to do with sphinx. Sphinx may use it, but it should be able to be used independently.

Indeed, keeping the cache folder within the Sphinx build folder is how I imagine Sphinx could use the cache.

> You mean re-running all the notebooks? Well, Jupinx takes a few hours to rebuild all theirs, so I think that's a bit optimistic.

Fair enough. I have a course that takes about an hour to build sequentially, indeed.

Additions/observations based on the above:

- Isolating the outputs of each notebook in its own folder is currently missing from "Proposal for git based cache" (#6). Without that, we cannot tell whether a non-notebook artifact should be used.
- I wonder whether these expectations about what exactly a notebook produces leave the cache useful for anything broader than book building; they seem rather specific.
- I agree that the option of manual invalidation seems useful for very long courses. How about making the cache location configurable? Computationally cheap projects could store everything in the Sphinx build folder and benefit from easier cleanup, while more expensive ones would need finer-grained cache control and would keep the cache outside. At the same time, if there's a good CLI external to Sphinx that doesn't advertise fine-grained cache manipulation too much, an external cache folder doesn't hurt either.
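
A configurable location could be as small as the sketch below. The function name, the `configured` argument, and the `.notebook_cache` default folder are all hypothetical; no such option exists yet.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_cache_dir(build_dir: PathLike, configured: Optional[PathLike] = None) -> Path:
    """Pick the cache folder.

    Without an explicit setting the cache lives inside the build folder,
    so cleaning the build also wipes the cache. Expensive projects can
    point `configured` at a location outside the build folder instead.
    """
    if configured is None:
        return Path(build_dir) / ".notebook_cache"  # assumed default name
    return Path(configured).expanduser()
```

With this, a cheap project passes nothing and gets cleanup for free, while a long course would set the location once and manage the cache separately.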

akhmerov mentioned this issue Feb 23, 2020

Hash Based Cache #8

Merged

chrisjsewell added the discussion label Feb 25, 2020
