
Stability between cells - using previous output without regenerating. #194

Open
avimar opened this issue Aug 21, 2024 · 5 comments

@avimar

avimar commented Aug 21, 2024

One of the features in the getting-started guide, and of notebooks in general I thought, is using the previous cell's output.

I naively tried a random number generator in one cell, then a second cell that computed results based on it.

I was quite surprised to realize that it didn't cache or reuse the output from the previous cell, but re-ran it entirely. I imagine you want this sometimes, but it was definitely not what I expected.

On reflection, of course that's how JS works with file imports. But I'm in a notebook, and I really expected cached values! That would seemingly also mean that any external inputs (e.g., AI calls) are re-done and any computations re-calculated.


@benjreinhart
Contributor

benjreinhart commented Aug 21, 2024

We need to write a longer post on our thoughts here, but this is an intentional, if non-obvious, design choice. We are currently stateless, unlike Jupyter or Livebook, which are stateful. There are pros and cons to this choice, but among the pros is that each cell is a file of TypeScript code, so anything you can or would do in a "real" setting you can do in Srcbook. At first this may seem weird, but we bet that over time it is superior, because it makes importing code from your app, or exporting a Srcbook to a new app, far more seamless than if you had a bunch of loose statements.

However, the issue you pointed out is one we want to solve. Sometimes you want to cache the result of some code that was executed, and right now there's no built-in way to do this. For now, you could have one cell run its computation and write the result to a file or database, and have subsequent cells read from the file or DB instead of importing the cell. We're working on a built-in way to do this, though, so that you can be just as productive as in a stateful notebook.
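For the file-based workaround, a minimal sketch might look like the following. It models the two cells as two functions in one file so it can run on its own; the file path and the `cellA`/`cellB` names are made up for illustration, and in Srcbook each function body would live in its own cell.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical location for the shared result; any path both cells can reach works.
const RESULT_PATH = join(tmpdir(), "srcbook-random-result.json");

// "Cell A": run the computation once and persist its result as JSON.
export function cellA(): number {
  const value = Math.random();
  writeFileSync(RESULT_PATH, JSON.stringify({ value }));
  return value;
}

// "Cell B": read the persisted result instead of importing (and re-running) cell A.
export function cellB(): number {
  if (!existsSync(RESULT_PATH)) {
    throw new Error("Run cell A first: no saved result found");
  }
  const { value } = JSON.parse(readFileSync(RESULT_PATH, "utf8")) as { value: number };
  return value * 100;
}
```

Running cell B repeatedly keeps returning the same number until cell A is run again, which is the "stable between cells" behavior the issue asks for.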

Here's one potential idea:

// cell a
// '@srcbook/cache' is a proposed package; it does not exist yet
import cache from '@srcbook/cache';
export default cache(() => {
  return expensiveComputation();
});
// cell b
// 'cell' stands in for cell a's module
import computation from 'cell';

// computed on first invocation, cached on subsequent calls
computation();

We're going to play around with a couple interfaces but I imagine it would be something kinda like this. What do you think?
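To make the idea concrete, here is one way a helper like the proposed `cache` could behave, assuming it persists results to disk (each cell run is a separate execution, so an in-memory memo would not survive). This is a sketch, not Srcbook's API: the `cache` and `clearCache` names, the cache directory, and keying on the function's source text are all assumptions.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical cache directory; a real implementation would choose its own location.
const CACHE_DIR = join(tmpdir(), "srcbook-cache-sketch");

// Wraps `fn` so it runs at most once per distinct source text of `fn`.
// The result is stored on disk as JSON, so later processes (later cell runs)
// read it back instead of recomputing. Only JSON-serializable results survive.
export function cache<T>(fn: () => T | Promise<T>): () => Promise<T> {
  const key = createHash("sha256").update(fn.toString()).digest("hex");
  const file = join(CACHE_DIR, `${key}.json`);
  return async () => {
    if (existsSync(file)) {
      return JSON.parse(readFileSync(file, "utf8")) as T;
    }
    const result = await fn();
    mkdirSync(CACHE_DIR, { recursive: true });
    writeFileSync(file, JSON.stringify(result));
    return result;
  };
}

// Deletes every cached result, e.g. after data the computation depends on changes.
export function clearCache(): void {
  rmSync(CACHE_DIR, { recursive: true, force: true });
}
```

Keying on `fn.toString()` means editing the function body invalidates its entry automatically, but changes to values it closes over do not, which is why the sketch also exposes `clearCache()`. Two runs that miss the cache at the same moment would both compute; a real helper would probably want a lock.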

@swk777
Contributor

swk777 commented Aug 22, 2024

Excellent job on the project! I'd like to share some thoughts:

1. I'm curious how users will adapt to the stateless design, since it's quite different from traditional notebooks. Many users are accustomed to persistent cell outputs, which they find helpful for incremental analysis.

2. I'm also wondering about the potential performance impact, especially for cells with complex computations or API calls.

3. The proposed caching solution adds an extra layer of complexity that users must manage. This goes against the principle of reducing cognitive load, a key benefit of notebook environments. By requiring users to handle data persistence manually, we risk overwhelming them with technical details that distract from their primary task of data analysis or code development.

@avimar
Author

avimar commented Aug 22, 2024

A "cache output" checkbox next to the run button (with a user setting to check it by default) sounds like a great UI.

But I have no clue how to implement that technically. You'd be memoizing the function and storing its result somewhere, but I don't know how you'd do that automagically.
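One way the checkbox could work without any magic inside the cell is for the notebook server, which outlives each cell run, to remember output keyed by a hash of the cell's source. The sketch below assumes a hypothetical `execute` callback that actually runs the cell; `runCell`, `RunResult`, and the in-memory `Map` are illustrative, not Srcbook internals.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a cell run result.
type RunResult = { stdout: string; fromCache: boolean };

// Lives in the long-running notebook server, not in the short-lived cell process.
const outputCache = new Map<string, string>();

// Identical source text maps to the same key; any edit produces a new key.
const keyFor = (source: string): string =>
  createHash("sha256").update(source).digest("hex");

export async function runCell(
  source: string,
  execute: (source: string) => Promise<string>, // stands in for the real cell runner
  cacheOutput: boolean, // the proposed "cache output" checkbox
): Promise<RunResult> {
  const key = keyFor(source);
  if (cacheOutput) {
    const hit = outputCache.get(key);
    if (hit !== undefined) return { stdout: hit, fromCache: true };
  }
  const stdout = await execute(source);
  if (cacheOutput) outputCache.set(key, stdout);
  return { stdout, fromCache: false };
}
```

Note the limitation: this only avoids re-running a cell to redisplay its output. When cell b imports cell a, cell a's module still executes in cell b's process, so sharing cached values across cells needs something like the explicit `cache` wrapper idea from earlier in the thread.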

@Yonom

Yonom commented Aug 24, 2024

My first guess would be to keep a persistent context (global object) across executions:

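import vm from "node:vm";

// One global object shared by every cell in the notebook
const notebookContext = vm.createContext();

const runCell = (codeInCell: string) => {
  // Runs the cell as a classic script; globals it creates persist for later cells
  return vm.runInContext(codeInCell, notebookContext);
};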

But it's a whole can of worms to handle async code, pending callbacks, I/O handles, etc. You'd also probably want to wrap the whole thing in a separate process or isolate so that notebooks can be suspended and restarted individually. Does a project that handles this already exist?
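A slightly extended version of the snippet above shows both sides of this. State created by one `runInContext` call is visible to the next, but the context starts with only JavaScript built-ins, so Node host APIs such as `setTimeout` and `require` are missing unless the host injects them. The cell strings are made up for illustration; `var` is used because such declarations become properties of the shared context object.

```typescript
import vm from "node:vm";

// Shared global object for every cell in one notebook session.
const notebookContext = vm.createContext();

// Run one cell's source in the shared context; returns its completion value.
export const runCell = (codeInCell: string): unknown =>
  vm.runInContext(codeInCell, notebookContext);

// Cell 1 draws a random number once...
const first = runCell("var randomValue = Math.random(); randomValue");
// ...and cell 2 sees the same value without recomputing it.
const second = runCell("randomValue");

// Only JavaScript built-ins exist in the context: timers and require are absent,
// which is where the async / I/O-handle complications begin.
const hostApis = runCell("[typeof setTimeout, typeof require]");
```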

@benjreinhart
Contributor

benjreinhart commented Aug 26, 2024

When we first started this project, I wrote an implementation of cells using the vm module to be more like Jupyter. It was limited in what it could do, and I ran into a few problems that seemed like deal breakers. Unfortunately, I don't remember the specifics, so maybe we'll have to take another stab at this.

I suspect the vm module alone will not be powerful enough to match a Jupyter-like experience. I'm willing to be proven wrong here, so if anyone is interested in taking a stab at this, we can coordinate and work together.
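One concrete limitation (not necessarily the one the maintainers hit) is that `vm.runInContext` compiles code as a classic script, so the `import` statements that Srcbook cells rely on are rejected with a `SyntaxError`. Node does provide `vm.SourceTextModule` for module code, but only behind the `--experimental-vm-modules` flag. `tryRunCell` and the sample cell source are hypothetical.

```typescript
import vm from "node:vm";

const notebookContext = vm.createContext();

type CellOutcome = { ok: true; value: unknown } | { ok: false; error: string };

// Try to run cell source as a classic script in the shared context.
export function tryRunCell(codeInCell: string): CellOutcome {
  try {
    return { ok: true, value: vm.runInContext(codeInCell, notebookContext) };
  } catch (err) {
    // Compare by `name` rather than `instanceof`, since errors can come from another realm.
    const e = err as { name?: string; message?: string };
    return { ok: false, error: `${e.name}: ${e.message}` };
  }
}

// A typical cell starts with an ES module import, which scripts cannot contain.
const result = tryRunCell('import fs from "node:fs"; fs.existsSync(".")');
```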

Lastly, even if we find an obvious way to build something like Jupyter, that experience has problems I want to avoid introducing in Srcbook if possible (e.g., reproducibility), so we need to at least consider them in any proposed design. Marimo has worked hard to solve some of these problems, but you can only go so far in a language that supports mutability.
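The reproducibility concern fits in a few lines: with a shared, mutable context, re-running a cell changes its result, whereas starting from a fresh context and replaying the cells gives the same answer every time. This is a sketch with Node's `vm` module standing in for both models; the cell strings are hypothetical, and `runStateless` only approximates Srcbook's approach of re-evaluating the cells a run depends on.

```typescript
import vm from "node:vm";

// Stateful runner: every cell shares one context, as in Jupyter-style notebooks.
function makeStatefulRunner(): (code: string) => unknown {
  const ctx = vm.createContext();
  return (code) => vm.runInContext(code, ctx);
}

// Stateless runner: each run starts from a fresh context and replays its cells in order.
function runStateless(cells: string[]): unknown {
  const ctx = vm.createContext();
  let last: unknown;
  for (const code of cells) last = vm.runInContext(code, ctx);
  return last;
}

const cell1 = "var total = 0;";
const cell2 = "total += 10; total";

const run = makeStatefulRunner();
run(cell1);
const afterFirstRun = run(cell2); // 10
const afterRerun = run(cell2); // 20: the result depends on how often cell 2 ran

const fresh = runStateless([cell1, cell2]); // 10 on every run
```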


4 participants