Stability between cells - using previous output without regenerating. #194

avimar · 2024-08-21T09:20:12Z

One of the features in the getting started guide, and of notebooks in general I thought, is using the previous cell's output.

I naively tried with a random number generator in one cell, then a second cell with results based on that.

I'm quite surprised to realize that it didn't cache or take the output from the previous cell, but re-ran it entirely. I imagine you want this sometimes, but it was definitely not what I expected.

On second viewing, of course that's how JS works with file imports. But I'm in a notebook, and I really expected cached values! That would seemingly also mean any external inputs (e.g. AI calls) would be redone, and any computations would be recalculated.

benjreinhart · 2024-08-21T18:20:12Z

We need to write a longer post on our thoughts here, but this is an intentional yet non-obvious design choice we made. We are currently stateless, unlike Jupyter or Livebook, which are stateful. There are pros and cons to this choice, but among the pros is that each cell is a file of TypeScript code, so anything you can or would do in a "real" setting you can do in Srcbook. At first this may seem weird, but we took the bet that over time this is superior, because importing code from your app into a Srcbook, or exporting a Srcbook to a new app, is far more doable than if you had a bunch of random statements.

However, the issue you pointed out is one we want to provide a solution for. Sometimes you want to cache the result of some code that was executed, and right now there's no built-in way to do this. For now, you could have one cell run its computation and then write to a file or database, with subsequent cells reading from the file or DB instead of importing the cell. We're working on a built-in way to do this, though, so that you can be just as productive as in a stateful notebook.
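A minimal sketch of the file-based workaround described above, assuming the result is JSON-serializable. The temporary path and the `writeResult` / `readResult` helpers are hypothetical names for illustration, not part of Srcbook; in a real notebook the write would live in one cell and the read in another.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical location for the stored result; any writable path would do.
const cacheFile = join(mkdtempSync(join(tmpdir(), "srcbook-cache-")), "result.json");

// Stand-in for whatever the first cell actually computes.
function expensiveComputation(): number {
  return Math.random();
}

// "Cell a": run the computation once and persist the result as JSON.
function writeResult(): number {
  const value = expensiveComputation();
  writeFileSync(cacheFile, JSON.stringify({ value }));
  return value;
}

// "Cell b": read the persisted result instead of importing (and re-running) cell a.
function readResult(): number {
  const parsed = JSON.parse(readFileSync(cacheFile, "utf8")) as { value: number };
  return parsed.value;
}

const written = writeResult();
const firstRead = readResult();
const secondRead = readResult();
```

Both reads return the value that was written, so the random number stays stable until the first cell is run again.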

Here's one potential idea:

// cell a
import cache from '@srcbook/cache';
export default cache(() => {
  return expensiveComputation();
});

// cell b
import computation from 'cell';

// computed on first invocation, cached on subsequent calls
computation();

We're going to play around with a couple interfaces but I imagine it would be something kinda like this. What do you think?
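One way such a helper might work, purely as a sketch: `@srcbook/cache` is only a proposal in this thread, so the `cache` function below is a hypothetical local stand-in. Unlike the proposed interface it takes an explicit file path, and it assumes the result is JSON-serializable. The result is stored on disk on the assumption that in-memory state alone would not be enough to carry a value from one cell run to the next.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stand-in for the proposed `cache` helper; not a real Srcbook API.
function cache<T>(file: string, compute: () => T): () => T {
  return () => {
    // Reuse the stored result if a previous call already produced one.
    if (existsSync(file)) {
      return JSON.parse(readFileSync(file, "utf8")) as T;
    }
    const value = compute();
    writeFileSync(file, JSON.stringify(value));
    return value;
  };
}

let runs = 0;
const file = join(mkdtempSync(join(tmpdir(), "cache-sketch-")), "cell-a.json");

const computation = cache(file, () => {
  runs += 1; // counts how often the underlying computation actually executes
  return { value: Math.random() };
});

const firstCall = computation();  // computed and written to disk
const secondCall = computation(); // read back from disk, not recomputed
```

Deleting the file is the only way to invalidate the result in this sketch; a built-in version would presumably need a friendlier way to do that.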

swk777 · 2024-08-22T06:50:37Z

Excellent job on the project. I'd like to share some thoughts:

1. I'm curious about how users might adapt to the stateless design, as it's quite different from traditional notebooks. Many users are accustomed to persistent cell outputs, which they find helpful for incremental analysis.

2. I'm also concerned about the potential impact on performance, especially for cells with complex computations or API calls.

3. The proposed caching solution adds an extra layer of complexity that users must manage. This goes against the principle of reducing cognitive load, which is a key benefit of notebook environments. By requiring users to handle data persistence manually, we risk overwhelming them with technical details that detract from their primary task of data analysis or code development.

avimar · 2024-08-22T08:49:46Z

A "cache output" checkbox next to the run button (with a user setting for whether it's checked by default) sounds like a great UI.

But I have no clue technically how to implement that. You'd be memoizing the function, and storing it somewhere, but I don't know how you'd do that auto-magically.
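One possible shape for the memoization part, offered as a sketch rather than a design: key each stored output by a hash of the cell's source text, so that editing the cell invalidates its cached output. The `outputs` map, `cacheKey`, and `runCached` are hypothetical names; the store here is in memory only, and the sketch ignores dependencies on other cells or external inputs.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory store; a real implementation would need to persist it.
const outputs = new Map<string, unknown>();

// Derive the cache key from the cell's source text.
function cacheKey(source: string): string {
  return createHash("sha256").update(source).digest("hex");
}

// Run the cell only if no output is stored for this exact source.
function runCached<T>(source: string, run: () => T): T {
  const key = cacheKey(source);
  if (outputs.has(key)) {
    return outputs.get(key) as T;
  }
  const result = run();
  outputs.set(key, result);
  return result;
}

let executions = 0;
const cellSource = "export default Math.random();";
const runCellBody = (): number => {
  executions += 1;
  return Math.random();
};

const original = runCached(cellSource, runCellBody);
const repeated = runCached(cellSource, runCellBody);               // same source: cached
const edited = runCached(cellSource + " // edited", runCellBody);  // changed source: re-run
```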

Yonom · 2024-08-24T07:22:41Z

My first guess would be to keep a persistent context (global object) across executions:

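import vm from "node:vm";

// A single context shared across cell executions, so globals persist between runs.
const notebookContext = vm.createContext();

const runCell = (codeInCell: string) => {
  // vm.runInContext evaluates the code with notebookContext as its global object.
  return vm.runInContext(codeInCell, notebookContext);
};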

But it's a whole can of worms to handle async, pending callbacks, IO handles etc. You also probably want to wrap the whole thing in a separate process or isolate to be able to suspend/restart notebooks individually. I don't know if a project handling this already exists?
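To illustrate the persistent-context idea, here is a small sketch using `runInContext` from Node's `vm` module, which evaluates code against a contextified object. The cell contents are made-up examples. Note that `runInContext` evaluates plain JavaScript scripts, so cells written as TypeScript modules with `import` / `export` would need extra handling that this sketch does not attempt.

```typescript
import { createContext, runInContext } from "node:vm";

// One context object shared by every cell run, so globals persist between runs.
const notebookContext = createContext({});

const runCell = (codeInCell: string): unknown =>
  runInContext(codeInCell, notebookContext);

// The first "cell" defines a global. `var` is used so the binding
// becomes a property of the shared context object.
runCell("var seed = 42;");

// The second "cell" sees the global without re-running the first cell.
const result = runCell("seed + 1");
```

Here `result` is 43, because `seed` is still defined on the shared context when the second snippet runs.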

benjreinhart · 2024-08-26T03:19:03Z

When we first started this project, I wrote an implementation of cells using the vm module to be more like Jupyter. It was limited in what it could do, and I ran into a few different problems that seemed like deal breakers. Unfortunately, I don't remember the specifics, so maybe we'll have to take another stab at this.

I suspect the vm module alone will not be powerful enough to match the behavior of a Jupyter-like experience. I'm willing to be proven wrong here, so if anyone is interested in taking a stab at this, we can coordinate / work together.

Lastly, even if we have an obvious solution to something like Jupyter, there are problems with that experience that I want to avoid introducing in Srcbook if possible (e.g., reproducibility) so we need to at least consider that in any proposed design. Marimo has worked hard to solve some of these problems, but you can only go so far when using a language that supports mutability.
