Support serializing internal state #344

Open
Kixiron opened this issue Nov 28, 2020 · 5 comments

Kixiron commented Nov 28, 2020

I'm working on a user-facing application where fast response times are the goal, and I'd really like a way to cache the internal state of dataflows so that they can be recreated quickly across restarts. That would make restarts as fast as possible and skip work that was already done in previous lifetimes of the program.

munro commented Sep 9, 2021

@Kixiron have you discovered any way to make this possible? I'm also interested—I would really love a DB that's like SQLite, but differential 😻

I wandered around the code base a bit. I'm not sure it's possible without patching, or without wrapping all the objects to log state, because the subgraph fields are private. These are the areas of interest I found while digging around:

let mut operator = subscope.into_inner().build(self);

pub struct SubgraphBuilder<TOuter, TInner>
where
    TOuter: Timestamp,
    TInner: Timestamp,
{
    /// The name of this subgraph.
    pub name: String,
    /// A sequence of integers uniquely identifying the subgraph.
    pub path: Vec<usize>,
    /// The index assigned to the subgraph by its parent.
    index: usize,
    // handles to the children of the scope. index i corresponds to entry i-1, unless things change.
    children: Vec<PerOperatorState<TInner>>,
    child_count: usize,
    edge_stash: Vec<(Source, Target)>,
    // shared state written to by the datapath, counting records entering this subgraph instance.
    input_messages: Vec<Rc<RefCell<ChangeBatch<TInner>>>>,
    // expressed capabilities, used to filter changes against.
    output_capabilities: Vec<MutableAntichain<TOuter>>,
    /// Logging handle
    logging: Option<Logger>,
    /// Progress logging handle
    progress_logging: Option<ProgressLogger>,
}

pub subgraph: &'a RefCell<SubgraphBuilder<G::Timestamp, T>>,

munro commented Sep 9, 2021

paths: Rc<RefCell<HashMap<usize, Vec<usize>>>>,

Worker.paths is also looking very interesting!

Kixiron commented Sep 9, 2021

Unfortunately not. My hopes are mostly on disk backed differential arrangements, but I don't think there's been much progress towards that.

munro commented Sep 9, 2021

> disk backed differential arrangements

Sameee, I would love that please. Even just applying simple maps would be fine for me right now. Not that the incremental version is hard to write, but I'd rather not if I don't have to. Have you been exploring what it would take for what you're imagining?
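
For reference, a hand-written incremental map can be sketched in plain Rust roughly as follows. This is only an illustration under assumptions, not differential dataflow's API: the `Update` alias and the `map_updates` and `consolidate` helpers are hypothetical names, and updates are modeled as `(data, time, diff)` triples with `u64` times and `i64` diffs.

```rust
/// An update in the (data, time, diff) style. The alias is hypothetical.
type Update<D> = (D, u64, i64);

/// Apply `f` to each record, leaving time and diff untouched.
/// A map keeps no state, so only new updates ever need to pass through it.
fn map_updates<A, B, F: Fn(&A) -> B>(updates: &[Update<A>], f: F) -> Vec<Update<B>> {
    updates.iter().map(|(d, t, r)| (f(d), *t, *r)).collect()
}

/// Sort by (data, time), sum the diffs of equal entries, and drop zeros.
fn consolidate<D: Ord>(mut updates: Vec<Update<D>>) -> Vec<Update<D>> {
    updates.sort_by(|a, b| a.0.cmp(&b.0).then(a.1.cmp(&b.1)));
    let mut out: Vec<Update<D>> = Vec::new();
    for (d, t, r) in updates {
        if let Some(last) = out.last_mut() {
            if last.0 == d && last.1 == t {
                last.2 += r;
                continue;
            }
        }
        out.push((d, t, r));
    }
    out.retain(|u| u.2 != 0);
    out
}

fn main() {
    // Initial input at time 0, mapped to word lengths.
    let initial: [Update<&str>; 3] = [("apple", 0, 1), ("grape", 0, 1), ("banana", 0, 1)];
    let mut output = consolidate(map_updates(&initial, |s| s.len()));
    println!("{:?}", output);

    // A later retraction at time 1: only the change itself is mapped.
    let change: [Update<&str>; 1] = [("grape", 1, -1)];
    output.extend(map_updates(&change, |s| s.len()));
    output = consolidate(output);
    println!("{:?}", output);
}
```

The map itself has nothing to persist. What would need to survive a restart is the consolidated output, or whatever index is built on top of it.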

I'm wondering what would happen if I just started applying Serialize & Deserialize to things until something interesting happens 🤣

Kixiron commented Sep 9, 2021

By and large it's a significantly more complex problem than just adding Serialize to things. Dataflow construction isn't the expensive part of reviving a dataflow; the expense lies in rebuilding indices over massive amounts of data.
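
To make the distinction concrete, here is a minimal std-only sketch of persisting an already built index instead of rebuilding it from raw updates. Everything in it is an assumption for illustration: the `Record` layout, the 16-byte little-endian format, and the `build_index`, `snapshot`, `restore`, and `lookup` helpers are hypothetical and are not how differential dataflow stores arrangements.

```rust
/// (key, value, accumulated diff). The layout is hypothetical.
type Record = (u32, u32, i64);

const RECORD_BYTES: usize = 16;

/// The expensive step on a cold start: sort raw updates and sum their diffs.
fn build_index(mut updates: Vec<Record>) -> Vec<Record> {
    updates.sort_unstable_by_key(|&(k, v, _)| (k, v));
    let mut index: Vec<Record> = Vec::new();
    for (k, v, d) in updates {
        if let Some(last) = index.last_mut() {
            if last.0 == k && last.1 == v {
                last.2 += d;
                continue;
            }
        }
        index.push((k, v, d));
    }
    index.retain(|&(_, _, d)| d != 0);
    index
}

/// Write the sorted, consolidated index as fixed-width little-endian records.
fn snapshot(index: &[Record]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(index.len() * RECORD_BYTES);
    for &(k, v, d) in index {
        bytes.extend_from_slice(&k.to_le_bytes());
        bytes.extend_from_slice(&v.to_le_bytes());
        bytes.extend_from_slice(&d.to_le_bytes());
    }
    bytes
}

/// Read records back in order. No sorting or summing happens here; the
/// snapshot is trusted to have been written from a consolidated index.
fn restore(bytes: &[u8]) -> Option<Vec<Record>> {
    if bytes.len() % RECORD_BYTES != 0 {
        return None;
    }
    let mut index = Vec::with_capacity(bytes.len() / RECORD_BYTES);
    for rec in bytes.chunks_exact(RECORD_BYTES) {
        let (mut k, mut v, mut d) = ([0u8; 4], [0u8; 4], [0u8; 8]);
        k.copy_from_slice(&rec[0..4]);
        v.copy_from_slice(&rec[4..8]);
        d.copy_from_slice(&rec[8..16]);
        index.push((u32::from_le_bytes(k), u32::from_le_bytes(v), i64::from_le_bytes(d)));
    }
    Some(index)
}

/// Binary search the sorted index; absent pairs have an accumulated diff of 0.
fn lookup(index: &[Record], key: u32, val: u32) -> i64 {
    index
        .binary_search_by_key(&(key, val), |&(k, v, _)| (k, v))
        .map(|i| index[i].2)
        .unwrap_or(0)
}

fn main() {
    let raw = vec![(2, 20, 1), (1, 10, 1), (2, 20, 1), (3, 30, 1), (3, 30, -1)];
    let index = build_index(raw);
    let bytes = snapshot(&index);
    let revived = restore(&bytes).expect("snapshot length is a multiple of 16");
    assert_eq!(revived, index);
    println!("count for (2, 20): {}", lookup(&revived, 2, 20));
}
```

In this toy the restart path only reads bytes in order, while the cold path sorts and consolidates every raw update. A real solution would also have to deal with timestamps, compaction, and indices that are too large to hold in memory, which is where the disk backed arrangements mentioned above come in.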
