
Write to Icechunk as intermediate store? #601

Open
TomNicholas opened this issue Oct 28, 2024 · 3 comments


@TomNicholas
Member

TomNicholas commented Oct 28, 2024

Maybe useful? We could simply roll back to the state before a failed stage. It would then also be Icechunk's problem to worry about atomic writes... Idea from:

> A reason to not run backup tasks is if the filesystem does not support atomic writes. Cloud object stores generally are atomic (see https://cubed-dev.github.io/cubed/user-guide/reliability.html#stragglers), but local filesystems are not.
>
> Discussing with @applio, we thought this PR should be changed so that the default was based on the store rather than the executor. Backup tasks would be off by default, except if the store was a well-known cloud store like S3 or GCS.

Originally posted by @tomwhite in #600 (comment)
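For illustration, the store-based default described in the quote could look roughly like the sketch below. The function name `default_use_backups` and the set of URL schemes are assumptions made up for this example; they are not Cubed's actual API or configuration.

```python
from urllib.parse import urlparse

# Hypothetical list of URL schemes treated as "well-known cloud stores"
# whose writes are assumed to be atomic. Not taken from Cubed.
ATOMIC_STORE_SCHEMES = {"s3", "gs", "gcs"}


def default_use_backups(store_url: str) -> bool:
    """Decide the backup-task default from the store, not the executor.

    Backup tasks are off by default and only switched on when the
    intermediate store looks like a well-known cloud object store.
    """
    scheme = urlparse(store_url).scheme.lower()
    return scheme in ATOMIC_STORE_SCHEMES


if __name__ == "__main__":
    for url in ["s3://bucket/cubed-tmp", "gs://bucket/cubed-tmp", "/scratch/cubed-tmp"]:
        print(url, default_use_backups(url))
```

A plain local path has an empty scheme, so it falls through to "off", which matches the intent that local and shared filesystems do not get backup tasks unless the user opts in.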

@TomNicholas changed the title from "Write to Icechunk as intermediate store? Maybe useful? Could simply roll back to state before failed stage. Also then it's Icechunk's problem to worry about atomic writes... Idea from: A reason to not run backup tasks is if the filesystem does not support atomic writes. Cloud object stores generally are atomic (see https://cubed-dev.github.io/cubed/user-guide/reliability.html#stragglers), but local filesystems are not." to "Write to Icechunk as intermediate store?" on Oct 28, 2024

@tomwhite
Member

The interesting case is HPC, where there are multiple nodes (and hence the possibility of stragglers), but an intermediate store that uses a shared filesystem that does not support atomic writes. Icechunk might be useful to provide atomicity - but perhaps there are other ways too?
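One commonly used alternative, which is not proposed anywhere in this thread, is to write each chunk to a temporary file and then rename it into place. The sketch below uses only the Python standard library; the helper name `write_chunk_atomically` is hypothetical, and whether a rename is really atomic on a given HPC shared filesystem is an assumption that would need to be checked for that filesystem.

```python
import os
import tempfile


def write_chunk_atomically(path: str, data: bytes) -> None:
    """Write data to path so readers see the old file or the new one, never a partial one.

    Assumes that rename within one directory is atomic on the target
    filesystem, which may not hold for every shared filesystem.
    """
    directory = os.path.dirname(path) or "."
    # Create the temporary file next to the target so the rename stays
    # within a single directory and filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace overwrites the destination if it already exists,
        # so a backup task and a straggler can both finish safely.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern two tasks writing the same chunk would each produce a complete file, and the last rename wins, which is the property backup tasks rely on.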

Also, I'd like to move to general blob stores (not just Zarr stores) for the intermediate store, so we have more control over the chunk access pattern (to control memory usage). Work like #464 will enable this.

@TomNicholas
Member Author

TomNicholas commented Dec 13, 2024

Using icechunk to store intermediate data might be helpful for resuming computations - each completed stage of the plan would write a new commit to one icechunk store that holds all intermediate data and the resuming code simply checks out the correct commit then restarts from that.
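To make the idea concrete, here is a toy, in-memory sketch of commit-per-stage with rollback and resume. `ToyVersionedStore` and `run_plan` are invented for this example; they are not the Icechunk or Cubed APIs, and a real implementation would use the store's own commit and checkout operations.

```python
from typing import Callable, Dict, List


class ToyVersionedStore:
    """In-memory stand-in for a versioned store. NOT the Icechunk API."""

    def __init__(self) -> None:
        # Commit 0 is the empty store; each later commit is a full snapshot.
        self._commits: List[Dict[str, bytes]] = [{}]
        self._working: Dict[str, bytes] = {}

    @property
    def head(self) -> int:
        """Index of the latest commit, equal to the number of completed stages."""
        return len(self._commits) - 1

    def set(self, key: str, value: bytes) -> None:
        self._working[key] = value

    def get(self, key: str) -> bytes:
        return self._working[key]

    def commit(self) -> int:
        self._commits.append(dict(self._working))
        return self.head

    def rollback(self) -> None:
        """Discard uncommitted writes and restore the latest commit."""
        self._working = dict(self._commits[-1])


Stage = Callable[[ToyVersionedStore], None]


def run_plan(stages: List[Stage], store: ToyVersionedStore) -> None:
    """Run stages in order, writing one commit per completed stage.

    Stages that already have a commit are skipped, so calling this again
    after a failure resumes from the first stage that did not complete.
    """
    for stage in stages[store.head:]:
        try:
            stage(store)
        except Exception:
            # Drop any partially written output from the failed stage.
            store.rollback()
            raise
        store.commit()
```

Calling `run_plan` a second time with the same store after a failure skips the stages that already have commits and reruns only the failed stage onwards.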

@tomwhite
Member

> Using icechunk to store intermediate data might be helpful for resuming computations - each completed stage of the plan would write a new commit to one icechunk store that holds all intermediate data and the resuming code simply checks out the correct commit then restarts from that.

This would work nicely at the level of whole intermediate arrays, but not for resuming from a partially written intermediate array. That may be OK, though, since it is actually the current level of support for resume.

It also might be worth thinking about how this could work for incremental updates of regions of arrays.
