Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stateful Notebooks #9298

Merged
merged 42 commits into from
Oct 1, 2024
Merged

Stateful Notebooks #9298

merged 42 commits into from
Oct 1, 2024

Conversation

shubham3121
Copy link
Member

Description

Please include a summary of the change, the motivation, and any additional context that will help others understand your PR. If it closes one or more open issues, please tag them as described here.

Affected Dependencies

List any dependencies that are required for this change.

How has this been tested?

  • Describe the tests that you ran to verify your changes.
  • Provide instructions so we can reproduce.
  • List any relevant details for your test configuration.

Checklist

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@rasswanth-s rasswanth-s marked this pull request as ready for review September 26, 2024 06:14
rasswanth-s and others added 5 commits September 26, 2024 13:06
modified purge_worker_pool to purge_workers

Co-authored-by: Shubham Gupta <shubhamgupta3121@gmail.com>
Co-authored-by: Shubham Gupta <shubhamgupta3121@gmail.com>
@shubham3121 shubham3121 changed the title [WIP] Stateful Notebooks Stateful Notebooks Sep 26, 2024
Copy link
Contributor

@yashgorana yashgorana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we planning on adding the checkpoint code to all the notebooks in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using dev-latest creates issues with latest image not getting pulled during k8s development.. can we avoid adding this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are issues we face ? Right now the problem is that we want a way to spin up a cluster with the same tag as checkpoint is created, but since the DEVSPACE_TIMESTAMP is dynamic we cannot create images with a static tag. Also, Currently we're tagging an image with both DEVSPACE_TIMSTAMP tag and tag-latest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that image:dev-latest does not always get updated if you re-deploy after any code changes, breaking development experience and creating confusion. I previously had to revert this exact change from Rasswanth as well - because it just never worked.

dev-latest might work in CI because there's only one image, but let's not add dev-latest anywhere as it ends up creating confusion, and just causes divergence in expectation while debugging locally, in notebooks and CI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if the image tag is constantly changing, then the info about the image tag in the db is stale right ? Means the pools cannot use the redeployed versions anyways

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per discussion:

  1. set the tags such that timestamp is always on top.
    • dev-${DEVSPACE_TIMESTAMP}
    • dev-latest
  2. don't change global.version value - keep it dev-${DEVSPACE_TIMESTAMP},
  3. use dev-latest for worker pool

Resolved in: 9e98cf4

@shubham3121
Copy link
Member Author

are we planning on adding the checkpoint code to all the notebooks in this PR?

Yes planning to do that as a new PR. We will create checkpoint at end of each notebook and then upload them as artifacts in CI.

update bigquery notebook to filter big query image
@yashgorana
Copy link
Contributor

are we planning on adding the checkpoint code to all the notebooks in this PR?

Yes planning to do that as a new PR. We will create checkpoint at end of each notebook and then upload them as artifacts in CI.

Let's not use artifacts; reason being 1) issues with conflicting caches across multiple version/branches/nightlies/pr like we have with current pip cache 2) goals was so that local testing can leverage these checkpoint files directly.

I'd recommend keeping it simple and checking them into the repo itself. That way everyone can run it locally almost instantly and we don't have to worry about wrong checkpoint versions being loaded for some other syft version on CIs.

@shubham3121
Copy link
Member Author

shubham3121 commented Sep 30, 2024

are we planning on adding the checkpoint code to all the notebooks in this PR?

Yes planning to do that as a new PR. We will create checkpoint at end of each notebook and then upload them as artifacts in CI.

Let's not use artifacts; reason being 1) issues with conflicting caches across multiple version/branches/nightlies/pr like we have with current pip cache 2) goals was so that local testing can leverage these checkpoint files directly.

I'd recommend keeping it simple and checking them into the repo itself. That way everyone can run it locally almost instantly and we don't have to worry about wrong checkpoint versions being loaded for some other syft version on CIs.

But don't we do that for K8s logs ? For each k8s run in CI the logs are available for us to download at end of tests. We were thinking of a similar behaviour for checkpoints as well, atleast when tests fail.

I also agree local testing is the priority but I still think we can leverage it in CI atleast for notebooks and maybe just scenario notebooks.

Also caching uses a different plugin and we don't want to persist checkpoints across PRs, instead have a copy available if things break in CI for local debugging.

@yashgorana
Copy link
Contributor

are we planning on adding the checkpoint code to all the notebooks in this PR?

Yes planning to do that as a new PR. We will create checkpoint at end of each notebook and then upload them as artifacts in CI.

Let's not use artifacts; reason being 1) issues with conflicting caches across multiple version/branches/nightlies/pr like we have with current pip cache 2) goals was so that local testing can leverage these checkpoint files directly.
I'd recommend keeping it simple and checking them into the repo itself. That way everyone can run it locally almost instantly and we don't have to worry about wrong checkpoint versions being loaded for some other syft version on CIs.

But don't we do that for K8s logs ? For each k8s run in CI the logs are available for us to download at end of tests. We were thinking of a similar behaviour for checkpoints as well, atleast when tests fail.

I also agree local testing is the priority but I still think we can leverage it in CI atleast for notebooks and maybe just scenario notebooks.

Also caching uses a different plugin and we don't want to persist checkpoints across PRs, instead have a copy available if things break in CI for local debugging.

ah i must've confused cache with artifacts. okay this looks cool.

Copy link
Contributor

@yashgorana yashgorana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome work @shubham3121 @rasswanth-s =)

@yashgorana yashgorana merged commit 9af1e36 into dev Oct 1, 2024
39 checks passed
@yashgorana yashgorana deleted the shubham/notebooks-with-state branch October 1, 2024 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants