-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a way to persist dataset even if exception is thrown #504
Conversation
for more information, see https://pre-commit.ci
Deploying datachain-documentation with Cloudflare Pages
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #504 +/- ##
=======================================
Coverage 87.10% 87.11%
=======================================
Files 92 92
Lines 9852 9832 -20
Branches 1350 1349 -1
=======================================
- Hits 8582 8565 -17
Misses 917 917
+ Partials 353 350 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
src/datachain/catalog/catalog.py
Outdated
@@ -1218,6 +1218,10 @@ def register_dataset( | |||
preview=dataset_version.preview, | |||
job_id=dataset_version.job_id, | |||
) | |||
|
|||
session = Session.get_current_context() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
session
can be also passed to DataChain explicitly (check from_storage
, for example) ... it's probably expected that we pass it directly into all the datacnain factories even when we use contextmanager ... not sure get_current_context
is the right approach ... at least it might be changing the assumption ... can we check docs / existing tests please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the logic along with example in the file where the variable session is used. PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to move forward, but this design Session -> Catalog -> Sessions looks a bit weird to me.
Thanks. Modified the chain logic. |
Refactoring upon the changes at #494 from the comments, this introduces the way to allow a way to persist dataset even if exception is thrown. With this change, the cleanup is now done at the active context. So, we can use the session context as checkpoints now. Example implementation is as:
datachain/tests/scripts/feature_class_exception.py
Lines 1 to 34 in 029e677
In this file,
local_test_datachain
should only persist since it is the only dataset where the exception doesn't occur in current context.Closes #500