
do not pre-load dataset version preview from string #642

Merged
merged 1 commit on Nov 29, 2024


@mattseddon (Member) commented Nov 28, 2024

This PR ensures that we don't parse a dataset version's preview from its JSON string until the preview is actually requested. This should let us keep using the pattern of loading all versions for a dataset via `catalog.get_dataset` without paying the cost of running `json.loads` on every version's preview up front.
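A minimal sketch of the lazy-parsing idea, assuming a hypothetical `LazyPreviewVersion` class and `load_versions` helper (neither is datachain's real API). Returning non-string values unchanged is also an assumption made for illustration. Building many version objects does no JSON work. Only the first access to `preview` on a given version calls `json.loads`.

```python
import json
from functools import cached_property
from typing import Optional, Union


class LazyPreviewVersion:
    """Hypothetical stand-in for a dataset version record (not datachain's class)."""

    def __init__(self, preview_data: Union[str, list[dict], None]) -> None:
        # Keep the raw value exactly as it came from storage; no parsing here.
        self._preview_data = preview_data
        self.parse_count = 0  # illustration only: counts json.loads calls

    @cached_property
    def preview(self) -> Optional[list[dict]]:
        # Parse on first access only; cached_property stores the result
        # on the instance, so later accesses skip json.loads entirely.
        if isinstance(self._preview_data, str):
            self.parse_count += 1
            return json.loads(self._preview_data)
        # Assumption: already-decoded (or missing) data is returned as-is.
        return self._preview_data


def load_versions(raw_previews: list[str]) -> list[LazyPreviewVersion]:
    # Hypothetical helper: creating every version is cheap, nothing is decoded yet.
    return [LazyPreviewVersion(raw) for raw in raw_previews]


versions = load_versions(['[{"a": 1}]', '[{"b": 2}]'])
print(sum(v.parse_count for v in versions))  # 0: nothing parsed yet
print(versions[0].preview)  # parses the first preview only
print(versions[0].parse_count, versions[1].parse_count)  # 1 0
```

Deferring the parse keeps the cost proportional to the previews that are actually read rather than to the number of versions loaded.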

@mattseddon mattseddon self-assigned this Nov 28, 2024

cloudflare-workers-and-pages bot commented Nov 28, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: a94eac4
Status: ✅  Deploy successful!
Preview URL: https://ce149a29.datachain-documentation.pages.dev
Branch Preview URL: https://do-not-load-preview.datachain-documentation.pages.dev


codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 90.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 87.73%. Comparing base (3bd22ad) to head (a94eac4).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/datachain/dataset.py | 90.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #642      +/-   ##
==========================================
+ Coverage   87.70%   87.73%   +0.03%     
==========================================
  Files         112      112              
  Lines       10672    10676       +4     
  Branches     1437     1437              
==========================================
+ Hits         9360     9367       +7     
+ Misses        954      950       -4     
- Partials      358      359       +1     
```
| Flag | Coverage Δ |
|---|---|
| datachain | 87.68% <90.00%> (+0.03%) ⬆️ |

Flags with carried forward coverage won't be shown.


@mattseddon mattseddon marked this pull request as ready for review November 29, 2024 00:44
@mattseddon mattseddon requested a review from a team November 29, 2024 00:44
@dreadatour (Contributor) left a comment


🔥

```python
@cached_property
def preview(self) -> Optional[list[dict]]:
    if isinstance(self._preview_data, str):
        return json.loads(self._preview_data)
```
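One property of `@cached_property` worth keeping in mind with this snippet: after the first access, the parsed list is stored on the instance, so a later change to `_preview_data` is not reflected until the cached entry is deleted. A minimal sketch, assuming a hypothetical `CachedPreview` class that mirrors the excerpt (returning `None` for non-string data is a simplification, not the PR's code):

```python
import json
from functools import cached_property
from typing import Optional


class CachedPreview:
    """Hypothetical minimal class mirroring the reviewed snippet."""

    def __init__(self, preview_data: Optional[str]) -> None:
        self._preview_data = preview_data

    @cached_property
    def preview(self) -> Optional[list[dict]]:
        if isinstance(self._preview_data, str):
            return json.loads(self._preview_data)
        return None  # simplification for this sketch


item = CachedPreview('[{"n": 1}]')
first = item.preview  # parsed and stored in item.__dict__["preview"]
item._preview_data = '[{"n": 2}]'
stale = item.preview  # still the cached [{"n": 1}]
del item.preview  # drops the cached value from the instance
fresh = item.preview  # re-parsed from the new string: [{"n": 2}]
```

Deleting the attribute works because `cached_property` writes to the instance `__dict__` and does not define a deleter, so `del` simply removes that entry.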
Contributor


Should we use orjson as it is faster?
See #178

Member Author


I did consider it but was scared... should be fine 😬
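The review thread suggests orjson as a faster replacement for `json.loads`. A minimal sketch, assuming a hypothetical `loads_preview` helper (not part of datachain), of how preview decoding could prefer orjson when it is installed and fall back to the standard library otherwise:

```python
import json
from typing import Any

try:
    import orjson  # optional third-party dependency; may not be installed
except ImportError:
    orjson = None


def loads_preview(data: str) -> Any:
    """Hypothetical helper: decode a preview string, preferring orjson."""
    if orjson is not None:
        # orjson.loads accepts str or bytes and returns plain Python objects.
        return orjson.loads(data)
    # Standard-library fallback keeps decoding working without orjson.
    return json.loads(data)
```

The two parsers are not identical: orjson documents that it rejects `NaN`, `Infinity` and `-Infinity`, which the standard library's `json.loads` accepts, so previews containing those values would start raising errors after a swap. That is one behavioural difference worth checking before switching.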

@mattseddon mattseddon merged commit 08f4625 into main Nov 29, 2024
38 checks passed
@mattseddon mattseddon deleted the do-not-load-preview branch November 29, 2024 03:14
2 participants