add DatasetListRecord and DatasetListVersion dataclasses to only extract required fields when listing datasets #634

Merged
mattseddon merged 1 commit into main from add-dataset-list-dataclass on Nov 27, 2024

Conversation

mattseddon
Contributor

@mattseddon mattseddon commented Nov 26, 2024

Related to https://github.com/iterative/studio/issues/10849 and replaces #628 (see replaced PR for some details)

This PR adds DatasetListRecord and DatasetListVersion dataclasses and uses them in Catalog's ls_datasets and list_datasets_versions methods. Because these classes extract only the fields required for listing, far less data is processed when listing datasets. This should improve performance in both the CLI and SaaS and, hopefully, prevent the slow query alerts in the related issue.

Note: This PR adds some duplication; extending the original dataclasses did not seem practical in this instance.
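
To illustrate the idea described above, here is a minimal sketch of slim "list" dataclasses that keep only a handful of fields and ignore everything else in a metastore row. It is not the code from this PR: the field names, the row keys (`dataset_id`, `version_id`, `schema`, and so on), and the `parse` / `from_rows` helpers are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Iterable, Optional


@dataclass
class DatasetListVersion:
    """Slim per-version record. The field set here is hypothetical."""

    id: int
    version: int
    created_at: datetime
    num_objects: Optional[int] = None

    @classmethod
    def parse(cls, row: dict[str, Any]) -> DatasetListVersion:
        # Pick out only the columns needed for a listing; heavy columns
        # (for example a serialized schema) are never touched.
        return cls(
            id=row["version_id"],
            version=row["version"],
            created_at=row["created_at"],
            num_objects=row.get("num_objects"),
        )


@dataclass
class DatasetListRecord:
    """Slim per-dataset record holding its slim versions (hypothetical)."""

    id: int
    name: str
    versions: list[DatasetListVersion] = field(default_factory=list)

    @classmethod
    def from_rows(cls, rows: Iterable[dict[str, Any]]) -> list[DatasetListRecord]:
        # Group joined dataset/version rows by dataset id, preserving order.
        records: dict[int, DatasetListRecord] = {}
        for row in rows:
            key = row["dataset_id"]
            if key not in records:
                records[key] = cls(id=key, name=row["name"])
            records[key].versions.append(DatasetListVersion.parse(row))
        return list(records.values())


# Example rows as they might come back from a joined query (made-up data).
rows = [
    {"dataset_id": 1, "name": "cats", "version_id": 10, "version": 1,
     "created_at": datetime(2024, 11, 26), "num_objects": 5, "schema": "{...}"},
    {"dataset_id": 1, "name": "cats", "version_id": 11, "version": 2,
     "created_at": datetime(2024, 11, 27), "schema": "{...}"},
    {"dataset_id": 2, "name": "dogs", "version_id": 12, "version": 1,
     "created_at": datetime(2024, 11, 27), "num_objects": 7, "schema": "{...}"},
]
records = DatasetListRecord.from_rows(rows)
```

The duplication mentioned in the note shows up here too: a slim class like this repeats a subset of fields from a fuller record class rather than inheriting from it, trading some repetition for a smaller payload per row.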

cloudflare-workers-and-pages bot commented Nov 26, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 1133dab
Status: ✅  Deploy successful!
Preview URL: https://867ae8d3.datachain-documentation.pages.dev
Branch Preview URL: https://add-dataset-list-dataclass.datachain-documentation.pages.dev

codecov bot commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 88.88889% with 9 lines in your changes missing coverage. Please review.

Project coverage is 87.70%. Comparing base (f759eff) to head (1133dab).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/datachain/dataset.py | 87.75% | 3 Missing and 3 partials ⚠️ |
| src/datachain/data_storage/metastore.py | 90.00% | 1 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #634      +/-   ##
==========================================
- Coverage   87.74%   87.70%   -0.04%     
==========================================
  Files         112      112              
  Lines       10605    10672      +67     
  Branches     1431     1437       +6     
==========================================
+ Hits         9305     9360      +55     
- Misses        946      954       +8     
- Partials      354      358       +4     
| Flag | Coverage Δ |
|---|---|
| datachain | 87.64% <88.88%> (-0.04%) ⬇️ |
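
As a sanity check on the figures in this report, the percentages can be reproduced from the raw counts. The sketch below assumes coverage is computed as hits divided by hits + misses + partials (so partially covered lines count as not covered); that assumption matches the reported numbers to within 0.01 percentage points. The patch hit count of 72 is inferred from "88.88889% with 9 lines missing", not stated in the report.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Percentage of tracked lines fully hit; partial lines count as not covered."""
    total = hits + misses + partials
    return 100.0 * hits / total


# Project coverage on main (base) and on this PR's head, from the diff table.
base = coverage_pct(hits=9305, misses=946, partials=354)   # 10605 lines
head = coverage_pct(hits=9360, misses=954, partials=358)   # 10672 lines

# Patch coverage: 4 missing + 5 partial lines across the two files;
# 72 hits is inferred (9 uncovered lines out of 81 gives 88.88889%).
patch = coverage_pct(hits=72, misses=4, partials=5)

delta = head - base
```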


@mattseddon mattseddon force-pushed the add-dataset-list-dataclass branch 3 times, most recently from 9f4ddf0 to 53053f6 Compare November 26, 2024 23:39
@mattseddon mattseddon changed the title from "add DatasetList dataclass to extract only needed fields from metastore" to "add DatasetListRecord and DatasetListVersion dataclasses to extract required fields only when listing datasets" Nov 26, 2024
@mattseddon mattseddon self-assigned this Nov 26, 2024
@mattseddon mattseddon changed the title from "add DatasetListRecord and DatasetListVersion dataclasses to extract required fields only when listing datasets" to "add DatasetListRecord and DatasetListVersion dataclasses to only extract required fields when listing datasets" Nov 27, 2024
@mattseddon mattseddon marked this pull request as ready for review November 27, 2024 00:31
@mattseddon mattseddon requested a review from a team November 27, 2024 00:31
Contributor

@dreadatour dreadatour left a comment

Looks good to me 👍

@mattseddon mattseddon force-pushed the add-dataset-list-dataclass branch 2 times, most recently from 57edb12 to b86791f Compare November 27, 2024 09:26
@mattseddon mattseddon force-pushed the add-dataset-list-dataclass branch from b86791f to 1133dab Compare November 27, 2024 21:44
@mattseddon mattseddon merged commit 3bd22ad into main Nov 27, 2024
38 checks passed
@mattseddon mattseddon deleted the add-dataset-list-dataclass branch November 27, 2024 22:22
2 participants