datachain: generalize data access functions into collect() and collect_flatten() #121

Merged: 4 commits, Jul 22, 2024


@skshetry (Member) commented Jul 22, 2024

Closes #67.


codecov bot commented Jul 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.85%. Comparing base (7f7ed1a) to head (ffa5c72).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #121      +/-   ##
==========================================
+ Coverage   85.71%   86.85%   +1.13%     
==========================================
  Files          93       88       -5     
  Lines        9493     9364     -129     
  Branches     1897     1876      -21     
==========================================
- Hits         8137     8133       -4     
+ Misses       1024      899     -125     
  Partials      332      332              
Flag        Coverage Δ
datachain   86.78% <100.00%> (+1.13%) ⬆️

Flags with carried forward coverage won't be shown.


@skshetry force-pushed the generalize-data-access branch from 08b4fbc to d0637ff on July 22, 2024 09:01

cloudflare-workers-and-pages bot commented Jul 22, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: ffa5c72
Status: ✅  Deploy successful!
Preview URL: https://b61cffb2.datachain-documentation.pages.dev
Branch Preview URL: https://generalize-data-access.datachain-documentation.pages.dev


@skshetry force-pushed the generalize-data-access branch from d0637ff to f119a3a on July 22, 2024 09:06
@skshetry marked this pull request as ready for review on July 22, 2024 09:06
@skshetry force-pushed the generalize-data-access branch from f119a3a to 06df3f1 on July 22, 2024 09:13
@ilongin (Contributor) left a comment

LGTM. Now it's much clearer how to fetch data; those redundant methods were confusing.

examples/json-csv-reader.py (outdated, resolved)
src/datachain/lib/dc.py (outdated, resolved)
@skshetry force-pushed the generalize-data-access branch from e882ab2 to 7a091aa on July 22, 2024 11:48
src/datachain/lib/dc.py (outdated, resolved)
src/datachain/lib/dc.py (outdated, resolved)
@dberenbaum (Contributor) commented:

@skshetry We should coordinate this with #111. Are you ready to merge this, or should I go ahead with #111?

@skshetry (Member, Author) commented Jul 22, 2024

Feel free to merge. I will handle conflicts (if any) and merge in the morning.

@overload
def collect(self, col: str) -> Iterator[DataType]: ...  # type: ignore[overload-overlap]

@overload
def collect(self, *cols: str) -> Iterator[tuple[DataType, ...]]: ...
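The two stubs only declare types; a single implementation dispatches at runtime. The sketch below shows that pattern on a hypothetical in-memory ToyChain class (not datachain's code), with Any standing in for DataType: one column name yields bare values, several yield tuples. Marking the first overload's parameter positional-only (the /) keeps it compatible with the *cols implementation; that detail is an assumption of the sketch, not something shown in the PR.

```python
from collections.abc import Iterator
from typing import Any, overload


class ToyChain:
    """Hypothetical stand-in for a chain of rows; not datachain's implementation."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self.rows = rows

    @overload
    def collect(self, col: str, /) -> Iterator[Any]: ...

    @overload
    def collect(self, *cols: str) -> Iterator[tuple[Any, ...]]: ...

    def collect(self, *cols: str) -> Iterator[Any]:
        # Exactly one column name: yield the bare values of that column.
        if len(cols) == 1:
            return (row[cols[0]] for row in self.rows)
        # Otherwise yield one tuple per row, in the requested column order
        # (in this toy, passing no names at all yields empty tuples).
        return (tuple(row[c] for c in cols) for row in self.rows)


chain = ToyChain([{"name": "a.jpg", "size": 10}, {"name": "b.jpg", "size": 20}])
print(list(chain.collect("name")))          # ['a.jpg', 'b.jpg']
print(list(chain.collect("name", "size")))  # [('a.jpg', 10), ('b.jpg', 20)]
```

The return type a caller sees therefore depends on how many names it passes, which is what the review comment below is about.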
A contributor commented:

I wish there were a way to force the output to be a tuple. Otherwise, working with a variable cols is going to be awkward, since the corner cases of len(cols) in [0, 1] need to be handled.
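Without such an option, a caller holding a variable cols list has to normalize the output itself. A minimal sketch, assuming a collect()-style iterator that yields bare values for one column and tuples for several; the helper name as_tuples is hypothetical, and it rejects an empty column list rather than guess what the real method returns in that case.

```python
from collections.abc import Iterable, Iterator, Sequence
from typing import Any


def as_tuples(values: Iterable[Any], cols: Sequence[str]) -> Iterator[tuple[Any, ...]]:
    """Hypothetical helper: make collect()-style output always yield tuples."""
    if not cols:
        # The zero-column case is API-specific; refuse it instead of guessing.
        raise ValueError("at least one column name is required")
    if len(cols) == 1:
        # Single column: values arrive bare, so wrap each one in a 1-tuple.
        return ((value,) for value in values)
    # Several columns: rows are already tuples; pass them through.
    return (tuple(row) for row in values)


print(list(as_tuples(["a.jpg", "b.jpg"], ["name"])))       # [('a.jpg',), ('b.jpg',)]
print(list(as_tuples([("a.jpg", 10)], ["name", "size"])))  # [('a.jpg', 10)]
```

Every caller that accepts a dynamic column list would need a wrapper like this, which is the awkwardness the comment points out.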

@skshetry (Member, Author) replied Jul 22, 2024

I agree. One idea I have is to make the first argument accept either form: (arg: tuple[str] | str, *args: str).

But that complicates the signature.
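One way to read that idea, sketched as a hypothetical free function over a list of dict rows rather than datachain's method: a plain string keeps the current behavior, while passing a tuple always yields tuples, even for one column. The sketch reads tuple[str] as tuple[str, ...] (a tuple of any length) and rejects mixing a tuple with extra names; both are assumptions. The extra isinstance branching is the signature complexity the reply mentions.

```python
from __future__ import annotations

from collections.abc import Iterator
from typing import Any


def collect(
    rows: list[dict[str, Any]], arg: tuple[str, ...] | str, *args: str
) -> Iterator[Any]:
    """Hypothetical collect() whose first argument is a str or a tuple of names."""
    if isinstance(arg, tuple):
        # Tuple form: the caller explicitly asked for tuples, even for one column.
        if args:
            raise TypeError("pass a tuple of names or plain names, not both")
        cols = arg
    elif not args:
        # A single plain name keeps the bare-value behavior.
        return (row[arg] for row in rows)
    else:
        cols = (arg, *args)
    return (tuple(row[c] for c in cols) for row in rows)


rows = [{"name": "a.jpg", "size": 10}, {"name": "b.jpg", "size": 20}]
print(list(collect(rows, "name")))          # ['a.jpg', 'b.jpg']
print(list(collect(rows, ("name",))))       # [('a.jpg',), ('b.jpg',)]
print(list(collect(rows, "name", "size")))  # [('a.jpg', 10), ('b.jpg', 20)]
```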

@dmpetrov (Member) left a comment

LGTM!

@dmpetrov merged commit 1833918 into main on Jul 22, 2024
19 checks passed
@dmpetrov deleted the generalize-data-access branch on July 22, 2024 16:50
Linked issue closed by this pull request: Generalize data access functions
6 participants