Remove 'partition_by' requirement from 'group_by' method #649

Merged
dreadatour merged 2 commits into main from improve-group-by on Dec 2, 2024

Conversation

dreadatour
Contributor

This will allow us to run group_by queries without partition_by, like this:

from datachain import C, DataChain, func
from sqlalchemy import case


(
  DataChain.from_values(name=["foo", "bar", "baz"], anxiety=[0.0, 1.0, 0.5], boredom=[0.7, 0.0, 0.05])
  .mutate(
    anxiety=case((C("anxiety") > 0.1, 1), else_=0),
    boredom=case((C("boredom") > 0.1, 1), else_=0),
  )
  .group_by(
    anxiety=func.sum("anxiety"),
    boredom=func.sum("boredom"),
  )
  .show()
)
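
For reference, the result expected from the query above can be sketched in plain Python. This is a minimal sketch that does not call DataChain at all. It assumes that group_by without partition_by treats the whole dataset as a single group, the way an SQL aggregate without GROUP BY does. The helper names binarize and aggregate_all are hypothetical and exist only for this illustration.

```python
# Plain-Python sketch of the query's intended result; DataChain is not used here.
rows = [
    {"name": "foo", "anxiety": 0.0, "boredom": 0.7},
    {"name": "bar", "anxiety": 1.0, "boredom": 0.0},
    {"name": "baz", "anxiety": 0.5, "boredom": 0.05},
]


def binarize(value, threshold=0.1):
    """Mirror case((C(col) > 0.1, 1), else_=0): 1 above the threshold, else 0."""
    return 1 if value > threshold else 0


def aggregate_all(rows, columns):
    """Sum the binarized columns over all rows, treated as one group (assumption)."""
    return {col: sum(binarize(row[col]) for row in rows) for col in columns}


totals = aggregate_all(rows, ["anxiety", "boredom"])
print(totals)
```

Under that assumption, the query should produce a single row with anxiety=2 and boredom=1.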

@dreadatour dreadatour requested a review from a team December 2, 2024 06:03
@dreadatour dreadatour self-assigned this Dec 2, 2024

cloudflare-workers-and-pages bot commented Dec 2, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 84f702c
Status: ✅  Deploy successful!
Preview URL: https://a72b4a0d.datachain-documentation.pages.dev
Branch Preview URL: https://improve-group-by.datachain-documentation.pages.dev


@dreadatour
Contributor Author

I will add tests for group_by without partition_by in a follow-up PR.

@skshetry skshetry changed the title from "Remove 'partition_by' requirement from 'group_dy' method" to "Remove 'partition_by' requirement from 'group_by' method" Dec 2, 2024

codecov bot commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 87.64%. Comparing base (bf7d670) to head (84f702c).
Report is 1 commit behind head on main.

Files with missing lines    Patch %    Lines
src/datachain/lib/dc.py     33.33%     1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #649      +/-   ##
==========================================
+ Coverage   87.62%   87.64%   +0.02%     
==========================================
  Files         111      111              
  Lines       10599    10598       -1     
  Branches     1436     1435       -1     
==========================================
+ Hits         9287     9289       +2     
+ Misses        948      946       -2     
+ Partials      364      363       -1     
Flag         Coverage Δ
datachain    87.59% <60.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

@dreadatour dreadatour merged commit 00e03c8 into main Dec 2, 2024
37 of 38 checks passed
@dreadatour dreadatour deleted the improve-group-by branch December 2, 2024 06:37
3 participants