Add GroupBy.aggregate (and tpch-1 query to examples) #286
lineitem = lineitem.assign(
    [
        (
            lineitem.get_column_by_name("l_extended_price")
            * (1 - lineitem.get_column_by_name("l_discount"))
        ).rename("l_disc_price"),
        (
            lineitem.get_column_by_name("l_extended_price")
            * (1 - lineitem.get_column_by_name("l_discount"))
            * (1 + lineitem.get_column_by_name("l_tax"))
        ).rename("l_charge"),
    ]
)
this syntax though...
Hi @shwina - are you ok with this syntax?
Personally I think it's worse than what's in any existing dataframe library, and I can't imagine any user ever wanting to write code like this
but maybe it's just me
Please correct me if I'm wrong, but I thought the goal of the standard right now is to provide an API focused on third-party library developers (not end users). This is why we have been comfortable sacrificing syntactic crispness or an expressive API in favor of being the "lowest common denominator" that all libraries can implement.
I think this necessarily means the API isn't quite as nice to work with for the end-user.
For example, changing get_column_by_name to just [ ] in the code above would be a massive boost in readability, but we explicitly decided against it because (IIRC) we wanted library authors to have the freedom to decide what [ ] should mean for their library.
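To make the readability trade-off concrete, here is a minimal sketch using toy stand-in classes. `Column`, `Frame`, and `IndexableFrame` are hypothetical names invented for illustration; they are not the standard's API or any library's implementation. The only assumption is that a library could choose to map `[ ]` onto `get_column_by_name`.

```python
from __future__ import annotations

# Toy stand-ins for illustration only; NOT the dataframe API standard
# or any real library. All class names here are hypothetical.


class Column:
    """A named list of floats with just enough arithmetic for the example."""

    def __init__(self, name: str, values: list[float]) -> None:
        self.name = name
        self.values = list(values)

    def _apply(self, other, op) -> Column:
        # Element-wise against another Column, broadcast against a scalar.
        if isinstance(other, Column):
            vals = [op(a, b) for a, b in zip(self.values, other.values)]
        else:
            vals = [op(a, other) for a in self.values]
        return Column(self.name, vals)

    def __mul__(self, other) -> Column:
        return self._apply(other, lambda a, b: a * b)

    def __rsub__(self, other) -> Column:
        # Handles `scalar - column`, e.g. `1 - discount`.
        return self._apply(other, lambda a, b: b - a)

    def rename(self, name: str) -> Column:
        return Column(name, self.values)


class Frame:
    """Exposes only the verbose accessor discussed in this thread."""

    def __init__(self, columns: dict[str, list[float]]) -> None:
        self._columns = {k: Column(k, v) for k, v in columns.items()}

    def get_column_by_name(self, name: str) -> Column:
        return self._columns[name]


class IndexableFrame(Frame):
    """Assumption: a library decides that [ ] means column lookup."""

    def __getitem__(self, name: str) -> Column:
        return self.get_column_by_name(name)


df = IndexableFrame({"l_extended_price": [100.0, 200.0], "l_discount": [0.5, 0.25]})

# Verbose spelling, as in the snippet under review.
verbose = (
    df.get_column_by_name("l_extended_price")
    * (1 - df.get_column_by_name("l_discount"))
).rename("l_disc_price")

# Short spelling, available only because this toy library defines [ ].
short = (df["l_extended_price"] * (1 - df["l_discount"])).rename("l_disc_price")

assert verbose.values == short.values
```

Both spellings compute the same column; the difference is purely how much of the expression is accessor noise.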
That being said, I agree with you 100% that this looks a mess. It's a question whether library developers are going to be OK with dealing with a messy API to get cross-library compatibility in return...
> I agree with you 100% that this looks a mess
Well I'm glad we could find some common ground 😄
Let's discuss more next week - I'm genuinely interested in finding a solution that works for everybody
My current prediction is that, unless the standard drastically improves, libraries will just support pandas and Polars and ignore the standard completely
The end result for cudf will be that you'll be no better off than you are now
As I was saying... (emphasis mine)
> I'm pretty upset about having to use df.get_column_by_name("a") instead of a simpler df["a"] or col("a"). This will obfuscate our code and impair readability, and therefore we may consider keeping our duplicate logic
That's fair. We should shorten the name.
any suggestions?
Being addressed in #290
977a80d to fdc1c55
fdc1c55 to 21be6ff
LGTM other than the move to df.col
thanks for your review we can rename to |
closes #274
the gist of the PR is that it lets you write