groupby-agg all columns based on one column #828
Comments
Hi, yes, first would be a good way to use this. Initially I didn't think we'd need 'last', but negating a date isn't something numpy likes. Maybe you can try:
Indices are a bit problematic, as explained in #579. Regards, Maarten |
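As a small illustration of the remark about negating dates: NumPy does not define negation for datetime64 values, but their underlying integer representation can be negated, which reverses the sort order. The sketch below uses plain NumPy with invented sample data; whether vaex's order expression accepts such a cast is an assumption that this thread does not confirm.

```python
import numpy as np

dates = np.array(
    ["2005-02-01T00:00:02", "2005-02-01T00:00:00", "2005-02-01T00:00:01"],
    dtype="datetime64[s]",
)
values = np.array([30, 10, 20])

# Negating datetime64 directly is not defined by NumPy.
try:
    -dates
    negation_failed = False
except TypeError:
    negation_failed = True

# Seconds since the epoch as plain integers; these can be negated.
key = dates.astype("int64")

first_value = values[np.argmin(key)]   # value at the earliest date
last_value = values[np.argmin(-key)]   # value at the latest date
print(negation_failed, first_value, last_value)
```

Taking the minimum of the negated key is the same as taking the maximum of the original key, which is why a "first" aggregator with a reversed ordering behaves like "last".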
Hi Maarten,
Alternatively, this also worked to accomplish the same thing:
Obviously, it's not as concise, but I found it to still be much faster than using Pandas. |
Yeah there's definitely an issue around types in aggregates.
import numpy as np
import vaex

num = 59
# ts and ts2 carry the same ordering, once as integers and once as datetime64
adf = vaex.from_arrays(ts=np.arange(1, num+1), ts2=np.arange('2005-02-01T00:00:00', '2005-02-01T00:00:59', dtype='datetime64[s]'),
                       x=np.random.randint(1, 1000, num), y=np.random.randint(1, 1000, num).astype(np.int64))
print(adf.dtypes)
print(adf)
# first value of x / y, ordered by the integer or the datetime column
print(adf.first(adf.x, adf.ts))
print(adf.first(adf.x, adf.ts2))
print(adf.first(adf.y, adf.ts))
print(adf.first(adf.y, adf.ts2))
Output:
ts int32
ts2 datetime64[s]
x int32
y int64
dtype: object
# ts ts2 x y
0 1 2005-02-01 00:00:00 493 196
1 2 2005-02-01 00:00:01 924 204
2 3 2005-02-01 00:00:02 402 227
3 4 2005-02-01 00:00:03 156 378
4 5 2005-02-01 00:00:04 140 178
... ... ... ... ...
54 55 2005-02-01 00:00:54 999 11
55 56 2005-02-01 00:00:55 10 940
56 57 2005-02-01 00:00:56 706 801
57 58 2005-02-01 00:00:57 680 511
58 59 2005-02-01 00:00:58 556 35
493
924
905
196 |
Hi Maarten, Yes, we definitely need 'last'. It is actually pretty important. Please consider implementing it in the next version; I don't suppose it will have much memory impact. I am unable to reverse the order of the dates by changing the order_expression, as suggested by markbarna. Apart from groupby, I am also using BinnerTime. I am hoping to get the temperature reading at the start (first) and the end (last) of every hour interval. Thank you |
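To make the request concrete, here is a plain-Python sketch of the intended result: the first and last temperature reading within each hour. It does not use vaex or BinnerTime; the sample readings and the helper name `first_last_per_hour` are made up for illustration.

```python
from datetime import datetime

def first_last_per_hour(readings):
    """Return {hour_start: (first_temp, last_temp)} for (timestamp, temp) pairs."""
    result = {}
    # Sorting by timestamp makes "first" and "last" well defined.
    for ts, temp in sorted(readings):
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if hour not in result:
            result[hour] = (temp, temp)
        else:
            # Keep the first reading, overwrite the last one.
            result[hour] = (result[hour][0], temp)
    return result

readings = [
    (datetime(2022, 2, 20, 9, 50), 21.5),
    (datetime(2022, 2, 20, 9, 5), 20.0),
    (datetime(2022, 2, 20, 10, 30), 22.5),
    (datetime(2022, 2, 20, 10, 1), 22.0),
    (datetime(2022, 2, 20, 10, 59), 23.0),
]
summary = first_last_per_hour(readings)
for hour, (first, last) in summary.items():
    print(hour, first, last)
```

A first/last aggregator combined with an hourly time binner would be expected to produce the same pairs without sorting the full dataset in memory.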
Although I am not sure such a message is very constructive, please be aware that @maartenbreddels has opened a PR for this, #1848. Bests, |
My apologies, sir.
I could have conveyed my message in a different way.
Thank you for informing me about PR #1848.
Best regards.
On Sunday, February 20, 2022, 02:32:52 PM GMT+8, yohplala ***@***.***> wrote:
Yes, we definitely need 'last'. It is actually pretty important.
Albeit I am not sure such a message is a very constructive one, please, be aware that @maartenbreddels has initiated a PR for this with #1848.
He is requesting some testing, that I have unfortunately not been able to provide.
If you would like to give it a try, this may speed things up.
Bests,
|
It is on my to-do list, but other things had priority. Perhaps this coming week if I have the time. |
Note that we merged #1848, so in the next release (4.9) we should have a better first/last aggregator. |
Thank you, sir.
|
Hello,
I have a dataset that looks roughly like this:
I need to group by category and then select the values for x and y that correspond to the maximum date within each group. With Pandas, I would do:
Is there a way to achieve this using Vaex?
I was able to select the minimum date within each group using:
The documentation for vaex.agg.first doesn't indicate exactly how to use the order_expression parameter, so I wasn't sure if there is a way to reverse-order the date expression. Is this possible?
Thank you
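For reference, one common pandas idiom for "the row with the maximum date per group" is sketched below. The column names `category`, `date`, `x` and `y` follow the description in this post, the sample values are invented, and this is not necessarily the code the poster used.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "category": ["a", "a", "b", "b", "b"],
        "date": pd.to_datetime(
            ["2021-01-01", "2021-01-03", "2021-01-02", "2021-01-05", "2021-01-04"]
        ),
        "x": [1, 2, 3, 4, 5],
        "y": [10, 20, 30, 40, 50],
    }
)

# Index label of the row holding the latest date within each category.
latest_idx = df.groupby("category")["date"].idxmax()

# Select those rows; x and y are the values at the maximum date per group.
latest = df.loc[latest_idx].set_index("category")
print(latest)
```

Replacing `idxmax` with `idxmin` gives the row at the minimum date instead, which corresponds to what a "first ordered by date" aggregation returns.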