cumulative calculation based on previous row's result #7658
Comments
This would be much slower than a Python loop, as it must run every single element through our expression interpreter, which is optimized for columns, not single elements.
@ritchie46 Thanks for the feedback. Could it really be that much slower than looping in Python? As an experiment - I attempted to modify https://gist.github.com/cmdlineluser/f9d657bdf04ba64e2ab9a63cdefa2dc4 With the df from earlier enlarged to
The rust version takes
Python for loop takes
The gap continues to widen as the number of rows increases. I used values in the struct as placeholders:
I tried to return a struct of |
Experimenting with this further, it seems like creating a single row of structs is a decently fast operation:
We can then
It runs in Maybe it can be improved? |
I am pleased that you raised this issue here, @cmdlineluser. I am the person who posted the original question on Stack Overflow.
@buckleyc Hello there! Did you happen to try the Edit: It appears that I wasn't testing with large enough data. The |
@cmdlineluser, I finally got around to testing usability of this method, and it is daunting. The issue quickly becomes the complexity/difficulty of trying to shove more than a simple financial calculation within the .cum_fold(). For this financial simulation, I can pre-calculate various columns across the life of a sample timeline (e.g., cumulative inflation, cumulative cost of living adj, potential period expenses). Polars is great for this.
Any ideas how best to approach this complexity within Polars, or is this level of row complexity best left to a different tool? |
@buckleyc Yeah, apologies. It was perhaps improper of me to make that suggestion, as it's not an actual solution to the problem and could probably be considered a misuse of polars syntax. "preoccupied with whether they could, they didn't stop to think if they should" - as they say. I thought a generalized "accumulate" type function (similar to
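The comparison in the comment above is cut off, so the following is only an illustration of what a generalized "accumulate" can look like, using Python's standard-library `itertools.accumulate`. The growth rates, expenses, starting balance, and the `step` formula are all hypothetical and do not come from the thread.

```python
from itertools import accumulate

# Hypothetical per-period inputs (illustrative values only).
growth = [0.5, 0.0, 0.25]
expense = [10.0, 20.0, 30.0]


def step(carried, period):
    """Combine the previous result with the current period's inputs."""
    rate, cost = period
    return carried * (1.0 + rate) - cost


# `initial` seeds the carried value; drop it from the output with [1:].
balances = list(accumulate(zip(growth, expense), step, initial=100.0))[1:]
print(balances)
```

Each output element depends on the one before it, which is the same dependency that prevents the calculation from being expressed as an independent column-wise operation.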
Hello @buckleyc @cmdlineluser, I have been digging around for something like this as well. Unfortunately, @ritchie46 has already weighed in with discouraging thoughts, but I will leave my use case and comments here in case this changes. Feature requested:
My use case:
Originally posted here https://discord.com/channels/908022250106667068/911186243465904178/1138871006065344522 |
Problem description
Taken from this stackoverflow question.
I'm not sure if there is an actual name for this:
"Where a row-based calculation depends on the result from the previous row."
Basically when you need to resort to a for loop in Python:
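The original snippet is not reproduced here. As a minimal sketch of the kind of loop being described, assuming hypothetical `deposit` and `rate` columns and a made-up formula, each row's result is carried into the next row's calculation:

```python
# Hypothetical rows: column names and values are illustrative only.
rows = [
    {"deposit": 100.0, "rate": 0.5},
    {"deposit": 50.0, "rate": 0.0},
    {"deposit": 0.0, "rate": 0.25},
]


def run_forward(rows, start=0.0):
    """Compute each row from the previous row's result."""
    out = []
    carried = start
    for row in rows:
        pre = carried + row["deposit"]    # value before the calculation
        post = pre * (1.0 + row["rate"])  # value after the calculation
        out.append({**row, "pre": pre, "post": post})
        carried = post                    # brought forward to the next row
    return out


result = run_forward(rows)
for r in result:
    print(r)
```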
It sort of looks like a .cumfold operation - but then the result is also brought forward to the next row.
Would it be possible to encode this behaviour in polars to essentially have the for loop executed in rust?
Hypothetical syntax:
Where you could choose a target for the pre-calculation result, post-calculation, and then define a "formula" for the calculation itself.
pl.element() is used here as a placeholder for the "running total" value.
Thanks.