Issue Using Chained Accessors with Multiple dtypes & Performance Tips #4546
Not a bug; see #4531 for an example. This is a chained accessor, which can sometimes work depending on whether the frame has multiple dtypes and on the memory layout, but it is neither guaranteed nor recommended syntax. You should set via iloc[row, col]. |
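A minimal sketch of the recommended single-call form, assuming a small hypothetical frame with one float column (`a`) and one int column (`b`); the frame and its values are illustrative, not the code from the original report:

```python
import pandas as pd

# Hypothetical mixed-dtype frame: "a" is float, "b" is int.
df = pd.DataFrame({"a": [10.5, 20.5, 30.5], "b": [1, 2, 3]})

# Chained form (not recommended): df.iloc[0] may return a copy of the row,
# so an assignment into it is not guaranteed to reach df.
# df.iloc[0]["a"] = 99.9

# Single indexing call: row position 0, column position of "a".
col = df.columns.get_loc("a")
df.iloc[0, col] = 99.9

print(df.iloc[0, col])
```

Because row and column are resolved in one `iloc` call, the assignment writes into `df` itself rather than into an intermediate row object.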
Ok, so, say for performance reasons (http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Avoiding_dots...) I wanted to avoid doing repeated dot references to df.iloc[0]. Is there a way I can safely set df.iloc[0] to a local variable and access its contents with another accessor? |
it's not going to be the bottleneck |
I want to iterate through a data frame and make changes to a row relative to the values in the previous row. I'm hunting for ways to make each iteration run faster. Ideally I'd use something like a rolling_apply with a window of 2, but rolling_apply only works for a single column. |
It's much better to create a mask and then assign a new frame. Put up a sample frame and what you want the final result to look like, and I'll show you. |
Here's an example of my old code. I'm trying to make the code in the iterate function's for loop run as fast as possible. Essentially I want every C element to add itself to the product of the current A element and previous B element.
|
In my full code, the iterating loop is something that gets implemented by user defined class, so it isn't necessarily known ahead of time. It could, for example, have conditional statements that change the current row before the next iteration is run. |
This calculation is trivially vectorized
Try this one
And they do the same calculation
|
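A minimal sketch of the loop-versus-vectorized comparison described above, assuming hypothetical columns `a`, `b`, `c` and assuming the first row is left unchanged because it has no previous row. The function names `iterate_rows` and `vectorized` are illustrative; this is not the code originally posted in the thread.

```python
import numpy as np
import pandas as pd

def iterate_rows(df):
    """Row-by-row version: c[i] += a[i] * b[i-1] for i >= 1."""
    out = df.copy()
    a = out["a"].to_numpy()
    b = out["b"].to_numpy()
    c = out["c"].to_numpy(dtype=float, copy=True)
    for i in range(1, len(out)):
        c[i] = c[i] + a[i] * b[i - 1]
    out["c"] = c
    return out

def vectorized(df):
    """Same calculation using shift(1) to line up the previous row's b."""
    out = df.copy()
    # The first row has no previous b, so its product is NaN; treat it as 0.
    out["c"] = df["c"] + (df["a"] * df["b"].shift(1)).fillna(0.0)
    return out

# Hypothetical sample data.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10, 20, 30, 40],
    "c": [0.5, 0.5, 0.5, 0.5],
})

looped = iterate_rows(df)
fast = vectorized(df)
print(np.allclose(looped["c"], fast["c"]))
```

The vectorized form works here because each new `c` depends only on the original `a` and `b`, not on a `c` computed in an earlier iteration.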
In pandas there are very few in-place operations by default (actually only setting); this is on purpose. It is almost always faster to construct a new calculated result than to change an existing data structure. |
Yeah I see your point. Very nice. So, how would I be able to reformulate a conditional statement like this?
|
If it were not recurrent, then df.loc[df['c']>1,'b'] -= 0.1 would do it. However, since this looks like feedback, you might need to iterate. |
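For the non-recurrent case, the masked assignment above can be sketched as follows, assuming a hypothetical frame in which `b` is already a float column (so subtracting 0.1 does not require a dtype change):

```python
import pandas as pd

# Hypothetical data; "b" is float so the in-place subtraction keeps its dtype.
df = pd.DataFrame({
    "b": [1.0, 1.0, 1.0, 1.0],
    "c": [0.5, 1.5, 2.5, 1.0],
})

# Boolean mask selecting the rows where c exceeds 1.
mask = df["c"] > 1

# Subtract 0.1 from b only in the masked rows, in a single .loc call.
df.loc[mask, "b"] -= 0.1

print(df)
```

This only covers the case where the condition is evaluated against the existing values. If a change to one row feeds into the condition for the next row, a loop is still needed.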
Understood. Could this be a use case to warrant the development of a rolling_apply that works across entire rows? This situation (non-recurrent conditional operations that depend on previous row contents) shows up a lot for me when I try to use a dataframe as a basis for a time-dependent simulation. |
Yes, this could be an enhancement. You might want to construct an … That's the whole problem with … So another option is to look at: http://pandas.pydata.org/pandas-docs/dev/enhancingperf.html. Also, if you don't … the indexing features of … Keep in mind that optimizing your computation should be the last thing you do. First, profile! |
Thanks for the tips! |
The below code should print 99.9, but instead prints 10.5. This bug goes away if column b in the dataframe is set to all float values instead of int values. I am running Pandas 0.12.0 and Python 2.7.3.