.iterrows takes too long and generates a large memory footprint #7683
Comments
What are you doing that requires …?
See here for some tips: #7194
This does return a generator. The problem is that since you have mixed dtypes, it has to create a single-dtyped object, BEFORE IT DOES ANYTHING, which takes a lot of time (the zipping doesn't take much).
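The dtype-upcasting cost described above is easy to observe: on a mixed-dtype frame, each row yielded by `iterrows` is a Series coerced to a single common dtype. A minimal sketch (the frame contents here are illustrative, not from the issue):

```python
import pandas as pd

# A small mixed-dtype frame: an integer column plus a string column.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# iterrows yields (index, Series) pairs; with mixed dtypes the row
# Series is upcast to a single dtype (object here), which is the
# expensive conversion the comment refers to.
_, first_row = next(df.iterrows())
print(first_row.dtype)  # object, even though column "a" is int64

# itertuples avoids building a Series per row and keeps the values.
first_tuple = next(df.itertuples())
print(first_tuple.a, first_tuple.b)
```

`itertuples` is the usual lighter-weight alternative when per-row access is all that is needed.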
Profiling shows it has nothing to do with zipping, though it's not about the mixed-dtype data frame either. It's slow when …
You haven't answered the question: why are you using iterrows?
There's a method I want to apply to each row sequentially. The method itself takes some time, so vectorizing it or not doesn't make much difference to running time. I prefer iteration because it gives more control.
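The pattern described above can be sketched as follows; `process` is a hypothetical stand-in for whatever slow per-row function the commenter applies (the actual function is not shown in the thread):

```python
import pandas as pd

def process(row):
    # Hypothetical slow per-row computation; in the real use case the
    # per-row cost dominates, so vectorization would not help much.
    return row["a"] * 2

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Explicit sequential iteration over rows, as the commenter prefers.
results = [process(row) for _, row in df.iterrows()]
print(results)  # [2, 4, 6]
```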
You might try iterating over …. I suppose this could be updated to iterate over the index, rather than all at once (as it loses its identity as an Index and becomes a list). Would you like to submit a pull request for this?
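Iterating over the index, as suggested above, can look like the following sketch: pull scalars per row with `DataFrame.at`, so no per-row Series is materialized and each column keeps its own dtype (the frame contents are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"a": range(3), "b": ["x", "y", "z"]})

# Walk the index and fetch scalars directly; .at is fast scalar
# access and avoids the per-row Series that iterrows builds.
rows = []
for idx in df.index:
    a_val = df.at[idx, "a"]
    b_val = df.at[idx, "b"]
    rows.append((a_val, b_val))
print(rows)
```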
Sure, I can look into this.
That would be gr8!
PR submitted: #7702
When using df.iterrows on a large data frame, it takes a long time to run and consumes a huge amount of memory. The name of the function implies that it is an iterator and should not take much to run. However, the method uses the builtin zip, which can generate a huge temporary list of tuples if the iteration is not done lazily. Below is the code which can reproduce the issue on a box with 16GB of memory.
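The original reproduction code was not preserved in this copy of the thread. A hypothetical reconstruction of the shape of the problem (the column names and the small `n` here are placeholders; the report used a frame large enough to exhaust 16GB of RAM):

```python
import numpy as np
import pandas as pd

# Placeholder size; the original report used a much larger frame.
n = 1_000

df = pd.DataFrame({
    "ints": np.arange(n),
    "floats": np.random.rand(n),
    "strs": ["row%d" % i for i in range(n)],
})

# iterrows internally zips the index with the row data. On Python 2,
# zip() returns a fully materialized list of tuples rather than a
# lazy iterator, which is the temporary-list blowup the issue
# describes (addressed in PR #7702).
count = sum(1 for _ in df.iterrows())
print(count)
```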