
Speedups for calculate area #93

Closed
w-k-jones opened this issue Mar 21, 2022 · 3 comments
Labels
enhancement Addition of new features, or improved functionality of existing features

Comments

@w-k-jones
Member

The function calculate_area in tobac.analysis is quite slow for large segmentation masks because it loops over each feature individually. This would be a good candidate for a performance improvement by vectorizing the loop using scipy.ndimage.labeled_comprehension or scipy.stats.binned_statistic.
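A minimal sketch of the vectorization idea, assuming the segmentation mask is a 2D integer array of feature labels (0 = background) and that a per-cell area array of the same shape is available. The function names and arguments are hypothetical and are not tobac's API. The labeled_comprehension version sorts the labels once and sums each group, instead of comparing the whole mask against every feature ID. The np.bincount version is a swapped-in alternative (not mentioned in the issue) that does the whole sum in one NumPy call but requires non-negative integer labels.

```python
import numpy as np
from scipy import ndimage


def area_per_feature_loop(mask, cell_area, feature_ids):
    """Baseline: builds a full-size boolean mask for every feature (slow for many features)."""
    return np.array([cell_area[mask == fid].sum() for fid in feature_ids], dtype=float)


def area_per_feature_labeled(mask, cell_area, feature_ids):
    """Group cell areas by label with scipy.ndimage.labeled_comprehension."""
    return ndimage.labeled_comprehension(
        cell_area,    # values to aggregate: area of each grid cell
        mask,         # integer label array, 0 = background
        feature_ids,  # labels to report, in this order
        np.sum,       # aggregate applied to each label's values
        float,        # output dtype
        0.0,          # value returned for labels absent from the mask
    )


def area_per_feature_bincount(mask, cell_area, feature_ids):
    """Weighted histogram of the labels; requires non-negative integer labels."""
    totals = np.bincount(
        mask.ravel(),
        weights=cell_area.ravel(),
        minlength=int(np.max(feature_ids)) + 1,
    )
    return totals[feature_ids]


# Hypothetical example: 3x3 mask with three features and a uniform cell area of 2.
mask = np.array([[0, 1, 1],
                 [2, 2, 0],
                 [3, 1, 2]])
cell_area = np.full(mask.shape, 2.0)
feature_ids = np.array([1, 2, 3, 4])  # feature 4 is absent from the mask

for fn in (area_per_feature_loop, area_per_feature_labeled, area_per_feature_bincount):
    print(fn.__name__, fn(mask, cell_area, feature_ids))
```

All three return the same areas here; the difference is that the baseline does one full-array comparison per feature, so its cost grows with (number of features × mask size).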

@w-k-jones w-k-jones added this to the Version 1.3 milestone Mar 21, 2022
@w-k-jones w-k-jones self-assigned this Mar 21, 2022
@w-k-jones w-k-jones added the enhancement Addition of new features, or improved functionality of existing features label Mar 21, 2022
@w-k-jones
Member Author

It turns out that the majority of the time in calculate_area is spent adding a column with the results to the dataframe. I guess this is good motivation to move to using xarray instead...

@freemansw1
Member

Interesting. That's similar to what we saw in the tracking module. You may be able to get good performance by constructing the column first as a dict, list, or numpy array and then adding it in one vectorized operation. That has generally worked well for us.
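A minimal pandas sketch of that suggestion, assuming a features DataFrame with one row per feature and the per-feature areas already computed in the same row order. The column names feature and area and the helper functions are hypothetical. Writing one cell at a time with .loc goes through pandas' indexing machinery on every iteration; building the values first and assigning the whole column once avoids that.

```python
import numpy as np
import pandas as pd


def add_area_rowwise(features, areas):
    """Slow pattern: write the new column one row at a time."""
    out = features.copy()
    out["area"] = np.nan
    for i, idx in enumerate(out.index):
        out.loc[idx, "area"] = areas[i]
    return out


def add_area_vectorized(features, areas):
    """Build the values first, then assign the whole column in one operation."""
    out = features.copy()
    out["area"] = np.asarray(areas, dtype=float)  # length must equal len(out)
    return out


def add_area_from_dict(features, area_by_feature):
    """Same idea starting from a dict keyed by feature ID."""
    out = features.copy()
    out["area"] = out["feature"].map(area_by_feature)
    return out


features = pd.DataFrame({"feature": [1, 2, 3]})
areas = np.array([6.0, 6.0, 2.0])

print(add_area_vectorized(features, areas))
```

The row-wise version performs one indexed write per feature, while the other two perform a single column assignment regardless of how many features there are.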

@w-k-jones
Member Author

Turns out I was wrong: I hadn't properly reloaded my changes because of relative imports, so it only appeared that way, when in reality it was still running the old, slow code. It doesn't actually take 6 seconds to add a new column from a numpy array of values...
