Improvements to validation checks #189
* Use asarray to avoid unnecessary copies of arrays
* Lazily convert to array to conserve memory
* Compare shapes directly
If we have a Series, we can avoid expensive allocation to ndarray.
@aaraney
# aa = pd.Series(range(1000))
In [93]: %timeit _array_attr(aa, "ndim").ndim
425 ns ± 41.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [94]: %timeit np.asarray(aa).ndim
4.45 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
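To illustrate the idea behind the timing above, here is a minimal sketch of an attribute lookup that only converts to an ndarray when it has to. `get_array_attr` is a hypothetical name chosen for illustration; it is not the `_array_attr` helper from this PR, whose implementation is not shown in this thread.

```python
from typing import Any

import numpy as np


def get_array_attr(obj: Any, name: str) -> Any:
    """Return obj.<name>, converting to an ndarray only if needed.

    Hypothetical helper for illustration, not the PR's implementation.
    """
    try:
        # Array-likes such as ndarray (or a pandas Series) already expose
        # attributes like ndim and shape, so no conversion is required.
        return getattr(obj, name)
    except AttributeError:
        # Plain sequences (lists, tuples) need a conversion first.
        return getattr(np.asarray(obj), name)


print(get_array_attr(np.zeros((2, 3)), "ndim"))
print(get_array_attr([1, 2, 3], "shape"))
```

The fast path is a plain attribute access, which is likely why skipping the ndarray conversion for a Series is cheaper in the timings above.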
Ah, I see. That makes a lot of sense. Thanks for clarifying and providing the above example, @groutr. My only request would be adding some type hints to the function declaration. The return type hint might be difficult to nail down for all cases, so I think the straight path of
Looks good to me. I think we just want to bump the patch version in python/metrics/src/hydrotools/metrics/_version.py to 1.2.2.
I tested these changes locally and it is indeed faster. Thanks for the contribution!
Thanks again, @groutr!
Some minor improvements to _validation.py that center around the usage of np.asarray. Using np.asarray over np.array can avoid unnecessary copies (i.e., when the input is already an array). When comparing the shapes, we don't need to hold all the arrays in memory.