-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor of features tests #522
Refactor of features tests #522
Conversation
In case it matters, I believe that I've isolated the issues with coverage for Still digging in on that, which may take a while. I'd prefer to deal with the issues related to coverage in a separate PR to keep this one from growing too much more. |
I'm 👍 on merging this despite the coverage dip. Agree or disagree @perrygeo? I'd like to run a few more tests locally before I do. |
@sgillies suggestions to test by hand, since they were changes from the previous implementation:
|
return numpy.allclose(values, values.astype(dtype)) | ||
|
||
else: | ||
return numpy.array_equal(values, values.astype(dtype)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there a more efficient way to test cast safety without making a copy and doing an element-wise comparison? I've never used it but http://docs.scipy.org/doc/numpy/reference/generated/numpy.can_cast.html looks like it might do what we need here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@perrygeo It doesn't work against values in an array:
numpy.can_cast(numpy.array([1,2,3]), numpy.int8)
==> False
using other values for casting
can produce undesirable results.
numpy.can_cast(numpy.array([1,2,3, 65564]), numpy.int8, casting='same_kind')
==> True
Of course, we can use it in a loop and give it a single scalar value at a time, because then it actually checks values:
[numpy.can_cast(numpy.array(v), numpy.int8) for v in [1, 2, 3]]
==> [True, True, True]
but that is less efficient than what we are doing now.
Unfortunately less useful than it's name suggests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bummer. Well let's stick to this method for now but keep an eye out for more efficient ways (possibly test array.min and array.max ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@perrygeo I thought about that, but I think it would get confused by 1, 2, 3.4, 4.1, 5
. I can't see a way around an approach that avoids evaluating each value.
If this becomes a performance bottleneck, we can always move this over to Cython. I suspect this is low on our list of performance worst offenders though...
I like it. The tests are much cleaner and faster. Overall it seems more robust and I'm always a fan of the fail-fast strategy. I want to do some integration testing with some real data™, particularly in the context of |
I've tested it against a few datasets and it works as expected; same results with valid geojsons, now raises an exception on invalid features which I think is a good thing. All the Sticking with the current 👍 on merging. |
Old tests pass for me. That was my sanity check. |
This PR involves a major refactor of the tests for
features.py
andrio/features.py
.Goals:
Along the way, I dropped WKB support from
_features.pyx
(wasn't used) and moved as much of the validation logic forshapes
andsieve
as possible from the Python functions (wrappers) to the base Cython implementations to keep these more tightly encapsulated.rasterize
does too much in the Python layer to easily move into the Cython implementation. I also refactored the validation logic for data types used by rasterize into function indtypes
to enhance accessibility and testability.I also consolidated the several individual files with tests for various parts of
features
into a single test file; makes it somewhat easier to manage when there is one test file to one source file.I used pytest fixtures heavily to use very small test images and files; this contributed a significant speed increase to the test suite - toward #450.
I found and fixed a few isolated bugs along the way.
I made a couple things a bit more strict:
bounds
does not returnNone
on exception, it now just passes that exception through.rasterize
no longer ignores invalid shapes; they will raise an exception instead.Note: the decrease in coverage is due to mis-reporting of coverage in
_features.pyx
andrio/features.py
files; much more of the functionality is actually covered than is indicated here. I believe that actual coverage has actually increased over the prior implementation.I've found awkward behavior with
coverage
that is proving hard to explain especially for the CLI tests; if I run the same call to theclick
test runner 2x in the same test function, I get coverage but if I run it only once, I don't. This is partly why coverage was recorded higher previously, because the test functions before did everything and the kitchen sink in a single function.