
Refactor of features tests #522

Merged
sgillies merged 1 commit into rasterio:master on Nov 10, 2015

Conversation

brendan-ward (Contributor)

This PR involves a major refactor of the tests for features.py and rio/features.py.

Goals:

  1. Make tests as small, simple, and fast as possible
  2. Use pytest fixtures for repeated inputs
  3. Increase test coverage

Along the way, I dropped WKB support from _features.pyx (it wasn't used) and moved as much of the validation logic for shapes and sieve as possible from the Python wrapper functions into the base Cython implementations, to keep these more tightly encapsulated. rasterize does too much in the Python layer to move easily into the Cython implementation. I also refactored the data type validation logic used by rasterize into functions in dtypes, for better accessibility and testability.

I also consolidated the several individual files with tests for various parts of features into a single test file; a one-to-one mapping between test files and source files is easier to manage.

I used pytest fixtures heavily, with very small test images and files; this gave the test suite a significant speedup - toward #450.
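
Not the literal fixtures from this branch, but a minimal sketch of the pattern (fixture name hypothetical):

```python
import numpy
import pytest


@pytest.fixture
def basic_image():
    """A tiny 10x10 test image: zeros with a small square of ones."""
    image = numpy.zeros((10, 10), dtype=numpy.uint8)
    image[2:5, 2:5] = 1
    return image


def test_image_shape(basic_image):
    assert basic_image.shape == (10, 10)
```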

I found and fixed a few isolated bugs along the way.

I made a couple of things a bit more strict (a sketch of the new behavior follows this list):

  1. bounds no longer returns None on exception; it now lets the exception propagate.
  2. rasterize no longer silently ignores invalid shapes; they now raise an exception.
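
A minimal sketch of the new failure mode (assuming ValueError here; the exact exception type is whatever the validation raises):

```python
import pytest
from rasterio.features import rasterize


def test_rasterize_invalid_shape():
    # A GeoJSON-like mapping missing its coordinates; previously it was
    # silently skipped, now it should raise.
    with pytest.raises(ValueError):
        rasterize([{'type': 'Polygon'}], out_shape=(10, 10))
```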

Note: the decrease in coverage is due to mis-reporting of coverage for the _features.pyx and rio/features.py files; much more of the functionality is covered than is indicated here. I believe actual coverage has increased over the prior implementation.

I've found awkward coverage behavior that is proving hard to explain, especially for the CLI tests: if I run the same call to the click test runner twice in the same test function, I get coverage; if I run it only once, I don't. This is partly why coverage was recorded higher previously: the old test functions did everything and the kitchen sink in a single function.
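
A stand-in reproduction of the quirk with a trivial command (not an actual rio command):

```python
import click
from click.testing import CliRunner


@click.command()
def info():
    """Stand-in for a rio command."""
    click.echo('ok')


def test_info_cli():
    runner = CliRunner()
    # With a single invoke, coverage missed the command body in my runs;
    # a second identical invoke made the lines register as covered.
    assert runner.invoke(info).exit_code == 0
    assert runner.invoke(info).exit_code == 0
```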

@brendan-ward (Contributor, Author)

In case it matters, I believe I've isolated the coverage issues for rio/features.py to interactions between click and coverage, since I can reproduce the behavior without pytest (so we can't blame pytest fixtures for importing things before coverage is ready).

Still digging in on that, which may take a while. I'd prefer to deal with the issues related to coverage in a separate PR to keep this one from growing too much more.

@sgillies (Member) commented Nov 9, 2015

I'm 👍 on merging this despite the coverage dip. Agree or disagree @perrygeo?

I'd like to run a few more tests locally before I do.

@sgillies sgillies added this to the pre 1.0 milestone Nov 9, 2015
@brendan-ward (Contributor, Author)

@sgillies, suggestions for testing by hand, since these were changes from the previous implementation:

  1. any suitably large, complex GeoJSON that may have contained invalid geometries silently ignored by rasterize. These will now fail with exceptions and could break people's data pipelines. My view is that geometry cleaning belongs elsewhere; we should insist on clean geometry.

  2. any case where you might have been relying on bounds returning None, also within any processing pipelines (a quick sketch follows).
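
For item 2, a quick hand-test sketch (exact exception type may vary):

```python
from rasterio.features import bounds

# Valid GeoJSON still returns (left, bottom, right, top):
bounds({'type': 'Point', 'coordinates': [1.0, 2.0]})

# Invalid input used to return None; it now lets the exception propagate,
# so pipelines testing `if bounds(geom) is None` need updating.
try:
    bounds({'type': 'Polygon'})  # missing coordinates
except Exception as exc:
    print('raises instead of returning None:', exc)
```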

        return numpy.allclose(values, values.astype(dtype))
    else:
        return numpy.array_equal(values, values.astype(dtype))
perrygeo (Contributor):

I wonder if there is a more efficient way to test cast safety without making a copy and doing an element-wise comparison. I've never used it, but http://docs.scipy.org/doc/numpy/reference/generated/numpy.can_cast.html looks like it might do what we need here.

brendan-ward (Contributor, Author):

@perrygeo It doesn't work against values in an array:

    numpy.can_cast(numpy.array([1, 2, 3]), numpy.int8) ==> False

and using other casting rules can produce undesirable results:

    numpy.can_cast(numpy.array([1, 2, 3, 65564]), numpy.int8, casting='same_kind') ==> True

Of course, we can use it in a loop and give it a single scalar value at a time, because then it actually checks values:

    [numpy.can_cast(numpy.array(v), numpy.int8) for v in [1, 2, 3]] ==> [True, True, True]

but that is less efficient than what we are doing now. Unfortunately it's less useful than its name suggests.

perrygeo (Contributor):

Bummer. Well, let's stick with this method for now but keep an eye out for more efficient approaches (possibly testing array.min and array.max?).

brendan-ward (Contributor, Author):

@perrygeo I thought about that, but min/max would be fooled by something like 1, 2, 3.4, 4.1, 5: the range fits an integer dtype, but 3.4 and 4.1 don't cast losslessly. I can't see an approach that avoids evaluating each value.

If this becomes a performance bottleneck, we can always move this over to Cython. I suspect it's low on our list of performance worst offenders, though...
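
For reference, a sketch of the value-based check under discussion, assembled from the diff above (the real helper in dtypes may differ in detail):

```python
import numpy


def can_cast_dtype(values, dtype):
    """Return True if every value survives a round trip through dtype."""
    values = numpy.asarray(values)
    if values.dtype == numpy.dtype(dtype):
        return True
    if values.dtype.kind == 'f':
        # Floats: allow for tiny representation error in the round trip.
        return numpy.allclose(values, values.astype(dtype))
    # Everything else must round-trip exactly.
    return numpy.array_equal(values, values.astype(dtype))
```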

@perrygeo (Contributor) commented Nov 9, 2015

I like it. The tests are much cleaner and faster. Overall it seems more robust and I'm always a fan of the fail-fast strategy.

I want to do some integration testing with some real data™, particularly in the context of rasterstats since it relies heavily on rasterize. I'll report back tomorrow with some results.

@perrygeo (Contributor)

I've tested it against a few datasets and it works as expected: same results with valid GeoJSON, and it now raises an exception on invalid features, which I think is a good thing. All the rasterstats tests pass.

Sticking with the current can_cast_dtype and removing the inverted raster logic sounds good to me.

👍 on merging.

sgillies added a commit that referenced this pull request Nov 10, 2015
@sgillies sgillies merged commit da93f2b into rasterio:master Nov 10, 2015
@sgillies (Member)

Old tests pass for me. That was my sanity check.
