Add pixel sampling mode #294
Co-authored-by: Ashwin Nair <ash1995@gmail.com>
This mostly looks okay to me; I just had some questions about the API and about ways to make things clearer.
Just realized that a lot of our documentation uses
The
The enum module was added in Python 3.4
@calebrob6 @isaaccorley (and anyone else): this is a pretty important design change, so I would like to get your opinions on it.
Design
To the best of my knowledge, my original thought process for the sampler design (and why I chose CRS size instead of pixel size) was as follows:
In hindsight, 2) is not necessarily true. Because the index itself is largely abstracted from the user, the user doesn't necessarily think about the fact that the index is in CRS units and that the size may also be in CRS units. Actually, 1) isn't necessarily true either. We could compute the bounds of the image in pixel coordinates relative to the origin and store those in the rtree, then convert them back to pixel coordinates relative to a particular image during indexing. But that seems way harder than it needs to be. In fact, despite being the person who designed this, I myself have made the mistake of thinking that
Discussion
The solution proposed in this PR seems very logical and straightforward. I think we only have a couple of remaining decisions to make:
Based on the fact that my original design used CRS units, I'm leaning towards supporting both. I think pixel units make much more sense for computer scientists, whereas CRS units might be more intuitive for remote sensing scientists. One advantage of pixel units over CRS units is that you don't need to think too hard about ensuring that your size is an integer multiple of the pixel resolution (expressed in CRS units). If my CRS uses 30 m/pixel units and I ask for
Are there any corner cases where either CRS units or pixel units could be problematic? My biggest concern is w.r.t. #278. If we want to avoid reprojecting images in a dataset (as @calebrob6 does in the ChesapeakeCVPR dataset), we start to hit problems where the resulting images are not necessarily exactly the same size. @calebrob6, what was the specific problem you ran into again? Was it that the bounding box you warp ends up rotated, so the bounding box of that bounding box has a different size? If we fix #278 and only reproject when absolutely necessary, we may end up with problems with either CRS units or pixel units. I want to ensure that the image we sample always has the same pixel dimensions, even if that means the spatial extent changes.
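Here is a quick arithmetic check of the integer-multiple concern. It uses a hypothetical 30 m/pixel raster and a few illustrative sizes; the size in the original comment is cut off, so these values are assumptions. A CRS extent maps onto whole pixels only when it is a multiple of the resolution. A size given in pixels is whole by construction. The floored count shows what would survive if a reader truncated the partial pixel, which is itself an assumption about reader behavior.

```python
import math
from typing import Tuple


def pixels_for_crs_size(size_crs: float, res: float) -> Tuple[float, int]:
    """Return (exact, floored) pixel counts covered by a CRS extent."""
    exact = size_crs / res
    return exact, math.floor(exact)


def is_whole_pixels(size_crs: float, res: float, tol: float = 1e-9) -> bool:
    """True if ``size_crs`` is (numerically) an integer multiple of ``res``."""
    exact = size_crs / res
    return abs(exact - round(exact)) < tol


res = 30.0  # hypothetical 30 m/pixel raster

# Sizes requested in CRS units (metres); only multiples of 30 are whole.
for size_crs in (90.0, 100.0, 3000.0):
    exact, floored = pixels_for_crs_size(size_crs, res)
    whole = is_whole_pixels(size_crs, res)
    print(f"{size_crs:g} m -> {exact:.3f} px, whole={whole}, floored={floored} px")

# A size requested in pixel units is whole by construction.
size_px = 256
print(f"{size_px} px -> {size_px * res:g} m, whole={is_whole_pixels(size_px * res, res)}")
```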
I think so -- one reasonable example I can think of is a RasterDataset made up of tiles that have different pixel resolutions. A user may want to specify 1 km crops (i.e., sampling in CRS units), then have a transform step that resamples the resulting images to a fixed size.
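The mixed-resolution case can be sketched as follows, assuming hypothetical tile resolutions of 2 m, 10 m, and 30 m. A fixed 1 km crop in CRS units yields a different pixel shape for each tile, so a resampling transform is needed before the crops can be batched. `crop_shape` and `resize_nearest` are illustrative helpers only. In a real pipeline the resize would more likely run on tensors, for example with `torch.nn.functional.interpolate`.

```python
from typing import List, Tuple

Image = List[List[float]]


def crop_shape(size_crs: float, res: float) -> Tuple[int, int]:
    """Pixel (height, width) of a square crop of ``size_crs`` CRS units."""
    # Round to the nearest whole pixel; real readers may floor or ceil instead.
    n = round(size_crs / res)
    return (n, n)


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbor resize of a 2D list-of-lists image."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# A 1 km crop sampled in CRS units from tiles with different resolutions.
for res in (2.0, 10.0, 30.0):
    h, w = crop_shape(1000.0, res)
    crop = [[float(r) for _ in range(w)] for r in range(h)]
    fixed = resize_nearest(crop, 64, 64)  # the "transform step" to a fixed size
    print(f"{res:g} m/px: {h}x{w} -> {len(fixed)}x{len(fixed[0])}")
```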
Pixel units are more natural to me, but I can see the argument for either.
@calebrob6 how's that? Not sure of the best way to do the docs.
Looks good -- I can finish off the rest of this too (the above are basically notes from a conversation with Adam). We think it is okay to merge this with the added documentation, then work on how GeoDataset / RasterDataset should be indexed and how the samplers should interact with them (see #409).
This will also need unit tests to ensure that
@adamjstewart Does anything have to change in the index for this to work? I think I'll need help writing this test out. |
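Here is one possible shape for the requested unit test, written as plain pytest-style functions. Everything in it is hypothetical. `Box` is a simplified 2D stand-in for the library's bounding box type, and `random_bbox` is an illustrative sampler, not torchgeo's `RandomGeoSampler`. The test checks three things: a size given in pixels produces boxes whose CRS extent equals `size * res`, the equivalent size given in CRS units produces the same extent, and every box stays inside the dataset bounds.

```python
import random
from enum import Enum, auto
from typing import NamedTuple, Tuple


class Units(Enum):
    PIXELS = auto()
    CRS = auto()


class Box(NamedTuple):
    """Simplified 2D bounding box in CRS coordinates (hypothetical)."""

    minx: float
    maxx: float
    miny: float
    maxy: float


def random_bbox(
    bounds: Box, size: Tuple[float, float], res: float, units: Units, rng: random.Random
) -> Box:
    """Sample a box of ``size`` = (height, width) uniformly inside ``bounds``."""
    height, width = size
    if units == Units.PIXELS:
        # Convert pixel counts to CRS extents using the resolution.
        height, width = height * res, width * res
    minx = rng.uniform(bounds.minx, bounds.maxx - width)
    miny = rng.uniform(bounds.miny, bounds.maxy - height)
    return Box(minx, minx + width, miny, miny + height)


BOUNDS = Box(0.0, 3000.0, 0.0, 3000.0)  # 100 x 100 pixels at 30 m/pixel
RES = 30.0


def check_extent(units: Units, size: Tuple[float, float]) -> None:
    rng = random.Random(0)  # fixed seed so failures are reproducible
    for _ in range(100):
        box = random_bbox(BOUNDS, size, RES, units, rng)
        # Both tests below describe the same 300 m x 600 m (10 x 20 pixel) patch.
        assert abs((box.maxy - box.miny) - 300.0) < 1e-6
        assert abs((box.maxx - box.minx) - 600.0) < 1e-6
        # The box must not extend past the dataset bounds.
        assert BOUNDS.minx <= box.minx and box.maxx <= BOUNDS.maxx + 1e-6
        assert BOUNDS.miny <= box.miny and box.maxy <= BOUNDS.maxy + 1e-6


def test_pixel_units() -> None:
    check_extent(Units.PIXELS, (10, 20))


def test_crs_units() -> None:
    check_extent(Units.CRS, (300.0, 600.0))
```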
Changes in last commit:
Remaining things I'll work on now:
For
Branch updated from 2b025a2 to 3104706.
Hmm, the possible enum values don't seem to show up in the docs. Not sure what to do about that; maybe it's a bug in Sphinx? Anyway, I think this looks good to me now.
* Add pixel sampling mode
* Fix maxy indexing error
  Co-authored-by: Ashwin Nair <ash1995@gmail.com>
* Add sample_mode docstrings, default to PIXELS
* Replace sample_mode with units
* Update to use enum
* Fix mypy, tuple, and flake8 issues
* Fix isort and pydocstyle problems
* Update sampler docs to discuss unit sampling mode
* Various fixes
* Add units arg to GridGeoSampler
* Update benchmark script
* Add tests
* Document enum values
* mypy fixes

Co-authored-by: Ashwin Nair <ash1995@gmail.com>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
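Grid sampling with a `units` argument can be sketched the same way. Both `size` and `stride` are converted from pixels to CRS units before tiling. This is an illustrative approximation, not the merged `GridGeoSampler`. `grid_boxes` and `Box` are hypothetical, and dropping partial tiles at the right and bottom edges is an assumption of this sketch. Tiling starts at the top-left corner `(minx, maxy)`.

```python
import math
from enum import Enum, auto
from typing import List, NamedTuple, Tuple


class Units(Enum):
    PIXELS = auto()
    CRS = auto()


class Box(NamedTuple):
    """Simplified 2D bounding box in CRS coordinates (hypothetical)."""

    minx: float
    maxx: float
    miny: float
    maxy: float


def grid_boxes(
    bounds: Box,
    size: Tuple[float, float],
    stride: Tuple[float, float],
    res: float,
    units: Units = Units.PIXELS,
) -> List[Box]:
    """Tile ``bounds`` with (height, width) patches spaced by ``stride``."""
    height, width = size
    stride_y, stride_x = stride
    if units == Units.PIXELS:
        # Convert both the patch size and the stride into CRS units.
        height, width = height * res, width * res
        stride_y, stride_x = stride_y * res, stride_x * res
    # Number of full patches that fit; partial edge tiles are dropped.
    rows = math.floor((bounds.maxy - bounds.miny - height) / stride_y) + 1
    cols = math.floor((bounds.maxx - bounds.minx - width) / stride_x) + 1
    boxes = []
    for i in range(rows):
        maxy = bounds.maxy - i * stride_y  # walk downward from the top edge
        for j in range(cols):
            minx = bounds.minx + j * stride_x
            boxes.append(Box(minx, minx + width, maxy - height, maxy))
    return boxes
```

With a stride smaller than the patch size, the tiles overlap. When `units=Units.PIXELS`, that overlap is always a whole number of pixels, whatever the resolution.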
This would still need updated docs, benchmarks, tests, etc., but let's verify functionality first.
Resolves #279