Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize or normalize use of bounds and bounding boxes #525

Open
brendan-ward opened this issue Nov 12, 2015 · 13 comments
Open

Standardize or normalize use of bounds and bounding boxes #525

brendan-ward opened this issue Nov 12, 2015 · 13 comments
Milestone

Comments

@brendan-ward
Copy link
Contributor

Our use of bounds and bounding boxes is not altogether consistent (see comments on a recent PR).

Part of the problem is that bounds in some contexts, especially those derived from features follow the (xmin, ymin, xmax, ymax) convention.

Most other places in rasterio, e.g., the bounds property on a raster, follow rasterio's BoundingBox representation (left, bottom, right, top). This relies on the affine transform of the raster.

Depending on the direction of the y dimension, these two forms may be equivalent (where y is decreasing from upper left). In the case of y increasing from upper left (e.g., Affine.identity() or numpy indexes), these are not equivalent, and issues ensue when we don't account for this.

One approach would be to create a series of normalization functions, that we could use for interchange between functions that expect one form or another. I've started to do so, but I'm now questioning this approach.

def normalize_bounds(bbox):
    """
    Normalizes an instance of a BoundingBox (left, bottom, right, top) to
    bounds (xmin, ymin, xmax, ymax)

    Parameters
    ----------
    bbox: BoundingBox or bounds tuple

    Returns
    -------
    bounds tuple: (xmin, ymin, xmax, ymax)
    """

    return (
        min(bbox[0], bbox[2]),
        min(bbox[1], bbox[3]),
        max(bbox[0], bbox[2]),
        max(bbox[1], bbox[3])
    )


def normalize_boundingbox(bounds, affine):
    """
    Normalizes bounds (xmin, ymin, xmax, ymax) to a
    BoundingBox (left, bottom, right, top).

    Parameters
    ----------
    bounds: bounds or BoundingBox tuple
    affine: affine.Affine object

    Returns
    -------
    BoundingBox tuple: (left, bottom, right, top).
    """

    if isinstance(bounds, BoundingBox):
        return bounds

    x = [min(bounds[0], bounds[2]), max(bounds[0], bounds[2])]
    if affine.a < 0:
        x.reverse()

    y = [min(bounds[1], bounds[3]), max(bounds[1], bounds[3])]
    if affine.e < 0:
        y.reverse()

    return BoundingBox(x[0], y[0], x[1], y[1])

One downside to this approach is that the user needs to know when to use these based on the function signature or docstring of the method they are passing bounds or bounding boxes into.

Overall, this presents mostly in dealing with interactions between features and rasters, so we can deal with more of the conversions internally in those cases.

@sgillies @perrygeo thoughts?

@perrygeo
Copy link
Contributor

@brendan-ward That's an excellent framing of the problem for inverted y-axis rasters! I'll take a deeper look next week but for now, initial thought is that we might be able to handle this is an implementation detail within base.DatasetReader methods (window_bounds, ul, bounds, get_window, etc) so that users wouldn't need to deal with them explicitly.

@perrygeo
Copy link
Contributor

Side note: I'm working on a project right now that uses positive y-axis rasters and I'm documenting some of difficulties in this gist.

@perrygeo perrygeo added this to the 1.0 milestone Mar 7, 2016
@perrygeo
Copy link
Contributor

perrygeo commented Apr 4, 2016

Reviving this idea, and TBH I'm a little confused, maybe due to terminology or maybe due to the current implementation. My confusion stems from

 >>> src.bounds
BoundingBox(left=101985.0, bottom=2611485.0, right=339315.0, top=2826915.0)

So the dataset "bounds" returns a "BoundingBox" in bounds (xmin, ymin, xmax, ymax) order. 😕

Maybe we need more distinct terminology and variable names to match?

Seems like the toughest problem is knowing when to use bounds vs boundingbox - to truly support south-up (postiive affine.e) rasters we'd need to sweep the codebase for all uses of the bounds-like objects, determine if they were intended to use bounds or BoundingBox, and normalize as appropriate. Some of those assumptions might go down into the GDAL C api too.

Another things I'd like to do here is start building a test suite of expected behaviors with positive affine.e raster.

@brendan-ward what are your thoughts on getting this in 1.0. It seems like a big lift but it might be an API-changing deal so it might be better to tackle sooner than later. Maybe there are smaller steps we can take towards this for 1.0, leaving the bulk of the work to post-1.0?

@brendan-ward
Copy link
Contributor Author

@perrygeo I think this is a good idea to cleanup pre 1.0, but not sure how to go about it. I've been confused and frustrated by the differences between them in the past and would like to save users from the same.

I think that bounds in the standard xmin, ymin, xmax, ymax are easy to deal with in many places; they align nicely with the general concept of a bounding box. It is the translation of those to the terms left, bottom, right, top that requires the affine or at least knowing the sign of y. I'm not sure why the latter was used for src.bounds - perhaps it was an oversight due to them matching for y-negative rasters?

Are you thinking of going as deep as changing the return from src.bounds?

@perrygeo
Copy link
Contributor

perrygeo commented Apr 5, 2016

I'm with you, I've definitely struggled with inverted rasters and continue to do so every time I encounter them. I don't think we should do anything as drastic as change src.bounds behavior but...

Big picture, my goal would be to just use any raster regardless of the sign of affine.e without having to make special considerations.

I think the reality is that an assumption of affine.e < 0 is deeply embedded in much of our software stack. Origin is the upper-left; that's almost burned into my mental model of how rasters work. To fully cleanse the code of that assumption would be a big effort and I personally haven't been up to the task.

Maybe shorter term, we just need to document what does and does not currently work with inverted y rasters as a first step towards clarifying our thoughts.

@sgillies
Copy link
Member

sgillies commented Apr 5, 2016

I'm confused by the discussion and unsure of what the problem and solutions are.

If you're examining a dataset that has a "north-oriented" CRS, the Y coordinate in that space decreases from the upper left while the numpy array index increases. These two coordinate systems are flipped with respect to each other and that's just how it is. Better documentation and abstractions are good, but the difference between the coordinate systems isn't a problem to solve.

Is there a problem with "south-oriented" datasets? I never see these in practice, but let's get one of these into our test fixtures, see what it breaks, and work out how to deal with it.

@perrygeo
Copy link
Contributor

perrygeo commented Apr 5, 2016

It's not a matter of the CRS, rather the interpretation of the transform when the affine.e is > 0. They're not common but I've seen them mainly is weather/climate data distributed at NetCDFs

When we grab the bounds, they are in WNES order.

$ rio info --bounds inverted-pos-y.tif
-180.0 90.0 180.0 -90.0

Lots of code assumes that bounds are WSEN. For example, with S/N flipped...

$ rio merge inverted-pos-y.tif /tmp/foo2.tif
Traceback (most recent call last):
  ...
  File "/Users/mperry/env/smos/lib/python2.7/site-packages/rasterio/tools/merge.py", line 96, in merge
    dest = np.zeros((first.count, output_height, output_width), dtype=dtype)
ValueError: negative dimensions are not allowed

That's just one example, the assumption that the row 0, column 0 is the north-west corner pops up all over the place. Having a handle on where things fail is the first step - I'll add that test raster and start building a test suite for it.

@sgillies
Copy link
Member

sgillies commented Apr 5, 2016

For reference: http://www.gdal.org/frmt_netcdf.html.

There is no universal way of storing georeferencing in NetCDF files.

Wonderful.

The driver first tries to follow the CF-1 Convention from UNIDATA looking for the Metadata named "grid_mapping". If "grid_mapping" is not present, the driver will try to find an lat/lon grid array to set geotransform array. The NetCDF driver verifies that the Lat/Lon array is equally space.

If those 2 methods fail, NetCDF driver will try to read the following metadata directly and set up georeferencing.

  • spatial_ref (Well Known Text)
  • GeoTransform (GeoTransform array)

or,

  • Northernmost_Northing
  • Southernmost_Northing
  • Easternmost_Easting
  • Westernmost_Easting

@perrygeo I'm curious about what's going on in these NetCDF files with the flipped transforms. Is the georeferencing underspecified? Can this config option (bottom of the driver page) help?

GDAL_NETCDF_BOTTOMUP=[YES/NO] : Set the y-axis order for import, overriding the order detected by the driver. This option is usually not needed unless a specific dataset is causing problems (which should be reported in GDAL trac).

(Hashtag GDAL config option)

@perrygeo
Copy link
Contributor

perrygeo commented Apr 5, 2016

Yeah, NETCDFs have their own issues (I forgot to mention https://github.com/perrygeo/ncvrt which I created to deal with them). Let's put that format aside for now.

The issue I'm trying to isolate here is problems that arise from having a positive affine.e value. Technically a 100 x 100 geotiff with a

Affine(5.0, 0.0, 1000,
       0.0, -5.0, 1000.0)

If you read it as an array, flipped the bands (np.flipud) and wrote them to a GeoTIFF with an affine of

Affine(5.0, 0.0, 1000,
       0.0, 5.0, 500.0)

.
They should be valid and spatially aligned. But many analyses with the flipped geotiff will fail simply because affine.e > 0 which violates implicit assumptions of a top-left origin in much bounds-handling code. I just need to spent a little time to identify where those assumptions are.

@perrygeo
Copy link
Contributor

perrygeo commented Apr 6, 2016

I wrote a flip.py script to test this out.

python flip.py .tests/data/RGB.byte.tif RGB.flip.tif

No unit tests yet but just a high-level overview of what rio commands work with an inverted raster

Command Result Notes
rio bounds both CW, WSEN order
rio calc
rio clip X error about invalid bounds
rio convert
rio info --bounds X WNES order
rio info --res X Y res is negative if affine.e > 0
rio mask This works because the case is explicitly handled
rio merge X ValueError: negative dimensions are not allowed
rio overview
rio rasterize X "GeoJSON outside bounds of --like raster"
rio sample
rio shapes --as-mask
rio stack X flipped when stacked with mixed affine.e, assumed origin UL
rio warp *output changed to affine.e < 0 (but correct nonetheless)

I should note that QGIS does a great job of rendering inverted Y rasters properly.

@perrygeo
Copy link
Contributor

perrygeo commented Apr 6, 2016

And on the DatasetReader, the only problematic methods seem to be res and bounds for the reasons listed above (see rio info).

Changing these to be correct for inverted rasters may have implications and break other things. For instance, mask handles the inverted case explicitly and relies on the current behavior of bounds.

@brendan-ward
Copy link
Contributor Author

@perrygeo presumably if this was standardized upstream, then we could remove the specific handling in rio mask.

Thank you for the thorough testing of rio commands here!

@perrygeo
Copy link
Contributor

I'm gonna punt on this for 1.0 - however we decide to deal with positive affine.e rasters it will not break the API so it can be added post-1.0. For now, this ticket can serve to document the known bugs with inverted y datasets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants