Standardize or normalize use of bounds and bounding boxes #525

brendan-ward · 2015-11-12T16:22:27Z

Our use of bounds and bounding boxes is not altogether consistent (see comments on a recent PR).

Part of the problem is that bounds in some contexts, especially those derived from features follow the (xmin, ymin, xmax, ymax) convention.

Most other places in rasterio, e.g., the bounds property on a raster, follow rasterio's BoundingBox representation (left, bottom, right, top). This relies on the affine transform of the raster.

Depending on the direction of the y dimension, these two forms may be equivalent (where y is decreasing from upper left). In the case of y increasing from upper left (e.g., Affine.identity() or numpy indexes), these are not equivalent, and issues ensue when we don't account for this.

One approach would be to create a series of normalization functions, that we could use for interchange between functions that expect one form or another. I've started to do so, but I'm now questioning this approach.

def normalize_bounds(bbox):
    """
    Normalizes an instance of a BoundingBox (left, bottom, right, top) to
    bounds (xmin, ymin, xmax, ymax)

    Parameters
    ----------
    bbox: BoundingBox or bounds tuple

    Returns
    -------
    bounds tuple: (xmin, ymin, xmax, ymax)
    """

    return (
        min(bbox[0], bbox[2]),
        min(bbox[1], bbox[3]),
        max(bbox[0], bbox[2]),
        max(bbox[1], bbox[3])
    )


def normalize_boundingbox(bounds, affine):
    """
    Normalizes bounds (xmin, ymin, xmax, ymax) to a
    BoundingBox (left, bottom, right, top).

    Parameters
    ----------
    bounds: bounds or BoundingBox tuple
    affine: affine.Affine object

    Returns
    -------
    BoundingBox tuple: (left, bottom, right, top).
    """

    if isinstance(bounds, BoundingBox):
        return bounds

    x = [min(bounds[0], bounds[2]), max(bounds[0], bounds[2])]
    if affine.a < 0:
        x.reverse()

    y = [min(bounds[1], bounds[3]), max(bounds[1], bounds[3])]
    if affine.e < 0:
        y.reverse()

    return BoundingBox(x[0], y[0], x[1], y[1])

One downside to this approach is that the user needs to know when to use these based on the function signature or docstring of the method they are passing bounds or bounding boxes into.

Overall, this presents mostly in dealing with interactions between features and rasters, so we can deal with more of the conversions internally in those cases.

@sgillies @perrygeo thoughts?

The text was updated successfully, but these errors were encountered:

perrygeo · 2015-11-12T16:47:11Z

@brendan-ward That's an excellent framing of the problem for inverted y-axis rasters! I'll take a deeper look next week but for now, initial thought is that we might be able to handle this is an implementation detail within base.DatasetReader methods (window_bounds, ul, bounds, get_window, etc) so that users wouldn't need to deal with them explicitly.

perrygeo · 2015-11-16T10:56:19Z

Side note: I'm working on a project right now that uses positive y-axis rasters and I'm documenting some of difficulties in this gist.

perrygeo · 2016-04-04T23:58:58Z

Reviving this idea, and TBH I'm a little confused, maybe due to terminology or maybe due to the current implementation. My confusion stems from

 >>> src.bounds
BoundingBox(left=101985.0, bottom=2611485.0, right=339315.0, top=2826915.0)

So the dataset "bounds" returns a "BoundingBox" in bounds (xmin, ymin, xmax, ymax) order. 😕

Maybe we need more distinct terminology and variable names to match?

Seems like the toughest problem is knowing when to use bounds vs boundingbox - to truly support south-up (postiive affine.e) rasters we'd need to sweep the codebase for all uses of the bounds-like objects, determine if they were intended to use bounds or BoundingBox, and normalize as appropriate. Some of those assumptions might go down into the GDAL C api too.

Another things I'd like to do here is start building a test suite of expected behaviors with positive affine.e raster.

@brendan-ward what are your thoughts on getting this in 1.0. It seems like a big lift but it might be an API-changing deal so it might be better to tackle sooner than later. Maybe there are smaller steps we can take towards this for 1.0, leaving the bulk of the work to post-1.0?

brendan-ward · 2016-04-05T15:06:55Z

@perrygeo I think this is a good idea to cleanup pre 1.0, but not sure how to go about it. I've been confused and frustrated by the differences between them in the past and would like to save users from the same.

I think that bounds in the standard xmin, ymin, xmax, ymax are easy to deal with in many places; they align nicely with the general concept of a bounding box. It is the translation of those to the terms left, bottom, right, top that requires the affine or at least knowing the sign of y. I'm not sure why the latter was used for src.bounds - perhaps it was an oversight due to them matching for y-negative rasters?

Are you thinking of going as deep as changing the return from src.bounds?

perrygeo · 2016-04-05T15:54:03Z

I'm with you, I've definitely struggled with inverted rasters and continue to do so every time I encounter them. I don't think we should do anything as drastic as change src.bounds behavior but...

Big picture, my goal would be to just use any raster regardless of the sign of affine.e without having to make special considerations.

I think the reality is that an assumption of affine.e < 0 is deeply embedded in much of our software stack. Origin is the upper-left; that's almost burned into my mental model of how rasters work. To fully cleanse the code of that assumption would be a big effort and I personally haven't been up to the task.

Maybe shorter term, we just need to document what does and does not currently work with inverted y rasters as a first step towards clarifying our thoughts.

sgillies · 2016-04-05T17:26:06Z

I'm confused by the discussion and unsure of what the problem and solutions are.

If you're examining a dataset that has a "north-oriented" CRS, the Y coordinate in that space decreases from the upper left while the numpy array index increases. These two coordinate systems are flipped with respect to each other and that's just how it is. Better documentation and abstractions are good, but the difference between the coordinate systems isn't a problem to solve.

Is there a problem with "south-oriented" datasets? I never see these in practice, but let's get one of these into our test fixtures, see what it breaks, and work out how to deal with it.

perrygeo · 2016-04-05T17:57:24Z

It's not a matter of the CRS, rather the interpretation of the transform when the affine.e is > 0. They're not common but I've seen them mainly is weather/climate data distributed at NetCDFs

When we grab the bounds, they are in WNES order.

$ rio info --bounds inverted-pos-y.tif
-180.0 90.0 180.0 -90.0

Lots of code assumes that bounds are WSEN. For example, with S/N flipped...

$ rio merge inverted-pos-y.tif /tmp/foo2.tif
Traceback (most recent call last):
  ...
  File "/Users/mperry/env/smos/lib/python2.7/site-packages/rasterio/tools/merge.py", line 96, in merge
    dest = np.zeros((first.count, output_height, output_width), dtype=dtype)
ValueError: negative dimensions are not allowed

That's just one example, the assumption that the row 0, column 0 is the north-west corner pops up all over the place. Having a handle on where things fail is the first step - I'll add that test raster and start building a test suite for it.

sgillies · 2016-04-05T20:32:10Z

For reference: http://www.gdal.org/frmt_netcdf.html.

There is no universal way of storing georeferencing in NetCDF files.

Wonderful.

The driver first tries to follow the CF-1 Convention from UNIDATA looking for the Metadata named "grid_mapping". If "grid_mapping" is not present, the driver will try to find an lat/lon grid array to set geotransform array. The NetCDF driver verifies that the Lat/Lon array is equally space.

If those 2 methods fail, NetCDF driver will try to read the following metadata directly and set up georeferencing.

spatial_ref (Well Known Text)

GeoTransform (GeoTransform array)

or,

Northernmost_Northing

Southernmost_Northing

Easternmost_Easting

Westernmost_Easting

@perrygeo I'm curious about what's going on in these NetCDF files with the flipped transforms. Is the georeferencing underspecified? Can this config option (bottom of the driver page) help?

GDAL_NETCDF_BOTTOMUP=[YES/NO] : Set the y-axis order for import, overriding the order detected by the driver. This option is usually not needed unless a specific dataset is causing problems (which should be reported in GDAL trac).

(Hashtag GDAL config option)

perrygeo · 2016-04-05T20:58:04Z

Yeah, NETCDFs have their own issues (I forgot to mention https://github.com/perrygeo/ncvrt which I created to deal with them). Let's put that format aside for now.

The issue I'm trying to isolate here is problems that arise from having a positive affine.e value. Technically a 100 x 100 geotiff with a

Affine(5.0, 0.0, 1000,
       0.0, -5.0, 1000.0)

If you read it as an array, flipped the bands (np.flipud) and wrote them to a GeoTIFF with an affine of

Affine(5.0, 0.0, 1000,
       0.0, 5.0, 500.0)

.
They should be valid and spatially aligned. But many analyses with the flipped geotiff will fail simply because affine.e > 0 which violates implicit assumptions of a top-left origin in much bounds-handling code. I just need to spent a little time to identify where those assumptions are.

perrygeo · 2016-04-06T13:21:27Z

I wrote a flip.py script to test this out.

python flip.py .tests/data/RGB.byte.tif RGB.flip.tif

No unit tests yet but just a high-level overview of what rio commands work with an inverted raster

Command	Result	Notes
rio bounds	√	both CW, WSEN order
rio calc	√
rio clip	X	error about invalid bounds
rio convert	√
rio info --bounds	X	WNES order
rio info --res	X	Y res is negative if affine.e > 0
rio mask	√	This works because the case is explicitly handled
rio merge	X	ValueError: negative dimensions are not allowed
rio overview	√
rio rasterize	X	"GeoJSON outside bounds of --like raster"
rio sample	√
rio shapes --as-mask	√
rio stack	X	flipped when stacked with mixed affine.e, assumed origin UL
rio warp	√	*output changed to affine.e < 0 (but correct nonetheless)

I should note that QGIS does a great job of rendering inverted Y rasters properly.

perrygeo · 2016-04-06T13:35:26Z

And on the DatasetReader, the only problematic methods seem to be res and bounds for the reasons listed above (see rio info).

Changing these to be correct for inverted rasters may have implications and break other things. For instance, mask handles the inverted case explicitly and relies on the current behavior of bounds.

brendan-ward · 2016-04-06T15:47:16Z

@perrygeo presumably if this was standardized upstream, then we could remove the specific handling in rio mask.

Thank you for the thorough testing of rio commands here!

perrygeo · 2016-06-13T16:00:01Z

I'm gonna punt on this for 1.0 - however we decide to deal with positive affine.e rasters it will not break the API so it can be added post-1.0. For now, this ticket can serve to document the known bugs with inverted y datasets.

brendan-ward added the enhancement label Nov 12, 2015

perrygeo added this to the 1.0 milestone Mar 7, 2016

This was referenced Apr 18, 2016

ENH: Optional Read-Only RasterIO backend pydata/xarray#790

Closed

Problem with reproject and identity affine #674

Closed

perrygeo modified the milestones: post 1.0, 1.0 Jun 13, 2016

jdmcbr mentioned this issue Aug 15, 2016

add a rasterio.clip function? #848

Open

ogail mentioned this issue Jun 14, 2017

Train MS COCO error zhreshold/mxnet-ssd#90

Closed

perrygeo mentioned this issue Jun 28, 2017

Float precision read/write windows #1074

Merged

j08lue mentioned this issue Dec 7, 2017

WIP:use non-square test image #1226

Closed

dorukozturk mentioned this issue Jan 11, 2018

Raster bbox OpenDataAnalytics/gaia#112

Merged

groutr mentioned this issue May 3, 2021

disjoint_bounds incorrect behavior #2165

Open

snowman2 modified the milestones: post 1.0, post-1.3 Nov 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Standardize or normalize use of bounds and bounding boxes #525

Standardize or normalize use of bounds and bounding boxes #525

brendan-ward commented Nov 12, 2015

perrygeo commented Nov 12, 2015

perrygeo commented Nov 16, 2015

perrygeo commented Apr 4, 2016

brendan-ward commented Apr 5, 2016

perrygeo commented Apr 5, 2016

sgillies commented Apr 5, 2016

perrygeo commented Apr 5, 2016

sgillies commented Apr 5, 2016

perrygeo commented Apr 5, 2016

perrygeo commented Apr 6, 2016

perrygeo commented Apr 6, 2016

brendan-ward commented Apr 6, 2016

perrygeo commented Jun 13, 2016

Standardize or normalize use of bounds and bounding boxes #525

Standardize or normalize use of bounds and bounding boxes #525

Comments

brendan-ward commented Nov 12, 2015

perrygeo commented Nov 12, 2015

perrygeo commented Nov 16, 2015

perrygeo commented Apr 4, 2016

brendan-ward commented Apr 5, 2016

perrygeo commented Apr 5, 2016

sgillies commented Apr 5, 2016

perrygeo commented Apr 5, 2016

sgillies commented Apr 5, 2016

perrygeo commented Apr 5, 2016

perrygeo commented Apr 6, 2016

perrygeo commented Apr 6, 2016

brendan-ward commented Apr 6, 2016

perrygeo commented Jun 13, 2016