Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treatment of AREA_OR_POINT #805

Closed
DahnJ opened this issue Oct 2, 2024 · 8 comments · Fixed by #811
Closed

Treatment of AREA_OR_POINT #805

DahnJ opened this issue Oct 2, 2024 · 8 comments · Fixed by #811
Labels
documentation Documentation related issue question Further information is requested

Comments

@DahnJ
Copy link
Contributor

DahnJ commented Oct 2, 2024

Code Sample & Problem description

I have two sample file that only differ on their AREA_OR_POINT specification (sample_tifs.zip)

❯ diff <(gdalinfo pixel-is-area-sub.tif) <(gdalinfo pixel-is-point-sub.tif)
2c2
< Files: pixel-is-area-sub.tif
---
> Files: pixel-is-point-sub.tif
36c36
<   AREA_OR_POINT=Area
---
>   AREA_OR_POINT=Point

when I read them, I would expect the coordinates of one of those to be shifted, as mentioned in the docstring for open_rasterio (link):

The x and y coordinates are generated automatically from the file's geoinformation, shifted to the center of each pixel

However, I don't observe this behaviour

>>> import rioxarray
>>> pt = rioxarray.open_rasterio('pixel-is-point-sub.tif')
>>> ar = rioxarray.open_rasterio('pixel-is-area-sub.tif')
>>> all(pt['x'] == ar['x'])
True
>>> all(pt['y'] == ar['y'])
True
>>> pt.rio.transform() == ar.rio.transform()
True
>>> pt['spatial_ref'].attrs == ar['spatial_ref'].attrs
True

How were the files created: I have taken a sample of SRTM data and have altered the metadata property using gdal_edit

❯ cp pixel-is-point-sub.tif pixel-is-area-sub.tif
❯ gdal_edit -mo "AREA_OR_POINT=Area" pixel-is-area-sub.tif

Am I doing something wrong here, or have I misunderstood the AREA_OR_POINT property?

Expected Output

I expected the coordinates to be shifted for one of the options and thus differ between them.

Perhaps that expectation is wrong and the coordinates are simply always shited to the same location with rioxarray. But then it would suggest to me that for PixelIsPoint (Raster Space docs), the coordinates, as read by rioxarray, are actually half a pixel away from where the point was sampled, which would be surprising to me.

I may have also potentially created the files in the wrong way.

Environment Information

python -c "import rioxarray; rioxarray.show_versions()"
rioxarray (0.17.0) deps:
  rasterio: 1.3.11
    xarray: 2024.9.0
      GDAL: 3.9.2
      GEOS: 3.13.0
      PROJ: 9.5.0
 PROJ DATA: /Users/mysusr/miniconda3/envs/myenv/share/proj
 GDAL DATA: /Users/myusr/miniconda3/envs/myenv/share/gdal

Other python deps:
     scipy: 1.14.1
    pyproj: 3.6.1

System:
    python: 3.12.6 | packaged by conda-forge | (main, Sep 30 2024, 17:55:20) [Clang 17.0.6 ]
executable: /Users/myusr/miniconda3/envs/myenv/bin/python
   machine: macOS-14.5-arm64-arm-64bit

Installation method

Conda

Conda environment information (if you installed with conda):


Environment (conda list):
$ conda list | grep -E "rasterio|xarray|gdal"
gdal                      3.9.2           py312h936c49d_4    conda-forge
libgdal                   3.9.2                hce30654_4    conda-forge
libgdal-core              3.9.2                h3535123_5    conda-forge
libgdal-fits              3.9.2                h248c7bc_4    conda-forge
libgdal-grib              3.9.2                h6d3d72d_4    conda-forge
libgdal-hdf4              3.9.2                h3847bb8_4    conda-forge
libgdal-hdf5              3.9.2                h2def128_4    conda-forge
libgdal-jp2openjpeg       3.9.2                hd61e619_4    conda-forge
libgdal-kea               3.9.2                h7b2de0b_4    conda-forge
libgdal-netcdf            3.9.2                h5e0d008_4    conda-forge
libgdal-pdf               3.9.2                h587d690_4    conda-forge
libgdal-pg                3.9.2                h147afc8_4    conda-forge
libgdal-postgisraster     3.9.2                h147afc8_4    conda-forge
libgdal-tiledb            3.9.2                h27a95ea_4    conda-forge
libgdal-xls               3.9.2                habc1c91_4    conda-forge
rasterio                  1.3.11          py312h6ce65b9_2    conda-forge
rioxarray                 0.17.0             pyhd8ed1ab_0    conda-forge
xarray                    2024.9.0           pyhd8ed1ab_0    conda-forge
xarray-spatial            0.4.0              pyhd8ed1ab_0    conda-forge

keywords for easy search in the future: Raster Space, GTRasterTypeGeoKey, PIXEL_OR_AREA, PixelIsArea, PixelIsPoint, pixelCenter

@DahnJ DahnJ added the bug Something isn't working label Oct 2, 2024
@snowman2
Copy link
Member

snowman2 commented Oct 2, 2024

https://gdal.org/en/latest/user/raster_data_model.html

AREA_OR_POINT: May be either "Area" (the default) or "Point". Indicates whether a pixel value should be assumed to represent a sampling over the region of the pixel or a point sample at the center of the pixel. This is not intended to influence interpretation of georeferencing which remains area oriented.

@DahnJ
Copy link
Contributor Author

DahnJ commented Oct 2, 2024

Thanks for the quick answer @snowman2 !

I interpret your answer as saying: coordinates should stay the same in the in-memory xarray dataset both cases. The only difference is how they are then intepreted. It's up to the user to make the right choice. Am I interpreting it correctly?

To me, this makes the docstring for open_rasterio quite confusing, specifically:

The x and y coordinates are generated automatically from the file's geoinformation, shifted to the center of each pixel

I read that as: in-memory representation's coordinates always represent the center of the pixel. That seems to me in direct contradiction with the above, where in the PixelIsArea case, the coordinates (without any shifting) would refer to the top-left corner of the pixel.


More relevant links for future visitors:

@Kirill888
Copy link
Contributor

Kirill888 commented Oct 2, 2024

@DahnJ

The x and y coordinates are generated automatically from the file's geoinformation, shifted to the center of each pixel

That refers to coordinate values stored in the x,y coordinate arrays. Here "pixel coordinate" refers to the physical location in the world of the center of the pixel. Image coordinate of 0.0,0.0 always refers to the top left corner of the top left pixel, and Affine matrix maps from pixel image coordinates to world coordinates. Point a.x[i], a.y[j] is always world location of the pixel center for pixel a[j, i], regardless of the underlying format of the input file.

But I agree "shifted" is probably not the best word to use to describe it.

@DahnJ
Copy link
Contributor Author

DahnJ commented Oct 3, 2024

Thank you @Kirill888 for the answer.

We've seen that, for both Point and Area

  • Information surfaced by gdalinfo does not change
  • Coordinates & transforms in rioxarray do not change

You also mention

  • Image coordinate of 0.0,0.0 always refers to the top left corner of the top left pixel
  • Point a.x[i], a.y[j] is always world location of the pixel center for pixel a[j, i]

Then what does change? I'm failing to see any difference between files with Area and Point.

@Kirill888
Copy link
Contributor

The difference is in what transform is stored in the geotiff tags inside the file. By the time rasterio library sees the transform it has been normalised by gdal to be equivalent to “area mode”. It’s really just a quirk of geotiff spec.

@DahnJ
Copy link
Contributor Author

DahnJ commented Oct 4, 2024

I see. And it's something that even gdalinfo doesn't surface, since it already transforms the geotransform during reading the file to produce the info?

If that's the case, then I think that the line

The x and y coordinates are generated automatically from the file's geoinformation, shifted to the center of each pixel (see"PixelIsArea" Raster Space <http://web.archive.org/web/20160326194152/http://remotesensing.org/geotiff/spec/geotiff2.5.html#2.5.2>_ for more information). (source)

is likely to lead people astray rather then help. It has done that for me and a handful of other people I've asked and the area/point distinction seems to even have managed to confuse the author of the GDAL code. If this transformation is truly an implementation detail that GDAL takes care of then I'd say rioxarray does not need to refer to it.

I can see value in saying something like

The x and y coordinates are generated from the file's geoinformation and refer to the center of the pixel.

This clarifies the meaning of the coordinates without sending them into a rabbit hole that ultimately doesn't help them.

What do you think @snowman2? Happy to open a PR.

@snowman2
Copy link
Member

snowman2 commented Oct 4, 2024

Yes, a PR with clarification is welcome.

@snowman2 snowman2 added question Further information is requested documentation Documentation related issue windows and removed bug Something isn't working windows labels Oct 4, 2024
DahnJ added a commit to DahnJ/rioxarray that referenced this issue Oct 4, 2024
@DahnJ
Copy link
Contributor Author

DahnJ commented Oct 4, 2024

@snowman2 great, thanks. Made the change here #811

snowman2 pushed a commit that referenced this issue Oct 5, 2024
* docs(_io.py): clearer docstring

see #805
for discussion

* docs(_io.py): add back "automatically"

* style(_io.py): trim whitespace
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Documentation related issue question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants