Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Projection Extension #485

Merged
merged 56 commits into from
Dec 20, 2019
Merged

Projection Extension #485

merged 56 commits into from
Dec 20, 2019

Conversation

philvarner
Copy link
Collaborator

@philvarner philvarner commented Jun 8, 2019

PR Checklist:

  • This PR has no breaking changes.
  • I have added my changes to the CHANGELOG or a CHANGELOG entry is not required.
  • API only: I have run npm run generate-all. to update the generated OpenAPI files.

@philvarner philvarner changed the title Projection WIP: Projection Extension Jun 8, 2019
@m-mohr
Copy link
Collaborator

m-mohr commented Jun 8, 2019

Some unsorted thoughts:

Having it in a separate extension is better, thanks.

I'm not sure whether I'd remove the epsg code from EO.

We decided some versions ago to keep epsg and drop WKT/PROJ as STAC is focused on search. You can search for EPSG codes quite well, but nobody would ever search for a PROJ/WKT string, we argued. Is this still what we think/want?

I think this extension should have a required field so that validation works.

Should we also allow WKT?

A GeoJSON geometry that is not in EPSG:4326 is not GeoJSON, therefore we shouldn't name it GeoJSON...

What is the centroid useful for?

@philvarner
Copy link
Collaborator Author

I'm not sure whether I'd remove the epsg code from EO.

I will re-add it.

@philvarner
Copy link
Collaborator Author

I think this extension should have a required field so that validation works.

I will add one

@philvarner
Copy link
Collaborator Author

GeoJSON geometry that is not in EPSG:4326 is not GeoJSON, therefore we shouldn't name it GeoJSON...

Good point, will update words.

@philvarner
Copy link
Collaborator Author

What is the centroid useful for?

There are a lot of tools, for example Kibana Maps, which can generate heatmaps based on point data. This would allow tools to do this without needing to do the (relatively expensive) calculation at render-time based on the geometry or bbox.

@philvarner
Copy link
Collaborator Author

We decided some versions ago to keep epsg and drop WKT/PROJ as STAC is focused on search. You > can search for EPSG codes quite well, but nobody would ever search for a PROJ/WKT string, we
argued. Is this still what we think/want?
Should we also allow WKT?

As a general point, I'd like STAC to add more features around usage of data, beyond the existing exploration/query focus. Ideally, we'd like to do as much of our compute-time spatial filtering, etc, based on the metadata in the STAC entities without needing to retrieve any of the asset bits until we actually need the pixels. For example, the COG headers to do the offset retrieval without needing to fetch the header for each request.

This is one of those options. We're using PROJ4 internally instead of just EPSG code, though I need to ask my coworkers why we chose to do that. I also wanted to be as general as possible, in case there are projections that don't have EPSG codes (even though I can't think of any and don't think I'll have any use of that.

This reverts commit 6ba4911.
@m-mohr
Copy link
Collaborator

m-mohr commented Jun 9, 2019

I will re-add it.

It was just my opinion. We should discuss it in a broader group, at least with @matthewhanson. Best on the next bi-weekly call.

I will add one

It will also require a JSON Schema, of course ;-)

There are a lot of tools, for example Kibana Maps, which can generate heatmaps based on point data. This would allow tools to do this without needing to do the (relatively expensive) calculation at render-time based on the geometry or bbox.

Oh, I see. At one hand I'd like to keep the spec and fields always relatively small, on the other hand I see that this can be useful. Hmm...

As a general point, I'd like STAC to add more features around usage of data, beyond the existing exploration/query focus. Ideally, we'd like to do as much of our compute-time spatial filtering, etc, based on the metadata in the STAC entities without needing to retrieve any of the asset bits until we actually need the pixels. For example, the COG headers to do the offset retrieval without needing to fetch the header for each request.

Very similar for us. We also add processing information (e.g. offset, scale, no data values) for processing as we need it within openEO. As I said above, it's hard to keep things simple and useable at the same time. We don't want to be the next ISO 19115, right? ;-)

This is one of those options. We're using PROJ4 internally instead of just EPSG code, though I need to ask my coworkers why we chose to do that. I also wanted to be as general as possible, in case there are projections that don't have EPSG codes (even though I can't think of any and don't think I'll have any use of that.

There are indeed some projections that have no EPSG code. I think there were some in the context of MODIS data. I also remember this discussion when we removed PROJ (and WKT) from the EO extension.
Should we rename proj:crs to proj:proj (well, interesting combination though) so that we could also add proj:wkt at some point? Or whould we allow both in proj:crs? I'm always unsure which to allow in addition to EPSG. In openEO we started with proj, added WKT, but overall it feels like a mess.

@geospatial-jeff
Copy link

geospatial-jeff commented Jun 9, 2019

I'm always unsure which to allow in addition to EPSG. In openEO we started with proj, added WKT, but overall it feels like a mess.

From my experience, WKT is the messiest with the least standardization between vendors (see here). As mentioned above, EPSG is the most searchable and is more standardized than WKT but still has some issues. My biggest issue with EPSG is its inability to describe custom coordinate systems, only those which reside in the EPSG database. PROJ4 is the best for explicitly describing a spatial reference and provides a very flexible representation but is harder to search.

Should we rename proj:crs to proj:proj (well, interesting combination though) so that we could also add proj:wkt at some point?

I think a proj:type field which indicates the type of spatial reference expressed in proj:crs would be helpful. These two fields could then be used to define the spatial reference of an item in cases where there is no valid EPSG value (where proj:epsg is null) while giving the user flexibility in their choice of representation.

You can search for EPSG codes quite well, but nobody would ever search for a PROJ/WKT string, we argued

This got me thinking about searches on PROJ strings. A simple implementation could decompose the PROJ string into a JSON of its parts, potentially exposing more advanced searches on the item's spatial reference than possible with EPSG. Consider the PROJ string for WGS 84 UTM Zone 11 (EPSG:32611):

+proj=utm +zone=11 +ellps=WGS84 +datum=WGS84 +units=m +no_defs

{
  "proj:type": "proj4",
  "proj:crs": {
    "proj": "utm",
    "zone": "11",
    "ellps": "WGS84",
    "datum": "WGS84",
    "units": "m"
  }
}

This would allow searching across individual components of the spatial reference (such as datum) which is not always possible when searching with EPSG.

@matthewhanson
Copy link
Collaborator

As we talked about in the sprint, my overall concern here is bloating the individual Items. I get that these records are small size wise, relative to assets, but if we end up adding lots of extensions for pretty specific purposes, the size of a STAC Item can get very large and contain a lot of data many users will not be concerned about.

For instance, as the creator of the Landsat and sentinel catalogs I would not be inclined to include the information for this extension.

However, one alternative, which we talked briefly about, is to think about adding details usage information in standardizes assets. That is, we could have extensions that don't add this information to the STAC Item, but rather then define a specific asset that the Item should contain. For instance, a projection asset that is an JSON asset with a specific structure.

I could also see this approach being used for other more complex info that is for a specific use, such as the visualization extension (#470). This way, if you want to use that info after getting the STAC Item, you just need to fetch that JSON asset, but it doesn't force everyone to fetch it.

@philvarner
Copy link
Collaborator Author

I appreciate the concerns with bloating the items, but I feel that this is fundamental spatial information about the items and assets, rather than a "pretty specific purpose", even to the point that this should be part of core rather than an extension.

By not including the type of information in the Landsat and Sentinel-2 catalogs, it limits the utility of it for a significant class of users. In my experience, we found almost all the data we needed in the catalogs, but not quite, so we had to pull it from the source data or metadata files anyway, which was pretty costly in terms of development and run-time.

@m-mohr
Copy link
Collaborator

m-mohr commented Jun 11, 2019

The PR as it is right now would require all users to also provide a PROJ string in addition to the EPSG code. Why should I be required to add both if it could simply be a EPSG code for most(?) use cases? I think I would only require proj:epsg...

From my experience, WKT is the messiest with the least standardization between vendors (see here).

Is that solved with WKT2?

@geospatial-jeff
Copy link

Is that solved with WKT2?

Yes WKT2 fixes a lot of the standardization issues within the spec itself but there are still lots of vendors who don't use WKT2. I've always preferred PROJ over WKT but since GDAL recently added support for WKT2 in 3.0 and already supports both OGC and ESRI style WKT strings, it could be a nice alternative.

I think I would only require proj:epsg...

Agreed. I see this extension primarily as a way of defining the spatial reference of the item in cases where an EPSG code is not satisfactory, so it makes sense to keep the other fields optional.

@philvarner
Copy link
Collaborator Author

Updated so only proj:epsg is required. Also made explicitly that these should be in Properties rather than in the Item root. Since Assets are not defined to have Properties (maybe they should?), I removed that as a target type for the extension attributes.

However, I see that we have at least two extensions (sar and checksum) that add properties directly into Asset -- so maybe we should have the convention that extension attributes are only added into Properties, and change those ones?

@m-mohr m-mohr requested a review from matthewhanson June 11, 2019 17:05
@m-mohr
Copy link
Collaborator

m-mohr commented Jun 11, 2019

Also made explicitly that these should be in Properties rather than in the Item root.

All other extension only just say "Item fields" and not "Item Properties fields". So either it's clear or we should do it for all extensions.

Since Assets are not defined to have Properties (maybe they should?)

Feels like too much nesting. We basically only use properties because GeoJSON wants us to. I would also be fine to have everything top level ;-)

However, I see that we have at least two extensions (sar and checksum) that add properties directly into Asset

EO has that, too.

so maybe we should have the convention that extension attributes are only added into Properties, and change those ones?

Not sure, see above. For example, we don't have properties in collections. I think extensions could add anywhere, but properties are required for Items due to the GeoJSON spec.

@philvarner
Copy link
Collaborator Author

One example of common projection without an EPSG code is Sinusoidal, used by MODIS.

Copy link
Collaborator

@m-mohr m-mohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added three comments with additional minor suggestions.

@philvarner
Copy link
Collaborator Author

@m-mohr just addressed all of those open comments you had.

Copy link
Collaborator

@m-mohr m-mohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. I found two other issues:

  1. The example is not a valid STAC Item, see the corresponding comment.
  2. All Items (not Collections!) in which you replaced eo:epsg with proj:epsg need to add the proj extension to the stac_extensions field.

@m-mohr
Copy link
Collaborator

m-mohr commented Dec 18, 2019

Okay, looks good now as far as I can see. But it's hard to see with so many commits and comments ;-)

@philvarner philvarner requested review from matthewhanson and removed request for matthewhanson December 19, 2019 13:51
@matthewhanson matthewhanson merged commit 09f5239 into radiantearth:dev Dec 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new extension prio: must-have required for release associated with
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants