Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crs$wkt = NA possible? #1359

Closed
mdsumner opened this issue Apr 11, 2020 · 16 comments
Closed

crs$wkt = NA possible? #1359

mdsumner opened this issue Apr 11, 2020 · 16 comments

Comments

@mdsumner
Copy link
Member

mdsumner commented Apr 11, 2020

I see obstructive error with

library(sf)
crs <- st_crs("+proj=longlat +datum=WGS84")
## assume we don't have PROJ/GDAL library available, we're preserving "input"
crs$wkt <- NA_character_ 
st_sfc(st_point(cbind(1, 1)), crs = crs)
Geometry set for 1 feature 
geometry type:  POINT
dimension:      XY
bbox:           xmin: 1 ymin: 1 xmax: 1 ymax: 1
Error in CPL_crs_parameters(x) : crs not found

and even if constructing manually, the error is pervasive:

example(read_sf)
g <- st_geometry(nc)
## again, purely simulating construction/preservation from non-PROJ/GDAL environment
attr(g, "crs")$wkt <- NA_character_ 

g
Geometry set for 100 features 
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Error in CPL_crs_parameters(x) : crs not found

plot(g)
Error in CPL_crs_parameters(x) : crs not found

This in CRAN 0.9-1, and also in trunk. Will you consider a PR for allowing NA wkt string and non-NA input?

@mdsumner
Copy link
Member Author

Side note, now in hindsight I wish that $input was a list with $epsg, $proj, $misc (the last for "unknown" or similar) so that we could more readily record what we had.

@mdsumner
Copy link
Member Author

Oh, I see that crs$proj and crs$epsg is still fine, for print and plot.

If it's intended for that to be supported then nothing to do from my perspective (there's a few things that ought work under that regime, but I'll follow up pending).

@edzer
Copy link
Member

edzer commented Apr 11, 2020

When using sf, why would you

assume we don't have PROJ/GDAL library available

?

@mdsumner
Copy link
Member Author

I'm not using sf, I'm simply creating its format. (I'm developing software)

@edzer
Copy link
Member

edzer commented Apr 11, 2020

So why is that an sf issue then?

@mdsumner
Copy link
Member Author

mdsumner commented Apr 11, 2020

It may not be.

Will you consider a PR for allowing NA wkt string and non-NA input?
Oh, I see that crs$proj and crs$epsg is still fine, for print and plot.

If it's intended for that to be supported then nothing to do from my perspective (there's a few things that ought work under that regime, but I'll follow up pending).

Is it intended that old-style crs-class with $proj and $epsg elements will be supported as they are now?

@edzer
Copy link
Member

edzer commented Apr 11, 2020

Will you consider a PR for allowing NA wkt string and non-NA input?

No: see #1146 and #1225.

Oh, I see that crs$proj and crs$epsg is still fine, for print and plot.
If it's intended for that to be supported then nothing to do from my perspective (there's a few things that ought work under that regime, but I'll follow up pending).

For now, that works, but everything goes through wkt -> GDAL -> exportToProj4. proj4strings are, and will remain one possible way of constructing crs objects with st_crs(), but I am planning to deprecate crs$proj4string queries, and will soon start with printing a warning with it since the conversion wkt -> proj4string is lossy, and may introduce hard assumptions about axis order. @rsbivand pointed me to this thread, where we were adviced to stop using proj4strings. That was 2012; it is now 2020, and we're following that advice.

I can't see why anyone developing software now and using PROJ would want to stick to using proj4strings for representing spatial reference systems.

@mdsumner
Copy link
Member Author

I don't want to represent these things with proj4strings. We always knew they were problematic, incomplete - like all metadata for ever. I just asked whether you would continue to be deliberately obstructive, and I have my answer.

@edzer
Copy link
Member

edzer commented Apr 11, 2020

Breaking changes are never nice, this one was announced loudly months ago, and we knew it would come one day since 2012.

I know we have different opinions about what R packages should look like, and I heard you rambling on twitter about sf being an uber GIS. Nevertheless I'd be interested to hear what kind of progress I'm deliberately obstructing, here, and whether you see forward-looking alternatives.

@mdsumner
Copy link
Member Author

Well it's really too late for sf, given how monolithic it is and impossible to tease apart, but - not being hard-error-strict about a basic piece of metadata that has always been incomplete and always will be would be good. I just don't think you can understand though, I've been talking at this brick wall for so long and you just don't do the kind of work I do (or something). We'll get around it, sf is just a format - and there's no api for creating it (actually there is - sfheaders the best thing in spatial for years, but this will choke it) - the delicious irony is the A in GDAL - you force us to use a massive software library that's irrelevant most of the time, when it's key power is abstraction from format. We use this massively successful format for convenience and you shove PROJ and GDAL in the door when it's completely unnecessary and we could just preserve the string that came with the data. Let the user deal with it, hard errors don't allow information to flow you just get hard workarounds and compounding information destruction.

I'll make crs$proj and crs$epsg till my arms fall off, and I'll leave notes so a user can proceed when the error occurs. I won't make crs$wkt or crs$input because it will make sf fail right now. I could create $input which would at least preserve the information at hand, but ... hard-fail.

You won't ever understand progress that is not of your making, that is clear.

@rsbivand
Copy link
Member

rsbivand commented Apr 11, 2020 via email

@mdsumner
Copy link
Member Author

Ok I'll try again, there's a lot of baggage of mine up there ... (thanks Roger, it's not about Antarctic so much just the extreme variety of projection languages, and why not allow transmission before validation??).

I can do my job very well by creating sf objects without PROJ and GDAL. It's actually important that I can do that, and sf has established itself as a popular format so there's quite a lot of good influence that comes from workable sf objects. The details of why that's necessary are completely irrelevant and literally just boring stuff about my job, but I happen to know many other contexts where it's also true for many other people. (I'm aware of the massive crowd of sf-fans that don't happen to ask for this particular flexibility, but I hope their influence is not great).

No one (to a first approximation) speaks WKT. A handful can construct a proj string. No one barring a few freaks is going to verbatim specify a WKT2 string.

The PROJ library will accept a wide variety of inputs (version dependent yes, i.e. no longer"+init=epsg...") but you can shove "4326" or "WGS84" or "+proj=laea +lat_0=-90 +datum=WGS84" or "EPSG:4326" into proj_create and it will happily go forth. Not everything is read by a GDAL driver. Lots of formats have partial proj, scraps of params, and within limited contexts (i.e. my job) these fragments are sufficient, for that holy day when PROJ::proj_create() is leveraged out to rescue some embarrassing partial crs spec and elevate it into the holy land of WKT2.

Please allow sf to allow structure(list(input = <whatever I have>, wkt = NA_character_), class = "crs") to not be an error. If we have sf we can put proj_create(crs$input) via st_crs() and be fully sanctioned, but why not allow an input that would otherwise be valid and accepted by PROJ to be ok? I'm reluctant to say that it looks like a power-move to enforce sf-constructors to actually use sf, because I don't think it is intended that way. The format has currency, and there's value in using the format from existing contexts.

I know this is late, it always is -but there was no announcement about a format change, and it's taken these months to see this tiny obstacle for what it is.

@Robinlovelace
Copy link
Contributor

Robinlovelace commented Apr 11, 2020

This is over my head but I have one practical question in relation to sfheaders mentioned above: is it related to this? dcooley/sfheaders#51

Concrete examples can help and, at the risk of going off-topic, I think I may have one. I recently added these lines of code to allow objects created by sfheaders, which enables 100k+ OD pairs to be converted into linestrings in seconds, a revelation when using national OD datasets, to be given a CRS attribute if sf is available itsleeds/od@1daa692 :

    if(requireNamespace("sf")) {
      sf::st_crs(od_sfc) = sf::st_crs(z)
    }

Seems like an OK work-around to me, leveraging PROJ if it's available but using an NA CRS if not. Wonder, are there better ways of doing this?

If there are already, or in the pipeline, from perspectives of experienced developers and in terms of performance, maintainability of code and modularity/flexibility and other considerations, I'm all ears.

A huge + of open source is that it offers choice and if the various components fit well together that's great. I see no evidence of deliberate obstruction and generally find that sf plugs relatively well into other open source things like dplyr, data.table (realisation: rbindlist() is around 1000 times faster than rbind() for sf objects, I reduced a 3 hour job to combine 400k routes into 7 seconds and could barely believe it!). If sf can be made more modular without negatives I'm all in favour.

Without wanting to stick my oar in, knowledge of the negatives of "allowing NA wkt string and non-NA input" could help understand decisions being made, weighed against potential positives that this issue has highlighted.

As with another related issue in sfheaders dcooley/sfheaders#49 decisions made here may be none of my business, and I respect the right of technology creators to make final decisions about the direction of travel and of others to fork/re-use components if they would like open technologies to be different. I do have an interest because of what I'm doing in the od package and want to do things in a future-proof way that supports open source software, like we all do!

@edzer
Copy link
Member

edzer commented Apr 11, 2020

You won't ever understand progress that is not of your making, that is clear.

Mike, I take this as a personal attack, and don't accept that. Please abide by the code of conduct of this repository found in the README.md.

I'm willing to look at how far we can take sf with unvalidated crs objects lacking an wkt field -- if this can be done non-disruptively, who cares. A worry I can see now is that users (not you, but users of your CRAN packages) do something, create an object with unvalidated invalid crs, try to plot it with ggplot2 and then come here and expect help with the errors they get from sf. If I knew you'd be the one to catch all such issues, I'd be happy, but then you don't (want to) use sf. Developing software can be creative, but I'm seeing more and more of my time going into dealing with all kind of user and maintenance issues, many of them entirely non-creative. Wanting something comes at a cost.

Another solution would be to simply class your objects differently, and provide an st_as_sf method with what Robin proposes above: requiring sf and setting the crs. That would make it clear where things go wrong, when they do.

I do see the wkt challenge with sfheaders - note that even though sfheaders is not a revdep of sf I brought up dcooley/sfheaders#49, not Dave, not one of the sfheaders fan users.

But again, I'm willing to try this out, provided we can keep the discussion friendly.

edzer added a commit that referenced this issue Apr 14, 2020
* field wkt needs to be NA_character_
* postpones validation and wkt filling
* prints a message that an unvalidated crs object is found, and reports success.
* addresses #1359
@edzer
Copy link
Member

edzer commented Apr 14, 2020

Here's what it does, but it would really need testing by someone who wants this.

library(sf)
# Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
demo(nc, ask = FALSE, echo = FALSE)
# Reading layer `nc.gpkg' from data source `/home/edzer/R/x86_64-pc-linux-gnu-library/3.6/sf/gpkg/nc.gpkg' using driver `GPKG'
# Simple feature collection with 100 features and 14 fields
# Attribute-geometry relationship: 0 constant, 8 aggregate, 6 identity
# geometry type:  MULTIPOLYGON
# dimension:      XY
# bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
# geographic CRS: NAD27
geom = st_geometry(nc)
# succeeds:
attr(geom, "crs")[[1]] = "EPSG:4326"
attr(geom, "crs")[[2]] = NA_character_
st_geometry(nc) = geom
nc
# Simple feature collection with 100 features and 14 fields
# Attribute-geometry relationship: 0 constant, 8 aggregate, 6 identity
# geometry type:  MULTIPOLYGON
# dimension:      XY
# bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
# (fixing incomplete crs not created by package sf succeeded)
# geographic CRS: WGS 84
# First 10 features:
#     AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74 SID74
# 1  0.114     1.442  1825    1825        Ashe 37009  37009        5  1091     1
# 2  0.061     1.231  1827    1827   Alleghany 37005  37005        3   487     0
# 3  0.143     1.630  1828    1828       Surry 37171  37171       86  3188     5
# 4  0.070     2.968  1831    1831   Currituck 37053  37053       27   508     1
# 5  0.153     2.206  1832    1832 Northampton 37131  37131       66  1421     9
# 6  0.097     1.670  1833    1833    Hertford 37091  37091       46  1452     7
# 7  0.062     1.547  1834    1834      Camden 37029  37029       15   286     0
# 8  0.091     1.284  1835    1835       Gates 37073  37073       37   420     0
# 9  0.118     1.421  1836    1836      Warren 37185  37185       93   968     4
# 10 0.124     1.428  1837    1837      Stokes 37169  37169       85  1612     1
#    NWBIR74 BIR79 SID79 NWBIR79                           geom
# 1       10  1364     0      19 MULTIPOLYGON (((-81.47276 3...
# 2       10   542     3      12 MULTIPOLYGON (((-81.23989 3...
# 3      208  3616     6     260 MULTIPOLYGON (((-80.45634 3...
# 4      123   830     2     145 MULTIPOLYGON (((-76.00897 3...
# 5     1066  1606     3    1197 MULTIPOLYGON (((-77.21767 3...
# 6      954  1838     5    1237 MULTIPOLYGON (((-76.74506 3...
# 7      115   350     2     139 MULTIPOLYGON (((-76.00897 3...
# 8      254   594     2     371 MULTIPOLYGON (((-76.56251 3...
# 9      748  1190     2     844 MULTIPOLYGON (((-78.30876 3...
# 10     160  2038     5     176 MULTIPOLYGON (((-80.02567 3...
# fails:
attr(geom, "crs")[[1]] = "EPSG:0000"
attr(geom, "crs")[[2]] = NA_character_
st_geometry(nc) = geom
nc
# Simple feature collection with 100 features and 14 fields
# Attribute-geometry relationship: 0 constant, 8 aggregate, 6 identity
# geometry type:  MULTIPOLYGON
# dimension:      XY
# bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
# (fixing incomplete crs not created by package sf failed on "EPSG:0000")
# Error in CPL_crs_parameters(x) : invalid crs object
# Calls: <Anonymous> ... print -> print.sfc -> crs_parameters -> CPL_crs_parameters
# In addition: Warning message:
# In CPL_crs_parameters(x) :
#   GDAL Error 1: PROJ: proj_create_from_database: crs not found
# Execution halted

@Robinlovelace
Copy link
Contributor

Robinlovelace commented Apr 14, 2020

Example test below. Not sure if this is what you're after and not tested on the latest version but this sort of thing could definitely be definitely is useful:

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0
shp = st_read(system.file("shape/nc.shp", package="sf"))
#> Reading layer `nc' from data source `/home/robin/R/x86_64-pc-linux-gnu-library/3.6/sf/shape/nc.shp' using driver `ESRI Shapefile'
#> Simple feature collection with 100 features and 14 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> geographic CRS: NAD27
shp_geographic_crs = st_transform(shp, crs = 4326)
coords = sf::st_coordinates(shp_geographic_crs)
coords_mat = matrix(coords[, 1:2], ncol = 2)
midpoint = apply(coords_mat, 2, mean)
aeqd = sprintf(
  "+proj=aeqd +lat_0=%s +lon_0=%s +x_0=0 +y_0=0",
  midpoint[2], midpoint[1]
)
aeqd
#> [1] "+proj=aeqd +lat_0=35.5725611913905 +lon_0=-79.5564716837597 +x_0=0 +y_0=0"
sf::st_crs(aeqd)
#> Coordinate Reference System:
#>   User input: +proj=aeqd +lat_0=35.5725611913905 +lon_0=-79.5564716837597 +x_0=0 +y_0=0 
#>   wkt:
#> PROJCRS["unknown",
#>     BASEGEOGCRS["unknown",
#>         DATUM["World Geodetic System 1984",
#>             ELLIPSOID["WGS 84",6378137,298.257223563,
#>                 LENGTHUNIT["metre",1]],
#>             ID["EPSG",6326]],
#>         PRIMEM["Greenwich",0,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8901]]],
#>     CONVERSION["unknown",
#>         METHOD["Modified Azimuthal Equidistant",
#>             ID["EPSG",9832]],
#>         PARAMETER["Latitude of natural origin",35.5725611913905,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8801]],
#>         PARAMETER["Longitude of natural origin",-79.5564716837597,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8802]],
#>         PARAMETER["False easting",0,
#>             LENGTHUNIT["metre",1],
#>             ID["EPSG",8806]],
#>         PARAMETER["False northing",0,
#>             LENGTHUNIT["metre",1],
#>             ID["EPSG",8807]]],
#>     CS[Cartesian,2],
#>         AXIS["(E)",east,
#>             ORDER[1],
#>             LENGTHUNIT["metre",1,
#>                 ID["EPSG",9001]]],
#>         AXIS["(N)",north,
#>             ORDER[2],
#>             LENGTHUNIT["metre",1,
#>                 ID["EPSG",9001]]]]
shp2 = st_transform(shp_geographic_crs, crs = aeqd)
plot(st_geometry(shp))
plot(st_geometry(shp), add = TRUE)

Created on 2020-04-14 by the reprex package (v0.3.0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants