Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esriLayers - Ensure url is MapServer or FeatureServer #41

Closed
jacpete opened this issue Jan 11, 2022 · 7 comments
Closed

esriLayers - Ensure url is MapServer or FeatureServer #41

jacpete opened this issue Jan 11, 2022 · 7 comments
Labels

Comments

@jacpete
Copy link
Collaborator

jacpete commented Jan 11, 2022

@elipousson Would you be able to give more details about this comment:

  • I'd love to add the option of returning a list of layers for all services as well as the folders and services. I think this should be relatively straightforward with the esriLayers() function but I ran into some performance issues when testing it out. Suggestions on how to incorporate this are welcome.

What kind of performance issues did you get? Were there unexpected errors in the R code or was the server returning error codes? I was trying to test out what the issue could be but I have no examples of the issue you were having. Just as a note the '/layer' subpage is only available for MapServer and FeatureServer urls. If a feature url for one of these services is provided it will truncate the url to the service type.

#All Valid entries
esriLayers("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/Demographics/ESRI_Census_USA/MapServer/3")
esriLayers("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/TaxParcel/AssessorsBasemap/MapServer")
esriLayers("https://carto.nationalmap.gov/arcgis/rest/services/contours/MapServer")

#Not Valid (current return shown below)-but better error handling (explicit check for '/(FeatureServer|MapServer)/?$' needs added)
esriLayers("https://sampleserver1.arcgisonline.com/ArcGIS/rest/services/Elevation/ESRI_Elevation_World/GPServer")
# $error
# $error$code
# [1] 400
# 
# $error$message
# [1] "Unable to complete  operation."
# 
# $error$details
# [1] "GPTask 'layers' does not exist or is inaccessible."
# 
# 

Originally posted by @jacpete in #39 (comment)

@jacpete jacpete added the bug label Jan 11, 2022
jacpete added a commit to jacpete/esri2sf that referenced this issue Jan 11, 2022
@elipousson
Copy link
Contributor

I'm not sure exactly what the issue is but here is an reprex for a server where I ran into it. I had to include esriIndex in the reprex to get it to render.

esriIndex <- function(url, parent = NULL, recurse = TRUE, layers = FALSE) {
  urlServer <-
    stringr::str_extract(url, ".+rest/services")
  
  urlInfo <-
    jsonlite::read_json(paste0(urlServer, "?f=json"), simplifyVector = TRUE)
  
  folders <-
    tibble::tibble(
      name = as.character(urlInfo[["folders"]]),
      type = "Folder",
      url = paste0(urlServer, "/", urlInfo[["folders"]]),
      parent = parent
    )
  
  services <-
    tibble::tibble(
      name = as.character(urlInfo[["services"]][["name"]]),
      type = as.character(urlInfo[["services"]][["type"]]),
      url = paste0(urlServer, "/", urlInfo[["services"]][["name"]], "/", urlInfo[["services"]][["type"]], recycle0 = TRUE),
      parent = parent
    )
  
  urlIndex <-
    dplyr::bind_rows(
      folders,
      services
    )
  
  if (layers) {
    layers <- purrr::map2_dfr(
      index$url, index$name,
      ~ layerIndex(url = .x, parent = .y)
    )
    
    urlIndex <-
      dplyr::bind_rows(
        urlIndex,
        layers
      )
  }
  
  if (recurse) {
    if (length(folders[["name"]]) > 0) {
      urlIndex <-
        dplyr::bind_rows(
          urlIndex,
          purrr::pmap_dfr(
            folders,
            ~ esriIndex(url = ..3, parent = ..1)
          )
        )
    }
  }
  
  urlIndex
}

index <-
  esriIndex(url = "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services") |> 
  dplyr::filter(type != "SceneServer")

layers <-
  purrr::map_dfr(
    index$url,
    ~ esri2sf::esriLayers(.x)
  )
#> Error in value[[3L]](cond): Url is not a valid ESRI Map or Feature Service Url.
#> Invalid URL

Created on 2022-01-11 by the reprex package (v2.0.1)

I tried wrapping esriLayers with purrr::possibly() to skip over errors. Maybe I'm being impatient but this seems like it never finishing executing and just gets stuck somewhere.

index <-
  esriIndex(url = "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services")

possibly_esriLayers <- purrr::possibly(esri2sf::esriLayers, NULL)

layers <-
  purrr::map_dfr(
  index$url,
  ~ possibly_esriLayers(.x)
)

@jacpete
Copy link
Collaborator Author

jacpete commented Jan 12, 2022

I think I identified a couple issues that caused it to error out. I left your esriIndex code as is and got the index data.frame from your example

index <- esriIndex(url = "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services")

but I changed the layers object from

layers <-
  purrr::map_dfr(
    index$url,
    ~ esri2sf::esriLayers(.x)
  )

to

layers <- purrr::map(index$url, ~{
      esriUrl_isValidService(.x)
      # esri2sf::esriLayers(.x)
})

to see if any URL's are causing issues.

I found 3 failing URL's representing two separate bugs:

failUrl <- index$url[which(!unlist(layers))]
failUrl |> 
  purrr::walk(~{
    print(.x)
    esriUrl_isValidService(.x, displayReason = TRUE)
  })
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/17–19_Mort__Rate_Map_Layer/FeatureServer"
# Url is not a valid ESRI Service Url.
# Invalid URL
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/Centroids_for_17–19_Mort__Rate_Map_Layer/FeatureServer"
# Url is not a valid ESRI Service Url.
# Invalid URL
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/MDCOVID19_TotalCurrentlyHospitalizedAcuteAndICU/FeatureServer"
# Url is not a valid ESRI Service Url.
# Token Required

Additionally I noticed in your original command you were using map_dfr when I think we will need to use map as esriLayers returns a list of two data.frames I believe.

Issue 1: URL encoding

The first two have an 'Invalid URL' warning that is thrown by
https://github.com/jacpete/esri2sf/blob/202f5b82e489860b51b905616ef1c89f772db949/R/esriUrl.R#L18-L20
in the esriUrl_isValidType function. We can test this with:

url1.1 <- failUrl[[1]]
!is.na(rvest::html_element(rvest::read_html(url1.1), 'div.restErrors'))
# [1] TRUE

url2.1 <- failUrl[[2]]
!is.na(rvest::html_element(rvest::read_html(url2.1), 'div.restErrors'))
# [1] TRUE

Error says something is wrong with the URL but printing them, copying to my browser, and looking at them yielded valid pages. However I copied the URL's back into R (the urlX.2 objects and found that some URL encoding was done on them in the browser.

url1.2 <- "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/17%E2%80%9319_Mort__Rate_Map_Layer/FeatureServer"
identical(url1.1, url1.2)
# [1] FALSE
url1.1
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/17–19_Mort__Rate_Map_Layer/FeatureServer"
url1.2
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/17%E2%80%9319_Mort__Rate_Map_Layer/FeatureServer"

url2.2 <- "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/Centroids_for_17%E2%80%9319_Mort__Rate_Map_Layer/FeatureServer"
identical(url2.1, url2.2)
# [1] FALSE
url2.1
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/Centroids_for_17–19_Mort__Rate_Map_Layer/FeatureServer"
url2.2
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/Centroids_for_17%E2%80%9319_Mort__Rate_Map_Layer/FeatureServer"

The hyphen they use is actually an EN DASH and needs encoded in R. Correctly encoded, the URL should be valid.

esriUrl_isValidService(url = url1.2, displayReason = TRUE)
# [1] TRUE
esriUrl_isValidService(url = url2.2, displayReason = TRUE)
# [1] TRUE

To fix this bug, we need to see if we can encode characters in the URL. But I will have to think of which function and where this should be appropriate. Should there be an encoding check in esriUrl_isValid? When and where should we enforce encoding inside a function or should it be left as a warning and have users handle it themselves? I don't think we will have that luxury in this esriIndex function if we want it to work properly.

Issue 2: Need to add token argument

I thought of this just recently that token arguments need added to basically every function in the package that pings and ESRI REST Server and this really proves the point. My problem is lacking access to test token verification functionality. It is interesting that the esriIndex can pick up a service that is locked behind authentication though. This is part of a issue in the package and I will create a separate issue to tackle it.

jacpete added a commit to jacpete/esri2sf that referenced this issue Jan 12, 2022
…a URL in esriUrlValidType if an 'Invalid URL' message is returned by the server.
@jacpete
Copy link
Collaborator Author

jacpete commented Jan 12, 2022

Found that encoding is pretty simple

url1.1
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/17–19_Mort__Rate_Map_Layer/FeatureServer"
url1.3 <- utils::URLencode(url1.1)
url1.3
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/17%E2%80%9319_Mort__Rate_Map_Layer/FeatureServer"
identical(url1.2, url1.3)
# [1] TRUE

url2.1
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/Centroids_for_17–19_Mort__Rate_Map_Layer/FeatureServer"
url2.3 <- utils::URLencode(url2.1)
url2.3
url2.2
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/Centroids_for_17%E2%80%9319_Mort__Rate_Map_Layer/FeatureServer"
identical(url2.2, url2.3)
# [1] TRUE

Added https://github.com/jacpete/esri2sf/blob/f914d6898c80448ae8ab5f0479cccc124313191c/R/esriUrl.R#L21-L25 to esriUrl_isValidType so that a better failure message can be printed.
Now when I run esriUrl_isValidService on the failUrl's I get a more informative message of a likely issue.

failUrl %>% 
  purrr::walk(~{
    print(.x)
    esriUrl_isValidService(.x, displayReason = TRUE)
  })
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/17–19_Mort__Rate_Map_Layer/FeatureServer"
# Url is not a valid ESRI Service Url.
# Invalid URL: Check encoding of supplied URL.
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/Centroids_for_17–19_Mort__Rate_Map_Layer/FeatureServer"
# Url is not a valid ESRI Service Url.
# Invalid URL: Check encoding of supplied URL.
# [1] "https://services.arcgis.com/njFNhDsUCentVYJW/ArcGIS/rest/services/MDCOVID19_TotalCurrentlyHospitalizedAcuteAndICU/FeatureServer"
# Url is not a valid ESRI Service Url.
# Token Required

Still need to think/discuss when and where to implement a hard URLencode() in the package or new esriIndex function

@elipousson
Copy link
Contributor

I think it might address both issue 1 and 2 if we add URLencode() to esrimeta (in the jsonlite::fromJSON call) and then use esrimeta to return the urlInfo object within esriIndex.

Then adding URLencode() to getEsriFeatures() should address issue 1 for both esri2sf and esri2df.

Does that make sense?

@jacpete
Copy link
Collaborator Author

jacpete commented Jan 16, 2022

I was mainly worried that adding a hard encode to every url passed to a function could ultimately lead to a error if a url is already encoded or only partially encoded. Looking through the URLencode documentation (?URLencode()), I noticed it has a 'repeated' argument:

repeated logical: should apparently already-encoded URLs be encoded again?

and I was thinking that as long as this was FALSE we should be okay and in the details section it gives a better explanation of how it decides if a url is already encoded:

An ‘apparently already-encoded URL’ is one containing %xx for two hexadecimal digits.

This should solve the case of my worry if a url is already fully encoded, but may lead to an uninformative error if a url is only partially encoded (e.g., someone creates a url with paste combining a url that is encoded already with some subpages or something that need encoded still). This is a weird example and would probably be regarded as a user error by supplying a invalid url. So I guess I would overall be okay with adding the hard encodes unless you can think of any other objection.

@elipousson
Copy link
Contributor

Glad to hear they anticipated that issue with URLencode(). I'll admit it didn't even occur to me that someone might supply an encoded URL but I suppose there may be users who have encountered this issue already and are using that as a work around. I can't think of any objections to this approach.

@jacpete
Copy link
Collaborator Author

jacpete commented May 11, 2022

I think #40 fixed these issues but am open to reopening this or moving the conversation to #39.

@jacpete jacpete closed this as completed May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants