
Format numbers so scientific notation isn't used in paging parameters #151


Merged
merged 2 commits into ropensci:master from ateucher:format-paging on Dec 5, 2020

Conversation

@ateucher (Member) commented Dec 4, 2020

Description

Our package bcdata uses {crul} to get spatial data from a WFS server. Some of these datasets are very large and require paged requests with row indices in the hundreds of thousands. When paging parameters are formatted in scientific notation, requests can end up looking like this (note the startIndex value):

https://openmaps.gov.bc.ca/geo/pub/wfs/?SERVICE=WFS&VERSION=2.0.0&REQUEST=GetFeature&outputFormat=application%2Fjson&typeNames=WHSE_BASEMAPPING.FWA_WATERSHEDS_POLY&SRSNAME=EPSG%3A3005&sortby=OBJECTID&startIndex=1e%2B05&count=1000

This returns 1000 records, but it doesn't start at the correct spot (I'm not totally sure how 1e+05 is interpreted by the server, though I think it might just be ignored).
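The stray 1e%2B05 comes from R's default numeric-to-string conversion, which switches to scientific notation for large values; the "+" in "1e+05" is then percent-encoded as "%2B" when the query is built. A minimal illustration (plain base R, independent of the PR's actual code path):

```r
offset <- 100000
as.character(offset)                  # "1e+05": scientific notation by default
format(offset, scientific = FALSE)    # "100000": fixed notation
URLencode("1e+05", reserved = TRUE)   # "1e%2B05": "+" is percent-encoded
```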

This fix formats numbers as character strings (avoiding scientific notation) when they are added to the query.
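A helper along these lines is all that's needed (this is a sketch; the num_format() added in the PR may differ in detail):

```r
# Sketch of a num_format() helper: numeric values become fixed-notation
# strings (no scientific notation, no padding); other values pass through.
num_format <- function(x) {
  if (is.numeric(x)) format(x, scientific = FALSE, trim = TRUE) else x
}

num_format(1e5)    # "100000" rather than "1e+05"
num_format("abc")  # "abc" (non-numeric input unchanged)
```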

Example

cr_url <- "https://api.crossref.org"
cli <- HttpClient$new(url = cr_url)
aa <- Paginator$new(client = cli, by = "query_params", limit_param = "rows",
                    offset_param = "offset", limit = 500000, limit_chunk = 100000)
aa$url_fetch()
#> [1] "https://api.crossref.org/?offset=0&rows=1e%2B05"      
#> [2] "https://api.crossref.org/?offset=1e%2B05&rows=1e%2B05"
#> [3] "https://api.crossref.org/?offset=2e%2B05&rows=1e%2B05"
#> [4] "https://api.crossref.org/?offset=3e%2B05&rows=1e%2B05"
#> [5] "https://api.crossref.org/?offset=4e%2B05&rows=1e%2B05"

With the submitted fix it looks like this:

cr_url <- "https://api.crossref.org"
cli <- HttpClient$new(url = cr_url)
aa <- Paginator$new(client = cli, by = "query_params", limit_param = "rows",
                    offset_param = "offset", limit = 500000, limit_chunk = 100000)
aa$url_fetch()
#> [1] "https://api.crossref.org/?offset=0&rows=100000"     
#> [2] "https://api.crossref.org/?offset=100000&rows=100000"
#> [3] "https://api.crossref.org/?offset=200000&rows=100000"
#> [4] "https://api.crossref.org/?offset=300000&rows=100000"
#> [5] "https://api.crossref.org/?offset=400000&rows=100000"

I also updated the url_fetch() tests to use larger numbers (ones that would otherwise be converted to scientific notation) and added a test for the new utility function num_format().

Addresses bcgov/bcdata#235

@codecov-io

Codecov Report

Merging #151 (523622c) into master (b179e5f) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master     #151      +/-   ##
==========================================
+ Coverage   86.16%   86.19%   +0.02%     
==========================================
  Files          22       22              
  Lines        1070     1072       +2     
==========================================
+ Hits          922      924       +2     
  Misses        148      148              
Impacted Files Coverage Δ
R/make_url.R 97.22% <100.00%> (ø)
R/zzz.R 77.58% <100.00%> (+0.80%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sckott (Collaborator) commented Dec 5, 2020

Thanks! Looks good to me.

This also affects HttpClient, since add_query is used within that class, but I can add further tests for it.

@sckott sckott merged commit e38f7b5 into ropensci:master Dec 5, 2020
@ateucher ateucher deleted the format-paging branch December 5, 2020 00:16
@ateucher (Member, Author) commented Dec 5, 2020

Awesome, thanks!
