inat occ query failing with limit > 3000 #215
thanks for the report, please include the actual code next time, and not screenshots.
reinstall
Thanks for the help - worked like a charm. Will include code next time. I knew it was the wrong choice as soon as I did it. ;)
glad it works
This just cropped up for me again with a different species. The reinstall solution above is now not working. Reproducible example below.
thanks, will have a look - what does your installed spocc version read?
Thanks @sckott. It reads 0.9.0.9811.
i can't replicate your problem, but I only tried with up to 3200 records for inat. (tethered to phone now, will try with a larger limit later to see if that causes some kind of problem)
Yeah, I just ran it successfully pulling 3200 as well, so the problem must be in pulling some number of records between 3200 and 53066 (or more). :)
Additionally, I just found a similar issue with querying gbif. I ran a search for Danaus plexippus for all gbif records (somewhere in the 215k range). It ran overnight (over 12 hours) without finishing. Should I open a new issue for this?
having a look
for GBIF, for that many records you're better off using the GBIF download API (https://www.gbif.org/developer/occurrence#download), available in rgbif.
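A minimal sketch of the GBIF download workflow via rgbif (not run here): function names follow current rgbif, which may postdate the version discussed in this thread, and the call requires GBIF credentials set in the `GBIF_USER` / `GBIF_PWD` / `GBIF_EMAIL` environment variables.

```r
# Sketch (not run): bulk download via rgbif instead of paging occ().
# Assumes current rgbif (>= 3.x) and GBIF credentials in env vars.
library(rgbif)

key <- name_backbone("Danaus plexippus")$usageKey  # resolve the taxon key
dl  <- occ_download(pred("taxonKey", key), format = "SIMPLE_CSV")
occ_download_wait(dl)                              # poll until the download is ready
dat <- occ_download_import(occ_download_get(dl))   # import the full result as a data frame
```

Unlike the paged search API, the download API has no practical record cap, so a ~215k-record species comes back in one request.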
I'm still not getting the no-data problem on the iNat queries that you are. I do see some warnings about combining data with larger requests:

```r
x = occ("Danaus plexippus", from = "inat", limit = 18020)
#> There were 41 warnings (use warnings() to see them)
warnings()
#> Warning messages:
#> 1: In data.table::rbindlist(x, fill = TRUE, use.names = TRUE) :
#>   Column 2 ['tag_list'] of item 2 is length 0. This (and 0 others like it) has
#>   been filled with NA (NULL for list columns) to make each item uniform.
```

but the data is still returned in this case.
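That warning should be benign: `data.table::rbindlist(fill = TRUE)` pads columns that are absent (or empty) in some pages with NA rather than dropping rows. A small self-contained illustration (the `a`/`b` tables are made up for demonstration):

```r
library(data.table)

# Two "pages" of results where the second lacks the tag_list column
a <- data.table(id = 1:2, tag_list = c("x", "y"))
b <- data.table(id = 3)                 # no tag_list column at all

combined <- rbindlist(list(a, b), fill = TRUE)
combined$tag_list                       # the missing value is filled with NA
```

So the combined data frame keeps all rows; only the unavailable fields are NA.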
And thanks for the tip on GBIF - I'm trying to write a function that pulls and cleans all records from inat and gbif (a common workflow for a number of projects we're working on), so it will be good to integrate the rgbif stuff for species with large numbers of occurrences.
all records, meaning literally all data from GBIF and iNat?
Sorry - no, nothing that crazy! All records for a particular species on both iNat and GBIF - i.e., can I get all records with lat/long for a particular species from both sources in a nice tidy data frame?
Also, more strange behavior on inat query limits:
okay, i finally did the larger query
the root problem here is that iNaturalist at some point changed to a limit of 10,000 records maximum - so with pagination, which we do internally in spocc, you can't get, for example, 200 records starting at page 51, because 51 * 200 = 10,200, which is more than 10,000. We need to error better so that the user gets the message, so we'll do that, but I'm not sure what the workaround is when more than 10K records are needed.
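The arithmetic behind the cap can be spelled out in a couple of lines (the variable names here are illustrative, not spocc internals):

```r
# The iNat cap described above: page * per_page must stay within 10,000.
per_page     <- 200
cap          <- 10000
last_ok_page <- cap %/% per_page  # highest page that stays within the cap
last_ok_page                      # 50: page 51 would start past record 10,000
last_ok_page * per_page           # 10000: the most records one query can page through
```

So any `limit` above 10,000 necessarily requests a page the API will refuse, regardless of `per_page`.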
(commit) …n, add fixture for inat max records limit; add docs to occ() fxn for inat limits and where to get more data
reinstall - i've made some changes. There isn't a fix for the issue of getting all the results, but there are some alternatives. Staying within spocc, you can try getting iNat data through GBIF:

- iNaturalist limits: they allow at most 10,000 records; query through GBIF to get more than 10,000
- The iNat research-grade dataset on GBIF: https://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7

```r
x <- occ(query = 'Danaus plexippus', from = 'gbif', limit = 10100,
         gbifopts = list(datasetKey = "50c9509d-22c7-4a22-a47d-8c48425ef4a7"))
x$gbif
```
ugh, lat/lon vars changed in the new API ...
Nice. I'll re-install. I just finished a workaround that interacts with the inat API outside of spocc - it iterates through by year, which removes the page-limit issues. Happy to share code if you're at all interested. It's a frustrating problem, because I'm sure we're not the only group of folks interested in downloading all occurrence data from multiple sources. Thanks again for all of your hard work on this!
nice, that sounds good. by the way, the docs for the new iNaturalist API we're using are here: https://api.inaturalist.org/v1/docs/#!/Observations/get_observations You can do date queries with it like:

```r
x <- occ(query = 'Danaus plexippus', from = 'inat', limit = 10,
         inatopts = list(year = 2010))
x$inat$meta$found
#> [1] 193
x$inat$data$Danaus_plexippus$observed_on_details.year
#> [1] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010

y <- occ(query = 'Danaus plexippus', from = 'inat', limit = 10,
         inatopts = list(year = 2012))
y$inat$meta$found
#> [1] 478
y$inat$data$Danaus_plexippus$observed_on_details.year
#> [1] 2012 2012 2012 2012 2012 2012 2012 2012 2012 2012
```

the output format for data from iNat has changed in the new API, so the details of drilling down through the data are a bit different, I think.
If you're interested: code for the inat/gbif combination and cleaning/munging. Not the most elegant, but currently working (still figuring out some bugs on records with really high occurrence numbers).
nice. Are we all good on this? Anything else on this topic?
All good - it seems like things are limited by the iNat API, so not much to do about it!
Hi there,
Interesting issue - this code worked about a week ago but now seems non-functional. I'm querying inat for a butterfly species with lots of records - the gbif query works great, but the inat query fails when the limit is set greater than 3000. Console output below.