Caching API requests #52

Closed · nleguillarme opened this issue Aug 22, 2019 · 10 comments

nleguillarme commented Aug 22, 2019

Python 3.7
pygbif 0.3.0

In my application I send a lot of requests to get info from the GBIF backbone taxonomy, sometimes with the same parameters. Ultimately I plan to download the full taxonomic backbone, but in the meantime I think it would be great to implement some caching mechanism, for instance using requests-cache.
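
Something as simple as this would already help (a minimal sketch with requests-cache; the cache name and expiry are just examples):

    import requests
    import requests_cache

    # Patch requests globally: GET responses are stored in a local sqlite
    # file, and repeated identical requests are answered from it.
    requests_cache.install_cache("gbif_cache", backend="sqlite", expire_after=3600)

    r = requests.get("https://api.gbif.org/v1/species/match",
                     params={"name": "Puma concolor"})
    r = requests.get("https://api.gbif.org/v1/species/match",
                     params={"name": "Puma concolor"})
    print(r.from_cache)  # True: the second call never hit the API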

Anyway, great work on this library. Thanks.

sckott (Collaborator) commented Aug 22, 2019

Thanks for the issue @nleguillarme

I had a look at that pkg. Is it more common for pkg maintainers to build in caching with a pkg like requests-cache, or to point users to doing it themselves? I've never done this in Python, so I'm not sure what best practice is.

nleguillarme (Author) commented

Hi @sckott.

That's a good question, and I don't have enough experience to give you a definitive answer.
Since my app sends requests to a lot of different APIs, I use two tools to make it more robust:

  • Caching, to improve performance and avoid hitting rate limits
  • Retry sessions, to handle ConnectionError due to an unreliable network or rate limits

I think both could easily be integrated into pygbif by the package users themselves, if pygbif allowed them to pass a custom session (a CachedSession from requests-cache and/or a session with Retry); a rough sketch below.
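
For example (the parameters are illustrative, not a recommendation):

    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry
    from requests_cache import CachedSession

    # Cached session: successful GET responses go into a local sqlite file.
    session = CachedSession("gbif_cache", backend="sqlite", expire_after=86400)

    # Retry transient failures and rate-limit responses with exponential backoff.
    retries = Retry(total=5, backoff_factor=1,
                    status_forcelist=[429, 500, 502, 503, 504])
    session.mount("https://", HTTPAdapter(max_retries=retries))

    resp = session.get("https://api.gbif.org/v1/species/match",
                       params={"name": "Puma concolor"})
    print(resp.from_cache)  # True on repeated calls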

sckott (Collaborator) commented Sep 13, 2019

Thanks, and sorry for the delay; I'll try to get something built in soon...

sckott added this to the v0.4 milestone on Nov 1, 2019
sckott (Collaborator) commented Nov 1, 2019

@nleguillarme reinstall from the caching branch: [sudo] pip install git+git://github.com/sckott/pygbif.git@caching#egg=pygbif

It uses the requests-cache library now; see the new method pygbif.caching. Let me know what you think. For its docs, run ??pygbif.caching in IPython.
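
Usage looks roughly like this (see ??pygbif.caching for the actual defaults):

    from pygbif import caching, species

    caching(True)  # caching is off by default

    species.name_backbone(name="Puma concolor")  # first call hits the GBIF API
    species.name_backbone(name="Puma concolor")  # repeat call is served from the cache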

sckott (Collaborator) commented Nov 1, 2019

Also curious whether @peterdesmet @stijnvanhoey @damianooldoni have any comments on this ☝️

nleguillarme (Author) commented

Hi @sckott. Thank you for your work. Everything seems to work fine. I tested some scripts that get info from GBIF and never reached the request limit. When re-running the scripts, all results are obtained from the cache, which dramatically speeds up the process.

sckott (Collaborator) commented Nov 20, 2019

Great, glad it works. I'll submit a new version to PyPI soon.

abubelinha commented May 1, 2024

  • Retry sessions, to handle ConnectionError due to an unreliable network or rate limits

Your link helped me solve one of those situations. Thanks @nleguillarme

I take the opportunity to ask:
Apart from the caching system discussed here, does pygbif include any built-in timeout/retry control?
I.e., how can I tell pygbif to wait up to a maximum of 30 seconds and retry up to 15 times?

I see some info about timeout in the occurrences/search.py code, but nothing about max_retries. For now I'm wrapping calls myself; a minimal sketch below.
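
Something like this (with_retries is my own helper, not part of pygbif; the numbers match the 30 seconds / 15 retries above):

    import time
    from pygbif import occurrences

    def with_retries(call, max_retries=15, wait=30, **kwargs):
        # Retry a pygbif call up to max_retries times, sleeping between attempts.
        for attempt in range(max_retries):
            try:
                return call(**kwargs)
            except Exception as err:  # e.g. ConnectionError, Timeout
                if attempt == max_retries - 1:
                    raise
                time.sleep(wait)

    result = with_retries(occurrences.search, taxonKey=2435099, limit=50)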

Thanks
@abubelinha

abubelinha commented

When re-running the scripts, all results are obtained from the cache, which dramatically speeds up the process.

I think that happens only if the previous script run finished without error.
If you run a long loop and it is killed by a timeout error, or if you kill the script yourself before the loop ends, then nothing is written to the cache. Am I right?
@nleguillarme @sckott Do you have any idea how to write to the cache at any time, so this problem can be avoided? In the meantime I'm catching errors per iteration so that the run always completes; sketch below.
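
A sketch of what I mean (reusing the pygbif.caching setup from above):

    from pygbif import caching, species

    caching(True)

    names = ["Puma concolor", "Ursus arctos", "Vulpes vulpes"]
    results = {}
    for name in names:
        try:
            results[name] = species.name_backbone(name=name)
        except Exception as err:
            # One failed request doesn't kill the loop, so the run completes
            # and the responses that did succeed end up in the cache.
            print(f"skipping {name}: {err}")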

sckott (Collaborator) commented May 1, 2024

Sorry, I can't help. I no longer work on this project, and anyway it's been too long and I don't remember this work. Hopefully the current maintainers will pop in.
