We should use aiohttp instead of requests (what about wikidata?) #22

Closed
Tracked by #14
alexgarel opened this issue Aug 25, 2022 · 4 comments · Fixed by #29

Comments

@alexgarel
Member

The requests library does not seem to be compatible with async frameworks: it blocks the thread when it should release it to the event loop.

This would make our server very unresponsive under a lot of concurrent requests.

We have to use either aiohttp or requests-futures instead (or any other popular library for that purpose).

We also have a problem with the wikidata lib, which uses urllib.request… we should see whether we can monkeypatch the code for that…
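A minimal sketch of the problem, using only the standard library: `blocking_fetch` is a hypothetical stand-in for a `requests` call (it holds the thread), while `yielding_fetch` stands in for an `aiohttp`-style call (it yields to the event loop).

```python
import asyncio
import time

async def blocking_fetch():
    # Stand-in for a requests call: time.sleep() holds the whole event loop.
    time.sleep(0.2)

async def yielding_fetch():
    # Stand-in for an aiohttp call: asyncio.sleep() yields control to the loop.
    await asyncio.sleep(0.2)

async def timed(coros):
    start = time.monotonic()
    await asyncio.gather(*coros)
    return time.monotonic() - start

blocking = asyncio.run(timed([blocking_fetch() for _ in range(5)]))
yielding = asyncio.run(timed([yielding_fetch() for _ in range(5)]))
print(f"blocking: {blocking:.2f}s, yielding: {yielding:.2f}s")
# The blocking version takes roughly 5 × 0.2 s; the yielding one roughly 0.2 s.
```

With blocking calls, five "concurrent" requests run one after another on the event loop; any other client waiting on the same server stalls in the meantime.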

@alexgarel
Member Author

A simple solution for wikidata is to use https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor
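A sketch of that approach; `fetch_wikidata_entity` is a hypothetical stand-in for the blocking wikidata client:

```python
import asyncio
from functools import partial

def fetch_wikidata_entity(qid):
    # Hypothetical stand-in for the synchronous wikidata client
    # (which uses urllib.request under the hood).
    return {"id": qid}

async def get_entity(qid):
    loop = asyncio.get_running_loop()
    # None = use the loop's default ThreadPoolExecutor, so the
    # blocking call no longer stalls the event loop.
    return await loop.run_in_executor(None, partial(fetch_wikidata_entity, qid))

result = asyncio.run(get_entity("Q42"))
print(result)  # {'id': 'Q42'}
```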

@alexgarel
Member Author

@sumit-158 I found a good lib: https://asyncer.tiangolo.com/ (it's at version 0.0.1, but as explained in its intro this is not a problem).

This will have lots of benefits:

  1. we can easily make sync code async (for wikidata) thanks to the asyncify helper (for the rest, I think we'd better use aiohttp)
  2. we can use the task-group pattern in our main API function, which will really speed things up (all knowledge panels will be computed in parallel).
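Both ideas can be sketched with the standard library alone: `asyncio.to_thread` (Python 3.9+) behaves much like asyncer's `asyncify`, and `asyncio.gather` gives the task-group effect. `sync_wikidata_lookup`, `panel_a`, and `panel_b` are hypothetical names:

```python
import asyncio

def sync_wikidata_lookup(qid):
    # Hypothetical blocking helper, like the wikidata lib.
    return f"label for {qid}"

async def panel_a():
    # Stands in for a panel built from an aiohttp request.
    await asyncio.sleep(0.1)
    return "panel_a"

async def panel_b():
    # asyncify-style wrapping of sync code; with asyncer this would be
    # await asyncify(sync_wikidata_lookup)(qid).
    return await asyncio.to_thread(sync_wikidata_lookup, "Q42")

async def main():
    # Task-group pattern: both panels are computed in parallel.
    return await asyncio.gather(panel_a(), panel_b())

results = asyncio.run(main())
print(results)  # ['panel_a', 'label for Q42']
```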

@sumit-158
Member

sumit-158 commented Sep 16, 2022

@alexgarel I was doing some tests (just for fun!) on the response timing with the possible methods, and I got these results.

Process time of the response (values are from the current main branch, i.e., a single facet):

  • Without async: 10.14 sec
  • With async and aiohttp, but without the task-group pattern (no parallelism): 8.71 sec
  • Same as above, with the task-group pattern (parallelism): 5.78 sec

I was also thinking that running it with Gunicorn might decrease compute time (I'm not so sure!).

@alexgarel
Member Author

@sumit-158 I guess the limiting factor here is our request time, and the longest might be wikidata, which needs two requests.
You could time the requests and log them using time.monotonic to see how much time is spent outside of our code.
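A minimal sketch of that timing approach; `timed_call` is a hypothetical helper:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("timing")

def timed_call(name, func, *args, **kwargs):
    # Hypothetical helper: run func and log its wall-clock duration.
    # time.monotonic is immune to system clock adjustments, so it is
    # the right clock for measuring elapsed time.
    start = time.monotonic()
    try:
        return func(*args, **kwargs)
    finally:
        logger.info("%s took %.3f s", name, time.monotonic() - start)

value = timed_call("sum-demo", sum, [1, 2, 3])
print(value)  # 6
```

Wrapping each outgoing request this way shows whether latency is dominated by the external services or by our own code.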

Moving to Gunicorn is about handling a lot of concurrent requests, not about reducing the latency of a single request.

@sumit-158 sumit-158 moved this from 🆕 New to ✅ Done in Knowledge Panels for Facets Sep 28, 2022