Question: How to integrate with asyncio flow? #53

Closed
datakurre opened this issue Apr 6, 2016 · 11 comments

Comments

@datakurre

Hi,

How could and should ZODB be integrated into an asyncio loop based application? What kinds of issues would be expected? Is there known existing work for that somewhere?

My naive assumptions are that:

  • The easiest solution would be a custom asyncio task executor, which would queue ZODB tasks into a limited number of worker threads matching the size of the ZODB connection pool. This would probably work out of the box, with both local ZODB and ZEO.
  • The next "easiest" solution would be to implement an asyncio ZEO client connection. MVCC would probably become an issue.

Thanks a lot for your time and thoughts!

@jimfulton
Member

Both of these are wrong. :)

You should use a thread pool. Async is good for IO. It isn't good for code that can block, like almost all application code. The traditional way to do this is to run application code in separate threads, typically managed in thread pools. Zope always worked this way. Most async libraries provide facilities for communicating with separate threads and some provide thread pool implementations.

Jim

Jim Fulton
http://jimfulton.info

@vincentfretin
Member

@datakurre I don't know if I'm off topic or not, but it may be of interest. aiopyramid patches getitem for traversal to be a coroutine, so I think it's non-blocking: https://github.com/housleyjk/aiopyramid/blob/master/aiopyramid/traversal.py

@datakurre
Author

@jimfulton Thanks. Python >= 3.4 asyncio does include a thread pool executor for application code, with a configurable worker limit. So simply running all ZODB-accessing/manipulating code through that is the obvious easiest solution.

I'm still somewhat curious whether the ZODB / ZEO connection could be async (isn't database access mostly IO?), but possibly it's not worth the effort.

@vincentfretin It's not obvious how aiopyramid's traversal uses the ZODB connection (because AFAIK each request should have its own connection to ensure MVCC).
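
For illustration, a minimal sketch of the thread pool executor approach discussed above, assuming a local FileStorage; the names `add_item` and `handle_request` are invented for this example, and each call opens its own connection from the pool, which is one way to preserve MVCC per task:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

import transaction
import ZODB
import ZODB.FileStorage

POOL_SIZE = 7  # ZODB's default connection pool size

db = ZODB.DB(ZODB.FileStorage.FileStorage('data.fs'), pool_size=POOL_SIZE)
# Match the number of worker threads to the connection pool size so that
# a worker never has to wait for a free connection.
executor = ThreadPoolExecutor(max_workers=POOL_SIZE)


def add_item(key, value):
    # Blocking ZODB work: runs entirely in one worker thread,
    # with its own connection from the pool.
    conn = db.open()
    try:
        conn.root()[key] = value
        transaction.commit()
    except Exception:
        transaction.abort()
        raise
    finally:
        conn.close()


async def handle_request(key, value):
    loop = asyncio.get_event_loop()
    # Hand the blocking call off to the executor; the event loop stays free.
    await loop.run_in_executor(executor, add_item, key, value)
```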

@jimfulton
Member

ZEO does use asynchronous IO to talk to its server. It currently uses the asyncore standard module, which is Python's oldest async library. It's being updated to use asyncio.

@datakurre
Author

@jimfulton I can see that now. Thanks a lot. (I've been a fan of asyncore in ZServer and have been relying on it a lot for the last few years.)

@djay

djay commented Jun 13, 2016

The use case I still don't see solved well, at least at the Zope level, is where you have an IO-blocking call in application code that isn't a ZODB call, for example urllib.geturl. The ZODB client cache and its thread are lying idle during the time urllib takes to return. In the case of a largish Plone site, that's typically 1GB of RAM.

@jimfulton
Member

jimfulton commented Jun 13, 2016

@djay I assume your question isn't related to asyncio.

I agree that having a connection tied up while an application is making a request to some external service, which is a common case today, is awkward.

At ZC, we'd sometimes release and re-acquire a database connection while making an external call, but doing this was difficult. To some degree, this is an application issue, as many applications make database connections and transactions so transparent that it may not be clear how to manage them.

If the external call can be queued, then that simplifies things somewhat; however, handing off to a queue reliably is difficult, and I'm convinced that most applications (regardless of whether they use ZODB) do this incorrectly.

In any case, I think it would be helpful to have an easy way to release a connection (committing a transaction, or possibly not) and get a new one later. (Perhaps it would be useful to freeze a transaction, release a connection, and restart the transaction later with a new or maybe a different connection. The savepoint machinery would probably be helpful for this.) I'm not sure what use cases would inform this. This might be an interesting topic to discuss on the ZODB list and/or the ZODB wiki.
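
A rough sketch of the "release and re-acquire" idea described above, assuming the work can be split cleanly at the external call; `fetch_remote`, `view`, and the root keys are placeholders for this example, not an existing API:

```python
import urllib.request

import transaction
from persistent.mapping import PersistentMapping


def fetch_remote(url):
    # Blocking external call; no ZODB connection should be held while it runs.
    return urllib.request.urlopen(url).read()


def view(db, url):
    conn = db.open()
    try:
        # First unit of work, committed before the external call.
        if 'results' not in conn.root():
            conn.root()['results'] = PersistentMapping()
        transaction.commit()
    finally:
        conn.close()  # the connection (and its cache) goes back to the pool

    data = fetch_remote(url)  # possibly slow; nothing is tied up meanwhile

    conn = db.open()  # new connection, new transaction
    try:
        conn.root()['results'][url] = data
        transaction.commit()
    except Exception:
        transaction.abort()
        raise
    finally:
        conn.close()
```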

@datakurre
Author

@djay A somewhat related thread on the ZODB list: https://groups.google.com/forum/m/#!topic/zodb/HpwN0RofUYQ

And then there's our "plone.server" experiment, where we ran multiple concurrent requests on asyncio, each with their own transaction data manager, on a single ZODB connection. We didn't get far enough to see how bad an idea it really was. We'll probably still try it again for some special cases like websocket connections. The early sandbox demo should give the best overview: https://github.com/plone/plone.server/blob/f11921e78f34aecfb5ab6454341e035ab35d1d3c/sandbox.py

On Zope 2, for that special case of urllib.geturl (or the requests lib in real life), I've been using https://pypi.python.org/pypi/collective.futures : it works around the blocking issue by running the same request twice (aborting the transaction the first time) and resolving the geturl calls in parallel between those requests, outside the blocking worker threads. So it's a little like the "freezing the transaction" option. Our main use case for that has been embedded views for remote data from JSON APIs.

@djay

djay commented Jun 13, 2016

We were looking at c.futures to solve this too, for now.

Although it would be nicer if our code didn't need callbacks: if ZODB allowed something like using the await syntax, with the proviso that it would automatically end the transaction and start a new one on return. Or perhaps any await would raise an exception unless there is a commit directly before it and nothing has changed since.

If we could freeze the transaction during the coroutine call, even better, but I'm not sure how you could do that, since you might have changed a lot before the await, and that would likely result in the whole ZODB client cache being held in memory during the await, which defeats the purpose, right?
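
Purely as a sketch of the API being wished for here (nothing like this exists in ZODB), such an await could be approximated with an invented helper that commits before suspending and begins a fresh transaction on return:

```python
import transaction


async def commit_around(coro, tm=None):
    # Hypothetical helper, not part of ZODB: end the current transaction
    # before suspending, so nothing uncommitted is held across the await.
    tm = tm or transaction.manager
    tm.commit()
    result = await coro   # other coroutines (and their transactions) run here
    tm.begin()            # the caller resumes inside a fresh transaction
    return result

# Hypothetical usage: data = await commit_around(fetch(url))
```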

@datakurre
Author

datakurre commented Jun 14, 2016

@djay As Jim said, using the savepoint machinery for "freezing" a transaction before passing the connection to another request could be worth investigating. It could allow sharing the same connection cache between concurrent requests, but it also has a performance cost: with an async loop you may constantly switch between different requests, and that would require the connection cache to be constantly updated and rolled back from the savepoint data of each request (to prevent requests from seeing each other's uncommitted changes).

Anyway, it might be better to just design any asyncio-based application so that you can simply use an asyncio thread pool executor with a limited pool size for all ZODB operations. It would not be ideal, of course: because of the implicit nature of Persistent, no persistent objects could be accessed outside those executor threads (very much like in Zope 2).
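
For reference, a minimal illustration of the savepoint machinery referred to above, using an in-memory database; how it would be wired into per-request switching is exactly the open question:

```python
import transaction
import ZODB

db = ZODB.DB(None)            # in-memory database
conn = db.open()
root = conn.root()

root['a'] = 1
sp = transaction.savepoint()  # "freeze": uncommitted state so far is saved aside

root['a'] = 2                 # changes made after the savepoint
sp.rollback()                 # roll the connection back to the frozen state
assert root['a'] == 1

transaction.commit()
conn.close()
```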

@djay

djay commented Jun 14, 2016

Yes. If you can assume that the transaction ends on each switch in the event loop, it could be more performant, because no memory needs to change during the switch.

But perhaps freezing and unfreezing is not as bad as it seems. When switching transactions you remove from memory all the objects that have been changed. The next transaction might not need any of those objects, so they might just remain ghosted. You can lazily unfreeze the transactions.

The worst case might be where both are writing to the same set of objects, in which case you are going to get a conflict anyway. Perhaps in that case a conflict error could be thrown at the await/transaction switch?

The use case I see for this is when you want to access an external API in the middle of a transaction and not block the thread/client cache. It only makes sense to handle a few transactions at once in this case, just enough to fill up the wait time for the external API calls. And a good load balancer is going to spread the load such that you only need to do the transaction switching when all your processes are already in use, i.e. it's an overflow. The downside of not doing something like this I've witnessed in production: 12 cores running at 20% CPU while the site is down because all the Zope threads are sitting idle waiting on a slow external API.

If you are handling websockets you would still likely handle every message inside a single transaction anyway, so it would just be like normal Zope and you don't need the freezing.

