Question: How to integrate with asyncio flow? #53

Closed
datakurre opened this issue Apr 6, 2016 · 11 comments

Comments

@datakurre

Hi,

How could and should ZODB be integrated into an asyncio loop based application? What kinds of issues would be expected? Is there known existing work for that somewhere?

My naive assumptions are that:

  • The easiest solution would be a custom asyncio task executor, which would queue ZODB tasks into a limited number of worker threads matching the size of the ZODB connection pool. This would probably work out of the box, with both local ZODB and ZEO.
  • The next "easiest" solution would be to implement an asyncio ZEO client connection. MVCC would probably become an issue.

Thanks a lot for your time and thoughts!

@jimfulton
Member

Both of these are wrong. :)

You should use a thread pool. Async is good for IO. It isn't good for code that can block, like almost all application code. The traditional way to do this is to run application code in separate threads, typically managed in thread pools. Zope always worked this way. Most async libraries provide facilities for communicating with separate threads and some provide thread pool implementations.

Jim

Jim Fulton
http://jimfulton.info

@vincentfretin
Member

@datakurre I don't know if I'm off topic or not, but it may be of interest. aiopyramid patches getitem for traversal to be a coroutine, so I think it's non-blocking: https://github.com/housleyjk/aiopyramid/blob/master/aiopyramid/traversal.py

@datakurre
Author

@jimfulton Thanks. Python >= 3.4 asyncio does include a thread pool executor for application code, with a configurable worker limit. So simply running all ZODB-accessing/manipulating code through that is the obvious easiest solution.

I'm still somewhat curious whether the ZODB / ZEO connection could be async (isn't database access mostly IO?), but possibly it's not worth the effort.

@vincentfretin It's not obvious how aiopyramid's traversal uses the ZODB connection (because AFAIK each request should have its own connection to ensure MVCC).
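
For illustration, a minimal sketch of the thread pool executor approach discussed above, assuming a local FileStorage; the names `add_item` and `handle_request` are invented for this example, and each call opens its own connection from the pool, which is one way to preserve MVCC per task:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

import transaction
import ZODB
import ZODB.FileStorage

POOL_SIZE = 7  # ZODB's default connection pool size

db = ZODB.DB(ZODB.FileStorage.FileStorage('data.fs'), pool_size=POOL_SIZE)
# Match the number of worker threads to the connection pool size so that
# a worker never has to wait for a free connection.
executor = ThreadPoolExecutor(max_workers=POOL_SIZE)


def add_item(key, value):
    # Blocking ZODB work: runs entirely in one worker thread,
    # with its own connection from the pool.
    conn = db.open()
    try:
        conn.root()[key] = value
        transaction.commit()
    except Exception:
        transaction.abort()
        raise
    finally:
        conn.close()


async def handle_request(key, value):
    loop = asyncio.get_event_loop()
    # Hand the blocking call off to the executor; the event loop stays free.
    await loop.run_in_executor(executor, add_item, key, value)
```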

@jimfulton
Member

ZEO does use asynchronous IO to talk to its server. It currently uses the asyncore standard module, which is Python's oldest async library. It's being updated to use asyncio.

@datakurre
Author

@jimfulton I can see that now. Thanks a lot. (I've been a fan of asyncore in ZServer and have been relying on it a lot for the last few years.)

@djay

djay commented Jun 13, 2016

The use case I still don't see solved well, at least at the Zope level, is where you have an IO-blocking call in application code that isn't a ZODB call, for example urllib.geturl. The ZODB client cache and its thread are lying idle during the time urllib takes to return. In the case of a largish Plone site, that's typically 1GB of RAM.

@jimfulton
Member

jimfulton commented Jun 13, 2016

@djay I assume your question isn't related to asyncio.

I agree that having a connection tied up while an application is making a request to some external service, which is a common case today, is awkward.

At ZC, we'd sometimes release and re-acquire a database connection while making an external call, but doing this was difficult. To some degree, this is an application issue, as many applications make database connections and transactions so transparent that it may not be clear how to manage them.

If the external call can be queued, then that simplifies things somewhat; however, handing off to a queue reliably is difficult, and I'm convinced that most applications (regardless of whether they use ZODB) do this incorrectly.

In any case, I think it would be helpful to have an easy way to release a connection (committing a transaction, or possibly not) and get a new one later. (Perhaps it would be useful to freeze a transaction, release a connection, and restart the transaction later with a new or maybe a different connection. The savepoint machinery would probably be helpful for this.) I'm not sure what use cases would inform this. This might be an interesting topic to discuss on the ZODB list and/or the ZODB wiki.
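
A rough sketch of the "release and re-acquire" idea described above, assuming the work can be split cleanly at the external call; `fetch_remote`, `view`, and the root keys are placeholders for this example, not an existing API:

```python
import urllib.request

import transaction
from persistent.mapping import PersistentMapping


def fetch_remote(url):
    # Blocking external call; no ZODB connection should be held while it runs.
    return urllib.request.urlopen(url).read()


def view(db, url):
    conn = db.open()
    try:
        # First unit of work, committed before the external call.
        if 'results' not in conn.root():
            conn.root()['results'] = PersistentMapping()
        transaction.commit()
    finally:
        conn.close()  # the connection (and its cache) goes back to the pool

    data = fetch_remote(url)  # possibly slow; nothing is tied up meanwhile

    conn = db.open()  # new connection, new transaction
    try:
        conn.root()['results'][url] = data
        transaction.commit()
    except Exception:
        transaction.abort()
        raise
    finally:
        conn.close()
```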

@datakurre
Author

@djay A somewhat related thread on the ZODB list: https://groups.google.com/forum/m/#!topic/zodb/HpwN0RofUYQ

And then there's our "plone.server" experiment, where we ran multiple concurrent requests on asyncio, each with their own transaction data manager, on a single ZODB connection. We didn't get far enough to see how bad an idea it really was. We'll probably still try it again for some special cases like websocket connections. The early sandbox demo should give the best overview: https://github.com/plone/plone.server/blob/f11921e78f34aecfb5ab6454341e035ab35d1d3c/sandbox.py

On Zope 2, for that special case of urllib.geturl (or the requests lib in real life), I've been using https://pypi.python.org/pypi/collective.futures : it works around the blocking issue by running the same request twice (aborting the transaction the first time) and resolving the geturl calls in parallel between those requests, outside the blocking worker threads. So it's a little like the "freezing the transaction" option. Our main use case for that has been embedded views for remote data from JSON APIs.

@djay

djay commented Jun 13, 2016

We were looking at c.futures to solve this too, for now.

Although it would be nicer if our code didn't need callbacks: if ZODB allowed something like using the await syntax, with the proviso that it would automatically end the transaction and start a new one on return. Or perhaps any await would raise an exception unless there is a commit directly before it and nothing has changed since.

If we could freeze the transaction during the coroutine call, even better, but I'm not sure how you could do that, since you might have changed a lot before the await, and that would likely result in the whole ZODB client cache being held in memory during the await, which defeats the purpose, right?
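
Purely as a sketch of the API being wished for here (nothing like this exists in ZODB), such an await could be approximated with an invented helper that commits before suspending and begins a fresh transaction on return:

```python
import transaction


async def commit_around(coro, tm=None):
    # Hypothetical helper, not part of ZODB: end the current transaction
    # before suspending, so nothing uncommitted is held across the await.
    tm = tm or transaction.manager
    tm.commit()
    result = await coro   # other coroutines (and their transactions) run here
    tm.begin()            # the caller resumes inside a fresh transaction
    return result

# Hypothetical usage: data = await commit_around(fetch(url))
```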

@datakurre
Author

datakurre commented Jun 14, 2016

@djay As Jim said, using the savepoint machinery for "freezing" a transaction before passing the connection to another request could be worth investigating. It could allow sharing the same connection cache between concurrent requests, but it also has a performance cost: with an async loop you may constantly switch between different requests, and that would require the connection cache to be constantly updated and rolled back from the savepoint data of each request (to prevent requests from seeing each other's uncommitted changes).

Anyway, it might be better to just design any asyncio-based application so that you can simply use an asyncio thread pool executor with a limited pool size for all ZODB operations. It would not be ideal, of course: because of the implicit nature of Persistent, no persistent objects could be accessed outside those executor threads (very much like in Zope 2).
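
For reference, a minimal illustration of the savepoint machinery referred to above, using an in-memory database; how it would be wired into per-request switching is exactly the open question:

```python
import transaction
import ZODB

db = ZODB.DB(None)            # in-memory database
conn = db.open()
root = conn.root()

root['a'] = 1
sp = transaction.savepoint()  # "freeze": uncommitted state so far is saved aside

root['a'] = 2                 # changes made after the savepoint
sp.rollback()                 # roll the connection back to the frozen state
assert root['a'] == 1

transaction.commit()
conn.close()
```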

@djay

djay commented Jun 14, 2016

Yes. If you can assume that the transaction ends on each switch in the event loop, it could be more performant, because no memory needs to change during the switch.

But perhaps freezing and unfreezing is not as bad as it seems. When switching transactions you remove from memory all the objects that have been changed. The next transaction might not need any of those objects, so they might just remain ghosted. You can lazily unfreeze the transactions.

The worst case might be where both are writing to the same set of objects, in which case you are going to get a conflict anyway. Perhaps in that case a conflict error could be thrown at the await/transaction switch?

The use case I see for this is when you want to access an external API in the middle of a transaction and not block the thread/client cache. It only makes sense to handle a few transactions at once in this case, just enough to fill up the wait time for the external API calls. And a good load balancer is going to spread the load such that you only need to do the transaction switching when all your processes are already in use, i.e. it's an overflow. The downside of not doing something like this I've witnessed in production: 12 cores running at 20% CPU while the site is down because all the Zope threads are sitting idle waiting on a slow external API.

If you are handling websockets you would still likely handle every message inside a single transaction anyway, so it would just be like normal Zope and you don't need the freezing.

