
Async Await and Web3 #856

Closed
nelsonhp opened this issue May 21, 2018 · 9 comments
@nelsonhp

  • Version: 4.2.1
  • Python: 3.6
  • OS: Ubuntu 17.04 64-bit

What is wrong?

Async/await syntax with websockets is not readily supported. It can be done with external processes, but it gets nasty.

How can it be fixed?

First, move all sub-calls into the middleware stack (i.e. calls that calculate gas price, account nonce, etc.). These calls break any attempt at a simple hot swap of the manager/provider. I think this is already underway based on comments in the code, so thanks in advance.

Second, allow the manager class to be overridden during Web3 class instantiation. By overriding the manager class, a MUX or other mechanism may be swapped into its place. I realize I can override the manager as an instance attribute, but it feels better not to hack on instance attributes if avoidable.

Why is this an important feature?

Ideally, it would be nice to use web3.py as a pre- and post-processing system with an async/await syntax above it. That is: I call a method and get back the raw serialized JSON-RPC call at the other end of the pipe, then perform the call against whatever interface I want, and finally pass the result back up the return pipe for post-processing of the value types. Since the overhead of the pre/post-processing is almost nothing, this would allow for a very fast async/await system where web3 could be called synchronously from inside a coroutine.
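To make the idea concrete, here is a stdlib-only sketch of that split pipe. Everything here (`build_request`, `decode_response`, `fake_transport`) is illustrative scaffolding, not web3.py API: the pre-processing half serializes the JSON-RPC payload, an arbitrary transport carries it, and the post-processing half converts the value types.

```python
import itertools
import json

_request_ids = itertools.count(1)

def build_request(method, params):
    """Pre-processing: produce the raw serialized JSON-RPC payload."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    })

def decode_response(raw):
    """Post-processing: decode the raw response and convert value types."""
    result = json.loads(raw)["result"]
    # e.g. hex-encoded quantities become ints
    if isinstance(result, str) and result.startswith("0x"):
        return int(result, 16)
    return result

# Any transport can sit between the two halves -- a websocket, a queue,
# a Rust service. Here a fake transport answers eth_blockNumber locally.
def fake_transport(raw_request):
    req = json.loads(raw_request)
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": "0x10d4f"})

raw = build_request("eth_blockNumber", [])
block_number = decode_response(fake_transport(raw))
print(block_number)  # 68943
```

Because the two halves share no state beyond the serialized bytes, the transport in the middle can be driven by any async framework the caller prefers.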

Why is the existing async system less than ideal?

The use of threading does bad things in the context of a persistent websocket connection unless a lot of care is taken. By allowing load-balanced websockets that are dedicated to requests, and dedicated sockets per filter with pre-defined callbacks, it is much easier to get the multiplexing/callbacks correct. This means more time may be spent building other cool things.
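As a sketch of what "getting the multiplexing correct" looks like, this toy multiplexer (the `Mux` class and fake server are invented for illustration, not web3.py code) dispatches JSON-RPC responses to per-request futures by id, so concurrent requests can never eat each other's responses:

```python
import asyncio
import json

class Mux:
    """Dispatch JSON-RPC responses to per-request futures, keyed by id."""
    def __init__(self, send, recv):
        self._send, self._recv = send, recv
        self._pending = {}
        self._next_id = 0

    async def request(self, method, params=()):
        self._next_id += 1
        rid = self._next_id
        fut = asyncio.get_event_loop().create_future()
        self._pending[rid] = fut
        await self._send(json.dumps({"jsonrpc": "2.0", "id": rid,
                                     "method": method, "params": list(params)}))
        return await fut

    async def read_loop(self):
        # The single reader owns recv(); callers only ever await futures.
        while True:
            msg = json.loads(await self._recv())
            self._pending.pop(msg["id"]).set_result(msg["result"])

async def demo():
    to_server, to_client = asyncio.Queue(), asyncio.Queue()

    async def fake_server():
        # Answer the two requests out of order to show id-based dispatch.
        first = json.loads(await to_server.get())
        second = json.loads(await to_server.get())
        for req in (second, first):
            await to_client.put(json.dumps({"id": req["id"],
                                            "result": req["method"]}))

    mux = Mux(to_server.put, to_client.get)
    server = asyncio.ensure_future(fake_server())
    reader = asyncio.ensure_future(mux.read_loop())
    results = await asyncio.gather(mux.request("eth_blockNumber"),
                                   mux.request("eth_gasPrice"))
    reader.cancel()
    await server
    return results

print(asyncio.run(demo()))  # ['eth_blockNumber', 'eth_gasPrice']
```

The same structure extends naturally to subscription traffic: messages without a matching id are routed to a registered callback instead of a pending future.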

Other thoughts/comments

The recent semi-stabilization of the PubSub systems and websockets has enabled my company to get around some of the long-standing bugs of the various web3 libraries. For example, by subscribing to newHeads and manually pulling out the logs in the callback, the bugs around the filtering interface may be avoided. This also allows for almost non-stateful server-side filtering (added bonus). The only filter that has to be run server-side is the newHeads filter. Recovering one filter on the server side is much simpler than recovering a whole collection.
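For illustration, a minimal newHeads subscription along these lines might look like the following, using the third-party websockets package; `watch_new_heads` and `extract_interesting_fields` are hypothetical names, and `ws_url` would point at a real node:

```python
import asyncio
import json

async def watch_new_heads(ws_url, handle_block):
    # Assumed sketch using the third-party `websockets` package,
    # not part of web3.py.
    import websockets
    async with websockets.connect(ws_url) as ws:
        await ws.send(json.dumps({
            "jsonrpc": "2.0", "id": 1,
            "method": "eth_subscribe", "params": ["newHeads"],
        }))
        await ws.recv()  # subscription confirmation
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("method") == "eth_subscription":
                # The callback can now pull logs for this block itself,
                # sidestepping the server-side filtering interface.
                handle_block(msg["params"]["result"])

def extract_interesting_fields(head):
    """Pure helper: pull the block number and hash out of a newHeads payload."""
    return int(head["number"], 16), head["hash"]
```

The only server-side state is the subscription itself, so recovery after a dropped socket is a single resubscribe.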

In order to work around these issues, my company either builds in Rust or simulates a local node on the loopback and multiplexes the websockets back. It would be great if all of the code we have been developing could be used with web3.py without any complex in-the-middle hacks.

If anyone has any other/better ideas how to do this without Redis, Celery, or other external processes, I would love to start a discussion.

@voith
Contributor

voith commented May 21, 2018

I do agree that the WebsocketProvider implementation is far from ideal.
Also, there's a similar issue #657 to have async functionality in web3py.

The use of threading does bad things in the context of a persistent websocket connection unless a lot of care is taken.

Can you list some issues that you've faced with the current implementation?

@nelsonhp
Author

@voith
Honestly, I have not used the current version much. I know about the potential issues from using the exact same approach in some of our older production code, e.g.:
future = aio.run_coroutine_threadsafe(...)

The problems we saw, that are likely to apply here are:
1. The utility of an async/await approach is limited by wrapping the coroutine such that the return value comes from future.result(). This prevents a truly async system from being run. Granted, I am performing the call in a different thread, but I still halt my main thread while waiting, unless I re-wrap the original call up above.

2. Although there seems to be timeout logic in the toolz, I do not see it in the actual provider class. This means I can halt indefinitely on a loss of synchronization.

3. If I want to regain the ability to context-switch the main thread while waiting on the response, I am faced with a potential race condition. If I directly call the send method without blocking while waiting on the result, there is no guarantee I will recv the correct response: if I am running a pubsub filter, or running multiple requests, another recv may eat the response. This may be fixed with locks, but that is not ideal. If this is the design paradigm web3.py is committed to, may I suggest the use of this syntax instead:

with concurrent.futures.ThreadPoolExecutor(max_workers=XXX) as executor:

This, in combination with the Python websockets library (which also supports context managers for easy cleanup), creates a connection-pooling system so that there is less pile-up on the socket. If this method is used, there must still be code that enforces the separation of pubsub-based filters from polling filters and/or requests.
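A toy version of that pooling idea (`FakeSocket` stands in for a real websocket connection; nothing here is web3.py API): each request checks a dedicated connection out of a queue, so no two threads ever interleave send/recv on the same socket.

```python
import concurrent.futures
import queue

class FakeSocket:
    """Stand-in for a real websocket connection."""
    def __init__(self, name):
        self.name = name

    def request(self, payload):
        return f"{self.name}:{payload}"

# Fixed-size pool of connections, one per worker at most.
pool = queue.Queue()
for i in range(3):
    pool.put(FakeSocket(f"ws{i}"))

def call(payload):
    sock = pool.get()          # exclusive use: no interleaved recv()
    try:
        return sock.request(payload)
    finally:
        pool.put(sock)         # return the connection, even on error

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(call, ["a", "b", "c", "d"]))

print(sorted(r.split(":")[1] for r in results))  # ['a', 'b', 'c', 'd']
```

Pubsub sockets would be kept out of this pool entirely and given their own dedicated reader, per the separation described above.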

I can get around many of these problems by running multiple independent web3 instances up above and pooling them into one or several asyncio.Queue objects (because it allows blocking await syntax). I can then pop the queue for a web3.py instance, or the right to use one, and push it back on the queue when done. The downside is that this also feels very hacky, due to the handling of exceptions before the object is pushed back to the queue and the extra memory being used for what is functionally identical code.
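That queue-pooling workaround can be sketched like this, with an async context manager handling the exception-safety concern just mentioned (`FakeWeb3` and `checkout` are illustrative stand-ins, not web3.py API):

```python
import asyncio
import contextlib

class FakeWeb3:
    """Stand-in for an independent web3 instance."""
    def __init__(self, n):
        self.n = n

@contextlib.asynccontextmanager
async def checkout(pool):
    w3 = await pool.get()      # awaits until an instance is free
    try:
        yield w3
    finally:
        pool.put_nowait(w3)    # always returned, even if the caller raises

async def main():
    pool = asyncio.Queue()
    for i in range(2):
        pool.put_nowait(FakeWeb3(i))
    async with checkout(pool) as w3:
        used = w3.n            # exclusive access while checked out
    return used, pool.qsize()

used, size = asyncio.run(main())
print(used, size)  # 0 2
```

The `finally` block is the whole point: it centralizes the return-to-queue logic that otherwise has to be repeated around every call site.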

Building general purpose async await frameworks is HARD. The tools that are in web3py are outstanding for many reasons. I think that providing a means for programmers to gain the benefit of the tooling while still being able to control the connections externally would be a great solution to generalizing the code without the headache of handling every possible implementation style. This allows developers the ability to choose to use the inbuilt connection system, that works for most applications just fine, or do fancy external stuff that just leverages the parsing and command creation system.

@voith
Contributor

voith commented May 22, 2018

@nelsonhp Thanks for the detailed write up! I agree with most of what you've written.
However, I'd like to tell you the design considerations that went into the existing implementation.
IMO, web3.py was designed with only synchronous calls in mind. @carver and @pipermerriam can correct me if I'm wrong. When WebsocketProvider was implemented, the idea was to get a basic version out that was compatible with the current blocking style of calling. It was also decided that web3.py should have its own event loop, so that the user wouldn't have to worry about managing one. Also, if a user were using asyncio, we didn't want the websocket calls scheduled in the user's thread. To do this we decided to use a thread.

with concurrent.futures.ThreadPoolExecutor(max_workers=XXX) as executor

This is something I wanted to explore, and it was also advised by @boneyard93501, but the basic version seemed okay enough to merge. I was just waiting for someone to complain before actually exploring it. I'm still not very motivated, because I'm waiting for someone to show me an error reproduced with the current version. If you provide me with a basic test case, I assure you that I'll fix it myself :)

@nelsonhp
Author

@voith
I will see if I can create some of the errors we saw in our own production systems in this implementation.
If you would like any of our code that uses the concurrent futures methods, I would be happy to share. I didn't want this to be taken as a knock against web3py. I use this library constantly. I brought this up to see if a general approach could be found that allowed me to control my connections above and required as little effort as possible from the library developers.

@carver
Collaborator

carver commented May 22, 2018

I didn't want this to be taken as a knock against web3py. I use this library constantly. I brought this up to see if a general approach could be found that allowed me to control my connections above and required as little effort as possible from the library developers.

It was definitely understood as constructive, and please keep the feedback coming!

We believe that the best end result comes from being nice to the community... and mean to the code. :)

@boneyard93501
Contributor

@voith if you want any help for round two, let me know.

@voith
Contributor

voith commented May 23, 2018

If you would like any of our code that uses the concurrent futures methods, I would be happy to share.

Please, that'll be very helpful. But I'm not very sure if we can move forward with this until we have a design for #657.

@voith if you want any help for round two, let me know.

@boneyard93501 Thanks, let's make web3.py better. If you have some ideas on #657, you can leave comments on the issue. Since you've decided to work on #832, it would be good if you could consider using asyncio over threads.

I have a couple of tasks queued up which I'd like to tackle first before returning to this.

@kclowes kclowes added this to the Async Web3 API milestone Aug 7, 2019
@miracle2k

miracle2k commented Apr 28, 2021

SQLAlchemy now supports asyncio using an innovative approach that might be worth considering. You can read about it here: https://gist.github.com/zzzeek/2a8d94b03e46b8676a063a32f78140f1

Using it, I can get web3py working using asyncio like this:

    provider = AIOHTTPProvider(current_app.config['ETHEREUM_API_URL'])    
    web3 = Web3(provider=provider)
    
    def do_async_web3():
        print(web3.eth.block_number)
    await green_spawn(do_async_web3)

(see the gist: https://gist.github.com/miracle2k/5a5fdd226310ece48ef22aef6011ccc6)

Note that web3.eth.block_number is written in normal sync fashion, but under the hood it runs in the asyncio event loop. AIOHTTPProvider simply switches out the request-making function from requests to aiohttp.

This seems like magic, or like a hack, but I encourage you to take a second look. The trick here is using greenlet for stack switching: wrapping your sync-library API in a greenlet_spawn, and wrapping your low-level asyncio HTTP calls with greenlet_await().

This is exactly what SQLAlchemy does with their AsyncSession class. The result is that end-users have a familiar await interface, the library code can stay sync as it is, and the actual IO uses the asyncio event loop (no threads are used here).
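For readers who want to see the moving parts, here is a compressed, self-contained re-implementation of the trick, assuming the third-party greenlet package; `green_spawn` and `greenlet_await` follow the gists' naming but are rewritten here for illustration, with a fake RPC coroutine standing in for real aiohttp I/O:

```python
import asyncio
import greenlet  # third-party package; the same primitive SQLAlchemy builds on

def greenlet_await(coro):
    """Called from sync code: hop back to the event loop while coro runs."""
    return greenlet.getcurrent().parent.switch(coro)

async def green_spawn(fn, *args):
    """Run a sync function in a greenlet on the running event loop."""
    result = {}

    def runner():
        result["value"] = fn(*args)

    glet = greenlet.greenlet(runner)
    switched = glet.switch()
    # Each time the sync code calls greenlet_await(), control returns here
    # carrying a coroutine; await it, then resume the greenlet with the value.
    while not glet.dead:
        value = await switched
        switched = glet.switch(value)
    return result["value"]

async def fake_async_rpc(method):
    await asyncio.sleep(0)   # stands in for real aiohttp network I/O
    return "0x10"

# Sync-looking code that transparently awaits an async call underneath:
def sync_style_block_number():
    raw = greenlet_await(fake_async_rpc("eth_blockNumber"))  # no visible await
    return int(raw, 16)

block_number = asyncio.run(green_spawn(sync_style_block_number))
print(block_number)  # 16
```

The sync function never blocks the event loop: every time it needs I/O, the greenlet switch parks it and yields control back to asyncio until the coroutine completes.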

Maybe this could revitalize the port to asyncio, but it is definitely already an option today for users of this library.

@pacrob
Contributor

pacrob commented Feb 4, 2022

Closing, tracking async in Issue #1413

@pacrob pacrob closed this as completed Feb 4, 2022
7 participants