Mock network for testing #170
Comments
Some notes: I think the way to do this is to provide a hook for the socket and socketpair constructors, and for getaddrinfo and getnameinfo, and put the bulk of the code in a separate library. (The other constructors like from_stdlib_socket would remain unhooked. This is probably important for subprocess support, as well as being the Right Thing I think.) The getaddrinfo hook should probably only be called after we check the numeric fast path and go through idna encoding, so it would always receive the hostname as a bytestring. (It can also receive the hostname as a numeric IP, iff the service is a symbolic name.) A useful fake network probably needs to include features like:
I'm not sure what kind of behavior plugin API we want. Something like the memory_stream objects, with callbacks that fire to pump data? An active "switchboard" that gets notified of connection attempts etc. and then decides what to do? Just a small set of canned behaviors, plus for actual interaction you can put up a fake server to talk to with whatever complex behavior you want? (This could even use an extended interface, e.g. imagine a version
It would be neat to have a built-in randomized/flaky-network mode; think of a one-button hypothesis test suite for network protocol annoyances. Possibly this might work like: run the code once to calibrate how much communication happens, and then rerun N times with random delays injected, or, if more ambitious and requested, with network failures. One might want ways to tune the kind and distribution of random delays (breaking up data by introducing a virtual delay after every byte is a good stress test, but might be too slow for larger tests). Some kind of hypothesis-like tuning towards "weird" patterns would also be good (e.g. adding tons of delays on one connection but not another, or in one direction but not the other). It would also be neat if there were a way to write down a pattern of injected faults, to extract repeatable regression test cases from a randomized run, and potentially to allow particular fault patterns or fault pattern generators to be written by hand.
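As a rough illustration of the "write down a pattern of injected faults" idea, here is a minimal sketch using only the standard library. The names `plan_faults` and `apply_plan` and their parameters are hypothetical, not part of trio. The sketch assumes a fault pattern is just a list of `(chunk_size, delay_seconds)` pairs generated from a seed, so a failing randomized run can be replayed from its seed or from the recorded plan.

```python
import random


def plan_faults(payload_len, seed, max_chunk=4, max_delay=0.05):
    """Generate a reproducible list of (chunk_size, delay) pairs.

    Hypothetical helper: the same seed always yields the same plan, so a
    failing randomized run can be turned into a regression test.
    """
    rng = random.Random(seed)
    plan = []
    remaining = payload_len
    while remaining > 0:
        size = rng.randint(1, min(max_chunk, remaining))
        delay = rng.uniform(0, max_delay)
        plan.append((size, delay))
        remaining -= size
    return plan


def apply_plan(payload, plan):
    """Split payload into the chunks the plan describes (delays ignored here)."""
    chunks = []
    offset = 0
    for size, _delay in plan:
        chunks.append(payload[offset:offset + size])
        offset += size
    return chunks


payload = b"hello, fake network"
plan = plan_faults(len(payload), seed=1234)
chunks = apply_plan(payload, plan)
```

A real fake network would sleep on a virtual clock for each delay before delivering the chunk; setting `max_chunk=1` gives the byte-at-a-time stress test mentioned above.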
It's not really an analogue to the stdlib SocketType (which is the raw _socket.SocketType that socket.socket subclasses), and having it be public makes it quite difficult to add fake sockets (see python-triogh-170). Instead, we expose a new function trio.socket.is_trio_socket, and rework the docs accordingly.
I don't want this in the "public" socket interface, because it interferes with python-triogh-170.
This is more prep for python-triogh-170. As of this commit SocketType no longer has any secret-but-quasi-public APIs.
See python-triogh-170. Still needs tests.
I added hooks!
#253 implements the core hooks needed to enable this. Specifically, I decided to allow hooking the main
Design notes on socket hooks
An interesting problem is what to do with the other socket constructors –
If it turns out that being able to hook these "real socket" constructors is useful, then in the future we could add a
Design notes on hostname resolver hooks
It just lets you hook Currently, we optimize out the case
Next step
So the next step here is to make a package providing a sweet fake network simulator that hooks into trio.
As a suggestion, add a (Alternatively to requiring registration, call This would allow I've been using Additionally, I've also been playing around with mypy for static type checking; it really only understands
@dmcooke: Oh huh yeah, that's a good idea. Any interest in putting together a PR? I agree it's not quite 100% obvious what kind of checking we should do for class registration. I guess there are three options (am I missing any?):
Right now we just trust the Implicitly registering classes makes me nervous though, just on vague general principles. I guess a concrete case where it could go wrong would be if, like, someone forgets a It's true that implicitly registering and checking are about the same speed, because the first thing in
Here's another possible design to consider: we could say that

    class _SocketTypeMeta(type):
        def __subclasscheck__(self, class_):
            return class_ is _InternalSocketType or current_sfactory.is_socket_class(class_)

        def __instancecheck__(self, obj):
            return self.__subclasscheck__(obj.__class__)

    class SocketType(metaclass=_SocketTypeMeta):
        pass

I guess the main advantage of this is that it maintains the invariant that only the currently registered Also, it lets me skip worrying about whether the class should get moved to
On further thought... let's keep it simple and just do:

    class SocketType:
        def __init__(self):
            raise TypeError("use trio.socket.socket() to create a SocketType object")

and then the way you mark your class as being a trio socket is you just write
As pointed out in: python-trio#170 (comment) there are advantages to having the "is this a trio socket" check be spelled using 'isinstance'. This commit un-deprecates the trio.socket.SocketType name and makes it an abstract class that concrete socket implementations should inherit from, and then gets rid of is_trio_socket since it's now unnecessary.
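To make the "inherit from SocketType" convention concrete, here is a small self-contained sketch. The `SocketType` class below is a local stand-in that copies the snippet from the comment above, not an import from trio, and `FakeSocket` is a hypothetical class name. It shows that a subclass can define its own `__init__` without calling the base one, while `isinstance` checks still pass.

```python
class SocketType:
    # Local stand-in mirroring the snippet in this thread; not imported from trio.
    def __init__(self):
        raise TypeError("use trio.socket.socket() to create a SocketType object")


class FakeSocket(SocketType):
    """Hypothetical fake socket that marks itself as a trio socket by inheritance."""

    def __init__(self, name):
        # Deliberately do NOT call super().__init__(), which always raises.
        self.name = name


fake = FakeSocket("client-side")
is_socket = isinstance(fake, SocketType)

try:
    SocketType()
except TypeError as exc:
    direct_construction_error = str(exc)
```

The design choice here is that membership is expressed purely through the class hierarchy, so tools that understand `isinstance` need no special registration step.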
Switching from
This is a really excellent talk on using simulated networks etc. for robustness testing in a real system: https://www.youtube.com/watch?v=4fFDFbi3toc
Related to this topic, I am trying to mock sockets with mocket (see mindflayer/python-mocket#74 (comment)), but I cannot pass this typecheck: Shouldn't this part also use a
The reason for the exact type check is that Trio needs the socket that's passed in to be a real socket that it can pass into low-level OS syscalls – merely implementing the Python-level socket API isn't enough. For example, if we used However, looking at the mocket issue thread, that's not your problem... that check makes sure that the object passed in matches Lines 614 to 615 in eeafa1e
And that function actually looks up the This is kind of tricky to fix... in theory we could switch to looking up the method on each call, instead of doing it once at the beginning, but that would add extra per-call overhead to some of the most performance-sensitive methods in trio, even for all the people who aren't trying to use mocket. Maybe the difference wouldn't be that bad?
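The difference between looking a method up once and looking it up on every call can be sketched with plain classes. Everything below is a simplified stand-in (`RealSocket`, `MockSocket`, and both wrapper factories are hypothetical names), not trio's actual wrapper code. It only illustrates why a mock that overrides a method is bypassed when the wrapper captured the original function up front.

```python
class RealSocket:
    def recv(self, nbytes):
        return b"real"


class MockSocket(RealSocket):
    # A mocking library overriding the method it wants to intercept.
    def recv(self, nbytes):
        return b"mock"


def make_early_bound_wrapper(methname):
    # The method is looked up once, on the base class, when the wrapper is built.
    fn = getattr(RealSocket, methname)

    def wrapper(sock, *args):
        return fn(sock, *args)

    return wrapper


def make_late_bound_wrapper(methname):
    # The method is looked up on the object at every call (extra per-call cost).
    def wrapper(sock, *args):
        return getattr(sock, methname)(*args)

    return wrapper


early_recv = make_early_bound_wrapper("recv")
late_recv = make_late_bound_wrapper("recv")
mock = MockSocket()

early_result = early_recv(mock, 1024)  # bypasses the override
late_result = late_recv(mock, 1024)    # sees the override
```

Whether the extra attribute lookup matters in practice would have to be measured on the real hot paths; this sketch says nothing about the size of that overhead.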
Trio does have a standard, supported API to integrate with mock socket libraries: see
There is a TLD reserved for testing purposes:
Are you thinking of the RFC 5737 reserved blocks? I don't actually get any errors when I try using them (i.e. Although as you point out, there's no need to do this, and maybe it's better to intercept based on hostname alone. I guess this is a design decision that can be made by anybody who wants to implement a mock network library. Do you know of anybody actively working on such a library?
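For anyone weighing the two interception strategies, here is a minimal sketch with the standard `ipaddress` module. The three networks are the documentation blocks defined in RFC 5737. The helper names and the `.trio-fake-tld` suffix are illustrative assumptions: the suffix is taken from the brainstorm in the issue description and is not a real or reserved TLD.

```python
import ipaddress

# TEST-NET-1, TEST-NET-2 and TEST-NET-3 from RFC 5737.
RFC5737_BLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

# Hypothetical marker suffix borrowed from the issue description.
FAKE_SUFFIX = ".trio-fake-tld"


def is_rfc5737_address(address):
    """Return True if the address literal falls in an RFC 5737 block."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in RFC5737_BLOCKS)


def is_fake_hostname(hostname):
    """Return True if a mock network should intercept this hostname."""
    return hostname.lower().rstrip(".").endswith(FAKE_SUFFIX)
```

Hostname-based interception avoids handing out address literals that downstream code might try to parse, which is the concern raised in the issue description about made-up addresses such as 257.1.1.1.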
@mehaase Yeah, don't read too much into my year-and-a-bit-ago initial thinking-out-loud :-). At that point I hadn't even figured out yet whether mock networking was something that would be built into trio's regular APIs (in which case it would need to be triggered by some kind of magical arguments), or something that you had to turn on. Now we've settled on it being something you turn on, so that discussion is much less relevant.
Yeah, exactly.
Not currently, no!
Some existing popular mock libraries that might be useful for inspiration (or at least narrowing down use cases):
For these kinds of use cases, it sounds like all you really need is a way to wire up a virtual network, so you can run a virtual web server or something and speak HTTP at it. (
Note: as soon as this is working, the next thing people will run into is how to get TLS working over the virtual network.
Here's a fake network crate for tokio; might have ideas we can steal: https://docs.rs/turmoil/
@glyph gave a great talk at PyCon this year that involved using a virtual (= in memory, in python) networking layer to build a virtual server to test a real client.
As far as the virtual networking part goes, we have some of this, e.g. #107 has some pretty solid in-memory implementations of the stream abstraction. But it would be neat to virtualize more of networking, e.g. so in a test I can tell my real server code to listen on some-server.example.org:12345 and tell my real client code to connect to that, and they magically get an in-memory connection between them.
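The listen/connect scenario can be sketched as a tiny in-memory registry. This is a deliberately synchronous toy with unbounded buffering. `FakeNetwork`, `FakeEndpoint`, and their methods are hypothetical names, not trio APIs, and a real implementation would need async blocking, close handling, and finite buffers.

```python
from collections import deque


class FakeEndpoint:
    """One side of an in-memory connection (hypothetical, unbounded buffer)."""

    def __init__(self):
        self._inbox = bytearray()
        self.peer = None

    def send(self, data):
        # Deliver straight into the peer's receive buffer.
        self.peer._inbox += data

    def receive(self, max_bytes):
        chunk = bytes(self._inbox[:max_bytes])
        del self._inbox[:max_bytes]
        return chunk


class FakeNetwork:
    """Maps (hostname, port) to a backlog of pending server-side endpoints."""

    def __init__(self):
        self._listeners = {}

    def listen(self, host, port):
        key = (host, port)
        if key in self._listeners:
            raise OSError(f"address already in use: {host}:{port}")
        backlog = deque()
        self._listeners[key] = backlog
        return backlog

    def connect(self, host, port):
        try:
            backlog = self._listeners[(host, port)]
        except KeyError:
            raise ConnectionRefusedError(f"nothing listening on {host}:{port}") from None
        client, server = FakeEndpoint(), FakeEndpoint()
        client.peer, server.peer = server, client
        backlog.append(server)
        return client


net = FakeNetwork()
backlog = net.listen("some-server.example.org", 12345)
client = net.connect("some-server.example.org", 12345)
server = backlog.popleft()  # the "accept" step
client.send(b"ping")
request = server.receive(1024)
server.send(b"pong")
reply = client.receive(1024)
```

No name resolution happens at all here: the hostname is only a dictionary key, which matches the idea of intercepting at the hostname level.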
Fixing #159 would reduce the amount of monkeypatching needed to do this, but OTOH I guess monkeypatching the whole trio.socket module is probably the simplest and most direct way to do this anyway... or we could hook in at the socket layer (have it check a special flag before allocating a new socket) or at the high-level networking layer (open_tcp_stream checks a special flag and then returns a FakeSocketStream etc.). Fundamentally there's going to be some global state because no-one will put up with passing around the whole library interface as an argument everywhere, literally every async library has some kind of contextual/global state they use to solve this problem, and I can't think why it would matter a huge amount whether that's from twisted.internet import reactor vs asyncio._get_running_loop() vs trio.socket.socket(). So I'm leaning towards not worrying about monkeypatching. (The one practical issue I can think of is if someone is trying to use trio in two threads simultaneously, then this will cause some problems because the monkeypatch would be global, not thread-local. Maybe we can make it thread-local somehow? Or maybe we just don't care, because there really isn't any good reason to run your test suite multi-threaded in Python.)
Oh, or here's a horrible wonderful idea: embed the fake network into the regular network namespace, so like if you try to bind to 257.1.1.1 or example.trio-fake-tld then the regular functions notice and return faked results (we could even encode test parameters into the name, like getaddrinfo("example.ipv6.trio-fake-tld") returns fake ipv6 addresses...). Of course this would be a bit of a problem for code that wants to like, use the ipaddress library to parse getaddrinfo results. There are the reserved ip address ranges, but that gets dicey because they should give errors in normal use... In practice the solution might be to stick to mostly intercepting things at the hostname level (e.g. open_tcp_stream doesn't even need to resolve anything when it sees a fake hostname), though we do need to have some answer when the user asks for getpeername. I guess we could treat all addresses as regular until someone invokes this functionality with a hostname, at which point some ip addresses become magical.
BUT there would also still very much need to be a magic flag to make sure all this is opt-in at the run loop level, to make sure it could never be accidentally or maliciously invoked in real code, to avoid potential security bugs. At which point I suppose that magic flag could just make all hostnames/addresses magical. Oh well, I said it was a horrible (wonderful) idea :-). The bit about having hostnames determine host properties might still be a good idea.
There's also a big open question about how closely this API should mimic a real network. At the very least it would have to provide the interfaces to do things like set TCP_NODELAY (even as a no-op), for compatibility with code made to run on a real network. But there are also more subtle issues, like, should we simulate the large-but-finite buffers that real sockets have? Our existing in-memory stream implementations have either infinite buffering or zero buffering, both of which are often useful for testing, but neither of which is a great match to how networks actually work... and of course there are also all the usual questions about what kind of API to provide for manipulating the virtual network within a test.
I suspect that this is a big enough problem and with enough domain-specific open questions that this should be a separate special-purpose library? Though I guess if we want to hook the regular functions without monkeypatching then there will need to be some core API for that.
Prerequisite: We'll need run- or task-local storage (#2) to store the state of the virtual network.
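As a rough picture of what run-local state for the virtual network could look like, here is a sketch using the standard library's `contextvars` as a stand-in. It is not trio's own storage mechanism, and `current_network` and `run_with_fake_network` are hypothetical helper names. The point is that the "which network am I on" setting is scoped to one run rather than being a process-wide monkeypatch.

```python
import contextvars

# Holds the fake network for the current run; None means "use the real network".
_fake_network = contextvars.ContextVar("fake_network", default=None)


def current_network():
    return _fake_network.get()


def run_with_fake_network(network, fn, *args):
    """Call fn(*args) in a copied context where the fake network is active."""
    ctx = contextvars.copy_context()

    def _runner():
        _fake_network.set(network)
        return fn(*args)

    # Changes made inside ctx.run() stay inside the copied context.
    return ctx.run(_runner)


inside = run_with_fake_network("net-A", current_network)
outside = current_network()
```

Because the setting never leaks out of the copied context, two runs in different threads could each use their own fake network, which addresses the thread-safety worry raised in the issue description.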