[DRAFT] Quality Transport #162
Comments
We don't need a new transport for this; we can just expose this information via the connection's …
We still need a new transport for the …
We can either add a function to the main transport, or better yet, use a type assertion to check for it.
Yes, that's what I do for the transport; I'm going to add Quality to the connection stats.
We did in fact need it; the goal is not for apps to get the Quality from their …
Some transports are better to use than others (QUIC > TCP > circuit), but currently the swarm doesn't care: it just uses the first one to show up.
That's why we need a way to tell the swarm which one is better.
Note: when I say a transport is better, I'm not talking about the real quality of the transport (how well it was written or how buggy it is) but about how much a node should want to use it over other transports.
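To make that ordering concrete, here is a minimal Go sketch (with made-up scores and a hypothetical `best` helper) of a swarm preferring the lowest score instead of the first connection to show up:

```go
package main

import (
	"fmt"
	"sort"
)

// scored pairs a transport name with a hypothetical Quality() result.
type scored struct {
	name    string
	quality uint32
}

// best returns the conn the swarm should prefer: the lowest score
// wins, like a position on a leaderboard.
func best(conns []scored) string {
	sort.Slice(conns, func(i, j int) bool { return conns[i].quality < conns[j].quality })
	return conns[0].name
}

func main() {
	// Made-up scores reflecting the QUIC > TCP > circuit preference.
	conns := []scored{{"circuit", 1 << 16}, {"tcp", 1 << 9}, {"quic", 1 << 8}}
	fmt.Println("best:", best(conns)) // picks "quic"
}
```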
## TL;DR

### Interface change

- Create `QTransport`, the same as the current `Transport` but whose `Dial` returns a `QCapableConn`, and with a `Score` function estimating the quality of a future connection.
- Create `QCapableConn`, which embeds `CapableConn` but adds a `Quality() uint32` function.

### Notation algorithm

- `Quality()` returns a `uint32`; lower is better. A connection with a score twice as low is not twice as good, it is just better, like a position on a leaderboard.
- `Quality()` must be easy and really fast to calculate, and its result must never change.
- Loopback and local-network preference must be moved here:
  - If the conn is on a private network, `Quality()` must divide the score by 2^8 (shift right by 8).
  - If the conn is on the loopback, `Quality()` must divide the score by 2^16 (shift right by 16).

### Stream migration
Stream migration is not done by the swarm but by applications themselves. To do that, on a `network.Conn` object it is possible to register an `OnBetter(func(time.Time))` callback, or to register directly in `NewStream` to avoid a race (registering after a better conn was already found). The time passed to the callback represents when the hard close is going to happen.
### Async close
When a new connection is found, the old one is put in asyncClose mode. In this mode, after a short grace time, it is no longer possible for new streams to be opened (in either direction), but already-open streams keep working. After a grace time of 1 to 2 minutes the conn is shut down.
## Complete
### Interface change

Some new interfaces (`QTransport` is a temporary name, as are the other names) would be created in go-libp2p-core/transport:

- `QTransport` embeds `Transport` except for `Dial` and `Listen`, which are the same but return a `QCapableConn` (a new interface), plus a new `Score(ma.Multiaddr, peer.ID) Score` function which returns a `Score` struct.
- Again, `QCapableConn` embeds `CapableConn` but with `Quality() uint32` added (EDIT: actually no, Go doesn't allow circular type matching, so a Base object contains the shared functions; see the implementation in libp2p/go-libp2p-core#121).
- Since Go doesn't support shadowing of embedded interfaces, there will actually be a `TransportBase` interface with every function except `Dial`, and both `Transport` and `QTransport` will embed `TransportBase`. For `QCapableConn` this is not a problem because nothing needs to be shadowed.
- This is not added to `Transport` itself, so we can easily embed a `Transport` in `go-libp2p.Transport` (try a cast to `QTransport`; if it is not ok, embed).
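As a rough illustration of that upgrade path, here is a self-contained Go sketch; the interfaces are heavily simplified stand-ins (for example, `Score` returns a bare `uint32` here instead of the proposed `Score` struct, and `Dial`/`Listen` are omitted), not the real go-libp2p-core definitions:

```go
package main

import "fmt"

// TransportBase is the shared part of Transport and QTransport
// (simplified; the real interfaces carry Dial, Listen, etc.).
type TransportBase interface {
	Name() string
}

// Transport is the classic interface (simplified sketch).
type Transport interface {
	TransportBase
}

// QTransport adds quality awareness on top of TransportBase.
type QTransport interface {
	TransportBase
	Score() uint32
}

type tcpTransport struct{}

func (tcpTransport) Name() string { return "tcp" }

type quicTransport struct{}

func (quicTransport) Name() string  { return "quic" }
func (quicTransport) Score() uint32 { return 1 << 8 }

// describe shows the upgrade path: the swarm type-asserts to
// QTransport and falls back to plain Transport behaviour.
func describe(t Transport) string {
	if qt, ok := t.(QTransport); ok {
		return fmt.Sprintf("%s is quality-aware (score %d)", qt.Name(), qt.Score())
	}
	return fmt.Sprintf("%s is a plain transport", t.Name())
}

func main() {
	fmt.Println(describe(tcpTransport{}))  // plain transport
	fmt.Println(describe(quicTransport{})) // quality-aware
}
```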
, if not ok do embed).Notation algorithm
Uint32, lower the better, unsigned because a transport with N hops would be able to do something like this :
baseScore + underlying.Quality() * hops
(multiplication with negative value is not a great for that).So where to place a transport on the scale ?
Take a look at this scale (note: a proto with a Quality twice bigger is not twice slower or twice worst, it is just worst, see that like the place on a leaderboard) :
s.Conn().Quality()
(assumes
the stream used to connect to the router) + the number of hops * 8 (that consider that all router are equal but there is really no better than pinging/monitoring to know that) + 2^16 (base circuit value).Its also not needed to follow that closely, thing can be added if a proto add some overhead :
Quality()
result must never change for a single conn, so swarm implementation can safely assume the score to never change.If conn is on a private network
Quality()
must divide score by 2^8 (shift right by 8).If conn is on the loopback
Quality()
must divide score by 2^16 (shift right by 16).The idea is for some transport (such as Bluetooth BLE) to implement this manualy, that avoid creating an exception each time in swarm.
2 transports can have the same score, in this case the first transport to return will be used.
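The scoring rules above can be sketched as plain functions; the values in `main` are made up, and `circuitQuality`/`applyLocality` are hypothetical helper names used only for illustration, not part of any proposed API:

```go
package main

import "fmt"

// circuitBase is the draft's base value for the circuit transport.
const circuitBase = 1 << 16

// circuitQuality follows the formula above: the quality of the stream
// used to reach the relay, plus 8 per hop, plus the circuit base value.
func circuitQuality(underlying uint32, hops uint32) uint32 {
	return underlying + hops*8 + circuitBase
}

// applyLocality applies the mandatory locality bonuses: right shifts,
// since a lower score is better.
func applyLocality(score uint32, private, loopback bool) uint32 {
	switch {
	case loopback:
		return score >> 16
	case private:
		return score >> 8
	}
	return score
}

func main() {
	// One hop over a stream that itself scores 1<<8 (made-up value).
	fmt.Println("circuit:", circuitQuality(1<<8, 1))
	// A private-network conn gets its score divided by 2^8.
	fmt.Println("private tcp:", applyLocality(1<<9, true, false))
}
```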
### Why not a ping-based one?

I don't think we should implement a tester/monitor of each connection's quality and really pick the best one, because that would require a lot of rewriting of the transports themselves (it could be done elsewhere, but I don't think that is a good solution), and because a monitor would have higher requirements than the subsystem needing the connection:
### Stream migration

Supporting quality adds a problem: we may already have a stream, but that doesn't mean this stream is good, and maybe a better one arrives later (this case is going to happen a lot once QUIC is pushed as the default UDP transport: high-speed connections are far more sensitive to ping jitter (±3ms over 7 is far more than ±10ms over 250), so TCP may return first even though it is worse).

Actually migrating streams from one transport to another would require a complex new abstraction layer, so instead applications and subsystems will be able to subscribe to an event: either a callback `OnBetter(cb func(time.Time))` on the `network.Conn`, or `cbs ...func(time.Time)` in `host.NewStream` (even if technically that is a breaking change to `host` from the compiler's point of view, I don't think it is a problem because applications using the current libp2p will still work).

Also, the callback is not given a pointer to the new conn: that avoids creating a stream on a conn that has already async-closed because multiple transports returned at the same time; just use `NewStream` in the callback. The time in the callback represents when the connection is going to be hard closed.
I prefer not to use the event bus because there is no need: this is a case where the listeners and emitters of an event all share a common object, the listeners obviously through `s.Conn()` and the emitters through the swarm's map storing `activeDial`. The event could then be implemented only in the swarm (and in the host, but only to forward to the swarm).
### Async close and grace time

When a better transport is found we should stop using the old one, but outdated nodes, and maybe different implementations, will keep using it.

So after the swarm swaps the two Conn objects, it should start an `asyncClose()`. Basically, this puts the connection in a state where no more streams can be created (in either direction), but already-open streams continue to work; when a stream closes, it checks the number of open streams on its Conn, and if it was the last one it can really close the conn. Then two grace times apply: a first one after which we really stop accepting new streams (the idea being that if a stream-opening packet was in flight while we executed asyncClose, it has time to arrive; 3 to 5 seconds should be enough), and a second one after which we kill the connection even if some streams remain (maybe we don't want that; I think it should be an option, true by default, so users get the good behaviour by default even if they don't update their application to be able to reopen streams). Also, if the other peer is incompatible, the connection may be kept open until a real close is issued (and keep accepting streams), but it closes if, by luck, all streams have been closed.
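The asyncClose life cycle described above can be sketched as a small state machine; this is a toy model of the draft's behaviour (the grace timers are omitted), not real swarm code:

```go
package main

import "fmt"

type connState int

const (
	open         connState = iota
	asyncClosing           // no new streams, existing ones keep working
	closed
)

// Conn models a connection with a count of its open streams.
type Conn struct {
	state   connState
	streams int
}

// asyncClose stops new streams; if none are open, close immediately.
func (c *Conn) asyncClose() {
	c.state = asyncClosing
	if c.streams == 0 {
		c.state = closed
	}
}

// openStream is refused once the conn is async-closing.
func (c *Conn) openStream() error {
	if c.state != open {
		return fmt.Errorf("conn is async-closing: no new streams")
	}
	c.streams++
	return nil
}

// closeStream lets the last stream really close the conn.
func (c *Conn) closeStream() {
	c.streams--
	if c.streams == 0 && c.state == asyncClosing {
		c.state = closed
	}
}

func main() {
	c := &Conn{state: open}
	_ = c.openStream()
	c.asyncClose()
	if err := c.openStream(); err != nil {
		fmt.Println(err)
	}
	c.closeStream()
	fmt.Println("closed:", c.state == closed)
}
```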
Note: again, it is important here for the two peers to naturally agree on which transport to use, because otherwise we would have to send a message warning that we put a connection in asyncClose mode. Here that is not needed, because the other peer will do the same (due to the transport opening).
Note: this has already been discussed before, and this proposal is not by me alone; it was discussed in the weekly of 24/02/2020.