-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(iroh-relay)!: implement authentication #3086
Conversation
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3086/docs/iroh/ Last updated: 2025-01-08T19:43:22Z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like I missed the initial RFC.
Generally very happy with this structure. Would like to ask for some upgrades.
Currently there's no way to manipulate the allow/deny list without restarting. Can we have some sort of resolver mechanism like we had for certs that we can pass in so it can easily be extended? For now the dummy resolver can just use the same fixed construction with Vec<NodeId>
.
FWIW Happy to follow up with another PR to add the above.
There is, just not in the main binary. The expectation is that this would need additonal changes to the binary anyway, so you would pass in a manual version of |
Yes I think one would really quickly want this. |
/// Access is allowed. | ||
Allow, | ||
/// Access is denied. | ||
Deny, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to store a Deny(String)
here so we can answer with a helpful error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, if folks want to log this they already have a callback where they can do so.
If you go to very fancy setups you'd push these kinds of things as config updates to live servers via an internal API without needing restarts. I guess the simple "unixy" way to do this is re-read the config from disk on SIGHUP. But yes, the current way the server does it by callback is flexible enough to cater for all that. (haven't reviewed the PR yet really) |
d06232d
to
0c3289d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you run this with the ActiveRelayActor
? I guess currently it would receive the health message an just trace!("Ignoring {msg:?}");
it. After which a read error would occur because the connection is closed and it would go to exponetial backoff trying to re-connect.
Our exponential backoff maxes out at 5s. So pretty soon it would be settling in a pattern of connecting every 5s forever.
I guess that's ok, but not great. I've always found our backoff strategy rather aggressive, but didn't want to change it without reason. Maybe now is a time to think this through a bit better.
I'm not really sure what to suggest though. Because you also want clients to start working again if the server had a bad config deployed and then reverted. Maybe just up the max timeout in the backoff to something a little longer for now?
iroh-relay/src/main.rs
Outdated
@@ -170,6 +171,49 @@ struct Config { | |||
metrics_bind_addr: Option<SocketAddr>, | |||
/// The capacity of the key cache. | |||
key_cache_capacity: Option<usize>, | |||
/// Access control |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we point out that this is only for the relay functionality? STUN, QAD and even the non-upgraded http services will keep working AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I don't think STUN can really be restricted, other than IP blocks. I don't know if there is anything for QAD that could be done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory you can restrict QAD because you have to accept a normal QUIC connection with TLS and all, with ALPN even in our case, before any QAD addresses are sent.
But I didn't mean we have to add this now, only to clarify that this doesn't protect those endpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd still be nice to expand on these docs. I know they don't end up anywhere visible because they're private items. But they're the only source we have for user-facing docs of the config format, so I've been treating them as user-facing docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some docs
iroh-relay/src/server.rs
Outdated
if node_id == a_key { | ||
Access::Deny | ||
} else { | ||
Access::Allow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're not testing this path. I guess it's not really needed from a coverage viewpoint. But it being written with this specific if condition rather than a blanket deny makes it stand out and seem like you only implemented half of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Yeah, I honestly don't know how to solve this in a better fashion, though shouldn't we stop trying to connect, unless this is our home server? |
We stop trying to connect when there has nothing been trying to send for 60s (unless for the home relay as you mention). Indeed a good point to call out. Because that means setting our ceiling much higher is also not useful. |
0c3289d
to
0dbbdb5
Compare
done |
iroh-relay/src/main.rs
Outdated
@@ -170,6 +171,49 @@ struct Config { | |||
metrics_bind_addr: Option<SocketAddr>, | |||
/// The capacity of the key cache. | |||
key_cache_capacity: Option<usize>, | |||
/// Access control |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd still be nice to expand on these docs. I know they don't end up anywhere visible because they're private items. But they're the only source we have for user-facing docs of the config format, so I've been treating them as user-facing docs.
" | ||
access.allowlist = [ | ||
\"{node_id}\", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of points out this is a rather whaky format. So I tried:
access = "everyone"
access.allowlist = [
"{node_id}",
]
And well, this happily fails. Which is something at least I guess.
I'm a bit torn. But I guess this is fine to use TOML syntax like this. So if you like this I'm not objecting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is what I expected 😅
/// Access is allowed. | ||
Allow, | ||
/// Access is denied. | ||
Deny, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, if folks want to log this they already have a callback where they can do so.
Description
Design RFC: https://www.notion.so/number-zero/Relay-Authentication-16f5df1306fb80ac9e31c1ccb04e026b?pvs=4
Breaking Changes
access
toiroh_relay::server::RelayConfig
Notes & open questions
health
frame in the relay protocol to indicate the authentcation issue. The frame was previously never sent in our code, but is now more explicitly interpreted to disconnect from the server.Should the callback beit is now asyncasync
? If this does more complex things like access a DB we probably want this to beasync
.Change checklist