feat(iroh-relay)!: implement authentication #3086

dignifiedquire · 2025-01-02T19:01:15Z

Description

Design RFC: https://www.notion.so/number-zero/Relay-Authentication-16f5df1306fb80ac9e31c1ccb04e026b?pvs=4

Breaking Changes

added: field access to iroh_relay::server::RelayConfig

Notes & open questions

This reuses the health frame in the relay protocol to signal an authentication failure. The frame was previously never sent in our code, but the client now explicitly interprets it as a signal to disconnect from the server.
~~Should the callback be async? If this does more complex things like access a DB we probably want this to be async.~~ it is now async
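A minimal sketch of the health-frame handling described in the notes above, assuming a simplified frame type. `Frame`, `Action` and `handle_frame` are hypothetical names for illustration and are not the actual iroh-relay types.

```rust
/// Hypothetical, simplified view of frames a client can receive from the relay.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Frame {
    /// Health frame; `problem` is set when the server reports an issue.
    Health { problem: Option<String> },
    /// Any other frame, irrelevant for this sketch.
    Other,
}

/// Hypothetical decision the client takes after reading a frame.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Continue,
    Disconnect { reason: String },
}

/// A health frame that carries a problem is treated as a reason to disconnect,
/// instead of being ignored.
pub fn handle_frame(frame: Frame) -> Action {
    match frame {
        Frame::Health { problem: Some(reason) } => Action::Disconnect { reason },
        Frame::Health { problem: None } | Frame::Other => Action::Continue,
    }
}
```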

Change checklist

Self-review.
Documentation updates following the style guide, if relevant.
Tests if relevant.
All breaking changes documented.

github-actions · 2025-01-02T19:03:16Z

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/3086/docs/iroh/

Last updated: 2025-01-08T19:43:22Z

github-actions · 2025-01-02T19:06:41Z

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: b63bd52

Arqu

Seems like I missed the initial RFC.

Generally very happy with this structure. Would like to ask for some upgrades.

Currently there's no way to manipulate the allow/deny list without restarting. Can we have some sort of resolver mechanism like we had for certs that we can pass in so it can easily be extended? For now the dummy resolver can just use the same fixed construction with Vec<NodeId>.

FWIW Happy to follow up with another PR to add the above.

dignifiedquire · 2025-01-03T11:50:09Z

Currently there's no way to manipulate the allow/deny list without restarting.

There is, just not in the main binary. The expectation is that this would need additional changes to the binary anyway, so when writing your own you would pass in a custom iroh_relay::server::AccessConfig::Restricted callback.
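A rough sketch of what such a custom callback could be backed by, so the allowlist can change without a restart. Everything here is an assumption for illustration: `NodeId` is a stand-in byte array, `SharedAllowlist` is a hypothetical helper, and the closure is synchronous for brevity while the callback in this PR is async. Only the `Allow`/`Deny` variants come from the diff.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the real node identifier type.
pub type NodeId = [u8; 32];

/// Mirrors the two variants shown in this PR's diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Allow,
    Deny,
}

/// Hypothetical allowlist that can be updated while the server is running.
#[derive(Clone, Default)]
pub struct SharedAllowlist {
    inner: Arc<RwLock<HashSet<NodeId>>>,
}

impl SharedAllowlist {
    /// Grants access to `node` from now on.
    pub fn allow(&self, node: NodeId) {
        self.inner.write().unwrap().insert(node);
    }

    /// Withdraws access from `node`.
    pub fn revoke(&self, node: &NodeId) {
        self.inner.write().unwrap().remove(node);
    }

    /// Builds a callback that consults the current state of the list on every call.
    pub fn callback(&self) -> impl Fn(NodeId) -> Access + Send + Sync + 'static {
        let inner = Arc::clone(&self.inner);
        move |node| {
            if inner.read().unwrap().contains(&node) {
                Access::Allow
            } else {
                Access::Deny
            }
        }
    }
}
```

Because the closure shares the set through an `Arc`, whatever updates the list (an internal API, a config reload) takes effect on the next connection attempt.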

matheus23 · 2025-01-03T16:50:22Z

Should the callback be async? If this does more complex things like access a DB we probably want this to be async.

Yes I think one would really quickly want this.

matheus23 · 2025-01-03T16:51:15Z

iroh-relay/src/server.rs

+    /// Access is allowed.
+    Allow,
+    /// Access is denied.
+    Deny,


Might be nice to store a Deny(String) here so we can answer with a helpful error message.

I don't think so, if folks want to log this they already have a callback where they can do so.

flub · 2025-01-03T16:54:01Z

Currently there's no way to manipulate the allow/deny list without restarting.

There is, just not in the main binary. The expectation is that this would need additional changes to the binary anyway, so when writing your own you would pass in a custom iroh_relay::server::AccessConfig::Restricted callback.

If you go to very fancy setups you'd push these kinds of things as config updates to live servers via an internal API without needing restarts.

I guess the simple "unixy" way to do this is re-read the config from disk on SIGHUP.

But yes, the current way the server does it by callback is flexible enough to cater for all that.

(haven't reviewed the PR yet really)

flub

Have you run this with the ActiveRelayActor? I guess currently it would receive the health message and just trace!("Ignoring {msg:?}"); it. After that a read error would occur because the connection is closed, and it would go into exponential backoff trying to re-connect.

Our exponential backoff maxes out at 5s. So pretty soon it would be settling in a pattern of connecting every 5s forever.

I guess that's ok, but not great. I've always found our backoff strategy rather aggressive, but didn't want to change it without reason. Maybe now is a time to think this through a bit better.

I'm not really sure what to suggest though. Because you also want clients to start working again if the server had a bad config deployed and then reverted. Maybe just up the max timeout in the backoff to something a little longer for now?
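To make the reconnect pattern described above concrete, here is a minimal sketch of a capped exponential backoff. The doubling factor and the initial delay are assumptions for illustration; only the 5s ceiling comes from this comment, and `backoff_delay` is a hypothetical helper, not the actual implementation.

```rust
use std::time::Duration;

/// Delay before reconnect attempt number `attempt` (0-based).
/// The delay doubles on every attempt and never exceeds `max`.
pub fn backoff_delay(attempt: u32, initial: Duration, max: Duration) -> Duration {
    // Saturate the multiplier instead of overflowing for large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    initial
        .checked_mul(factor)
        .map_or(max, |delay| delay.min(max))
}
```

With an assumed initial delay of 100ms, the delays grow as 100ms, 200ms, 400ms, 800ms, 1.6s, 3.2s and then stay at 5s for every later attempt, which is the "connecting every 5s forever" pattern.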


flub · 2025-01-07T09:59:49Z

iroh-relay/src/main.rs

@@ -170,6 +171,49 @@ struct Config {
    metrics_bind_addr: Option<SocketAddr>,
    /// The capacity of the key cache.
    key_cache_capacity: Option<usize>,
+    /// Access control


Can we point out that this is only for the relay functionality? STUN, QAD and even the non-upgraded http services will keep working AFAIK.

yeah, I don't think STUN can really be restricted, other than IP blocks. I don't know if there is anything for QAD that could be done

In theory you can restrict QAD because you have to accept a normal QUIC connection with TLS and all, with ALPN even in our case, before any QAD addresses are sent.

But I didn't mean we have to add this now, only to clarify that this doesn't protect those endpoints.

It'd still be nice to expand on these docs. I know they don't end up anywhere visible because they're private items. But they're the only source we have for user-facing docs of the config format, so I've been treating them as user-facing docs.

added some docs

flub · 2025-01-07T10:03:59Z

iroh-relay/src/server.rs

+                    if node_id == a_key {
+                        Access::Deny
+                    } else {
+                        Access::Allow


You're not testing this path. I guess it's not really needed from a coverage viewpoint. But it being written with this specific if condition rather than a blanket deny makes it stand out and seem like you only implemented half of the test.
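For illustration, a sketch of a check that exercises both branches of a condition like the one quoted above. `NodeId` is a stand-in byte array and `deny_only` is a hypothetical helper; the real test in the PR is structured differently.

```rust
/// Hypothetical stand-in for the real node identifier type.
pub type NodeId = [u8; 32];

/// Mirrors the two variants shown in this PR's diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Allow,
    Deny,
}

/// Builds a callback that denies exactly one node and allows everyone else,
/// matching the shape of the condition in the quoted diff.
pub fn deny_only(blocked: NodeId) -> impl Fn(NodeId) -> Access {
    move |node_id| {
        if node_id == blocked {
            Access::Deny
        } else {
            Access::Allow
        }
    }
}
```

Asserting on one denied key and one other key covers both the `Deny` and the `Allow` path.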

dignifiedquire · 2025-01-07T11:05:46Z

I'm not really sure what to suggest though. Because you also want clients to start working again if the server had a bad config deployed and then reverted. Maybe just up the max timeout in the backoff to something a little longer for now?

Yeah, I honestly don't know how to solve this in a better fashion, though shouldn't we stop trying to connect, unless this is our home server?

flub · 2025-01-07T12:26:34Z

I'm not really sure what to suggest though. Because you also want clients to start working again if the server had a bad config deployed and then reverted. Maybe just up the max timeout in the backoff to something a little longer for now?

Yeah, I honestly don't know how to solve this in a better fashion, though shouldn't we stop trying to connect, unless this is our home server?

We stop trying to connect when nothing has tried to send for 60s (except for the home relay, as you mention). Indeed a good point to call out, because it means setting our ceiling much higher is also not useful.
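The rule above can be sketched as a small predicate. The names are hypothetical and the actual actor logic is more involved; only the 60s limit and the home relay exception come from this comment.

```rust
use std::time::Duration;

/// Inactivity window after which reconnect attempts to a non-home relay stop.
pub const INACTIVITY_LIMIT: Duration = Duration::from_secs(60);

/// Returns true if the client should keep trying to reconnect to a relay.
/// The home relay is always retried; other relays only while something
/// has tried to send within the inactivity window.
pub fn should_keep_reconnecting(is_home_relay: bool, since_last_send: Duration) -> bool {
    is_home_relay || since_last_send < INACTIVITY_LIMIT
}
```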


dignifiedquire · 2025-01-08T17:28:50Z

Should the callback be async? If this does more complex things like access a DB we probably want this to be async.

Yes I think one would really quickly want this.

done

flub · 2025-01-08T17:57:40Z

iroh-relay/src/main.rs

@@ -170,6 +171,49 @@ struct Config {
    metrics_bind_addr: Option<SocketAddr>,
    /// The capacity of the key cache.
    key_cache_capacity: Option<usize>,
+    /// Access control


It'd still be nice to expand on these docs. I know they don't end up anywhere visible because they're private items. But they're the only source we have for user-facing docs of the config format, so I've been treating them as user-facing docs.

flub · 2025-01-08T18:06:12Z

iroh-relay/src/main.rs

+            "
+            access.allowlist = [
+              \"{node_id}\",
+            ]


This kind of points out that this is a rather wacky format. So I tried:

    access = "everyone"
    access.allowlist = [
      "{node_id}",
    ]

And well, this happily fails. Which is something at least I guess.

I'm a bit torn. But I guess this is fine to use TOML syntax like this. So if you like this I'm not objecting.

it is what I expected 😅

flub · 2025-01-08T18:08:05Z

iroh-relay/src/server.rs

+    /// Access is allowed.
+    Allow,
+    /// Access is denied.
+    Deny,


I don't think so, if folks want to log this they already have a callback where they can do so.

dignifiedquire changed the title ~~feat(iroh-relay): implement authentication~~ 1feat(iroh-relay): implement authentication Jan 2, 2025

dignifiedquire changed the title ~~1feat(iroh-relay): implement authentication~~ feat(iroh-relay)!: implement authentication Jan 2, 2025

Arqu reviewed Jan 3, 2025


Arqu approved these changes Jan 3, 2025


matheus23 reviewed Jan 3, 2025


dignifiedquire force-pushed the feat-relay-auth branch from d06232d to 0c3289d Compare January 3, 2025 17:27

dignifiedquire requested a review from flub January 3, 2025 17:27

flub reviewed Jan 7, 2025


dignifiedquire self-assigned this Jan 8, 2025

dignifiedquire added this to the v0.31.0 milestone Jan 8, 2025

dignifiedquire added the c-iroh-relay label Jan 8, 2025

dignifiedquire added 4 commits January 8, 2025 18:14

feat(iroh-relay): implement authentication

b7d0d8f

Design RFC: https://www.notion.so/number-zero/Relay-Authentication-16f5df1306fb80ac9e31c1ccb04e026b?pvs=4

fix default for access config

07f8d70

typing is hard

9a491cf

fixup

0dbbdb5

dignifiedquire force-pushed the feat-relay-auth branch from 0c3289d to 0dbbdb5 Compare January 8, 2025 17:14

apply CR

8f70daf

flub approved these changes Jan 8, 2025


CR

01a2b77

dignifiedquire enabled auto-merge January 8, 2025 19:42

dignifiedquire added this pull request to the merge queue Jan 8, 2025

Merged via the queue into main with commit 2c42eff Jan 8, 2025
25 of 26 checks passed

dignifiedquire deleted the feat-relay-auth branch January 8, 2025 20:29
