doc: update NAT address discovery design document #4659

Open
wants to merge 4 commits into
base: master

Conversation

jdslab

@jdslab jdslab commented Dec 2, 2024

Updated design document for NAT support as announced in #4630.

@jiceatscion
Contributor

This change is Reviewable

@tzaeschke
Contributor

doc/dev/design/NAT-address-discovery.rst line 145 at r1 (raw file):

--------
During the open-source contributors meeting on Nov. 19, 2024, it was agreed that the STUN/UDP/IP solution is preferred
due to its simplicity. However, arguments about message integrity/authentication have not yet been discussed at that time.

As I remember it, the advantage wasn't simplicity (IP/UDP/SCION/STUN would also be simple) but a general viewpoint that STUN should operate on the underlay because it solves an underlay problem.

Contributor

@tzaeschke tzaeschke left a comment

:lgtm:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jdslab)

Collaborator

@lukedirtwalker lukedirtwalker left a comment

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jdslab, @jiceatscion, @JordiSubira, and @oncilla)

Collaborator

@lukedirtwalker lukedirtwalker left a comment

:lgtm:

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jdslab, @jiceatscion, @JordiSubira, and @oncilla)

Contributor

@tzaeschke tzaeschke left a comment

Reviewed all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jiceatscion, @JordiSubira, and @oncilla)

Contributor

@tzaeschke tzaeschke left a comment

Reviewed 1 of 1 files at r2.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jiceatscion, @JordiSubira, and @oncilla)

@marcfrei marcfrei changed the title Update NAT Address Discovery Design Document doc: Update NAT Address Discovery Design Document Dec 11, 2024
@marcfrei marcfrei changed the title doc: Update NAT Address Discovery Design Document doc: update NAT Address Discovery Design Document Dec 11, 2024
@marcfrei marcfrei changed the title doc: update NAT Address Discovery Design Document doc: update NAT Address Discovery design document Dec 11, 2024
@marcfrei marcfrei changed the title doc: update NAT Address Discovery design document doc: update NAT address discovery design document Dec 11, 2024
Contributor

@marcfrei marcfrei left a comment

Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @jdslab, @jiceatscion, @JordiSubira, @lukedirtwalker, @oncilla, and @tzaeschke)


doc/dev/design/NAT-address-discovery.rst line 69 at r3 (raw file):

Also note that the NAT issue only exists for traffic between separate ASes. If both the client and the server are located within
the same AS, the server can simply send returning packets to the underlay source address of the client (which is visible to the server in this case).

Add a note that this is unfortunately only possible for servers that are listening on ports which don't get routed through a SHIM dispatcher.

@tzaeschke
Contributor

doc/dev/design/NAT-address-discovery.rst line 244 at r3 (raw file):

- AS local traffic:
  For communication within the same AS, the endhost should send returning packets to the underlay source address of the sender
  simply by reversing src/dst addresses of the underlay in addition to reversing the src/dst addresses in the SCION header.

I would probably not mention the reversing at all; it is obvious and hasn't changed in any way. Instead, I would emphasize the "underlay", e.g. "For communication... to the underlay source address instead of the SCION source address."

@tzaeschke
Contributor

doc/dev/design/NAT-address-discovery.rst line 69 at r3 (raw file):

Previously, marcfrei (Marc Frei) wrote…

Add a note that this is unfortunately only possible for servers that are listening on ports which don't get routed through a SHIM dispatcher.

By the time this goes into a release, there are hopefully no more dispatchers :-)

@tzaeschke
Contributor

doc/dev/design/NAT-address-discovery.rst line 237 at r3 (raw file):

STUN/UDP/IP, STUN/SCION/UDP/IP, and SCMP message extension.
It was agreed that a PR would be created for the STUN/UDP/IP variant.
The STUN specification used is the newer specification (RFC5389).

Insert space: "RFC 5389"?
And possibly a link?
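
For readers who don't have the RFC at hand, here is a minimal sketch of the RFC 5389 Binding exchange that the STUN/UDP/IP variant relies on. It only builds the request and decodes an IPv4 XOR-MAPPED-ADDRESS from a response. The function names are made up for illustration and are not taken from the design document or any SCION codebase; the constants are the ones defined in the RFC.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442           # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001            # message type: Binding Request
BINDING_SUCCESS = 0x0101            # message type: Binding Success Response
ATTR_XOR_MAPPED_ADDRESS = 0x0020    # attribute type: XOR-MAPPED-ADDRESS


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request without attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Header: type (2) | message length (2) | magic cookie (4) | transaction ID (12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def parse_xor_mapped_address(message):
    """Return (ip, port) from the IPv4 XOR-MAPPED-ADDRESS of a Binding Success Response."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Success Response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == ATTR_XOR_MAPPED_ADDRESS:
            _, family, xport = struct.unpack("!BBH", value[:4])
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            # Port is XORed with the upper 16 bits of the cookie,
            # the address with the full 32-bit cookie.
            port = xport ^ (MAGIC_COOKIE >> 16)
            (xaddr,) = struct.unpack("!I", value[4:8])
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attribute values are padded to a multiple of 4 bytes.
        offset += 4 + ((attr_len + 3) // 4) * 4
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")
```

The sketch leaves out retransmission, transaction ID matching, IPv6 and message integrity, all of which a real client would need.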

@tzaeschke
Contributor

doc/dev/design/NAT-address-discovery.rst line 69 at r3 (raw file):

Previously, tzaeschke (Tilmann) wrote…

By the time this goes into a release, there are hopefully no more dispatchers :-)

I think the NAT issue also exists intra-AS; it is just solved differently. I would rephrase it to something like: "For intra-AS traffic, the NAT problem can be solved by changing the implementation of servers such that return packets are always sent to the underlay address (which may be a NATed address) instead of the SCION source address."

Contributor

@tzaeschke tzaeschke left a comment

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jdslab, @jiceatscion, @JordiSubira, @lukedirtwalker, @marcfrei, and @oncilla)

Collaborator

@lukedirtwalker lukedirtwalker left a comment

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jdslab, @jiceatscion, @JordiSubira, @marcfrei, and @oncilla)

Contributor

@jiceatscion jiceatscion left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jdslab, @JordiSubira, @marcfrei, and @oncilla)


doc/dev/design/NAT-address-discovery.rst line 69 at r3 (raw file):

Previously, tzaeschke (Tilmann) wrote…

I think the NAT issue also exists intra-AS; it is just solved differently. I would rephrase it to something like: "For intra-AS traffic, the NAT problem can be solved by changing the implementation of servers such that return packets are always sent to the underlay address (which may be a NATed address) instead of the SCION source address."

I agree that the NAT issue does exist just as well intra-AS: Sunrise has SCION, but I doubt it does me much good with the non-routable address they give me. However, can't we assume that the exact same STUN tactics can be used intra-AS? That is: use the nearest BR as a stun server - on the inside? I know you'll tell me that if the client is behind a symmetric NAT, then it won't work. But that's fine, because if an AS is configured like that internally, then it just violates the SCION protocol entirely: the underlay address is supposed to be a valid endhost address; not some random thing. Am I missing something? I am not too fond of the idea that we can skirt the issue by reversing the underlay header; that assumes that in all cases the underlay header remains associated with the rest of the SCION packet until such time that a response is warranted. It also excludes any kind of NAT-ed SCION server, which sucks exactly as much as it does in the IP world.

Contributor

@jiceatscion jiceatscion left a comment

My objection to the intra-AS solution shouldn't be viewed as blocking. Intra-AS NAT support can be introduced later and need not be tied to the inter-AS solution.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jdslab, @JordiSubira, @marcfrei, and @oncilla)

Contributor

@jiceatscion jiceatscion left a comment

:lgtm:

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jdslab, @JordiSubira, @marcfrei, and @oncilla)

@tzaeschke
Contributor

I think the NAT issue also exists intra-AS; it is just solved differently. I would rephrase it to something like: "For intra-AS traffic, the NAT problem can be solved by changing the implementation of servers such that return packets are always sent to the underlay address (which may be a NATed address) instead of the SCION source address."

I agree that the NAT issue does exist just as well intra-AS: Sunrise has SCION, but I doubt it does me much good with the non-routable address they give me. However, can't we assume that the exact same STUN tactics can be used intra-AS? That is: use the nearest BR as a stun server - on the inside? I know you'll tell me that if the client is behind a symmetric NAT, then it won't work. But that's fine, because if an AS is configured like that internally, then it just violates the SCION protocol entirely: the underlay address is supposed to be a valid endhost address; not some random thing. Am I missing something? I am not too fond of the idea that we can skirt the issue by reversing the underlay header; that assumes that in all cases the underlay header remains associated with the rest of the SCION packet until such time that a response is warranted. It also excludes any kind of NAT-ed SCION server, which sucks exactly as much as it does in the IP world.

Symmetric NAT may be one issue, but another one is that an AS may be divided into multiple subnets separated by NATs. That means, different border routers may be seeing different NATed addresses. The problem is then that we need to figure out which border router is in the same subnet as the destination host, and there may not be any border router in that subnet.

The only way I see to avoid using the underlay is having endhosts implement STUN servers, so before I send anything to a destination host, I first STUN them to get my NATed address. Compared to that, it seems simpler to me to use the underlay, but we can discuss this if people disagree?

Contributor

@jiceatscion jiceatscion left a comment

Yes, the possibility of adding a stun server to every endpoint crossed my mind. Sounds brutal, but might not be that bad. It'd be basically the same code, just enabled in more places. Yet, I have a hard time imagining what internal network topology would make that necessary. Maybe it'd be worth showing an example? As I said, we don't really need to solve it right now. It could be a subsequent effort... even if it leads to generalizing the stun server to every endpoint, it would still reuse the stun server code from the inter-AS case, so not much would be wasted.

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jdslab, @JordiSubira, @marcfrei, and @oncilla)

@tzaeschke
Contributor

tzaeschke commented Dec 20, 2024

@jiceatscion
Okay, so:

  1. Dragging along the underlay address: In JPAN, I currently do the following: when I receive a packet, I attach the inverted path to it. If the packet comes from within the same AS, the path is [], so I store the underlay address in the firstHop field. I don't need an extra attribute. When sending an answer, I simply send to the firstHop underlay address; no extra logic is required (except for ensuring the correct port range).
  2. I would actually solve intra-AS and inter-AS together, if possible. It would be weird if one works and the other one doesn't. What do others think?
  3. An example of an AS with multiple subnets is the Geant AS: Geant Paris and Geant Frankfurt are in the same AS but in different subnets (thanks to @FR4NK-W for providing the example; and yes, there isn't really a NAT in between them as I wrote, which may have caused confusion).
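
To make item 1 concrete, a minimal sketch of the idea. This is not JPAN's actual code: the names `ReturnPath` and `return_path_for` are hypothetical, hops are reduced to plain strings, and passing the border router's underlay address in for the inter-AS case is an assumption made only to keep the example self-contained. The port range check mentioned above is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Underlay = Tuple[str, int]  # (IP address, UDP port) on the underlay


@dataclass
class ReturnPath:
    hops: List[str]       # reversed SCION path; empty for intra-AS traffic
    first_hop: Underlay   # underlay address the reply is sent to


def return_path_for(received_hops, underlay_src, border_router):
    """Build the return path that gets attached to a received packet."""
    reversed_hops = list(reversed(received_hops))
    if not reversed_hops:
        # Intra-AS: no SCION hops, so reply to the underlay source that was
        # observed on the wire (possibly a NATed address), not to the
        # address in the SCION header.
        return ReturnPath([], underlay_src)
    # Inter-AS (assumption): the reply goes out via a border router.
    return ReturnPath(reversed_hops, border_router)


# Intra-AS example: the reply is addressed to the observed underlay source.
reply = return_path_for([], ("10.0.0.7", 40123), ("10.0.0.1", 30042))
print(reply.first_hop)
```

The sender of the answer then only ever looks at `first_hop`, which is why no extra attribute is needed.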

Does that make sense?
Should we solve intra/inter-AS separately?

Collaborator

@lukedirtwalker lukedirtwalker left a comment

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jdslab, @JordiSubira, @marcfrei, and @oncilla)

Contributor

@JordiSubira JordiSubira left a comment

Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @jdslab, @marcfrei, and @oncilla)


doc/dev/design/NAT-address-discovery.rst line 69 at r3 (raw file):

Previously, jiceatscion wrote…

I agree that the NAT issue does exist just as well intra-AS: Sunrise has SCION, but I doubt it does me much good with the non-routable address they give me. However, can't we assume that the exact same STUN tactics can be used intra-AS? That is: use the nearest BR as a stun server - on the inside? I know you'll tell me that if the client is behind a symmetric NAT, then it won't work. But that's fine, because if an AS is configured like that internally, then it just violates the SCION protocol entirely: the underlay address is supposed to be a valid endhost address; not some random thing. Am I missing something? I am not too fond of the idea that we can skirt the issue by reversing the underlay header; that assumes that in all cases the underlay header remains associated with the rest of the SCION packet until such time that a response is warranted. It also excludes any kind of NAT-ed SCION server, which sucks exactly as much as it does in the IP world.

I would also add that, in the long term, we cannot make assumptions about how packets will be dispatched to end hosts. In particular, if we ever have kernel support, packets may get encapsulated again to a fixed port so that they can be routed internally by intra-AS middleware. Ideally, this solution should also work in that case.

@tzaeschke
Contributor

doc/dev/design/NAT-address-discovery.rst line 69 at r3 (raw file):

use the nearest BR as a stun server

The problem really is, which is the "nearest" STUN server? And in which subnet is it?
Copied from my comment just to keep things in order (I should have put my comment here in the first place):
An example of an AS with multiple subnets is the Geant AS: Geant Paris and Geant Frankfurt are in the
same AS but in different subnets (thanks to @FR4NK-W for providing the example; and yes, there isn't really a NAT in between them as I wrote, which may have caused confusion).

Contributor

@tzaeschke tzaeschke left a comment

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @FR4NK-W, @jdslab, and @oncilla)

@tzaeschke
Contributor

@lukedirtwalker @oncilla @jiceatscion @JordiSubira @marcfrei @FR4NK-W

Apologies for opening this up again but we came up with a new idea that we may want to discuss.

The idea is to let border routers rewrite the source address/port (replace it with the underlay src address/port) on packets on the way out. Previously this idea was dismissed because it would not work with SPAO, where the SCION SRC/DST addresses are protected fields. If we were to change SPAO to allow this kind of rewriting of SRC addresses and SRC ports, the NAT solution could become a whole lot simpler.

  • We would not need STUN. This would remove quite some complexity from clients (keeping a table of mappings, handling keepalives, etc.).
  • This would do away with any STUN round trips required to detect the external NATed address prior to connecting to a border router.
  • This would allow easy implementation of SCION-aware NATs that could simply rewrite the SRC address/port.
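
A rough sketch of what the rewrite would amount to on the border router's egress path. `ScionPacket` and `rewrite_source_on_egress` are made-up names and the packet is reduced to a handful of fields; this is not the router's real data model.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ScionPacket:
    src_host: str    # host address from the SCION header
    src_port: int    # L4 source port
    dst_host: str
    dst_port: int
    payload: bytes


def rewrite_source_on_egress(pkt: ScionPacket, underlay_src: Tuple[str, int]) -> ScionPacket:
    """Replace the SCION source address/port with the observed underlay source."""
    host, port = underlay_src
    if (pkt.src_host, pkt.src_port) == (host, port):
        return pkt  # no NAT in between, nothing to rewrite
    # Any authenticator computed over the original source fields (as SPAO
    # does today) would no longer verify after this step.
    return replace(pkt, src_host=host, src_port=port)


# A client behind a NAT claims 192.168.1.10:5000, but the border router
# sees the packet arriving from 198.51.100.4:61000.
pkt = ScionPacket("192.168.1.10", 5000, "203.0.113.9", 443, b"hello")
out = rewrite_source_on_egress(pkt, ("198.51.100.4", 61000))
print(out.src_host, out.src_port)
```

The comment inside the function is the catch: this only works if the authenticated fields are changed accordingly.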

SPAO may need some changes anyway, because it is unclear how it works with DRKeys, specifically how clients behind a NAT can get distinct DRKeys instead of sharing one key (unless sharing is okay).

We think it may be worth discussing this in more detail, maybe in the upcoming open-source meeting?

Comments?

@jiceatscion
Contributor

Apologies for opening this up again but we came up with a new idea that we may want to discuss.

No need. Better take the time to arrive at the best possible solution.

The idea is to let border routers rewrite the source address/port (replace it with the underlay src address/port) on packets on the way out.

That sounds both simple and disturbing. In fact, I have found it disturbing for a while that the effect of NAT leaks out of the underlay. Trying to think about it from an abstract networking stack perspective, I wonder if the layering violation isn't already implicit in the SCION forwarding code. We just assume that the link-layer address (since that's what the underlay address is from the SCION network perspective) is contained verbatim in the network address. I guess avoiding any address resolution protocol is a worthy goal, but maybe we should accept that there needs to be some form of resolution. In the router, the forwarding to a local host is named "ResolveLocalDst()", so I'm not the first one to think that some non-trivial "resolution" could be in order. For example, we could keep a map that we populate from packets received from local sources. The map would not need to store the common-case identity mapping. I can take my answer from next Tuesday's discussion.
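
To illustrate the map idea, a minimal sketch. The class and method names are hypothetical and do not reflect what "ResolveLocalDst()" does today; addresses are plain (host, port) tuples, and entry expiry is ignored.

```python
class LocalResolver:
    """Learns SCION-address -> underlay-address mappings from local traffic."""

    def __init__(self):
        # Only non-identity mappings are stored.
        self._overrides = {}

    def learn(self, scion_src, underlay_src):
        """Record what was observed on a packet received from a local source."""
        if scion_src == underlay_src:
            # Common case: the SCION address is the underlay address.
            # Drop any stale override instead of storing the identity.
            self._overrides.pop(scion_src, None)
        else:
            self._overrides[scion_src] = underlay_src

    def resolve(self, scion_dst):
        """Return the underlay address to forward to for a local destination."""
        return self._overrides.get(scion_dst, scion_dst)


resolver = LocalResolver()
# A host behind a NAT: SCION header says 10.0.0.5:4000, underlay says otherwise.
resolver.learn(("10.0.0.5", 4000), ("192.0.2.1", 61000))
print(resolver.resolve(("10.0.0.5", 4000)))   # learned mapping
print(resolver.resolve(("10.0.0.6", 4000)))   # falls back to the identity
```

A real version would need expiry and some protection against spoofed sources, since anyone able to send a packet with a given SCION source could otherwise redirect its return traffic.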

We think it may be worth discussing this in more detail, maybe in the upcoming open-source meeting?

Agree. Consider it added to the agenda.

@shitz
Contributor

shitz commented Jan 23, 2025

From our perspective, rewriting the SCION host address is not desirable. Having a stable address is hugely beneficial for multiple reasons:

  1. It allows for source authentication. As you already mentioned, SPAO and DRKey rely on stable source addresses. Introducing a different stable identifier for this purpose introduces all kinds of complexity.
  2. At Anapaya, we are currently building a system that enables SCION ASes to authorize endpoints in their AS to use the local SCION infrastructure. Furthermore, this system also does usage accounting to enable monetization of "SCION subscribers". We believe that such an infrastructure is essential for the widespread adoption and deployment of SCION. Commercial ISPs will only deploy SCION if they can monetize it, and today this is only possible at the level of SCION links. This system also relies on relatively stable SCION source addresses; otherwise it would need to keep much more state for accounting purposes. Specifically, we intend that clients resolve the underlay address with STUN and then register themselves at the router (or SCION network access point, SNAP) with a SCION address and the resolved underlay address.
  3. NAT has been a source of many issues for all kinds of end-to-end properties in the past. If we can avoid polluting the SCION address space with IP NATs, I think we should do so, even if it comes at the cost of some complexity on the underlay.

For those reasons, we are still advocating for the STUN-based approach.

N.B., once it is ready, we will also create a proposal for this SCION endhost API.

7 participants