
Request API Change: Enable source routing for VIP address on nsm interface #119

Closed
szvincze opened this issue Nov 10, 2021 · 13 comments

@szvincze

Traffic with the VIP address as source should exit the POD via the nsmX interface instead of the default interface of the POD.

Currently the VIP address is put on the loopback interface and traffic is routed by ip rules:
ip rule add from 20.0.0.1/32 table 1
ip route add table 1 nexthop via 167.0.0.1
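
The effect of these rules can be checked without sending any traffic, e.g. with ip route get (the destination 8.8.8.8 below is just an arbitrary example address):

```shell
# Expect the lookup for VIP-sourced traffic to resolve via table 1.
ip route get 8.8.8.8 from 20.0.0.1
```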

@edwarnicke
Member

@szvincze thanks for raising this... I figured out what confused me here.

All the examples start out by mucking with adding entries to files to enable the use of names for routing tables... for some reason my brain went tilt and stopped there.

@szvincze Do you see any reason not to do this for every Dst/SrcIP ?

@szvincze
Author

@edwarnicke, thanks for your prompt response.

Since then we have created a proof of concept based on your proposal, which seems to be working fine. It means we can add the VIP as a secondary IP address to the NSM interface. The same VIP can be assigned to different NSM interfaces by updating the connection; however, in this case we cannot avoid source routing.

@szvincze
Author

NSMsourceRoutingT4.pdf <- Here I attach the presentation material we used during the NSM Community Meeting today. It explains why another interface is needed besides the primary network, and how source routing can be configured via the API.

@edwarnicke
Member

@szvincze Thank you for providing the slides! I think we actually have two things here:

  1. The need for Policy Based Routing (particularly so you can distinguish by protocol)
  2. The need for multiple Network Service Mesh interfaces being able to have conflicting IP addresses.

Policy Based Routing (PBR)

Currently, we have Connection.ConnectionContext.IPContext.{Src/Dst}Route, which basically provides destination based routing. If the dst_ip for a packet is in a prefix in those routes, it's routed out the nsm interface.

We had started out discussing source based routing, where if the src_ip for a packet is in the source route, we route it out the nsm interface.

Now you've brought up wanting to route based on (src_ip, proto)... which gets a bit more into PBR territory.

My initial thought would be something like this:

message PolicyRoute {
    string from = 1; /* source ip address. This must be an IP that NSM has placed on the nsm interface, or empty (in which case it applies to all IPs NSM puts on the interface) */
    uint32 proto = 2; /* ip protocol number */
    repeated Route routes = 3; /* list of destination based routes; if empty, this becomes a default route, but only for traffic matching both from and proto */
}

In order to preserve the principle of "don't break existing networking in the Pod" you can only provide a PolicyRoute that impacts traffic with a src_ip that matches the nsm interface.
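
To make the idea concrete, a PolicyRoute with from=20.0.0.1/32, proto=6 (TCP) and an empty routes list might render into Linux policy routing roughly like this (table number 1 and the next hop 167.0.0.1 are placeholders, not part of the proposal):

```shell
# Hypothetical rendering of PolicyRoute{from: "20.0.0.1/32", proto: 6, routes: []}.
ip rule add from 20.0.0.1/32 ipproto tcp table 1
# An empty routes list becomes a default route, scoped to the rule above.
ip route add default via 167.0.0.1 table 1
```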

Thoughts?

@edwarnicke
Member

@szvincze Bringing us back to

  1. The need for multiple Network Service Mesh interfaces being able to have conflicting IP addresses.

Generally speaking, we should be adding any IPs/routing prefixes for any existing NSM interfaces in a Pod to the exclude prefix list for any new requests for Network Services for that Pod. This is to avoid mutually ignorant Network Services colliding in terms of ip ranges.

Obviously, you have a use case in your slides that for legitimate reasons (shared VIP) does collide.

Currently NSM is very dogmatic about presuming mutual ignorance between Network Services. This is a good default assumption. We have slightly over tweaked it though. That presumption is "All Network Services should be treated as mutually ignorant". Looking at your use case it seems that it should instead be "All Network Services should be treated as mutually ignorant unless we have a credible representation that they are not."

Let's take an example that may be instructive. Say we have a Pod that is connecting to five Network Services: A,B,C,D,E.

  • A is ignorant of B,C,D,E
  • B is ignorant of A,E - but knows about C,D
  • C is ignorant of A,E - but knows about B,D
  • D is ignorant of A,E - but knows about B,C
  • E is ignorant of A,B,C,D

Clearly, B,C,D may not have ip collisions with A,E. But if B,C,D wish to take responsibility for having IP collisions among themselves... well... caveat emptor :)

So what we need is a clear way of letting the Connection returned as a response from B,C,D indicate that they are in fact mutually aware of each other and so collisions are OK. The simplest first pass idea I have there would be to provide some sort of field in the returned Connection that could contain some sort of uid that they would all have in common, but not with other things.

There are some details to think out here. Would you mind if we opened a separate issue for that discussion?

@szvincze
Author

@edwarnicke We had a discussion about your idea described above. Actually this is what we need.
So, besides the previously mentioned source based routing, policy based routing is also required for our use-cases.

One addition: to keep it generic, PolicyRoute should contain a port as well.
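
For illustration, the message sketched earlier could grow a port field along these lines (the field name dst_port and its numbering are assumptions here, not the final API):

```proto
message PolicyRoute {
    string from = 1;           /* source ip address placed on the nsm interface, or empty */
    uint32 proto = 2;          /* ip protocol number, e.g. 6 for TCP */
    uint32 dst_port = 3;       /* hypothetical: destination port to match */
    repeated Route routes = 4; /* destination based routes; empty means default route */
}
```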

@edwarnicke
Member

@szvincze Got it. Adding port as well shouldn't be hard.

@denis-tingaikin
Member

@edwarnicke , @szvincze Is this ready to go?

@edwarnicke
Member

@denis-tingaikin Yes

@sol-0 sol-0 moved this from Todo to In Progress in Release 1.2.0 Dec 21, 2021
@sol-0

sol-0 commented Dec 30, 2021

Hi @edwarnicke , @szvincze ,

I would like to ask a couple of questions to understand the use-case better.

VIPs

On the slide from this comment I can see a couple of VIPs.

  • are they virtual interface IPs that we create between the client and endpoints?
  • or if they are not virtual interface IPs, then how and by whom should these VIPs be created?

Setup

Before implementing policy routes in sdk-kernel, I wanted to create a local setup to (sort of) emulate the target use-case.

As per my understanding, a forwarder configures policy routes on a client using PolicyRoute (the policy will be configured from the service/endpoint side), and the client will then communicate with the corresponding VIP depending on the protocol.

I'm using exclude-prefixes-client example with 2 services/endpoints and a client (very similar to this, but with modified IPAM - to be able to have multiple endpoints with the same IPs on a client).

I've tried following cases:

  1. 2 endpoints, each having an interface with IP 172.16.1.96; a client having 2 interfaces with distinct names but the same IP, 172.16.1.97. In this case there's no way we can add routes on the client - despite having 2 interfaces, only 1 can be seen by the OS.
  2. 2 endpoints having distinct IPs, 172.16.1.97 and 172.16.1.98; a client having 2 interfaces (nsm-1 and nsm-2) with the same IP, 172.16.1.96. In this case we can see 2 routes configured on the client:
172.16.1.97 dev nsm-1
172.16.1.98 dev nsm-2

I assume that "1" is NOT what we want.
But is "2" appropriately emulating the use-case?

I've tried to configure routes on a client using "2" setup:

echo "1 1" >> /etc/iproute2/rt_tables 
echo "2 2" >> /etc/iproute2/rt_tables
ip rule add ipproto tcp dport 8081 table 1
ip rule add ipproto tcp dport 8080 table 2
ip route add default dev nsm-1 table 1
ip route add default dev nsm-2 table 2

This configuration doesn't seem to work. I can see with tcptraceroute that interface nsm-2 is used for both endpoints:

/ # tcptraceroute 172.16.1.97 8081
Selected device nsm-2, address 172.16.1.96, port 51203 for outgoing packets
Tracing the path to 172.16.1.97 on TCP port 8081 (tproxy), 30 hops max
 1  * * *
^C
/ # tcptraceroute 172.16.1.98 8080
Selected device nsm-2, address 172.16.1.96, port 59523 for outgoing packets
Tracing the path to 172.16.1.98 on TCP port 8080 (http-alt), 30 hops max
 1  172.16.1.98 [closed]  0.135 ms  0.105 ms  0.107 ms

@zolug
Contributor

zolug commented Jan 5, 2022

Hi @sol-0,

The problem should be caused by tcptraceroute:
If no source address is specified, it will try to find a feasible src address. For that it relies on a UDP socket: it connects the socket to the dst address and a non-zero dst port (53), then uses getsockname to fetch the src address.
https://github.com/mct/tcptraceroute/blob/3772409867b3c5591c50d69f0abacf780c3a555f/datalink.c#L281

Even if the src address were set, tcptraceroute could still miss the proper device, as it just loops through the list of interfaces looking for a match:
https://github.com/mct/tcptraceroute/blob/3772409867b3c5591c50d69f0abacf780c3a555f/datalink.c#L318

'ip route get' can be used to verify the route/dev to be picked by the stack:

vm-001 ~ # ip rule add ipproto tcp dport 5555 table 1
vm-001 ~ # ip rule add ipproto tcp dport 6666 table 2
vm-001 ~ # ip r a default dev eth0 table 1
vm-001 ~ # ip r a default dev eth1 table 2
vm-001 ~ # ip r get 1.1.1.1 ipproto tcp dport 5555
1.1.1.1 dev eth0 table 1 src 192.168.0.1 uid 0
cache
vm-001 ~ # ip r get 1.1.1.1 ipproto tcp dport 6666
1.1.1.1 dev eth1 table 2 src 192.168.1.1 uid 0
cache
vm-001 ~ # ip r get 1.1.1.1 from 10.0.0.1 ipproto tcp dport 5555
1.1.1.1 from 10.0.0.1 dev eth0 table 1 uid 0
cache
vm-001 ~ # ip r get 1.1.1.1 from 10.0.0.1 ipproto tcp dport 6666
1.1.1.1 from 10.0.0.1 dev eth1 table 2 uid 0
cache

@tedlean

tedlean commented Jan 6, 2022

Hi @sol-0 ,
I might try to verbalize a bit around the VIP address.

VIP addresses are not to be seen as virtual interface IPs between clients (NSCs) and endpoints (NSEs). Those, which you could term "NSM tunnel endpoint IP-addresses", are to be assigned as normal.

The VIP address is an alias/secondary IP address added to the NSM interface on the client. Actually it could just as well have been put on the loopback interface, but if I understand correctly NSM already supports adding more IP addresses to the client NSM interface, so we might as well utilize this.

The VIP address is a routable address exposed externally to the cluster making it possible to attract and load-balance traffic without NAT'ting the traffic.

For outgoing traffic the VIP address is to be used as the source IP. Notice that K8s networking will normally direct all traffic via the default route out the primary interface. In order to send traffic out through the NSM interface, the application should bind to the VIP address, and then Policy Based Routing should direct the traffic out the right NSM interface.

Therefore the ip rule should also include the VIP as source.

ip rule add from [VIP] ipproto tcp dport 8081 table 1
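
Putting the pieces together, a complete client-side sketch might look like this (the VIP 20.0.0.1, interface nsm-1, table 1, port 8081 and the destination used for verification are all placeholders):

```shell
# Hypothetical end-to-end client configuration.
ip addr add 20.0.0.1/32 dev nsm-1                        # VIP as secondary address
ip rule add from 20.0.0.1 ipproto tcp dport 8081 table 1
ip route add default dev nsm-1 table 1
# Verify which device the stack would pick:
ip route get 172.16.1.97 from 20.0.0.1 ipproto tcp dport 8081
```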

The VIP-addresses are provided by explicit configuration of the NSE. They might originate from a custom resource or from a DNS lookup. But how this is done is outside the scope of NSM.

BR
Leif

@edwarnicke edwarnicke assigned glazychev-art and unassigned sol-0 Jan 25, 2022
@denis-tingaikin denis-tingaikin moved this from Review in progress to Possibly Fixed in Release 1.2.0 Jan 30, 2022

7 participants