Reduce item not found #439

Merged: 1 commit into KindlingProject:main from reduce_item_not_found, Feb 20, 2023

Conversation

@llhhbc (Contributor) commented Jan 29, 2023

Signed-off-by: Longhui Li <longhui.li@woqutech.com>

Description

During testing we found a large number of NOT_FOUND_EXTERNAL and NOT_FOUND_INTERNAL entries, which is not conducive to network analysis. Tracking this down, we found it is partly because DaemonSets are excluded, and partly because, when a node has multiple network interfaces, only one node IP is analyzed, so traffic on other interfaces such as cilium_host is not clearly identified.

Related Issue

None

Motivation and Context

How Has This Been Tested?

Yes

@dxsup (Member) commented Feb 6, 2023

Sorry for the late reply and thanks for your contribution.

The DaemonSet issue is clear, and your solution is correct and acceptable. But there are still two lines of code where the same issue exists that need to be corrected; see lines 156 and 162.

if !ok {
    // find the first pod whose network mode is not hostNetwork
    for _, info := range portContainerInfo {
        if !info.RefPodInfo.isHostNetwork && info.RefPodInfo.WorkloadKind != "daemonset" {
            return info, true
        }
    }
    return nil, false
} else {
    if !containerInfo.RefPodInfo.isHostNetwork && containerInfo.RefPodInfo.WorkloadKind != "daemonset" {
        return containerInfo, true
    }
    return nil, false
}

For the second issue, could you provide more details about the "multiple network cards" case to help us validate your solution? How is the node used? Are there any other fields for such nodes? What would you like the resulting data to look like? Maybe we should open a new issue to discuss this case. These questions are important for future reference.

@llhhbc (Contributor, Author) commented Feb 6, 2023

# ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.10.40.25/24 brd 10.10.40.255 scope global em1
       valid_lft forever preferred_lft forever
6: p5p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.10.66.25/24 brd 10.10.66.255 scope global p5p1
       valid_lft forever preferred_lft forever
681: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
685: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 171.0.1.42/32 scope link cilium_host
       valid_lft forever preferred_lft forever

For example, we use the em1 interface as the business network, the p5p1 interface as the cluster network, and the cilium_host interface as the internal network. Since Kubernetes only reports one IP per node (in our case the IP of p5p1), the traffic of the other interfaces is marked as NOT_FOUND, which is not conducive to traffic analysis.

@dxsup (Member) commented Feb 7, 2023

Could you please marshal the metadata node.Status.Addresses of this node and attach the result here?
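
(For reference, a minimal sketch of how such a dump could be produced with client-go, assuming an in-cluster configuration; the node name "node-a" is a placeholder, not anything from this PR.)

package main

import (
    "context"
    "encoding/json"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // Build a client from the in-cluster service account.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }
    // "node-a" is a placeholder; use the name of the node in question.
    node, err := clientset.CoreV1().Nodes().Get(context.TODO(), "node-a", metav1.GetOptions{})
    if err != nil {
        panic(err)
    }
    // Marshal only node.Status.Addresses, as requested above.
    out, _ := json.MarshalIndent(node.Status.Addresses, "", "  ")
    fmt.Println(string(out))
}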

@dxsup (Member) commented Feb 7, 2023

I am guessing at your motivation for adding the other interfaces; if I get anything wrong, please feel free to point it out.

First, let me explain what NOT_FOUND_EXTERNAL and NOT_FOUND_INTERNAL are and when they are assigned to the namespace field:

  • NOT_FOUND_EXTERNAL: the IP is not a pod IP in Kubernetes, and it is not an internal IP (one of the addresses the cluster uses) either.
  • NOT_FOUND_INTERNAL: the IP is not a pod IP in Kubernetes, but it is an internal IP.

So when you see NOT_FOUND_*, it means the IP being queried for pods is actually not a pod IP. That part is unrelated to node addresses.

*_EXTERNAL and *_INTERNAL are related to node addresses. Previously we only considered the Kubernetes "InternalIP" to be INTERNAL, while your modification makes all node IPs INTERNAL. Is this necessary? Could you explain in which cases the current data model is impractical, and why?
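
(For reference, a minimal sketch of the classification rules described above; the function and map names are hypothetical and not Kindling's actual code.)

package main

import "fmt"

// classify applies the rules above: a known pod IP is resolved normally,
// otherwise the namespace becomes NOT_FOUND_INTERNAL or NOT_FOUND_EXTERNAL
// depending on whether the IP is in the internal (node) IP set.
func classify(ip string, podIPs, internalIPs map[string]bool) string {
    if podIPs[ip] {
        return "POD" // placeholder for a resolved pod; no NOT_FOUND_* label
    }
    if internalIPs[ip] {
        return "NOT_FOUND_INTERNAL" // not a pod IP, but an internal IP
    }
    return "NOT_FOUND_EXTERNAL" // neither a pod IP nor an internal IP
}

func main() {
    internal := map[string]bool{"10.10.66.25": true} // only the reported InternalIP
    pods := map[string]bool{}                        // no pod owns these IPs
    fmt.Println(classify("171.0.1.42", pods, internal))  // cilium_host IP -> NOT_FOUND_EXTERNAL today
    fmt.Println(classify("10.10.66.25", pods, internal)) // node InternalIP -> NOT_FOUND_INTERNAL
}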

@llhhbc (Contributor, Author) commented Feb 7, 2023

For example, the kubelet health-check probe is an HTTP request whose source address is the IP of cilium_host, not the cluster-network IP. Traffic like this, which checks container availability, should also be counted as internal traffic. In addition, consider traffic that enters the cluster on node1, jumps to node2 through VXLAN, and then goes to the target pod on node2. This traffic is captured on node2 and its source IP is also cilium_host. Should it be classified as internal traffic as well?

In Kubernetes, node.Status.Addresses only contains the cluster IP.

@dxsup requested a review from NeJan2020 on February 7, 2023 07:50
@dxsup (Member) commented Feb 7, 2023

Thanks for your explanation. What is your opinion on this? @NeJan2020

@dxsup (Member) commented Feb 9, 2023

According to the definitions of NOT_FOUND_EXTERNAL and NOT_FOUND_INTERNAL, the interfaces that are not the InternalIP should also be considered internal, so your idea is acceptable. However, there is still another concern we have to consider.

In this implementation, an agent can only get the interfaces of its own node; it doesn't know the interfaces of other nodes. This results in a situation where the same IP could be given different statuses on different nodes. For example, if you have em1 10.10.40.25 on node A, it is considered "INTERNAL" on node A but "EXTERNAL" on node B, because the agent on node B only knows that node A has an InternalIP, p5p1 10.10.66.25. I think this would make the data inconsistent across nodes and confuse users more than before.

The Kubernetes API doesn't provide the interfaces of nodes, so to fix the issue above we would have to introduce another API to share interfaces between agents. That would increase complexity, and it is not what we want for Kindling right now.
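
(To illustrate the limitation: with only the Go standard library, an agent can enumerate the interface IPs of its own node, as sketched below, but it has no view of other nodes' interfaces without an additional API.)

package main

import (
    "fmt"
    "net"
)

func main() {
    ifaces, err := net.Interfaces()
    if err != nil {
        panic(err)
    }
    for _, iface := range ifaces {
        addrs, err := iface.Addrs()
        if err != nil {
            continue
        }
        for _, addr := range addrs {
            // Only the local node's addresses are visible here,
            // e.g. em1 10.10.40.25 or cilium_host 171.0.1.42 from the example above.
            if ipNet, ok := addr.(*net.IPNet); ok && ipNet.IP.To4() != nil {
                fmt.Printf("%s %s\n", iface.Name, ipNet.IP)
            }
        }
    }
}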

@NeJan2020 (Collaborator) commented

*_INTERNAL is only used to mark whether the src/dst is on any node in the cluster, so adding the IPs of all network interfaces to the NodeIp map looks good.

But the current method can only get the network interfaces of the node where the agent is located. Would this cause agents on other nodes to mark the IP of cilium_host as *_EXTERNAL?

@dxsup (Member) commented Feb 13, 2023

Are you still working on this PR? We are eager for your response.

@llhhbc (Contributor, Author) commented Feb 14, 2023

I'm thinking about this too; let me think it through first. I need to sort out in which scenarios this kind of traffic appears.

@dxsup (Member) commented Feb 14, 2023

I figure it will take a long time to reach a conclusion, but your DaemonSet changes are great. How about we merge that part first and leave this issue for further discussion? If you agree, please revert the changes to node_watch.go and I will merge this PR as soon as possible.

@llhhbc force-pushed the reduce_item_not_found branch 2 times, most recently from 6475e1f to fbbeef4, on February 17, 2023 01:50
@llhhbc (Contributor, Author) commented Feb 17, 2023

@dxsup The commit has been updated.

@llhhbc (Contributor, Author) commented Feb 17, 2023

Regarding whether it is necessary to share the network interface IPs of machine A with other machines, I thought about the traffic scenarios:

  1. When the VIP is on machine A and the accessed service is on machine B, the traffic is NATed on machine A, including SNAT and DNAT. After SNAT the source becomes machine A's cilium_host IP, which is used for the return path. (Of course, I am describing Cilium with VXLAN; with BGP it may be different, and I am not familiar with that.)

Other cross-host traffic should be similar, and it also depends on which network components are used.
In Cilium, several traffic types are distinguished:

  1. Host
  2. remote-node
  3. cluster-ip
  4. pod-ip
  5. world (IP not found)

https://docs.cilium.io/en/stable/gettingstarted/terminology/#reserved-labels

I don't quite know how Kindling handles traffic that is forwarded from machine A to machine B and then to the pod. Is it necessary to make a finer-grained division like Cilium does?

deploy/scripts/run_docker.sh: review comment (outdated, resolved)
Signed-off-by: longhui.li <longhui.li@woqutech.com>
@dxsup merged commit 1358546 into KindlingProject:main on Feb 20, 2023
@llhhbc (Contributor, Author) commented Feb 21, 2023

@dxsup What do you think about the question above?

@dxsup (Member) commented Feb 22, 2023

Kindling captures the syscalls that transmit messages via sockets to create topologies. These syscalls differ between the client and server sides, and we analyze them to obtain socket information.

  • On the client side, the source IP is the pod IP and the destination IP is the one before DNAT. Kindling uses this data to generate kindling_topology_request_total.
  • On the server side, the source IP is the one after SNAT and the destination IP is the one after DNAT. Kindling uses this data to generate kindling_entity_request_total. In this metric the source IP is not used unless the option store_external_src_ip is enabled; if it is, and the source is NOT_FOUND_EXTERNAL, a topology metric is generated as well (see the sketch after this comment). This is what you saw before.

So in your case:

  • On the client side, a metric kindling_topology_request_total with VIP A (pod IP?) -> service IP is generated.
  • On the server side, the syscalls with cilium_host -> VIP B (pod IP?) are captured and a metric kindling_entity_request_total with VIP B (pod IP?) is generated. If the option store_external_src_ip is enabled, one more metric kindling_topology_request_total with cilium_host -> VIP B (pod IP?) is generated.

Back to your question: a more fine-grained division would be good, of course. But unlike Cilium, which is a CNI plugin, Kindling doesn't inherently have the networking metadata, so it is hard to identify every type of traffic. One of the obstacles is the issue we discussed earlier.
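
(As an aside, a minimal sketch of the server-side rule described above; the function and parameter names are hypothetical, not Kindling's actual implementation.)

package main

import "fmt"

// serverSideMetrics returns the metric families generated for one server-side
// request under the rule above: the entity metric is always produced, and a
// topology metric is added only when store_external_src_ip is enabled and the
// source was classified as NOT_FOUND_EXTERNAL.
func serverSideMetrics(srcNamespace string, storeExternalSrcIP bool) []string {
    metrics := []string{"kindling_entity_request_total"}
    if storeExternalSrcIP && srcNamespace == "NOT_FOUND_EXTERNAL" {
        metrics = append(metrics, "kindling_topology_request_total")
    }
    return metrics
}

func main() {
    fmt.Println(serverSideMetrics("NOT_FOUND_EXTERNAL", true))  // both metrics
    fmt.Println(serverSideMetrics("NOT_FOUND_EXTERNAL", false)) // entity metric only
}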

@llhhbc (Contributor, Author) commented Feb 28, 2023

(figure: diagram of the cross-node traffic paths referenced below)

The network is indeed as complex as you describe, but tracing tools are meant to make complex things simple. In the figure above, although different network components have different implementations, the most complicated path is traffic 4, and most business traffic in production goes in that direction. For flow 4, SNAT and DNAT are performed on machine A, which only redirects the traffic and leaves few traces. Once the traffic arrives at machine B, analysis is clearer because everything there is local traffic (although it is not easy to tell whether the traffic was redirected from A). As for traffic 1, although its path is the longest, there is no NAT, so analysis is much simpler.

At present, during our use it is mostly traffic 4 that is unclear. When we used iptables in the past there were traces in conntrack, but with BPF it is very difficult to inspect. Of course, this is also because we do not yet understand Cilium's internals well enough. I don't know what you think about packet tracking: https://lpc.events/event/7/contributions/683/attachments/554/979/lpc20-pkt-mark-slides.pdf. The power of OpenTracing is that it ties traffic together with the business, and I understand Kindling has similar goals. If the data is just an isolated island, it is hard to extract valuable analysis from it; most analysis then relies on experience, and the bar is very high.

@dxsup

@llhhbc (Contributor, Author) commented Feb 28, 2023

Here is an idea of mine; I don't know whether it is feasible. For different CNI plugins, could Kindling expose some interfaces, the way kubelet exposes CNI, so that different packet-analysis strategies can be implemented per CNI? I would be happy to improve the packet collection and analysis methods for Cilium scenarios.

@llhhbc (Contributor, Author) commented Feb 28, 2023

I noticed that you are also based in Hangzhou; let's just communicate directly in Chinese.
