Reduce item not found #439

Merged: 1 commit into KindlingProject:main from reduce_item_not_found, Feb 20, 2023

Conversation

@llhhbc (Contributor) commented Jan 29, 2023

Signed-off-by: Longhui Li <longhui.li@woqutech.com>

Description

During testing we found a large number of NOT_FOUND_EXTERNAL and NOT_FOUND_INTERNAL entries, which is not conducive to network analysis. Tracking this down, we found it is partly because DaemonSets are excluded, and partly because, when a node has multiple network interfaces, only one node IP is analyzed, so traffic on other interfaces such as cilium_host is not clearly identified.

Related Issue

None

Motivation and Context

How Has This Been Tested?

Yes

@dxsup (Member) commented Feb 6, 2023

Sorry for the late reply and thanks for your contribution.

The DaemonSet issue is clear, and your solution is correct and acceptable. But there are still two lines of code where the same issue exists that need to be corrected; see lines 156 and 162.

if !ok {
    // find the first pod whose network mode is not hostNetwork
    for _, info := range portContainerInfo {
        if !info.RefPodInfo.isHostNetwork && info.RefPodInfo.WorkloadKind != "daemonset" {
            return info, true
        }
    }
    return nil, false
} else {
    if !containerInfo.RefPodInfo.isHostNetwork && containerInfo.RefPodInfo.WorkloadKind != "daemonset" {
        return containerInfo, true
    }
    return nil, false
}

For the second issue, could you provide more details about the "multiple network cards" case to help us validate your solution? How is the node used? Are there any other fields for such nodes? What would you like the resulting data to look like? Maybe we should open a new issue to discuss this case. These questions are important for future reference.

@llhhbc (Contributor, Author) commented Feb 6, 2023

# ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.10.40.25/24 brd 10.10.40.255 scope global em1
       valid_lft forever preferred_lft forever
6: p5p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 10.10.66.25/24 brd 10.10.66.255 scope global p5p1
       valid_lft forever preferred_lft forever
681: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
685: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 171.0.1.42/32 scope link cilium_host
       valid_lft forever preferred_lft forever

For example, we use the em1 interface as the business network, the p5p1 interface as the cluster network, and the cilium_host interface as the internal network. Since Kubernetes only reports one IP per node (in our case the IP of p5p1), the traffic of the other interfaces is marked as NOT_FOUND, which is not conducive to traffic analysis.

@dxsup (Member) commented Feb 7, 2023

Could you please marshal the metadata node.Status.Addresses of this node and attach the result here?
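
(For reference, a minimal sketch of how such a dump could be produced with client-go, assuming an in-cluster configuration; the node name "node-a" is a placeholder, not anything from this PR.)

package main

import (
    "context"
    "encoding/json"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // Build a client from the in-cluster service account.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }
    // "node-a" is a placeholder; use the name of the node in question.
    node, err := clientset.CoreV1().Nodes().Get(context.TODO(), "node-a", metav1.GetOptions{})
    if err != nil {
        panic(err)
    }
    // Marshal only node.Status.Addresses, as requested above.
    out, _ := json.MarshalIndent(node.Status.Addresses, "", "  ")
    fmt.Println(string(out))
}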

@dxsup (Member) commented Feb 7, 2023

I am guessing at your motivation for adding the other interfaces; if I get anything wrong, please feel free to point it out.

First, let me explain what NOT_FOUND_EXTERNAL and NOT_FOUND_INTERNAL are and when they are assigned to the namespace field:

  • NOT_FOUND_EXTERNAL: the IP is not a pod IP in Kubernetes, and it is not an internal IP (one of the addresses the cluster uses) either.
  • NOT_FOUND_INTERNAL: the IP is not a pod IP in Kubernetes, but it is an internal IP.

So when you see NOT_FOUND_*, it means the IP being queried for pods is actually not a pod IP. That part is unrelated to node addresses.

*_EXTERNAL and *_INTERNAL are related to node addresses. Previously we only considered the Kubernetes "InternalIP" to be INTERNAL, while your modification makes all node IPs INTERNAL. Is this necessary? Could you explain in which cases the current data model is impractical, and why?
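
(For reference, a minimal sketch of the classification rules described above; the function and map names are hypothetical and not Kindling's actual code.)

package main

import "fmt"

// classify applies the rules above: a known pod IP is resolved normally,
// otherwise the namespace becomes NOT_FOUND_INTERNAL or NOT_FOUND_EXTERNAL
// depending on whether the IP is in the internal (node) IP set.
func classify(ip string, podIPs, internalIPs map[string]bool) string {
    if podIPs[ip] {
        return "POD" // placeholder for a resolved pod; no NOT_FOUND_* label
    }
    if internalIPs[ip] {
        return "NOT_FOUND_INTERNAL" // not a pod IP, but an internal IP
    }
    return "NOT_FOUND_EXTERNAL" // neither a pod IP nor an internal IP
}

func main() {
    internal := map[string]bool{"10.10.66.25": true} // only the reported InternalIP
    pods := map[string]bool{}                        // no pod owns these IPs
    fmt.Println(classify("171.0.1.42", pods, internal))  // cilium_host IP -> NOT_FOUND_EXTERNAL today
    fmt.Println(classify("10.10.66.25", pods, internal)) // node InternalIP -> NOT_FOUND_INTERNAL
}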

@llhhbc (Contributor, Author) commented Feb 7, 2023

For example, the kubelet health-check probe is an HTTP request whose source address is the IP of cilium_host, not the cluster-network IP. Traffic like this, which checks container availability, should also be counted as internal traffic. In addition, consider traffic that enters the cluster on node1, jumps to node2 through VXLAN, and then goes to the target pod on node2. This traffic is captured on node2 and its source IP is also cilium_host. Should it be classified as internal traffic as well?

In Kubernetes, node.Status.Addresses only contains the cluster IP.

@dxsup requested a review from NeJan2020 on February 7, 2023 07:50
@dxsup (Member) commented Feb 7, 2023

Thanks for your explanation. What is your opinion on this? @NeJan2020

@dxsup (Member) commented Feb 9, 2023

According to the definitions of NOT_FOUND_EXTERNAL and NOT_FOUND_INTERNAL, the interfaces that are not the InternalIP should also be considered internal, so your idea is acceptable. However, there is still another concern we have to consider.

In this implementation, an agent can only get the interfaces of its own node; it doesn't know the interfaces of other nodes. This results in a situation where the same IP could be given different statuses on different nodes. For example, if you have em1 10.10.40.25 on node A, it is considered "INTERNAL" on node A but "EXTERNAL" on node B, because the agent on node B only knows that node A has an InternalIP, p5p1 10.10.66.25. I think this would make the data inconsistent across nodes and confuse users more than before.

The Kubernetes API doesn't provide the interfaces of nodes, so to fix the issue above we would have to introduce another API to share interfaces between agents. That would increase complexity, and it is not what we want for Kindling right now.
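
(To illustrate the limitation: with only the Go standard library, an agent can enumerate the interface IPs of its own node, as sketched below, but it has no view of other nodes' interfaces without an additional API.)

package main

import (
    "fmt"
    "net"
)

func main() {
    ifaces, err := net.Interfaces()
    if err != nil {
        panic(err)
    }
    for _, iface := range ifaces {
        addrs, err := iface.Addrs()
        if err != nil {
            continue
        }
        for _, addr := range addrs {
            // Only the local node's addresses are visible here,
            // e.g. em1 10.10.40.25 or cilium_host 171.0.1.42 from the example above.
            if ipNet, ok := addr.(*net.IPNet); ok && ipNet.IP.To4() != nil {
                fmt.Printf("%s %s\n", iface.Name, ipNet.IP)
            }
        }
    }
}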

@NeJan2020 (Collaborator) commented

*_INTERNAL is only used to mark whether the src/dst is on any node in the cluster, so adding the IPs of all network interfaces to the NodeIp map looks good.

But the current method can only get the network interfaces of the node where the agent is located. Would this cause agents on other nodes to mark the IP of cilium_host as *_EXTERNAL?

@dxsup (Member) commented Feb 13, 2023

Are you still working on this PR? We are eager for your response.

@llhhbc (Contributor, Author) commented Feb 14, 2023

I'm thinking about this too; let me think it through first. I need to sort out in which scenarios this kind of traffic appears.

@dxsup (Member) commented Feb 14, 2023

I figure it will take a long time to reach a conclusion, but your DaemonSet changes are great. How about we merge that part first and leave this issue for further discussion? If you agree, please revert the changes to node_watch.go and I will merge this PR as soon as possible.

@llhhbc force-pushed the reduce_item_not_found branch 2 times, most recently from 6475e1f to fbbeef4, on February 17, 2023 01:50
@llhhbc (Contributor, Author) commented Feb 17, 2023

@dxsup The commit has been updated.

@llhhbc (Contributor, Author) commented Feb 17, 2023

Regarding whether it is necessary to share the network interface IPs of machine A with other machines, I thought about the traffic scenarios:

  1. When the VIP is on machine A and the accessed service is on machine B, the traffic is NATed on machine A, including SNAT and DNAT. After SNAT the source becomes machine A's cilium_host IP, which is used for the return path. (Of course, I am describing Cilium with VXLAN; with BGP it may be different, and I am not familiar with that.)

Other cross-host traffic should be similar, and it also depends on which network components are used.
In Cilium, several traffic types are distinguished:

  1. Host
  2. remote-node
  3. cluster-ip
  4. pod-ip
  5. world (IP not found)

https://docs.cilium.io/en/stable/gettingstarted/terminology/#reserved-labels

I don't quite know how Kindling handles traffic that is forwarded from machine A to machine B and then to the pod. Is it necessary to make a finer-grained division like Cilium does?

deploy/scripts/run_docker.sh: review comment (outdated, resolved)
Signed-off-by: longhui.li <longhui.li@woqutech.com>
@dxsup merged commit 1358546 into KindlingProject:main on Feb 20, 2023
@llhhbc (Contributor, Author) commented Feb 21, 2023

@dxsup What do you think about the question above?

@dxsup (Member) commented Feb 22, 2023

Kindling captures the syscalls that transmit messages via sockets to create topologies. These syscalls differ between the client and server sides, and we analyze them to obtain socket information.

  • On the client side, the source IP is the pod IP and the destination IP is the one before DNAT. Kindling uses this data to generate kindling_topology_request_total.
  • On the server side, the source IP is the one after SNAT and the destination IP is the one after DNAT. Kindling uses this data to generate kindling_entity_request_total. In this metric the source IP is not used unless the option store_external_src_ip is enabled; if it is, and the source is NOT_FOUND_EXTERNAL, a topology metric is generated as well (see the sketch after this comment). This is what you saw before.

So in your case:

  • On the client side, a metric kindling_topology_request_total with VIP A (pod IP?) -> service IP is generated.
  • On the server side, the syscalls with cilium_host -> VIP B (pod IP?) are captured and a metric kindling_entity_request_total with VIP B (pod IP?) is generated. If the option store_external_src_ip is enabled, one more metric kindling_topology_request_total with cilium_host -> VIP B (pod IP?) is generated.

Back to your question: a more fine-grained division would be good, of course. But unlike Cilium, which is a CNI plugin, Kindling doesn't inherently have the networking metadata, so it is hard to identify every type of traffic. One of the obstacles is the issue we discussed earlier.
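
(As an aside, a minimal sketch of the server-side rule described above; the function and parameter names are hypothetical, not Kindling's actual implementation.)

package main

import "fmt"

// serverSideMetrics returns the metric families generated for one server-side
// request under the rule above: the entity metric is always produced, and a
// topology metric is added only when store_external_src_ip is enabled and the
// source was classified as NOT_FOUND_EXTERNAL.
func serverSideMetrics(srcNamespace string, storeExternalSrcIP bool) []string {
    metrics := []string{"kindling_entity_request_total"}
    if storeExternalSrcIP && srcNamespace == "NOT_FOUND_EXTERNAL" {
        metrics = append(metrics, "kindling_topology_request_total")
    }
    return metrics
}

func main() {
    fmt.Println(serverSideMetrics("NOT_FOUND_EXTERNAL", true))  // both metrics
    fmt.Println(serverSideMetrics("NOT_FOUND_EXTERNAL", false)) // entity metric only
}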

@llhhbc (Contributor, Author) commented Feb 28, 2023

(figure: diagram of the cross-node traffic paths referenced below)

The network is indeed as complex as you describe, but tracing tools are meant to make complex things simple. In the figure above, although different network components have different implementations, the most complicated path is traffic 4, and most business traffic in production goes in that direction. For flow 4, SNAT and DNAT are performed on machine A, which only redirects the traffic and leaves few traces. Once the traffic arrives at machine B, analysis is clearer because everything there is local traffic (although it is not easy to tell whether the traffic was redirected from A). As for traffic 1, although its path is the longest, there is no NAT, so analysis is much simpler.

At present, during our use it is mostly traffic 4 that is unclear. When we used iptables in the past there were traces in conntrack, but with BPF it is very difficult to inspect. Of course, this is also because we do not yet understand Cilium's internals well enough. I don't know what you think about packet tracking: https://lpc.events/event/7/contributions/683/attachments/554/979/lpc20-pkt-mark-slides.pdf. The power of OpenTracing is that it ties traffic together with the business, and I understand Kindling has similar goals. If the data is just an isolated island, it is hard to extract valuable analysis from it; most analysis then relies on experience, and the bar is very high.

@dxsup

@llhhbc (Contributor, Author) commented Feb 28, 2023

Here is an idea of mine; I don't know whether it is feasible. For different CNI plugins, could Kindling expose some interfaces, the way kubelet exposes CNI, so that different packet-analysis strategies can be implemented per CNI? I would be happy to improve the packet collection and analysis methods for Cilium scenarios.

@llhhbc (Contributor, Author) commented Feb 28, 2023

I noticed that you are also based in Hangzhou; let's just communicate directly in Chinese.
