feat: implement collector and analyser for network namespace connectivity #1670
…vity checks if two network namespaces can talk to each other on udp and tcp. its usage is as follows:

```yaml
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: test
spec:
  hostCollectors:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        fromCIDR: 10.0.0.0/24
        toCIDR: 10.0.1.0/24
  hostAnalyzers:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        outcomes:
          - pass:
              message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 is working"
          - fail:
              message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 isn't working"
```

if this fails then you may need to enable `forwarding` with:

```bash
sysctl -w net.ipv4.ip_forward=1
```

if it still fails then you may need to configure firewalld to allow the traffic or simply disable it for sake of testing.
check both protocols even if one fails. this commit also introduces a timeout that can be set by the user.
allow users to dump the errors found during the analysis.
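The behaviour described in these commits (probe both protocols regardless of individual failures, honour a user-supplied timeout, keep the errors for later reporting) can be sketched in Go roughly as follows. This is a minimal illustration and not the code from this PR: `Result`, `checkTCP`, `checkUDP` and `checkBoth` are hypothetical names, and the sketch assumes the UDP peer replies to any datagram it receives.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// Result keeps one error slot per protocol so that a failure on one
// protocol never hides the outcome of the other. (Hypothetical type.)
type Result struct {
	TCPErr error
	UDPErr error
}

// checkTCP succeeds when a TCP connection can be opened within timeout.
func checkTCP(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return fmt.Errorf("tcp: %w", err)
	}
	return conn.Close()
}

// checkUDP sends one datagram and waits for any reply. UDP has no
// handshake, so a reply within the deadline is the only evidence of
// connectivity. Assumption: the peer answers every datagram.
func checkUDP(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("udp", addr, timeout)
	if err != nil {
		return fmt.Errorf("udp: %w", err)
	}
	defer conn.Close()
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return fmt.Errorf("udp: %w", err)
	}
	if _, err := conn.Write([]byte("ping")); err != nil {
		return fmt.Errorf("udp: %w", err)
	}
	buf := make([]byte, 16)
	if _, err := conn.Read(buf); err != nil {
		return fmt.Errorf("udp: %w", err)
	}
	return nil
}

// checkBoth always runs both probes and records each outcome separately.
func checkBoth(addr string, timeout time.Duration) Result {
	return Result{
		TCPErr: checkTCP(addr, timeout),
		UDPErr: checkUDP(addr, timeout),
	}
}

func main() {
	// The address and timeout are illustrative values only.
	res := checkBoth("127.0.0.1:8080", 500*time.Millisecond)
	fmt.Printf("tcp error: %v\nudp error: %v\n", res.TCPErr, res.UDPErr)
}
```

Keeping the two errors in separate fields, rather than returning on the first failure, is what lets an analyser report both of them afterwards.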
The collector works well. I was able to perform the tests as described in the PR.
There might be a resource leak somewhere. One of the virtual network interfaces is left behind after the collector completes. The screenshot below shows the support-bundle binary waiting for user input. At this point, collection, redaction and analysis have already taken place, so any resources created by collectors should have been deleted. Exiting support-bundle removes the network interface, suggesting that one of the servers is left running in some goroutine.
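To illustrate the kind of leak suspected here, the following Go sketch shows one common way to make sure a server goroutine has really exited before any cleanup runs. It is not taken from the PR: `startServer` and `shutdownLeavesNothingBehind` are hypothetical names, and a plain loopback TCP listener stands in for the servers the collector starts inside the namespace.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sync"
	"time"
)

// startServer accepts TCP connections until ctx is cancelled. The returned
// wait function blocks until both goroutines have returned, so the caller
// knows nothing is still holding the listener when cleanup begins.
func startServer(ctx context.Context, addr string) (net.Addr, func(), error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, nil, err
	}
	var wg sync.WaitGroup
	wg.Add(2)
	// Closing the listener on cancellation unblocks Accept below.
	go func() {
		defer wg.Done()
		<-ctx.Done()
		ln.Close()
	}()
	go func() {
		defer wg.Done()
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener closed (or failed): stop serving
			}
			conn.Close()
		}
	}()
	return ln.Addr(), wg.Wait, nil
}

// shutdownLeavesNothingBehind starts a server, stops it, and reports
// whether the port refuses connections afterwards.
func shutdownLeavesNothingBehind() (bool, error) {
	ctx, cancel := context.WithCancel(context.Background())
	addr, wait, err := startServer(ctx, "127.0.0.1:0")
	if err != nil {
		cancel()
		return false, err
	}
	cancel() // ask the server to stop
	wait()   // block until both goroutines have returned
	conn, err := net.DialTimeout("tcp", addr.String(), time.Second)
	if err == nil {
		conn.Close()
		return false, nil // something is still listening
	}
	return true, nil
}

func main() {
	stopped, err := shutdownLeavesNothingBehind()
	fmt.Println("server fully stopped:", stopped, "error:", err)
}
```

Waiting on the goroutines before removing interfaces or namespaces avoids the situation where a still-running server keeps a resource alive until the process exits.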
Interesting, I haven't seen this in my tests, nor in a quick attempt to reproduce it just now. Would you mind sharing the YAML you used so I can try to reproduce this?
I used the same spec you have in your description:
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: test
spec:
  hostCollectors:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        fromCIDR: 10.0.0.0/24
        toCIDR: 10.0.1.0/24
  hostAnalyzers:
    - networkNamespaceConnectivity:
        collectorName: check-network-connectivity
        outcomes:
          - pass:
              message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 is working"
          - fail:
              message: "Communication between 10.0.0.0/24 and 10.0.1.0/24 isn't working"
I ran
even though the interface pair is deleted every time we delete the namespace in my tests, we had better delete it explicitly before we delete the namespace. this comes out of a review comment where some people still seem to see the interface pair even after the namespace is deleted. i.e. better safe than sorry.
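The ordering this commit describes (remove the interface pair first, then the namespaces) could look roughly like the Go sketch below. It is an assumption-laden illustration: the real collector may use a netlink library rather than shelling out, and `runner`, `execRunner`, `cleanup` and the interface and namespace names are all hypothetical. The `ip link delete` and `ip netns delete` subcommands are standard iproute2 commands and need root to run.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// runner executes one command. It is injectable so the ordering can be
// exercised without root privileges.
type runner func(name string, args ...string) error

// execRunner runs the command for real (requires root for ip link/netns).
func execRunner(name string, args ...string) error {
	return exec.Command(name, args...).Run()
}

// cleanup deletes the veth interface first and the namespaces afterwards.
// Every step runs even if an earlier one fails; all errors are returned.
func cleanup(run runner, veth string, namespaces ...string) error {
	var errs []error
	if err := run("ip", "link", "delete", veth); err != nil {
		errs = append(errs, fmt.Errorf("delete interface %s: %w", veth, err))
	}
	for _, ns := range namespaces {
		if err := run("ip", "netns", "delete", ns); err != nil {
			errs = append(errs, fmt.Errorf("delete namespace %s: %w", ns, err))
		}
	}
	return errors.Join(errs...) // nil when no step failed
}

func main() {
	// Dry run: print the commands instead of executing them.
	dryRun := func(name string, args ...string) error {
		fmt.Println(name, strings.Join(args, " "))
		return nil
	}
	if err := cleanup(dryRun, "veth-example", "ns-from", "ns-to"); err != nil {
		fmt.Println("cleanup errors:", err)
	}
	_ = execRunner // swap in execRunner to run the commands for real
}
```

Deleting the interface explicitly is redundant when the kernel removes it together with the namespace, but it costs little and covers the case reported in the review.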
I ran this more than a thousand times this morning and I could not reproduce it. Tried on kernels For the sake of testing I have raised this commit; can you please check if you can still reproduce this? Please let me know what kernel you are using.
LGTM
After your last commit to fix deletion of the interface, things look good now.
Description, Motivation and Context
Important
This feature is only supported when running on Linux; all other platforms return an `unsupported platform` error.
Checks if two network namespaces can talk to each other on UDP and TCP. Its usage is as follows:
It is also available on a `HostPreflight` object:
The analyzer output can be templated as follows:
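As a rough sketch of how the failure-message variables listed under "Message templating" might expand, the Go example below renders a message with the standard `text/template` package. That the analyzer uses `text/template` is an assumption based on the `{{ .Name }}` syntax; `failureContext`, `render` and the sample values are hypothetical and only the three variable names come from this description.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// failureContext mirrors the three documented template variables.
// The struct itself is a hypothetical stand-in for the real data type.
type failureContext struct {
	ErrorMessage  string
	FromNamespace string
	ToNamespace   string
}

// render expands a failure message template against the given context.
func render(message string, data failureContext) (string, error) {
	tmpl, err := template.New("message").Parse(message)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Sample values are made up for illustration.
	out, err := render(
		"{{ .FromNamespace }} cannot reach {{ .ToNamespace }}: {{ .ErrorMessage }}",
		failureContext{
			ErrorMessage:  "connection timed out",
			FromNamespace: "ns-from",
			ToNamespace:   "ns-to",
		},
	)
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	fmt.Println(out)
}
```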
Tip
If this fails then you may need to enable IP forwarding with:
If it still fails then you may need to configure firewalld to allow the traffic or simply disable it for the sake of testing.
or
Notes
The `port` property is optional; if none is provided then `8080` is used.
Workflow
… (`from` and `to`).
… `to` namespace.
… `from` namespace.
… `from` namespace through the `veth` interface and into the `to` namespace through routing.
Optional Configurations
The collector supports the following additional optional configurations:
| Option | Default |
|---|---|
| `port` | `8080` |
| `timeout` | `5s` |
Message templating
The following templating variables are available when templating the message of a failure outcome:
`{{ .ErrorMessage }}`
`{{ .FromNamespace }}`: … `from` property
`{{ .ToNamespace }}`: … `to` property
Checklist
Does this PR introduce a breaking change?