Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probe: use netlink to talk to conntrack #3298

Merged
merged 18 commits into from
Nov 20, 2018
Merged

Conversation

bboreham
Copy link
Collaborator

@bboreham bboreham commented Aug 5, 2018

This is an update of #2130 with all necessary* changes upstreamed (typetypetype/conntrack#12, typetypetype/conntrack#13 and typetypetype/conntrack#14).

The objective is to reduce the CPU usage of the Scope probe, because it will no longer be running a conntrack binary which converts netlink into text then parsing that text.

The commits on this branch include the original #2130, rebased, then some changes to make it easier to diff that code against the upstream, then replacing with upstream, then tidying up and ensuring CI passes. Most of that should be squashed before merging, but I like the idea that we might preserve the authorship of @josephglanville who did much of the work.

As of the time of writing I have run some small tests which suggest it broadly works; have not done any big tests or checked performance.

I looked at https://github.com/vishvananda/netlink, but the conntrack code there is rudimentary: to get equal functionality I would basically have to copy most of the implementation from typetypetype/conntrack.

(*) I hope I didn't miss anything!

@bboreham
Copy link
Collaborator Author

bboreham commented Aug 5, 2018

CI is failing on the MacOS (Darwin) build - now we have four files depending on Linux-specific code, and I'm uncertain whether it is better to stub out all of their interfaces or introduce an insulating abstraction.

EDIT: I stubbed it out at a higher level

@josephglanville
Copy link
Contributor

Cool to see this being picked up. 👍

@bboreham bboreham force-pushed the conntrack-netlink-upstream branch 7 times, most recently from 0e54459 to 3153544 Compare August 6, 2018 16:11
@bboreham bboreham force-pushed the conntrack-netlink-upstream branch from 3153544 to 025b0b4 Compare August 27, 2018 21:47
@dholbach dholbach requested a review from 2opremio September 24, 2018 10:03
quit chan struct{}
}

// newConntracker creates and starts a new conntracker.
func newConntrackFlowWalker(useConntrack bool, procRoot string, bufferSize int, args ...string) flowWalker {
func newConntrackFlowWalker(useConntrack bool, procRoot string, bufferSize int, natOnly bool) flowWalker {

This comment was marked as abuse.

This comment was marked as abuse.

@2opremio
Copy link
Contributor

2opremio commented Sep 24, 2018

The objective is to reduce the CPU usage of the Scope probe, because it will no longer be running a conntrack binary which converts netlink into text then parsing that text.

What's the performance improvement? The code is much cleaner like this, but it would also be good to know the performance improvement.

@bboreham
Copy link
Collaborator Author

@2opremio the performance is only slightly better in my trials; I don't think we have a cluster that would really stress this code. But after adding typetypetype/conntrack#15 the whole probe uses 10% less CPU.

@bboreham
Copy link
Collaborator Author

Just noticed a panic on exit:

<probe> INFO: 2018/10/19 17:37:49.797634 === received SIGINT/SIGTERM ===
*** exiting
<probe> INFO: 2018/10/19 17:37:49.797780 Success Kubernetes reflector (pods)
<probe> INFO: 2018/10/19 17:37:49.797822 Success Kubernetes reflector (jobs)
<probe> INFO: 2018/10/19 17:37:49.797836 Success Kubernetes reflector (namespaces)
<probe> INFO: 2018/10/19 17:37:49.797796 Success Kubernetes reflector (daemonsets)
<probe> INFO: 2018/10/19 17:37:49.797852 Success Kubernetes reflector (storageclasses)
<probe> INFO: 2018/10/19 17:37:49.797901 Publish loop for scope.weave.svc.cluster.local exiting
<probe> INFO: 2018/10/19 17:37:49.797913 Success Kubernetes reflector (deployments)
<probe> INFO: 2018/10/19 17:37:49.797991 Success Kubernetes reflector (statefulsets)
<probe> INFO: 2018/10/19 17:37:49.797905 conntrack exiting
<probe> INFO: 2018/10/19 17:37:49.798032 Success Kubernetes reflector (persistentvolumeclaims)
<probe> INFO: 2018/10/19 17:37:49.797936 Success Kubernetes reflector (services)
<probe> INFO: 2018/10/19 17:37:49.798097 Success Kubernetes reflector (persistentvolumes)
<probe> INFO: 2018/10/19 17:37:49.797972 Success Kubernetes reflector (cronjobs)
<probe> INFO: 2018/10/19 17:37:49.798063 Success Kubernetes reflector (nodes)
panic: bad file descriptor
goroutine 127 [running]:
github.com/weaveworks/scope/vendor/github.com/typetypetype/conntrack.FollowSize.func2(0x4a, 0xc4211f0300)
/go/src/github.com/weaveworks/scope/vendor/github.com/typetypetype/conntrack/client.go:168 +0xba
created by github.com/weaveworks/scope/vendor/github.com/typetypetype/conntrack.FollowSize
/go/src/github.com/weaveworks/scope/vendor/github.com/typetypetype/conntrack/client.go:159 +0x106

@bboreham
Copy link
Collaborator Author

I have updated the conntrack package to bring in the perf improvement and a fix to the panic.

@bboreham
Copy link
Collaborator Author

bboreham commented Nov 1, 2018

@2opremio any chance you can give this one more glance over so I can merge?

@bboreham
Copy link
Collaborator Author

bboreham commented Nov 9, 2018

Now I would like to get typetypetype/conntrack#19 merged since it doesn't pick up pre-existing NAT mappings without it.

(it's merged now)

@bboreham bboreham force-pushed the conntrack-netlink-upstream branch 2 times, most recently from 635ee0c to 88d805e Compare November 13, 2018 15:01
- don't need another wrapper round `conntrack.Connections()`
- logPipe() was only for the command-line conntrack
- nobody closes the `event` chan now, so no need to pre-check for quit
When the probe first starts we should only be interested in active
connections, and if the loop re-starts it's probably because too many
connections are opening and closing to keep up with, so it's good to
drop any that are already closed then too.

Refactor the code so `handleFlow` is only called on events, and handle
the initial list of connections directly.
Split Reporter into Linux and non-Linux parts, and stubbed it out for
non-Linux targets.
From revision 1ea26629 to 9d9dd841, to bring in a couple of bug fixes
and some performance improvements.
@bboreham bboreham force-pushed the conntrack-netlink-upstream branch from 503b37c to c732fee Compare November 14, 2018 15:35
@bboreham bboreham merged commit 476ef27 into master Nov 20, 2018
@bboreham bboreham deleted the conntrack-netlink-upstream branch January 13, 2020 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants