Deduplicate Iptables Rules with Dynamic ASG's #101
Conversation
Resolved (outdated) review threads on:
- src/code.cloudfoundry.org/cni-wrapper-plugin/netrules/rule_converter.go
- src/code.cloudfoundry.org/cni-wrapper-plugin/netrules/rule_converter_test.go (3 threads)
```
@@ -59,6 +59,22 @@ func (c *RuleConverter) BulkConvert(ruleSpec []Rule, logChainName string, global
	return iptablesRules
}

func (c *RuleConverter) DeduplicateRules(iptablesRules []rules.IPTablesRule) []rules.IPTablesRule {
	keys := make(map[string]bool)
	deduped_rules := []rules.IPTablesRule{}
```
Aren't we creating a copy of the rule set here? What would be the memory impact? We have seen cases with millions of rules.
We create a map in which the rules are used as keys and the value is true or false. If, on the current iteration over the rules list, we encounter a rule that is not yet in the keys map, we add it to keys and set its value to true. This way, if a duplicate of that rule appears later, we skip adding it to dedupedRules, since the value for that rule is already set to true. As for the memory concern: this way of removing duplicates is the one I have seen used most in Go. Honestly, I do not know how this will impact performance. Open to suggestions here.
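For reference, a minimal, runnable sketch of the map-based de-duplication described above. It assumes `IPTablesRule` is a slice of iptables arguments that can be joined into a single string key (the real type in the PR is `rules.IPTablesRule`); the names, separator, and example rules here are illustrative, not necessarily what the PR branch uses.

```go
package main

import (
	"fmt"
	"strings"
)

// IPTablesRule stands in for rules.IPTablesRule, assumed here to be a slice
// of iptables arguments.
type IPTablesRule []string

// DeduplicateRules keeps only the first occurrence of each rule. A rule's
// joined arguments act as the map key; a seen key marks later duplicates
// to be skipped.
func DeduplicateRules(iptablesRules []IPTablesRule) []IPTablesRule {
	keys := make(map[string]bool)
	dedupedRules := []IPTablesRule{}

	for _, rule := range iptablesRules {
		key := strings.Join(rule, " ")
		if !keys[key] {
			keys[key] = true
			dedupedRules = append(dedupedRules, rule)
		}
	}
	return dedupedRules
}

func main() {
	in := []IPTablesRule{
		{"-d", "10.0.0.1/32", "-p", "tcp", "-j", "ACCEPT"},
		{"-d", "10.0.0.1/32", "-p", "tcp", "-j", "ACCEPT"}, // duplicate, dropped
		{"-d", "10.0.0.2/32", "-p", "tcp", "-j", "ACCEPT"},
	}
	fmt.Println(len(DeduplicateRules(in))) // prints 2
}
```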
In general, I am okay with this fix, but I am worried about performance implications.

❓ At the WG meeting someone mentioned that SAP has ~1 million iptables rules per cell. How long does this calculation take for all of those rules?

❓ Did you investigate a way to de-dupe the policies closer to the "root" in the internal-policy-server? I don't know if that would be any more performant, but it would be closer to the root issue.
@ameowlia, currently we have implemented a python script and a custom bosh-release as a workaround, which replaces … We also checked whether we could do it closer to the problem. The reason we did not do it there is that it works this way for Static ASGs, and there the binary that is used to apply them already does deduplication. That is why, in order to make the smallest change and not touch other logic, we decided to do the same, as close as possible to the actual call to the binary. Doing the deduplication at an earlier phase would change the semantics a bit for both cases, and would perhaps also come with a set of different side effects that we would need to analyze.
Hello @maxmoehl, the benchmark performs the de-duplication with an input of 10,000 rules; ~7,000 of them are duplicates.

Results from the benchmark with the initial format, with 1s benchtime:

Results with the optimised format, with 1s benchtime:

There is an improvement of ~0.4ms, though I am not sure how significant that is. So the deduplication does not significantly affect the overall execution time of https://github.com/klapkov/silk-release/blob/develop/src/code.cloudfoundry.org/cni-wrapper-plugin/netrules/netout_chain.go#L57, but it greatly improves overall performance by removing duplicate rules that are not needed.
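A benchmark along these lines could be used to reproduce the shape of that measurement (10,000 rules, ~7,000 of them duplicates). It assumes the `DeduplicateRules`/`IPTablesRule` sketch shown earlier lives in the same package; the file name, rule contents, and distinct-rule count are illustrative, not taken from the PR branch.

```go
// dedupe_bench_test.go (illustrative); run with:
//   go test -bench=DeduplicateRules -benchtime=1s
package main

import (
	"fmt"
	"testing"
)

// makeRules builds 10,000 rules with only 3,000 distinct destinations,
// so roughly 7,000 entries are duplicates.
func makeRules() []IPTablesRule {
	in := make([]IPTablesRule, 0, 10000)
	for i := 0; i < 10000; i++ {
		n := i % 3000
		dest := fmt.Sprintf("10.0.%d.%d/32", n/256, n%256)
		in = append(in, IPTablesRule{"-d", dest, "-p", "tcp", "-j", "ACCEPT"})
	}
	return in
}

func BenchmarkDeduplicateRules(b *testing.B) {
	in := makeRules()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		DeduplicateRules(in)
	}
}
```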
@ameowlia, I will put the numbers above in perspective, because it does not become evident from the example.

The proposal from Max above for reusing the string variable is rather a misunderstanding, as that is just an assignment; the new string is created as a result of the join operation. The other proposal, to pre-allocate the array with the maximum length of the resulting strings, in fact decreases performance (by 1-2%), because most of the time we are talking about a 10x+ reduction in the size of the rules. A solution that we considered, but consciously did not implement, because we did not need an improvement from 1 ms to 10 ns, is to implement Equals and Hashcode methods on the `IPTablesRule` type. In general, memory management was a problem in Java's infancy that has largely been resolved since Java 8 (almost 10 years ago); Go allocates local variables on the stack, and is thus much more efficient than any version of Java.

Again, I would like to put this into perspective: how the problem arises, and what numbers we have observed (knowing that deduplicating 6 million rules takes about 1 second on an old Mac core). More than 100k iptables rules on a Diego Cell results in a measurable slowdown when creating containers. Up to 500k it is bearable, with times of 20+ seconds to create a container (when Dynamic ASGs are used). Above this, all iptables operations become too slow, and we were not able to enable Dynamic ASGs on those landscapes.

The problem stems from the fact that with Static ASGs, duplicates were removed by the binary, so the iptables rules never grew beyond 20-30k in our case. With Dynamic ASGs, no deduplication was done by the binary, and on some landscapes, due to a special stakeholder usage pattern (using Azure DBs), we realized that the old binaries were removing a lot of rules. This resulted in an explosion of the number of rules, e.g. from 20-30-50k to 500-1000-2000k.

We made a fix last year that significantly reduced the time, and this allowed us to enable Dynamic ASGs on the landscapes with <500k rules, but not on the larger ones. On the affected diego-cells we have up to 100 containers, each of them having 10-20k rules. As soon as the total rules on the VM cross 500k, adding new ones becomes exponentially slower, taking tens of seconds.

So essentially, we do not care about the performance of deduplicating millions of rules. What we care about is that the total number of rules does not rise, so that iptables stays performant. With the current proposal we achieve this without performance overhead (<1ms per LRP).

Assuming that our use case is special (many duplicates, etc.), what would happen in another case, where there are indeed LRPs with millions of rules? They would never have worked in the first place, since as Static ASGs they would also have broken iptables. So the scenario we are talking about is a low number of rules with Static ASGs and an explosion with Dynamic ASGs, that is, many duplicates in one or many LRPs that eventually add up.
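For completeness, this is one possible reading of the pre-allocation variant discussed above: sizing the map and the output slice to the input length up front. It reuses the `IPTablesRule` type and `strings` import from the earlier sketch and is not the PR's actual code; since the de-duplicated output is typically 10x+ smaller than the input, most of the reserved capacity goes unused, which is consistent with the small regression reported.

```go
// DeduplicateRulesPrealloc is a pre-allocating variant of the sketch above:
// the map and output slice are sized for the worst case (no duplicates at
// all), even though in practice most rules are duplicates.
func DeduplicateRulesPrealloc(iptablesRules []IPTablesRule) []IPTablesRule {
	keys := make(map[string]bool, len(iptablesRules))
	dedupedRules := make([]IPTablesRule, 0, len(iptablesRules))

	for _, rule := range iptablesRules {
		key := strings.Join(rule, " ")
		if !keys[key] {
			keys[key] = true
			dedupedRules = append(dedupedRules, rule)
		}
	}
	return dedupedRules
}
```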
Thanks @ameowlia! |
- [DIEGO-RELEASE] Add locket client keepalive time and timeout to jobs: cloudfoundry/diego-release#722
- [DIEGO-RELEASE] Make max_containers prop configurable: cloudfoundry/diego-release#876
- [BBS] Add request metrics for BBS - still open: cloudfoundry/bbs#80
- [BBS] Use scheduling info instead of the whole desiredLRP: cloudfoundry/bbs#79
- [BBS] Add routing info endpoint: cloudfoundry/bbs#66
- [BBS] Remove cpu_weight limit: cloudfoundry/bbs#81
- [EXECUTOR] Improve error handling on process start - still open: cloudfoundry/executor#91
- [LOCKET] Add a keepalive timeout on the locket client: cloudfoundry/locket#12
- [GROOTFS] UsedVolumesSize and UsedStoreInBytes metrics: cloudfoundry/grootfs#155
- [ROUTE-EMITTER] Use routing info bbs endpoint when syncing: cloudfoundry/route-emitter#23
- [ROUTE-EMITTER] Use routing_info for desired_lrp's when there are missing actual_lrp's: cloudfoundry/route-emitter#26
- [SILK-RELEASE] Deduplicate Iptables Rules with Dynamic ASG's: cloudfoundry/silk-release#101
- [SILK-RELEASE] Make container_metadata_file_check_timeout on silk-shutdown configurable: cloudfoundry/silk-release#111
No description provided.