RFE: seccomp_rule_add became very slow since v2.4.0 #153
Hi @varqox. Yes, the syscall resolver functions could use some improvement; in fact, if you look at the code you will see several comments like the following:
If you want to look at improving that code, we could use the help!
As @pcmoore mentioned there are ample opportunities for speeding up the creation of a seccomp filter using libseccomp. Your research above outlined one of several areas that could use improvement. This has not been a concern for my users, so I have not focused on it. Regarding the runtime performance, I am currently working on using a binary tree for large filters like the one you provided in foo.c. Initial results with my in-house customers are looking promising, but I would love to get another set of eyes on the changes. See pull request #152
OK, I see that syscall resolving could be improved, but it is not the root cause of the issue. The root cause, as I see it, is the creation of a snapshot in db_col_transaction_start(). There arch_filter_rule_add() is called, which is slow because it re-resolves the syscall that was already resolved in the original rule. I see it as follows: we want to duplicate the whole set of current filters (aka struct db_filter) with all their rules, yet we construct all the filters from scratch instead of taking advantage of what we already have and simply copying them. We don't have to build from scratch; we have a fully built filter that we just want a copy of. Maybe I missed something, but it looks like a lot of improvement could be made to the db_col_transaction_start() function.
With all of the state in the internal libseccomp db collection, duplicating it is not a trivial task; regenerating the collection from the original rules is much easier (from a code perspective). Keeping track of the original rules also allows us to offer the ability to "remove" an existing rule (possible future feature). This isn't to say the transaction code couldn't be improved - it definitely can - but the current code is the way it is for a reason, primarily simplicity.
We are seeing timeouts from users because of this change. It really slowed things down by an order of magnitude. |
Another thought, we can probably change this so that we only duplicate the rules on a transaction start, not the entire tree, and only recreate the tree on a failed transaction. It isn't perfect, but that should get back a good chunk of the time. |
We need to do something because start times for containers and exec processes are seeing a huge performance regression, and it is causing people to pin to 2.3.x.
I'm not going to comment further on the "huge" nature of the problem; that point has been made multiple times already, and I consider it both relative and use-case dependent. However, I did want to remind everyone that libseccomp releases prior to v2.4 are affected by a potential vulnerability which has been made public (issue #139). For those who are concerned about this issue, it is currently marked for a v2.5 release.
You made a refactoring that has "huge" performance impacts in a minor release, and you are not being helpful by blowing this off as use-case dependent. Please take this seriously, as people are going to start noticing as distros update to 2.4.
@crosbymichael the change was not simply a refactoring; it was necessary to fix problems and support changes in the kernel (most notably the need to support both multiplexed and direct-call syscalls, e.g. socket syscalls on 32-bit x86). I am not blowing this off; I've continued to think about ways to solve this problem (see my comments above), and I've marked this as something for the next minor release. At this point it is hard for me not to perceive your comments as inflammatory; if that is not your intention, I suggest taking more care when commenting in the future. If you are unhappy with the progress on this issue, you are always welcome to help out by submitting a patch/PR for review.
Note to self and whoever else is considering trying to solve this ... I was recently reminded why we do what we do with respect to transactions (copy everything up front); we do this because we need to be able to roll back a transaction without failing. Why? Duplicating the tree without the rules is going to continue to be challenging due to the nature of the tree and the linking inside it, but we might be able to selectively choose when we need to create an internal transaction, skipping it for the many cases when it is not needed.
I spent a little more time looking at this and because of the way we destructively modify the decision tree during a rule addition I'm not sure we can avoid wrapping rule additions with a transaction. This means instead of finding ways to limit our use of transactions internally we need to find a way to speed them up, thankfully I think I may have found a solution: shadow trees. Currently we build a new tree each time we create a new transaction and discard it on success, which as we've seen can be prohibitively slow in some use cases. My thought is that instead of discarding the duplicate tree on commit, we attempt to add the rule we just added to the duplicate tree (making it a copy of the current filter) and keep it as a "shadow transaction" to speed up the next transaction snapshot. Some notes:
I had some time after dinner tonight so I made a quick pass at implementing the shadow transaction idea above. The code is still crude, and my tests (below) even cruder, but it looks like we are seeing some performance gains with this approach:
If we subtract out the test overhead we are looking at roughly a 20% increase in performance on this "test", but I expect the benefit for complex filter sets to be better (much better?) than this. @varqox and/or @crosbymichael, once I clean up the patches a bit and create a PR, would you be able to test this in your environment?
My example test case is already here:
But as soon as the PR is ready, I can test it in my environment.
Hi @pcmoore, thanks for making this PR.

Environment: Ubuntu 19.04 VM (2 CPU, 2G memory) on a MacBook Pro (15-inch, mid 2015)

Test case: prepare 20 containers:

    for i in $(seq 1 20)
    do
        docker run -d --name bb$i busybox sleep 3d
    done

Run test by firing
Results

libseccomp 2.3.3
libseccomp 2.4.1
Your PR build
Misc notes
That's great, thanks for the help @xinfengliu! |
Hi @pcmoore,

Results

foo.c:

    g++ foo.c -lseccomp -o foo -O3
    for ((i=0; i<10; ++i)); do time ./foo; done

libseccomp 2.3.3
Average:
libseccomp 2.4.2
Average:
PR #180
Average:

It seems like this PR gives some speedup over v2.3.3 in this synthetic test.

Building and loading (from seccomp_init() to seccomp_load()) filters in my sandbox, plus some sandbox initialization overhead:

libseccomp 2.3.3
Average:
libseccomp 2.4.2
Average:
PR #180
Average:

Although in the synthetic test the PR gives better times than v2.3.3, in the real world it is slightly slower (possibly because of more complicated rules and because seccomp_merge() is run to merge two big filters). However, it still gives roughly a tenfold speedup over v2.4.2.
Thanks for verifying the performance @varqox! As soon as @drakenclimber responds to the last round of comments (and I fix any remaining issues he may bring up), we'll get this merged. |
Ah, nevermind, I just noticed that @drakenclimber did mark the PR as approved. I'm going to go ahead and merge that now. |
I just merged PR #180 so I think we can mark this as closed, if anyone notices any remaining performance problems feel free to comment and/or re-open. Thanks everyone for your patience and help! |
@pcmoore do you plan on releasing soon with those changes? |
This is currently part of the libseccomp v2.5 release milestone, you can track our progress towards the v2.5 release using the link below: |
libseccomp 2.4 introduced a serious performance regression. Backport the fix. seccomp/libseccomp#153 seccomp/libseccomp#180
Hello,
the issue was introduced by commit ce3dda9, between v2.3.3 and v2.4.0. Consider the following example: foo.c.zip.
It adds a very large number of rules, and it runs around 100 times slower after the abovementioned commit.
foo.c execution time using v2.4.1: 0.448
foo.c execution time using v2.3.3: 0.077
I dug a little and found out that db_col_transaction_start() copies the already existing filter collection and uses arch_filter_rule_add() to duplicate the filter rules. But arch_filter_rule_add() calls arch_syscall_translate(), which calls arch_syscall_resolve_name(), which runs in O(number of syscalls on the given architecture). So adding one rule takes at least O(number of already added rules * number of syscalls on the used architectures) time, which IMO is really bad.
I counted the number of calls to arch_filter_rule_add() in the above example and it equals 201152. Before that commit, the number of calls to arch_filter_rule_add() was 896. And from what I understand from the code, back then db_col_transaction_start() also copied the already existing filter collection but did not use arch_filter_rule_add(). That gives the estimate: adding one rule takes around O(number of already added rules + number of syscalls on the given architecture), which is much better.
However, IMO the cost should not be related to the number of already added rules at all, because adding n rules then takes O(n^2). But that is a topic for a different discussion, and it should not be a problem for small filters or filters generated infrequently.
Why does this problem matter?
Some filters need the executed program's PID (e.g. to allow a thread to send signals only to itself). So if the restricted program needs to be executed a considerable number of times, this becomes a very visible overhead. I have a filter of around 300 rules, and the libseccomp overhead is around 0.16s per run of the sandboxed process (I run the process dozens of times).
Thank you in advance for your help!