RFE: seccomp_rule_add became very slow since v2.4.0 #153
Hi @varqox. Yes, the syscall resolver functions could use some improvement; in fact, if you look at the code you will see several comments like the following:
If you want to look at improving that code, we could use the help!
As @pcmoore mentioned there are ample opportunities for speeding up the creation of a seccomp filter using libseccomp. Your research above outlined one of several areas that could use improvement. This has not been a concern for my users, so I have not focused on it. Regarding the runtime performance, I am currently working on using a binary tree for large filters like the one you provided in foo.c. Initial results with my in-house customers are looking promising, but I would love to get another set of eyes on the changes. See pull request #152
OK, I see that syscall resolving could be improved, but it is not the root cause of the issue. The root cause, as I see it, is the creation of a snapshot in db_col_transaction_start(). There arch_filter_rule_add() is called, which is slow because it re-resolves the syscall that was already resolved in the original rule. I see it as follows: we want to duplicate the whole set of current filters (aka struct db_filter) with all their rules, yet we construct all the filters from scratch instead of taking advantage of what we already have and simply copying them. We don't have to build from scratch; we have a fully built filter that we just want a copy of. Maybe I missed something, but it looks like a lot of improvement could be made to the db_col_transaction_start() function.
With all of the state in the internal libseccomp db collection, duplicating it is not a trivial task; regenerating the collection from the original rules is much easier (from a code perspective). Keeping track of the original rules also allows us to offer the ability to "remove" an existing rule (possible future feature). This isn't to say the transaction code couldn't be improved - it definitely can - but the current code is the way it is for a reason, primarily simplicity.
We are seeing timeouts from users because of this change. It really slowed things down by an order of magnitude. |
Another thought, we can probably change this so that we only duplicate the rules on a transaction start, not the entire tree, and only recreate the tree on a failed transaction. It isn't perfect, but that should get back a good chunk of the time. |
We need to do something because start times for containers and exec processes are seeing a huge performance regression, and it is causing people to pin to 2.3.x.
I'm not going to comment further on the "huge" nature of the problem; that point has been made multiple times already, and I consider it both relative and use-case dependent. However, I did want to remind everyone that libseccomp releases prior to v2.4 are affected by a potential vulnerability which has been made public (issue #139). For those who are concerned about this issue, it is currently marked for a v2.5 release.
You made a refactoring that has "huge" performance impacts in a minor release, and you are not being helpful by blowing this off as use-case dependent. Please take this seriously, as people are going to start noticing as distros update to 2.4.
@crosbymichael the change was not simply a refactoring; it was necessary to fix problems and support changes in the kernel (most notably the need to support both multiplexed and direct-call syscalls, e.g. socket syscalls on 32-bit x86). I am not blowing this off; I've continued to think about ways to solve this problem (see my comments above), and I've marked this as something for the next minor release. At this point it is hard for me not to perceive your comments as inflammatory; if that is not your intention, I suggest taking more care when commenting in the future. If you are unhappy with the progress on this issue, you are always welcome to help out by submitting a patch/PR for review.
Note to self and whoever else is considering trying to solve this ... I was recently reminded why we do what we do with respect to transactions (copy everything up front); we do this because we need to be able to roll back a transaction without failing. Why? Duplicating the tree without the rules is going to continue to be challenging due to the nature of the tree and the linking inside it, but we might be able to selectively choose when we need to create an internal transaction, skipping it for the many cases when it is not needed.
I spent a little more time looking at this and because of the way we destructively modify the decision tree during a rule addition I'm not sure we can avoid wrapping rule additions with a transaction. This means instead of finding ways to limit our use of transactions internally we need to find a way to speed them up, thankfully I think I may have found a solution: shadow trees. Currently we build a new tree each time we create a new transaction and discard it on success, which as we've seen can be prohibitively slow in some use cases. My thought is that instead of discarding the duplicate tree on commit, we attempt to add the rule we just added to the duplicate tree (making it a copy of the current filter) and keep it as a "shadow transaction" to speed up the next transaction snapshot. Some notes:
I had some time after dinner tonight so I made a quick pass at implementing the shadow transaction idea above. The code is still crude, and my tests (below) even cruder, but it looks like we are seeing some performance gains with this approach:
If we subtract out the test overhead we are looking at roughly a 20% increase in performance on this "test", but I expect the benefit for complex filter sets to be better (much better?) than this. @varqox and/or @crosbymichael, once I clean up the patches a bit and create a PR, would you be able to test this in your environment?
My example test case is already here:
But as soon as the PR is ready, I can test it in my environment.
Hi @pcmoore, thanks for making this PR.

Environment: Ubuntu 19.04 VM (2 CPU, 2G memory) on a MacBook Pro (15-inch, mid 2015)

Test case: prepare 20 containers:

    for i in $(seq 1 20)
    do
        docker run -d --name bb$i busybox sleep 3d
    done

Run test by firing
Results

libseccomp 2.3.3
libseccomp 2.4.1
Your PR build
Misc notes
That's great, thanks for the help @xinfengliu! |
Hi @pcmoore,

Results

foo.c:

    g++ foo.c -lseccomp -o foo -O3
    for ((i=0; i<10; ++i)); do time ./foo; done

libseccomp 2.3.3
Average:
libseccomp 2.4.2
Average:
PR #180
Average:

It seems like this PR gives some speedup over v2.3.3 in this synthetic test.

Building and loading (from seccomp_init() to seccomp_load()) filters in my sandbox, plus some sandbox initialization overhead:

libseccomp 2.3.3
Average:
libseccomp 2.4.2
Average:
PR #180
Average:

Although in the synthetic test the PR gives better times than v2.3.3, in the real world it is slightly slower (possibly because of more complicated rules and because seccomp_merge() is run to merge two big filters). However, it still gives roughly a tenfold speedup over v2.4.2.
Thanks for verifying the performance @varqox! As soon as @drakenclimber responds to the last round of comments (and I fix any remaining issues he may bring up), we'll get this merged. |
Ah, nevermind, I just noticed that @drakenclimber did mark the PR as approved. I'm going to go ahead and merge that now. |
I just merged PR #180 so I think we can mark this as closed, if anyone notices any remaining performance problems feel free to comment and/or re-open. Thanks everyone for your patience and help! |
@pcmoore do you plan on releasing soon with those changes? |
This is currently part of the libseccomp v2.5 release milestone, you can track our progress towards the v2.5 release using the link below: |
libseccomp 2.4 introduced a serious performance regression. Backport the fix. seccomp/libseccomp#153 seccomp/libseccomp#180
Hello,
the issue was introduced by commit ce3dda9, between v2.3.3 and v2.4.0. Consider the following example: foo.c.zip.
It adds a very large number of rules, and it runs around 100 times slower after the abovementioned commit.
foo.c execution time using v2.4.1: 0.448
foo.c execution time using v2.3.3: 0.077
I dug a little and found out that db_col_transaction_start() copies the already existing filter collection and uses arch_filter_rule_add() to duplicate the filter rules. But arch_filter_rule_add() calls arch_syscall_translate(), which calls arch_syscall_resolve_name(), which runs in O(number of syscalls on the given architecture). So adding one rule takes at least O(number of already added rules * number of syscalls on the used architectures) time, which IMO is really bad.
I counted the number of calls to arch_filter_rule_add() in the above example and it equals 201152. Before that commit, the number of calls to arch_filter_rule_add() was 896. And from what I understand from the code, back then db_col_transaction_start() also copied the already existing filter collection but did not use arch_filter_rule_add(). That gives the estimate: adding one rule takes around O(number of already added rules + number of syscalls on the given architecture), which is much better.
However, IMO the cost should not be related to the number of already added rules at all, because adding n rules then takes O(n^2). But that is a topic for a different discussion, and it should not be a problem for small filters or filters generated infrequently.
Why does this problem matter?
Some filters need the executed program's PID (e.g. to allow a thread to send signals only to itself). So if the restricted program needs to be executed a considerable number of times, this becomes a very visible overhead. I have a filter of around 300 rules, and the libseccomp overhead is around 0.16s per run of the sandboxed process (I run the process dozens of times).
Thank you in advance for your help!