BUG: clock_gettime and gettimeofday VDSOs on x32 may use x86-64 syscall #107

cjwatson · 2018-02-15T09:59:04Z

On x32, the kernel VDSO that provides clock_gettime and gettimeofday sometimes falls back to the underlying syscall. Unfortunately, it falls back to the x86-64 variant of that syscall (https://bugs.debian.org/850047 is an example from a non-libseccomp context).

It would be possible for every libseccomp user that needs these syscalls to work around these by something like this (omitting error handling):

/* These must be the last syscalls added to the filter, as once we've
 * called seccomp_arch_add all syscalls after that point will be allowed
 * for both architectures.
 */
#if defined(__x86_64__) && defined(__ILP32__)
seccomp_arch_add(ctx, SCMP_ARCH_X86_64);
#endif
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(clock_gettime), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(gettimeofday), 0);

This seems cumbersome and easy to get wrong, though, and it seems like the kind of architecture-specific quirk that libseccomp is supposed to deal with for us. Would it be possible for libseccomp to handle this?

The text was updated successfully, but these errors were encountered:

cjwatson · 2018-02-15T10:56:44Z

This doesn't look too difficult to do (famous last words). I'll see if I can put together a PR.

pcmoore · 2018-02-16T18:16:05Z

Hi @cjwatson, thanks for the report. It's disappointing that the kernel isn't doing the Right Thing here with regards to VDSOs, but given the nature, and support, of the x32 ABI I can't say I'm very surprised.

My initial thought on how to resolve this would be to create a x32 specific "rule_add(...)" function, see "src/arch-x86.c:x86_rule_add(...)" as an example. The x32 version would need to handle this in much the same way we handle the multiplexed/direct call socket calls on x86: adding one syscall really ends up adding multiple syscall rules, one for the VDSO and one for the actual x86_64 syscall.

Don't forget to check the "strict" setting on the rule, if the strict flag is set you shouldn't add the x86_64 syscall rule.

We may also need to disable, or workaround, the x86_64 ABI/syscall check (BADARCH) at the top of the generated filter. Ungh. Hopefully not, but something to be aware of when you start testing your code.

cjwatson · 2018-02-18T01:27:10Z

Thanks for the initial thoughts; this is indeed basically the approach I've been pursuing. Of course my initial assessment was a bit optimistic.

The basic problem is that having an architecture-specific rule_add function add a rule for a different architecture is not really something the current code caters for at all; and, as you say, there's the bad-architecture check to deal with too, one way or another.

I experimented with adding the concept of "early rules" that are inserted into the compiled filter before the architecture check, but realised that that doesn't actually help. We do still want the compiled filter to have strict architecture checking; it's just that we need the implicitly-created x86_64 rules to be under a check for the x86_64 architecture. So I think we need proper struct db_filter and struct db_sys_list entries here.

I considered reintroducing a struct db_filter_col * to rule_add's signature and adding a rule for the proper architecture, as if the caller had called seccomp_arch_add followed by seccomp_rule_add. That has the wrong behaviour, though, because any rule added after that point would be added for both x32 and x86_64.

So I think that we either need:

an architecture-specific post-processing step called when loading the filter (but I'm not sure whether there are any situations in which the same filter structure in memory might be loaded more than once, and if it might therefore be bad for loading the filter to mutate it)
an additional const struct arch_def *arch in struct db_sys_list, which if set would override the one in the containing struct db_filter, and the code that generates the bad-architecture check would need to walk through each struct db_sys_list as well as just the top-level filters; this would let us generate single rules for different architectures without adding the architecture for any future rules that are added

Does either of these options sound good, or am I on the wrong track entirely?

I do at least have a regression test case for the desired behaviour now, so I can try out various options fairly easily.

cjwatson · 2018-02-18T01:29:10Z

I may possibly mean struct db_api_rule_list instead of struct db_sys_list in the above. Or maybe both. I haven't quite got my head around the internal API yet.

pcmoore · 2018-02-21T18:45:11Z

Thanks for the initial thoughts; this is indeed basically the approach I've been pursuing. Of course my initial assessment was a bit optimistic.

No worries, a little optimism can be a good thing :)

The basic problem is that having an architecture-specific rule_add function add a rule for a different architecture is not really something the current code caters for at all; and, as you say, there's the bad-architecture check to deal with too, one way or another.

Yes, I was afraid that would be a problem, but I hadn't looked at it too closely so I wasn't sure how problematic it would be in practice.

I experimented with adding the concept of "early rules" that are inserted into the compiled filter before the architecture check, but realised that that doesn't actually help. We do still want the compiled filter to have strict architecture checking; it's just that we need the implicitly-created x86_64 rules to be under a check for the x86_64 architecture. So I think we need proper struct db_filter and struct db_sys_list entries here.

Yes, I want to limit the number of special cases we need to add to a filter. There are things like the TSKIP processing/checks, but I consider those to be "hacks" and not something we should strive for in the code base.

I considered reintroducing a struct db_filter_col * to rule_add's signature and adding a rule for the proper architecture, as if the caller had called seccomp_arch_add followed by seccomp_rule_add. That has the wrong behaviour, though, because any rule added after that point would be added for both x32 and x86_64.

I will admit, this was my first thought when I started reading your reply, but as you point out, this would not be the correct behavior. However, I think you are on the right path, we just need to tweak the idea a bit, more on this below.

So I think that we either need:

an architecture-specific post-processing step called when loading the filter (but I'm not sure whether there are any situations in which the same filter structure in memory might be loaded more than once, and if it might therefore be bad for loading the filter to mutate it).

I don't believe we do anything to prevent users from loading the same filter multiple times, or loading it then manipulating it further and reloading it. Of course doing something like this would almost always be a Bad Idea, but we do allow it so I'd like to preserve it if at all possible.

We might be able to leverage some of the transaction code to manipulate the filter, load it, and then rollback the changes. Although I can't say I would be very supportive of such code, but I guess it would be a possibility.

an additional const struct arch_def *arch in struct db_sys_list, which if set would override the one in the containing struct db_filter, and the code that generates the bad-architecture check would need to walk through each struct db_sys_list as well as just the top-level filters; this would let us generate single rules for different architectures without adding the architecture for any future rules that are added.

You're getting close to what I'm thinking ...

What if we were to augment the db_filter struct so that we could add a "hidden" arch/ABI to the filter that would only allow the addition of rules from libseccomp internal functions? In other words, we would create a new ABI filter that would not be accessible via the libseccomp API (it wouldn't show up in the PFC output, only the BPF output), but we could add rules to it using the internal functions, e.g. an x32 specific rule_add() function. Of course if the user added the "hidden" arch/ABI to the filter then everything would be made visible, but until then it would remain hidden.

What do you think about this approach?

pcmoore · 2024-01-07T00:25:05Z

In an effort to get v2.6.0 out sooner than later, I'm going to suggest we push this out to v2.7.0; if you have any concerns or objections please drop a comment.

pcmoore added enhancement priority/medium labels Feb 16, 2018

pcmoore added bug and removed enhancement labels Feb 16, 2018

pcmoore changed the title ~~clock_gettime and gettimeofday VDSOs on x32 may use x86-64 syscall~~ BUG: clock_gettime and gettimeofday VDSOs on x32 may use x86-64 syscall Feb 16, 2018

pcmoore added this to the v2.5 milestone Feb 16, 2018

pcmoore removed this from the v2.5 milestone Aug 26, 2019

pcmoore added this to the v2.6.0 milestone Jul 21, 2020

pcmoore modified the milestones: v2.6.0, v2.7.0 Jan 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: clock_gettime and gettimeofday VDSOs on x32 may use x86-64 syscall #107

BUG: clock_gettime and gettimeofday VDSOs on x32 may use x86-64 syscall #107

cjwatson commented Feb 15, 2018

cjwatson commented Feb 15, 2018

pcmoore commented Feb 16, 2018

cjwatson commented Feb 18, 2018

cjwatson commented Feb 18, 2018

pcmoore commented Feb 21, 2018

pcmoore commented Jan 7, 2024

BUG: clock_gettime and gettimeofday VDSOs on x32 may use x86-64 syscall #107

BUG: clock_gettime and gettimeofday VDSOs on x32 may use x86-64 syscall #107

Comments

cjwatson commented Feb 15, 2018

cjwatson commented Feb 15, 2018

pcmoore commented Feb 16, 2018

cjwatson commented Feb 18, 2018

cjwatson commented Feb 18, 2018

pcmoore commented Feb 21, 2018

pcmoore commented Jan 7, 2024