Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

procstat(1) support for per-process compartment statistics #2080

Closed
wants to merge 3 commits into from

Conversation

rwatson
Copy link
Member

@rwatson rwatson commented Apr 8, 2024

This change introduces experimental support for procstat(1) to display information on c18n statistics for each processes. A number of parts come together to support this:

  • Two new ELF auxarg types are added, AT_C18N (pointer to per-process c18n exported state) and AT_C18NLEN (size of per-process c18n exported state). These default to NULL and 0 respectively, but may be updated by the process dynamically.
  • The kernel adds a new kern.proc.c18n sysctl, which allows retrieving c18n exported state from a target process. In general, this is a lot like kern.proc.auxv and the other sysctl MIB entries used by procstat(1) already.
  • rtld-elf has a new struct rtld_c18n_stats structure, defined in a new header rtld_c18n_public.h, which contains an initial set of statistics of interest, including number of compartments, number of stacks, number of initialised trampolines, and also information on memory commitment to trampolines.
  • rtld-elf implements a global instance of struct c18n_stats and, during startup updates the ELF auxiliary arguments to point at it. The c18n code number atomically updates various fields in the structure as sundry slow-path operations are performed. There is no support (currently) for fast-path operations such as domain switch counters.
  • libprocstat(3) implements procstat_getc18n(3) and procstat_freec18n(3) APIs. These are not yet documented, but grab the c18n exported state from a target process using the new sysctl, and provide it back to the caller.
  • procstat(1) implements an additional column in the cheri output, C18N, which lists a count of compartments in each process.
  • procstat(1) implements a new mode c18n that includes the full set of c18n stats fields for each process.

As a result, it’s now possible to easily look at the number of compartments in a process, as well as comment on some other behavioural aspects, when giving demos.

This isn’t yet a pull request, as various aspects (including overall approach) need discussing. We might for example, want the stacks block to me more overtly a variable-length array of {type, value} a la auxargs.

@rwatson rwatson requested review from dpgao, brooksdavis and bsdjhb April 8, 2024 00:11
Copy link
Member

@brooksdavis brooksdavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer one more indirection auxv, placing a kernel allocated struct {void *, size} in stringspace and having a single AT_C18N point to it. My argument is that I don't think we have anything that assumes auxv's storage is writable, but string space must be. I might also hang a pointer off of struct proc to avoid needing to read auxv in the sysctl.

I'm in the airport and running out of battery so haven't attempted a detailed review.

#define RTLD_C18N_STATS_H

struct rtld_c18n_stats {
long rcs_compartments;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long? ewww.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was using uint64_t, but need suitable atomics to be consistently available. :-(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwatson C built-in functions like atomic_fetch_add_explicit should do the job?

@dpgao
Copy link
Contributor

dpgao commented Apr 9, 2024

I have an old patch (404ff02) that records a similar set of statistics and exports them to a file supplied by the user. Using procstat is clearly a better interface than files.

The counters exposed by that patch are

struct c18n_stats {
	/*
	 * This field also acts as a synchronisation barrier where a non-zero
	 * value indicates that the other fields contain valid data.
	 */
	_Atomic(pid_t) pid;
	/*
	 * Compartments are created in a single thread, but the other counters
	 * can be modified concurrently.
	 */
	size_t num_compart;
	_Atomic(size_t) num_ustack;
	_Atomic(size_t) num_tramp;
	_Atomic(size_t) num_bytes_total;
};

In that patch, there is a counter that aggregates the total number of bytes taken up by c18n data structures. This includes space taken up by:

  • trampolines,
  • trampoline lookup tables,
  • policy data structures, and
  • stack lookup tables.

I suggest replacing both rcs_tramppages and rcs_tramptable with this counter.

@rwatson
Copy link
Member Author

rwatson commented Apr 9, 2024

I think I'd prefer one more indirection auxv, placing a kernel allocated struct {void *, size} in stringspace and having a single AT_C18N point to it. My argument is that I don't think we have anything that assumes auxv's storage is writable, but string space must be. I might also hang a pointer off of struct proc to avoid needing to read auxv in the sysctl.

I don’t have a strong opinion on this, and certainly adding a further level of indirection is trivial (and would also allow versioning if we wanted to add that explicitly). However:

  1. there is a comment in kern_proc.c:get_proc_vector() to detect races in retrieving auxv, and certainly the other routines (such as kern_proc.c:get_ps_strings()) assumes the possibility of updates.
  2. I think the AT_EXECPATH support does support updating the vector but haven’t tried confirming this at runtime:
                /* Point AT_EXECPATH auxv and aux_info to the binary path. */
                if (binpath == NULL) {
                    aux_info[AT_EXECPATH] = NULL;
                } else {
                    if (aux_info[AT_EXECPATH] == NULL) {
                        aux_info[AT_EXECPATH] = xmalloc(sizeof(Elf_Auxinfo));
                        aux_info[AT_EXECPATH]->a_type = AT_EXECPATH;
                    }
                    aux_info[AT_EXECPATH]->a_un.a_ptr = __DECONST(void *,
                      binpath);         

@brooksdavis
Copy link
Member

Hmm, indeed. (Direct exec of rtld sure adds a lot of corner cases). Much as I dislike it and having suggested something in auxv previously, I think we might want to go back to extending ps_strings. I'm worried about a future AT_ value from upstream causing us a painful flag day down the road. I need to think about this more.

Looking at get_proc_vector(PROC_AUX), it definitely depends on things we don't intend to guarantee and that I suspect could be made untrue today if env could be large enough.

@rwatson
Copy link
Member Author

rwatson commented Apr 9, 2024

I have an old patch (404ff02) that records a similar set of statistics and exports them to a file supplied by the user. Using procstat is clearly a better interface than files.

I don’t think the two are mutually exclusive -- the file output case is quite useful when running benchmarks, for example, to get a final snapshot of statistics when execution terminates. @qwattash might have opinions on the logistics around that.

In that patch, there is a counter that aggregates the total number of bytes taken up by c18n data structures. This includes space taken up by:

  • trampolines,
  • trampoline lookup tables,
  • policy data structures, and
  • stack lookup tables.

I suggest replacing both rcs_tramppages and rcs_tramptable with this counter.

Using _Atomic() and using a total byte count sound good to me, although there is also value in including page information as that tells us something about not just bytes allocated, but also fragmentation caused by the suballocation model. This might matter more with large pages for trampolines, but certainly the page measure I have in there currently doesn’t address that either.

@rwatson
Copy link
Member Author

rwatson commented Apr 9, 2024

Hmm, indeed. (Direct exec of rtld sure adds a lot of corner cases). Much as I dislike it and having suggested something in auxv previously, I think we might want to go back to extending ps_strings. I'm worried about a future AT_ value from upstream causing us a painful flag day down the road. I need to think about this more.

I wondered if we might want to upstream placeholder AT_ values upstream to preempt this.

@brooksdavis
Copy link
Member

Hmm, indeed. (Direct exec of rtld sure adds a lot of corner cases). Much as I dislike it and having suggested something in auxv previously, I think we might want to go back to extending ps_strings. I'm worried about a future AT_ value from upstream causing us a painful flag day down the road. I need to think about this more.

I wondered if we might want to upstream placeholder AT_ values upstream to preempt this.

Indeed, could do AT_RESERVED0... and document them as being for local use like we now do with gaps in the syscall table.

@brooksdavis
Copy link
Member

This is merged into dev and thus OBE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants