sunshowers (Contributor) commented Oct 14, 2025

This is an incomplete attempt to use a single-permit semaphore to prevent kstat reads and XDE ioctls from happening simultaneously, with the goal of working around oxidecomputer/opte#758 (see #9211).
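
For readers unfamiliar with the pattern, here is a minimal sketch of a one-permit semaphore serializing two kinds of operations. It is not the code in this PR: the `Semaphore`, `Permit`, and `max_concurrency` names are hypothetical, the semaphore is built from std primitives rather than whatever async semaphore sled agent would use, and the guarded section is only a stand-in for a kstat read or an XDE ioctl.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore built on std primitives (hypothetical helper).
pub struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

/// RAII guard: the permit is returned when this value is dropped.
pub struct Permit<'a> {
    sem: &'a Semaphore,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    pub fn acquire(&self) -> Permit<'_> {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
        Permit { sem: self }
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        // Return the permit, then wake one waiter (if any).
        *self.sem.permits.lock().unwrap() += 1;
        self.sem.available.notify_one();
    }
}

/// Run `workers` threads that each enter the guarded section `iters` times,
/// and return the highest number of threads ever seen inside it at once.
pub fn max_concurrency(workers: usize, iters: usize) -> usize {
    let sem = Arc::new(Semaphore::new(1)); // one permit: mutual exclusion
    let inside = Arc::new(AtomicUsize::new(0));
    let max_seen = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (sem, inside, max_seen) =
                (sem.clone(), inside.clone(), max_seen.clone());
            thread::spawn(move || {
                for _ in 0..iters {
                    let _permit = sem.acquire();
                    // Stand-in for a kstat read or an XDE ioctl.
                    let now = inside.fetch_add(1, Ordering::SeqCst) + 1;
                    max_seen.fetch_max(now, Ordering::SeqCst);
                    thread::yield_now();
                    inside.fetch_sub(1, Ordering::SeqCst);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    max_seen.load(Ordering::SeqCst)
}
```

The sketch only helps if every call site actually goes through `acquire`, which is the coverage problem listed under "What's missing" below.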

What's missing:

  • Tests need to be updated.
  • There's no great way to ensure that every ioctl is covered; confirming that would need manual review.
  • Dropping an illumos_utils::opte::PortInner (a Port is an Arc around this) calls delete_xde, which would need to be protected by the same semaphore.
  • Validating that this actually works, doesn't introduce new issues, etc.

Unfortunately this is becoming complicated and fragile enough that I've become significantly less optimistic about using this path as a workaround, compared to the alternative where:

  • support monitors this as part of the r17 upgrade, and
  • we fix it in the illumos kernel for r18

In particular, the requirement to pass the semaphore into the PortInner and lock it inside the generally-unobservable Drop impl has me quite concerned.
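
To make the concern concrete, here is a rough sketch of what locking inside a `Drop` impl looks like. Everything in it is an assumption for illustration: this `PortInner` is a toy model and not the real `illumos_utils::opte::PortInner`, a std `Mutex<()>` stands in for the one-permit semaphore, and pushing a name onto a shared list stands in for `delete_xde`.

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for the one-permit semaphore: a std Mutex (assumption).
type XdeLock = Arc<Mutex<()>>;

/// Toy model of a port's inner state (hypothetical, not the real type).
pub struct PortInner {
    name: String,
    xde_lock: XdeLock,
    /// Records deleted port names so the effect of Drop is observable here.
    deleted: Arc<Mutex<Vec<String>>>,
}

impl Drop for PortInner {
    fn drop(&mut self) {
        // Blocks until nothing else holds the lock. Callers never see this
        // wait, because it happens implicitly when the last Arc goes away.
        let _guard = self.xde_lock.lock().unwrap_or_else(|e| e.into_inner());
        // Stand-in for the delete_xde call.
        self.deleted
            .lock()
            .unwrap_or_else(|e| e.into_inner())
            .push(self.name.clone());
    }
}

/// Create two ports sharing one lock, drop them, and return deletion order.
pub fn drop_two_ports() -> Vec<String> {
    let xde_lock: XdeLock = Arc::new(Mutex::new(()));
    let deleted = Arc::new(Mutex::new(Vec::new()));

    let a = Arc::new(PortInner {
        name: "opte0".to_string(),
        xde_lock: xde_lock.clone(),
        deleted: deleted.clone(),
    });
    let b = Arc::new(PortInner {
        name: "opte1".to_string(),
        xde_lock: xde_lock.clone(),
        deleted: deleted.clone(),
    });

    let a2 = a.clone();
    drop(a); // not the last reference, so Drop does not run yet
    assert!(deleted.lock().unwrap().is_empty());

    drop(a2); // last reference to opte0: Drop takes the lock and deletes
    drop(b);

    let order = deleted.lock().unwrap().clone();
    order
}
```

Note that `Drop::drop` is a synchronous function, so an async semaphore cannot be awaited there; the sketch sidesteps that by using a blocking lock, which may not be acceptable in the real code.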

Putting this up for folks to look at in the meantime.

Created using spr 1.3.6-beta.1
sunshowers changed the title from "[DNM] attempt to use a semaphore to work around opte#758" to "[DNM] attempt to use a semaphore to work around opte 758" on Oct 14, 2025

jclulow (Collaborator) commented Oct 14, 2025

FWIW, it might be worth using DTrace to collect user stacks from ioctl system calls made by sled agent during some test runs.
