@zeeshanlakhani commented Nov 30, 2025

This PR also addresses permission models, object deletion, and error handling questions related to reserved addresses presented in @askfongjojo's testing Google Doc (default IP Pools are covered in a follow-up, stacked PR).

In thinking through the Groups API, permission scopes, and flexibility, @rcgoodfellow mentioned this consideration:

Do we need an explicit notion of a group object at all? Or can
instances simply allocate/deallocate group IPs from pools, and there is
no explicit management of group objects.

With Fleet admins having access control to create pools and link silos to a pool, we arrived at the idea of replacing the current explicit multicast group CRUD with an implicit lifecycle, where groups are created upon the first member join and deleted when the last member leaves.

Note: Most of the PR's changes are test-related due to moving away from the explicit multicast group(s) lifecycle.

Auth Model:

  • Discovery (fleet-scoped):
    • Read/list groups and list members: any authenticated user in the same fleet.
  • Membership (project-scoped):
    • Join/leave requires Instance::Modify on the specific instance.
  • Creation control:
    • Implicit group creation only when the caller’s silo is linked to a suitable multicast pool (by name or by explicit IP in that pool).

Behavior:

  • Implicit lifecycle:
    • Create on first join (idempotent); delete when last member leaves (atomic mark-for-removal, reconciler schedules cleanup).
  • Addressing and validation:
    • Implicit allocation from the caller’s linked multicast pools.
    • SSM/ASM semantics enforced:
      • IPv4 SSM 232/8 and IPv6 ff3x::/32 require ≥1 source IP.
      • ASM must not specify sources.
    • When joining by explicit IP: resolve the pool containing the IP and verify the silo link before creating.
  • Error handling:
    • Reserved/invalid multicast ranges rejected at pool/range add time.
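
The SSM/ASM rule above can be sketched as a small validation function. This is only an illustration under stated assumptions: `SourceError`, `is_ssm`, and `validate_sources` are hypothetical names and not taken from the PR, and the sketch uses only the standard library rather than whatever helpers Omicron actually uses.

```rust
use std::net::IpAddr;

/// Hypothetical error type for this sketch; not taken from the PR.
#[derive(Debug, PartialEq)]
pub enum SourceError {
    /// The group address is in an SSM range but no sources were given.
    SsmRequiresSources,
    /// The group address is ASM but sources were given.
    AsmForbidsSources,
    /// The address is not a multicast address at all.
    NotMulticast,
}

/// True if `ip` falls in IPv4 232.0.0.0/8 or IPv6 ff3x::/32.
pub fn is_ssm(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.octets()[0] == 232,
        IpAddr::V6(v6) => {
            let seg = v6.segments();
            // ff3x: any scope nibble is allowed; the next 16 bits must
            // be zero to stay inside the /32 prefix.
            (seg[0] & 0xfff0) == 0xff30 && seg[1] == 0
        }
    }
}

/// SSM groups need at least one source; ASM groups must have none.
pub fn validate_sources(group: IpAddr, sources: &[IpAddr]) -> Result<(), SourceError> {
    if !group.is_multicast() {
        return Err(SourceError::NotMulticast);
    }
    match (is_ssm(group), sources.is_empty()) {
        (true, true) => Err(SourceError::SsmRequiresSources),
        (false, false) => Err(SourceError::AsmForbidsSources),
        _ => Ok(()),
    }
}
```

Under these assumptions, a join to `232.1.2.3` with an empty source list would be rejected, while the same join with one source IP would pass.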

API:

  • Primary flows:
    • Group-centric member management: POST/DELETE /v1/multicast-groups/{group}/members
    • Instance-centric join/leave: PUT/DELETE /v1/instances/{instance}/multicast-groups/{group}
  • Discovery endpoints remain for list/view; there is no explicit group create/update/delete.
  • This is a breaking change, but multicast is not yet enabled or available in production.
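
With no explicit create/delete endpoints, the group lifecycle follows entirely from join and leave. A minimal in-memory sketch of that behavior is below. The `Groups` type and its methods are hypothetical, instance IDs are simplified to `u32`, and the group is dropped immediately on the last leave, whereas the PR describes an atomic mark-for-removal with cleanup scheduled by the reconciler.

```rust
use std::collections::{HashMap, HashSet};
use std::net::IpAddr;

/// Hypothetical in-memory stand-in for the group and member tables.
#[derive(Default)]
pub struct Groups {
    /// Group IP -> set of member instance IDs.
    members: HashMap<IpAddr, HashSet<u32>>,
}

impl Groups {
    /// Join is idempotent; the group is created on the first join.
    /// Returns true if this call created the group.
    pub fn join(&mut self, group: IpAddr, instance: u32) -> bool {
        let created = !self.members.contains_key(&group);
        self.members.entry(group).or_default().insert(instance);
        created
    }

    /// Leave removes the member; the group goes away with its last member.
    /// Returns true if this call removed the group.
    pub fn leave(&mut self, group: IpAddr, instance: u32) -> bool {
        let Some(set) = self.members.get_mut(&group) else {
            return false;
        };
        set.remove(&instance);
        if set.is_empty() {
            self.members.remove(&group);
            true
        } else {
            false
        }
    }

    /// True while the group has at least one member.
    pub fn exists(&self, group: IpAddr) -> bool {
        self.members.contains_key(&group)
    }
}
```

The return values are there only to make the create-on-first-join and delete-on-last-leave transitions visible to a caller; they are not part of the API described in this PR.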

Key changes:

  • Implicit group model; groups exist while they have members.
  • IP pool integration for multicast allocation with silo link gating.
  • Simplified API centered on join/leave flows.
  • Add multicast_ip to the member table for responses.
  • For consistency, use `Instant` rather than `SystemTime` for multicast-related caches.
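
The PR gives consistency as the reason for the `Instant` change. As a side illustration, `Instant` is monotonic, which makes it a comfortable fit for cache expiry checks. The sketch below assumes a hypothetical `CacheEntry` type with a caller-supplied TTL; none of these names come from the PR.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache entry; the field names are illustrative only.
pub struct CacheEntry<T> {
    pub value: T,
    inserted_at: Instant,
}

impl<T> CacheEntry<T> {
    pub fn new(value: T, now: Instant) -> Self {
        Self { value, inserted_at: now }
    }

    /// `Instant` is monotonic, so the measured age is unaffected by
    /// wall-clock adjustments; saturating keeps it at zero if `now`
    /// happens to precede the insertion time.
    pub fn is_expired(&self, now: Instant, ttl: Duration) -> bool {
        now.saturating_duration_since(self.inserted_at) >= ttl
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the method, keeps the expiry check deterministic in tests.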

Follow-ups (stacked PRs)
  - [ ] Remove MVLAN from group data model.
  - [ ] Default IP pool support (IPv4/IPv6, unicast/multicast).
  - [ ] Dendrite: use omicron-common constants for validation.
@zeeshanlakhani force-pushed the zl/mcast-implicit-lifecycle branch from e44bc27 to 1c9d172 on November 30, 2025 06:36
zeeshanlakhani added a commit to oxidecomputer/dendrite that referenced this pull request Dec 1, 2025
Previously, internal multicast groups accepted admin-scoped addresses
including admin-local (ff04), site-local (ff05), and org-local (ff08).
This narrows the scope to only admin-local (ff04::/16), which is what
Omicron *now* dictates.

- [ ] This should be merged after
    oxidecomputer/omicron#9450 is reviewed
    and merged into Omicron. We now make Dendrite/Dpd match Omicron
    consistently for validation.

Key changes:
  - Remove IPV6_SITE_LOCAL_PATTERN and IPV6_ORG_SCOPE_PATTERN from P4
  - Update P4 table entries to only match admin-local (size 4→2)
  - Add ADMIN_LOCAL_PREFIX const to dpd-types with RFC doc links
  - Update validation to use `is_admin_local_multicast()` from oxnet v0.1.4
  - Bump to API version 2 for doc changes (only)
  - Update README with OpenAPI generation instructions
  - Use new multicast subnet constants from `omicron-common` for validation
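
The narrowing to admin-local described in this commit amounts to a prefix check against ff04::/16. The sketch below uses only the standard library and is not the oxnet `is_admin_local_multicast()` implementation. `ADMIN_LOCAL_SEGMENT` and `is_admin_local` are stand-in names for illustration, since the actual type of the `ADMIN_LOCAL_PREFIX` const in dpd-types is not shown in this thread.

```rust
use std::net::Ipv6Addr;

/// First 16 bits of the admin-local multicast prefix ff04::/16.
/// Stand-in name for this sketch; not the dpd-types constant.
const ADMIN_LOCAL_SEGMENT: u16 = 0xff04;

/// True only for addresses inside ff04::/16. Site-local (ff05) and
/// org-local (ff08) addresses are rejected.
pub fn is_admin_local(addr: Ipv6Addr) -> bool {
    addr.segments()[0] == ADMIN_LOCAL_SEGMENT
}
```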
@askfongjojo

@zeeshanlakhani - The revised authn model and implicit creation/deletion logic make sense to me. I'll make another pass tomorrow to ensure I didn't miss anything, but the mental model of multicast group lifecycle seems straightforward enough.

zeeshanlakhani added a commit that referenced this pull request Dec 3, 2025
This PR adds omdb commands to inspect multicast state:
  - `omdb db multicast groups` - list multicast groups with optional
    state and pool name filters
  - `omdb db multicast members` - list group members with filters for group-id,
    group-name, group-ip, state, and sled-id
  - `omdb db multicast info` - show detailed info for a specific group
  - `omdb db multicast pools` - list multicast IP pools

We also include:
  - Background task status display for multicast_reconciler
  - Integration tests for all multicast omdb commands

Follows the multicast lifecycle work in #9450.

Introduce API version `VERSION_MULTICAST_IMPLICIT_LIFECYCLE_UPDATES`
(v2025120500) to support the transition from explicit to implicit
multicast group lifecycle management.

Changes in new API version:
  - Groups are created implicitly when first member joins
  - Groups are deleted implicitly when last member leaves
  - Instance create/update accept `MulticastGroupIdentifier` (name, UUID,
    or multicast IP address) instead of just `NameOrId`
  - MulticastGroupMemberAdd now has optional `source_ips` for SSM

Backward compatibility (v20251120):
  - Add `v20251120` module with compatibility types using `NameOrId`
  - Explicit group create/update/delete endpoints marked deprecated
  - Proper base64 validation for user_data via shared UserData serde helper

Also includes:
  - Add version_policy to techport server for omdb compatibility
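
To illustrate the idea of accepting a name, UUID, or multicast IP address in one identifier, here is a hedged sketch of how such a string might be classified. `GroupIdent`, `looks_like_uuid`, and `parse_ident` are hypothetical; the real `MulticastGroupIdentifier` definition and its parsing rules are not shown in this thread, and the UUID check here is a simple shape test instead of a full parser.

```rust
use std::net::IpAddr;

/// Hypothetical mirror of the identifier described above.
#[derive(Debug, PartialEq)]
pub enum GroupIdent {
    Ip(IpAddr),
    /// Canonical 8-4-4-4-12 hex UUID, kept as lowercase text here.
    Id(String),
    Name(String),
}

/// Shape check only: five hyphen-separated hex fields of the usual lengths.
fn looks_like_uuid(s: &str) -> bool {
    let parts: Vec<&str> = s.split('-').collect();
    let lens = [8, 4, 4, 4, 12];
    parts.len() == 5
        && parts
            .iter()
            .zip(lens)
            .all(|(p, n)| p.len() == n && p.chars().all(|c| c.is_ascii_hexdigit()))
}

/// Try IP first, then UUID, and fall back to treating the input as a name.
pub fn parse_ident(s: &str) -> Result<GroupIdent, String> {
    if let Ok(ip) = s.parse::<IpAddr>() {
        return if ip.is_multicast() {
            Ok(GroupIdent::Ip(ip))
        } else {
            Err(format!("{ip} is not a multicast address"))
        };
    }
    if looks_like_uuid(s) {
        return Ok(GroupIdent::Id(s.to_ascii_lowercase()));
    }
    Ok(GroupIdent::Name(s.to_string()))
}
```

The order of the checks is a design choice in this sketch: an input that parses as an IP address is never treated as a name, so a unicast address fails loudly and does not silently become a group name.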