[sled-agent] Launch switch zone automatically on scrimlets #1933

Merged
merged 11 commits into from
Nov 11, 2022
12 changes: 11 additions & 1 deletion common/src/address.rs
@@ -53,7 +53,7 @@ const DNS_ADDRESS_INDEX: usize = 1;
const GZ_ADDRESS_INDEX: usize = 2;

/// The maximum number of addresses per sled reserved for RSS.
pub const RSS_RESERVED_ADDRESSES: u16 = 10;
pub const RSS_RESERVED_ADDRESSES: u16 = 16;

/// Wraps an [`Ipv6Network`] with a compile-time prefix length.
#[derive(
@@ -136,6 +136,7 @@ impl ReservedRackSubnet {
}

const SLED_AGENT_ADDRESS_INDEX: usize = 1;
const SWITCH_ZONE_ADDRESS_INDEX: usize = 2;

/// Return the sled agent address for a subnet.
///
@@ -146,6 +147,15 @@ pub fn get_sled_address(sled_subnet: Ipv6Subnet<SLED_PREFIX>) -> SocketAddrV6 {
SocketAddrV6::new(sled_agent_ip, SLED_AGENT_PORT, 0, 0)
}

/// Return the switch zone address for a subnet.
///
/// This address will come from the second address of the [`SLED_PREFIX`] subnet.
pub fn get_switch_zone_address(
sled_subnet: Ipv6Subnet<SLED_PREFIX>,
) -> Ipv6Addr {
sled_subnet.net().iter().nth(SWITCH_ZONE_ADDRESS_INDEX).unwrap()
}
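
The index-based derivation above can be sketched with the standard library alone. This is a minimal illustration, not the `Ipv6Subnet` implementation: `nth_address` is a hypothetical helper, the base address is taken from the example addresses in the docs, and the sketch does not check that the result stays inside the subnet's prefix.

```rust
use std::net::Ipv6Addr;

// Indices copied from the diff; the integer type here is an assumption
// made so the arithmetic below needs no casts.
const SLED_AGENT_ADDRESS_INDEX: u128 = 1;
const SWITCH_ZONE_ADDRESS_INDEX: u128 = 2;

/// Hypothetical helper: returns the address `index` positions past `base`.
/// For small indices this matches what iterating the subnet and taking
/// `nth(index)` would yield. Returns `None` on overflow.
fn nth_address(base: Ipv6Addr, index: u128) -> Option<Ipv6Addr> {
    u128::from(base).checked_add(index).map(Ipv6Addr::from)
}

fn main() {
    // Base of the example sled subnet used in docs/how-to-run.adoc.
    let base: Ipv6Addr = "fd00:1122:3344:101::".parse().unwrap();
    let sled = nth_address(base, SLED_AGENT_ADDRESS_INDEX).unwrap();
    let switch = nth_address(base, SWITCH_ZONE_ADDRESS_INDEX).unwrap();
    println!("sled agent: {sled}, switch zone: {switch}");
}
```

Because the switch zone now claims index 2, every service that previously started at `::2` moves up by one, which is what the documentation changes later in this PR reflect.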

/// Returns a sled subnet within a rack subnet.
///
/// The subnet at index == 0 is used for rack-local services.
2 changes: 1 addition & 1 deletion docs/crdb-debugging.adoc
@@ -14,6 +14,6 @@ The following provides instructions for connecting to a CRDB shell on a running

1. **Find the zone running CockroachDB**. This can be accomplished by running `zoneadm list -cv`, and finding the zone with a prefix of `oxz_cockroachdb`.
2. **Log into that zone**. This can be done using `pfexec zlogin <that zone name>`.
3. **Read the CockroachDB log file to determine the connection instructions**. This can be done with `tail -f $(svcs -L cockroachdb)`, look for a line starting with `RPC client flags:`. As one example, this may look like `/opt/oxide/cockroachdb/bin/cockroach <client cmd> --host=[fd00:1122:3344:101::2]:32221 --insecure`
3. **Read the CockroachDB log file to determine the connection instructions**. This can be done with `tail -f $(svcs -L cockroachdb)`, look for a line starting with `RPC client flags:`. As one example, this may look like `/opt/oxide/cockroachdb/bin/cockroach <client cmd> --host=[fd00:1122:3344:101::3]:32221 --insecure`

4. Run that command, with however you want to access `cockroach`. One notable `<client cmd>` is `sql`, which grants access to a SQL shell.
15 changes: 8 additions & 7 deletions docs/how-to-run.adoc
@@ -172,13 +172,14 @@ be set as a default route for the Nexus zone.
| Service | Endpoint
| Sled Agent: Bootstrap | Derived from MAC address of physical data link.
| Sled Agent: Dropshot API | `[fd00:1122:3344:0101::1]:12345`
| Cockroach DB | `[fd00:1122:3344:0101::2]:32221`
| Nexus: Internal API | `[fd00:1122:3344:0101::3]:12221`
| Oximeter | `[fd00:1122:3344:0101::4]:12223`
| Clickhouse | `[fd00:1122:3344:0101::5]:8123`
| Crucible Downstairs 1 | `[fd00:1122:3344:0101::6]:32345`
| Crucible Downstairs 2 | `[fd00:1122:3344:0101::7]:32345`
| Crucible Downstairs 3 | `[fd00:1122:3344:0101::8]:32345`
| Switch Zone | `[fd00:1122:3344:0101::2]`
| Cockroach DB | `[fd00:1122:3344:0101::3]:32221`
| Nexus: Internal API | `[fd00:1122:3344:0101::4]:12221`
| Oximeter | `[fd00:1122:3344:0101::5]:12223`
| Clickhouse | `[fd00:1122:3344:0101::6]:8123`
| Crucible Downstairs 1 | `[fd00:1122:3344:0101::7]:32345`
| Crucible Downstairs 2 | `[fd00:1122:3344:0101::8]:32345`
| Crucible Downstairs 3 | `[fd00:1122:3344:0101::9]:32345`
| Internal DNS Service | `[fd00:1122:3344:0001::1]:5353`
| Nexus: External API | `192.168.1.20:80`
| Internet Gateway | None, but can be set in `smf/sled-agent/config-rss.toml`
16 changes: 13 additions & 3 deletions openapi/sled-agent.json
@@ -1525,15 +1525,15 @@
"$ref": "#/components/schemas/ServiceType"
}
},
"zone_name": {
"type": "string"
"zone_type": {
"$ref": "#/components/schemas/ZoneType"
}
},
"required": [
"addresses",
"id",
"services",
"zone_name"
"zone_type"
]
},
"Slot": {
@@ -1834,6 +1834,16 @@
"required": [
"rules"
]
},
"ZoneType": {
"description": "The type of zone which may be requested from Sled Agent",
"type": "string",
"enum": [
"internal_dns",
"nexus",
"oximeter",
"switch"
]
}
}
}
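
The schema change above replaces a free-form `zone_name` string with a closed `ZoneType` enum. A rough sketch of how such an enum could map to the four wire strings is shown below. The conversions are hand-written so the example needs no external crates; the real sled-agent type is not shown in this excerpt, so the variant names and the `as_str` helper are assumptions.

```rust
use std::fmt;
use std::str::FromStr;

/// Illustrative stand-in for the `ZoneType` schema in openapi/sled-agent.json.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZoneType {
    InternalDns,
    Nexus,
    Oximeter,
    Switch,
}

impl ZoneType {
    /// Hypothetical helper returning the string used in the schema's `enum` list.
    fn as_str(self) -> &'static str {
        match self {
            ZoneType::InternalDns => "internal_dns",
            ZoneType::Nexus => "nexus",
            ZoneType::Oximeter => "oximeter",
            ZoneType::Switch => "switch",
        }
    }
}

impl fmt::Display for ZoneType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for ZoneType {
    type Err = String;

    // Rejects anything outside the four schema values, which is the point of
    // moving from a plain string to an enum.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "internal_dns" => Ok(ZoneType::InternalDns),
            "nexus" => Ok(ZoneType::Nexus),
            "oximeter" => Ok(ZoneType::Oximeter),
            "switch" => Ok(ZoneType::Switch),
            other => Err(format!("unknown zone type: {other}")),
        }
    }
}

fn main() {
    let zone: ZoneType = "switch".parse().unwrap();
    println!("parsed zone type: {zone}");
}
```

Note that the hyphenated spelling `internal-dns` would be rejected by this mapping, which may be why the package manifest in this PR renames the service to `internal_dns`.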
14 changes: 12 additions & 2 deletions package-manifest.toml
@@ -61,11 +61,11 @@ output.type = "zone"
setup_hint = "Run `./tools/ci_download_cockroachdb` to download the necessary binaries"

[package.internal-dns]
service_name = "internal-dns"
service_name = "internal_dns"
source.type = "local"
source.rust.binary_names = ["dnsadm", "dns-server"]
source.rust.release = true
source.paths = [ { from = "smf/internal-dns", to = "/var/svc/manifest/site/internal-dns" } ]
source.paths = [ { from = "smf/internal-dns", to = "/var/svc/manifest/site/internal_dns" } ]
output.type = "zone"

[package.omicron-gateway]
@@ -161,13 +161,23 @@ source.sha256 = "208ae10a61f834608378eb135e4b6e5993dc363019b8fba75465b6ea5506b63
output.type = "zone"
output.intermediate_only = true

# To package and install the asic variant of the switch, do:
#
# $ cargo run --release -p omicron-package -- -t switch_variant=asic package
# $ pfexec ./target/release/omicron-package -t switch_variant=asic install
[package.switch-asic]
service_name = "switch"
only_for_targets.switch_variant = "asic"
source.type = "composite"
source.packages = [ "omicron-gateway.tar.gz", "dendrite-asic.tar.gz" ]
output.type = "zone"

# To package and install the stub variant of the switch, do the following:
#
# - Set the sled agent's configuration option "stub_scrimlet" to "true"
# - Run the following:
# $ cargo run --release -p omicron-package -- -t switch_variant=stub package
# $ pfexec ./target/release/omicron-package -t switch_variant=stub install
[package.switch-stub]
service_name = "switch"
only_for_targets.switch_variant = "stub"
5 changes: 2 additions & 3 deletions sled-agent/src/bootstrap/agent.rs
@@ -20,7 +20,7 @@ use crate::illumos::dladm::{self, Dladm, PhysicalLink};
use crate::illumos::zone::Zones;
use crate::server::Server as SledServer;
use crate::sp::SpHandle;
use omicron_common::address::{get_sled_address, Ipv6Subnet};
use omicron_common::address::Ipv6Subnet;
use omicron_common::api::external::{Error as ExternalError, MacAddr};
use omicron_common::backoff::{
internal_service_policy, retry_notify, BackoffError,
@@ -234,7 +234,7 @@ impl Agent {
) -> Result<SledAgentResponse, BootstrapError> {
info!(&self.log, "Loading Sled Agent: {:?}", request);

let sled_address = get_sled_address(request.subnet);
let sled_address = request.sled_address();

let mut maybe_agent = self.sled_agent.lock().await;
if let Some(server) = &*maybe_agent {
@@ -276,7 +276,6 @@ impl Agent {
let server = SledServer::start(
&self.sled_config,
self.parent_log.clone(),
sled_address,
request.clone(),
)
.await
14 changes: 12 additions & 2 deletions sled-agent/src/bootstrap/params.rs
@@ -6,13 +6,13 @@

use super::trust_quorum::SerializableShareDistribution;
use macaddr::MacAddr6;
use omicron_common::address::{Ipv6Subnet, SLED_PREFIX};
use omicron_common::address::{self, Ipv6Subnet, SLED_PREFIX};
use serde::{Deserialize, Deserializer, Serialize};
use serde_with::serde_as;
use serde_with::DeserializeAs;
use serde_with::PickFirst;
use std::borrow::Cow;
use std::net::Ipv4Addr;
use std::net::{Ipv4Addr, Ipv6Addr, SocketAddrV6};
use uuid::Uuid;

/// Information about the internet gateway used for externally-facing services.
@@ -78,6 +78,16 @@ pub struct SledAgentRequest {
pub subnet: Ipv6Subnet<SLED_PREFIX>,
}

impl SledAgentRequest {
pub fn sled_address(&self) -> SocketAddrV6 {
address::get_sled_address(self.subnet)
}

pub fn switch_ip(&self) -> Ipv6Addr {
address::get_switch_zone_address(self.subnet)
}
}

// We intentionally DO NOT derive `Debug` or `Serialize`; both provide avenues
// by which we may accidentally log the contents of our `share`. To serialize a
// request, use `RequestEnvelope::danger_serialize_as_json()`.
21 changes: 18 additions & 3 deletions sled-agent/src/illumos/dladm.rs
@@ -5,7 +5,7 @@
//! Utilities for poking at data links.
use crate::common::vlan::VlanID;
use crate::illumos::vnic::VnicKind;
use crate::illumos::link::{Link, LinkKind};
use crate::illumos::zone::IPADM;
use crate::illumos::{execute, ExecutionError, PFEXEC};
use omicron_common::api::external::MacAddr;
@@ -158,7 +158,7 @@ impl Dladm {

/// Creates a VNIC on top of the etherstub.
///
/// This VNIC is not tracked like [`crate::illumos::vnic::Vnic`], because
/// This VNIC is not tracked like [`crate::illumos::link::Link`], because
/// it is expected to exist for the lifetime of the sled.
pub fn ensure_etherstub_vnic(
source: &Etherstub,
@@ -221,6 +221,21 @@ impl Dladm {
Ok(())
}

/// Verify that the given link exists
pub fn verify_link(link: &str) -> Result<Link, FindPhysicalLinkError> {
let mut command = std::process::Command::new(PFEXEC);
let cmd = command.args(&[DLADM, "show-link", "-p", "-o", "LINK", link]);
let output = execute(cmd)?;
match String::from_utf8_lossy(&output.stdout)
.lines()
.next()
.map(|s| s.trim())
{
Some(x) if x == link => Ok(Link::wrap_physical(link)),
_ => Err(FindPhysicalLinkError::NoPhysicalLinkFound),
}
}
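
The matching logic in `verify_link` can be exercised without running `dladm` by separating the output check into a pure function. The sketch below is an illustration only: `first_line_matches` is a hypothetical helper, not part of the PR, and the sample byte strings stand in for whatever `dladm show-link -p -o LINK <link>` would print.

```rust
/// Hypothetical helper: given captured stdout from the `dladm show-link`
/// invocation, report whether its first line (trimmed) names `link`.
fn first_line_matches(stdout: &[u8], link: &str) -> bool {
    // Bind the lossy conversion to a local so the borrowed line outlives
    // the iterator chain.
    let text = String::from_utf8_lossy(stdout);
    text.lines().next().map(str::trim) == Some(link)
}

fn main() {
    // Assumed sample output: a single line containing the link name.
    let sample = b"net0\n";
    println!("net0 present: {}", first_line_matches(sample, "net0"));
    println!("net1 present: {}", first_line_matches(sample, "net1"));
}
```

An empty or mismatched first line is treated as "not found", mirroring the `_ => Err(FindPhysicalLinkError::NoPhysicalLinkFound)` arm above.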

/// Returns the name of the first observed physical data link.
pub fn find_physical() -> Result<PhysicalLink, FindPhysicalLinkError> {
let mut command = std::process::Command::new(PFEXEC);
@@ -322,7 +337,7 @@ impl Dladm {
.filter_map(|name| {
// Ensure this is a kind of VNIC that the sled agent could be
// responsible for.
match VnicKind::from_name(name) {
match LinkKind::from_name(name) {
Some(_) => Some(name.to_owned()),
None => None,
}
56 changes: 34 additions & 22 deletions sled-agent/src/illumos/vnic.rs → sled-agent/src/illumos/link.rs
@@ -2,7 +2,7 @@
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

//! API for controlling a single instance.
//! API for allocating and managing data links.
use crate::illumos::dladm::{
CreateVnicError, DeleteVnicError, VnicSource, VNIC_PREFIX,
@@ -54,13 +54,13 @@ impl<DL: VnicSource + Clone> VnicAllocator<DL> {
pub fn new_control(
&self,
mac: Option<MacAddr>,
) -> Result<Vnic, CreateVnicError> {
) -> Result<Link, CreateVnicError> {
let allocator = self.new_superscope("Control");
let name = allocator.next();
debug_assert!(name.starts_with(VNIC_PREFIX));
debug_assert!(name.starts_with(VNIC_PREFIX_CONTROL));
Dladm::create_vnic(&self.data_link, &name, mac, None)?;
Ok(Vnic { name, deleted: false, kind: VnicKind::OxideControl })
Ok(Link { name, deleted: false, kind: LinkKind::OxideControlVnic })
}

fn new_superscope<S: AsRef<str>>(&self, scope: S) -> Self {
@@ -82,22 +82,23 @@ impl<DL: VnicSource + Clone> VnicAllocator<DL> {
}
}

/// Represents the kind of a VNIC, such as whether it's for guest networking or
/// Represents the kind of a Link, such as whether it's for guest networking or
/// communicating with Oxide services.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VnicKind {
OxideControl,
Guest,
pub enum LinkKind {
Physical,
OxideControlVnic,
GuestVnic,
}

impl VnicKind {
impl LinkKind {
/// Infer the kind from a VNIC's name, if this is one the sled agent can
/// manage, and `None` otherwise.
pub fn from_name(name: &str) -> Option<Self> {
if name.starts_with(VNIC_PREFIX) {
Some(VnicKind::OxideControl)
Some(LinkKind::OxideControlVnic)
} else if name.starts_with(VNIC_PREFIX_GUEST) {
Some(VnicKind::Guest)
Some(LinkKind::GuestVnic)
} else {
None
}
@@ -106,7 +107,7 @@ impl VnicKind {

#[derive(thiserror::Error, Debug)]
#[error("VNIC with name '{0}' is not valid for sled agent management")]
pub struct InvalidVnicKind(String);
pub struct InvalidLinkKind(String);

/// Represents an allocated VNIC on the system.
/// The VNIC is de-allocated when it goes out of scope.
@@ -115,30 +116,41 @@ pub struct InvalidVnicKind(String);
/// another process in the global zone could also modify / destroy
/// the VNIC while this object is alive.
#[derive(Debug)]
pub struct Vnic {
pub struct Link {
name: String,
deleted: bool,
kind: VnicKind,
kind: LinkKind,
}

impl Vnic {
impl Link {
/// Takes ownership of an existing VNIC.
pub fn wrap_existing<S: AsRef<str>>(
name: S,
) -> Result<Self, InvalidVnicKind> {
match VnicKind::from_name(name.as_ref()) {
Some(kind) => Ok(Vnic {
) -> Result<Self, InvalidLinkKind> {
match LinkKind::from_name(name.as_ref()) {
Some(kind) => Ok(Self {
name: name.as_ref().to_owned(),
deleted: false,
kind,
}),
None => Err(InvalidVnicKind(name.as_ref().to_owned())),
None => Err(InvalidLinkKind(name.as_ref().to_owned())),
}
}

/// Wraps a physical nic in a Link structure.
///
/// It is the caller's responsibility to ensure this is a physical link.
pub fn wrap_physical<S: AsRef<str>>(name: S) -> Self {
Link {
name: name.as_ref().to_owned(),
deleted: false,
kind: LinkKind::Physical,
}
}

/// Deletes a NIC (if it has not already been deleted).
pub fn delete(&mut self) -> Result<(), DeleteVnicError> {
if self.deleted {
if self.deleted || self.kind == LinkKind::Physical {
Ok(())
} else {
self.deleted = true;
@@ -150,16 +162,16 @@ impl Vnic {
&self.name
}

pub fn kind(&self) -> VnicKind {
pub fn kind(&self) -> LinkKind {
self.kind
}
}

impl Drop for Vnic {
impl Drop for Link {
fn drop(&mut self) {
let r = self.delete();
if let Err(e) = r {
eprintln!("Failed to delete VNIC: {}", e);
eprintln!("Failed to delete Link: {}", e);
}
}
}
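
The delete-on-drop behavior, including the new exemption for physical links, can be sketched in isolation. In this simplified model the call to `dladm` is replaced by a shared log of names that would have been deleted; the `deleted_log` field and the `deletions_after_drop` helper are assumptions made purely for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, Clone, Copy, PartialEq)]
enum LinkKind {
    Physical,
    OxideControlVnic,
    GuestVnic,
}

struct Link {
    name: String,
    deleted: bool,
    kind: LinkKind,
    // Stand-in for the real VNIC deletion: records names that would be removed.
    deleted_log: Rc<RefCell<Vec<String>>>,
}

impl Link {
    /// Deletes the link unless it was already deleted or is a physical link.
    fn delete(&mut self) {
        if self.deleted || self.kind == LinkKind::Physical {
            return;
        }
        self.deleted = true;
        self.deleted_log.borrow_mut().push(self.name.clone());
    }
}

impl Drop for Link {
    fn drop(&mut self) {
        self.delete();
    }
}

/// Hypothetical helper: creates a link of the given kind, drops it, and
/// returns how many deletions were recorded.
fn deletions_after_drop(kind: LinkKind) -> usize {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _link = Link {
            name: "example0".to_string(),
            deleted: false,
            kind,
            deleted_log: Rc::clone(&log),
        };
        // `_link` goes out of scope here, triggering `Drop`.
    }
    let count = log.borrow().len();
    count
}

fn main() {
    for kind in [LinkKind::Physical, LinkKind::OxideControlVnic, LinkKind::GuestVnic] {
        println!("{:?}: {} deletion(s)", kind, deletions_after_drop(kind));
    }
}
```

Skipping deletion for `LinkKind::Physical` matters because `Link::wrap_physical` wraps hardware the sled agent did not create, so dropping the wrapper should leave the underlying link in place.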
2 changes: 1 addition & 1 deletion sled-agent/src/illumos/mod.rs
@@ -8,9 +8,9 @@ use cfg_if::cfg_if;

pub mod addrobj;
pub mod dladm;
pub mod link;
pub mod running_zone;
pub mod svc;
pub mod vnic;
pub mod zfs;
pub mod zone;
pub mod zpool;