DNS servers should have NS and SOA records #8047
Conversation
plus rustfmt, clippy
```rust
impl From<Srv> for DnsRecord {
    fn from(srv: Srv) -> Self {
        DnsRecord::Srv(srv)
    }
}

#[derive(
    Clone,
    Debug,
    Serialize,
    Deserialize,
    JsonSchema,
    PartialEq,
    Eq,
    PartialOrd,
    Ord,
)]
pub struct Srv {
    pub prio: u16,
    pub weight: u16,
    pub port: u16,
    pub target: String,
}

impl From<v1::config::Srv> for Srv {
    fn from(other: v1::config::Srv) -> Self {
        Srv {
            prio: other.prio,
            weight: other.weight,
            port: other.port,
            target: other.target,
        }
    }
}
```
the other option here is to use the `v1::config::Srv` type directly in v2, because it really has not changed. weaving the V1/V2 types together seems more difficult to think about generally, but i'm very open to the duplication being more confusing if folks feel that way.
I would probably use the v1 types directly but I can see going either way.
just felt that name implied more than it should
dev-tools/reconfigurator-cli/tests/output/cmds-set-remove-mupdate-override-stdout
```diff
@@ -4,9 +4,12 @@ load-example --seed test_expunge_newly_added_external_dns

blueprint-show 3f00b694-1b16-4aaa-8f78-e6b3a527b434
blueprint-edit 3f00b694-1b16-4aaa-8f78-e6b3a527b434 expunge-zone 9995de32-dd52-4eb1-b0eb-141eb84bc739
blueprint-diff 3f00b694-1b16-4aaa-8f78-e6b3a527b434 366b0b68-d80e-4bc1-abd3-dc69837847e0
```
unfortunately, between the diff size and having conflicting changes on main, i had a hard time keeping the output a more legible "file moved and now has some additional lines". instead, git shows the diff as a fully new file even though it's mostly the prior content.
`blueprint-diff` includes the DNS output though, which is of course what i actually care about here. if this is a bear to review (and i'm pretty empathetic to it being a lot) i'm open to moving the DNS checking over to a new test and leaving this unchanged, or moving the internal DNS testing to live in this test as well.
```
blueprint-show 62422356-97cd-4e0f-bd17-f946c25193c1
blueprint-edit 62422356-97cd-4e0f-bd17-f946c25193c1 expunge-zone 3fc76516-d258-48bc-b25e-9fca5e37c888
blueprint-diff 62422356-97cd-4e0f-bd17-f946c25193c1 14b8ff1c-91ff-4ab7-bb64-3c0f5f642e09
```
this one surprised me and i've added this diff to reiterate that for testing: internal DNS zones are not replaced simply as a result of being expunged, since we might need to reuse the IP that server was listening on. for internal DNS in particular, the expunged zone must be `ready_for_cleanup`. i don't know concretely what that means (sled-agent did a collection and saw the zone is gone?), but that's a critical step in actually seeing DNS changes in the diff below.
> i don't know concretely what that means (sled-agent did a collection and saw the zone is gone?)
Almost! Reconfigurator will mark a zone ready for cleanup during planning if in the most recent inventory collection, sled-agent reported:
- the zone is gone
- the generation of the sled's config is >= the generation in which the zone was expunged (to avoid a race where the zone is gone because it hasn't even started yet)
on one hand: now that DNS servers are referenced by potentially two different AAAA records, both of those records are potentially the target of a SRV record. though, we don't have SRV records for the DNS interface. this test had failed at first because we'd find a DNS server's IP via the `ns1.` record, which means we'd miss that the same zone was referenced by an AAAA record for the illumos zone UUID.

on the other hand: `#[nexus_test]` environments involve a mock of the initial RSS environment construction? so now that the first blueprint adds NS records, this mock RSS environment was out of date, and a test that the first blueprint after "RSS" makes no change failed because the "RSS" environment was wrong.
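To illustrate the first point: a single internal DNS zone can now be reachable via two names that resolve to the same address, only one of which the test originally looked up. A hypothetical example (names, placeholder UUID, and address are all invented here, not taken from the test output):

```text
; two AAAA records pointing at the same internal DNS server
ns1.control-plane.oxide.internal.               AAAA  fd00:1122:3344::5
<zone-uuid>.host.control-plane.oxide.internal.  AAAA  fd00:1122:3344::5
```

Finding the server only via the `ns1.` name misses that the `<zone-uuid>` name references the same zone.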
each name has a list of records so calling the high-level collection "records" makes for some confusing words
Nice -- this is looking pretty good! I don't think anything here is a real blocker but it would be good to clean up if we can.
```rust
///
/// this typically does not mean anything different than any other expunged
/// zone, except that internal DNS zones are not replaced until they are
/// definitively marked "ready for cleanup".
```
I'd strike this. I don't think that's true. IIRC Nexus zones have a cleanup step that involves re-assigning sagas and Cockroachdb zones have a step that decommissions nodes, etc.
ah, that makes sense. i'd looked for where else `ready_for_cleanup` is used but probably missed some details. i hadn't realized at this point that i can write `#` comments in the reconfigurator-cli tests anyway, which is really where i wanted to highlight this command.
```
# Mark the internal DNS zone ready for cleanup.
# This approximates sled-agent performing an inventory collection and seeing the DNS zone has gone away.
# Afterward, diffing should show that the server's records are removed from DNS.
```
I believe the behavior here is correct but the comment seems wrong to me. The DNS records for the internal DNS server expunged at L5 were shown as removed in the diff at L7, right? And there are no DNS changes in the diff at L13. This is the behavior I'd expect.
agreed on both counts. i'd meant to emphasize that until we `mark-for-cleanup`, a new plan won't add a new internal DNS zone even though the old one had been expunged, but misremembered what i'd seen in the output and said it pretty poorly. lemme clean that up too..
```rust
    dropshot::HttpError,
> {
    let result = Self::dns_config_get(rqctx).await?;
    match result.0.try_into() {
```
Nice. It seems like the API versioning stuff worked out nicely here.
internal-dns/types/src/config.rs
```rust
anyhow::ensure!(
    service == ServiceName::ExternalDns,
    "This method is only valid for external DNS servers, \
    but we were provided the service '{service:?}'",
);
```
Similarly, I'd remove this argument altogether.
```rust
let dns_config_blueprint = DnsConfigParams {
    zones: vec![dns_zone_blueprint],
    time_created: chrono::Utc::now(),
    generation: blueprint_generation.next(),
    serial: new_dns_generation.as_u64().try_into().map_err(|_| {
```
I see -- it looks like you split the difference here. The configuration distinguishes between "serial" and "generation", but this is the only place that sets them, and it always makes them the same. So we don't have to worry about maintaining a serial in lockstep with the generation when we update the database.
This seems fine.
yeah, i really like the status quo that there is not a `DnsConfigParams` which can result in the DNS server failing to serve records. to maintain that either `DnsConfigParams::generation` should become a u32 (seems very wrong), or `serial` ends up a distinct `u32`.
internal-dns/types/src/v2/config.rs
```rust
#[derive(Clone, Debug, Serialize, Deserialize, JsonSchema, PartialEq, Eq)]
pub struct DnsConfigZone {
    pub zone_name: String,
    pub names: HashMap<String, Vec<DnsRecord>>,
```
super nitty and unimportant, but: I feel like `records` was more accurate. I guess I expect maps to be named either by what each key-value pair represents or what the value represents, not what the key represents. But now I wonder how universal that is!
i suppose i was thinking about this as: a "name" is the pair of a label and a collection of records, and we often happen to call the label a "name". that's not totally accurate, since the key here could be multiple labels anyway. but i agree with your instinct and this is why it didn't strike me as confusing at first :)
this was a simple change, i'll probably revert it and add a few comments on the relevant test asserts instead.
You could be right if people read "DNS name" to refer to the (label, records) pair. I tend to use that interchangeably with "label" but maybe that's wrong.
Anyway, not a big deal either way, though there's something to be said for not having different names for the same thing in two different API versions. Then again, we can probably remove API version 1 in the next release anyway.
already reverted it! i expect i'm the outlier here, and either way it ends up ambiguous in some circumstances.
Co-authored-by: David Pacheco <dap@oxidecomputer.com>
confusing name options abound. "names" is ambiguous with the keys, "records" is ambiguous with the values, maybe it would be better to call this "subdomains"???? but for now stick with what we've got and add some clarifying comments. This reverts commit ff63ea1.
* incorrect comments around the internal DNS expunge test
* internal DNS config does not need to track external DNS separately
this is probably the more exciting part of the issues outlined in #6944. the changes here get us to the point that for both internal and external DNS, we have:

* NS records for the zone's DNS servers (`ns1.<zone>`, `ns2.<zone>`, ...)
* A/AAAA records for the `ns*.<zone>` described above
* an SOA record for `oxide.internal` (for internal DNS) and `$delegated_domain` (for external DNS)

we do not support zone transfers here. i believe the SOA record here would be reasonable to guide zone transfers if we did, but obviously that's not something i've tested.
SOA fields
the SOA record's `RNAME` is hardcoded to `admin@<zone_name>`. this is out of expediency to provide something, but it's probably wrong most of the time. there's no way to get an MX record installed for `<zone_name>` in the rack's external DNS servers, so barring DNS hijinks in the deployed environment, this will be a dead address. problems here are:
in the rack's external DNS servers, so barring DNS hijinks in the deployed environment, this will be a dead address. problems here are:it seems like the best answer here is to allow configuration of the rack's delegated domain and zone after initial setup, and being able to update an administrative email would fit in pretty naturally there. but we don't have that right now, so
admin@
it is. configuration of external DNS is probably more important in the context of zone transfers and permitting a list of remote addresses to whom we're willing to permit zone transfers. so it feels like this is in the API's future at some point.bonus
one minorly interesting observation along the way is that external DNS servers in particular are reachable at a few addresses - whichever public address they get in the rack's internal address range, and whichever address they get in the external address range. the public address is what's used for A/AAAA records. so, if you're looking around from inside a DNS zone you can get odd-looking answers like: