[reconfigurator-planning] add test to ensure DNS lookups work on example system #9233
Conversation
Thank you for doing this!
}

/// Returns the list of DNS service names expected in an example system.
fn service_names_to_query(
My first thought was that we'd use strum to iterate over the variants and then maybe a match to say what we expected for each one (sort of combining this function with match_service_names). I guess that won't quite work because of the variants that have data? I see that strum will happily give these to you anyway, with the data filled in with default(), but that won't match what's in our system. Is that why we have to unroll all this?

Suggestion: do the above anyway, but wherever we have a variant with the default data, we just replace it with the data that we want from the example system. I'm not sure, but I hope that'll be less code, clearer, and impossible to get out of sync in the way that you could today by adding something to match_service_names but not this function.
So strum is what I first tried to use, but the Default impl for the variants with data just didn't sit right with me. But I agree that the scheme I did come up with is pretty fragile, so maybe the strum approach is the least bad one.
Yeah this seems reasonable.
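For illustration, a minimal self-contained sketch of the suggested strum approach: derive EnumIter, iterate every variant, and swap the defaulted payload of data-carrying variants for the value that actually exists in the example system. The enum and payload type below are toy stand-ins, not omicron's real ServiceName:

use strum::{EnumIter, IntoEnumIterator};

#[derive(Debug, Clone, PartialEq, Eq, EnumIter)]
enum ServiceName {
    Nexus,
    CruciblePantry,
    // Data-carrying variant: strum's iterator yields it with a
    // Default::default() payload (0 here), which won't match the records
    // actually registered for the example system.
    Crucible(u64),
}

fn service_names_to_query(example_zone_id: u64) -> Vec<ServiceName> {
    ServiceName::iter()
        .map(|name| match name {
            // Replace the defaulted payload with the zone ID that the
            // example system really uses.
            ServiceName::Crucible(_) => ServiceName::Crucible(example_zone_id),
            // Dataless variants can be used verbatim.
            other => other,
        })
        .collect()
}

Because the list is produced by iterating every variant, a newly added ServiceName shows up here automatically, which is the "impossible to get out of sync" property mentioned above.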
    logctx.cleanup_successful();
}

/// Returns the list of DNS service names expected in an example system.
This isn't quite right, since there are service names here that aren't expected in the example system, and there are values here too. I guess it's a map whose keys are:
- all DNS service names that carry no data, plus
- DNS service names that do carry data, with specific data for the example system

and whose values are what you'd expect to find by looking each one up in the example system. Admittedly that's awkward to describe.
Reworded as:
/// Returns the list of potential DNS service names in an example system,
/// along with whether we expect a successful or error response for each of
/// them.
// When adding a new variant, ensure that it is covered by
// service_names_to_query above.
This is alright but I feel like it's going to be too easy for someone to see this compile error and just add a new entry here without adding the case above.
Addressed via strum.
out.insert(ServiceName::CruciblePantry, Ok(()));

// XXX the SledAgent service name doesn't appear to be used?
I think it probably is unused but arguably should be. I think we could incorporate it in the same way that you did for Crucible below.
Note we don't even register the sled agent service names within blueprint_internal_dns_config today. I've added these to the map with a response of QueryError::NoRecordsFound.
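A self-contained sketch of what those map entries might look like, assuming the map shape discussed above; ServiceName and QueryError here are illustrative stand-ins rather than omicron's real definitions:

use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ServiceName {
    CruciblePantry,
    // The real payload is a sled UUID; a u8 keeps the sketch small.
    SledAgent(u8),
}

#[derive(Debug, PartialEq)]
enum QueryError {
    NoRecordsFound,
}

fn expected_responses(sled_ids: &[u8]) -> BTreeMap<ServiceName, Result<(), QueryError>> {
    let mut out = BTreeMap::new();
    // Services the example system deploys should resolve successfully.
    out.insert(ServiceName::CruciblePantry, Ok(()));
    // Sled agent names aren't registered by blueprint_internal_dns_config
    // today, so lookups for them are expected to fail.
    for &sled_id in sled_ids {
        out.insert(ServiceName::SledAgent(sled_id), Err(QueryError::NoRecordsFound));
    }
    out
}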
// is fragmented. Add a short timeout (5s) for this.
let lookup_fut = tokio::time::timeout(
    Duration::from_secs(5),
    resolver.lookup_all_socket_v6(service),
);
Does the client not already do this? I'd rather not add a shorter timeout for a case that we don't expect to come up in a correct test run.
The client does do this, with a 15s timeout. I just thought that wasn't a good experience. I don't care strongly either way, so I'll just rely on the client timeout.
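For reference, a self-contained illustration of the tokio::time::timeout pattern from the diff above; per this exchange, the explicit wrapper was dropped in favor of the DNS client's built-in 15s timeout:

use std::time::Duration;

#[tokio::main]
async fn main() {
    // Stand-in for a DNS lookup that takes too long (e.g. because the
    // response packet was fragmented and dropped).
    let slow_lookup = tokio::time::sleep(Duration::from_secs(10));
    match tokio::time::timeout(Duration::from_secs(5), slow_lookup).await {
        Ok(()) => println!("lookup completed in time"),
        Err(_elapsed) => println!("timed out after 5s"),
    }
}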
);
}

// TODO: add assertions on the returned addresses.
Do you think we should check that the addresses match what we expect? I kind of skipped over that because I wanted to detect packet fragmentation first.
// Wait for at least one result to be returned. Like above, we
// don't want to wait forever, so set a reasonable timeout.
let backends = match tokio::time::timeout(
    Duration::from_secs(5),
Given that we probably don't already have a client timeout here, I'd make this more generous (maybe 30s?) to avoid flakes on a busy system. If the test is working, we won't hit it, right?
Or are we querying for ones that we expect to find no records for? I think I would just leave those out. We don't care to verify that we don't have records for stuff that's not in the example system, and it's hard with this particular interface to identify that case.
Used a 15s timeout to match the simple DNS resolver timeout.

As for the names we expect no records for: I skip those checks with the qorb resolver, since it's level-triggered (they're only tested against the simple resolver).
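Since qorb's resolver publishes a changing set of backends rather than answering one-shot queries (the level-triggered behavior mentioned above), the test only needs to wait until a non-empty set appears. A self-contained sketch of that wait-with-timeout pattern, using a plain tokio watch channel as a stand-in for qorb's actual update mechanism:

use std::time::Duration;
use tokio::sync::watch;

async fn wait_for_backends(
    mut rx: watch::Receiver<Vec<String>>,
) -> Result<Vec<String>, tokio::time::error::Elapsed> {
    // 15s, matching the simple DNS resolver's timeout per the review.
    tokio::time::timeout(Duration::from_secs(15), async move {
        // Block until at least one backend has been published.
        let backends = rx
            .wait_for(|b| !b.is_empty())
            .await
            .expect("watch sender dropped");
        backends.clone()
    })
    .await
}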
    }
};

// TODO: add assertions on the returned addresses.
Ensure that DNS lookups for all internal services work on the example system, through both the standard and qorb DNS resolvers.

I did notice that the example system doesn't set up every service. That's worth addressing in a follow-up, but this test does catch issues like the repo depot response being too large (manually tested).