Integrate new probe interface into GCP framework #259
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #259 +/- ##
==========================================
+ Coverage 24.82% 25.08% +0.25%
==========================================
Files 23 24 +1
Lines 1740 1802 +62
==========================================
+ Hits 432 452 +20
- Misses 1286 1327 +41
- Partials 22 23 +1
I think this entire file is unnecessary, as it's the same as the other machine_images.go.
I'd really like to find a way to combine more of the common functionality here. For example, ideally the same probe should be used independently of AWS or GCP, perhaps with some configuration that can be passed in via parameters from the
/approve Holding off on an lgtm as I don't have time to test this at the moment, but all looks good.
After meeting with @AlexVulaj, I propose a new PR to groom the GCP egress URL list, because running the network verifier results in:
I think the network verifier is correctly identifying blocked egress, so the egress URL list for GCP should be groomed. Addressed in OSD-24918.
pkg/probes/package_probes.go
Outdated
@@ -9,6 +9,6 @@ type Probe interface {
 	GetMachineImageID(platformType string, cpuArch cpu.Architecture, region string) (string, error)
 	GetStartingToken() string
 	GetEndingToken() string
-	GetExpandedUserData(map[string]string) (string, error)
+	GetExpandedUserData(map[string]string, string) (string, error)
Minor - this will help us remember what these values are supposed to be in the future if we ever implement a new probe.
-GetExpandedUserData(map[string]string, string) (string, error)
+GetExpandedUserData(userDataVariables map[string]string, userDataTemplate string) (string, error)
pkg/verifier/gcp/entry_point.go
Outdated
if vei.Timeout == 0 {
	vei.Timeout = DEFAULT_TIMEOUT
}

vei.Timeout = DEFAULT_TIMEOUT
This section looks a little confusing - it looks like vei.Timeout will always get set to DEFAULT_TIMEOUT no matter what. What's the intention here for setting that timeout?
Changed to if vei.Timeout <= 0 and only setting the timeout there based on those conditions.
The service that prints the userdata begin token runs multiple times despite succeeding. The service that runs curl fails. Printing the end token requires curl to succeed, so that service never starts. Crash recovery kernel arming runs every time a VM is created.
The shell script prints the starting and ending tokens and runs curl. The shell script checks whether the command was successful. Also added an external IP so curl has an output.
Used a systemd script and a shell script that gets the GCP NAME and ZONE so the instance self-deletes. This ensures that if something happens to the client, the resource is not left on customer accounts.
Egress URLs are fetched from GitHub, as done in the AWS verifier.
Ordering the service after the multi-user target is unneeded because other outputs are completely silenced and won't interfere with probe output.
The systemd service that runs curl should only fail if the bash script returns an error code. This happens if the curl exit code is 1-4, 27, 41-43, or 45, which means curl itself failed, not the network.
Added userDataTemplate to the probe interface so GetExpandedUserData can take in different templates for GCP and AWS.
Co-authored-by: Alex Vulaj <ajvulaj@gmail.com>
Made minor changes to the curl probe to make it more readable, and added functionality to startup-script to add CA certs and export proxy environment variables if specified.
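The exit-code rule from the commit notes above (curl exit codes 1-4, 27, 41-43, and 45 indicate that curl itself failed, rather than the network) can be sketched as below. The PR implements this check in a shell script; this Go version is only an illustration, and isCurlToolFailure is a hypothetical name.

```go
package main

import "fmt"

// isCurlToolFailure reports whether a curl exit code is one of those the
// commit notes treat as "curl itself failed": 1-4, 27, 41-43, and 45.
// Any other non-zero code is left to be interpreted as a probe result.
func isCurlToolFailure(code int) bool {
	switch {
	case code >= 1 && code <= 4:
		return true
	case code == 27:
		return true
	case code >= 41 && code <= 43:
		return true
	case code == 45:
		return true
	}
	return false
}

func main() {
	for _, code := range []int{0, 3, 6, 28, 42, 45} {
		fmt.Printf("exit %d: tool failure = %v\n", code, isCurlToolFailure(code))
	}
}
```

Separating the two classes lets the service fail loudly on a broken probe while still reporting blocked endpoints as ordinary results.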
This is awesome work @eth1030! I suggested some changes inline. In general, you'll want to explain some of the less-obvious workarounds you (very adeptly) implemented to get this working
pkg/probes/legacy/legacy.go
Outdated
@@ -79,7 +79,7 @@ func (lgp Probe) GetMachineImageID(platformType string, cpuArch cpu.Architecture
 // variables listed in the template's "network-verifier-required-variables" directive, or if
 // values *are* provided for variables that must be set to a certain value for the probe to
 // function correctly (presetUserDataVariables) -- this function will fill-in those values for you.
-func (lgp Probe) GetExpandedUserData(userDataVariables map[string]string) (string, error) {
+func (lgp Probe) GetExpandedUserData(userDataVariables map[string]string, _ string) (string, error) {
Please add a comment to the doc string above noting why this probe doesn't need that last parameter
pkg/probes/curl/curl_json.go
Outdated
@@ -81,7 +77,7 @@ func (clp Probe) GetMachineImageID(platformType string, cpuArch cpu.Architecture
 // variables listed in the template's "network-verifier-required-variables" directive, or if
 // values *are* provided for variables that must be set to a certain value for the probe to
 // function correctly (presetUserDataVariables) -- this function will fill-in those values for you.
-func (clp Probe) GetExpandedUserData(userDataVariables map[string]string) (string, error) {
+func (clp Probe) GetExpandedUserData(userDataVariables map[string]string, userDataTemplate string) (string, error) {
Please update docstring with new param
pkg/probes/legacy/legacy_test.go
Outdated
@@ -213,7 +213,7 @@ func TestLegacyProbe_GetExpandedUserData(t *testing.T) {

 	prb := Probe{}
 	// First check if function is returning an error
-	got, err := prb.GetExpandedUserData(tt.userDataVariables)
+	got, err := prb.GetExpandedUserData(tt.userDataVariables, userDataTemplate)
Consider passing an empty string here instead of userDataTemplate to reduce potential confusion
pkg/verifier/gcp/entry_point.go
Outdated
 const (
-	cloudImageIDDefault = "rhel-9-v20240703"
+	DEFAULT_CLOUDIMAGEID = "rhel-9-v20240703"
This is already defined inside the probe (machine_images.go)
pkg/verifier/gcp/entry_point.go
Outdated
	"ret": "${ret}",
	"?": "$?",
	"array[@]": "${array[@]}",
	"value": "$value",
Please add a comment explaining why this is necessary
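One plausible explanation for these self-referential entries, offered as an assumption since the thread does not say how the template is expanded: if expansion uses something like Go's os.Expand, every $name or ${name} in the script is replaced, including shell variables that are meant to survive until runtime. Mapping each shell variable to its own literal text leaves it intact. In the sketch below, expandTemplate, the script text, and the URL variable are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// expandTemplate substitutes $name and ${name} using vars.
// Names missing from vars expand to the empty string.
func expandTemplate(template string, vars map[string]string) string {
	return os.Expand(template, func(name string) string {
		return vars[name]
	})
}

func main() {
	// Hypothetical script mixing a template variable (URL) with shell variables.
	script := `curl ${URL}; ret=$?; echo "${ret} $value ${array[@]}"`

	// Without pass-through entries, the shell variables are erased.
	fmt.Println(expandTemplate(script, map[string]string{
		"URL": "https://example.com",
	}))

	// With pass-through entries, each shell variable expands to itself.
	fmt.Println(expandTemplate(script, map[string]string{
		"URL":      "https://example.com",
		"ret":      "${ret}",
		"?":        "$?",
		"array[@]": "${array[@]}",
		"value":    "$value",
	}))
}
```

If this is indeed the reason, a one-line comment above the map entries saying so would answer the reviewer's request.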
pkg/verifier/gcp/entry_point.go
Outdated
 	g.Logger.Debug(vei.Ctx, "Generated userdata script:\n---\n%s\n---", userData)

 	if vei.CloudImageID == "" {
-		vei.CloudImageID = cloudImageIDDefault
+		vei.CloudImageID = DEFAULT_CLOUDIMAGEID
Use probe.GetMachineImageID() here instead of a global constant. See also my above comment on that constant
func Test_get_unreachable_endpoints(t *testing.T) {
	type args struct {
		consoleOutput string
		probe         probes.Probe
	}
	tests := []struct {
		name    string
		args    args
		wantErr bool
	}{
		// TODO: Add test cases.
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if err := get_unreachable_endpoints(tt.args.consoleOutput, tt.args.probe); (err != nil) != tt.wantErr {
				t.Errorf("get_unreachable_endpoints() error = %v, wantErr %v", err, tt.wantErr)
			}
		})
	}
}
It's understandable to skip tests here for now, but no need to leave the boilerplate code in the meantime
It would appear that two separate copies of this file now exist, which poses a maintainability challenge
Unfortunately, go:embed can only embed files from the package's own directory and its subdirectories. One solution is to take in a platformType parameter in getExpandedUserData, preemptively embed both userdata-template and startup-script in the curl probe, and then use one of them depending on the platformType. Any thoughts about this?
Currently the userdata-template in the curl directory is only used for testing.
pkg/verifier/aws/entry_point.go
Outdated
@@ -177,7 +181,7 @@ func (a *AwsVerifier) ValidateEgress(vei verifier.ValidateEgressInput) *output.O
 		userDataVariables["DELAY"] = "60"
 	}

-	unencodedUserData, err := vei.Probe.GetExpandedUserData(userDataVariables)
+	unencodedUserData, err := vei.Probe.GetExpandedUserData(userDataVariables, userDataTemplate)
So this has us providing the same value of userDataTemplate to every probe, yeah? My first thought is that different probes need to use different user-data, but then I saw that your LegacyProbe just ignores the userDataTemplate param. So it works, but it's not immediately clear to somebody scanning through the code why this would work. IOW, this needs comments at the very least, but let's have a larger discussion about this first
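The situation described in this comment can be sketched with a cut-down interface. Everything here is hypothetical and only mirrors the shape of the diff: Probe is reduced to one method, and templatedProbe, fixedProbe, and builtinTemplate are illustrative names, not the project's types.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Probe is a hypothetical, one-method version of the interface in the diff.
type Probe interface {
	GetExpandedUserData(userDataVariables map[string]string, userDataTemplate string) (string, error)
}

// templatedProbe expands whatever template the caller provides.
type templatedProbe struct{}

func (templatedProbe) GetExpandedUserData(vars map[string]string, tmpl string) (string, error) {
	if tmpl == "" {
		return "", errors.New("userDataTemplate must not be empty")
	}
	return os.Expand(tmpl, func(name string) string { return vars[name] }), nil
}

// builtinTemplate is an illustrative template owned by fixedProbe.
const builtinTemplate = "legacy-check --timeout ${TIMEOUT}"

// fixedProbe always uses its own built-in template.
type fixedProbe struct{}

// GetExpandedUserData ignores the second parameter (hence the blank
// identifier): this probe never accepts a caller-supplied template.
func (fixedProbe) GetExpandedUserData(vars map[string]string, _ string) (string, error) {
	return os.Expand(builtinTemplate, func(name string) string { return vars[name] }), nil
}

func main() {
	vars := map[string]string{"URL": "example.com", "TIMEOUT": "5"}
	for _, p := range []Probe{templatedProbe{}, fixedProbe{}} {
		out, err := p.GetExpandedUserData(vars, "curl ${URL}")
		fmt.Printf("%T: %q, err=%v\n", p, out, err)
	}
}
```

The caller passes the same template to both probes, and only the doc comment on fixedProbe explains why one of them silently discards it, which is the readability concern raised above.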
A workaround that avoids changing the probe interface, while still using a different template in GetExpandedUserData, is to pass a userDataVariable through the probe interface. In GetExpandedUserData, depending on whether the GCP variable is set, the function chooses userdata-template or startup-script. Also changed the GCP verifier to use GetMachineImageID, which sets the default machine image.
/retest
/retest
This is really, really good work and very close to merging.
pkg/probes/curl/curl_json.go
Outdated
//go:embed startup-script.sh
var startupScript string
-//go:embed startup-script.sh
-var startupScript string
+//go:embed systemd-template.sh
+var systemdTemplate string
Renaming to make more consistent with userdata-template.yaml. More broadly, I'd like to rebrand your approach as the "systemd curl probe" vs. the "GCP curl probe" so that things don't get confusing if/when we try to use systemd on AWS in the future
pkg/probes/curl/curl_json.go
Outdated
// Check if GCP instance is being used, if so use startup-script instead of userdata-template
// Only GCP should have USE_GCP_STARTUPSCRIPT userDataVariable
// Serves as a workaround for identifying platform type without a new probe interface
if userDataVariables["USE_GCP_STARTUPSCRIPT"] == "true" {
	userDataTemplate = startupScript
}
-// Check if GCP instance is being used, if so use startup-script instead of userdata-template
-// Only GCP should have USE_GCP_STARTUPSCRIPT userDataVariable
-// Serves as a workaround for identifying platform type without a new probe interface
-if userDataVariables["USE_GCP_STARTUPSCRIPT"] == "true" {
-	userDataTemplate = startupScript
-}
+// Use systemd to run curl (instead of cloud-init) if requested. Useful for
+// platforms that don't include cloud-init in their OS images (e.g., GCP)
+if userDataVariables["USE_SYSTEMD"] == "true" {
+	userDataTemplate = systemdTemplate
+}
See above "rebrand" comment
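A self-contained sketch of the selection logic in the suggestion above. To keep it runnable without the real template files, the two templates are plain string constants rather than //go:embed variables, and their contents are placeholders; selectTemplate and cloudInitTemplate are hypothetical names.

```go
package main

import "fmt"

// Placeholder contents; in the PR these come from embedded files
// (userdata-template.yaml and the systemd template script).
const (
	cloudInitTemplate = "#cloud-config\n# placeholder cloud-init template"
	systemdTemplate   = "#!/bin/bash\n# placeholder systemd template"
)

// selectTemplate returns the systemd template only when the caller opts in
// with USE_SYSTEMD=true; otherwise it returns the cloud-init template.
// The decision is keyed on the mechanism (systemd), not on the platform.
func selectTemplate(userDataVariables map[string]string) string {
	if userDataVariables["USE_SYSTEMD"] == "true" {
		return systemdTemplate
	}
	return cloudInitTemplate
}

func main() {
	fmt.Println(selectTemplate(map[string]string{"USE_SYSTEMD": "true"}))
	fmt.Println(selectTemplate(map[string]string{}))
}
```

Because the switch names the mechanism rather than GCP, an AWS caller could opt into the systemd path later without any renaming.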
pkg/probes/curl/machine_images.go
Outdated
Just curious: is there a benefit to pinning to a specific image version? Does it still work if we just pass "rhel-9" instead of "rhel-9-v20240709"?
It does not work if we pass in "rhel-9"
pkg/verifier/gcp/entry_point.go
Outdated
// Set instance type to default if not specified and validate it
if vei.InstanceType == "" {
	vei.InstanceType = DEFAULT_INSTANCE_TYPE
}
-// Set instance type to default if not specified and validate it
-if vei.InstanceType == "" {
-	vei.InstanceType = DEFAULT_INSTANCE_TYPE
-}
+// Set instance type to default if not specified and validate it
+if vei.InstanceType == "" {
+	vei.InstanceType, err = vei.CPUArchitecture.DefaultInstanceType(helpers.PlatformGCP)
+	if err != nil {
+		return g.Output.AddError(err)
+	}
+	g.Logger.Debug(vei.Ctx, fmt.Sprintf("defaulted to instance type %s", vei.InstanceType))
+}
Use vei.CPUArchitecture.DefaultInstanceType() instead of a DEFAULT_INSTANCE_TYPE constant in order to enable ARM-friendliness. We also probably need to do a little work on lining up the "user specified a bad instance type; what do?" logic with how it's done on the AWS side, but this works okay for now.
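To illustrate why a per-architecture lookup is friendlier to ARM than a single constant, here is a rough sketch. It does not reproduce the project's cpu.Architecture type: Architecture, the platform constants, and the instance type values in the table are all hypothetical placeholders chosen for the example.

```go
package main

import "fmt"

// Architecture is a hypothetical stand-in for the project's CPU architecture type.
type Architecture string

const (
	ArchX86 Architecture = "x86"
	ArchARM Architecture = "arm"
)

// Hypothetical platform identifiers.
const (
	PlatformAWS = "aws"
	PlatformGCP = "gcp"
)

// defaultInstanceTypes holds illustrative values only; the defaults used
// by the real verifier may differ.
var defaultInstanceTypes = map[string]map[Architecture]string{
	PlatformAWS: {ArchX86: "t3.micro", ArchARM: "t4g.micro"},
	PlatformGCP: {ArchX86: "e2-micro", ArchARM: "t2a-standard-1"},
}

// DefaultInstanceType looks up a default for this architecture on the given
// platform and returns an error when either is unknown.
func (a Architecture) DefaultInstanceType(platform string) (string, error) {
	byArch, ok := defaultInstanceTypes[platform]
	if !ok {
		return "", fmt.Errorf("unknown platform %q", platform)
	}
	instanceType, ok := byArch[a]
	if !ok {
		return "", fmt.Errorf("no default instance type for architecture %q on %s", a, platform)
	}
	return instanceType, nil
}

func main() {
	instanceType := "" // the user did not specify an instance type
	if instanceType == "" {
		var err error
		instanceType, err = ArchARM.DefaultInstanceType(PlatformGCP)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("using instance type", instanceType)
}
```

A single DEFAULT_INSTANCE_TYPE constant cannot express the ARM row of such a table, which is the limitation the suggestion removes.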
Only nitpicks in this one. After you fix those, feel free to post in the Aurora channel asking the team to test the code in their GCP accounts
Co-authored-by: Anthony Byrne <abyrne@redhat.com>
// function that tests probe order logic that is part of findUnreachableEndpoints in gcp_verifier.go
// get_tokens checks for the presence of startingToken and endingToken in the consoleOutput
// probe output should be between startingToken and endingToken
func get_tokens(consoleOutput string, probe probes.Probe) bool {
Nitpick: use camelCase for function names in Go, i.e. getTokens
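For reference, a camelCase version of the token check might look like the sketch below. It is an assumption-laden illustration: the function body is not shown in this thread, so this version takes the two tokens as plain strings instead of a probes.Probe, and the token values in main are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// getTokens reports whether consoleOutput contains startingToken followed,
// somewhere later, by endingToken. Probe output is expected to sit between
// the two tokens.
func getTokens(consoleOutput, startingToken, endingToken string) bool {
	start := strings.Index(consoleOutput, startingToken)
	if start < 0 {
		return false
	}
	// Only look for the ending token after the end of the starting token.
	rest := consoleOutput[start+len(startingToken):]
	return strings.Contains(rest, endingToken)
}

func main() {
	// Hypothetical tokens and console output.
	output := "boot messages\nPROBE_BEGIN\n{\"url\":\"example.com\"}\nPROBE_END\n"
	fmt.Println(getTokens(output, "PROBE_BEGIN", "PROBE_END"))
}
```

Searching for the ending token only in the text after the starting token rejects console output where the tokens appear out of order.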
Tested locally and confirmed GCP functionality using the following command
GCP_PROJECT_ID=abyrne ./osd-network-verifier egress --platform=gcp-classic --vpc-name=default --subnet-id=default --debug
Output was as-expected: verifiergcp.2024-08-09T15.59.10Z.log.
Also performed a basic test against AWS and confirmed that there were no obvious regressions.
Was not able to test ARM functionality as cmd.go does not pass the value of the --cpu-arch flag into vei.CPUArchitecture in the GCP-specific case (only in the AWS case), but that can be addressed in a future PR.
/approve
/lgtm
/hold
Awesome work @eth1030! Feel free to /unhold whenever you're ready
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abyrne55, AlexVulaj
@eth1030: all tests passed!
/unhold
/lgtm
What does this PR do? / Related Issues / Jira
Part of OSD-24065, this PR integrates the new probe interface into the GCP framework.
How to test this PR locally / Special Instructions
Set the environment variable GCP_PROJECT_ID to the project ID of the VPC.
Run the verifier, optionally with the --debug flag.
Logs
Expected output: the verifier is able to log any blocked egresses.
If the client breaks or loses connection with the instance, the instance should self-delete within the default or specified time.