Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add public IPs validation task #2070

Merged
merged 12 commits into from
Oct 31, 2023
Merged

Add public IPs validation task #2070

merged 12 commits into from
Oct 31, 2023

Conversation

AbdelrahmanElawady
Copy link
Contributor

@AbdelrahmanElawady AbdelrahmanElawady commented Oct 3, 2023

Description

Add public IPs validation task while ensuring only runs on one node from the farm.

Steps

  • On node startup create a test namespace and macvlan once for the task to use. Do not create them if they already exist.
  • Decide if the node should run the task or another one in the farm based on the node ID. The node with the least ID and with power target as up should run it. The other will log why they shouldn't run the task and exit with no errors.
  • Get public IPs set on the farm.
  • Inside the test namespace and using the macvlan link, assign each IP and its routes to the link and check reliable source for the real public IP. If they match then the public IP is valid.
  • Delete all IPs and routes between each IP validation and before running the task to ensure they are available to be used and no problems happen between task runs.

Issues

#2069

@xmonader
Copy link
Collaborator

Blocked on docs

Comment on lines 324 to 328
err = ensureTestNamespace(br)
if err != nil {
return nil, fmt.Errorf("failed to create test namespace: %w", err)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
err = ensureTestNamespace(br)
if err != nil {
return nil, fmt.Errorf("failed to create test namespace: %w", err)
}
if err := ensureTestNamespace(br); err != nil {
return nil, errors.Wrap(err, "failed to create test namespace")
}

pkg/network/public/public.go Show resolved Hide resolved
log.Err(err).Send()
continue
}
err = macvlan.Install(mv, nil, ipNet, routes, netNS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe you need to pass netNS because u are already inside the namespace. it's then better to pass nil to avoid unpredeceted errors

}
unusedIPs := make(map[string]bool)
err = netNS.Do(func(nn ns.NetNS) error {
mv, err := macvlan.GetByName(testMacvlan)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too much code in the Do closure, i prefer u create another function that has the logic then do netNS.Do(p.logic) this way the indentation of the code won't also be too much

Comment on lines 136 to 143
for {
nodeID, err = registrar.NodeID(ctx)
if err == nil {
break
}
log.Err(err).Msg("failed to get node id")
time.Sleep(10 * time.Second)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a package backoff that we use you should use here instead of this.

return string(body), nil
}

func deleteIPAndRoutes(publicIP substrate.PublicIP, routes []*netlink.Route, macvlan netlink.Link) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think you should just delee all Ips and routes blindly, so u start by listing the ips on this link and ALL routes inside the namespace and delete them.

This function should also be called before the test to make sure there are no left over ips on the device because of a crash

@AbdelrahmanElawady
Copy link
Contributor Author

Also, I think the directory of the tasks should not be called perf as it is concerned with running tasks and monitoring them. Maybe tasks is better? or moving tasks not related to "performance" to another directory?

Copy link
Member

@muhamadazmy muhamadazmy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also create a sub pkg under perf to include all your files related to this task for organization purposes

Comment on lines 211 to 212
pubIPTask := perf.NewPublicIPValidationTask()
perfMon.AddTask(pubIPTask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do u need the intermediate variable pubIPTask ?

Suggested change
pubIPTask := perf.NewPublicIPValidationTask()
perfMon.AddTask(pubIPTask)
perfMon.AddTask(perf.NewPublicIPValidationTask())

@@ -26,6 +26,8 @@ import (
const (
toZosVeth = "tozos" // veth pair from br-pub to zos
publicNsMACDerivationSuffix = "-public"
testMacvlan = "pubtestmacvlan"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird name, this can just be pub or something. it's inside a namespace anyway so no chances of name collision.

@@ -349,6 +355,24 @@ func EnsurePublicSetup(nodeID pkg.Identifier, inf *pkg.PublicConfig) (*netlink.B
return br, netlink.LinkSetUp(br)
}

func ensureTestNamespace(publicBrdige *netlink.Bridge) error {
netNS, err := namespace.GetByName(testNamespace)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you check GetByName implementation you will see that it returns os.ErrNotExist if the namespace does not exist. So that's the only valid case you should try to create the namespace, then any other error type should be returned

netNS, err := GetByName(name)
if os.IsNotExist(err) {
   err = Create(name)
}

if err != nil {
   return err
}

return "", err
}
defer req.Body.Close()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to check the returned status from the req in case it's not 200

Comment on lines 157 to 165
realIP, err := getRealPublicIP()
if err != nil {
p.farmIPsReport[publicIP.IP] = IPReport{
State: SkippedState,
Reason: FetchRealIPFailed,
}
log.Err(err).Msg("failed to get node real IP")
continue
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not always correct, if you failed to get the real ip this can duo to different reasons. and they all need to be handled differently:

  • http.Get fail, this can be duo to name look failure, or failed to establish connection this is hugely mean that the IP is invalid (is not routable to the internet)
  • http.Get succeeded but the response is not with success status (!= 200) this means that u have internet but the check service is donw, in that case we are not sure about the IP, we can mark it as skipped, for now.
  • http.Get succeed but the the returned value does not match the actual IP we are testing means those IPs are natted, so they are also invalid.

the best way to do this is that getRealPublicIP need to return (wrapped) errors with more details on what happened exactly, and then based on that choose the proper state

Comment on lines 398 to 399
}
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably what u want. since u already check the error of the create

Suggested change
}
if err != nil {
} else if err != nil {

@muhamadazmy muhamadazmy merged commit 61cc487 into main Oct 31, 2023
2 checks passed
@muhamadazmy muhamadazmy deleted the pub-ip-task branch October 31, 2023 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants