Host Collector and Analyzer to check subnet availability #984

Closed
diamonwiggins opened this issue Jan 26, 2023 · 10 comments · Fixed by #1004
@diamonwiggins
Member

diamonwiggins commented Jan 26, 2023

Describe the rationale for the suggested feature.

Before container networking components for Kubernetes clusters are installed, it's often necessary to ensure that there is no overlap in subnets between the route table of the machine used for the install and the desired subnets to be used for the container runtime and the container networking interface. We should introduce a host collector and analyzer that allows you to determine if a given subnet is already in use by the host.
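As a rough illustration of the overlap check described above, here is a minimal Python sketch. It assumes a Linux host whose routes are readable in the `/proc/net/route` layout (hex fields in little-endian byte order, as on x86) and it skips the default route, since `0.0.0.0/0` would overlap everything. The sample table and the helper names `parse_routes` and `overlaps_any` are made up for the example; this is not the collector's actual implementation.

```python
import ipaddress
import socket
import struct

# Sample text in the layout of Linux's /proc/net/route (values are made up).
SAMPLE_ROUTES = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t0100000A\t0003\t0\t0\t100\t00000000\t0\t0\t0\n"
    "eth0\t0000000A\t00000000\t0001\t0\t0\t100\t00FFFFFF\t0\t0\t0\n"
    "docker0\t000011AC\t00000000\t0001\t0\t0\t0\t0000FFFF\t0\t0\t0\n"
)


def _hex_to_ip(hex_field):
    """Decode a little-endian hex field (assumption: x86-style host) to dotted quad."""
    return socket.inet_ntoa(struct.pack("<L", int(hex_field, 16)))


def parse_routes(text):
    """Return the non-default IPv4 networks listed in a /proc/net/route style table."""
    networks = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        dest = _hex_to_ip(fields[1])
        mask = _hex_to_ip(fields[7])
        net = ipaddress.ip_network(f"{dest}/{mask}", strict=False)
        if net.prefixlen == 0:
            continue  # the default route overlaps everything, so ignore it
        networks.append(net)
    return networks


def overlaps_any(candidate, routes):
    """True if the candidate CIDR overlaps any network already in the route table."""
    cand = ipaddress.ip_network(candidate)
    return any(cand.overlaps(route) for route in routes)


routes = parse_routes(SAMPLE_ROUTES)
print([str(r) for r in routes])
print(overlaps_any("10.0.0.0/22", routes), overlaps_any("10.32.0.0/22", routes))
```

On a real host the table text would come from reading `/proc/net/route` rather than the embedded sample.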

Describe the feature

  • A host collector that reads the routing table, compares it against the collector's input, and decides whether a subnet of the requested size is available within the user-provided range
  • A host analyzer that checks the result written to the file by the collector
# In this example, the collector checks the routing table and writes a `yes/no` result to the file
apiVersion: troubleshoot.sh/v1beta2
kind: HostPreflight
metadata:
  name: subnet-available
spec:
  collectors:
    # would output yes/no depending if there is a /22 available in 10.0.0.0/8
    - subnetAvailable:
        CIDRRangeAlloc: "10.0.0.0/8"
        desiredCIDR: "/22"
  analyzers:
    - subnetAvailable:
        outcomes:
          - fail:
              when: "no-subnet-available"
              message: failed to find available subnet
          - pass:
              when: "subnet-available"
              message: available /22 subnet found
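
To make the intended collector decision concrete, here is a minimal Python sketch of a first-fit search for a free /22 inside 10.0.0.0/8. It assumes the routes in use are already available as CIDR strings; the function names `find_available_subnet` and `subnet_available` are hypothetical, and this is not the code from the kURL utility linked below or from Troubleshoot.

```python
import ipaddress


def find_available_subnet(cidr_range, desired_prefix, routes):
    """Return the first subnet of size /desired_prefix in cidr_range that
    overlaps none of the given routes, or None if there is no such subnet."""
    pool = ipaddress.ip_network(cidr_range)
    in_use = [ipaddress.ip_network(r, strict=False) for r in routes]
    # subnets() raises ValueError if desired_prefix is shorter than the pool's prefix.
    for candidate in pool.subnets(new_prefix=desired_prefix):
        if not any(candidate.overlaps(r) for r in in_use):
            return candidate
    return None


def subnet_available(cidr_range, desired_prefix, routes):
    """The yes/no answer the collector would write to its result file."""
    return find_available_subnet(cidr_range, desired_prefix, routes) is not None


# Hypothetical route table: two ranges inside 10.0.0.0/8 are already taken.
routes = ["10.0.0.0/22", "10.0.4.0/24", "172.17.0.0/16"]
print(find_available_subnet("10.0.0.0/8", 22, routes))
print(subnet_available("10.0.0.0/8", 22, ["10.0.0.0/8"]))
```

A first-fit scan is the simplest choice here; for a /22 inside a /8 it visits at most 16384 candidates.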

Describe alternatives you've considered

Additional context

https://github.com/replicatedhq/kURL/blob/main/kurl_util/cmd/subnet/main.go

@diamonwiggins diamonwiggins added the type::feature New feature or request label Jan 26, 2023
@xavpaice xavpaice moved this from Next to In Progress in Troubleshoot Roadmap Feb 2, 2023
@CpuID
Contributor

CpuID commented Feb 5, 2023

Just thinking through the big picture view of this collector/analyzer, let's take the below scenario assuming the following input:

  collectors:
    # would output yes/no depending if there is a /22 available in 10.0.0.0/8
    - subnetAvailable:
        CIDRRangeAlloc: "10.0.0.0/8"
        desiredCIDR: "/22"
  analyzers:
    - subnetAvailable:
        outcomes:
          - fail:
              when: "no-subnet-available"
              message: failed to find available subnet
          - pass:
              when: "subnet-available"
              message: available /22 subnet found

It feels to me like there would be 2 different things potentially selecting a subnet...? And they may not necessarily pick the same subnet? Just thinking through the likelihood of race conditions here, is there a path to say "the host preflight chose this subnet, kURL use this subnet" when it gets further along, rather than the 2 different implementations?

@chris-sanders
Member

In the case of kURL, I think kURL templates its preflights and this might be a good use of that. If Troubleshoot allows you to specify the inputs listed above and kURL then sets them based on user input and defaults for a given install, that should unify the checks. It would be worth checking with @laverya from the kURL team if the above solves their use case as I'm describing or if we need additional considerations to ensure this is useful.

What do you think Andrew, does a spec as shown above work well for the kURL use case or is there more we should consider?

@CpuID
Contributor

CpuID commented Feb 6, 2023

If Troubleshoot allows you to specify the inputs listed above and kURL then sets them based on user input and defaults for a given install, that should unify the checks.

I wonder what comes first: kURL subnet selection, or running host preflights? I'll need to look over the codebase and try to see...

@laverya
Member

laverya commented Feb 6, 2023

kURL does template preflights, and to the best of my knowledge this occurs before we finalize the cluster subnet, so this would work. It would certainly be good to centralize the 'pick available subnet' logic, but I'm not sure the centralization is worth the effort. (If troubleshoot and kurl find different available subnets, that's fine, as long as both are actually available)

@CpuID
Contributor

CpuID commented Feb 6, 2023

(If troubleshoot and kurl find different available subnets, that's fine, as long as both are actually available)

yea that was probably my biggest concern here... the likelihood of race conditions if they find different available subnets, if we end up with code skew down the track etc between 2 implementations.

@laverya
Member

laverya commented Feb 6, 2023

As long as the preflight is just checking for the existence of such a subnet, will it matter? As far as I can tell that specific subnet won't ever be exposed to a user of troubleshoot, no?
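
To illustrate the point, here is a small Python sketch with two hypothetical selection strategies (lowest free subnet versus highest free subnet). They can pick different subnets, yet they always agree on whether one exists, which is all the preflight reports. The function names and the route list are assumptions for the example, not kURL's or Troubleshoot's actual logic.

```python
import ipaddress


def free_subnets(cidr_range, prefix, routes):
    """All /prefix subnets of cidr_range that overlap none of the routes."""
    pool = ipaddress.ip_network(cidr_range)
    in_use = [ipaddress.ip_network(r, strict=False) for r in routes]
    return [
        s for s in pool.subnets(new_prefix=prefix)
        if not any(s.overlaps(r) for r in in_use)
    ]


def pick_lowest(cidr_range, prefix, routes):
    """Hypothetical strategy A: take the first free subnet."""
    free = free_subnets(cidr_range, prefix, routes)
    return free[0] if free else None


def pick_highest(cidr_range, prefix, routes):
    """Hypothetical strategy B: take the last free subnet."""
    free = free_subnets(cidr_range, prefix, routes)
    return free[-1] if free else None


routes = ["10.0.0.0/16"]
low = pick_lowest("10.0.0.0/8", 22, routes)
high = pick_highest("10.0.0.0/8", 22, routes)
# Different subnets, same yes/no answer.
print(low, high, (low is not None) == (high is not None))
```

This only covers the existence check at a single point in time; it says nothing about routes that change between the preflight and the install.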

@CpuID
Contributor

CpuID commented Feb 6, 2023

As long as the preflight is just checking for the existence of such a subnet, will it matter?

it might be fine... I could be overthinking it here :)

As far as I can tell that specific subnet won't ever be exposed to a user of troubleshoot, no?

correct 👍

@CpuID
Contributor

CpuID commented Feb 15, 2023

I'm going to slightly vary the input syntax here, and use an integer for desiredCIDR instead of the string "/22".

Example: desiredCIDR: 22
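
As a sketch of what an integer input makes easy to validate, the hypothetical helper below checks that the prefix length is an integer and that it fits inside the allocation range. The name `validate_desired_cidr` and the exact error behavior are assumptions for illustration, not Troubleshoot's actual validation.

```python
import ipaddress


def validate_desired_cidr(cidr_range, desired_cidr):
    """Return desired_cidr if it is an integer prefix length that fits in cidr_range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(desired_cidr, bool) or not isinstance(desired_cidr, int):
        raise TypeError(f"desiredCIDR must be an integer, got {desired_cidr!r}")
    pool = ipaddress.ip_network(cidr_range)
    if not pool.prefixlen <= desired_cidr <= pool.max_prefixlen:
        raise ValueError(
            f"desiredCIDR {desired_cidr} must be between "
            f"{pool.prefixlen} and {pool.max_prefixlen} for {pool}"
        )
    return desired_cidr


print(validate_desired_cidr("10.0.0.0/8", 22))
```

With the old string form, the same check would first need to strip the leading slash and parse the remainder.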

@CpuID
Contributor

CpuID commented Feb 17, 2023

Slightly varying this also:

"subnet-available"

Becomes:

"a-subnet-is-available"

This clarifies that it's not the whole 10.0.0.0/8 that is available, just a /22 within it. Hoping it leads to less confusion and reads more naturally for an end user.
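
For illustration, here is a minimal Python sketch of how an analyzer could match the collector's result against the `when` conditions using the renamed string. It assumes the fail condition keeps the name "no-subnet-available" (only the pass condition is renamed above), and the `analyze` function and `OUTCOMES` structure are hypothetical stand-ins for the YAML outcomes, not Troubleshoot's actual analyzer.

```python
# Outcomes mirroring the YAML spec, with the renamed pass condition.
# Assumption: the fail condition name is unchanged.
OUTCOMES = [
    {"fail": {"when": "no-subnet-available",
              "message": "failed to find available subnet"}},
    {"pass": {"when": "a-subnet-is-available",
              "message": "available /22 subnet found"}},
]


def analyze(collector_result, outcomes):
    """Return (status, message) for the first outcome whose `when` matches,
    or None if no outcome matches the collector's result."""
    for outcome in outcomes:
        for status, spec in outcome.items():
            if spec.get("when") == collector_result:
                return status, spec["message"]
    return None


print(analyze("a-subnet-is-available", OUTCOMES))
print(analyze("no-subnet-available", OUTCOMES))
```

In this sketch the old string "subnet-available" no longer matches any outcome, so specs written against the original proposal would need updating.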

@github-project-automation github-project-automation bot moved this from In Progress to Done in Troubleshoot Roadmap Mar 10, 2023
@CpuID
Contributor

CpuID commented Mar 10, 2023

Docs PR replicatedhq/troubleshoot.sh#473
