
Expose control- and workload-plane networks to bastion VM in multi-nodes deployment #889

Closed
gdemonet opened this issue Mar 29, 2019 · 3 comments
Labels: kind:debt (Technical debt), topic:ci (Continuous integration and build orchestration)

@gdemonet (Contributor)

The current multi-node deployment in Eve roughly works as follows:

  1. We spawn an OpenStack worker, later named the Bastion
  2. From the Bastion, we run a Terraform plan to spawn a Bootstrap node, along with a few other Nodes for cluster expansion

The Bastion is thus responsible for creating the networks used by the Bootstrap node and the other Nodes. However, contrary to the local Vagrant configuration, the Bastion cannot attach itself to these networks. When we run the test suite remotely from the Bastion (as we would locally, outside of the Vagrant VMs), it cannot use the IPs from these networks, which prevents the tests from accessing the deployed services over HTTP(S) (currently, most of the tests use SSH to run kubectl directly on the Bootstrap node).

Exposing these networks to the Bastion should be done as soon as possible, to avoid accumulating workarounds in our test suite.
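To make the gap concrete, here is a sketch of what the test suite can and cannot do from the Bastion today (all addresses, user names and paths below are hypothetical):

```bash
# Works today: SSH to the Bootstrap node's public IP and run kubectl there.
ssh centos@198.51.100.10 \
    "sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes"

# Does not work: the control-plane network (say 192.168.1.0/24) is not
# routed to the Bastion, so direct HTTP(S) checks against deployed
# services time out.
curl --max-time 5 --insecure https://192.168.1.240:6443/healthz
```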

gdemonet added the moonshot, topic:ci (Continuous integration and build orchestration), and kind:debt (Technical debt) labels Mar 29, 2019
gdemonet mentioned this issue Mar 29, 2019 (1 task)
@nootal (Contributor) commented Mar 29, 2019

Another option, maybe technically easier than exposing the internal networks to the CI worker, would be to create a true bastion/client that is part of the internal networks but not part of the k8s cluster.

The CI worker would then have to create another machine, attach it to the k8s networks, and execute the test suite from it. That would mean cloning the repo (or at least the test suite) on this new VM and installing its dependencies.

cc @gdemonet

@gdemonet (Contributor, Author)

That is indeed a solution, but it means adding a bunch of ssh and scp steps to run the tests and retrieve their results for pushing into artifacts. I'll try to set up an IP-in-IP tunnel, propose a PR, and if the team deems it readable enough, merge it.

The biggest advantage of such an approach is its similarity with our local Vagrant deployment (no ssh involved).
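For reference, a minimal sketch of such an IP-in-IP tunnel with iproute2, assuming the router node bridges onto the private networks (all interface names and addresses here are hypothetical, not the actual CI values):

```bash
# On the Bastion (public IP 198.51.100.10), tunnel towards the router
# node (public IP 198.51.100.20):
ip tunnel add tunl-cp mode ipip local 198.51.100.10 remote 198.51.100.20
ip addr add 192.168.1.254/32 dev tunl-cp
ip link set tunl-cp up
# Route the control-plane network through the tunnel.
ip route add 192.168.1.0/24 dev tunl-cp

# The router node needs the mirror tunnel, plus forwarding enabled so it
# can relay packets onto the private network:
#   ip tunnel add tunl-cp mode ipip local 198.51.100.20 remote 198.51.100.10
#   sysctl -w net.ipv4.ip_forward=1
```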

gdemonet self-assigned this Apr 1, 2019
nootal added this to the MetalK8s 2.0.0-alpha3 milestone Apr 2, 2019
gdemonet added a commit that referenced this issue Apr 9, 2019
We only generated an SSH config entry for the bootstrap node, but we now also need to SSH into the router node.

A simple Bash script is introduced to generate the SSH config file (it may be rewritten in Python when we extract this Terraform tooling for use outside of our CI context).

Issue: GH-889
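A minimal sketch of what such a generation script could look like (the Terraform output names, user and key path are assumptions):

```bash
#!/bin/bash
set -eu

# Read the relevant IPs from the Terraform state.
BOOTSTRAP_IP=$(terraform output bootstrap_ip)
ROUTER_IP=$(terraform output router_ip)

# Emit one Host entry per VM, so each can be reached by name.
cat > ssh_config <<EOF
Host bootstrap
    HostName ${BOOTSTRAP_IP}
    User centos
    IdentityFile ~/.ssh/terraform
    StrictHostKeyChecking no

Host router
    HostName ${ROUTER_IP}
    User centos
    IdentityFile ~/.ssh/terraform
    StrictHostKeyChecking no
EOF
```

One can then run `ssh -F ssh_config router` without caring about the actual addresses.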
gdemonet added a commit that referenced this issue Apr 9, 2019
We wanted to group all IP addresses under a single output variable, so that one could run `terraform output ips` to get a clear view of what was spawned (and so we could generate an SSH config file for accessing any of the spawned VMs by name).

However, the "splat" syntax we wanted to use is not supported in Terraform <= 0.11, so we just ignore the nodes (other than bootstrap and router) for now.

For reference: hashicorp/terraform#17048

Issue: GH-889
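For illustration, once such a grouped output exists, consuming it from the worker could look like this (the output name and map keys are hypothetical):

```bash
# Print the whole "ips" map at once:
terraform output ips
# Or read a single entry from a script, going through the JSON form:
terraform output -json | python -c \
    'import json, sys; print(json.load(sys.stdin)["ips"]["value"]["bootstrap"])'
```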
gdemonet added a commit that referenced this issue Apr 11, 2019
Previously, we only defined a single "private" network for the spawned
nodes in Terraform. We now define one for control-plane and one for
workload-plane, and attach all nodes to them.

This change impacts the `BootstrapConfiguration` we shipped with this
Terraform deployment.

Issue: GH-889
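For illustration, the shipped configuration could then declare both networks along these lines (a sketch only; the field names, API version and CIDRs are assumptions, not the actual shipped values):

```bash
# Hypothetical sketch: write a BootstrapConfiguration that references
# the two private networks created by Terraform.
cat > bootstrap-config.yaml <<EOF
apiVersion: metalk8s.scality.com/v1alpha2
kind: BootstrapConfiguration
networks:
  controlPlane: 192.168.1.0/24
  workloadPlane: 192.168.2.0/24
EOF
```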
gdemonet added a commit that referenced this issue Apr 11, 2019
We previously considered the OpenStack worker from Eve as the bastion from which to orchestrate multi-node deployments in the CI. However, since we cannot attach this worker to the private networks it deployed, we introduce another VM, which we call the "bastion", and which will assume (in the long term) the responsibility of installing and testing the product.

Issue: GH-889
gdemonet added a commit that referenced this issue Apr 11, 2019
These files (one for the worker, one for the bastion) need data from
the Terraform state, so we decided to generate them using Terraform
templating.

Issue: GH-889
gdemonet added a commit that referenced this issue Apr 11, 2019
This approach naively copies test files from the worker to the bastion, which we may not want to do in the long run. In the meantime, it allows tests to work the same way in single- and multi-node deployments.

Issue: GH-889
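In practice, this amounts to something like the following (host aliases, paths and the test runner invocation are illustrative):

```bash
# Push the test suite to the bastion, where the control- and
# workload-plane networks are reachable, and run it there.
scp -F ssh_config -r tests/ bastion:tests/
ssh -F ssh_config bastion 'cd tests && pytest'

# Pull the results back so the worker can publish them as artifacts.
scp -F ssh_config bastion:tests/results.xml artifacts/
```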
gdemonet added a commit that referenced this issue Apr 11, 2019
When voluntarily interrupting a formula, we use the
"test.fail_without_changes" state. To pass extra information, we used
the "msg" keyword, which is not valid. Instead, we now use the "comment"
keyword.

This was detected when the formulas could not find available IPs, and
the state return dict didn't show the expected information message.

Issue: GH-889
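A minimal sketch of the corrected pattern (the state ID and message are made up):

```bash
# Hypothetical SLS snippet using the valid "comment" keyword:
cat > example.sls <<'EOF'
Abort when no IP is available:
  test.fail_without_changes:
    - comment: No available IP found on the control-plane network
EOF
```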
gdemonet added a commit that referenced this issue Apr 12, 2019
To guard against leaving hard-coded values behind, we should make sure the Terraform and Vagrant environments differ in as many constants as possible (network ranges, hostnames, mount points...).

Issue: GH-889
gdemonet added a commit that referenced this issue Apr 15, 2019
We also extract the few dhclient calls into a script, which retries until an IP is obtained (we had flaky situations where an IP could be missing).

Issue: GH-889
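A sketch of that retry logic (the interface name, attempt count and delay are hypothetical):

```bash
#!/bin/bash
# Retry dhclient until the interface actually holds an IPv4 address.
iface=eth1
ip=""
for _ in $(seq 1 10); do
    dhclient "$iface"
    ip=$(ip -4 -o addr show dev "$iface" | awk '{print $4}')
    [ -n "$ip" ] && break
    sleep 5
done
[ -n "$ip" ] || { echo "no IP obtained on $iface" >&2; exit 1; }
echo "got $ip on $iface"
```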
gdemonet added a commit that referenced this issue Apr 15, 2019
The options "--bootstrap-ip" and "--skip-tls-verify" were only
introduced to cope with the limitations of our multi-nodes deployment
in CI. It should now be fixed.

Issue: GH-889
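Concretely (the test runner invocation below is illustrative), this removes the workaround flags:

```bash
# Before: point the tests at the bootstrap node explicitly and skip TLS
# verification, since the worker could not reach the real endpoints.
#   pytest tests/ --bootstrap-ip=192.168.1.10 --skip-tls-verify

# After: the bastion sits on the control- and workload-plane networks,
# so the defaults just work.
#   pytest tests/
```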
@gdemonet (Contributor, Author)

Closed by #988
