
Expose control- and workload-plane networks to bastion VM in multi-nodes deployment #889

Closed
gdemonet opened this issue Mar 29, 2019 · 3 comments
Labels: kind:debt (Technical debt), topic:ci (Continuous integration and build orchestration)

@gdemonet (Contributor)

The current multi-node deployment in Eve roughly works as follows:

  1. We spawn an OpenStack worker, later named the Bastion
  2. From the Bastion, we run a Terraform plan to spawn a Bootstrap node, along with a few other Nodes for cluster expansion

The Bastion is thus responsible for creating the networks used by the Bootstrap node and the other Nodes. However, contrary to the local Vagrant configuration, the Bastion cannot attach itself to these networks. When we run the test suite remotely from the Bastion (as we would locally, outside of the Vagrant VMs), it cannot use the IPs from these networks, which prevents the tests from accessing the deployed services over HTTP(S) (currently, most of the tests use SSH to run kubectl directly on the Bootstrap node).

Exposing these networks to the Bastion should be done as soon as possible, to avoid accumulating workarounds in our test suite.
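To make the gap concrete, here is a sketch of what the test suite can and cannot do from the Bastion today (all addresses, user names and paths below are hypothetical):

```bash
# Works today: SSH to the Bootstrap node's public IP and run kubectl there.
ssh centos@198.51.100.10 \
    "sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes"

# Does not work: the control-plane network (say 192.168.1.0/24) is not
# routed to the Bastion, so direct HTTP(S) checks against deployed
# services time out.
curl --max-time 5 --insecure https://192.168.1.240:6443/healthz
```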

gdemonet added the moonshot, topic:ci (Continuous integration and build orchestration), and kind:debt (Technical debt) labels Mar 29, 2019
gdemonet mentioned this issue Mar 29, 2019 (1 task)
@nootal (Contributor) commented Mar 29, 2019

Another option, maybe technically easier than exposing the internal networks to the CI worker, would be to create a true bastion/client that is part of the internal networks but not part of the k8s cluster.

The CI worker would then have to create another machine, attach it to the k8s networks, and execute the test suite from it. That would mean cloning the repo (or at least the test suite) on this new VM and installing its dependencies.

cc @gdemonet

@gdemonet (Contributor, Author)

That is indeed a solution, but it means adding a bunch of ssh and scp steps to run the tests and retrieve their results for pushing into artifacts. I'll try to set up an IP-in-IP tunnel, propose a PR, and if the team deems it readable enough, merge it.

The biggest advantage of such an approach is its similarity with our local Vagrant deployment (no ssh involved).
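For reference, a minimal sketch of such an IP-in-IP tunnel with iproute2, assuming the router node bridges onto the private networks (all interface names and addresses here are hypothetical, not the actual CI values):

```bash
# On the Bastion (public IP 198.51.100.10), tunnel towards the router
# node (public IP 198.51.100.20):
ip tunnel add tunl-cp mode ipip local 198.51.100.10 remote 198.51.100.20
ip addr add 192.168.1.254/32 dev tunl-cp
ip link set tunl-cp up
# Route the control-plane network through the tunnel.
ip route add 192.168.1.0/24 dev tunl-cp

# The router node needs the mirror tunnel, plus forwarding enabled so it
# can relay packets onto the private network:
#   ip tunnel add tunl-cp mode ipip local 198.51.100.20 remote 198.51.100.10
#   sysctl -w net.ipv4.ip_forward=1
```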

gdemonet self-assigned this Apr 1, 2019
nootal added this to the MetalK8s 2.0.0-alpha3 milestone Apr 2, 2019
gdemonet added a commit that referenced this issue Apr 9, 2019
We only generated an SSH config entry for the bootstrap node, but we now also need to SSH into the router node.

A simple Bash script is introduced to generate the SSH config file (it may be rewritten in Python when we extract this Terraform tooling for use outside of our CI context).

Issue: GH-889
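A minimal sketch of what such a generation script could look like (the Terraform output names, user and key path are assumptions):

```bash
#!/bin/bash
set -eu

# Read the relevant IPs from the Terraform state.
BOOTSTRAP_IP=$(terraform output bootstrap_ip)
ROUTER_IP=$(terraform output router_ip)

# Emit one Host entry per VM, so each can be reached by name.
cat > ssh_config <<EOF
Host bootstrap
    HostName ${BOOTSTRAP_IP}
    User centos
    IdentityFile ~/.ssh/terraform
    StrictHostKeyChecking no

Host router
    HostName ${ROUTER_IP}
    User centos
    IdentityFile ~/.ssh/terraform
    StrictHostKeyChecking no
EOF
```

One can then run `ssh -F ssh_config router` without caring about the actual addresses.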
gdemonet added a commit that referenced this issue Apr 9, 2019
We wanted to group all IP addresses under a single output variable, so that one could run `terraform output ips` to get a clear view of what was spawned (and so we could generate an SSH config file for accessing any of the spawned VMs by name).

However, the "splat" syntax we wanted to use is not supported in Terraform <= 0.11, so we just ignore the nodes (other than bootstrap and router) for now.

For reference: hashicorp/terraform#17048

Issue: GH-889
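For illustration, once such a grouped output exists, consuming it from the worker could look like this (the output name and map keys are hypothetical):

```bash
# Print the whole "ips" map at once:
terraform output ips
# Or read a single entry from a script, going through the JSON form:
terraform output -json | python -c \
    'import json, sys; print(json.load(sys.stdin)["ips"]["value"]["bootstrap"])'
```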
gdemonet added a commit that referenced this issue Apr 11, 2019
Previously, we only defined a single "private" network for the spawned
nodes in Terraform. We now define one for control-plane and one for
workload-plane, and attach all nodes to them.

This change impacts the `BootstrapConfiguration` we shipped with this
Terraform deployment.

Issue: GH-889
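For illustration, the shipped configuration could then declare both networks along these lines (a sketch only; the field names, API version and CIDRs are assumptions, not the actual shipped values):

```bash
# Hypothetical sketch: write a BootstrapConfiguration that references
# the two private networks created by Terraform.
cat > bootstrap-config.yaml <<EOF
apiVersion: metalk8s.scality.com/v1alpha2
kind: BootstrapConfiguration
networks:
  controlPlane: 192.168.1.0/24
  workloadPlane: 192.168.2.0/24
EOF
```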
gdemonet added a commit that referenced this issue Apr 11, 2019
We previously considered the OpenStack worker from Eve as the bastion from which to orchestrate multi-node deployments in the CI. However, since we cannot attach this worker to the private networks it deployed, we introduce another VM, which we call the "bastion", and which will assume (in the long term) the responsibility of installing and testing the product.

Issue: GH-889
gdemonet added a commit that referenced this issue Apr 11, 2019
These files (one for the worker, one for the bastion) need data from
the Terraform state, so we decided to generate them using Terraform
templating.

Issue: GH-889
gdemonet added a commit that referenced this issue Apr 11, 2019
This approach naively copies test files from the worker to the bastion, which we may not want to do in the long run. In the meantime, it allows tests to work the same way in single- and multi-node deployments.

Issue: GH-889
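In practice, this amounts to something like the following (host aliases, paths and the test runner invocation are illustrative):

```bash
# Push the test suite to the bastion, where the control- and
# workload-plane networks are reachable, and run it there.
scp -F ssh_config -r tests/ bastion:tests/
ssh -F ssh_config bastion 'cd tests && pytest'

# Pull the results back so the worker can publish them as artifacts.
scp -F ssh_config bastion:tests/results.xml artifacts/
```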
gdemonet added a commit that referenced this issue Apr 11, 2019
When voluntarily interrupting a formula, we use the
"test.fail_without_changes" state. To pass extra information, we used
the "msg" keyword, which is not valid. Instead, we now use the "comment"
keyword.

This was detected when the formulas could not find available IPs, and
the state return dict didn't show the expected information message.

Issue: GH-889
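A minimal sketch of the corrected pattern (the state ID and message are made up):

```bash
# Hypothetical SLS snippet using the valid "comment" keyword:
cat > example.sls <<'EOF'
Abort when no IP is available:
  test.fail_without_changes:
    - comment: No available IP found on the control-plane network
EOF
```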
gdemonet added a commit that referenced this issue Apr 12, 2019
To guard against leaving hard-coded values behind, we should make sure the Terraform and Vagrant environments differ in as many constants as possible (network ranges, hostnames, mount points...).

Issue: GH-889
gdemonet added a commit that referenced this issue Apr 15, 2019
We also extract the few dhclient calls into a script, which retries until an IP is obtained (we had flaky situations where an IP could be missing).

Issue: GH-889
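A sketch of that retry logic (the interface name, attempt count and delay are hypothetical):

```bash
#!/bin/bash
# Retry dhclient until the interface actually holds an IPv4 address.
iface=eth1
ip=""
for _ in $(seq 1 10); do
    dhclient "$iface"
    ip=$(ip -4 -o addr show dev "$iface" | awk '{print $4}')
    [ -n "$ip" ] && break
    sleep 5
done
[ -n "$ip" ] || { echo "no IP obtained on $iface" >&2; exit 1; }
echo "got $ip on $iface"
```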
gdemonet added a commit that referenced this issue Apr 15, 2019
The options "--bootstrap-ip" and "--skip-tls-verify" were only
introduced to cope with the limitations of our multi-nodes deployment
in CI. It should now be fixed.

Issue: GH-889
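Concretely (the test runner invocation below is illustrative), this removes the workaround flags:

```bash
# Before: point the tests at the bootstrap node explicitly and skip TLS
# verification, since the worker could not reach the real endpoints.
#   pytest tests/ --bootstrap-ip=192.168.1.10 --skip-tls-verify

# After: the bastion sits on the control- and workload-plane networks,
# so the defaults just work.
#   pytest tests/
```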
@gdemonet (Contributor, Author)

Closed by #988
