This repo describes a way to provision free resources in the Oracle Cloud for tenancies with an Always Free subscription.
- Sign up for Oracle Cloud here.
Choose the home region carefully, as it can't be changed later, and Always Free tenancies can't provision resources in other regions. Prefer regions with multiple availability domains. Check out the list of regions here.
- Download the Terraform CLI and make it available on the PATH. Check that Terraform is installed with `terraform -v`.
- Create an Oracle API signing key pair. To do this, log in to your Oracle Cloud account and go to User Settings -> API Keys.
Then select Generate API Key Pair. Download both the public and the private key and put them into the `.oci` folder in your home dir (e.g. `~/.oci` or `%USERPROFILE%\.oci`). Click Add.
Note the configuration keys from the displayed snippet (an example of its shape is shown below). They will be used to configure the Terraform `oci` provider.
To access the configuration snippet later, press the 3-dots button next to the key's fingerprint -> View Configuration file.
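The displayed snippet follows the standard OCI config-file format and has roughly this shape (placeholder values shown):

```
[DEFAULT]
user=ocid1.user.oc1...
fingerprint=60:...:c3
tenancy=ocid1.tenancy.oc1...
region=uk-london-1
key_file=<path to your private key file>
```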
- Prepare an ssh key pair.
To generate one, go to the `.ssh` folder in your home dir (`mkdir -p ~/.ssh && cd ~/.ssh` or `mkdir %USERPROFILE%\.ssh & cd %USERPROFILE%\.ssh`), and then call `ssh-keygen`.
On Windows it's easier to switch to WSL (run `bash` in the `.ssh` dir) to generate the key pair.

```
ssh-keygen -t ed25519 -N "" -C ssh-key -f id_ed25519
```
- Clone the repo.

```
git clone https://github.com/egorshulga/oci-always-free-k8s && cd oci-always-free-k8s
```
- Copy the file `variables.auto.tfvars.example` to `variables.auto.tfvars`. Set correct values for all of the variables (please note that backslashes in paths should be escaped: `\` -> `\\`). A filled-in example is shown after the table below.
Variable | Example | Description |
---|---|---|
`region` | uk-london-1 | Taken from the Oracle Cloud API Key configuration snippet (see the API Keys step above). |
`tenancy_ocid` | ocid1.tenancy.oc1... | |
`user_ocid` | ocid1.user.oc1... | |
`fingerprint` | 60:...:c3 | |
`private_key_path` | C:\Users\...\.oci\oci-tf.pem | Absolute path to the Oracle Cloud private key. |
`ssh_key_pub_path` | C:\Users\...\.ssh\id_ed25519.pub | Absolute path to the public ssh key. Used to configure access to the created compute instances. |
`ssh_key_path` | C:\Users\...\.ssh\id_ed25519 | Absolute path to the private ssh key. Used to bootstrap k8s and other apps on the provisioned compute instances. |
`cluster_public_dns_name` | cluster.example.com | Optional. Specifies a DNS name the cluster will be available on. |
`letsencrypt_registration_email` | email@example.com | Email address used to register with LetsEncrypt (to issue certificates securing the ingress resources managed by nginx-ingress-controller). |
`windows_overwrite_local_kube_config` | false | Whether the local kube config (`%USERPROFILE%\.kube\config`) should be overwritten with the one from the newly created cluster. |
`debug_create_cluster_admin` | false | Whether an admin should be created in the cluster and its token printed to the output (to access the dashboard right after cluster creation). |
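Putting it together, a filled-in `variables.auto.tfvars` might look roughly like this (illustrative values only; the variable names are the ones from the table above):

```hcl
region                              = "uk-london-1"
tenancy_ocid                        = "ocid1.tenancy.oc1..."
user_ocid                           = "ocid1.user.oc1..."
fingerprint                         = "60:...:c3"
private_key_path                    = "C:\\Users\\...\\.oci\\oci-tf.pem"
ssh_key_pub_path                    = "C:\\Users\\...\\.ssh\\id_ed25519.pub"
ssh_key_path                        = "C:\\Users\\...\\.ssh\\id_ed25519"
cluster_public_dns_name             = "cluster.example.com"
letsencrypt_registration_email      = "email@example.com"
windows_overwrite_local_kube_config = false
debug_create_cluster_admin          = false
```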
- Run Terraform.

```
terraform init
terraform apply
```

Terraform displays a list of changes it is going to apply to the resources. Check it carefully, and then answer `yes`.
Example output
```
> terraform init
Initializing modules...
- compute in compute
- governance in governance
- k8s in k8s
- k8s_scaffold in k8s-scaffold
- network in network
Downloading registry.terraform.io/oracle-terraform-modules/vcn/oci 3.1.0 for network.vcn...
- network.vcn in .terraform\modules\network.vcn
- network.vcn.drg_from_vcn_module in .terraform\modules\network.vcn\modules\drg
Initializing the backend...
Initializing provider plugins...
- Reusing previous version of hashicorp/null from the dependency lock file
- Reusing previous version of hashicorp/oci from the dependency lock file
- Installing hashicorp/null v3.1.0...
- Installed hashicorp/null v3.1.0 (signed by HashiCorp)
- Installing hashicorp/oci v4.57.0...
- Installed hashicorp/oci v4.57.0 (signed by HashiCorp)
Terraform has made some changes to the provider dependency selections recorded
in the .terraform.lock.hcl file. Review those changes and commit them to your
version control system if they represent changes you intended to make.
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
> terraform apply
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create
<= read (data resources)
Terraform will perform the following actions:
(... lots of resources ...)
Plan: 41 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ cluster_public_ip = (known after apply)
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value:
```
- Open the output value of `cluster_public_ip` in a browser. Nginx should show a 404 page.
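The value can be printed again at any time with Terraform's standard output command:

```
terraform output cluster_public_ip
```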
- If you've set up a public DNS name, go to `http://{cluster_public_address}/dashboard`. It should redirect to HTTPS and open the k8s dashboard login page. The HTTPS connection should be established successfully, and the browser should show a secure lock icon in the address bar, meaning the certificate was correctly issued by LetsEncrypt.
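The same check can be done from the command line, e.g. with curl (the `-L` flag follows the HTTP-to-HTTPS redirect; replace the host with your own DNS name):

```
curl -IL http://cluster.example.com/dashboard/
```

The final response should come over HTTPS; curl would fail with a certificate error if the LetsEncrypt certificate were not issued correctly.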
- Run `kubectl cluster-info && kubectl get nodes`.
Example output
```
Kubernetes control plane is running at https://cluster.example.com:6443
CoreDNS is running at https://cluster.example.com:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
NAME       STATUS   ROLES                  AGE   VERSION
leader     Ready    control-plane,master   25h   v1.23.1
worker-0   Ready    worker                 25h   v1.23.1
worker-1   Ready    worker                 25h   v1.23.1
worker-2   Ready    worker                 25h   v1.23.1
```
- SSH to the leader instance.
Example output
```
> ssh ubuntu@{cluster-public-ip}
Welcome to Ubuntu 20.04.3 LTS (GNU/Linux 5.11.0-1022-oracle aarch64)
This is a leader instance, which was provisioned by Terraform
ubuntu@leader:~$
```
- SSH to worker instances. This can be achieved by connecting to the workers through the leader instance, which acts as a bastion.
Example output
```
> ssh -J ubuntu@{cluster-public-ip} ubuntu@worker-0.private.vcn.oraclevcn.com
Welcome to Ubuntu 20.04.3 LTS (GNU/Linux 5.11.0-1022-oracle aarch64)
This is a worker instance, which was provisioned by Terraform
ubuntu@worker-0:~$
```
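To avoid retyping the jump-host flags, an entry like the following can be added to `~/.ssh/config` (a sketch: the `oci-leader`/`oci-worker-0` aliases are arbitrary; the worker host name and the public IP placeholder come from the examples above):

```
Host oci-leader
  HostName {cluster-public-ip}
  User ubuntu

Host oci-worker-0
  HostName worker-0.private.vcn.oraclevcn.com
  User ubuntu
  ProxyJump oci-leader
```

After that, `ssh oci-worker-0` connects through the leader automatically.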
Below is a list of the Oracle Cloud resources that are provisioned as a result of applying the scripts. Limits are provided for reference; they are up to date as of January 14, 2022.
Please note that if you already have some resources in your tenancy, the scripts may fail due to the limits imposed by Oracle. You may need to change some resource values (e.g. the count of provisioned workers in main.tf).
| Module (as in source code) | Resource | Used Count | Service Limit | Description |
|---|---|---|---|---|
| Governance | Compartment | 1 | 1000 | A separate compartment is created to hold all of the provisioned resources. |
| Network | VCN | 1 | 50 | Compute instances are connected to a Virtual Cloud Network. |
| | Subnet | 1 | 300 (per VCN) | The VCN is configured to have a public subnet. |
| | Network Load Balancer | 1 | 3 | The Network Load Balancer serves as the entry point for requests coming to the cluster. It works on OSI layers 3/4 and has no bandwidth configuration requirement. It is connected to the public subnet. |
| | Reserved Public IP | 1 | 1 | The reserved public IP is assigned to the Network Load Balancer. |
| | Ephemeral Public IP | 4 | 1 (per VNIC), 2 (per VM) | Ephemeral public IPs are assigned to the VMs. |
| | Internet Gateway | 1 | 1 (per VCN) | The Internet gateway enables internet connectivity for resources in the public subnet. |
| | NAT Gateway | 0 | 0 | A NAT gateway enables outbound internet connectivity for resources in a private subnet. It is not available in the Always Free tier (as of January 2022). |
| | Service Gateway | 0 | 0 | A service gateway enables private subnet resources to access Oracle infrastructure (e.g. for metrics collection). It is not available in the Always Free tier (as of January 2022). |
| Compute | Cores for Standard.A1 VMs | 4 | 4 | Provisioned resources include 4 ARM-based VMs, each with 1 OCPU. The leader instance has 2 GB of memory; there are 3 workers with 7 GB of memory each. |
| | Memory for Standard.A1 VMs (GB) | 24 | 24 | |
As of January 2022 Oracle does not allow creation of NAT and Service gateways in Always Free VCNs, which makes private subnets effectively unusable (without a NAT gateway they cannot access the internet, and without a Service gateway Oracle cannot collect metrics from instances).
That is why in the Always Free tier a private subnet is not created. Instead, all compute resources are connected to a public subnet. To allow connections to the internet, they are assigned ephemeral public IPs.
The load balancer is assigned a reserved public IP, so all of the traffic is still balanced between workers.
When the account is upgraded from the Always Free tier to Pay-as-you-go, the limitation is removed, which allows provisioning a proper private subnet and hiding the compute instances from being directly accessible from the internet.
The script provisions a K8s cluster on the leader and worker VMs. Below you can see a list of resources that are available in the K8s cluster once it is provisioned.
Resource | Name | Notes |
---|---|---|
Network plugin | Flannel | |
Ingress controller | kubernetes/ingress-nginx | Service is deployed via a NodePort (see ports below) |
ClusterIssuer | LetsEncrypt | Uses cert-manager (an example Ingress is sketched below the table) |
Dashboard | kubernetes/dashboard | Available on https://{cluster-ip}/dashboard/ or https://{cluster-dns-name}/dashboard/ (the latter uses a LetsEncrypt certificate) |
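For reference, an application Ingress that uses the nginx controller and requests a LetsEncrypt certificate through the ClusterIssuer would look roughly like the sketch below. The resource names (`my-app`), the host and the issuer name `letsencrypt` are assumptions for illustration; check the actual issuer name in the cluster (e.g. with `kubectl get clusterissuers`).

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app                  # hypothetical application name
  annotations:
    # assumed ClusterIssuer name; verify with `kubectl get clusterissuers`
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - cluster.example.com   # your cluster_public_dns_name
      secretName: my-app-tls    # cert-manager stores the issued certificate here
  rules:
    - host: cluster.example.com
      http:
        paths:
          - path: /my-app
            pathType: Prefix
            backend:
              service:
                name: my-app    # hypothetical Service exposing the application
                port:
                  number: 80
```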
As all of the compute instances are connected to the public subnet, outbound internet connectivity works without a NAT gateway. There are no egress security rules imposed (outgoing connections are allowed to go to `0.0.0.0/0`).
Ingress connectivity is achieved via the Network Load Balancer, which is reachable from the internet via its public IP. Below is a list of the open ports. There are no security rules limiting source IPs (incoming connections are allowed to originate from `0.0.0.0/0`).
| Port | Protocol | Destination | Destination port | Description |
|---|---|---|---|---|
| 22 | TCP | Leader instance | 22 | SSH to the leader. Can also be used to connect to worker instances (using the leader as a bastion). |
| 6443 | TCP | Leader instance | 6443 | kubectl access to the K8s control plane (deployed on the leader instance). The kube config is pulled after spinning up the control plane. |
| 80 | TCP | Workers | 30080 | Forwarding of HTTP and HTTPS traffic to the NGINX ingress controller, which is exposed with NodePorts on the worker instances. HTTPS offloading is performed by the ingress controller via a certificate issued by LetsEncrypt. |
| 443 | TCP | Workers | 30443 | (see the row above) |
This error means that Oracle has run out of free ARM compute capacity in the selected region.
A possible workaround is to switch to another availability domain for provisioning the compute resources (see main.tf), or to retry cluster provisioning after a few days (Oracle promises to deploy new capacity over time).
That's a tricky error to debug, but my guess is that we create lots of resources under the Network Load Balancer (listeners, backend sets, backends). Oracle Cloud creates them sequentially, one by one, and it appears that sometimes a race condition happens on Oracle's side (multiple NLB resources compete to be created), which results in this error.
The workaround is to manually run `terraform apply` once again. Terraform will continue provisioning resources from the point where it stopped.