Move backup config from ws to operator #89

Merged on Feb 3, 2021 (2 commits)
3 changes: 2 additions & 1 deletion Makefile
@@ -47,7 +47,8 @@ local: image push
# Same as above - image still needs to be built and pushed/loaded
.PHONY: deploy
deploy: image push
$(BUILD_BIN) install --context $(KUBECTX) --local --image $(IMG):$(TAG) $(DEPLOY_FLAGS)
$(BUILD_BIN) install --context $(KUBECTX) --local --image $(IMG):$(TAG) $(DEPLOY_FLAGS) \
--backup-provider=gcs --gcs-bucket=$(BACKUP_BUCKET)

# Tail operator logs
.PHONY: logs
14 changes: 10 additions & 4 deletions README.md
@@ -122,15 +122,19 @@ Terraform state is stored in a secret using the [kubernetes backend](https://www

Note: Do not define a backend in your terraform configuration - it will conflict with the configuration Etok automatically installs.

### State Persistence
### State Backup and Restore

Persistence of state to cloud storage is supported. If enabled, every update to the state is backed up to a cloud storage bucket.
Backup of state to cloud storage is supported. If enabled, every update to state is backed up to a cloud storage bucket. When a new workspace is created, the operator checks if a backup exists. If so, it is restored.

To enable persistence, pass the name of an existing bucket via the `--backup-bucket` flag when creating a new workspace with `workspace new`. If the secret storing the state cannot be found, the workspace checks if a backup exists in the bucket. If found, it restores the state to the secret.
To enable backups, install the operator with the relevant flags. For example, to back up to a GCS bucket:

```
etok install --backup-provider=gcs --gcs-bucket=backups-bucket
```

Note: only GCS is supported at present.

The operator is responsible for persisting the state. Therefore be sure to provide the appropriate credentials to the operator at install time. Either provide the path to a file containing a GCP service account key via the `--secret-file` flag, or setup workload identity (see below). The service account needs the following permissions on the bucket:
Be sure to provide the appropriate credentials to the operator at install time. Either provide the path to a file containing a GCP service account key via the `--secret-file` flag, or set up workload identity (see below). The service account needs the following permissions on the bucket:

```
storage.buckets.get
@@ -139,6 +143,8 @@ storage.objects.delete
storage.objects.get
```

To opt a workspace out of automatic backup and restore, pass the `--ephemeral` flag when creating a new workspace with `workspace new`. This is useful if you intend for your workspace to be short-lived.

## Credentials

Etok looks for credentials in a secret named `etok`. If found, the credentials contained within are made available to terraform as environment variables.
7 changes: 3 additions & 4 deletions api/etok.dev/v1alpha1/workspace_types.go
@@ -69,10 +69,9 @@ type WorkspaceSpec struct {
// Variables as inputs to module
Variables []*Variable `json:"variables,omitempty"`

// +kubebuilder:validation:Pattern=`^[0-9a-z][0-9a-z\-_]{0,61}[0-9a-z]$`

// GCS bucket to which to backup state file
BackupBucket string `json:"backupBucket,omitempty"`
// Ephemeral turns off state backup (and restore) - intended for short-lived
// workspaces.
Ephemeral bool `json:"ephemeral,omitempty"`
}

// WorkspaceSpec defines the desired state of Workspace's cache storage
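The `Ephemeral` field replaces the per-workspace `BackupBucket`: the bucket is now an operator-wide setting, and a workspace can only opt out. A minimal sketch of how that decision could be expressed, assuming a hypothetical helper `shouldBackup` and a trimmed-down stand-in for the spec (neither is taken from this PR):

```go
package main

import "fmt"

// WorkspaceSpec is a trimmed-down, hypothetical stand-in for the real spec.
// Only the field relevant to backups is kept.
type WorkspaceSpec struct {
	Ephemeral bool
}

// shouldBackup is a hypothetical helper: state is backed up only when the
// operator has a provider configured and the workspace has not opted out.
func shouldBackup(providerConfigured bool, spec WorkspaceSpec) bool {
	return providerConfigured && !spec.Ephemeral
}

func main() {
	fmt.Println(shouldBackup(true, WorkspaceSpec{}))
	fmt.Println(shouldBackup(true, WorkspaceSpec{Ephemeral: true}))
	fmt.Println(shouldBackup(false, WorkspaceSpec{}))
}
```

Because `Ephemeral` is a `bool` with `omitempty`, existing workspaces that do not set it are treated as non-ephemeral, so backups apply by default once the operator is configured with a provider.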
26 changes: 11 additions & 15 deletions cmd/install/deployment.go
@@ -33,26 +33,17 @@ func WithAnnotations(annotations map[string]string) podTemplateOption {
}
}

func WithEnvFromSecretKey(varName, secret, key string) podTemplateOption {
func WithSecret(secretPresent bool) podTemplateOption {
return func(c *podTemplateConfig) {
c.envVars = append(c.envVars, corev1.EnvVar{
Name: varName,
ValueFrom: &corev1.EnvVarSource{
SecretKeyRef: &corev1.SecretKeySelector{
LocalObjectReference: corev1.LocalObjectReference{
Name: secret,
},
Key: key,
},
},
})
c.withSecret = secretPresent

}
}

func WithSecret(secretPresent bool) podTemplateOption {
func WithGCSProvider(bucket string) podTemplateOption {
return func(c *podTemplateConfig) {
c.withSecret = secretPresent

c.envVars = append(c.envVars, corev1.EnvVar{Name: "ETOK_BACKUP_PROVIDER", Value: "gcs"})
c.envVars = append(c.envVars, corev1.EnvVar{Name: "ETOK_GCS_BUCKET", Value: bucket})
}
}

@@ -115,6 +106,11 @@ func deployment(namespace string, opts ...podTemplateOption) *appsv1.Deployment
},
}

// Add environment variables to container
deployment.Spec.Template.Spec.Containers[0].Env = append(deployment.Spec.Template.Spec.Containers[0].Env, c.envVars...)

// Label selector for operator pod. It must match the pod template's
// labels.
selector := labels.MakeLabels(
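`WithGCSProvider` follows the functional options pattern already used in this file: each option is a closure that mutates a config struct, and the builder applies them in order. The sketch below reproduces the idea without any Kubernetes dependencies; `envVar`, `withImage`, `withGCSProvider` and `build` are simplified stand-ins written for illustration, not the project's actual types.

```go
package main

import "fmt"

// envVar is a simplified stand-in for corev1.EnvVar.
type envVar struct {
	Name, Value string
}

// podTemplateConfig collects the settings that options may change.
type podTemplateConfig struct {
	image   string
	envVars []envVar
}

// podTemplateOption mutates the config; options are applied in order.
type podTemplateOption func(*podTemplateConfig)

func withImage(image string) podTemplateOption {
	return func(c *podTemplateConfig) { c.image = image }
}

// withGCSProvider records the provider name and bucket as environment
// variables, mirroring the variable names used in the diff above.
func withGCSProvider(bucket string) podTemplateOption {
	return func(c *podTemplateConfig) {
		c.envVars = append(c.envVars,
			envVar{Name: "ETOK_BACKUP_PROVIDER", Value: "gcs"},
			envVar{Name: "ETOK_GCS_BUCKET", Value: bucket},
		)
	}
}

// build applies every option to a zero-valued config.
func build(opts ...podTemplateOption) podTemplateConfig {
	var c podTemplateConfig
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := build(withImage("etok:latest"), withGCSProvider("backups-bucket"))
	fmt.Println(c.image, c.envVars)
}
```

Passing configuration through environment variables keeps the install command and the manager decoupled: the installer only writes the deployment spec, and the manager reads its settings at start-up.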
15 changes: 15 additions & 0 deletions cmd/install/deployment_test.go
@@ -48,6 +48,21 @@ func TestDeployment(t *testing.T) {
})
},
},
{
name: "with backup enabled",
namespace: "default",
opts: []podTemplateOption{WithGCSProvider("backups-bucket")},
assertions: func(deploy *appsv1.Deployment) {
assert.Contains(t, deploy.Spec.Template.Spec.Containers[0].Env, corev1.EnvVar{
Name: "ETOK_BACKUP_PROVIDER",
Value: "gcs",
})
assert.Contains(t, deploy.Spec.Template.Spec.Containers[0].Env, corev1.EnvVar{
Name: "ETOK_GCS_BUCKET",
Value: "backups-bucket",
})
},
},
}
for _, tt := range tests {
testutil.Run(t, tt.name, func(t *testutil.T) {
42 changes: 41 additions & 1 deletion cmd/install/install.go
@@ -2,6 +2,7 @@ package install

import (
"context"
"errors"
"fmt"
"io/ioutil"
"net/http"
@@ -53,6 +54,8 @@ var (

// Interval between polling deployment status
interval = time.Second

errInvalidBackupConfig = errors.New("invalid backup config")
)

type installOptions struct {
@@ -83,6 +86,12 @@ type installOptions struct {

// Print out resources and don't install
dryRun bool

// Toggle state backups
backupProviderName string

// GCS backup bucket
gcsBucket string
}

func InstallCmd(f *cmdutil.Factory) (*cobra.Command, *installOptions) {
@@ -95,6 +104,10 @@ func InstallCmd(f *cmdutil.Factory) (*cobra.Command, *installOptions) {
Use: "install",
Short: "Install etok operator",
RunE: func(cmd *cobra.Command, args []string) (err error) {
if err := o.validateBackupOptions(); err != nil {
return err
}

o.Client, err = o.CreateRuntimeClient(o.kubeContext)
if err != nil {
return err
@@ -119,9 +132,27 @@ func InstallCmd(f *cmdutil.Factory) (*cobra.Command, *installOptions) {
cmd.Flags().StringToStringVar(&o.serviceAccountAnnotations, "sa-annotations", map[string]string{}, "Annotations to add to the etok ServiceAccount. Add iam.gke.io/gcp-service-account=[GSA_NAME]@[PROJECT_NAME].iam.gserviceaccount.com for workload identity")
cmd.Flags().BoolVar(&o.crdsOnly, "crds-only", o.crdsOnly, "Only generate CRD resources. Useful for updating CRDs for an existing Etok install.")

cmd.Flags().StringVar(&o.backupProviderName, "backup-provider", "", "Enable backups specifying a provider (only 'gcs' supported currently)")

cmd.Flags().StringVar(&o.gcsBucket, "gcs-bucket", "", "Specify GCS bucket for terraform state backups")

return cmd, o
}

func (o *installOptions) validateBackupOptions() error {
if o.backupProviderName != "" {
if o.backupProviderName != "gcs" {
return fmt.Errorf("%w: %s is an invalid value for --backup-provider, valid options are: gcs", errInvalidBackupConfig, o.backupProviderName)
}
}

if (o.backupProviderName == "" && o.gcsBucket != "") || (o.backupProviderName != "" && o.gcsBucket == "") {
return fmt.Errorf("%w: you must specify both --backup-provider and --gcs-bucket", errInvalidBackupConfig)
}

return nil
}

func (o *installOptions) install(ctx context.Context) error {
var deploy *appsv1.Deployment
var resources []runtimeclient.Object
@@ -150,7 +181,16 @@ func (o *installOptions) install(ctx context.Context) error {
resources = append(resources, serviceAccount(o.namespace, o.serviceAccountAnnotations))

secretPresent := o.secretFile != ""
deploy = deployment(o.namespace, WithSecret(secretPresent), WithImage(o.image))

// Deploy options
dopts := []podTemplateOption{}
dopts = append(dopts, WithSecret(secretPresent))
dopts = append(dopts, WithImage(o.image))
if o.backupProviderName == "gcs" {
dopts = append(dopts, WithGCSProvider(o.gcsBucket))
}

deploy = deployment(o.namespace, dopts...)
resources = append(resources, deploy)

if o.secretFile != "" {
40 changes: 38 additions & 2 deletions cmd/install/install_test.go
@@ -3,6 +3,7 @@ package install
import (
"bytes"
"context"
"errors"
"fmt"
"io"
"io/ioutil"
@@ -38,7 +39,7 @@ func TestInstall(t *testing.T) {
name string
args []string
objs []runtimeclient.Object
err bool
err error
assertions func(*testutil.T, runtimeclient.Client)
}{
{
@@ -76,6 +77,31 @@ func TestInstall(t *testing.T) {
assert.Equal(t, "bugsbunny:v123", d.Spec.Template.Spec.Containers[0].Image)
},
},
{
name: "fresh install with backups enabled",
args: []string{"install", "--wait=false", "--backup-provider=gcs", "--gcs-bucket=backups-bucket"},
assertions: func(t *testutil.T, client runtimeclient.Client) {
var d = deploy()
assert.NoError(t, client.Get(context.Background(), runtimeclient.ObjectKeyFromObject(d), d))
assert.Contains(t, d.Spec.Template.Spec.Containers[0].Env, corev1.EnvVar{Name: "ETOK_BACKUP_PROVIDER", Value: "gcs"})
assert.Contains(t, d.Spec.Template.Spec.Containers[0].Env, corev1.EnvVar{Name: "ETOK_GCS_BUCKET", Value: "backups-bucket"})
},
},
{
name: "missing backup bucket name",
args: []string{"install", "--wait=false", "--backup-provider=gcs"},
err: errInvalidBackupConfig,
},
{
name: "missing backup provider name",
args: []string{"install", "--wait=false", "--gcs-bucket=backups-bucket"},
err: errInvalidBackupConfig,
},
{
name: "invalid backup provider name",
args: []string{"install", "--wait=false", "--backup-provider=alibaba-cloud-blob"},
err: errInvalidBackupConfig,
},
}
for _, tt := range tests {
testutil.Run(t, tt.name, func(t *testutil.T) {
@@ -103,7 +129,17 @@ func TestInstall(t *testing.T) {
// Override wait interval to ensure fast tests
t.Override(&interval, 10*time.Millisecond)

t.CheckError(tt.err, cmd.ExecuteContext(context.Background()))
// Run command and assert returned error is either nil or wraps
// expected error
err := cmd.ExecuteContext(context.Background())
if !assert.True(t, errors.Is(err, tt.err)) {
t.Errorf("unexpected error: %v", err)
t.FailNow()
}
if err != nil {
// Expected error occurred; there's no point in continuing
return
}

// get runtime client now that it's been created
client := opts.RuntimeClient
24 changes: 24 additions & 0 deletions cmd/manager/manager.go
@@ -9,6 +9,7 @@ import (

"github.com/leg100/etok/cmd/flags"
cmdutil "github.com/leg100/etok/cmd/util"
"github.com/leg100/etok/pkg/backup"
"github.com/leg100/etok/pkg/controllers"
"github.com/leg100/etok/pkg/scheme"
"github.com/leg100/etok/pkg/version"
@@ -38,6 +39,12 @@ type ManagerOptions struct {
EnableLeaderElection bool

args []string

// Toggle state backups
backupProviderName string

// GCS backup bucket
gcsBucket string
}

func ManagerCmd(f *cmdutil.Factory) *cobra.Command {
@@ -69,11 +76,24 @@ func ManagerCmd(f *cmdutil.Factory) *cobra.Command {

klog.V(0).Info("Runner image: " + o.Image)

var backupProvider backup.Provider
if o.backupProviderName != "" {
switch o.backupProviderName {
case "gcs":
backupProvider, err = backup.NewGCSProvider(cmd.Context(), o.gcsBucket, nil)
if err != nil {
return err
}
}
}

// Setup workspace ctrl with mgr
workspaceReconciler := controllers.NewWorkspaceReconciler(
mgr.GetClient(),
o.Image,
controllers.WithBackupProvider(backupProvider),
controllers.WithEventRecorder(mgr.GetEventRecorderFor("workspace-controller")))

if err := workspaceReconciler.SetupWithManager(mgr); err != nil {
return fmt.Errorf("unable to create workspace controller: %w", err)
}
@@ -102,5 +122,9 @@ func ManagerCmd(f *cmdutil.Factory) *cobra.Command {
"Enabling this will ensure there is only one active controller manager.")
cmd.Flags().StringVar(&o.Image, "image", version.Image, "Docker image used for both the operator and the runner")

cmd.Flags().StringVar(&o.backupProviderName, "backup-provider", "", "Enable backups specifying a provider")

cmd.Flags().StringVar(&o.gcsBucket, "gcs-bucket", "", "Specify GCS bucket for terraform state backups")

return cmd
}
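The manager selects a backup provider by name and hands it to the workspace reconciler; an empty name leaves the provider nil, which disables backups. The sketch below illustrates that selection with a hypothetical `Provider` interface and an in-memory stand-in for GCS. The real `backup.Provider` interface is not shown in this diff, so the method set here is an assumption. Unlike the `switch` in the diff, this variant also rejects unknown names instead of silently ignoring them.

```go
package main

import "fmt"

// Provider is a hypothetical, simplified backup interface; the method set
// of the real backup.Provider is not shown in the diff.
type Provider interface {
	Backup(key string, state []byte) error
	Restore(key string) (state []byte, found bool, err error)
}

// memProvider is an in-memory stand-in for a bucket-backed provider.
type memProvider struct {
	bucket  string
	objects map[string][]byte
}

func (p *memProvider) Backup(key string, state []byte) error {
	// Copy the state so later mutations by the caller are not observed.
	p.objects[key] = append([]byte(nil), state...)
	return nil
}

func (p *memProvider) Restore(key string) ([]byte, bool, error) {
	state, ok := p.objects[key]
	return state, ok, nil
}

// newProvider returns a nil Provider when backups are disabled and an
// error for names it does not recognise.
func newProvider(name, bucket string) (Provider, error) {
	switch name {
	case "":
		return nil, nil
	case "gcs":
		// A real implementation would create a GCS client here.
		return &memProvider{bucket: bucket, objects: map[string][]byte{}}, nil
	default:
		return nil, fmt.Errorf("unsupported backup provider %q", name)
	}
}

func main() {
	p, err := newProvider("gcs", "backups-bucket")
	if err != nil {
		panic(err)
	}
	_ = p.Backup("default/foo", []byte("state-v1"))
	state, found, _ := p.Restore("default/foo")
	fmt.Println(string(state), found)

	_, err = newProvider("alibaba-cloud-blob", "backups-bucket")
	fmt.Println(err)
}
```

Returning an error for unknown names is a design choice, not something the PR does: the install command already validates the flag, but the manager can also be started directly with arbitrary arguments.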
14 changes: 5 additions & 9 deletions cmd/workspace/workspace_new.go
@@ -60,9 +60,8 @@ type newOptions struct {
// Timeout for workspace pod to be ready
podTimeout time.Duration

// Timeout for workspace restore failure condition to report either true or
// false (did the restore fail or not?).
restoreTimeout time.Duration
// Timeout for workspace ready condition to be true
readyTimeout time.Duration

// Disable default behaviour of deleting resources upon error
disableResourceCleanup bool
@@ -77,9 +76,6 @@ type newOptions struct {
variables map[string]string
environmentVariables map[string]string

// backupBucket is the bucket to which the state file will backed up to
backupBucket string

etokenv *env.Env
}

@@ -131,15 +127,15 @@ func newCmd(f *cmdutil.Factory) (*cobra.Command, *newOptions) {

cmd.Flags().StringVar(&o.workspaceSpec.Cache.Size, "size", defaultCacheSize, "Size of PersistentVolume for cache")
cmd.Flags().StringVar(&o.workspaceSpec.TerraformVersion, "terraform-version", "", "Override terraform version")
cmd.Flags().StringVar(&o.workspaceSpec.BackupBucket, "backup-bucket", "", "Backup state to GCS bucket")
cmd.Flags().BoolVarP(&o.workspaceSpec.Ephemeral, "ephemeral", "e", false, "Disable state backup (and restore)")

// We want nil to be the default but it doesn't seem like pflags supports
// that so use empty string and override later (see above)
o.workspaceSpec.Cache.StorageClass = cmd.Flags().String("storage-class", "", "StorageClass of PersistentVolume for cache")

cmd.Flags().DurationVar(&o.reconcileTimeout, "reconcile-timeout", defaultReconcileTimeout, "timeout for resource to be reconciled")
cmd.Flags().DurationVar(&o.podTimeout, "pod-timeout", defaultPodTimeout, "timeout for pod to be ready")
cmd.Flags().DurationVar(&o.restoreTimeout, "restore-timeout", defaultReadyTimeout, "timeout for restore condition to report back")
cmd.Flags().DurationVar(&o.readyTimeout, "ready-timeout", defaultReadyTimeout, "timeout for ready condition to report true")

cmd.Flags().StringSliceVar(&o.workspaceSpec.PrivilegedCommands, "privileged-commands", []string{}, "Set privileged commands")

@@ -294,7 +290,7 @@ func (o *newOptions) waitForReady(ctx context.Context, ws *v1alpha1.Workspace) e
lw := &k8s.WorkspaceListWatcher{Client: o.EtokClient, Name: ws.Name, Namespace: ws.Namespace}
hdlr := handlers.WorkspaceReady()

ctx, cancel := context.WithTimeout(ctx, o.restoreTimeout)
ctx, cancel := context.WithTimeout(ctx, o.readyTimeout)
defer cancel()

_, err := watchtools.UntilWithSync(ctx, lw, &v1alpha1.Workspace{}, nil, hdlr)
7 changes: 3 additions & 4 deletions cmd/workspace/workspace_new_test.go
@@ -237,12 +237,11 @@ func TestNewWorkspace(t *testing.T) {
err: handlers.ErrWorkspaceFailed,
},
{
name: "restore timeout exceeded",
args: []string{"foo", "--backup-bucket", "my-bucket", "--restore-timeout", "100ms"},
name: "ready timeout exceeded",
args: []string{"foo", "--ready-timeout", "100ms"},
objs: []runtime.Object{testobj.WorkspacePod("default", "foo")},
overrideStatus: func(status *v1alpha1.WorkspaceStatus) {
// Mock operator failing to provide restoreFailure condition
// status
// Mock operator failing to provide ready condition status
status.Conditions = nil
},
err: errReadyTimeout,